Professional Documents
Culture Documents
Olaf Wolkenhauer
Kwang-Hyun Cho
Hiroki Yokota
Editors
Encyclopedia of
Systems Biology
1 3Reference
Encyclopedia of Systems Biology
Werner Dubitzky • Olaf Wolkenhauer
Kwang-Hyun Cho • Hiroki Yokota
Editors
Encyclopedia of
Systems Biology
Systems biology refers to the quantitative analysis of the dynamic interactions among
multiple components of a biological system and aims to understand the behavior of
the system as a whole. Systems biology involves the development and application of
system-theoretic concepts for the study of complex biological systems through
iteration over mathematical modeling, computational simulation and biological
experimentation. Systems biology could be viewed as a new concept and tool to
increase our understanding of biological systems, to develop more directed experi-
ments, and to allow accurate predictions.
Systems biology is among the most rapidly growing fields of scientific research
and technology development. Over the past ten to fifteen years, the research activities
in this field have been increasing exponentially. This is evidenced by the number of
published papers as well as the growing number of conferences, journals, research
institutions and centers, and academic events on the topic of systems biology. While
research, development and applications in this field are becoming more widespread,
one major challenge is the lack of a single reference resource providing up-to-date
overviews and definitions of the major concepts and subjects across the various sub-
areas of systems biology. The Encyclopedia of Systems Biology is addressing this
lack and provides a unified information resource. It has been conceived as
a comprehensive reference work covering all aspects of systems biology, including
the investigation of living matter through a tight coupling of biological experimen-
tation, mathematical modeling and computational analysis and simulation. The main
aim of the Encyclopedia is to provide a complete reference of established knowledge
in systems biology – a ‘one-stop shop’ for someone seeking information on key
concepts in this field. As a result, the Encyclopedia comprises a broad range of topics
relevant in the context of systems biology.
The audience targeted by the Encyclopedia includes researchers, developers,
teachers, students and practitioners who are studying or working in the field of
systems biology. Keeping in mind the varying needs of the potential readership, we
have structured and presented the content in a way that is accessible to readers from
a wide range of backgrounds. In contrast to encyclopedic online resources, which
often rely on the general public to author their content, a key consideration in the
development of the Encyclopedia of Systems Biology was to have subject matter
experts define the concepts and subjects of systems biology. We believe that there is
no other treatment of this subject in terms of depth and authority of the field.
Ultimately, systems biology represents a trans-disciplinary scientific paradigm, in
which researchers from diverse backgrounds work jointly using a shared conceptual
framework and combined disciplinary-specific approaches to address complex life
science R&D problems. Thus, the Encyclopedia of Systems Biology derives its
v
vi Preface
raison d’eˆtre from a modern trans-disciplinary perspective on how to think about and
explore complex biological phenomena. We believe that this new publication will
help to define the meaning and scope of systems biology, and to extend the impact that
systems biology will have on future developments in the life sciences.
Acknowledgments
It has taken several years of hard work by many people to bring this Encyclopedia to
life. We would like to give our sincere thanks to the members of the Editorial Board,
the contributing authors and the reviewers. Without their expertise, dedication and
continuous efforts this comprehensive reference work would not exist.
We wish to thank Joseph Burns (Senior Editor), Sandra Fabiani (Executive Editor)
and Melanie Tucker (Editor) who proposed the project to us and provided invaluable
counsel in shaping and developing this reference work. We wish to express our
profound gratitude to Sylvia Blago and Simone Giesler from Springer for their
outstanding editorial efforts in producing this exciting Encyclopedia.
vii
About the Editors
ix
x About the Editors
Professor Olaf Wolkenhauer received first degrees in control engineering from the
University of Applied Sciences in Hamburg, Germany and the University of
Portsmouth, U.K. in 1994 and his Ph.D. from UMIST in Manchester in 1997). Before
moving to the University of Rostock, where he occupies the Chair in Systems Biology &
Bioinformatics, he a research lectureship at the Control Systems Centre in Manchester,
a joint senior lectureship between the Department of Biomolecular Sciences and the
Department of Electrical Engineering and Electronics, at UMIST. Since October 2004
he is an Adjunct Professor in the Department of Electrical Engineering and Computer
Science at Case Western Reserve University, Cleveland, USA an became 2005 a fellow
of the Stellenbosch Institute for Advanced Study (STIAS). Olaf Wolkenhauer’s
research interest is mathematical modelling and data analysis, focussing on nonlinear
dynamical systems in molecular and cell biology. Further, detailed information can be
obtained from his webpage at www.sbi.uni-rostock.de.
Systems biology should not be considered a discipline but an approach to under-
stand complex biological systems. Nonlinear spatio-temporal phenomena, crossing
multiple levels of structural and functional organization are the reason why mathe-
matical modelling and simulation can play an important role. The fact that so many
different topics come together in systems biology is a challenge but also an additional
motivation to advance our understanding of living systems.
About the Editors xi
Professor Kwang-Hyun Cho received B.S., M.S., and Ph.D. degrees in electrical
engineering from the Korea Advanced Institute of Science and Technology (KAIST)
in 1993 (summa cum laude), 1995, and 1998, respectively. He was an Associate
Professor at the College of Medicine, Seoul National University, Korea until 2007
and is currently the Chair Professor in the Department of Bio and Brain Engineering
at the KAIST and a director of the Laboratory for Systems Biology and Bio-Inspired
Engineering (http://sbie.kaist.ac.kr). He was the recipient of the IEEE/IEEK Joint
Award for Young IT Engineer, the Young Scientist Award from the President of
Korea, and the Walton Fellow Award from Science Foundation of Ireland. He has
been working on systems biology with biomedical applications, network evolution,
and bio-inspired engineering. His innovative contribution to systems biology and
bio-inspired engineering research by combining an engineering approach with
biochemical experimentation has led to over 126 high-profile international
journal publications. He is currently the Editor-in-Chief of IET Systems Biology
(IET, London, U.K.).
As a systems scientist, I was intrigued by the complex and puzzling dynamics of
living systems. In particular, their self-organizing behaviors and emergent properties
led me to think about the fundamental difference of living systems compared to
non-living systems from a systems science point of view. Systems biology is the way
of approaching answers to such questions and I hope the Encyclopedia of Systems
Biology to be a useful travel guide for that expedition.
xii About the Editors
Aritificial Life
Jan T. Kim University of East Anglia, School of Computing Sciences, Norwich, UK
Cancer
Carlos Sonnenschein Department of Anatomy and Cellular Biology, Tufts Univer-
sity School of Medicine, Boston, MA, USA
Ana Soto Department of Anatomy and Cellular Biology, Tufts University School of
Medicine, Boston, MA, USA
Cell Cycle
Attila Csikász-Nagy CoSBi, The Microsoft Research, Centre for Computational
and Systems Biology, University of Trento, Povo (Trento), Italy
xiii
xiv Section Editors
Metabolic Networks
Osbaldo Resendis-Antonio Systems Biology Group, Instituto Nacional de
Medicina Genomica, (INMEGEN), Tlalpan, Mexico City, DF, Mexico
Model Databases
Jacky Snoep Department of Biochemistry, University of Stellenbosch, Matieland,
South Africa
Resources
Eduardo Mendoza Department of Computer Science, University of the Philippines,
Diliman, Quezon City, Philippines
Systems Immunology
Sudipto Saha Centenary Campus, Bose Institute, Kolkata, West Bengal, India
Text Mining
J€
org Hakenberg Department of Computer Science and Department of Biomedical
Informatics, Arizona State University, Tempe, AZ, USA
Toponomics
Walter Schubert Molecular Pattern Recognition Research Group, Medical Faculty,
Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
Transcriptional Regulation
Masami Horikoshi Institute of Molecular and Cellular Biosciences, University of
Tokyo, Bunkyo-ku, Tokyo, Japan
Frank Allg€ ower Institute for Systems Theory and Automatic Control, University of
Stuttgart, Stuttgart, Germany
xvii
xviii Contributors
Miew Keen Choong Centre for Health Informatics, Australian Institute for Health
Innovation, University of New South Wales, Sydney, NSW, Australia
I-Chun Chou The Wallace H. Coulter Department of Biomedical Engineering,
Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Yunfei Chu Artie McFerrin Department of Chemical Engineering, Texas A&M
University, College Station, TX, USA
Andrea Ciliberto IFOM – The FIRC Institute of Molecular Oncology, Milan, Italy
Jean Clairambault Equipe-Projet BANG, INRIA Paris-Rocquencourt BP 105,
Le Chesnay, Paris, France
Amanda Clare Department of Computer Science, Aberystwyth University,
Aberystwyth, UK
Tim Clark Department of Neurology, Massachusetts General Hospital and Harvard
Medical School, Boston, MA, USA
Richard H. Clayton Department of Computer Science, University of Sheffield,
Sheffield, UK
Gilles Clermont Center for Inflammation and Regenerative Modeling, University
of Pittsburgh, Pittsburgh, PA, USA
George M. Coghill Institute for Complex System and Mathematical Biology,
University of Aberdeen, Aberdeen, UK
Kevin Bretonnel Cohen Center for Computational Pharmacology, University of
Colorado, Aurora, CO, USA
Matteo Comin Department Information Engineering, University of Padova,
Padova, Italy
David Cook AstraZeneca, Translational Patient Safety and Enabling Science,
Macclesfield, Chester, UK
Carlo Cosentino Department of Clinical and Experimental Medicine, Magna
Graecia University of Catanzaro, Catanzaro, Italy
Richard G. Côté European Molecular Biology Laboratory, European Bioinformat-
ics Institute (EBI), Hinxton, Cambridge, UK
Carl F. Craver Philosophy and Philosophy-Neuroscience-Psychology Program,
Philosophy Department, Washington University in Saint Louis, St. Louis, MO, USA
Attila Csikász-Nagy The Microsoft Research - University of Trento Centre for
Computational and Systems Biology, Trento, Italy
Lalit Darunte Department of Chemical Engineering, Indian Institute of Techno-
logy, Mumbai, India
Ranjan K. Dash Department of Physiology, Biotechnology and Bioengineering
Center, Medical College of Wisconsin, Milwaukee, WI, USA
Jonathan F. Davies Fondazione Bruno Kessler, Trento, Italy
Contributors xxiii
David S. DeLuca Cancer Genome Analysis Group, The Broad Institute, Cambridge,
MA, USA
Andreas Deutsch Center for Information Services and High Performance Compu-
ting (ZIH), Technical University Dresden, Dresden, Germany
Rohan Dhiman Center for Pulmonary and Infectious Disease Control, University of
Texas Health Science Center at Tyler, Tyler, TX, USA
Andreas Dr€
ager Center for Bioinformatics, University of Tuebingen, Tuebingen,
Germany
Yufei Huang Picower Institute for Learning and Memory, Massachusetts Institute
of Technology, Cambridge, MA, USA
Greehey Children’s Cancer Research Institute, University of Texas at Texas Health
Science Center at San Antonio, San Antonio, TX, USA
Department of Epidemiology and Biostatistics, University of Texas at Texas Health
Science Center at San Antonio, San Antonio, TX, USA
Fuyan Hu Institute of Systems Biology, Shanghai University, Shanghai, China
Tze Hau Lam Department of Microbiology, Yong Loo Lin School of Medicine,
National University of Singapore, Singapore, Singapore
Christoph H. Lampert IST Austria, Klosterneuburg, Austria
Qizong Lao Laboratory of Receptor Biology and Gene Expression, National Cancer
Institute, National Institutes of Health, Bethesda, MD, USA
Sneh Lata Cold Spring Harbor Laboratory, Genome Research Center, Woodbury,
NY, USA
Reinhard Laubenbacher Virginia Bioinformatics Institute, Virginia Tech,
Blacksburg, VA, USA
Sang Yup Lee Department of Chemical and Biomolecular Engineering and
Department of Bio and Brain Engineering, Korea Advanced Institute of Science
and Technology (KAIST), Daejeon, Korea
Tae Lee School of Biology and School of Physics, Georgia Institute of Technology,
Atlanta, GA, USA
Yun Lee The Wallace H. Coulter Department of Biomedical Engineering, Georgia
Institute of Technology and Emory University, Atlanta, GA, USA
Marie-Paule Lefranc Laboratoire d’ImmunoGénétique Moléculaire, Institut de
Génétique Humaine UPR 1142, Université Montpellier 2, Montpellier, France
Jinzhi Lei Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University of
Beijing, Beijing, China
Florian Leitner Structural Biology and BioComputing Programme, Spanish
National Cancer Research Centre (CNIO), Madrid, Spain
Sabina Leonelli ESRC Centre for Genomics in Society, University of Exeter,
Exeter, Devon, UK
Ulf Leser Institute for Computer Science, Humboldt-Universit€at zu Berlin, Berlin,
Germany
Arthur M. Lesk Department of Biochemistry and Molecular Biology, and Huck
Institutes of Genomics, Proteomics, and Bioinformatics, Pennsylvania State Univer-
sity, University Park, PA, USA
Wentian Li The Robert S. Boas Center for Genomics and Human Genetics,
Feinstein Institute for Medical Research, Manhasset, NY, USA
Yan Li Department of Pediatrics, Indiana University School of Medicine, Indiana-
polis, IN, USA
Zhenping Li School of Information, Beijing Wuzi University, Beijing, China
Wolfram Liebermeister Institut f€ur Biochemie, Charité - Universit€atsmedizin
Berlin, Berlin, Germany
Frank Lin Centre for Health Informatics, Australian Institute for Health Innovation,
University of New South Wales, Sydney, NSW, Australia
Ke-Qin Liu Institute of Systems Biology, Shanghai University, Shanghai, China
xxxii Contributors
Xiaojun Liu Internal Medicine, The Second Hospital of Hebei Medical University,
Shijiazhuang, Hebei, China
Xiaoping Liu Institute of Systems Biology, Shanghai University, Shanghai, China
Zhi-Ping Liu Key Laboratory of Systems Biology, Shanghai Institutes for Biolo-
gical Sciences, Chinese Academy of Sciences, Shanghai, China
Catherine M. Lloyd Auckland Bioengineering Institute, University of Auckland,
Auckland, New Zealand
Luca Lombardi Department of Computer Engineering and Systems Science,
University of Pavia, Pavia, Italy
Isaac F. López-Moyado Center for Genomic Sciences-UNAM, Universidad
Nacional Autonoma de Mexico, Cuernavaca (Morelos), Mexico
Jingky Lozano-K€
uhne Department of Public Health, University of Oxford,
Oxford, UK
Long Jason Lu Division of Biomedical Informatics, Cincinnati Children’s Hospital
Research Foundation, Cincinnati, OH, USA
John Lygeros Automatic Control Laboratory, ETH Zurich, Zurich, Switzerland
Zoi Lygerou School of Medicine, Laboratory of General Biology, University of
Patras, Patras, Greece
Shuangge Ma Yale School of Public Health, Yale University, New Haven, CT, USA
Hong-Wu Ma School of Informatics, University of Edinburgh, Edinburgh, UK
Lars Malmstr€ om Institute of Molecular Systems Biology IMSB, ETH Zurich,
Zurich, Switzerland
Marcos Malumbres Spanish National Cancer Research Centre (CNIO), Madrid,
Spain
Sebastian Mana-Capelli Department of Molecular Genetics and Microbiology,
University of Massachusetts, Worcester, MA, USA
Athanasios Mantalaris Biological Systems Engineering Laboratory, Department
of Chemical Engineering, Centre for Process Systems Engineering, Imperial College,
London, UK
Samuel Marguerat Department of Genetics, Evolution and Environment and UCL
Cancer Institute, University College London, London, UK
Alberto Marin-Sanguino Faculty of Mechanical Engineering, Specialty
Division for Systems Biotechnology, Technical University Munich, Garching,
Germany
José Antonio Martı́nez Department of Computer Architecture and Electronics,
University of Almerı́a, Almerı́a, Spain
Elio Mattia Centre for Systems Chemistry, Stratingh Institute for Chemistry,
Rijksuniversiteit Groningen, Groningen, The Netherlands
Contributors xxxiii
Barry McMullin The Rince Institute, Dublin City University, Dublin, Ireland
Hendrik Mehlhorn Leibniz Institute of Plant Genetics and Crop Plant Research (IPK),
Gatersleben, Stadt Seeland, Germany
Jia Meng Picower Institute for Learning and Memory, Massachusetts Institute of
Technology, Cambridge, MA, USA
Kalpana Raja Data Mining and Text Mining Laboratory, Department of Bioinfor-
matics, Bharathiar University, Coimbatore, India
xxxviii Contributors
Falk Schreiber Leibniz Institute of Plant Genetics and Crop Plant Research (IPK),
OT Gatersleben, Stadt Seeland, Germany
Martin Luther University Halle–Wittenberg, Halle, Germany
Jagesh Shah Department of Systems Biology, Harvard Medical School and Renal
Division, Brigham and Women’s Hospital, Boston, MA, USA
Nobuo Shimamoto Faculty of Life Sciences, Kyoto Sangyo University, Kyoto, Japan
Sandro J. de Souza Ludwig Institute for Cancer Research, Sao Paolo, Brazil
5-azaCytosine References
Vani Brahmachari and Shruti Jain Kaminskas E, Farrell AT, Wang YC, Sridhara R, Pazdur R
(2005) FDA drug approval summary: azacitidine
Dr. B. R. Ambedkar Center for Biomedical Research,
(5-azacytidine, Vidaza) for injectable suspension. Oncolo-
University of Delhi, Delhi, India gist 10(3):176–182
Synonyms
Azacitidine 7-Aminoactinomycin D
Characteristics References
The key principle underlying abduction is that Flach PA, Kakas AC (2000) Abduction and induction – essays
on their relation and integration. Kluwer, Dordrecht
of hypothesizing a set of facts that, together with the
Kakas AC, Kowalski RA, Toni F (1992) Abductive logic
available knowledge, explain a specific observation. For programming. J Log Comput 2(6):719–770
instance, given background knowledge about flying and Poole D (1993) Probabilistic Horn abduction and Bayesian
non-flying objects, including a rule stating that normal networks. Artif Intell 64:81–129
birds fly (8x:birdðxÞ ^ normalðxÞ ! fliesðxÞ), the
observation that Tweety flies (flies(tweety)) can be
explained by abducing that Tweety is a normal bird Abortive Transcription
({bird(tweety), normal(tweety)}).
More formally, given background knowledge B, Nobuo Shimamoto
a set A of abducibles (i.e., facts that can be part of Faculty of Life Sciences, Kyoto Sangyo University,
explanations), and a ground fact or observation o, the Kyoto, Japan
task of abduction is to find an explanation for o, that is,
a set of facts EA such that o can be inferred from B∪E
using ▶ deduction. Definition
Similarly to ▶ induction, abduction can be seen as
a form of inverted ▶ deduction. However, whereas The iterative synthesis of short oligo RNA of typically
induction finds general rules from several examples, 2–15 bases long by a promoter complex.
abduction assumes these rules to be part of the
background knowledge and instead finds ground facts Cross-References
explaining a single example (Flach and Kakas 2000).
The principle of abduction goes back to the ▶ Transcription in Bacteria
philosopher and logician Charles Sanders Peirce, who
viewed it as the part of scientific methodology seeking
explanations that turn surprising observations into Absolute Protein Quantification
plausible consequences of an existing theory.
In the context of Bayesian networks, finding the ▶ Selective Reaction Monitoring
most probable explanation (MPE), that is, the most
likely values of all non-observed variables given
observed values for a subset of variables, is some- Absorption Spectroscopy
times also considered abductive inference. A similar
idea, albeit in the context of a first order logic lan- Xiaohua Wu
guage and with a clear distinction between abducibles Department of Pediatrics, Herman B Wells Center for
and the rest of the theory, is used in probabilistic Pediatric Research, Indiana University School of
horn abduction (PHA) (Poole 1993). PHA associates Medicine, Indianapolis, IN, USA
probabilities to abducibles, thus allowing one to
choose explanations based on their probabilities. In
abductive logic programming (Kakas et al. 1992), Definition
integrity constraints are used to restrict the set of
possible explanations. A technique in which the power of a beam of light
measured before and after interaction with a sample is
compared.
Cross-References
Cross-References
▶ Deduction
▶ Induction ▶ Spectroscopy and Spectromicroscopy
Abstraction 3 A
(Giunchiglia et al. 1997) (▶ Knowledge Representa-
Abstraction tion) with as scope to simplify a logical theory of
a domain of interest or its visualization, using heuris-
A
C. Maria Keet tics, syntactic, or semantic abstraction, whereas
KRDB Research Centre, Free University of Bozen- the first approach to abstraction is also used to make
Bolzano, Bolzano, Italy distinctions between the implementation layer, logical
design layer, and conceptual modeling layer.
Synonyms
Characteristics
Aggregation; Generalization; Grouping
Abstraction Mechanisms
There are at least three principle mechanisms of
Definition abstraction in the second sense, which are graphically
depicted in Fig. 1, and each mechanism can have
Abstraction is used in two principal distinct senses: variations. For instance, for R-ABS, one can take the
1. The process to go from instances to a universal or abstraction from the constituent parts (▶ Mereology)
concept and, in turn, its meta-level or members to the whole (e.g., the individual sheep
2. The process to go from a detailed to simplified into a flock, enzymatic reaction into a metabolic path-
representation where one chooses to ignore certain way process) and for F-ABS, one “folds” (e.g., cus-
aspects, thereby reducing the scope toward salient tomer, payment, and book onto a “BookOrder” entity
semantics. or the steps in the MAPK cascade).
For systems biology, both senses of abstraction Implicitly, abstraction forces into existence a notion
are used, where the second sense comes into play of levels or layers, and successive abstractions then
only if the former becomes too wieldy at either the generate an abstraction hierarchy, which is a close
instance or the type level. Software support is often relative of levels of ▶ granularity and relates to the
required in order to deal with abstractions in systems notion of ▶ modularity and ▶ modularization of
biology, therefore much effort has gone into research a system. Not all approaches toward, and usages of,
of abstraction. The second sense enjoys attention abstraction consider levels and hierarchies explicitly
in particular in conceptual modeling for database and reify them into entities themselves that can be
systems (Talheim 2009) and artificial intelligence manipulated. For those that do, there are different
y y y
x
x x
abs(x) = y
Abstraction, Fig. 1 Three conceptually distinct types of remodeled as a function, F-ABS folding multiple entities and
abstraction operations; x and y are entities at a finer and coarser relations into a different type of entity, D-ABS deleting semanti-
level of abstraction (indicated with an oval), and x is abstracted cally less relevant entities and relations (Source: based on Keet
into y using an abstraction function abs. R-ABS the relation is (2007))
A 4 AC#
Limitations Synonyms
To date, there is no overwhelming evidence that the
solutions proposed by computing are implementable Shielding of activation domain
either in information systems for systems biology
or modeling for systems biology. Where it already
can be useful is to provide guidelines to create Definition
abstraction levels in a consistent way, that is, not ad
hoc but with guiding principles and abstraction In budding yeast, several GAL genes including
procedures. GAL1, GAL10, and GAL7 are activated by Gal4p,
which is a transcriptional activator for the GAL
regulon. Expression of the GAL genes allows the cell
Cross-References to utilize the sugar galactose as a carbon source. In
the absence of galactose, Gal4p is bound to DNA but
▶ Granularity is inactive because its activation domain (AD) is
masked by Gal80p (Fig. 1a) (Wong and Struhl 2011).
Once galactose is added to the media, the signal
References transduction protein Gal3p binds the sugar along with
ATP to allow activation of Gal4p. Specifically, the
Giunchiglia F, Villafiorita A, Walsh T (1997) Theories of galactose- and ATP-bound form of Gal3p can
abstraction. AI Commun 10(3–4):167–176
form a stable complex with Gal80p, thereby titrating
Keet CM (2007) Enhancing comprehension of ontologies and
conceptual models through abstractions. In: Basili R, out the inhibitory effect of Gal80p on Gal4p
Pazienza MT (eds) 10th congress of the Italian association (Lavy et al. 2012). It is not known whether Gal80p
for artificial intelligence (AI*IA 2007), Springer Lecture is released from the transcription complex upon
Notes in Artificial Intelligence 4733, pp 814–822
activation or instead remains in complex with
Talheim B (2009) Abstraction. In: Liu L, Tamer Ozsu M (eds)
Encyclopedia of database system. Springer, New York Gal3p, possibly in a different position or orientation.
Regardless, upon binding of Gal3p and Gal80p, the
AD of Gal4p becomes able to recruit a range of
transcriptional coactivators. Such shielding of ADs
may represent the simplest mode of action by which
AC# transcriptional corepressors like Gal80p negatively
regulate gene expression.
▶ Database Accession Number The Tup1p-Cyc8p complex of budding yeast is
a well-known transcriptional corepressor that represses
transcription of several hundreds of genes. This com-
plex is widely conserved among eukaryotes and is
Activating Complexes therefore likely to be crucial for proper regulation of
gene expression. Previously, it was believed that
▶ Trithorax Complexes Tup1p-Cyc8p represses transcription by several
Activation Domain Shielding 5 A
Activation Domain
Shielding,
Fig. 1 Transcriptional
repression by shielding the A
activation domains (ADs) of
transcription activators.
(a) Regulation of Gal4p by
Gal80p. Under repressing
conditions, the AD of Gal4p is
shielded by Gal80p. In
derepressing conditions,
Gal80p dissociates from
Gal4p, exposing the AD of
Gal4p for recruitment of
various coactivators.
(b) Regulation of some
activators by Tup1p-Cyc8p.
Under repressing conditions,
Tup1p-Cyc8p prevents
activation by shielding the
ADs of these activators. Upon
derepression, these activators
may undergo specific
conformational changes that
affect their interactions with
Tup1p-Cyc8p, such that the
ADs can now recruit various
coactivators. In contrast to
Gal80p, Tup1p-Cyc8p
remains at the promoter,
where it contributes somehow
to the activation process.
When repression needs to be
reestablished, Tup1p-Cyc8p
may help to remove these
coactivators from the
promoter
distinct mechanisms that function redundantly, e.g., by Tup1p-depleted cells. Therefore, at least some tran-
recruiting histone deacetylases (HDACs), positioning scriptional repressors can be converted to transcrip-
nucleosomes at inhibitory sites, or inhibiting Mediator tional activators by exchange of their corepressor
function at promoters. Recently, however, it has been accessory factors for coactivating factors. However, it
proposed that the major function of Tup1p-Cyc8p is to remains possible that some other transcriptional repres-
shield ADs from interacting with coactivators such as sors may repress transcription more actively, e.g., in the
SWI/SNF, SAGA, and Mediator (Fig. 1b) (Wong and manner previously proposed for Tup1p-Cyc8p.
Struhl 2011). In this regard, Tup1p-Cyc8p and Gal80p
are functionally similar, although the former is
not specific for the GAL regulon and functions more Cross-References
widely than the latter. Consistent with this newer model,
the binding sites of Tup1p-Cyc1p are located very close ▶ Mechanisms of Transcriptional Activation and
to those of coactivators, as revealed by studies of Repression
A 6 Activator
References Definition
Lavy T, Kumar PR, He H, Joshua-Tor L (2012) The Gal3p Given a set of unlabeled training examples, an active
transducer of the GAL regulon interacts with the Gal80p
learning algorithm is allowed to select a number of
repressor in its ligand-induced closed conformation. Genes
Dev 26(3):294–303 training examples (one by one or in batches) and to
Wong KH, Struhl K (2011) The Cyc8-Tup1 complex inhibits obtain their target value at a certain cost per example.
transcription primarily by masking the activation domain of Next, the learning algorithm should predict the target
the recruiting protein. Genes Dev 25(23):2525–2539
value of a number of unseen test examples, incurring
a certain cost per mistake. The goal of the active
learning algorithm is to minimize the total cost.
Activator
Problem Settings
Active Labeling During Cell Division There are several variations of the active learning
problem. First, active learning can be seen as learning
▶ Quantifying Lymphocyte Division, Methods by asking queries to some oracle (the teacher, experi-
mental setup, etc.) (Angluin 1988). Different query
types have been defined, among which the most impor-
Active Labeling Incorporation tant ones are:
• The membership query, which asks whether a par-
▶ Lymphocyte Labeling, Cell Division Investigation ticular example belongs to a particular class or not
• The equivalence query, which asks whether
a particular learned model is equivalent to the con-
Active Learning cept to be learned and asks for a counterexample if
the learned concept is not correct
Jan Ramon The equivalence query is very powerful and there-
Declarative Languages and Artificial Intelligence fore not very useful in practical applications. In partic-
Group, Katholieke Universiteit Leuven, Leuven, ular, it is usually not possible to verify whether a theory
Belgium is perfect (due to the absence of perfect experimenta-
tion equipment, resources or sufficiently knowledge-
able experts), and for many datasets it is even unlikely
Synonyms that a perfect theory can be derived. The membership
query lets the oracle return the target value of a given
Experimental design; Learning by queries example, and is more usual in most experimental
Active Learning 7 A
research. In the case of noisy experiments, the mem- the active learning algorithm is used directly by an
bership query may only give an approximative value experimental setup. One successful application was in
and it may be useful to ask the same query again to get the field of functional genomics (King et al. 2004).
A
a more precise measurement. Even so, the value of active learning strategies has
Second, variation is possible in the interaction pro- been demonstrated in several application areas, either
cedure between the learner and the oracle. Most com- on benchmark data or in comparative experiments with
mon is the situation where the learner asks queries one standard experimental designs. These domains include
by one and the oracle provides the answers immedi- drug discovery (De Grave et al. 2008), nanofiltration
ately. In practice, this is not always realistic. State-of- (Cano-Odena et al. 2010), sensor placement (Guestrin
the-art high-performance experimental setups et al. 2005), and robot gait optimization (Lizotte et al.
(microarrays, bioassays, . . .) process several experi- 2007).
ments simultaneously in batches or in pipelines. It is
therefore more efficient in practice to ask a next query Methods
before the first one has been answered. Several methods for active learning have been
Third, different prediction objectives are possible. described in the literature, for example, (Angluin
In the normal setting of active learning, the algorithm 1988; Tong and Koller 2001a; Lizotte et al. 2007;
is scored on prediction performance on a test set drawn Guestrin et al. 2005). Here, we mainly summarize
from the same distribution as the training set (Kearns a few general principles:
and Vazzirani 1994). However, one is not always most • When learning a predictive model, a common strat-
interested in learning a model of all aspects of the egy is to select instances for which the prediction
instance space. In some cases, one wants to find the using the current model is most uncertain. This is
best instance in the instance space according to some a particularly useful strategy for predictive models
criterion. Accordingly, one is interested in a model that which also report confidence, such as Gaussian
is especially accurate for the instances that are good processes (Guestrin et al. 2005).
while an accurate prediction of the value of bad • Alternatively, for version space–based algorithms,
instances is not very important. For example, in drug a good strategy is to try as well as possible to halve
discovery (De Grave et al. 2008), one wants to find the the version space with each query. For example,
“best” (most active, least toxic, . . .) molecule in when the parameters of a model are known to be in
a library of molecules. some region, it is good to choose instances to query
Finally, one can distinguish between membership such that this region is maximally reduced by the
query–based approaches where the learning algorithm answer. Tong and Koller (2001b) discuss the appli-
should provide the instances for which the class should cation of such strategies for support vector machines.
be obtained, and the membership query–based • When choosing several instances to query, a trade-
approaches where the algorithm can choose from off must be made between informativeness and
a predefined pool of instances. The latter is more real- diversity of the instances. In particular, choosing
istic, as in the first case it may happen that the learner several informative but very similar instances may
asks to perform a practically hard or impossible to not give a maximal total amount of new informa-
realize experiment. For example, for drug discovery tion. For example, in drug discovery one often
processes, large libraries of purchasable compounds investigates molecules similar to molecules known
are available. Other molecules can be tested too, but to have some activity in the hope to find a better
having to synthesize such molecules oneself drasti- variant, but one also performs a wider screening in
cally increases the cost of the experiment. the hope to find new regions in molecular space
where active compounds can be found.
Applications • Another idea arises from ▶ ensemble learning.
Active learning can be applied in a wide range of When several different models are learned on the
different applications. same data, they may agree on a part of the test
In principle, it is most useful in experimental instances and disagree on another part of the test
research. Only few systems have been implemented in instances. In such a case, Liere and Tadepalli (1997)
a completely automated closed loop where the output of argues that it is good to query the target values of
A 8 Actomyosin Ring
the instances on which the committee of classifiers Tong S, Koller D (2001b) Support vector machine active learn-
learned so far disagrees most. ing with applications to text classification. J Mach Learn Res
2:45–66
• When attempting to find the best instances in a set,
one will focus on the best instances according to the
quality criterion. A typical strategy among several
alternatives (De Grave et al. 2008) is to select Actomyosin Ring
instances optimistically, for example, by adding to
the current prediction a few standard deviations on Sebastian Mana-Capelli and Dannel McCollum
that prediction. This makes a trade-off between Department of Molecular Genetics and Microbiology,
selecting instances which will probably have University of Massachusetts, Worcester, MA, USA
a high quality and selecting instances in poorly
explored regions of the instance space.
Synonyms
▶ Ensemble
▶ Experiment Definition
Adaptationism
References Adipocytes
Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P Steven D. Rhodes
(2002) The adaptive immune system. In: Molecular
School of Medicine, Indiana University, Indianapolis,
biology of the cell, 4th edn. Garland Science, New York,
pp 1363–1421 IN, USA
Definition
Adhesin
Cross-References
Ramachandran Srinivasan
G.N. Ramachandran Knowledge Centre for Genome ▶ Single Cell Assay, Mesenchymal Stem Cells
Informatics, Institute of Genomics and Integrative
Biology, Delhi, India
Adjuvants
Synonyms
Gajendra Raghava
Adhesion molecule Bioinformatics Centre, Institute of Microbial
Technology, Chandigarh, Chandigarh, India
Definition
Definition
Adhesins are cell surface–localized proteins helping in
the process of adhesion. In pathogens, adhesins consti- These are molecules used to improve the efficacy of
tute major virulence factor helping in the process of vaccine mostly through delivery of the vaccine in
adherence and colonization. controlled release mode and at the desired location.
A 12 Adult T-Cell Leukemia
Definition Cross-References
HTLV-1 is the only known human retrovirus with onco- ▶ HTLV, Cellular Transcription
genic potential. About 5% of infected HTLV-1 individ-
uals develop ATL, a non-Hodgkin type lymphoma with References
abnormal mature CD4+ T-cell phenotypes and extremely
long latency of up to 60 years. HTLV-1 infections and Bazarbachi A, Plumelle Y, Ramos JC, Tortevoye P, Otrock Z,
Tax-mediated transformation of CD4+ T-cells alone is Taylor G, Gessain A, Harrington W, Panelatti G, Hermine O
not sufficient to trigger ATL. Other factors that are (2010) Meta-analysis on the use of Zidovudine and
Interferon-Alfa in adult T-Cell leukemia/lymphoma showing
necessary to develop ATL include proviral load, pro- improved survival in the leukemic subtypes. J Clin Oncol
gressive weakening of the host immune system, genetic 28(27):4177–4183. Epub ahead of print
background, virus-mediated genetic and epigenetic Higuchi M, Fujii M (2009) Distinct functions of HTLV-1
changes resulting in DNA damage, and clonal expansion Tax1 from HTLV-2 Tax2 contribute key roles to viral path-
ogenesis. Retrovirology 6:117
of altered CD4+ T-cells (Higuchi and Fujii 2009). Takatsuki K (2005) Discovery of adult T-cell leukemia.
Infections with HTLV-2 may result in HAM/ Retrovirology 2:16
TSP-like disease, but do not cause ATL or other
forms of leukemia/lymphoma. Although HTLV-2 is
transforming infected CD4+ T-cells, differences in Tax
proteins of HTLV-1 and HTLV-2 and their molecular Adult T-Cell Leukemia/Lymphoma
interactions with host proteins are thought to be
responsible for their distinct disease outcomes ▶ Adult T-Cell Leukemia
(Higuchi and Fujii 2009).
Typical ATL symptoms are skin lesions, lymphade-
nopathy, and hepatosplenomegaly. ATL is often Adverse Events
accompanied by hypercalcemia and increased osteo-
clast numbers. Clinically, ATL is divided into four Ravi Iyengar
subclasses (Table 1) (Takatsuki 2005). Department of Pharmacology and Systems
At present, there is no effective treatment of ATL. Therapeutics, Mount Sinai School of Medicine,
Chemotherapy, INFA2 plus antiviral treatment, human- New York, NY, USA
ized monoclonal IL2 antibody therapy, or allogeneic
hematopoietic stem cell transplantation yield mixed
results (Takatsuki 2005). Nevertheless, INFA2 combined Definition
with antiviral zidovudine (AZT) treatment prolonged the
survival time and improved clinical symptoms in patients Complications and/or undesirable side effects that are
with acute, chronic, and smoldering ATL, except lym- associated with pharmacological treatments, which
phoma-type ATL (Bazarbachi et al. 2010). can be the result of unpredicted interactions.
Agent-based Modeling 13 A
Cross-References or in vivo biological experiments in order to validate
and calibrate these rules according to relevant data.
▶ Biomarker Discovery, Knowledge Base In the field of cancer research, the use of ABM to
A
▶ Systems Pharmacology simulate cancer growth per se is not new (Wang and
Deisboeck 2008). Some sophisticated ABMs have
been developed to identify and quantify the relation-
ship between individual molecular properties,
Aerobic Glycolysis their microenvironmental conditions, and the overall
tumor morphology. It should be noted that current
▶ Warburg Effect ABMs often model extracellular factors as continu-
ous quantities, thereby rendering the models hybrid
in nature. The ABM approach allows researchers to
investigate one of the most important problems in
Agent-based Modeling cancer research – tumor heterogeneity, an inherent
feature of cancer cells. This is achieved by addressing
Zhihui Wang and Thomas S. Deisboeck the role of diversity in cell populations and
Harvard-MIT (HST) Athinoula A. Martinos Center for also within each individual cell. In current ABMs,
Biomedical Imaging, Massachusetts General Hospital, tumor growth and invasion patterns are neither
Charlestown, MA, USA predefined nor intuitive, but emerge as a result of
individual dynamics, which include cell–cell and
cell–microenvironment interactions and intercellular
Synonyms signaling. However, a major drawback of ABMs is
that they are generally too detailed to simulate over
Individual-based modeling (IBM) a long period of time, particularly in large, 3D
domains, and as a consequence current ABMs can
only process a relatively small number of cells.
Definition Hence, a more innovative way of thinking about
ABM is required to solve this problem in order to
Agent-based modeling (ABM), also referred to as move ABMs toward clinical application. We have
individual-based modeling (IBM), simulates the inter- seen a number of new methods being developed in
actions of autonomous entities (i.e., the agents) with the field (interested readers should refer to Wang and
each other and their local environment to predict Deisboeck (2008) for details).
higher-level emergent phenomena (Bonabeau 2002).
In biomedical research, while an agent can
represent a part of a cell or a cluster of cells, the ideal Cross-References
candidate for a software agent is now more commonly
recognized to be an individual cell (Walker and ▶ Multilevel Modeling, Cell Proliferation
Southgate 2009), since models can benefit from such
a direct one-to-one mapping between real and
virtual cells in terms of parameter acquisition from
experiments and model validation. As a simulation References
progresses, agents (representing individual cells)
Bonabeau E (2002) Agent-based modeling: methods and tech-
interact or communicate with other agents and their
niques for simulating human systems. Proc Natl Acad Sci
common microenvironment according to a set of USA 99(Suppl 3):7280–7287
predefined, biomedical data-driven “rules.” Because Walker DC, Southgate J (2009) The virtual cell–a candidate co-
an ABM’s simulation results are highly dependent ordinator for “middle-out” modelling of biological systems.
Brief Bioinform 10:450–461
on these rules, it is necessary to tightly couple these
Wang Z, Deisboeck TS (2008) Computational modeling of brain
algorithms at all stages of model development with tumors: discrete, continuum or hybrid? Sci Model Simul
iterative in silico studies as well as available in vitro 15:381–393
A 14 Agent-based Models, Discrete Models and Mathematics
Definition Assumptions
There are a number of underlying assumptions that are
Agent-based modeling (ABM) is a computational typically made when constructing an ABM to study
modeling approach whereby individual entities in biological tissue patterning, and these are summarized
a complex system (e.g., biological cells) are below:
represented by discrete agents that interact autono- • Agents are finite in number, and unless otherwise
mously in simulated space and time to produce emer- specified, each agent represents a single biological
gent, often nonintuitive outcomes at the population cell within a tissue.
(e.g., biological tissue) level. ABM has roots in the • Biological cells are treated as “black boxes” in that
field of artificial intelligence and can be considered an biological phenomena at the subcellular level (e.g.,
offshoot of cellular automata (CA) modeling gene expression or intracellular signaling) are
(▶ Cellular Automata), which was first introduced in implicit and reflected in the behaviors and attributes
the 1940s by John Von Neumann (von Neumann and of agents at the cell level.
Morgenstern 2007). Unlike CA, however, agents in an • Agents operate within a bounded space whose
ABM are not confined to stationary points on a grid boarders are analogous to those of the biological
(i.e., they are mobile), they do not have to perform their tissue being simulated.
actions simultaneously at each time step, and they can • Space and/or time is represented discretely.
be heterogeneous in size or type (Grimm and Railsback A typical two-dimensional ABM simulation space,
2005). With the advent of personal computers in the for example, is partitioned into a grid of pixels on
1980s, the adoption of ABM accelerated and expanded which agents act and interact with one another and
into different fields, including ecology, epidemiology, with their simulated environment.
finance and economics, political science, the social and • The state of knowledge in a given field can be
behavioral sciences, and more recently, biological pat- effectively captured by the rule set that governs
terning. As applied to biological and biomedical inves- agent behaviors in an ABM.
tigation, ABMs are particularly well suited for • If the prediction of the ABM matches indepen-
investigating multicellular phenomena, or processes dently measured outcomes of a biological analogue,
where the independent behaviors of individual cells the rule set may, but not necessarily, capture key
acting in response to their dynamic local environment mechanisms controlling the biological system.
give rise to emergent tissue-level adaptations. In this
context, ABMs have been used to simulate and study Formulation
a variety of physiological and pathological processes Historically, biological ABMs have been used for two
including tumorigenesis, inflammation, wound primary purposes: (1) to replicate the emergent behav-
healing, and angiogenesis (Peirce et al. 2006). The iors of a complex biological system in order to better
rules governing individual agent behaviors are central understand its underlying mechanisms and (2) to
to the outcomes/predictions of ABMs and often address specific fundamental questions/hypotheses
Agent-based Models, Discrete Models and Mathematics 15 A
Agent-based Models, Discrete Models and Mathematics, cells (yellow) visualized using immunohistochemistry and con-
Fig. 1 The multicell composition of a complex tissue (rat mes- focal microscopy (a) (▶ Fluorescence Microscopy). An ABM
entery) includes microvascular endothelial cells (green) represents cells as agents and can simulate patterning outcomes
arranged in a network of vessels and interstitial progenitor at the level of the microvascular network (b)
about the underlying mechanisms of complex biolog- (e.g., matrix metalloproteinase) or bind and release
ical systems. The process of formulating an ABM is growth factors.
iterative, and few, if any, formalized protocols for the A reasonable third step is the selection of initial
construction of biological ABMs exist; however, conditions, an appropriate time step (clock increments
a general recommended stepwise approach for devel- during which events in the simulation are scheduled),
oping an ABM is summarized below. and a total duration (i.e., temporal window) for the
First, the simulation space and the agents are simulation. Frequently, time steps are discrete and rep-
defined according to the tissue being simulated and resent from millisecond to hours. During each time step,
its cellular and acellular composition (Fig. 1). the agents survey the environment, make decisions
Specifically, the simulation boundaries are set by the based on their local environment that are dictated by
size of the tissue being simulated (in two dimensions or the rule set, and exhibit behaviors. ABMs can simulate
three dimensions). Spatial boundaries, or edges of temporal windows ranging from minutes to years.
the simulation space, can be periodic, or wrapping The fourth step involves determining the relevant
from left to right and top to bottom, in the case of output metrics that capture the salient features of the
a two-dimensional simulation. An alternative bound- emergent outcome at the tissue level. It is often useful
ary condition is a closed boundary, wherein the to design metrics that quantitatively reflect the shape or
edges of the simulation space are rigid and serve as architecture of the tissue at different time steps, and/or
“walls” to prevent agents from exiting the space. The patterns of cell aggregates and multicell structures
number and phenotype of cells/agents contained within the tissue. An obvious guideline for selecting
within the tissue and the extracellular composition metrics is to include those that are utilized to describe
of the tissue are defined and ascribed to agent-types analogous patterning features in experimental data, as
in the ABM. this facilitates cross-validation of the ABM predictions
Second, the agents are endowed with the ability to against independent empirical studies.
exhibit the relevant physiological or pathological The fifth step involves compiling the set of rules that
behaviors. For example, cellular agents can proliferate, determine the behaviors of the individual agents in the
migrate, undergo apoptosis, differentiate, and/or ABM. Rules can be constructed as abstract representa-
secrete proteins, all in response to other signals in tions of known biological relationships or as specific
their simulated environment. Likewise, acellular representations of actual empirical data. Frequently
agents (e.g., collagen) may be endowed with the ability rules within a rule set are crafted as Boolean, true-
to remodel in response to enzymes in the environment false “if-then” statements, but they can also incorporate
A 16 Agent-based Models, Discrete Models and Mathematics
stochasticity and describe agent behaviors according to understanding and suggest specific experiments that
probability distributions. An example of a rule that would offer the most value moving forward.
might exist in a two-dimensional ABM simulation In summary, developing and using ABMs is an
meant to represent cell contact inhibition would be: “If iterative process that is most informative when married
an agent is within one pixel of a neighboring agent for with experimental studies to determine the underlying
more than 24 h, that agent will not divide.” The length of cell-level mechanisms of tissue patterning (Thorne
a biological ABM rule set can range from one to over et al. 2007). By integrating different types of experi-
200 rules, and can be limited by computational power mental data into a unified rule set and allowing agent/
and the available empirical data. cells to reveal the outcome of this integration in space
The first five steps described above pertain to the and time, ABMs can uncover new understanding that
design and assembly of the model, and the sixth step is is not obtainable using experimental approaches alone.
run and verify model by determining whether or not ABMs may play a role in translational research,
the output is internally consistent and, at least at an informing drug design and target identification, as
intuitive level, aligned with representative biological well as predicting side effects and determining mech-
outcomes. This step can also include performing addi- anisms of action (Vodovotz et al. 2010). Ultimately,
tional simulations in order to conduct a sensitivity ABMs should be considered as another tool in the
analysis on the parameter/rule set and to understand repertoire of systems biology approaches that is par-
if the unperturbed system reaches dynamic equilibrium ticularly useful in studying complex, multicell pro-
or steady state after multiple time steps. cesses and outcomes at the tissue level.
The final step in developing and using an ABM to
study biological processes involves performing an inde- Strengths and Weaknesses of ABM
pendent validation of the predictions by comparing Strengths
ABM outcomes to experimentally measured outcomes • Multicell behaviors can be represented at the level
and using the model to interrogate the underlying mech- of the tissue and individual cells can be tracked in
anisms of the complex system. This may include space and time in a relatively computationally effi-
performing simulations where agents or rules are sys- cient manner.
tematically removed or altered to understand the rela- • Heterogeneities in cell distribution and tissue com-
tive contributions of those factors to the overall system. position, as well as stochastic changes in biological
Removing the influences of key growth factors or events, can be modeled explicitly.
chemokines one by one, for example, would enable • ABMs produce graphical outputs that are easily
a “knockout” analysis to elucidate how the system interpreted by nonexperts in computational model-
performs in the absence of these factors. One might ing, enabling a potentially rich and fluid dialogue
perform such an analysis in order to independently between experimentalists and modelers.
predict phenotypes observed in an analogous transgenic • ABMs integrate diverse types of experimental data
knockout mouse. This step might also include testing and serve to instantiate existing and new hypotheses
alternative hypotheses by implementing them as rules in to elucidate mechanistic relationships across com-
the ABM rule set and comparing the different predic- plex biological systems.
tions to empirically measured results. This type of anal-
ysis might suggest a subset of plausible working Weaknesses
hypotheses to pursue experimentally, thus providing • Because ABMs can be very sensitive to small var-
a tool to inform the design of experiments in the wet iations in the rule set and rarely account for physical
lab and streamlining and/or accelerating the overall conservation laws explicitly, their predictions can
discovery process (Hunt et al. 2009). A common be highly irregular or nonbiological. Therefore, the
misconception about ABMs is that they are only infor- risk of trivial or irrelevant predictions should be
mative if they recapitulate features of the biological mitigated by rigorous independent validation
system. On the contrary, ABMs can be highly instruc- against experimental data.
tive when they fail to accurately predict measured • ABMs that incorporate stochastic rules require mul-
biological outcomes because they can point to specific tiple runs to fully explore the extent of possible
voids, inconstancies, and/or errors in data or outcomes.
Aging 17 A
• Depending on the size and scope of the model, von Neumann J, Morgenstern O (2007) Theory of games and
ABMs can have large numbers of rules and param- economic behavior (commemorative edition). Princeton
Classic Editions, Princeton
eters, which may make parameter identification dif-
A
ficult and require extensive sensitivity analyses to
determine the robustness of the predictions.
• ABMs do not explicitly account for intracellular Agglomerative Hierarchical Clustering
events and serve as abstractions of the biology at
this level, thus limiting their ability to provide mean- ▶ Hierarchical Agglomerative Clustering
ingful results regarding gene regulation, metabolic
processes, and intracellular signaling, for example.
Cross-References ▶ Mereology
▶ Meronymy
▶ Cellular Automata
▶ Fluorescence Microscopy
Aging
References
Rajeswara Babu Mythri1, Shireen Vali2 and
Grimm V, Railsback SF (2005) Individual-based modeling and
M. M. Srinivas Bharath1
ecology, Princeton series in theoretical and computational 1
biology. Princeton Classic Editions, Princeton Department of Neurochemistry, National Institute of
Hunt CA, Ropella GE, Lam TN, Tang J, Kim SH, Engelberg JA, Mental Health and Neurosciences (NIMHANS),
Sheikh-Bahaei S (2009) At the biological modeling and Bangalore, Karnataka, India
simulation frontier. Pharm Res 26:2369–400 2
Cell Works Group Inc., Bangalore, India
Peirce SM, Skalak TC, Papin JA (2006) Coupling intracellular
networks with tissue-level physiology. IBM J Res 50:1–15
Railsback SF, Lytinen SL (2006) Agent-based simulation plat-
forms: review and development recommendations. Definition
Simulations 82:609–623
Thorne BC, Bailey AM, Peirce SM (2007) Combining
experiments with multi-cell agent-based modelling to study Aging is classically defined as a long-term physiological
biological tissue patterning. Brief Bioinform 8:245–257 process associated with morphological and functional
Vodovotz Y, Constantine G, Faeder J, Mi Q, Rubin J, Bartels J, changes in the human body at the tissue and cellular
Sarkar J, Squires RH Jr, Okonkwo DO, Gerlach J, Zamora R,
level, aggravated by injury resulting in a progressive
Luckhart S, Ermentrout B, An G (2010) Translational sys-
tems approaches to the biology of inflammation and healing. imbalance of the regulatory systems including neuronal,
Immunopharmacol Immunotoxicol 32:181–95 hormonal, and immune mechanisms. Human life
A 18 Agonist
expectancy has nearly doubled since the twentieth cen- An agonist may be endogenous, i.e., a natural ligand
tury causing tremendous increase in aged population that binds to its receptor under in vivo conditions, or it
(65 years and older). The increasing number and sever- can be a synthetic, exogenous ligand.
ity of health problems associated with aging presents
one of the major health care challenges in the world.
This includes the increased propensity of aged popula-
tion to encounter neurological diseases. Akaike’s Information Criterion (AIC)
Physiological aging is an important causative factor
underlying the onset of sporadic PD. This is because the Kejia Xu
cumulative changes in the midbrain contribute to spe- Institute of Systems Biology, Shanghai University,
cific degenerative pathways and PD pathology. For Shanghai, China
example increased oxidative stress manifested by accu-
mulation of oxidatively modified proteins increases
with age. This increase magnifies several folds in late Definition
age due to a combination of increased production of
reactive species, decreased antioxidant activity, and Akaike’s Information Criterion (AIC) is a technique
impaired ability to repair or remove the modified pro- that measures the goodness of an estimated statistical
teins. Additionally, activities of the protein clearing model and selects a model from a set of candidate
cellular machinery including the ubiquitin-proteasome models. The chosen model is the one that is expected
system (UPS) and autophagy decline with age thereby to minimize the difference between the model and the
reducing the neuron’s ability to remove damaged or truth. Given a data set, several competing models may
modified proteins or protect itself from damaging free be ranked according to their corresponding AIC, and
radicals. This also leads to neurotoxic aggregation of the one having the lowest AIC will be the best.
uncleared proteins in the cell. Such cellular insults are In the general case, the AIC is defined as
compounded by other age-associated pathways such as
mitochondrial dysfunction, accumulation of iron, AIC ¼ 2k 2ln L; (1)
increased DA oxidation, increased deposits of lipids,
etc. which increase with aging. where k is the number of parameters in the statistical
model, and L is the maximized value of the likelihood
function for the estimated model.
Cross-References
Agonist
Melissa L. Kemp
The Wallace H. Coulter Department of Biomedical Alarmins
Engineering, Georgia Institute of Technology and
Emory University, Atlanta, GA, USA ▶ Damage-associated Molecular Patterns
Definition
Algorithmic Modeling Languages
An agonist is a ligand that binds to a cell surface
receptor and triggers a response from the cell. ▶ Cell Cycle Modeling, Process Algebra
AliBaba 19 A
enables users to quickly compare KEGG pathways
AliBaba to recent findings in the literature.
A
Conrad Plake1 and J€org Hakenberg2 Usage
1
Biotechnology Center (BIOTEC), Technische AliBaba builds on the Java Web Start technology and
Universit€at Dresden, Dresden, Germany launches a client on the user’s machine. The client
2
Department of Computer Science and Department of accepts a PubMed query, sends it to the AliBaba server
Biomedical Informatics, Arizona State University, who in turn retrieves and analyzes the results from
Tempe, AZ, USA PubMed and gets back annotated abstracts. The query
syntax and semantics are exactly the same as already
known to PubMed users, and AliBaba also keeps the
Definition order of results. The results are parsed and shown on
a graphical user interface mainly consisting of three
AliBaba is a text mining (see ▶ Applied Text Mining) panels that show different aspects of the gathered
system that analyzes scientific abstracts, extracts information (see Fig. 1).
biomedical entities such as genes and diseases
(▶ Named Entity Recognition), and displays the Use Case
results as a graph depicting relationships between A patient with cough received standard treatment with
these entities, such as protein–protein interactions codeine and fell unresponsive after a while. To search
(Plake et al. 2006). AliBaba is available at: alibaba. a reason, “codeine intoxication” is a meaningful
informatik.hu-berlin.de. PubMed query, but it results in 162 abstracts. Posing
this query to AliBaba immediately shows a path from
codeine through poisoning to morphine and further to
Characteristics CYP2D6 (Fig. 1). This indicates the solution for this
case, where the patient was suffering from a CYP2D6
Literature searches form a substantial part of day-to- mutation leading to the ultrarapid metabolism of
day work in life science–related domains. Researchers codeine to morphine.
track recent developments in their fields, search for
answers to specific questions, or obtain overviews Named Entity Recognition
of known facts on a given topic. The PubMed Recognition of named entities (names referring to
(see ▶ MEDLINE and PubMed) interface to Medline proteins, species, drugs, diseases, etc.) is based on
is the most renowned search engine in the life pre-compiled word lists. Recognition of these names
sciences, indexing over 20 million abstracts from is performed using class-specific fuzzy searches.
more than 5,000 journals. Manually searching this Slight spelling variations (plural, capitalization,
vast amount is a tedious task, especially when exhaus- hyphenation) are allowed for all classes. For species,
tive information are needed, or when information we expand the word lists with common types of
about a set of objects are required. AliBaba helps abbreviations (where they are not already contained
users in analyzing large PubMed search results in the database), such as “S. cerevisiae” for the entry
by automatically extracting facts that are important “Saccharomyces cerevisiae.” Protein names appear
in molecular biology and molecular medicine, and with different variations, such as Arabic to Roman
by arranging them in an intuitive graphical represen- numbering conversions (“factor 7” versus “factor
tation. The software extracts both objects and VII”). AliBaba maps all recognized entities to their
their interrelationships, forming edges and nodes respective entries in the source databases. This
of the graph, respectively. Users can also manually enables the user to quickly access further information,
edit graphs (adding, changing, or deleting nodes for instance, protein sequences or functional descrip-
and edges) to prepare them for further usage or tions not contained in the current set of PubMed
screenshots. An interface to KEGG allows to overlay abstracts. In addition, this mapping helps to overcome
extracted networks with known pathways, which the problem of synonymity. This arises when
A 20 AliBaba
AliBaba, Fig. 1 A drug-disease-protein network extracted likewise connected to morphine and CYP2D6. The reason is that
from a PubMed search result and shown in AliBaba. The net- codeine is bioactivated by CYP2D6 into morphine, which can
work shows a connection between codeine (marked in the graph lead to life-threatening intoxication in certain patients, who
with a blue frame), cough, poisoning, and CYP2D6. Poisoning is show an ultrarapid form of this metabolism mark
different authors use different names for the same 50% for protein–protein interactions, thus requiring
entity (for example, CD95, Fas, and Apo-1 for the different approaches. AliBaba’s second strategy is
same protein). based on aligning sentences against a predefined set
of patterns learned from a training corpus. This tech-
Relation Mining nique is used to find protein–protein interactions and
AliBaba follows two different strategies for finding cellular locations of proteins. For protein–protein
associations between entities. The first is based on the interactions, the method achieves a precision of
occurrence of two entities in the same sentence. In 79% and a recall of 52%, as evaluated on an indepen-
this case, AliBaba calculates the confidence score for dent corpus. An external evaluation on the
each pair of entities depending on the number of ▶ BioCreative II.5 and the FEBS Letters Experiment
entities that occur between them. Other studies have on Structured Digital Abstracts II IPS test corpus
shown that the precision of co-occurrence-based showed a recall of 69%. The alignment score also
methods can be as high as 94% for gene-disease and defines the quality of a potential match which is
other associations; but it is, for example, less than reflected in the confidence score.
Alternative Routes 21 A
Related Tools
Tools related to AliBaba are EBIMed (▶ Retrieving a-Helix, b-Sheet, Loop Carbon-Alpha
and Extracting Entity Relations from EBIMed) and Positioning A
iHOP.
▶ Protein Structure Metapredictors
Cross-References
Alternative Routes
▶ Applied Text Mining
▶ BioCreative II.5 and the FEBS Letters Experiment Ernesto Perez-Rueda
on Structured Digital Abstracts Departamento de Ingenieŕıa Celular y Biocatálisis,
▶ MEDLINE and PubMed Instituto de Biotecnologia, Universidad Nacional
▶ Named Entity Recognition Autonóma de México, Cuernavaca, Morelos, Mexico
▶ Retrieving and Extracting Entity Relations from
EBIMed
Definition
Allergen Cross-References
Definition
Cross-References
For the majority of genes, RNA transcribed from
DNA must be processed to become a mature messen- ▶ Post-transcriptional Regulatory Networks
ger RNA, which is later translated into protein. One
RNA processing step is splicing, which involves
the removal of intervening sequences, or introns,
and connection of exons, which contain the genetic Amplification
information required for protein synthesis. Greater
than 95% of the annotated multi-exons contain Kareenhalli V. Venkatesh
protein-coding genes that are alternatively spliced. Department of Chemical Engineering, Indian Institute
Alternative splicing is an important mechanism for of Technology Bombay, Powai, Mumbai,
gene regulation as well as a mechanism to create Maharashtra, India
a diverse proteome given a limited number of genes
encoded by the genome. Biologically, alternative
splicing can create different protein products, which Definition
may be differentially expressed both temporally and
spatially. Splicing is catalyzed by the spliceosome, Signaling networks contain modules that help the cell
a large macromolecular complex composed of five in responding to weak extracellular stimulus. Such
main small nuclear ribonucleoproteins (snRNP) in a capability to respond arises from the ability of the
concert with many other auxiliary proteins. However, signaling pathway to amplify a weak stimulus and is
the decision to remove or include particular exons or termed as amplification. There are two kinds of signal
splice sites is dictated by two elements, sequence amplification in biological systems, namely, magni-
elements encoded by the RNA (which usually reside tude amplification and sensitivity amplification. Mag-
at the exon-intron junction) and trans-acting proteins. nitude amplification occurs when the output response
Depending on the splicing event desired, the trans- molecules are produced in larger number than the input
acting proteins are subdivided into four categories signal (stimulus) molecules, while the sensitivity
based on its action: exonic splicing enhancers, exonic amplification occurs when the percentage change in
splicing silencers, intronic splicing enhancers, and response is higher than the percentage change in stim-
intronic splicing silencers. ulus (Koshland et al. 1982).
Many common alternative splicing events can Amplification can be characterized by plotting the
occur; these include exon skipping, intron retention, steady state input-output response for a given input
and alternative 5’ splice site selection, alternative 3’ signal for an upstream and downstream effector in
splice site selection, mutually exclusive exons, alter- a signaling pathway. The shift in the curve of a dose
native promoters, and alternative polyadenylation. response toward the left, for activation of
Exon skipping is by far the most common splicing a downstream component as compared to an upstream
event. On an average, exons are 50–200 bp in length component demonstrates amplification (see Fig. 1).
while intronic sequences are usually thousands of base Amplification is quantified using the half-saturation
pairs long. constant (K0.5) obtained using Hill Equation of the
Amplification 23 A
Amplification, 1
Fig. 1 Amplification in
0.9
am
signaling pathways. Dotted
tre
curve: Response of the 0.8 A
s
wn
upstream component; Solid
Do
Fractional Response
am
0.7
curve: Response of the
re
st
downstream component 0.6
Up
Upstream
0.5 K 0.5
Amplification = =3/1=3
Downstream
0.4 K 0.5
0.3
0.2
0.1
0
0 1 2 3 4 5 6 7 8 9 10
Input Concentration in Arbitrary Units (AU)
input-output response curves. The ratio of the half- cascade is lost due to the saturation of the upstream
saturation constants for an upstream component to component. If the upstream component is saturated
that of a downstream component to changes in an (more than 90%) for a given input value and for the
input signal provides the measure of amplification, same input value if the downstream component is also
and is defined as follows: saturated, then the role of signal amplification is insig-
nificant (see Fig. 1). Amplification aids the cell to
Upstream respond to the smallest variations in the stimulus
K0:5
Amplification ¼ Downstream (1) allowing it to respond to a broad range of stimulus
K0:5
strengths. However, the downside of amplification is
Upstream
where K0:5 Downstream
and K0:5 are the half-saturation that the design motif also amplifies the signal noise
constants of the upstream and downstream compo- along with the input signal (Shibata and Fujimoto
nents of a signaling pathway obtained using Hill equa- 2005).
tion (see Fig. 1). Cascade of covalent modification
cycles is known to yield amplification. A classic exam-
ple is the activation of MAPK pathway in Xenopus Cross-References
oocytes. Hundredfold amplification in MAPK and
30-fold amplification in MAPKK have been observed ▶ Hill Equation
with respect to the upstream kinase, MAPKK (Huang ▶ Pathway Modeling, Metabolic
and Ferrell 1996). Signal amplification increases along ▶ Signaling Network Resources
the cascade and the amplification at particular step
depends upon the signal amplitude of the preceding
step (Heinrich et al. 2002). The extent of amplification References
depends on the ratio of the counteracting kinase to
phosphatase reaction rates, where the rate of phospha- Heinrich R, Neel BG, Rapoport TA (2002) Mathematical models
tase should be lesser than that of kinase to bring about of protein kinase signal transduction. Mol Cell 9(5):957–970
Huang CYF, Ferrell JE (1996) Ultrasensitivity in the mitogen-
signal amplification. activated protein kinase cascade. Proc Natl Acad Sci USA
The cascading of signals through multiple steps 93(19):10078–10083
helps in amplifying a weak signal along with increas- Koshland DE, Goldbeter A, Stock JB (1982) Amplification and
ing the response sensitivity. The extent of amplifica- adaptation in regulatory and sensory systems. Science
217:220–225
tion decreases with increasing strength of the signal Shibata T, Fujimoto K (2005) Noisy signal amplification in
and vice versa. There is certain threshold for the input ultrasensitive signal transduction. Proc Natl Acad Sci USA
stimulus above which the amplification ability of the 102(2):331–336
A 24 Analysis of Variance
350
250
°
200
150
100
specify substrate recognition, promoting anaphase pro- Plk1, securin and cyclin B (reviewed in Peters 2006).
gression, mitotic exit, or the maintenance of G1. Securin is an inhibitor of separase, a protease that
cleaves cohesin at anaphase onset. Securin is
Cross-References degraded when all kinetochores are attached by spin-
dle microtubules and therefore the cell is ready to go
▶ Anaphase-Promoting Complex Inhibitors into anaphase. Destruction of securin in turn activates
▶ CDK Inhibitors separase so that it can cleave cohesin. Another sub-
strate cyclin B is required for mitotic/meiotic progres-
sion until metaphase, but is downregulated by APC/C
Anaphase-Promoting Complex at anaphase onset onward. In general, APC/C sub-
Inhibitors strates contain motifs such as the destruction box
(D-box) or the KEN box. These motifs are directly
Masamitsu Sato recognized by APC/C co-activators Cdc20 and Cdh1
Department of Biophysics and Biochemistry, Graduate (see below).
School of Science, University of Tokyo, Bunkyo–ku,
Tokyo, Japan
PRESTO, Japan Science and Technology Agency, Co-activators of APC/C
Kawaguchi, Saitama, Japan APC/C is activated by co-activators Cdc20/Slp1/Fizzy
and Cdh1/Ste9/Fzr (Fizzy-related). These
co-activators contain WD40 repeats as well as the
Synonyms IR-tail at the C-termini. Those proteins are thought to
be important for substrate recognition and specificity
Anaphase-Promoting Complex (APC/C); Cyclosome of APC/C. One of the most important aspects of
APC/C function is how ordered destruction of sub-
Definition strates is achieved. As shown in Fig. 1, there are mainly
three stages of the cell cycle when APC/C operates:
APC/C (anaphase-promoting complex/cyclosome) is prometa-metaphase, anaphase, and G1 phase. The
an E3 enzyme (ubiquitin ligase) that degrades sub- potential activity of APC/C-Cdc20 increased during
strates including cyclin B and securin, to trigger ana- prometaphase, but the degradation of other substrates
phase onset. An E3 enzyme brings its substrates into such as cyclin B and securin is delayed by spindle
close proximity of a ubiquitin conjugating enzyme E2 assembly checkpoint (see below). During
so that E2 can transfer ubiquitin to the substrate. prometaphase, cyclin A is degraded, whereas securin
APC/C is a 1.5 MDa protein complex composed of at and cyclin B are degraded at anaphase onset. In late
least a dozen subunits. Activation of APC/C is essen- anaphase, “substrate switch” occurs because APC/C
tial for anaphase onset during mitosis. In meiosis, in degrades its co-activator Cdc20 and replaces it with
order to ensure two consecutive rounds of chromo- Cdh1. In budding yeast, Cdh1 is activated by Cdc14,
some segregation in meiosis I and meiosis II, APC/C the protein phosphatase that functions antagonistic to
activity needs to be partially inhibited during the Cdk1. APC/C-Cdh1 contributes to the degradation of
interkinesis period (the period between meiosis residual cyclin B and other substrates such as Plk1 and
I and II). This is operated by APC/C inhibitors Mes1 Cdc20. Both Cdc20 and Cdh1 recognize D-boxes, but
in fission yeast and Erp1/Emi2 in Xenopus oocytes. Cdh1 can also interact with KEN-boxes. This differ-
ential recognition system by Cdc20 and Cdh1 should
contribute to the ordered destruction of substrates. In
Characteristics addition, the processivity of polyubiquitination is also
important for temporal regulation of destruction of
Substrates of APC/C Cdh1 substrates (Rape et al. 2006). Among Cdh1 sub-
The substrates of APC/C are key regulators of mitosis, strates, for instance, securin is shown to be
including Aurora kinases and the Polo-like kinase polyubiquitinated by APC/C-Cdh1 in a processive
Anaphase-Promoting Complex Inhibitors 27 A
Ubiquitination Degraded proteins
Ub
Ub Ub
Securin Securin
Ub
Ub Ub Ub Slow Ub Ub
Cyclin A Cyclin B Ub Ub Cyclin B Cyclin A Cyclin A
Ub
Cdk2 Cdk1 Cdk1
Ub Ub
Cdc20 Ub
Ub Ub
Securin Processive
Cdc14 Cdc14
Inactive Active
Anaphase-Promoting Complex Inhibitors, Fig. 1 A time- temporally organized by co-activators (Cdc20/Slp1/Fizzy and
line of APC/C activation and the substrate specificity. APC/C Cdh1/Ste9/Fzr). Red arrows indicate ubiquitination (Ub) by
has many kinds of substrates, but the timing of destruction is APC/C with co-activators
manner, while it takes long time for cyclin A to be Although budding yeast Mad3 contains a D-box,
polyubiquitinated (Fig. 1). Mad3 is not degraded by APC/C-Cdc20. Thus, Mad3
may be a pseudosubstrate inhibitor that docks to the
Inhibition of APC/C co-activator Cdc20 so that it cannot activate APC/C
For transition from meiosis I to meiosis II, inhibition of (Burton and Solomon 2007). In contrast, fission yeast
APC/C seems to be a conserved and essential system to Mad3 seems to be an actual substrate of APC/C to
ensure the consecutive M phases (see Fig. 2 of the inhibit cyclin degradation, which leads to metaphase
essay ▶ Meiosis). Moreover, inhibition of premature arrest if the checkpoint is activated (King et al. 2007).
activation of APC/C before anaphase is also an impor-
tant aspect of APC/C regulation. There are many kinds Mes1
of APC/C inhibitors that play roles in M phase pro- Fission yeast Mes1 is an APC/C inhibitor essential for
gression, as exemplified below. meiosis I–II transition. Mes1 is a small protein of
11.3 kDa and contains a KEN-box and a D-box to
The Spindle Assembly Checkpoint Components directly bind to the WD40 repeats of an APC/C
Mad3/BubR1 is a component of the spindle assembly co-activator Slp1/Cdc20/Fizzy. Mes1 is shown to be
checkpoint, which monitors if the attachment of micro- an actual substrate of APC/C (Kimata et al. 2008),
tubules to all kinetochores is established or not. When rather than a pseudosubstrate.
unattached kinetochores exist or tension between sister
chromatids is not sufficient, the spindle assembly Erp1/Emi2
checkpoint is activated to arrest the cell cycle. Mad2 An Erp1 homolog Emi1 is a pseudosubstrate inhibitor
recognizes unattached kinetochores, and Mad3/ of APC/C. An APC/C inhibitor Erp1 of Xenopus laevis
BubR1, together with Mad2, binds to Cdc20 to inhibit has a D-box and a zinc-binding region (ZBR). Muta-
premature activation of APC/C before anaphase. tion in the D-box lost the function of Erp1, and Erp1
A 28 Anaplasia
▶ Cancer Pathology
Cross-References
Marsha A. Moses
Department of Surgery/Harvard Medical School,
References Vascular Biology Program/Children’s Hospital
Boston, Boston, MA, USA
Burton JL, Solomon MJ (2007) Mad3p, a pseudosubstrate inhib-
itor of APCCdc20 in the spindle assembly checkpoint. Genes
Dev 21:655–667
Kimata Y, Trickey M, Izawa D, Gannon J, Yamamoto M, Definition
Yamano H (2008) A mutual inhibition between APC/C and
its substrate Mes1 required for meiotic progression in fission
yeast. Dev Cell 14:446–454 The angiogenic switch refers to a key early stage in
King EM, van der Sar SJ, Hardwick KG (2007) solid tumor progression during which an avascular
Mad3 KEN boxes mediate both Cdc20 and Mad3 tumor lesion first becomes vascularized (Fig. 1)
turnover, and are critical for the spindle checkpoint. PLoS
(Harper and Moses 2006). The acquisition of this
ONE 2:e342
Marangos P, Carroll J (2008) Securin regulates entry into angiogenic phenotype is a rate-limiting step in
M-phase by modulating the stability of cyclin B. Nat Cell a tumor’s development in that these new capillaries
Biol 10:445–451 provide necessary nutrients and gas exchange for the
Nishiyama T, Ohsumi K, Kishimoto T (2007) Phosphorylation
nascent tumor. The acquisition of a capillary network
of Erp1 by p90rsk is required for cytostatic factor arrest in
Xenopus laevis eggs. Nature 446:1096–1099 is required for exponential growth of the tumor and for
Peters JM (2006) The anaphase promoting complex/cyclosome: its metastasis. It has been proposed that the angiogenic
a machine designed to destroy. Nat Rev Mol Cell Biol switch occurs as a function of a perturbation in the
7:644–656
Rape M, Reddy SK, Kirschner MW (2006) The processivity of
balance between angiogenic stimulators and inhibitors
multiubiquitination by the APC determines the order of sub- in favor of the positive regulators (Folkman 2002;
strate degradation. Cell 124:89–103 Harper and Moses 2006).
Anoxia 29 A
Anisokaryosis
A
Barbara J. Davis
Section of Pathology, Tufts Cummings School of
Veterinary Medicine Biomedical Sciences,
North Grafton, MA, USA
Definition
Angiogenic Switch, Fig. 1 An in vivo tumor model that reli-
ably recapitulates the angiogenic switch (Harper and Moses Anisokaryosis is variation in nuclear size.
2006). Preangiogenic tumor nodules (avascular, A) and angio-
genic tumor nodules (vascular, V) are shown. Scale bar ¼ 1 mm
Cross-References
References ANOVA
Folkman J (2002) Role of angiogenesis in tumor growth and
metastasis. Semin Oncol 29(6 Suppl 16):15–18
▶ Analysis of Variance
Harper J, Moses MA (2006) Molecular regulation of tumor
angiogenesis: mechanisms and therapeutic implications. In:
Cancer: cell structures, carcinogens and genomic instability.
Birkhauser, Basel, pp 223–268
Anoxia
Bernhard Schmierer
Anisocytosis Department of Biochemistry, Oxford Centre for
Integrative Systems Biology (OCISB), University of
Barbara J. Davis Oxford, Oxford, UK
Section of Pathology, Tufts Cummings School of
Veterinary Medicine Biomedical Sciences,
North Grafton, MA, USA Definition
Cross-References Cross-References
Antigen-Binding Site
Definition
▶ Paratope
Antagonist is the opposite of an agonist. An antagonist
is a ligand that binds to the cell surface receptor but
does not trigger any response. Instead, once bound to
its target receptor, an antagonist blocks ligand access Antigenic Determinant
to the receptor.
▶ Epitope
▶ Systems Immunology, Data Modeling and
Scripting in R
Antibody-dependant Immune Response
Cross-References Characteristics
settings of potential toxicity. Evidence continues to antimicrobial peptides act by bacterial membrane dis-
mount in support that it is the fundamental differences ruption, which is absolutely necessary for bacterial
in biochemical and biophysical properties of microbial physiology; so, the chances of developing resistance
versus host cells that provide selective toxicity to anti- are remote. Antimicrobial peptides also possess
microbial peptides. The composition of membrane immunomodulating activity. These peptides have been
provides an important determinant by which antimi- reported to direct chemotaxis, neutralize Lipopolysac-
crobial peptides target microbial versus host mem- charides (LPS), and play role in wound healing too.
branes. Since the surface of the bacterial membranes Several peptides and their derivatives have already
is more negatively charged than mammalian cells, passed clinical trials successfully (Hancock and
antimicrobial peptides will show different affinities Chapple 1999; Levy 2000) and several others are in
toward the bacterial membranes and mammalian cell pipeline as potential therapeutics (Hancock and
membranes. Cationic property of antimicrobial pep- Chapple 1999). Their role as food preservative is well
tides enables them to interact more effectively with established; in addition, they have a number of other
the relatively charged microbial membrane than with biotechnological applications, e.g., in veterinary &
the relatively neutral mammalian membranes. So, Medical science, in transgenic plants (Osusky et al.
charge affinity is likely an important means conferring 2000), in aquaculture, and as aerosol spray for patients
selectivity to antimicrobial peptides. In addition, the of cystic fibrosis, as “chemical condoms” to limit the
presence of cholesterol membrane as stabilizing agent spread of sexually transmitted diseases, can enhance the
in mammalian cells and its absence in bacterial cells potency of existing antibiotics in vivo, probably by
also affect selectivity. Alternatively, access to host facilitating access of antibiotics into the bacterial cell
tissues may be restricted by the localization and regu- (Zasloff 2002). Designing novel peptides with antimi-
lated expression of antimicrobial peptides. crobial activities requires prediction methods to narrow
down the candidate peptides. Computer-aided predic-
Mode of Action tion methods (Lata et al. 2010, 2007) are proving to be
Antimicrobial peptides are believed to have various of great help in this direction.
modes of action to kill bacteria. These cationic pep-
tides first interact with the relatively negatively
charged bacterial membranes, attach to them and Cross-References
their amphipathicity allows them to partition thorough
the membrane and insert into membrane bilayers to ▶ Antimicrobial Peptides
form pores. Pore formation in the cell membrane ▶ Innate Immunity
brings about the leaking of the cytosolic components ▶ Pharmaceutical Toxicology, Application of
also known as cytolysis, thus killing the bacteria. Biosimulation
Alternatively, these may penetrate the cell membrane ▶ Prediction Model Integration
and travel inside the cell to bind intracellular mole-
cules like DNA, RNA, and proteins, thus hampering
the crucial metabolic activities. However, in many
References
cases, the exact mechanism of killing is not known.
Diamond G, Beckloff N et al (2009) The roles of antimicrobial
Applications peptides in innate host defense. Curr Pharm Des
In the past few decades, antimicrobial peptides have 15(21):2377–2392
Hancock RE, Chapple DS (1999) Peptide antibiotics.
emerged as promising solutions to the problem of grow- Antimicrob Agents Chemother 43(6):1317–1323
ing antibiotic resistance among various strains of bacte- Lata S, Sharma BK et al (2007) Analysis and prediction of
ria. Researchers are focusing on antimicrobial peptides, antibacterial peptides. BMC Bioinform 8:263
which also play an important role in innate immunity, as Lata S, Mishra NK et al (2010) AntiBP2: improved version of
antibacterial peptide prediction. BMC Bioinform 11(1):S19
alternative drugs. Their short length and fast & efficient
Levy O (2000) Antimicrobial proteins and peptides of
action against microbes has made them potential candi- blood: templates for novel antimicrobial agents. Blood
dates as peptide drugs. Unlike conventional antibiotics, 96(8):2664–2672
Applied Text Mining 33 A
Osusky M, Zhou G et al (2000) Transgenic plants expressing morphological and biochemical changes that result in
cationic peptide chimeras exhibit broad-spectrum resistance cell death. The changes include blebbing; cell shrink-
to phytopathogens. Nat Biotechnol 18(11):1162–1166
age; nuclear and DNA fragmentation. Apoptosis of
Powers JP, Hancock RE (2003) The relationship between pep- A
tide structure and antibacterial activity. Peptides tumor cells is beneficial as it terminates the growth of
24(11):1681–1691 cells on one hand and decreases the chances of metas-
Zasloff M (2002) Antimicrobial peptides of multicellular organ- tasis of tumors on the other. On the other hand, exces-
isms. Nature 415(6870):389–395
sive and inappropriate cell death is seen in case of
neurological diseases such as Parkinson disease.
Anti-oncogen Cross-References
Application Ontology
Antiviral Interferons
▶ Cell Cycle Ontology (CCO)
▶ Pro-inflammatory Mediators
Applied Text Mining, Fig. 1 Screen capture of the Hanalyzer system, showing the nature of an edge in the graph and the pointer to
the PubMed abstract that provides evidence for it
(Hunter et al. 2008). The net effect is a substantial uncovering bugs in programs than running those pro-
increase in the amount of knowledge available to the grams against huge test collections, and that in addition,
system just from preexisting databases. using such test suites is considerably more efficient
because it can have a much faster run time (Bretonnel
Software Testing and Quality Assurance in Cohen et al. 2008).
Applied Text Mining
The fact that applied text mining systems are intended to Acknowledgment The authors would like to thank Hannah
be applied to research in human health and disease Tipney for providing the Hanalyzer figure.
places a special burden on system builders to ensure
that their systems are built to the highest, industrial-
strength standards of software quality. Testing software Cross-References
whose input is language presents special challenges that
are not found in other types of software. Here it has been ▶ Natural Language Processing
found helpful to combine techniques from software
testing with techniques from the field of descriptive
linguistics, or the science of describing unknown lan-
References
guages. These two fields turn out to have much in
common, both in theoretical background and in terms Blaschke C, Andrade MA, Ouzounis C, Valencia A (1999)
of practical techniques. In this approach, the text mining Automatic extraction of biological information from scien-
application is modeled as a language that we do not tific text: protein–protein interactions. In: Intelligent Systems
for Molecular Biology, AAAI Press, Menlo Park, pp 60–67
know anything about. One finding of this research has
Blaschke CJ, Oliveros C, Valencia A (2001) Mining functional
been that using structured test suites containing information associated with expression arrays. Funct Integr
manufactured data can be a more effective tool for Genomics 1(4):256–268
Archetypes 37 A
Bretonnel Cohen K, Baumgartner WA Jr, Hunter L (2008) Soft- Dai HJ, Liu F, Chen YF, Sun CJ, Katrenko S, Adriaans P,
ware testing and the naturally occurring data assumption in Blaschke C, Torres R, Neves M, Nakov P, Divoli A,
natural language processing. In: Software Engineering, Test- Mana-Lopez M, Mata J, Wilber WJ (2008) Overview of
ing, and Quality Assurance for Natural Language Processing, BioCreative II gene mention recognition. Genome Biol A
Association for Computational Linguistics, Columbus, 9(Suppl 2): S2
pp 23–30
Chang JT, Raychaudhri S, Altman RB (2001) Including
biological literature improves homology search. In: Pacific
Symposium on Biocomputing, Mauna Lani, HI, pp 374–383
Chen H, Sharp BM (2004) Content-rich biological network (Approx.) Boundary Condition
constructed by mining pubmed abstracts. BMC Bioinformat-
ics 5:1471–2105
▶ Constraint
Craven M, Kumlien J (1999) Constructing biological knowledge
bases by extracting information from text sources. In: Intel-
ligent Systems for Molecular Biology, Heidelberg, pp 77–86
Gregory Caporaso J, Baumgartner WA Jr, Randolph DA,
Bretonnel Cohen K, Hunter L (2007) Mutationfinder:
a high-performance system for extracting point mutation
ARC
mentions from text. Bioinformatics 23:1862–1865
Hersh W, Voorhees E (2008) TREC genomics special issue ▶ Mediator
overview. Inf Retrieval 12:1
Hirschman L, Yeh A, Blaschke C, Valencia A (2005) Overview
of BioCreAtIvE: critical assessment of information extrac-
tion for biology. BMC Bioinformatics 6:S1
Horn F, Lau AL, Cohen FE (2004) Automated extraction of Archetypes
mutation data from the literature: application of MuteXt to
G protein-coupled receptors and nuclear hormone receptors. Jesus Bisbal
Bioinformatics 20(4):557–568
Hu ZZ, Narayanaswami M, Ravikumar KE, Vijay-Shanker K, Departament de Tecnologies de la Informació i les
Wu CH (2005) Literature mining and database annotation of Comunicacions, Universitat Pompeu Fabra,
protein phosphorylation using a rule-based system. Bioinfor- Barcelona, Spain
matics 21(11):2759–2765
Hunter L, B̃retonnel Cohen K (2006) Biomedical language
processing: what’s beyond PubMed? Mol Cell 21:589–594
Hunter L, Lu Z, Firby J, Baumgartner WA Jr, Johnson HL, Ogren Synonyms
PV, Bretonnel Cohen K (2008) OpenDMAP: an open-source,
ontology-driven concept analysis engine, with applications to Templates
capturing knowledge regarding protein transport, protein
interactions and cell-specific gene expression. BMC Bioin-
formatics 9(78) http://www.biomedcentral.com/1471-2105/9/
78 (doi:10.1186/1471-2105- 9-78) Definition
Krallinger M, Leitner F, Rodriguez-Penagos C, Valencia A
(2008a) Overview of the protein–protein interaction
annotation extraction task of BioCreative II. Genome Biol Archetypes are the formal representation of informa-
9(suppl 2):S4 tion concepts used in a given domain of application.
Krallinger M, Morgan A, Smith L, Leitner F, Tanabe L, An advanced approach to the modeling and man-
Wilbur J, Hirschman L, Valencia A (2008b) The BioCreative agement of information separates domain information
II – critical assessment for information extraction in biology
challenge. Genome Biol 9:1 models into three independent components: data,
Leach SM, Tipney H, Feng W, Baumgartner WA Jr, Kasliwal P, archetypes (i.e., structure), and semantics. Archetypes
Schuyler RP, Williams T, Spritz RA, Hunter L (2009) are used to define the different sets of related data
Biomedical discovery acceleration, with applications to cra- attributes, which together provide the necessary con-
niofacial development. PLoS Comput Biol 5(3):e1000215
Ng S-K (2006) Integrating text mining with data mining. In: text to adequately interpret the information which is to
Ananiadou S, McNaught J (eds) Text mining for biology and be stored, communicated, or shared. External ontol-
biomedicine. Artech House, Norwood ogies or terminological resources are optionally
Smith L, Tanabe LK, Johnson R, Kuo CJ, Chung IF, Hsu CN, referred to from the archetype definitions in order to
Lin YS, Klinger R, Friedrich CM, Ganchev K, Torii M,
Liu HF, Haddow B, Struble CA, Povinelli RJ, Vlachos A, annotate those definitions with semantically rich
Baumgartner WA Jr, Hunter L, Carpenter B, Tsai RTH, resources, facilitating interoperability and integration.
A 38 Area under the ROC Curve
A
0.8 B
True positive rate
Francisco Melo
0.4
Pontificia Universidad Católica de Chile,
Santiago, Chile
Millennium Institute on Immunology and 0.2
In Computer Science
Within computer science, artificial evolution is com-
Argonaute-mediated Cleavage monly implemented in terms of an evolutionary or
genetic algorithm (▶ Genetic Algorithms, ▶ Evolution
▶ Target Cleavage Programming) (GA). A GA is a search procedure that
implements the three essential elements of Darwinian
evolution - replication, mutation, and selection. Typi-
cally, the procedure acts on a population of candidate
Arrays solutions, which are scored according to a fitness
criterion and selected to be replicated according to
▶ DNA Microarrays a formula that takes the fitness score into account
(see Fig. 1).
This formula may be as simple as selecting
a percentage of the highest fitness individuals (elite
Arrow Ontology selection) or be a probability that is proportional to
fitness (roulette wheel selection) or a mixture of those.
▶ Edge Ontology The fitness function is usually determined by the user
so that the maximum of that function is achieved for
the optimal solution. In cases where it is not known
which function is maximized by the optimal solution to
Artificial Evolution a problem, the construction of an appropriate fitness
function can be a difficult research problem. After
Christoph Adami being selected for replication, individuals are mutated
Department of Microbiology and Molecular Genetics, and recombined with a given rate to create new
Michigan State University, East Lansing, MI, USA candidate solutions that have inherited the features of
their parents but potentially carry new features not
previously present in the population. Typically, GAs
Synonyms use a variety of mutation mechanisms to create
variation. Evolutionary algorithms are alternatives to
Simulated evolution more traditional random or heuristic search algorithms
and work best in large search spaces where the best
solution consists of partial solutions that are also fit.
Definition
In Engineering
Artificial evolution refers to any procedure that uses Artificial evolution can be used to solve engineering
the mechanism of Darwinian evolution to generate problems by applying evolutionary computation tech-
a product. While this definition encompasses the evo- niques to hardware rather than software. For example,
lution of organic life forms by means of artificial it is possible to design electronic circuits in
selection (breeding of animals, plants, or microorgan- reconfigurable hardware (Thompson 1996) that have
isms), artificial evolution commonly refers to the novel properties that exploit the material properties of
instantiation of evolution within a nonbiological the substrate. In another example of unconventional
medium. Artificial evolution is sometimes used computing (▶ Unconventional Computation), a team
A 40 Artificial Evolution
Digital Life
evolved a circuit with the goal of producing an oscil- Computer viruses (or, more generally, malware)
latory signal. One of the resulting circuits evolved evolve by artificial means when the malware creators
a radio receiver that captured the clock signal from react to a changed fitness landscape (e.g., new security
a computer on a nearby desk (Bird and Layzell 2002). countermeasures) by adapting their virus to these
changes. In this instantiation of artificial evolution,
In Evolutionary Biology replication is usually a key feature of the malware
Artificial evolution is used in evolutionary biology to program, while the fitness function is implicit to the
illustrate the process of evolution itself by instantiating environment within which the virus seeks to replicate,
evolution algorithmically, usually within a computer rather than specified externally. Variation is usually
(Adami 1998). One of the earliest uses of computers to directed by the malicious users but can sometimes be
study the process of evolution is Richard Dawkins’ autonomous. The concept of an evolving computer
thought experiment that demonstrates the process of program as an instantiation of evolution (as opposed
evolution in an artificial medium by evolving the target to a simulation of evolution) gave rise to the field of
phrase “Methinks it is like a weasel” from a randomly digital life, where self-replicating computer programs
generated sequence of letters drawn from an alphabet are executed on virtual (simulated) central processing
of 28 (Dawkins 1986). In this example, sequences are units (CPUs). By encasing programs in a virtual world,
copied 100 times (instantiating heredity), while 1 in 20 it is possible to design the language they are written in
characters is changed (instantiating the process of (their genetic code), to control the program’s rate of
mutation). The resulting sequences are compared to mutation, and the complexity of the world they live in.
the target phrase, and the sequence that is closest to The first working digital life system called “tierra”
the target is chosen to be copied again (instantiating was created by Tom Ray (Ray 1992). In tierra,
selection), see Fig. 2. self-replicating computer programs live in simulated
In the same book, Dawkins introduces the core memory and compete for CPU time and memory
“biomorph” program that evolves tree-like structures space. Because the ▶ fitness of any particular ▶ digital
whose appearance is determined by nine developmen- organism in tierra is solely determined by its ability to
tal “genes” that determine the tree’s appearance. The contribute offspring to future generations, there is no
genes that encode the tree are mutated and recombined a priori optimal program. The digital life system
at random and selected by the user that, thus, guides the “Avida” (see Figs. 3 and 4) was developed at the
process of evolution to generate desired shapes. California Institute of Technology in order to study
Another well-known example of artificial evolution fundamental aspects of evolutionary biology, such as
that probes the coevolution of morphology and the evolution of complexity (Adami and Brown 1994;
behavior is due to Karl Sims (Sims 1994), who studied Ofria et al. 1998). Digital life research can provide
how creatures built from interconnected blocks a system’s view of evolutionary biology because
Artificial Evolution 41 A
Artificial Evolution,
Fig. 3 The CPU acting on
Memory Input
a self-replicating program in
the Avida environment. In this
nand IP Output A
version, four threads (marked
nop-A OP1? Environment
“Heads”) are executing the
nop-B CPU
avidian code simultaneously,
akin to the transcription of nop-D
genetic code by four DNA h-search FLOW Registers
polymerases h-copy AX:FF0265DC
if-label READ
BX:00000100
nop-C
h-divide CX:1864CDFE
h-copy
mov-head WRITE
nop-B
Heads
Stacks
Artificial Evolution,
Fig. 4 An avidian genome in
the act of self-replication.
Letters a–z stand for one of the
26 possible instructions. The
solid black line follows the
forward execution, while red
denotes a backward jump
digital organisms are best understood within the study the process of evolution itself but can also
selective environment they evolved in, which includes be adapted to study the evolution of behavior and
the population of programs itself. intelligence as well as software design (Beckmann
Digital life research has given rise to a number of et al. 2008).
discoveries in evolutionary biology that have been
validated later in biological organisms, such as the
“survival-of-the-flattest” effect or the importance of Cross-References
▶ epistasis in the evolution of ▶ complexity
(Adami 2006). This type of work illustrates the ▶ Artificial Life
idea that the difference between evolution in the bio- ▶ Complexity
chemical realm and evolution in the computational ▶ Digital Organism
domain lies only in the substrate that carries ▶ Epistasis
the information that evolves, but that the processes ▶ Evolution Programming
that underlie the dynamics (the algorithmic rules) ▶ Fitness
are the same. Because of this generality, digital life ▶ Genetic Algorithms
systems such as Avida have been used not only to ▶ Unconventional Computation
A 42 Artificial Immune Systems
▶ Information in Biological Modeling Kim JT (2001) Transsys: a generic formalism for modelling
▶ Information Theory and Toponomics regulatory networks in morphogenesis. In: Kelemen J,
Sosik P (eds) Advances in artificial life (ECAL 2001),
▶ Lindenmayer System vol 2159, Lecture notes in artificial intelligence. Springer
▶ Organization Verlag, Berlin/Heidelberg, pp 242–251
▶ Probabilistic Boolean Networks Kiran M, Richmond P, Holcombe M, Chin LS, Worth D,
▶ Reaction-Diffusion-Advection Equation Greenough C (2010) FLAME: simulating large populations
of agents on paralled hardware architectures. In: Proceedings
▶ Regulatory Networks of the 9th international conference on autonomous agents and
▶ Unconventional Computation multiagent systems: volume 1, vol 1. International Founda-
tion for Autonomous Agents and Multiagent Systems, Rich-
land, SC, pp 1633–1636, http://dl.acm.org/citation.cfm?
id¼1838206. 1838517. ISBN 978-0-9826571-1-9
References Knabe JF, Schilstra MJ, Nehaniv C (2010) Evolution and mor-
phogenesis of differentiated multicellular organisms: auton-
Adamatzky A, Komosinski M (eds) (2005) Framsticks: omously generated diffusion gradients for positional
a platform for modelings, simulating and evolving 3D crea- information. In: Bullock S, Noble J, Watson RA, Bedau
tures. Springer, Berlin/Heidelberg, pp 37–66 MA (eds) Artificial life XI. MIT Press, Cambridge, MA,
Bedau MA, Packard NH (1992) Measurement of pp 321–328
evolutionary activity, teleology, and life. In: Langton CG, Kniemeyer O, Buck-Sorlin GH, Kurth W (2004) A graph gram-
Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II, mar approach to artificial life. Artif Life 10(4):413–431.
vol X of Santa Fe institute studies in the sciences of doi:10.1162/1064546041766451, ISSN 10654–5462
complexity, proceedings. Addison-Wesley, Redwood City, Kumar S, Bentley PJ (eds) (2003) On growth, from and com-
pp 431–461 puters. Elsevier, Amsterdam. ISBN 0-12-428765-4
Bedau MA, McCaskill JS, Packard NH, Rasmussen S, Adami C, Langton CG (1984) Self-reproduction in cellular automata.
Green DG, Ikegami T, Kaneko K, Ray TS (2000) Open Physica D 22:120–149
problems in artificial life. Artif Life 6:363–376 Langton CG (1992a) Life at the edge of chaos. In: Langton CG,
Boerlijst M, Hogewag P (1992) Self-structuring and selection: Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II,
spiral waves as a substrate for prebiotic evolution. In: volume X of Santa Fe institute studies in the sciences of
Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Arti- complexity, proceedings. Addison-Wesley, Redwood City,
ficial life II, vol X of Santa Fe institute studies in the sciences CA, pp 41–91
of complexity, proceedings. Addison-Wesley, Redwood Langton CG (1992b) Preface. In: Langton CG, Taylor C,
City, pp 431–461 Farmer JD, Rasmussen S (eds) Artificial life II, volume X
Carlson S (2000) Artificial life. Boids of a feather flock together. of Santa Fe institute studies in the sciences of complexity,
Sci Am 283:112–114 proceedings. Addison-Wesley, Redwood City, CA,
Cover TM, Thomas JA (1991) Elements of information theory. pp xiii–xviii
Wiley, New York Marnellos G, Mjolsness E (1998) A gene network model of
Dittrich P, Ziegler J, Banzhaf W (2001) Artificial chemistries — resource allocation to growth and reproduction. In:
a review. Artif Life 7:225–275 Adami C, Belew RK, Kitanp H, Taylor C (eds) Artificial
Eigen M, Schuster P (1978) The hypercycle. a principle of life VI. MIT Press, Cambridge, MA, pp 433–437
natural self-organization. Part a: emergence of the Meinhardt H (1982) Models of biological pattern formation.
hypercycle. Naturwissenschaften 64:541–565 Academic, London
Flann N, Hu J, Bansal M, Patel V, Podgorski G (2005) Biological Nolfi S, Floreano D (2000) Evolutionary robotics. MIT Press,
development of cell patterns: characterizing the space of cell Cambridge, MA. ISBN 978-0-262-14070-6
chemistry genetic regulatory networks. In: Capcarrere M, Ofria C, Brown TC, Adami C (1998) Avida user’s manual. In:
Freitas AA, Bentley PJ, Johnson CG, Timmis J (eds) Adami C (ed) Introduction to artificial life. TELOS/Springer-
Advances in artificial life (ECAL 2005), vol 3630, Lecture Verlag, Berlin/Heidelberg
notes in artificial intelligence. Springer Verlag, Berlin/ Prusinkiewicz P, Lindenmayer A (1990) The algorithmic beauty
Heidelberg, pp 57–66 of plants. Springer, New York
Gesteland RF, Cech TR, Atkins JF (eds) (2006) The RNA Ray TS (1992) An approach to the synthesis of life. In:
World. Cold Spring Harbor Laboratory Press, Cold Spring Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Arti-
Harbor. ISBN 0-87969-739-3 ficial life II, volume X of Santa Fe institute studies in
Jacob C (1996) Evolution programs evolved. In: Voigt H-M, the sciences of complexity, proceedings. Addison-Wesley,
Ebeling W, Rechenberg I, Schwefel H-P (eds) PPSN-IV. Redwood City, CA, pp 371–408
Springer-Verlag, Berlin, pp 42–51 Reynolds CW (1987) Flocks, herds, and schools: a distributed
Kauffman SA, Weinberger EW (1989) The NK model of rugged behavioral model. Comput Graph 21:25–34, http://www.cs.
fitness landscapes and its application to maturation of the toronto.edu/dt/siggraph97-course/cwr87/
immune response. J Theor Biol 141:211–245 Sims K (1994) Evolving 3D morphology and behavior by com-
Kim JT (2000) LindEvol: artificial models for natural plant petition. In: Brooks RA, Maes P (eds) Artificial life IV.
evolution. K€unstliche Intelligenz 1(2000):26–32 Cambridge, MA, MIT Press, pp 28–39
Asymmetric Cell Division 47 A
Smith JM, Szathmáry E (1995) The major transitions in evolu- The importance of an association rule is often char-
tion. Oxford University Press, Oxford, UK. ISBN acterized with two measures:
978–0198502944
Support measures the fraction of all rows in the data-
Sniegowski PD, Gerrish PJ, Lenski RE (1997) Evolution of high A
mutation rates in experimental populations of e. coli. Nature base that satisfy both, body and head of the rule.
387:703–705 Rules with higher support are more important.
Steels L, Brooks RA (eds) (1995) The artificial life route Confidence measures the fraction of the rows that
to artificial intelligence: building embodied situated
agents. Lawrence Erlbaum, New Haven, CT. ISBN 978-0- satisfy the body of the rule, which also satisfy the
8058-1518-4 head of the rule. Rules with high confidence have
Wuensche A (2011) Esploring discrete dynamics. Luniver Press, a higher correlation between the properties
Frome, UK described in the head and the properties described
Zhang SQ, Ching WK, Ng MK, Akutsu T (2007) Simulation
study in probabilistic boolean network models for genetic in the body.
regulatory networks. Int J Data Mining Bioinformatics If the above rule has a support of 10% and a confidence
1:217–240 of 80%, this means that 10% of all people buy bread,
butter, milk, and cheese together, and that 80% of all
people that buy bread and butter also buy milk and
cheese.
Asf1
Cross-References
▶ CIA/Asf1
▶ Rule
▶ Rule-Based Methods
Assay
▶ Biological Assay
Asymmetric Cell Division
Heiko Enderling
Association Rule Center of Cancer Systems Biology, St. Elizabeth’s
Medical Center - CBR 115D, Tufts University School
Johannes F€
urnkranz of Medicine, Boston, MA, USA
Technical University of Darmstadt, Darmstadt,
Germany
Definition
B
specifies that people that buy bread and butter also tend
to buy milk and cheese.
A 48 Asymmetry of the Heart, Development
Cross-References
HH 11
Asymmetry of the Heart, Development TBX18
WT1 PITX2
Philippe Huneman
Institut d’Histoire et de Philosophie (IHPST), des
PITX2 SNAI1 NODAL
Sciences et des Techniques, Université Paris 1
Panthéon-Sorbonne, Paris, France FGF8 NODAL CFC HH 4-6
BMP4 SHH
Definition
Definition
Definition
In mammals, yeast SAGA has diverged into two
Asynchrony, in the general meaning, is the state of related complexes: SAGA, which like yeast SALSA/
not being synchronized (▶ Synchronization). In sig- SLIK lacks Spt8, and ATAC (Ada-Two-A-Containing)
naling network, asynchrony refers to the situation in (Spedale et al. 2012). Human SAGA contains 18 sub-
which each unit in the network shows independent units (catalytic subunit Gcn5 or Pcaf, Taf5L, Taf6L,
responses to external signals, and no correlation Taf9, Taf10, Taf12, Ada1, Ada2b, Ada3, Spt3, Spt7,
among the units. Spt20, Sgf11/ATX7L3, Sgf29, Sgf73/ATX7, Usp22,
ENY2, and TRRAP), whereas human ATAC contains
12 subunits (catalytic subunit Gcn5 or Pcaf, catalytic
subunit Atac2, Ada2a, Ada3, Sgf29, Atac1, Hcf1,
Cross-References Wdr5, NC2b, Yeats2, MBIP, CHRAC17). SAGA and
ATAC can both contain either Gcn5 or Pcaf
▶ Synchronization (paralogues) as a catalytic subunit; thus, there are two
SAGA ATAC
Atac2 Atac2
ATLL Synonyms
Bodenreider O, McCray A (2003) Exploring semantic groups A formal language is designed in which a problem’s
through visual approaches. J Biomed Inform 36(6):414–432 assumptions and conclusion can be written down and
Hirschman L, Yeh A, Blaschke C, Valencia A (2005) Overview
of BioCreAtIvE: critical assessment of information extrac- algorithms are provided to solve the problem with
tion for biology. BMC Bioinform 6(1):S1 a computer in an efficient way.
Kim J.D, Ohta T, Tsuruoka Y, Tateishi Y, Collier N (2004)
Introduction to the bio-entity recognition task at JNLPBA.
In: Proceedings of the JNLPBA-04, Geneva, Switzerland,
pp 70–75 Characteristics
Kim J.D, Ohta T, Pyysalo S, Kano Y, Tsujii J (2009) Overview
of BioNLP’09 shared task on event extraction. In: Proceed- People employ reasoning informally by taking a set of
ings of the workshop on BioNLP: shared task, Colorado, premises and somehow arriving at a conclusion, that is,
pp 1–9
Krallinger M, Morgan A, Smith L, Leitner F, Ta-nabe L, Wilbur J, it is entailed by the premises (▶ Deduction), arriving at
Hirschman L, Valencia A (2008) Evaluation of textmining a hypothesis (▶ Abduction), or generalizing from facts
systems for biology: overview of the second BioCreAtIvE to an assumption (▶ Induction). Mathematicians and
community challenge. Genome Biol 9(2):S1 computer scientists developed ways to capture this
LLL’05 challenge. http://www.cs.york.ac.uk/aig/lll/lll05/
Maglott D, Ostell J, Pruitt KD, Tatusova T (2007) Entrez gene: formally with logic languages to represent the knowl-
gene-centered information at NCBI. Nucleic Acids Res 35 edge and rules that may be applied to the axioms so that
(Database issue):D26–D31 one can construct a formal proof that the conclusion
Proceedings of the First CALBC Workshop. http://www.ebi.ac. can be derived from the premises. This can be done by
uk/Rebholz-srv/CALBC/docs/1stworkshopProceeding.pdf
Rebholz-Schuhmann D, Kirsch H, Nenadic G (2006) IeXML: hand (Solow 2005) for small theories, but this does not
towards a framework for interoperability of text processing scale up when one has, say, 80 or more axioms even
modules to improve annotation of semantic types in biomed- though there are much larger theories that require
ical text. In: Proceedings of BioLINK, ISMB 2006, a formal analysis, such as checking that the theory
Fortaleza, Brazil
Rebholz-Schuhmann D, Jimeno Yepes AJ, van Mulligen E.M, can indeed have a model and thus does not contradict
Kang N, Kors J, Milward D, Corbett P, Buyko E, itself. To this end, much work has gone into automat-
Tomanek K, Beisswanger E, Hahn U (2010) The CALBC ing reasoning.
silver standard corpus for biomedical named entities: a study The remainder of this section introduces briefly
in harmonizing the contributions from four independent
named entity taggers. In: Proceedings of the LREC several of the many purposes and usages of automated
(to appear) reasoning, the principal mechanisms (being deduction
Rebholz-Schuhmann D et al (2010b) Assessment of NER solu- and to a lesser extent abduction and induction), its
tions against the first and second CALBC Silver Standard limitations, and types of automated reasoners.
Corpus. Semantic Mining in Biomedicine, Hinxton
Rebholz-Schuhmann D, Jimeno Yepes AJ, Van Mulligen EM,
Kang N, Kors J, Milward D, Corbett P, Buyko E, Purposes
Beisswanger E, Hahn U (2010c) CALBC silver standard Automated reasoning, and deduction in particular, has
corpus. J Bioinform Comput Biol 8(1):163–179 found applications in “everyday life.” A notable exam-
The Universal Protein Resource (UniProt) (2009) Nucleic Acids
Res 37(Database):D169–D174 ple is hardware and (critical) software verification,
which gained prominence after Intel had shipped its
Pentium processors with a floating point unit error in
1994 that lost the company about $500 million. Since
Automated Reasoning then, chips are routinely automatically proven to func-
tion correctly according to specification before taken
C. Maria Keet into production. A different scenario is scheduling
KRDB Research Centre, Free University of problems at schools to find an optimal combination
Bozen-Bolzano, Bolzano, Italy of course, lecturer, and timing for the class or
degree program, which used to take a summer to do
manually but can now be computed in a fraction of
Definition it using constraint programming. Much closer to sys-
tems biology is the demonstration of discovering
Automated reasoning is the approach of implementing (more precisely: deriving) novel knowledge about pro-
the ability to make inferences on a computing system. tein phosphatases by Wolstencroft and coauthors
Automated Reasoning 55 A
(in Baker and Cheung 2008). They represented the some results); if it is unsound, then false conclusions
knowledge about the subject domain of protein phos- can be derived from true premises, which his even
phatases in humans in a formal ▶ bio-ontology and more undesirable (Hedman 2004; Portoraro 2010).
A
classified the enzymes of both human and the fungus An example is included in ▶ Proof, proving the
Aspergillus fumigatus using an automated reasoner, validity of a formula (the class of the problem) in
which showed that (1) the reasoner was as good as propositional logic (the formal language) using tableau
human expert classification, (2) it identified additional reasoning (the way how to compute the solution) with
p-domains (an aspect of the phosphatases) so that the the Tree Proof Generator (the automated reasoner).
human-originated classification could be refined, and
(3) it identified a novel type of calcineurin phosphatase Deduction
like in other pathogenic fungi. The fact that one can use Deduction is a way to ascertain if a theory
an automated reasoner (in this case: deduction, using T represented in a logic language entails an axiom a
a Description Logics knowledge base) as a viable that is not explicitly asserted in T (written as T |¼ a),
method in science is an encouragement to explore that is, a can be derived from the premises
such avenues further. though repeated application of deduction rules for
instance, a theory that states that “each arachnid has
Basic Idea as part 8 legs” and “each tarantula is an arachnid,” then
Essential to automated reasoning are: one can deduce – it is entailed in the theory – that “each
1. The choice of the class of problems the software tarantula has as part 8 legs.” An example is included in
program has to solve, such as checking the consis- ▶ deduction, which formally demonstrates that
tency of a theory (i.e., there are no contradictions) a formula is entailed in a theory T using said rules.
or computing a classification hierarchy of concepts Thus, strictly speaking, a deduction does not reveal
subsuming one another based on the properties novel knowledge but only that what was already
represented in the logical theory represented implicitly in the theory. Nevertheless,
2. The formal language in which to represent the with large theories, it is often difficult to oversee all
problems, which may have more or less features to implications of the represented knowledge, and, hence,
represent the subject domain knowledge, such as the deductions may be perceived as novel from
cardinality constraints (e.g., a member of the arach- a domain expert perspective, such as with the example
nids has as part exactly eight legs), probabilities, or about the protein phosphatases. This is in contrast to
temporal knowledge (e.g., that a butterfly is ▶ abduction and ▶ induction, where the reasoner
a transformation of a caterpillar) “guesses” knowledge that is not already entailed in
3. The way how the program has to compute the solu- the theory.
tion, such as using natural deduction or resolution
4. How to do this efficiently, be this achieved through Abduction
constraining the language into one of low complex- Compared to deduction, there is less permeation of
ity or optimizing the algorithms to compute the automated reasoning for abduction. From a scientist’s
solution or both perspective, automation of abduction may seem
Concerning the first item, with a problem being, for appealing because it would help one generate
example, “find out if my theory is consistent,” then the a hypothesis based on the facts put into the reasoner
problem’s assumptions are the axioms in the logical (Aliseda 2004). Practically, it has been used for, for
theory and the problem’s conclusion that is computed instance, fault detection: given the knowledge about
by the automated reasoner is a “yes” or a “no” a system and the observed defective state, find the
(provided the language in which the assumptions are likely fault in the system.
represented is decidable and thus guaranteed to termi- To formally capture theory with assumptions and
nate with an answer). With respect to how this is done facts and find the conclusion, several approaches have
(item 3), two properties are important for the calculus been proposed, each with their specific application
used: soundness and completeness (T |- j if and only if areas, for instance, sequent calculus, belief revision,
M |¼ j). If it is incomplete, then there exist entail- probabilistic abductive reasoning, and ▶ Bayesian
ments that cannot be computed (hence, “missing” networks.
A 56 Automated Reasoning
Induction that is, they are languages such that the corresponding
Logic Induction allows one to arrive at a conclusion reasoning services are guaranteed to terminate with an
that actually may be false even though the premises are answer. Description Logics form the basis of most of
true. The premises provide a degree of support so as to the Web Ontology Languages OWL and OWL 2 and
infer a as an explanation of b. Such a “degree” can be are gaining increasing importance in the semantic web
based on probabilities (a statistical syllogism) or applications area. Giving up expressiveness, however,
analogy. For instance, we have a premise that does lead to criticism from the modelers’ community,
“the proportion of bacteria that acquire genes through as a computationally nice language may not have the
horizontal gene transfer is 95%” and the fact that features deemed necessary to represent the subject
“Staphylococcus aureus is a bacterium,” then we domain adequately.
induce that the probability that S. aureus acquires
genes through horizontal gene transfer is 95%. Induc- Tools
tion by analogy is weaker version of reasoning, in There are many tools for automated reasoning, which
particular in logic-based systems, and yields very dif- differ in which language they accept, the reasoning
ferent answers than deduction. For instance, let us services they provide, and, with that, the purpose they
encode that some instance, say, Tibbles is a cat and aim to serve.
we know that all cats have the properties of having There are, among others, generic first- and higher-
a tail, four legs, and are furry. When we encode that order logic theorem provers (e.g., Prover9, MACE4,
another animal, Tib, who happens to have four legs and Vampire, HOL4); SAT solvers that compute if there is
is also furry, then by inductive reasoning by analogy, a model for the formal theory (e.g., GRASP, Satz);
we conclude that Tib is also a cat, even though in constraint satisfaction programming for solving, for
reality it may well be an instance of cheetah. On the example, scheduling problems and reasoning with soft
other hand, by deductive reasoning, Tib will not be constraints (e.g., Eclipse); DL reasoners that are used for
classified as being an instance of cat but may be an reasoning over OWL ontologies using deductive reason-
instance of a superclass of cats (e.g., still within the ing to compute satisfiability, consistency, and perform
suborder Feliformia), provided that the superclass has taxonomic and instance classification (e.g., Fact++,
declared that all instances have four legs and are furry. RacerPro, Hermit, CEL, QuOnto); and inductive logic
Given that humans do perform such reasoning, there programming tools (e.g., PROGOL and Aleph).
are attempts to mimic this process in software applica-
tions, most notably in the area of machine learning and
inductive logic programming (Lavrac and Dzeroski Cross-References
1994). The principal approach with inductive logic
programming is to take as input positive examples + ▶ Abduction
negative examples + background knowledge and then ▶ Bio-Ontologies
derive a hypothesized logic program that entails all the ▶ Classification
positive and none of the negative examples. ▶ Closed World Assumption
▶ Deduction
Limitations ▶ Fuzzy Logic
While many advances have been made in specific ▶ Induction
application areas, the main limitation of the ▶ Knowledge Representation
implementations is due to the computational complex- ▶ Open World Assumption
ity of the chosen representation language and the ▶ Proof, Logic
desired automated reasoning services. This is being
addressed by implementations of optimizations of the
algorithms or by limiting the expressiveness of the References
language, or both. One family of logics that focus
Aliseda A (2004) Logics in scientific discovery. Found Sci
principally on “computationally well-behaved” lan- 9:339–363
guages is Description Logics, which are decidable Baader F, Calvanese D, McGuinness DL, Nardi D,
fragments of first order logic (Baader et al. 2008); Patel-Schneider PF (eds) (2008) The description logics
Automated Term Recognition 57 A
handbook: theory and applications, 2nd edn. Cambridge the linguistic realization of specialized concepts in
University Press, Cambridge, UK the biology domain. As the main purpose of terms is
Baker CJO, Cheung H (eds) (2008) Semantic web:
to classify scientific knowledge, they are used as
revolutionizing knowledge discovery in the life sciences. A
Springer, New York a means of scientific communication to convey biolog-
Hedman S (2004) A first course in logic – an introduction to ical concepts. Once extracted, they can be used to
model theory, proof theory, computability, and complexity. construct conceptual networks of knowledge, and
Oxford University Press, Oxford, UK
Lavrac N, Dzeroski S (1994) Inductive logic programming: may be organized into hierarchies denoting different
techniques and applications. Ellis Horwood, New York types of relationships, such as is-a, part-of, etc., in
Portoraro F (2010) Automated reasoning. In: Zalta E (ed) addition to domain-specific relations such as inhibits,
Stanford encyclopedia of philosophy. Stable URL, http:// induces, etc. These relationships can be subsequently
plato.stanford.edu/entries/reasoning-automated/
Solow D (2005) How to read and do proofs: an introduction used to develop ontologies (▶ Ontologies). Biological
to mathematical thought processes, 4th edn. Wiley, terms are complex, since they include a vast number of
Hoboken synonyms and variants, such as acronyms. As an
example, consider the term ERK2, which denotes
a protein (▶ Named Entity) in Homo sapiens. Within
the literature, a large number of synonymous term
Automated Term Recognition variants of ERK2 can occur, including the
following: MAPK1, MAP kinase 1, ERT1,
Sophia Ananiadou extracellular signal-regulated kinase 2, ERK-2,
National Centre for Text Mining, School of Mitogen-activated protein kinase 2, MAP kinase 2,
Computer Science, University of Manchester, MAPK2 (http://www.uniprot.org/uniprot/P28482).
Manchester, UK Addressing terminological variability is crucial to
ensure the accuracy of text-based search systems in
biology, to allow integration of different resources, and
Synonyms to facilitate the development of terminologies
and ontologies (▶ Ontologies). Existing terminologi-
Term extraction; Terminology management cal and ontological resources do not cover all instances
of terms found in the literature. Since most of these
resources are manually curated, keeping track of all
Definition new terms and variants that appear in the literature is
a virtually impossible task. Term extraction tools can
Automatic term recognition (ATR) refers to the be a great asset to the curators of biological databases
automatic extraction of technical terms from and terminological resources, in that they can signifi-
domain-specific texts. It additionally encompasses cantly reduce the burden of keeping the resources
the assignment of semantic categories of the extracted up-to-date. The challenges faced by such tools thus
terms and mapping of them to concepts contained include not only the automatic identification of
within terminological or ontological resources. biological terms, but also the identification of
synonyms belonging to the same concept, and the
disambiguation of terms that belong to different
Characteristics concepts. In order to address all of these challenges,
integrated techniques such as automatic term recogni-
The Need for Term Recognition tion, ▶ term normalization and ▶ term disambiguation
With an overwhelming number of textual resources are essential for the systematic collection and update of
becoming available in biomedicine, there is an terminologies and their integration into systems
increased interest in ▶ text mining techniques that biology applications.
can identify, extract, manage, and exploit biomedical
knowledge hidden in the literature (Ananiadou and Methods of Automatic Term Recognition (ATR)
McNaught 2006). In systems biology, terms are the The process of ATR consists of three steps: term
backbone of such knowledge, since they constitute recognition, term classification, and term mapping.
A 58 Automated Term Recognition
Term recognition identifies terms from the literature. collection of documents that deal with a specific
Term classification assigns coarse-grained semantic domain can denote their degree of termhood (Kageura
tags to the recognized terms, for example, metabolites, and Umino 1996), that is, the extent to which such
chemical compounds, genes, proteins, etc. These tags sequences represent domain-specific terms. Another
are determined by using ▶ named entity recognition way of looking at word distribution is to measure the
techniques. Lastly, the recognized terms are mapped attachment strength between the components of a term
to terminological or ontological resources such as (unithood). Techniques like ▶ mutual information,
UMLS (Bodenreider 2004), OBO (Open Biological log-likelihood ratios test, frequency of occurrence
Ontologies (http://www.obofoundry.org/)), Uberon and likelihood have all been used to measure word
(http://obofoundry.org/wiki/index.php/UBERON:Main_ distribution. These approaches are typically
Page), GO (Ashburner et al. 2000), and nomenclatures knowledge poor, that is, they do not make use of
such as HUGO. existing, domain-specific resources, and are thus
The main approaches to term recognition portable to many domains. A popular approach used
are dictionary-based, statistical, and hybrid, the latter in biology, based on the notion of termhood, is C-value
of which combines linguistic knowledge with (Frantzi et al. 2000). This is a hybrid measure, as it
statistics. combines linguistic knowledge with a number of
Dictionary-based approaches use existing termino- statistical characteristics. This approach facilitates the
logical resources or ontologies to identify terms in text. recognition of embedded subterms, which are very
If a particular sequence of words in a text matches an frequent in biology. C-Value is the measure underlying
entry in one of the existing resources, then it is the term extraction service TerMine (http://www.
considered to be a term. Although there are several nactem.ac.uk/software/termine/), provided by the
inventories of terms in biology, for example, ChEBI National Centre for Text Mining.
(de Matos et al. 2010), BioThesaurus (Liu et al. 2006),
NCBI taxonomy, etc., the use of such resources to aid Incorporating Term Variability and
term recognition has several drawbacks, for example, Disambiguation into Term Extraction
each resource has a different focus, they are often not The integration of techniques to recognize biological
linked together, and they do not have sufficient termi- terms and their variants into text search engines
nological coverage. As new terms are created daily in improves accuracy, since all possible variants of
biology, and as most resources are manually curated a particular term can be mapped to the same canonical
and are costly to build, term extraction tools are crucial entity. This means that if user provides any single
for populating these resources. Wide coverage variant of a term as input to the engine, documents
terminological resources, such as the BioLexicon containing all known variants of the term will be
(Thompson et al. 2011), can be helpful in the construc- returned. Orthographic variations are one of the
tion of these tools. The BioLexicon combines together common types of term variation (e.g., tumour vs
terms from a number of existing resources, which are tumor; NF-KB vs NF-kb; amino-acid vs amino acid).
further augmented with terms extracted through the However, one of the most prolific types of terminolog-
automatic analysis of MEDLINE abstracts, in order ical variation by far in biology is acronyms. By way of
to account for real, observed usage of terms in text. an example, the acronym ER has about 80 possible
The resource incorporates gene/protein term variants definitions (expansions), including estrogen receptor,
and other linguistic and semantic information required emergency room, receptor alpha, enhancement ratio,
to drive text mining applications, such as ▶ informa- etc. In turn, each expansion can have several different
tion extraction and ▶ information retrieval. variant forms. For example, variants of estrogen recep-
Statistical approaches are based on various statisti- tor include the following: oestrogen receptor, estrogen
cal distributions of words in text. Since the vast receptors, estrogen-receptor, estrogenic receptor,
majority of terms consist of nouns or sequences of etc. Techniques for the automatic extraction of
nouns, the main strategy is to extract specific noun acronyms and their expanded forms are useful for
phrases as term candidates and estimate their probabil- ▶ information retrieval and ▶ information extraction.
ity of being terms. The frequency with which For example, Wren and Garner (2002) reported that
sequences of words co-occur in a document or PubMed could retrieve 5,477 documents using the
Automatic Classification Methods 59 A
acroymn JNK as a search term, but only 3,773 docu- Cross-References
ments when using the expanded form as a search term,
that is, c-jun N-terminal kinase. As with term extrac- ▶ Applied Text Mining
A
tion, a common approach for extracting acronyms is ▶ Clustering
utilizing statistical clues in text, that is, co-occurrences ▶ Information Extraction
between short-forms (e.g., ER) and long-forms ▶ Information Retrieval
(e.g., estrogen receptor). The strength of these ▶ Mutual Information
co-occurrences can be measured via techniques such ▶ Named Entity
as ▶ mutual information, Dice coefficient, ▶ Named Entity Recognition
log-likelihood ratio, etc. An example of a service that ▶ Term Disambiguation, Text Mining
automatically recognizes acronyms, and expands them ▶ Term Normalization, Text Mining
into their definitions and variant forms is Acromine
(http://www.nactem.ac.uk/software/acromine/).
Term variation is not the only challenge for auto- References
matic term management. Terms are frequently associ-
ated with multiple meanings. For example, the Ananiadou S, McNaught J (eds) (2006) Text mining for biology
and biomedicine. Artech House, Boston/London
abbreviation PCR has 129 expanded forms that can
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H,
be consolidated to 30 senses (e.g., polymerase chain Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT,
reaction, pathologic complete response, and Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S,
phosphocreatine). In general, each sense has more Matese JC, Richardson JE, Ringwald M, Rubin GM,
Sherlock G (2000) Gene ontology: tool for the unification
than one surface form (i.e., variant). Abbreviation
of biology. Nat Genet 25(1):25–29
disambiguation methods use resources such as Sense Bodenreider O (2004) The unified medical language system
Inventories (Okazaki et al. 2010) and apply ▶ cluster- (UMLS): integrating biomedical terminology. Nucleic
ing to the expanded definitions, using a number of Acids Res 32(Database issue):D267–D270
de Matos P, Alcantara R, Dekker A, Ennis M, Hastings J,
similarity methods to capture the context in which
Haug K, Spiteri I, Turner S, Steinbeck C (2010) Chemical
expanded forms appear. entities of biological interest: an update. Nucleic Acids Res
38(Database issue):D249–D254
Frantzi K, Ananiadou S, Mima H (2000) Automatic recognition
of multi-word terms: the C-value/NC-value method.
Conclusion Int J Digit Libr 3(2):115–130
Kageura K, Umino B (1996) Methods of automatic term
Given the frequency with which new terms appear recognition – a review. Terminology 3(2):259–289
in the literature, it is necessary to use techniques Liu H, Hu ZZ, Zhang J, Wu C (2006) BioThesaurus: a web-based
thesaurus of protein and gene names. Bioinformatics
such as the recognition of terms and their variants,
22(1):103–105
together with disambiguation. These techniques can Okazaki N, Ananiadou S, Tsujii J (2010) Building a high quality
facilitate the automatic extraction and classification sense inventory for improved abbreviation disambiguation.
of new terms, and allow them to be linked with Bioinformatics 26(9):1246–1253
Thompson P, McNaught J, Montemagni S, Calzolari N, Del
databases, ontologies, and controlled vocabularies.
Gratta R, Lee V, Marchi S, Monachini M, Pezik P,
An immediate application of automatic term extrac- Quochi V, Rupp C, Sasaki Y, Venturi G, Rebholz-
tion is to enhance the performance of text search Schuhmann D, Ananiadou S (2011) The BioLexicon:
engines. While existing search methods normally a large-scale terminological resource for biomedical text
mining. BMC Bioinformatics 12:397
restrict retrieval to exact matching of query terms,
Wren JD, Garner HR (2002) Heuristics for identification of
which can result in either low precision (irrelevant acronym-definition patterns within text: towards an auto-
information is retrieved) or low recall (relevant mated construction of comprehensive acronym definition
information is overlooked), term recognition and dictionaries. Methods Inf Med 41(5):426–434
term management methods can improve search
results by linking terms as they appear in text and in
various biological resources, discovering relation- Automatic Classification Methods
ships between terms and improving classification
and clustering methods. ▶ Clustering Methods
A 60 Autonomy
Characteristics
Autonomy
Although the idea of autonomy is commonly used, it is
Alvaro Moreno quite difficult to be defined. It is applied to very differ-
Department of Logic and Philosophy of Science, ent objects, in rather different contexts, and with dif-
University of the Basque Country, San Sebastian, ferent rigor. Systems as diverse as machines, control
Spain devices, robots, computer programs, human beings,
institutions, countries, animals, organs, cells, and
others, apparently deserve to be called “autonomous,”
Definition or at least to be studied and compared in terms of the
“autonomy” displayed in their behavior. Originally the
The concept of autonomy expresses the property of term was used in the context of law and sociology (in
a system that builds and actively maintains the rules the sense of self-government of the Greek polis) or
that define itself, as well as the way it behaves in human cognition and rationality (in the sense of
the world. So autonomy covers the main properties a cognitive agent that acts according to rationally
shown by any living system at the individual level: self-generated rules).
(1) self-construction (i.e., the fact that life is continu- Etymologically, autonomy means self-law (giving),
ously building, through cellular metabolisms, the namely, the capacity to act according to self-
components which are directly responsible for the determined principles. Broadly speaking, autonomy
behavior of the system) and (2) functional action is understood as the capacity to act according to self-
on and through the environment (i.e., the fact that determined principles. However, the idea of autonomy
organisms are agents, because they necessarily modify can adopt a more specific, minimal sense related to the
their boundary conditions in order to ensure their own capacity of a system to self-define, to construct its own
maintenance as far-from-equilibrium, dissipative identity. It is in this more basic sense that autonomy
systems). proves relevant for biology, to refer to the main feature
There are several key points. First, the notion of of the organization of living organisms, namely, the
cyclic causal organization or recursivity remains the fact that they are able to self-repair, self-produce, and
conceptual nucleus of autonomy as identity preserva- self-maintain.
tion. The system is constituted as a series of causal Although the idea of autonomy as a key concept to
processes (energy transduction, component produc- understand the organization of organisms has its roots
tion, etc.) converging to a given (initial) state, as an in Kant (1790), it was the theory of autopoiesis which
indefinite repetition of the same loop (Bechtel 2007). placed the notion of autonomy at the center of the
Second, this identity is something more than a mere biological understanding of living beings.
ordered pattern: It is a true set of mechanisms, whose Within the autopoietic theory, the concept of auton-
function is the very maintenance of the system, and omy has been applied to different biological domains:
therefore of themselves. Third, this organization is the basic metabolic domain, the immune domain, and
intrinsically precarious because it depends on the neural domain (Varela 1979). The root of the con-
a dynamical order whose cohesion is dissipative. cept, however, was the characterization of the basic
Fourth, an autonomous system is continuously organization of the living, namely, its metabolic orga-
performing actions, doing things that contribute to its nization. Actually, it was a simplified and rather
own maintenance. These interactions with the environ- abstract version of a cyclic idea of metabolism that
ment are, at the same time, a result of the constitutive inspired Maturana and Varela (1973; Varela et al.
processes of the system and a necessary condition for 1974) to define living beings as autopoietic (self-
their continuity. In other words, there is a reciprocal producing) systems. Thus, the concept of autopoiesis
dependence between what defines the “self” (or the refers to a recursive process of component production
subject) and the actions derived from its existence, that builds up its own physical border. The global
because it is not possible to separate the system’s network of component relations establishes a self-
doing from its being (Moreno et al. 2008). maintaining dynamics, which brings about the
Autonomy 61 A
constitution of the system as an operational unit. In Kauffman’s (2000) approach. Starting from the con-
sum, components and processes are entangled in cept of an autocatalytic set, the main condition
a cyclic, recursive production logic and they constitute required to consider a system as autonomous is that it
A
an identity. According to Maturana and Varela, auton- be capable of performing what this author calls (at
omous capacities stem from the logic of autopoiesis least) one “work-constraint cycle.” This capacity is
(Maturana and Varela 1973, 1980). implemented through a deep entanglement between
Although the stress is made in the circular logic of work and constraints: “work begets constraints begets
the system, (The idea that the essence of life consists in work,” as some form of closure in the abstract space of
a cyclic or causally closed organization is also shared catalytic tasks. This idea is based, on the one hand, in
by other authors like Rosen (1981) or Ganti (1975).) in Atkins’ view (1984) of work as a coordinated, coher-
other places these authors also consider the relation of ent, and constrained release of energy, and, on the
the system with its environment, admitting that auton- other hand, in the recognition that work is absolutely
omy implies the capacity to maintain the identity necessary to build constraints. To be autonomous,
through the active compensation of deformations a system must do work; to be capable of work it
(Maturana and Varela 1980). Thus, phenomena like requires constraints to channel the flow of energy in
tornadoes, whirlpools, and candle flames, which are to an appropriate way; and to build those constraints the
a certain degree self-organizing and self-maintaining system requires appropriately constrained energy
systems, are not autonomous, because they lack any flows, that is to say, work. This is the circularity of
form of active interaction exerted by the system. In that the “work-constraint cycle.” Kauffman’s account
sense, what distinguishes simple forms of self- envisages how functionality or utility (implicit in the
organization and self-maintenance from autonomy is idea of work) can come out of the causal circularity of
that the former lack an internal organization which the system, where this circularity is not only under-
would be complex enough to be recruited for stood in terms of abstract relations of component pro-
displaying selective actions, capable to actively ensure duction, but also as a thermodynamic logic sustaining
the maintenance of the system. (This is not to say that the specific chemical recursivity of the system: All the
simple self-maintaining systems are necessarily less processes ongoing in the system are constrained in
robust than autonomous ones). order to satisfy the condition of self-maintenance.
However, Maturana and Varela have not analyzed
which type of material organization may satisfy this
requirement. Actually, their view on autonomy was Implications
explicitly abstract and functionalist. More recently,
other authors have developed different approaches on The idea of autonomy can serve as a bridge from the
the concept of biological autonomy. Hooker, Collier, nonliving to the living domain (Ruiz-Mirazo et al.
Christensen, and Bickhard, for example (Christensen 2004). Autonomy is applicable to this gap between
and Hooker 2000; Collier 1998; Bickhard 2000), have chemistry and biology, beyond standard self-
stressed both the material-energetic and the interactive organization, because in a living organism component
dimension of autonomous systems, derived from the production relationships (chemical transformations,
fact that the systems are in far-from-equilibrium con- reaction feedbacks, mutual interactions, catalytic
ditions: since the cohesive organization of these sys- effects, etc.) can be interpreted as self-constraining
tems exists only in far-from-equilibrium conditions, processes, that is, as the generation by the very organ-
they must maintain an adequate interchange of matter ism of the local rules (constraints) that govern its
and energy with their environment in order to keep this dynamic behavior (Pattee 1972). It is through this
cohesion. Otherwise these systems will disintegrate. self-constraining action that the organism actually
Therefore, an autonomous system needs an internal defines itself, constructs an identity of its own.
mechanism able to organize and channel energy The concept of autonomy has been proposed for
flows for the system’s self-maintenance. defining the very nature of biological individuals,
A similar concern in formulating the idea of auton- because it is intended to account for the complex
omy in material terms is probably on the basis of material organization underlying any living organism,
A 62 Average Path Length
namely, its metabolism, understood as a cyclic, self- Universitaria. Santiago (1994, 3rd edition including new
maintaining network of reactions by means of which prefaces by each author)
Maturana H, Varela F (1980) Autopoiesis and cognition: the
the components of an organism are continually realization of the living. Reidel, Dordecht
produced. Moreno A, Etxeberria A, Umerez J (2008) The autonomy of
biological individuals and artificial models. Biosystems
91(2):309–319
Pattee H (1972) Laws and constraints, symbols and languages.
Cross-References In: Waddington CH (ed) Towards a theoretical biology,
vol 4, Essays. Edinburgh University Press, Edinburgh,
▶ Closure, Causal pp 248–258
▶ Constraint Rosen R (1991) Life itself: a comprehensive inquiry into the
nature, origin and fabrication of life. Columbia University
▶ Function Press, New York
▶ Organization Ruiz-Mirazo K, Moreno A (2004) Basic autonomy as
▶ Perturbation a fundamental step in the synthesis of life. Artif Life
▶ Robustness 10(3):235–259
Ruiz-Mirazo K, Pereto J, Moreno A (2004) A universal defini-
▶ Self-Organization tion of life: autonomy and open-ended evolution. Orig Life
▶ Systems, Autopoietic Evol Biosph 34:323–346
Varela F (1979) Principles of biological autonomy. Elsevier
North Holland, New York
Varela F, Maturana H, Uribe R (1974) Autopoiesis: the
References organization of living systems, its characterization and
a model. Biosystems 5:187–196
Atkins PW (1984) The second law. Freeman, New York
Bechtel W (2007) Biological mechanisms: organized to main-
tain autonomy. In: Boogerd F, Bruggeman F, Hofmeyr JH,
Westerhoff HV (eds) Systems biology: philosophical
foundations. Elsevier, Amsterdam, pp 269–302
Average Path Length
Bickhard M (2000) Autonomy, function, and representation.
Communication and Cognition - Artificial Intelligence ▶ Characteristic Path Length
17(3–4):111–131
Christensen W, Hooker C (2000) Anticipation in autonomous
systems: foundations for a theory of embodied agents. Int
J Comput Anticip Syst 5:135–154
Collier J (1998) Autonomy in anticipatory systems: significance Avidian
for functionality, intentionality and meaning. In: Dubois D
(ed) Proceedings of CASYS’98, the second international
▶ Digital Organism
conference on computing anticipatory systems. Springer-
Verlag, New York
Kant I (1790:1952) Critique of judgement. Oxford University
Press, Oxford
Kauffman S (2000) Investigations. Oxford University Press,
Oxford
Azacitidine
Maturana H, Varela F (1973) De máquinas y Seres Vivos.
Autopoiesis: La organización de lo Vivo. Editorial ▶ 5-azaCytosine
B
▶ Lymphocyte Labeling, Cell Division Investigation Yasser EL-Manzalawy and Vasant Honavar
Center for Computational Intelligence, Learning, and
Discovery, Computer Science, Iowa State University,
Ames, IA, USA
B Cell Epitope
Synonyms
Ramachandran Srinivasan
G.N. Ramachandran Knowledge Centre for Genome Antigen–antibody binding site prediction; Antigen–
Informatics, Institute of Genomics and Integrative antibody interface residue prediction; Computational
Biology, Delhi, India methods for mapping B cell epitopes
Definition Definition
A B cell epitope is the region of the antigen recognized B cell epitopes, also known as antigenic determinants,
by soluble or membrane-bound antibodies. B cell are restricted parts of molecules that are recognized by
epitopes are classified as either linear or discontinuous immunoglobulin molecules (antibodies) either in their
epitopes. free form or as membrane-bound B cell receptors.
B cell epitopes typically belong to one of two classes:
linear (continuous or sequential) epitopes or confor-
Cross-References mational (discontinuous) epitopes. Linear epitopes are
short peptides that correspond to a contiguous amino
▶ B Cells acid sequence fragment of a protein. Linear epitopes
▶ Systems Immunology, Data Modeling and Scripting are usually identified using assays such as PEPSCAN.
in R Consequently, current experimental methods offer
B Cell Epitope Prediction, Fig. 1 Residue-based (left) residue is denoted with the letter “E.” In the case of peptide-
and peptide-based (right) linear B cell epitope predictions. In based predictions, the predicted epitopes are encoded with
the case of residue-based predictions, each predicted epitope a string of “E”s
little direct evidence indicating that each residue in the and output the predicted epitope(s). Some predictors,
epitope does in fact make contact with one or more called peptide-based predictors, require the user to
residues in the paratope (the part in the antibody that specify the length of the epitope as an input parameter.
binds to the antigen). The second class of B cell Peptide-based predictors are trained to classify amino
epitopes is called conformational (discontinuous) acid fragments into epitopes and other non-epitopes.
B cell epitopes that represent the vast majority of B cell Residue-based predictors, on the other hand, are trained
epitopes found in proteins. These epitopes are com- to score each residue in the query sequence; the higher
posed of amino acids that, although not contiguous in the score, the more likely it is that the corresponding
the primary sequence, are brought into close proximity residue belongs to an epitope. Figure 1 illustrates pep-
within the folded three-dimensional protein structure. tide-based and residue-based B cell epitope prediction.
In contrast to peptide-based predictors that can only
identify epitopes of specified length or antigenic
Characteristics regions, residue-based predictors can be used to infer
antigenic regions of unknown length or even conforma-
Predicting B Cell Epitopes tional B cell epitopes. Conformational B cell epitope
Identification of B cell epitopes is often a necessary predictors take as input an antigen structure (actual or
first step in developing safe and effective vaccines. predicted PDB coordinate file) or antigen sequence, and
Characterization of sequence and structural features extract some sequence- and/or structure-based feature
of B cell epitopes is important from the standpoint of representation of each residue in the input query anti-
understanding of pathogenicity and the adaptive gen. The output of the predictor is a per residue scores
immune response. Several experimental techniques assigned to each residue in the query antigen.
are currently available for mapping B cell epitopes.
However, with rapid increase in the number of fully or Predicting Linear B Cell Epitopes
partially sequenced pathogen genomes and prohibitive Although a vast majority of B cell epitopes are
cost and effort required for experimental identification believed to be conformational epitopes (Walter
of epitopes, there is an urgent need for cost-effective 1986), most of the existing experimental B cell-epitope
computational methods for reliable genome-wide mapping techniques and, perhaps consequently, most
identification of B cell epitopes. of the existing computational methods for B cell epi-
B cell epitope predictors can be categorized based tope prediction deal with linear B cell epitopes. Several
on the type of the B cell epitope predicted into linear computational-based methods for predicting linear
and conformational B cell epitope predictors. Linear B B cell epitopes have been proposed in literature
cell epitope predictors accept as input a suitable repre- (EL-Manzalawy and Honavar 2010). B cell epitopes
sentation of the amino acid sequence of a target antigen predicted using such methods can be easily
B Cell Epitope Prediction 65 B
synthesized and tested for their antibody-binding prop- binary labels (epitopes vs. non-epitopes) to obtain
erties. Existing computational methods for predicting peptide-based B cell epitope predictors. Alternatively,
linear B cell epitopes fall into two major categories: they can be trained using examples that are fixed length
(1) propensity scale–based methods; (2) machine amino acid sequence windows whose binary labels
learning–based methods. indicate whether or not the residue at the center of B
each sequence window is an epitope residue to obtain
Propensity Scale–Based Methods residue-based B cell epitope predictors. In either case,
Propensity scale–based methods (e.g., Parker and Guo a variety of amino acid residue–based and amino acid
1986; Pellequer et al. 1993) rely on the observed cor- sequence–based features for encoding the input to the
relations between specific physicochemical properties classifiers (EL-Manzalawy and Honavar 2010) have
(e.g., hydorophilicity, flexibility, or solvent accessibil- been utilized. Despite recent advances in machine
ity) of amino acids and the antigenic determinants in learning–based linear B cell epitope predictors, there
protein sequences to identify the location(s) of the is much room for improvement in the performance of
linear B cell epitope(s) in the query protein sequence. the state-of-the-art in computational prediction of lin-
The main idea is to assign a score to each amino acid in ear B cell epitopes (Greenbaum et al. 2007). This is due
a query protein sequence. These propensity scores at least in part to the limited amount of experimentally
measure the tendency of an amino acid to be part of characterized epitope data (and in particular, lack of
a B cell epitope (as compared to the background). The reliable negative, i.e., non-epitope data). Conse-
score for each target amino acid residue in a query quently, the development of more reliable linear
sequence is computed as the average of the propensity B cell epitope predictors remains a major challenge
values of the amino acids in a sliding window centered in computational immunology.
at the target residue. The propensity scores are then
used as a basis of predicting whether a given amino Predicting Conformational B Cell Epitopes
acid sequence residue is likely to be part of a linear Several experimental techniques can be used for iden-
B cell epitope. Figure 2 shows the analysis of receptor- tifying conformational B cell epitopes. The most accu-
binding domain (RBD) of Severe Acute Respiratory rate method relies on the determination of the structure
Syndrome coronavirus (SARS-CoV) spike protein of antigen–antibody complexes using X-ray crystal-
using Parker’s hydrophilic scale. lography (Fleury et al. 2000). Progress of computa-
Recently, Blythe and Flower (Blythe and Flower tional methods for conformational B cell epitope
2005) evaluated 484 amino acid propensity scales on prediction has been hindered at least in part by the
a data set of 50 proteins and concluded that the best limited number of solved antigen–antibody com-
achievable performance is only marginally better than plexes. As noted earlier, propensity scale–based
random guessing. This result underscores the need for methods can be used to predict both linear and confor-
more sophisticated methods (e.g., those that use state- mational B cell epitopes using only amino acid
of-the-art machine learning algorithms together with sequence information of antigens. Two recently devel-
appropriate data representations) for constructing oped methods for predicting conformational B cell
improved linear B cell epitope predictors. epitopes, DiscoTope and PEPITO, improve the accu-
racy of propensity scale–based methods by incorporat-
Machine Learning–based Methods ing information derived from the structure of the
Machine learning currently offers one of the most cost- antigens, e.g., solvent accessibility of residues. The
effective and hence widely used approaches to devel- task of predicting conformational B cell epitopes can
oping predictive models from data in bioinformatics be reduced to identifying protein–protein interface res-
applications (Baldi and Brunak 2001). Several idues on the surface of a target protein (antigen). This
machine learning–based linear B cell epitope predic- opens up the possibility of adapting the state-of-the-art
tors have been developed using Support Vector sequence and/or structure-based protein–protein inter-
Machine, Artificial Neural Network, Decision Tree, face residue prediction methods for developing con-
k-Nearest Neighbor, and Ensemble classifiers. Such formational B cell epitope predictors. A recent study
classifiers can be trained using examples that are (Ponomarenko and Bourne 2007) compared the per-
amino acid peptide sequences with the corresponding formance of six publicly available protein–protein
B 66 B Cell Epitope Prediction
B Cell Epitope Prediction, Fig. 2 Analysis of RBD domain of in the window. The higher the score, the more likely it is that the
SARS-CoV Spike protein using Parker’s hydrophilic scale. corresponding residue is an epitope residue. Positive peaks along
A sliding window of seven amino acids is used to assign the sequence in the plot of the residue scores indicate the pres-
a propensity score to the residue in the center of the window. ence of an epitope in that region of the sequence
The score is the sum of the epitope propensity of each amino acid
interface residue prediction tools on the conforma- (named mimotopes) that bind to the antibody with
tional B cell epitope prediction task. The reported high affinity. It is assumed that this panel of mimotopes
performance of such methods (with an average area mimics the physicochemical properties and spatial
under curve (AUC) no greater than 0.7) underscores organization of the genuine epitopes. Because the pre-
the need for developing protein–protein interface pre- cise identification of the epitope mimicked by the set of
dictors that are customized for the conformational B mimotopes is not straightforward since the epitope is
cell epitope prediction task. often discontinuous (conformational) and the epitope
A recently developed approach for identifying B cell and mimotopes do not necessarily share a high degree
epitopes uses a combination of both experimental and of sequence similarity, several computational methods
computational techniques. In this approach, a phage- have been proposed for localizing the panel of affinity-
display library of random peptides is scanned against selected peptides on the surface of a target antigen (e.g.,
an antibody of interest to obtain a panel of peptides Bublil et al. 2007).
B Cells 67 B
References Definition
Baldi P, Brunak S (2001) Bioinformatics: the machine learning B cell-mediated immune response is defined as the
approach, 2nd edn. MIT Press, Cambridge, MA
immune response cascade triggered by the binding of
Blythe M, Flower D (2005) Benchmarking B cell epitope prediction:
underperformance of existing methods. Protein Sci 14:246–248 antibodies (produced by the B cells) to the antigens B
Bublil E, Freund N, Mayrose I, Penn O, Roitburd-Berman A, and subsequent identification by the cell surface recep-
Rubinstein N, Pupko T, Gershoni J (2007) Stepwise predic- tors of macrophages, neutrophils, or other cells of
tion of conformational discontinuous B cell epitopes using
the B cell-mediated immunity to destroy the antigens.
the Mapitope algorithm. Proteins Struct Funct Bioinform
68:294–304, Wiley It is a type of adaptive immunity in vertebrates
EL-Manzalawy Y, Honavar V (2010) Recent advances in B cell (Alberts et al. 2002).
epitope prediction methods. Immunome Res 6:S2
Fleury D, Daniels R, Skehel J, Knossow M, Bizebard T (2000)
Structural evidence for recognition of a single epitope by two
distinct antibodies. Proteins 40:572 Cross-References
Greenbaum J, Andersen P, Blythe M, Bui H, Cachau R, Crowe J,
Davies M, Kolaskar A, Lund O, Morrison S, Mumey B, ▶ Major Histocompatibility Complex (MHC)
Ofran Y, Pellequer J, Pinilla C, Ponomarenko J, Raghava G,
▶ TCR Recognition of MHC-Peptide Complexes
van Regenmortel M, Roggen E, Sette A, Schlessinger A,
Sollner J, Zand M, Peters B (2007) Towards a consensus on
datasets and evaluation metrics for developing B cell epitope
prediction tools. J Mol Recognit 20:75–82 References
Parker J, Guo D (1986) New hydrophilicity scale derived from
high-performance liquid chromatography peptide retention
Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P
data: correlation of predicted surface residues with antige-
(2002) The adaptive immune system. In: Molecular biology
nicity and x-ray-derived accessible sites. Biochemistry
of the cell, 4th edn. Garland Science, New York,
25:5425–5432, American Chemical Society
pp 1363–1421
Pellequer J, Westhof E, Van Regenmortel M (1993) Correlation
between the location of antigenic sites and the prediction of
turns in proteins. Immunol Lett 36:83–99
Ponomarenko J, Bourne P (2007) Antibody-protein interactions:
benchmark datasets and prediction tools evaluation. BMC
Struct Biol 7:64, BioMed Central Ltd
Walter G (1986) Production and use of antibodies against syn-
B Cells
thetic peptides. J Immunol Methods 88:149–161
Ramachandran Srinivasan
G.N. Ramachandran Knowledge Centre for Genome
Informatics, Institute of Genomics and Integrative
Biology, Delhi, India
B Cell-mediated Immune Response
Cross-References
Bacterial Transcriptional Cascade
▶ B Cell Epitope
▶ Systems Immunology, Data Modeling and Scripting ▶ Sigma Cascade
in R
Bagged Predictors
B Lymphocyte
▶ Bagging
▶ B Cells
Bagging
Backbone or Carbon-a Chain (Both with
the Sidechain) Celine Vens
Department of Computer Science, Katholieke
▶ Protein Structure Metapredictors Universiteit Leuven, Leuven, Belgium
Synonyms
Bacterial Artificial Chromosome (BAC)
Bagged predictors; Bootstrap aggregating
Myong-Hee Sung
Laboratory of Receptor Biology and Gene Expression,
National Cancer Institute, National Institutes of Definition
Health, Bethesda, MD, USA
Bagging is an ▶ ensemble learning method. Each mem-
ber of the ensemble is trained on a different ▶ bootstrap
Definition replicate of the training set. The outcomes of the indi-
vidual learning models are aggregated to obtain the
A bacterial artificial chromosome is a DNA construct outcome of the ensemble. Bagging often uses ▶ deci-
used for cloning a relatively large piece of DNA sion tree learners to create the individual models, but it
(100700 kb), usually by transforming E. coli. can be used with any unstable learning method.
Characteristics
Bacterial Cell Cycle
Constructing Bagged Predictors
▶ Cell Cycle, Prokaryotes Bagging proceeds by applying the same learning algo-
rithm to different bootstrap samples of the training set
(Breiman 1996a). Given a training set D, M new train-
ing sets Dk are generated, by uniformly sampling
Bacterial Transcription examples from D, with replacement. Usually, the sam-
pling procedure is terminated when each training set
▶ Transcription in Bacteria contains an equal amount of training examples as the
Basal Lamina 69 B
original set D. Thus, each example may appear zero, In each resampled training set, about one third of the
one, or multiple times in each of the bootstrap samples. instances are left out (actually 1/e in the limit). As
Algorithm 1 shows the pseudo-code to construct a result, out-of-bag estimates are based on combining
a bagged ensemble. only about one third of the total number of classifiers in
the ensemble. This means that they might overestimate B
Algorithm 1 Pseudo-code for constructing an ensemble using the error rate, certainly when a small number of trees
bagging. D denotes the training set, M is the number of models in
the ensemble. are used in the ensemble.
1: for k (1 to M do
2: Dk ( Bootstrap(D) Applications in Systems Biology
3: hk ( Learning Algorithm (Dk) Dudoit and Fridlyand (2003) propose a bagging pro-
4: end for cedure to improve the clustering of gene expression
5: return Uhk
data from cancer microarray studies. Schietgat et al.
The outputs of the base classifiers are aggregated (2010) use bagging of ▶ decision trees to predict the
(hence the name bagging, which is an acronym for functions of genes.
bootstrap aggregating) to form the output of the bag-
ging ensemble. Bagging is most popular in the context Implementations
of ▶ supervised learning, in which case the output Most data mining tools (e.g., Weka [Weka, Machine
corresponds to a prediction of the target attribute. For Learning Tool]) include an implementation of the
▶ classification tasks, the ▶ majority vote of the base bagging procedure.
classifiers’ predictions is taken; for ▶ regression, the
average is taken.
The number of models, M, is a parameter to be Cross-References
chosen by the user. Different values have been
reported in the literature. Breiman (1996a) uses ▶ Bootstrapping
50 models for classification tasks, and 25 models for ▶ Classification
regression. ▶ Decision Tree
Bagging improves the predictive performance of ▶ Ensemble
the base predictors if the base predictor is unstable. ▶ Learning, Supervised
This means that small changes in the training set can ▶ Regression
yield large changes in the predictive model. ▶ Deci-
sion trees and artificial neural networks are examples
of unstable predictors. Nearest neighbor methods, for References
instance, are not. In terms of the bias-variance trade-
Breiman L (1996a) Bagging predictors. Machine Learning
off, bagging is known to reduce variance, while 24(2):123–140
slightly increasing bias. Breiman L (1996b) Out-of-bag estimation. Technical Report.
University of California, Berkely ftp.stat.berkeley.edu/pub/
Out-Of-Bag Error Estimates users/breiman/OOBestimation.ps.Z. Accessed Aug 4, 2011
An advantage of using bagging is that out-of-bag error Dudoit S, Fridlyand J (2003) Bagging to improve the accuracy of
a clustering procedure. Bioinformatics 19(9):1090–1099
estimates (Breiman 1996b) can be used to estimate the Schietgat L, Vens C, Struyf J, Blockeel H, Kocev D, Džeroski S
generalization error of the ensemble, which removes (2010) Predicting gene function using hierarchical multi-
the need for a set aside test set. If the ensemble com- label decision tree ensembles. BMC Bioinformatics 11(2)
prises decision trees, then the out-of-bag error estima-
tion proceeds as follows: for every example in the
training set, a prediction is made, but only those trees
for which the example was not in the bootstrap sample Basal Lamina
are used. The error rate of this resulting out-of-bag
classifier is called the out-of-bag error estimate. ▶ Extracellular Matrix
B 70 Basal Transcription Factors
Definition
Definition
In probability theory, the definition of the conditional
The basement membrane is a thin layer of extracellular probability of A given B, P(A|B), is
components that lies underneath the epithelium of
many organs, as well as the basal surface of the endo- PðA; BÞ
thelium of the entire vasculature. Its main components PðAjBÞ ¼
PðBÞ
include, but not limited to, laminin, fibronectin,
entactin, proteoglycans, and collagen IV (Yurchenco
and Patton 2009). The function of the basement mem- where P(A) and P(B) are the probabilities of A and B,
brane is to provide structural support for organs and for respectively. Also, P(B|A) is, by symmetry,
the vascular architecture. Degradation of the basement
membrane by matrix metalloproteinases serves as PðA; BÞ
PðBjAÞ ¼
a key step during tumor angiogenesis by facilitating PðAÞ
endothelial cell sprouting and pericyte detachment
(Jakobsson and Claesson-Welsh 2008). Then Bayes rule is expressed as follows:
PðBjAÞPðAÞ
Cross-References PðAjBÞ ¼
PðBÞ
▶ Extracellular Matrix
In the situation where P(A|B) is difficult to compute
▶ Neovascularization
directly but we have direct information about P(B|A),
Bayes rule enables us to compute P(A|B) in terms of P
References
(B|A). If the denominator P(B) in the above equation is
Jakobsson L, Claesson-Welsh L (2008) Vascular basement a normalizing constant which can be computed by
membrane components in angiogenesis – an act of balance. marginalization, then
Scientific World Journal 8:1246–1249
Yurchenco PD, Patton BL (2009) Developmental and patho- X X
genic mechanisms of basement membrane assembly. Curr
PðBÞ ¼ PðBjAi Þ ¼ PðBjAi ÞPðAi Þ
i i
Pharm Des 15(12):1277–1294
Bayesian Inference
Bayesian Decision Analysis
Roger Higdon
Malcolm Farrow Seattle Children’s Research Institute, Seattle,
School of Mathematics & Statistics, Newcastle WA, USA
University, Newcastle upon Tyne, UK
Synonyms
Synonyms
Bayesian statistics
Bayesian; Decision theory
Definition
Definition
The principle of Bayesian inference is about assigning
Bayesian decision analysis (Smith 2010) is concerned probability to “the state of knowledge” of parameters
with making choices when the outcomes cannot be (y) related to an experiment. To do so Bayesian infer-
predicted with certainty. Probabilities are assigned to ence assigns probability distribution to all unknown
the various possible outcomes and so to values of the quantities in a statistical problem, parameters (y) as
reward or payoff, under each possible choice. We well as the data (X). This is in contrast to frequentist
choose between probability distributions of rewards. interference (▶ Frequentist Approach) where parame-
This is done by making the choice which maximizes ters are fixed unknown quantities that define the fre-
the expected utility (▶ utility function), that is, by quency of the data.
choosing the alternative which gives the probability
distribution of rewards with the greatest expectation of
utility, where utility is a function of the reward. For Characteristics
example, suppose that a person’s utility for small
financial rewards is an increasing linear function of The term “Bayesian” refers to Thomas Bayes
the monetary value. Suppose that this person is given (1702–1761), who proved a special case of what is
the choice between alternatives A and B with reward now called Bayes’ theorem (Bayes 1763). However,
distributions as follows: it was Pierre-Simon Laplace (1749–1827) who intro-
A: $1 with probability 0.6 or $2 with probability 0.4 duced a general version of the theorem and used it to
B: $0 with probability 0.2 or $2 with probability 0.8 approach problems in celestial mechanics, medical
The optimal choice is then B, with expectation $1.6, statistics, reliability, and jurisprudence.
while A has expectation $1.4. Bayesian inference provides a standardized
See also ▶ Utility Function. approach for analyzing statistical problems
B 72 Bayesian Inference
(Berger 1985). It is based on the posterior distribution, problems involve complex integration of the posterior
p(y|X), the distribution of parameters given in the distribution in order generate means or quantities such
data. The posterior can be represented as a constant as highest posterior density regions (the Bayesian
times product of the prior distribution and the analog to the confidence interval). These calculations
likelihood typically involve complex numerical approaches
such as Markov Chain Monte Carlo (MCMC)
pðyjXÞ ¼ k pðXjyÞpðyÞ (Brooks 1998).
A compromise between Bayesian and frequentist
The likelihood is the joint distribution of the data as approaches are empirical Bayes methods (Carlin and
a function of y. The prior distribution represents the Louis 2000). In empirical Bayes methods, the Bayes-
state of knowledge about the experimental parameters ian distributional framework is used; however, instead
before generating the data. of specifying the parameters in the prior distribution,
The ability to incorporate prior information is what the parameters are estimated from the data themselves.
gives Bayesian inference its power and makes it con- Empirical Bayes methods are quite commonly used in
troversial. The prior distribution is a vehicle by which the analysis of microarrays and other high-throughput
previous knowledge can be used to guide the statistical experimental data.
inference. One must be judicious in how the prior is There are Bayesian analogs to most classical statis-
specified since a very strong prior distribution can tical approaches such as t-tests, linear regression, or
overwhelm the evidence given by the data. One can chi-squared tests. Bayesian methods have become very
use an uninformative or reference prior if there is no popular in systems biology and bioinformatics because
strong initial belief about the parameters. In this case, they offer a more standardized way of handling com-
Bayesian inference gives very similar results to plex and noisy data. No specialized approaches are
maximum likelihood inference. needed if models are not fully specified or if data are
A simple example illustrating Bayesian inference is missing; this is not the case using traditional
batting averages in the sport of baseball. Suppose, frequentist methods. Examples of Bayesian methods
a player is observed to get six hits (successes) in ten in systems biology are sequence alignment (Durbin
at bats (trials) over the course of two games, the usual et al. 1998), motif finding (Zhou and Liu 2004), struc-
frequentist estimate of the player’s batting average is ture prediction (Schmidler et al. 2000), microarray
.600. This is a very unrealistic estimate since through- analysis (Gottardo et al. 2003), and Bayesian networks
out the history of the sport, only a few players have (Jansen et al. 2003).
ever batted even .400. If one puts a prior distribution on
the batting average p, where p is distributed beta
(47,127), thus giving p a mean of .270 and standard
Cross-References
deviation of .3 (this corresponds roughly to the typical
distribution of batting averages for baseball players).
▶ Bayesian Method
So this implies that the posterior distribution is as
▶ Frequentist Approach
follows.
Definition
Synonyms
Bayesian method refers to a probability method to con-
Schwarz criterion; Schwarz information criterion (SIC) struct a knowledge structure used to support decision
making, such as classification (▶ Classification;
▶ Identification of Gene Regulatory Networks,
Definition Machine Learning) and regression (▶ Regression
Analysis) tasks, processes, or analyses. Bayesian
In statistics, the Bayesian information criterion (BIC) methods are valuable whenever there is a need to extract
(Schwarz 1978) is a model selection criterion. It is a information from data that are uncertain or subject to
selection criterion for choosing between different any kind of error or noise (including measurement error
models with different numbers of parameters. and experiment error, as well as noise or random vari-
The BIC is an asymptotic result derived under the ation intrinsic to the process of interest).
assumption that the data distribution belongs to the
exponential family.
Suppose that: Characteristics
1. x ¼ the observed data.
2. n ¼ the number of data points in x, or equivalently, Traditional statistical techniques struggle to cope
the sample size. with complex nonlinear models that are only
3. k ¼ the number of free parameters to be estimated. partially observed. Due to the fact that the Bayesian
4. pðxjkÞ ¼ the probability of the observed data given statistical paradigm is fully probabilistic, there is no
the number of parameters. fundamental distinction between any of the unknowns
5. L ¼ the maximized value of the likelihood function in a statistical model – parameters, hidden variables,
for the estimated model. and observations are all treated together in a consistent
The formula for the BIC is: manner – and it is from this that the power of the
2 ln pðxjkÞ BIC ¼ 2 ln L þ k lnðnÞ methodology is derived. Provided that you can write
down a statistical model relating the quantities you are
Given any two estimated models, the model with interested in to the data you can observe (possibly
the lower BIC value is the one to be preferred. via many unobserved intermediary variables), then
B 74 Bayesian Method
you can carry out Bayesian method to extract factory #1 was the prior probability, PðH1 Þ, which
the information in the data to give fully probabilistic was 0.5. After observing the unqualified light
information on all unobserved model variables. bulb, we must revise the probability to PðH1 jEÞ;
which is 0.33.
Mathematical Formulation
In the Bayesian inference, we specify a sampling Practical Application of Bayesian Methods
model PðZjyÞ (density or probability mass function) The main limiting factor in applying Bayesian
for our data given the parameters, and a prior distribu- methods is computational. For nontrivial problems,
tion for the parameters PðyÞ reflecting our knowledge analytic approaches to Bayesian inference are not
about y before we see the data. We then compute the possible, and their numerical solution is often
posterior distribution challenging due to the need to solve high-dimensional
integration problems (which in the discrete case
PðZjyÞPðyÞ PðZjyÞPðyÞ translate to combinatorial summation problems).
PðyjZÞ ¼ ¼Ð ; (1) Advances in the speed of commodity computing
PðZÞ PðZjyÞPðyÞdy
hardware in recent decades have been paralleled by
developments in computationally intensive algorithms
The example shown in following illustrates for Bayesian inference. Arguably, the most important
the calculation of posterior probability for a hidden advance has been the development of a range of
variable. Suppose there are two light bulb factories. techniques based on ▶ Markov Chain Monte Carlo
Factory #1 has 40 qualified products and 10 unquali- (MCMC). The ideas originate from statistical physics,
fied products, while factory #2 has 30 qualified prod- but are now widely used for Bayesian inference.
ucts and 20 unqualified products. One person chooses Although by no means a panacea, carefully crafted
a factory at random, and then picks a light bulb at MCMC algorithms executed on fast computers are
random. It is assumed that the two factories are treated able to solve a phenomenal range of problems that
equally, likewise for the light bulbs. The light bulb would have been considered completely intractable
turns out to be an unqualified one. How probable is it only a few years ago. In the high-dimensional context,
that the light bulb is out of factory #1? it is often necessary to decompose the full problem
Intuitively, it seems clear that the answer should be according to the underlying conditional independence
less than a half, since there are less unqualified light structure of the model, and it is in this context that
bulbs in factory #1. The precise answer is given by graphical model (also known as ▶ Bayesian Network
Bayesian approach. Let H1 correspond to factory #1, Model) is particularly useful.
and H2 to factory #2. It is given that the bowls Gibbs sampler is just one of MCMC procedures for
are identical from the person’s point of view, sampling from posterior distributions. It uses condi-
thus PðH1 Þ ¼ PðH2 Þ, and the two must add up to 1, tional sampling of each parameter given the rest, and
so both are equal to 0.5. The event E is the observation is useful when the structure of the problem makes
of an unqualified light bulb. From the products of the this sampling easy to carry out. Specifically,
factories, we know that PðEjH1 Þ ¼ 10 50 ¼ 0:2 and a Markov chain is constructed with ▶ equilibrium
PðEjH2 Þ ¼ 2050 ¼ 0:4. Bayesian formula then yields probability distribution PðyjZÞ. Each iteration of the
sampler involves cycling through each component of
PðEjH1 ÞPðH1 Þ the K-dimensional vector y in order and sampling
PðH1 jEÞ ¼ from Pðyi jyi ; ZÞ; i ¼ 1; ; K; where yi denotes
PðEjH1 ÞPðH1 Þ þ PðEjH2 ÞPðH2 Þ
the vector of all components of y except yi . Knowledge
0:2 0:5 (2)
¼ ; of the ▶ Bayesian network for the model can simplify
0:2 0:5 þ 0:4 0:5
the computation of these so-called full-conditional
¼ 0:33 distributions. In many cases, the full-conditionals
will be straightforward to sample directly, but in
Before we observed the unqualified light bulb, the others, a Metropolis-Hastings method will be
probability we assigned for the person having chosen required. Here, a proposed new value is simulated
Bayesian Network Model 75 B
from a largely arbitrary proposal distribution, Definition
qðyi jyi Þ and accepted with a probability.
A Bayesian network model is a probabilistic graphical
Bayesian Algorithms and Tools model of a set of variables for considering their condi-
There is a large variety of Bayesian algorithms and tional independencies in a directed acyclic graph B
tools; open source tools widely used in computational (DAG).
biology include Matlab (Matlab, Machine Learning
Tool and Matlab, Data Analysis Tool).
Characteristics
being discrete. If the variable Xi hasri possible values, where N xi ; mi ; s2i is a univariate normal
x1i ; :::; xrii , the local distribution, g xi jpaj;S
i ; y i is an distribution with mean mi and variance vi ¼ s2i for
unrestricted discrete distribution: the i-th variable.
Taking this definition into account, an edge missing
from Xj to Xi implies bji ¼ 0 in the former linear-
g xki paj;S
i ; y i ¼ yxk jpaj (3)
i i regression model. The local parameters are given by
ui ¼ (mi, bi, vi), where bi ¼ (b1i, b2i,..., bi1i)T is
qi;S
where pa1;Si ; :::; pai denotes the values of PaSi , that is, a column vector. A probabilistic graphical model
the set of parents of the variable Xi in the structure S; qi built from these local density functions is known as
is the number of different possible instantiations of the a Gaussian network (Shachter and Kenley 1989).
parent variables of Xi. The local parameters are given The components of the local parameters are
by yi ¼ ðyijk Þk ¼1 ri Þj ¼1 qi . In other words, the param- as follows: mi is the unconditional mean of Xi, vi is
eter yijk represents the conditional probability that var- the conditional variance of Xi given Pai, and bji
iable Xi takes its k-th value, knowing that its parent is a linear coefficient that measures the strength of
variables have taken their j-th combination of values. the relationship between Xj and Xi. Figure 1 is an
We assume that every yijk is greater than zero. example of a Gaussian network in a four-dimensional
space.
Gaussian Network Model For investigating how Gaussian networks and
Here, one example of Bayesian network model is illus- multivariate normal densities are related, the joint
trated, which assumes the joint density function to be density function of the continuous n-dimensional
a multivariate Gaussian density, Gaussian network variable X is by definition a multivariate normal dis-
model (Whittaker 1990). tribution iff
An individual x ¼ (x1, x2,..., xn) consists of
a continuous value in ℜn. The local density function f ðxÞ ¼ Nðx; m; SÞ
for the i-th variable Xi can be computed as the linear
S1 ðxmÞ
¼ ð2pÞ2 jSj2 e2ðxmÞ
n 1 1 T
regression model (5)
!
X where m is the vector of means, S is covariance matrix
f xi paSi ; yi ¼ N xi ; mi þ bji ðxj mj Þ; ni n n, and |S| denotes the determinant of S. The
xj 2pai
inverse of this matrix, W ¼ S1, in which elements
(4) are denoted by wij, is known as the precision matrix.
BCR Receptor Diversity 77 B
This density can also be written as a product of n cannot represent feedback loops in the networks. It is
conditional densities using the chain rule, namely, well known that cyclic machinery is a common mech-
anism in various biological functions. Another is
Y
n
a more serious problem: Arbitral structure learning in
f ðxÞ ¼ f ðxi jx1 ; x2 ; . . . :xi1 Þ
i¼1
Bayesian network is an NP-hard problem, which B
! means that we are allowed to use only heuristic strat-
Y
n X
i1
¼ N x i ; mi þ bji ðxj mj Þ; ni (6) egies for finding approximate solutions. But yet,
i¼1 j¼1 Bayesian network is of certain worth in terms of its
handiness and ease of interpretation.
where mi is the unconditional mean of Xi, vi is the
variance of Xi given X1, X2,..., Xi1, and bji is a linear
coefficient reflecting the strength of the relationship Cross-References
between variables Xj and Xi. This notation allows us to
represent a multivariate normal distribution as ▶ Bayes Rule
a Gaussian network, where for any bji 6¼ 0 with j < i ▶ Bayesian Method
this network will contain an edge from Xj to Xi. ▶ Causal Relationship
▶ Conditional Independency
Applications of Biological Network ▶ Probabilistic Graphical Model
The application of Bayesian network model to the
biological network is first found in Friedman et al.
(2000). One of the important research themes for apply- References
ing Bayesian network model to biological network is to
develop algorithm for obtaining accurate network struc- Dawid AP (1979) Conditional independence in statistical theory.
J Roy Stat Soc Ser B 41:1–31
ture in reasonable computational time, which is called
Friedman N, Linial M, Nachman I, Pe’er D (2000) Using
“structure learning.” Currently, approaches for structure Bayesian networks to analyze expression data. J Comput
learning are roughly divided into two categories. One is Biol 7:601–620
“score-based method,” which provides a structure by Pearl J (2000) Causality: models, reasoning, and inference.
Cambridge University Press, Cambridge/New York
maximizing some scoring function with respect to the
Shachter R, Kenley C (1989) Gaussian influence diagrams.
posterior probability of the structure, such as Bayesian Manag Sci 35:527–550
Dirichlet equivalent (BDe) and Bayesian information Whittaker J (1990) Graphical models in applied multivariate
criterion (BIC), by using several maximization statistics. Wiley, Chichester
methods, such as greedy, K2, and Markov chain
Monte Carlo. The other is “constraint-based method,”
which provides a structure by checking conditional
independency among nodes in the graph, such as SGS/ Bayesian Statistics
PC-algorithm and CI-algorithm. The score-based
methods try to get an optimal structure in terms of ▶ Bayesian Inference
likelihood (i.e., goodness of fit of the model). On the
other hand, the constraint-based method focuses on
consistency of conditional independencies in the graph
(local Markov property). B Cell-mediated Immunity
According to improvement of measurement tech-
nology like DNA microarray, Bayesian networks have ▶ B Cell-mediated Immune Response
become quite a popular approach for genetic network
inference. Nevertheless, although a lot of successful
applications of Bayesian networks under various bio-
logical settings can be enumerated, Bayesian networks BCR Receptor Diversity
still have several limitations. One is that Bayesian
networks accept only acyclic graphs, that is, they ▶ Immune Repertoire Diversity
B 78 Belief Elicitation
Definition i
Pi q
m
Localized disorganized tissue mass that does not spread
and is amenable to local excision. The tissue character- Reject hypotheses i ¼ 1, 2, 3,..., k. The Benjamini–
istics are more similar to the tissue it is derived from. Hochberg method has been proven to control the FDR
for all tests at a level of q.
Cross-References
References
▶ Cancer Pathology
Benjamini Y, Hochberg Y (1995) Controlling the false discovery
rate: a practical and powerful approach to multiple hypothe-
sis testing. J R Stat Soc B 57:289–300
Benjamini–Hochberg Method
Winston Haynes
Seattle Children’s Research Institute, Seattle, Beta Workbench
WA, USA
▶ BlenX
Definition
Characteristics
Definition
Bifurcation Diagram
Bias is a property of experimental design defined as In mathematics, particularly in dynamical systems,
a systematic error in data-derived conclusions regard- a bifurcation diagram shows the possible long-term
ing the unknowns. values (equilibria/fixed points or periodic orbits) of
a system as a function of a bifurcation parameter in
the system. It is usual to represent stable solutions with
Cross-References a solid line and unstable solutions with a dotted line.
An example is the bifurcation diagram of the logis-
▶ Designing Experiments for Sound Statistical tic map:
Inference
xnþ1 ¼ rxn ð1 xn Þ (1)
Bifurcation Types
Definition It is useful to divide bifurcations into two principal
classes:
Bifurcation means the splitting of a main body into two Local bifurcations, which can be analyzed entirely
parts. Bifurcation theory is the mathematical study of through changes in the local stability properties of
changes in the qualitative or topological structure of ▶ equilibria, periodic orbits, or other invariant sets
a given family, such as the integral curves of a family as parameters cross through critical thresholds.
of vector fields, and the solutions of a family of differ- Global bifurcations, which often occur when larger
ential equations. Most commonly applied to the invariant sets of the system “collide” with each
mathematical study of dynamical systems, a bifurca- other, or with equilibria of the system. They cannot
tion occurs when a small smooth change made to be detected purely by a stability analysis of the
the parameter values (the bifurcation parameters) equilibria (fixed points).
B 80 Bifurcation
2.6
2.4
2.2
πe 2
1.8
1.6
1.4
1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45 1.5
b
In fact, the changes in topology extend out to an arbi- Heteroclinic Bifurcation In mathematics, particu-
trarily large distance (hence “global”). larly dynamical systems, a heteroclinic bifurcation is
Examples of global bifurcations include: a global bifurcation involving a heteroclinic cycle.
1. Homoclinic bifurcation in which a ▶ limit cycle Heteroclinic bifurcations come in two types: resonance
collides with a saddle point (▶ Saddle-Node bifurcations and transverse bifurcations. Both types of
Bifurcation) bifurcations will result in the change of stability of the
2. Heteroclinic bifurcation in which a limit cycle col- heteroclinic cycle.
lides with two or more saddle points At a resonance bifurcation, the stability of the cycle
3. Infinite-period bifurcation in which a stable node and changes when an algebraic condition on the eigen-
saddle point simultaneously occur on a limit cycle values of the equilibria in the cycle is satisfied. This
4. Blue sky catastrophe in which a limit cycle collides is usually accompanied by the birth or death of a
with a nonhyperbolic cycle periodic orbit.
Global bifurcations can also involve more compli- A transverse bifurcation of a heteroclinic cycle is
cated sets such as chaotic attractors. caused when the real part of a transverse eigenvalue of
one of the equilibria in the cycle passes through zero.
Homoclinic Bifurcation In mathematics, This will also cause a change in stability of the
a homoclinic bifurcation is a global bifurcation which heteroclinic cycle.
often occurs when a periodic orbit collides with
a saddle point. Global Bifurcation In mathematics, an infinite-
The image below shows a phase portrait before, at, period bifurcation is a global bifurcation that can
and after a homoclinic bifurcation in 2D. The periodic occur when two fixed points emerge on a limit cycle.
orbit grows until it collides with the saddle point. At As the limit of a parameter approaches a certain critical
the bifurcation point the period of the periodic orbit has value, the speed of the oscillation slows down and the
grown to infinity and it has become a homoclinic orbit. period approaches infinity. The infinite-period bifurca-
After the bifurcation there is no longer a periodic orbit tion occurs at this critical value. Beyond the critical
(Fig. 3). value, the two fixed points emerge continuously from
Bifurcation 83 B
Bifurcation, Fig. 3 A homoclinic bifurcation occurs when bifurcation parameter increases, the limit cycle grows until it
a periodic orbit collides with a saddle point. Left panel: For exactly intersects the saddle point, yielding an orbit of infinite
small parameter values, there is a saddle point at the origin and duration. Right panel: When the bifurcation parameter increases
a limit cycle in the first quadrant. Middle panel: As the further, the limit cycle disappears completely
each other on the limit cycle to disrupt the oscillation Symmetry Breaking in Bifurcation Sets
and form two saddle points. In a dynamical system such as:
Bifurcation,
1.5 ε = −0.1
Fig. 4 Symmetry breaking in
pitchfork bifurcation as the
parameter epsilon is varied.
e ¼ 0 is the case of symmetric 1
pitchfork bifurcation
0.5
x −0.5
−1
−1.5
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
μ
it is also a particular special case of more general called the germs of the catastrophe geometries.
singularity theory in geometry. The degeneracy of these critical points can be unfolded
Bifurcation theory studies and classifies phenomena by expanding the potential function as a Taylor series
characterized by sudden shifts in behavior arising from in small perturbations of the parameters.
small changes in circumstances, analyzing how the When the degenerate points are not merely acciden-
qualitative nature of equation solutions depends on tal, but are structurally stable, the degenerate points exist
the parameters that appear in the equation. This may as organizing centers for particular geometric structures
lead to sudden and dramatic changes, for example, the of lower degeneracy, with critical features in the param-
unpredictable timing and magnitude of a landslide. eter space around them. If the potential function depends
Catastrophe theory, which originated with the work on two or fewer active variables, and four (respectively,
of the French mathematician René Thom in the 1960s, five) or fewer active parameters, then there are only
and became very popular due to the efforts of Christo- seven (respectively, eleven) generic structures for these
pher Zeeman in the 1970s, considers the special case bifurcation geometries, with corresponding standard
where the long-run stable equilibrium can be identified forms into which the Taylor series around the catastro-
with the minimum of a smooth, well-defined potential phe germs can be transformed by diffeomorphism (a
function (▶ Lyapunov Stability). smooth transformation whose inverse is also smooth).
Small changes in certain parameters of a nonlinear These seven fundamental types are now presented, with
system can cause equilibria to appear or disappear, or the names that Thom gave them.
to change from attracting to repelling and vice versa,
leading to large and sudden changes of the behavior of Fold Catastrophe
the system. However, examined in a larger parameter The potential function of one active variable is:
space, catastrophe theory reveals that such bifurcation
points tend to occur as part of well-defined qualitative V ¼ x3 þ ax (13)
geometrical structures.
Catastrophe theory analyses degenerate critical At negative values of a, the potential has two
points of the potential function – points where not extrema – one stable and one unstable. If the parameter
just the first derivative but one or more higher deriva- a is slowly increased, the system can follow the stable
tives of the potential function are also zero. These are minimum point. But at a ¼ 0 the stable and unstable
Bifurcation 85 B
One can also consider what happens if one holds b
constant and varies a. In the symmetrical case b ¼ 0,
x one observes a pitchfork bifurcation as a is reduced,
a
with one stable solution suddenly splitting into two
stable solutions and one unstable solution as the phys- B
ical system passes to a < 0 through the cusp point (0, 0)
(an example of spontaneous symmetry breaking).
Away from the cusp point, there is no sudden change
in a physical solution being followed: when passing
through the curve of fold bifurcations, all that happens
is an alternate second solution becomes available.
3x 2 +a = 0 A famous suggestion is that the cusp catastrophe
can be used to model the behavior of a stressed dog,
Bifurcation, Fig. 5 Stable and unstable pair of extrema disap- which may respond by becoming cowed or becoming
pear at a fold bifurcation angry. The suggestion is that at moderate stress (a > 0),
the dog will exhibit a smooth transition of response
extrema meet, and annihilate. This is the bifurcation from cowed to angry, depending on how it is provoked.
point. At a > 0 there is no longer a stable solution. If But higher stress levels correspond to moving to the
a physical system is followed through a fold bifurca- region (a < 0). Then, if the dog starts cowed, it will
tion, one therefore finds that as a reaches 0, the stability remain cowed as it is irritated more and more, until it
of the a < 0 solution is suddenly lost, and the system reaches the “fold” point, when it will suddenly, dis-
will make a sudden transition to a new, very different continuously snap through to angry mode. Once in
behavior. This bifurcation value of the parameter a is “angry” mode, it will remain angry, even if the direct
sometimes called the tipping point (Fig. 5). irritation parameter is considerably reduced.
Another application example is for the outer sphere
Cusp Catastrophe electron transfer frequently encountered in chemical
The potential function of one active variable is: and biological systems (Xu 1990).
Fold bifurcations and the cusp geometry are by far
V ¼ x4 þ ax2 þ bx (14)
the most important practical consequences of catastro-
The cusp geometry is very common, when one phe theory. They are patterns which reoccur again and
explores what happens to a fold bifurcation if again in physics, engineering, and mathematical
a second parameter, b, is added to the control space. modeling. They are the only way we currently have
Varying the parameters, one finds that there is now of detecting black holes and the dark matter of the
a curve (blue) of points in (a, b) space where stability is universe, via the phenomenon of gravitational lensing
lost, where the stable solution will suddenly jump to an producing multiple images of distant quasars.
alternate outcome. The remaining simple catastrophe geometries are
But in a cusp geometry the bifurcation curve loops very specialized in comparison, and presented here
back on itself, giving a second branch where this alter- only for curiosity value (Fig. 6).
nate solution itself loses stability, and will make a jump
back to the original solution set. By repeatedly increas- Swallowtail Catastrophe
ing b and then decreasing it, one can therefore observe The potential function of one active variable is:
hysteresis loops, as the system alternately follows one
solution, jumps to the other, follows the other back, V ¼ x5 þ ax3 þ bx2 þ cx (15)
then jumps back to the first.
However, this is only possible in the region of The control parameter space is three dimensional.
parameter space a < 0. As a is increased, the hysteresis The bifurcation set in parameter space is made up of
loops become smaller and smaller, until above a ¼ 0 three surfaces of fold bifurcations, which meet in two
they disappear altogether (the cusp catastrophe), and lines of cusp bifurcations, which in turn meet at
there is only one stable solution. a single swallowtail bifurcation point.
B 86 Bifurcation, Supercritical and Subcritical
As the parameters go through the surface of fold Galan J, Freire E (1999) Chaos in a mean field model of coupled
bifurcations, one minimum and one maximum of the quantum wells; bifurcations of periodic orbits in a symmetric
hamiltonian system. Rep Math Phys 44(1):81–86
potential function disappear. At the cusp bifurcations, Gao J, Delos JB (1997) Quantum manifestations of bifurcations
two minima and one maximum are replaced by one of closed orbits in the photoabsorption spectra of atoms in
minimum; beyond them the fold bifurcations disappear. electric fields. Phys Rev A 56:356–364
At the swallowtail point, two minima and two maxima Gutzwiller MC (1990) Chaos in classical and quantum mechan-
ics. Springer, New York. ISBN 0-387-97173-4
all meet at a single value of x. For values of a > 0, beyond http://www.scholarpedia.org/article/Quantum_chaos
the swallowtail, there is either one maximum-minimum http://www.scholarpedia.org/article/User:Gutzwiller
pair, or none at all, depending on the values of b and c. Kleppner D, Delos JB (2001) Beyond quantum mechanics:
Two of the surfaces of fold bifurcations, and the two lines insights from the work of Martin Gutzwiller. Found Phys
31(4):593–612
of cusp bifurcations where they meet for a < 0, therefore Monteiro TS, Saraga DS (2001) Quantum wells in tilted fields:
disappear at the swallowtail point, to be replaced with semiclassical amplitudes and phase coherence times. Foun-
only a single surface of fold bifurcations remaining. dations Phys 31(2):355
Salvador Dalı́’s last painting, The Swallow’s Tail, was Peters AD, Jaffé C, Delos JB (1994) Quantum manifestations of
bifurcations of classical orbits: an exactly solvable model.
based on this catastrophe. Phys Rev Lett 73:2825–2828
Stamatiou G, Ghikas DPK (2007) Quantum entanglement
Butterfly Catastrophe dependence on bifurcations and scars in non-autonomous
The potential function of one active variable is: systems: the case of quantum kicked top. Phys Lett
A 368(3–4):206–214
Wieczorek S, Krauskopf B, Simpson TB, Lenstra D (2005) The
V ¼ x6 þ ax4 þ bx3 þ cx2 þ dx (16) dynamical complexity of optically injected semiconductor
lasers. Phys Rep 416(1–2):1–128
Xu F (1990) Application of catastrophe theory to the DG¼ 6
to –DG
Depending on the parameter values, the potential
relationship in electron transfer reactions. Zeitschrift f€ ur
function may have three, two, or one different local Physikalische Chemie Neue Folge 166:79–91
minima, separated by the loci of fold bifurcations. At
the butterfly point, the different three surfaces of fold
bifurcations, the two surfaces of cusp bifurcations, and
the lines of swallowtail bifurcations all meet up and Bifurcation, Supercritical and Subcritical
disappear, leaving a single cusp structure remaining
when a > 0. Tianshou Zhou
School of Mathematics and Computational Sciences,
References Sun Yet-Sen University, Guangzhou, Guangdong,
China
Blanchard P, Devaney RL, Hall GR (2006) Differential equations.
Thompson, London, pp 96–111
Courtney M et al (1995) Closed orbit bifurcations in continuum Definition
stark spectra. Phys Rev Lett 74:1538–1541
Founargiotakis M, Farantos SC, Skokos CH, Contopoulos G
(1997) Bifurcation diagrams of periodic orbits for unbound In bifurcation theory, a field within mathematics,
molecular systems: FH2. Chem Phys Lett 277(5–6):456–464 a pitchfork bifurcation is a particular type of local
Bile Acid and Xenobiotic System 87 B
bifurcation. Pitchfork bifurcations, like Hopf bifurca- and transporter-related functions that ensure the detox-
tions, have two types – supercritical and subcritical. ification and removal from the body of harmful xeno-
In flows, that is, continuous dynamical systems biotic and endobiotic compounds while ensuring that
described by ODE, pitchfork bifurcations occur gener- primary bile acids (essential for the emulsification and
ically in systems with symmetry. absorption of dietary fats and fat-soluble vitamins) are B
An ODE not eliminated and can be reused. Xenobiotics are
chemical compounds which are foreign to the living
dx
¼ f ðx; r Þ organism. They are not naturally found or expected to
dt be present in the organism as they are neither produced
described by a one parameter function f ðx; r Þ by it nor are they part of the organism’s natural diet.
with r 2 R satisfying: f ðx; r Þ ¼ f ðx; r Þ ( f is an The process of xenobiotic metabolism ensures they are
@f
odd function with regard to x), @x ð0; r 0 Þ ¼ 0, broken down and either put to good use or detoxified
@f 2 @f 3 @f and removed from the organism. Examples of xenobi-
@x2 ð0; r 0 Þ ¼ 0, @x3 ð0; r 0 Þ 6¼ 0, @r ð0; r 0 Þ ¼ 0,
@f 2 otics are drugs, pesticides, and carcinogens. Endobi-
@x@r ð0; r 0 Þ 6¼ 0 has a pitchfork bifurcation at
otics are chemicals produced by an organism, such as
ðx; r Þ ¼ ð0; r 0 Þ. The form of the pitchfork is given by steroid hormones or bile acids. They are produced to
the sign of the third derivative: carry out a variety of functions, e.g., acting as
@f 3 < 0 supercritical a signaling molecule or facilitate absorption of dietary
ð 0; r Þ
@x3
0
> 0 subcritical fats; however, their concentrations need to be strictly
regulated as they can become toxic. The overall
▶ metabolic flux of the BAXS is primarily achieved
through the activities of nuclear receptors, which have
Bigraph the ability to directly bind to DNA and regulate ▶ gene
expression. Nuclear receptors can be thought of as
▶ Bipartite Graph metabolic sensors of exogenous and endogenous
toxins. Detailed knowledge of the factors that govern
the activity of nuclear receptors is required to under-
stand a range of physiological processes, such as drug-
Bile Acid and Xenobiotic System drug interactions, intracrine hormone metabolism,
▶ xenobiotic clearance, and cholesterol homeostasis
Noel Kennedy1, Paul Thompson1, Oliver Schmidt1, and lipid homeostasis.
Werner Dubitzky2 and Huiru Zheng3
1
School of Biomedical Sciences, University of Ulster,
Coleraine, UK Characteristics
2
Biomedical Sciences Research Institute, University of
Ulster, Coleraine, UK A System View
3
School of Computing and Mathematics, Computer ▶ Bile acids represent essential but also toxic biolog-
Science Research Institute, University of Ulster, ical reagents whose concentrations within the body
Jordanstown, UK require critical maintenance. Many of the genetic fac-
tors that dictate bile acid concentration also govern the
detoxification and removal from the body of many
Synonyms drugs and foreign compounds. These overlapping bio-
logical processes define a network or system termed
BAXS; Bile acid system the bile acid and xenobiotic system which involves the
coordinated activities of many genes in different
Definition tissues.
Bile acids are necessary for the emulsification and
The bile acid and xenobiotic system (BAXS) defines absorption of dietary fats and the regulation of choles-
an intricate physiological network of chemoprotective terol homeostasis. They are synthesized in the liver
B 88 Bile Acid and Xenobiotic System
Bile Acids
from the liver
Gall bladder
Bile Acids
95% reabsorbed
Duodenum IIeum
Small Intestine
from the catabolism of cholesterol forming the primary Given the multi-scale nature of the BAXS, it is difficult
bile acids, cholic and chenodeoxycholic acids. Bacte- to assess the exact role of individual receptors and their
rial flora dehydroxylates a portion of the primary bile activating/deactivating ▶ ligands with respect to the
acids in the intestinal lumen, resulting in secondary overall BAXS flux and its variation throughout
bile acids, deoxycholic and lithocholic acids. These a number of participating organs. A comprehensive
four bile acids possess detergent-like properties neces- description of the interacting components that govern
sary for the absorption of dietary lipids and fat-soluble BAXS ▶ gene expression would enable the identifica-
vitamins. When present at high concentrations, they tion of regulatory “nodes” as targets for treatment
can become toxic; therefore, ▶ bile acid concentra- regimes, facilitate a deeper understanding of the com-
tions need to be appropriately regulated and recycled ponents impacting drug-drug interactions, and provide
(Lefebvre et al. 2009). a framework for the design of large-scale, integrated
Similarly, the BAXS detects any accumulation of prediction studies.
xenobiotic and endobiotic compounds and facilitates
their detoxification and removal from the body. This Nuclear Receptors
is accomplished through a complex network of sensors Nuclear receptors are a superfamily of proteins which
in the form of nuclear receptors that function as can bind directly to DNA and upregulate or
ligand-activated transcription factors (Kliewer and downregulate the transcription of a gene. Within the
Willson 2002). The process of enterohepatic circula- BAXS, nuclear receptors act like a network of sensors
tion (as depicted in Fig. 1) ensures 95% of bile acids detecting compounds such as hormones or xenobiotics.
are recycled. These compounds, referred to as ligands, bind to the
The BAXS involves the coordinated activities of nuclear receptors and can either activate them, leading
a number of genes across multiple temporal and spatial to increased transcription, or deactivate them, leading
scales. Basic BAXS processes and their time scales to repression of gene transcription. A bound nuclear
include the binding of ▶ ligands to nuclear receptors receptor may also dimerize with another nuclear recep-
(seconds), ▶ gene expression and ▶ gene regulation tor to form a complex which then binds to response
(hours), ▶ transporter protein activity (minutes), and elements located in the promoter region of the gene.
metabolic enzyme activity (seconds). Spatially, This activates the gene and transcription is increased
BAXS components range from the dimension of mol- considerably. Nuclear receptors can also repress gene
ecules (e.g., nuclear receptors) to organs (e.g., liver). expression through competitive inhibition, whereby
Bile Acid and Xenobiotic System 89 B
Bile Acid and Xenobiotic
System, Fig. 2 The role of
nuclear receptors, illustrating
the competition between
nuclear receptors for the
ligands they bind and the B
genes targeted
the nuclear receptor competes with other receptors for The BAXS main process revolves around the circu-
the ligands they bind, or by binding to the promoter lation and critical maintenance of bile acid concentra-
region of the gene, thus reducing the efficacy of acti- tion and the detection and removal of harmful
vation and therefore gene transcription. Nuclear recep- compounds. This process is composed of multiple sub-
tors are classified as ▶ transcription factors and their processes (e.g., transactivation, interactions with
combined inhibitory and activating effects regulate co-activators or corepressors, heterodimerization)
gene expression. This is vital in controlling develop- operating on different time and space scales.
ment, metabolism and maintaining adult homeostasis In the BAXS, the initial stimuli leading to
(▶ Homeostasis). a physiological response would be the binding of
Nuclear receptors serve to detect fluctuations in a ligand by a nuclear receptor. Subsequently, the
concentration of many compounds and initiate a bound nuclear receptor binds to response elements in
physiological response by regulating the BAXS. the target genes, leading to increased ▶ gene expres-
▶ Transcriptional regulation by nuclear receptors sion and the cascading effects that would ensue. Fur-
involves both activating and repressive effects upon ther processes include conjugation and transporter
specific groups of genes. Figure 2 illustrates the overlap functions (Stieger and Meier 1998).
present between nuclear receptors and the genes they Examples of nuclear receptors within the BAXS
target and also the ligands that bind to and activate them. include the pregnane X receptor (PXR), farnesoid
It is these factors that contribute to the phenomenon X receptor (FXR), constitutive androstane receptor
of drug-drug interactions, e.g., between St. John’s wort (CAR), retinoid X receptor (RXR), and vitamin
and Cyclosporine (Barone et al. 2000), or St. John’s D receptor (VDR). A ligand can be either endogenous,
wort and an oral contraceptive (Hall et al. 2003). e.g., a hormone or bile acid, or exogenous such as
Positive feed-forward and negative feedback loops can a drug, e.g., Rifampicin, St. John’s wort, Dexametha-
also occur, e.g., within the cholesterol ▶ metabolic sone. A ligand binds to the nuclear receptor which may
pathway (Eloranta and Kullak-Ublick 2005). also bind to another receptor of the same type to form
B 90 Bile Acid and Xenobiotic System
a homodimer (homodimerization) or with other through ligand binding, expression of CYP3A4 and
nuclear receptor complexes to form a heterodimer MDR1 are considerably increased, resulting in the
(heterodimerization). The overall complex then binds metabolism of Hyperforin and its transport from
to the ▶ promoter or ▶ enhancer of a specific gene and the cell into the intestinal lumen (Lin et al. 2009). In
this either upregulates or downregulates expression of the absence of receptor binding, Ritonavir inhibits
that gene depending on the complex formed. transcription of CYP3A4 and MDR1. In theory, this
could lead to accumulated levels of Hyperforin in the
BAXS Process and Example liver as unbound Ritonavir inhibits the metabolism
The example in Fig. 3 shows the effects of Ritonavir mechanism.
(an antiretroviral drug from the protease inhibitor class FXR is activated by primary and secondary bile
used to treat HIV infection and AIDS) on the metabo- acids, chenodeoxycholic acid (CDCA) and lithocholic
lism of Hyperforin (a phytochemical produced by acid (LCA). It upregulates transcription of CYP3A4,
some of the members of the plant, e.g., Hypericum multidrug-resistant protein 2 (MRP2) and bile salt
perforatum which is commonly known as St John’s efflux pump (BSEP), both members of the ATP-
wort) in the liver and the overlap of this process with binding cassette (ABC) transporter superfamily of
FXR-mediated primary and secondary bile acid metab- membrane-associated proteins responsible for
olism. Both Ritonavir and Hyperforin are activating transporting bile acids into the bile duct. In both pro-
ligands for PXR (Chang 2009; Willson and Kliewer cesses, an overlap occurs at CYP3A4. A patient taking
2002). CYP3A4 is a member of the cytochrome P450 Hyperforin will have increased expression of CYP3A4
superfamily of enzymes and one of the most important which may lead to a deficiency in ▶ bile acid concen-
involved in xenobiotic metabolism. Multidrug resis- tration as this gene metabolizes bile acids. Similarly,
tance protein 1 (MDR1) is a member of the ATP- a patient with high bile acid concentrations may reduce
binding cassette (ABC) transporter superfamily of the efficacy of Hyperforin (if taken) as transcription of
membrane-associated proteins responsible for CYP3A4 is increased. If Ritonavir is added, then bile
transporting a wide variety of substrates from the cell acids and Hyperforin could accumulate to toxic levels
(Koudriakova et al. 1998). Once PXR is activated in the liver.
Binding Affinity 91 B
Cross-References
Bile Acid System
▶ Bile Acid and Xenobiotic System
▶ Enhancer ▶ Bile Acid and Xenobiotic System
▶ Gene Expression B
▶ Gene Regulation
▶ Ligand
▶ Metabolic Flux Analysis Bimodality
▶ Metabolic Networks, Databases
▶ Metabolic Pathway Analysis ▶ Bistability
▶ Promoter
▶ Transcription Factor
▶ Transcriptional Regulation
▶ Xenobiotics Binding Affinity
Xiujun Zhang
Institute of System Biology, Shanghai University,
References
Shanghai, China
Barone G, Gurley B, Ketel B, Lightfoot M, Abul-Ezz S (2000)
Drug interaction between St. John’s wort and cyclosporine.
Ann Pharmacother 34(9):1013–1016 Definition
Chang T (2009) Activation of pregnane X receptor (PXR) and
constitutive androstane receptor (CAR) by herbal medicines.
AAPS J 11(3):590–601 Binding affinity is a measure of the tendency or
Eloranta JJ, Kullak-Ublick GA (2005) Coordinate transcrip- strength of interactions between molecules. The mol-
tional regulation of bile acid homeostasis and drug metabo- ecules that can bind together include proteins, DNA,
lism. Arch Biochem Biophys 433(2):397–412
antibodies, enzymes, and some other organic mole-
Hall SD, Wang Z, Huang S, Hamman MA, Vasavada N,
Adigun AQ, Hilligoss JK, Miller M, Gorski JC (2003) The cules such as drugs. The result of molecular binding
interaction between St John’s wort and an oral contraceptive is formation of a molecular complex such as protein-
[ast]. Clin Pharmacol Ther 74(6):525–535 protein, protein-DNA, and protein-drug complex.
Kliewer SA, Willson TM (2002) Regulation of xenobiotic and
Binding affinity can be quantified and detected by
bile acid metabolism by the nuclear pregnane X receptor.
J Lipid Res 43(3):359–364 some physical techniques (Karney et al. 2005). For
Koudriakova T, Iatsimirskaia E, Utkin I, Gangl E, example,
Vouros P, Storozhuk E, Orza D, Marinina J, Gerber N
(1998) Metabolism of the human immunodeficiency L þ P LP (1)
virus protease inhibitors Indinavir and Ritonavir by human
intestinal microsomes and expressed cytochrome where L is a ligand, P is a protein, LP is the ligand-
P4503A4/3A5: mechanism-based inactivation of cyto- protein complex.
chrome P4503A by Ritonavir. Drug Metab Dispos
The concentration of the ligand-protein complex is
26(6):552–561
Lefebvre P, Cariou B, Lien F, Kuipers F, Staels B (2009) Role of given by the dissociation constant:
bile acids and bile acid receptors in metabolic regulation.
Physiol Rev 89(1):147–191 ½L ½P
Kd ¼ (1)
Lin YS, Yasuda K, Assem M, Cline C, Barber J, Li C, ½LP
Kholodovych V, Ai N, Chen JD, Welsh WJ, Ekins S, Schuetz
EG (2009) The major human pregnane X receptor (PXR) The binding affinity is defined as:
splice variant, PXR.2, exhibits significantly diminished
ligand-activated transcriptional regulation. Drug Metab kd
!
Dispos 37(6):1295–1304 NA
pKd ¼ log10 (2)
Stieger B, Meier PJ (1998) Bile acid and xenobiotic transporters 1 kmolm3
in liver. Curr Opin Cell Biol 10(4):462–467
Willson TM, Kliewer SA (2002) PXR, CAR and drug metabo-
lism. Nat Rev Drug Discov 1(4):259–266 where NA is the Avogadro constant.
B 92 Binomial Distribution Without Replacement
References
Biochemical Systems Optimization
Karney CF, Ferrara JE, Brunner S (2005) Method for Through Mathematical Programming
computing protein binding affinity. J Comput Chem
26(3):243–251
Julio Vera1 and Néstor V. Torres2
1
Department of Systems Biology and Bioinformatics,
Institute of Computer Sciences, University of Rostock,
Rostock, Germany
Binomial Distribution Without 2
Department of Biochemistry and Molecular Biology,
Replacement University of La Laguna, San Cristóbal de La Laguna,
Islas Canarias, Tenerife, Spain
▶ Hypergeometric Distribution
Synonyms
▶ Biological Activity
Definition
step 1
kinetic
model
S-system
model
step 2
• stability
optimization engine • robustness
• dynamics
Design of multiobjective
linear program
1. objectives
max/min f1(y),...,fn(y)
2. subject to
• steady state equations
step 3
• metabolic burden logarithmic transformation
• fluxes constraints y = In(x)
• metabolites bounds
• enzymes bounds
• others
antilogarithmic transformation
efficient solutions set
step 4
• dynamics
selected
solutions
Step 4: Analysis and refinement of the solution. Here Case Study 1. The Monoobjective Version of the
the system’s properties at the optimal state (stabil- IOM: Optimization of the L-()-Carnitine
ity, robustness, and dynamics) are evaluated. If the Biosynthesis by Escherichia Coli
optimal solution is sound, it is chosen as a solution; L-()-carnitine is a chiral compound widely distrib-
otherwise the imposed restrictions or the objective uted in nature and with a wide range of medical appli-
function are refined and the optimization process cations. A great deal of the L-carnitine is produced by
repeated until a satisfactory solution is reached. In chemical synthesis, with the disadvantage of produc-
the case where an S-system model was built from ing a racemic mixture that is necessary to separate in
a previous kinetic model, the S-system solution a costly process.
must be transferred to the original model. In the Figure 2 shows the experimental set up for the
following we will illustrate the application of this biotransformation of crotonobetaine into L-carnitine
approach to two case studies. by an overproducing E. coli strain in a cell recycle
Biochemical Systems Optimization Through Mathematical Programming 95 B
Biochemical Systems INPUT
Optimization Through glycerol X6
Mathematical crotonobetaine X7
Programming,
Fig. 2 Experimental set up
for the biotransformation of B
crotonobetaine into
L-()-carnitine by E. coli glucose X1
strains biomass X4
membrane OUTPUT
filtration
module volumetric
flux X5
X2 X3
bioreactor. The model includes the metabolic transfor- 0:0211 ¼ 0:9162 Y1 0:5675 Y4 þ 0:5676 Y5
mation of crotonobetaine into g-butyrobetaine, þ Y6 0:5675 Y8
through the activity of a previously described
0:6433 ¼ 0:1424 Y2 þ 0:3538 Y3 0:0249 Y4
crotonobetaine reductase. In the optimization proce-
dure the dependent variables are X1 to X4, while X5 to þ 0:0249 Y5 þ 0:0605 Y7 0:0245 Y9
X9 are the parameters. The original model set up was 0:6246 ¼ 0:1106 Y2 0:3926 Y3 þ 0:0257 Y4
translated to the S-system representation yielding the 0:0257 Y5 þ 0:0254 Y9
following S-system representation (Alvarez-Vasquez 2:706 ¼ 0:8524 Y1 þ Y8
et al. 2002):
where Yi is ln(Xi) for i ¼ 1,. . ., 9. At this point it is
dX1
¼ X5 X6 1:0213 X10:9162 X40:5675 X50:4324 X80:5675 important to define the extent to which changes in the
dt variables and parameters are allowed. Generally this
dX2 variation range will be between 0.5 and 1.5 times the
¼ 0:2438 X30:3538 X40:9394 X50:0605 X70:0605 X90:9292 baseline values. An exception here is X4, the biomass,
dt
which can be allowed to increase up to two times the
0:464 X20:1424 X40:9643 X50:0356 X90:9537 baseline. Accordingly:
dX3
¼ 0:3844 X20:1106 X4 X90:989 3:0746 Y1 4:1732; 2:6909 Y2 3:7895;
dt 2:3277 Y3 3:4263; 1:8068 Y4 3:1931;
0:2058 X30:3926 X40:9742 X50:0257 X90:9636 0:6931 Y5 0:4054; 3:9120 Y6 5:0106;
3:2188 Y7 4:3174; 1:1988 Y8 0:1002;
dX4 4:1214 Y9 6:424:
¼ 0:0053 X10:8524 X4 X8 0:08 X4 :
dt
With the defined linear program two optimization
Subsequently the (mono)objective function is tasks were carried out: with one or two decision
defined, namely, the net rate of L-carnitine biosynthe- (independent) variables. The obtained optimal profiles
sis, expressed as Vc ¼ X3·X5. In logarithmic were translated from the S-system to the original
coordinates this function becomes Y3 + Y5, where model (K-M) and then implemented in the bioreactor.
Yi is ln(Xi). The steady-state condition, once linearized Table 1 shows the experimental results obtained which
in logarithmic coordinates, obtained was: are in good agreement with the theoretical predictions.
B 96 Biochemical Systems Optimization Through Mathematical Programming
Biochemical Systems Optimization Through Mathematical Programming, Table 1 Comparison between predicted (K-M)
and actual experimental values (Exp.) of L-()-carnitine production rate by E. coli in a cell recycle crotonobetaine biotransformation
system. The optimized parameter profiles for changes in one (1; dilution rate, X5) and two parameters (2; dilution rate and
crotonobetaine concentration in the input, X7) are shown. Results are given as the values divided by the basal
1 2
Variables Basal K-M Exp. K-M Exp.
Dependent
Glycerol, X1 43.28 mM 1 0.96 1 0.95
Crotonobetaine, X2 29.49 mM 1 0.97 1.49 1.77
Carnitine, X3 20.51 mM 1 1.01 1.11 1.13
Biomass, X4 12.18 g/L 1.5 1.53 1.5 1.53
Independent
Dilution rate, X5 1 L/h1 1.5 1.5
Crotonobetaine input, X7 50 mM 1 1.33
Production rate, Vc 21.1 mM/h1 1.5 1.54 1.65 1.74
dX5
Case Study 2. The Multiobjective Version of the ¼ 1:406 X0:2605
3 X0:152
4 X0:0739
5 X0:5
9 X10
0:5
dt
IOM: Application to the Ethanol Production by
2:9437 X0:1962 X0:1791 X0:2354 X0:3514 X0:2925
Saccharomyces Cerevisiae 1 2 5 7 8
X
5 X
13
Fint ðYÞ ¼ Yi and Fenz ðYÞ ¼ Yi : s is the ratio between the total intermediate concen-
i¼1 i¼6 tration of the current solution and the one in the orig-
inal basal solution of the system; y is the ratio between
In the optimization program three types of con- the total enzyme activities at the optimum solution and
straints are considered (Vera et al. 2003): (a) those the ones at the basal steady state, while r describes the
that guarantee that solutions correspond to a steady- same ratio for the ethanol production of the system. In
state solution of the system; (b) those setting the phys- order to support the biotechnologists making the
iologically feasible lower and upper boundaries for the proper decision, additional criterions were introduced.
model variables; and (c) additional constraints related We ruled out all solutions with the lowest admissible
to special features of the considered biosystem. The ethanol production (rmin ¼ 2:0) and ranked the
constraints that set up the boundaries for the model remaining ones according with their parametric
variables (metabolite or enzyme concentrations) distance to the so-called utopian point, a fictitious
B 98 Biochemical Systems Optimization Through Mathematical Programming
Biochemical Systems
Optimization Through
Mathematical
Programming,
Fig. 4 Efficient solutions for
5
the multicriteria optimization
of ethanol production by 4
S. cerevisiae. (●): Set I,
solutions with the largest 3
D(X,Xutp) value. (♦): Set II, ρn
solutions closer to the utopian 2
point (star). (■) Set III,
solutions with intermediate 1
values of D(X,Xutp).
0
(X) represents the anti-ideal 0
point
1 5
2 4
3 3
σn
2 θn
4
1
5 0
solution constructed with the optimal values of ethanol high productivity and enzyme levels but low metabo-
production (rutp), total intermediate concentration lite concentrations (0.8–1.85). Finally, Set III repre-
(sutp), and total enzyme concentration (yutp) from the sents those with intermediate values of D(X,Xutp), high
computation of separated monoobjective programs: productivity and enzyme levels but medium values of
total metabolites concentration. When the parametric
rutp ¼ 4:986 sutp ¼ 0:5 yutp ¼ 1:0 distance is considered the criterion to choose the best
solution two solutions were found (five-pointed star in
In a similar way, we defined the anti-utopian solu- Fig. 4: Table 2).
tion, extracting the worst value from every column in These solutions represent opposite alternatives
the payment matrix for the separated monoobjective from a biotechnological perspective. Solution
programs (Vera et al. 2003): I prefers the increase in the ethanol production at the
price of high cell resources consumption, while solu-
rfun ¼ 2:0 sfun ¼ 4:76 yfun ¼ 3:30 tion II establishes a biotechnological compromise
between satisfactory increasing in the ethanol produc-
With this information we defined a weighted para- tion and non-excessive cell resources consumption.
metric distance from every solution to the utopian It should be noted that the heuristic nature of the last
point, which is described by the following equation: step in the IOM method may provoke a loss of
efficiency in the solutions when the behavior of the
DðX; Xutp Þ original model departs from their S-system expansion.
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
#ffi
u " This may result in the violation of some of the restric-
u1 rutp rðXÞ 2 s utp sðXÞ 2 y utp
yðXÞ
2
¼t þ þ tions imposed. However, experience shows that these
3 rutp rfun sutp sfun yutp yfun
violations are often negligible and the steady-state
solutions usually stay in the vicinity of the real efficient
Figure 4 shows the solutions grouped in three sets solution. Moreover, the method incorporates tools to
according with their D(X,Xutp) value. Set I corresponds estimate solutions closer to the optimum in the
to those solutions with the largest D(X,Xutp) value. This nonlinear domain. This latter step consumes consider-
group is characterized by a high productivity (r), but able computational effort, but even more important,
also by high total intermediate concentration (s), an the solution obtained ceases to be optimal. The practi-
undesirable property. Set II includes the closer solu- cal application of the method indicates however that in
tions to the utopian point D(X,Xutp), characterized by most cases it remains a high-quality solution.
Biochemical Systems Optimization Through Mathematical Programming 99 B
Biochemical Systems Optimization Through corresponding values for the fluxes, respectively. The
Mathematical Programming, Table 2 Best two efficient values assigned to the weights lj and ll are propor-
solutions for the multicriteria optimization of ethanol production
by S. cerevisiae tional to the relative importance of each key metabolite
and flux. Every term in the equation is scaled by its
Sol r s y D(X,Xutp)
value in the healthy condition such that all contribu- B
I 4.987 1.812 3.301 0.604
tions have comparable weights.
II 2.791 0.596 2.798 0.613
Dysfunction description. A mathematical model
description of the functional origin of the disease,
which is obtained by imposing a characteristic value
Case Study 3. Combining Mathematical Modeling to the enzyme activities whose deregulation originates
and Optimization to Detect Potential Drug Targets the pathology:
in Human Diseases
The methodology discussed here can be used to detect Xj ¼ XjPS
potential drug targets in diseases that are originated by
dysfunctions of biochemical networks. The strategy is
Physiological constrains. Some additional equa-
to point out one or more enzymes in the biochemical
tions grating that the biochemical network configura-
network, whose modulation via specific drugs permits
tion obtained is physiologically acceptable. These
to redirect some biochemical fluxes in the network. In
equations can be steady-state conditions and upper
this manner, the critical fluxes and metabolites for the
and lower bounds in the concentrations of the metab-
system are restored to values similar to the ones found
olites, respectively:
in healthy subjects (Vera et al. 2007). The method
requires the derivation of a mathematical model
dXi
describing the dynamics of the metabolic network ¼ 0; i ¼ 1; . . . ; n:
dt
under analysis. It is also necessary to retrieve biomed-
ical knowledge required to select the critical fluxes and
metabolites unbalanced in the disease condition, as XiLB Xi XiUB
well as their values in healthy subjects. Model optimi-
zation is used to determine which biochemical The solutions generated by the method are sets of
processes in the network must be modulated, via computationally predicted values for metabolites and
drug-mediated inhibition or activation, in order to substrates, as well as the required modulation level of
move the system from the current values of critical the targeted enzyme. The selected solutions are those
metabolites and fluxes toward the healthy ones. The for which the computed values of critical metabolites
simulation-derived therapeutic treatments may and fluxes are sufficiently close to the ones in the
consider the modulation of one reaction in the network healthy condition without compromising the values
at a time, but also suggest strategies to develop multi- of other metabolites. In Vera et al. (2007) this meth-
factorial treatments. The optimization program odology was used to identify potential enzyme drug
suggested contains at least the following elements. targets for hyperuricemia, a disease associated with
Objective. An objective function, which translated a defect in phosphoribosyl pyrophosphate synthetase,
in mathematical terms the minimization of the distance an enzyme that regulates the de novo metabolic syn-
between the values of the critical metabolites and thesis of purines. The final pathological effect of this
fluxes in the current systems state with respect to deregulation is an abnormal level of uric acid, which
those in the desired health state: triggers arthritic pain and nephropathy. We used
a power-law mathematical model of purine metabo-
" #
Xl Xj XHS X P Ji JiHS lism (Curto et al. 1997) and defined a mathematical
j
Min lj þ li HS
optimization program with the structure described
X HS
j
Ji
j¼1 i¼1 above. We defined as critical metabolite the uric acid
and computed solutions of the optimization program
where Xj and XjHS denote the current and the health consisting in the inhibition of enzymes and the
values of the metabolites and Ji and JiHS are the potential application of a diet with low levels of purine
B 100 Biochemical Systems Optimization Through Mathematical Programming
Biochemical Systems
Optimization Through
Mathematical
Programming,
Fig. 5 Sketch of the
mathematical model for purine
metabolism used in our
analysis (see Vera et al. 2007
for complete description). In
red we highlight the metabolic
fluxes whose inhibition in
combination with a diet low in
purine precursors reduces in
our simulations the levels of
uric acid. Precisely, those
solutions propose the
drug-mediated inhibition
of one of the following
enzymes: Adenine
phosphoribosyltransferase
(Vden), AMP deaminase
(Vampd), RNases to AMP and
GMP (Vrgna, Vrnaa), 50
(30 )-Nucleotidase (Vdgnuc),
Guanine hydrolase (Vgua), or
xanthine oxidase (Vxd, Vhxd).
of BioCreative II.5, the third installment of this chal- that could be used as common basis for the evaluation,
lenge, the organizers asked participants to automati- a so-called gold standard. To this end, article
cally generate parts of the SDAs from FEBS Letters annotations that had initially been made by authors
publications (Leitner 2010a). In total, 15 research were refined by three independent MINT database
groups worldwide followed this call and participated bio-curators, who continued refining the results until
in BioCreative II.5. The aim was to provide they reached a consensus annotation for those articles
a quantitative insight on how well automated systems that matches the MIMIx standard for annotating
can reproduce human annotations. Additionally, anno- PPIs (Orchard et al. 2007). In a second round
tations were provided by the authors of the papers of pruning – after the BioCreative participants had
themselves as well as by expert bio-curators submitted their results – the BioCreative organizers
of the MINT PPI database (Ceol et al. 2010; reevaluated the annotations with the results of the
Chatr-aryamontri et al. 2007) in the context of the automated systems and identified potentially errone-
associated FEBS Letters experiment. All these results ous annotations by examining results where the auto-
were evaluated then by the BioCreative organizers mated systems unanimously reported annotations that
(Leitner 2010b). Together with the author and curator were inconsistent with the curator data; Requesting
annotations it was possible to compare the results of the bio-curators counsel on the found inconsistencies
the automated systems to the manual annotations. and making corrections where necessary, this third
The SDA initiative and BioCreative II.5 are the first step produced the final “gold standard” annotations
quantitative approach to directly compare the impact for the evaluation.
of generating annotations for scientific manuscripts The applied performance measures are intended to
by various plausible approaches, namely, database reflect (a) the overall quality of the annotations made
curators (Howe et al. 2008), manuscript authors, and by any of the sources (i.e., systems, authors, and
automated text-mining systems. At current funding curators) and (b) the performance of the automated
levels, PPI databases can only keep up with a fraction systems when ranking their results by a probability
of the data that is published by the biosciences. score scheme, so-called “confidence values”
This leaves the larger part of generated scientific data (i.e., probability values in the (0,1] range) that partic-
“dormant” in written text: This data is neither trivial ipants were asked to report together with their annota-
to retrieve or locate by researchers looking for a partic- tions. This latter evaluation method was used as
ular information, nor is this format accessible for large- the primary evaluation target, as it measures the
scale data manipulation as would be required by Systems systems’ ability to produce result lists that can be
Biology or other data mining approaches in Computa- used by human annotators as initial dataset to
tional Biology. Therefore, BioCreative II.5 intended to base their own annotation effort on. Therefore, two
investigate the feasibility of pursuing new avenues to statistical measures were used:
increase the coverage of data deposited in structured (a) The harmonic F-measure (aka. F1-score) for the
repositories as a result of scientific publications, using overall quality of the result sets (both human and
PPIs as the exemplary data in this setting. For example, system annotations)
in the realm of PPIs, the united consortium of all major (b) The area under the interpolated precision/recall
PPI databases, IMEx, only manages to cover an esti- curve (aka. AUC iP/R) for the quality of automated
mated 10–20% of the existing interactions, limiting the results given the ranking
reconstruction of interactive graphs for protein interac- The competition itself was carried out online using
tion maps to a fraction of the existing data and making it an extended version of the BioCreative Meta-Server
difficult for biologists to determine all known, proven (BCMS), a special framework that interacts via web
interaction partners with their proteins of interest. services with the automated systems provided by the
challenge’ participants (Leitner et al. 2008) (also see
the entry ▶ BioCreative Meta-Server and Text-Mining
Methods Interoperability Standard in this encyclopedia).
Contrary to the initial version, the extended BCMS
The most crucial part of this effort was to produce allows participants to take direct control of and
a high-quality dataset with near-perfect annotations monitor the communication between their
BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts 103 B
systems and the BCMS. This tool set the whole
challenge in an environment that simulated the applied
use case in which these annotations would be Full Text Selection
generated online and provided to a human annotator SDA
Structured
Results
BioCreative II.5 and the FEBS Letters Experiment on Struc-
Detailed results are found in the relevant tured Digital Abstracts, Fig. 1 The simulated online setting
of BioCreative II.5 reproduced by the BioCreative Meta-Server.
publications reported across three journals – Nature
The gray box “Users” represents the hypothetical human anno-
Biotechnolgy, FEBS Journal, and IEEE’s Transactions tator creating a SDA, while the “BCMS” and “Annotation
on Computational Biology and Bioinformatics Servers” form the technical framework created for the challenge
(TCBB); all (Leitner 2010a, b, c). They describe the
outcome and conclusions in meticulous detail and
also spurred research of independent organizations –
the challenge’s participants – resulting in nine The main comparison of the results of both the
additional research articles that formed part of the FEBS experiment and the BioCreative II.5 challenge
TCBB special issue that contains the main BC II.5 dealing with each source of annotations (authors,
article. curators, and text-mining systems) has been published
The texts this effort is based on, 122 PPI publica- in Nature Biotechnology; this is also the lead article of
tions, together with more than 1,000 negative example this joint effort. The comparison focused on the correct
articles (i.e., articles that explicitly do not contain PPI annotation of protein [database] identifiers and of the
descriptions with experimental evidence) from the [binary] interaction pairs found in the SDAs, which
same period and journal (FEBS Journal 2008 and were the main two tasks for the automated systems
2009), were provided by the FEBS Journal. With the in BioCreative II.5. The main conclusions from this
friendly permission of Elsevier, they are now accessi- work are:
ble as a free and open resource for further (a) At least the two FEBS magazines (FEBS Journal
research projects in text mining (Leitner 2010c). This and Letters), Cell (their graphical abstracts), and
resource is available through the BioCreative the ScienceDirect annotations (provided by NeXT
homepage (www.biocreative.org), as a so-called Bio) – but potentially other publishers, too – are
corpus. This alone presents an important achievement, very interested in adding annotations to their
as text resources with high-quality annotations are – manuscripts.
mostly due to copyright and other legal issues – scarce (b) Although authors and systems might perform
and very hard to come by; Furthermore, this resource is reasonably well, none of the two produced results
in XML format, which is simpler to read for that are sufficient for database standards.
machines as opposed to the usual PDF format scientific (c) The time requirement of systems is significantly
manuscripts are made available. lower than that of manual annotations
B 104 BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts
BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts, Table 1 Individual evaluation results of
the three sources and the combination of authors and curators (left) and results of individual text-mining systems, including their
performance when taking their ranking into account (AUC iP/R, right)
Precision Recall Precision Recall AUC
Task Class (%) (%) F-Score Task Class (%) (%) F-Score iP/R
Protein Systems 74 55 0.59 Protein Best F-score 74 55 0.59 0.53
identifiers Authors 84 66 0.71 identifiers (T42, S1)
Curators 96 89 0.91 Best AUC iP/R 14 73 0.21 0.57
Authors + 96 94 0.95 (T10, R5)
curators
Interaction Systems 53 34 0.37 Interaction Best F-score and 53 34 0.37 0.31
pairs Authors 72 57 0.59 pairs AUC iP/R
Curators 93 83 0.86 Incl. MINT data 64 61 0.58 0.58
Authors + 93 89 0.90 (T18, R1)
curators
(averaging at 2 min/articles, vs. almost 1 h for the author annotations would produce better results than
human annotations) and the quality of systems the two approaches by themselves did.
seems to be sufficient to at least aid human
annotators.
(d) Systems and authors, although producing lower Perspectives
quality results than curators, were able to identify
annotations the curators missed, and in the evalu- With the results of the BioCreative II.5 challenge and
ation it became apparent that when one source the insight created by evaluating it in comparison
based its annotations on another, the quality of with the FEBS Letters SDA experiment, it is clear
the resulting annotations was higher. For example, that neither curators (databases), nor authors, or text-
curators basing their annotations on author results mining systems can be tasked with creating the
performed better than if the curators started anno- needed annotations on their own. Our proposal is to
tating from scratch. A similar correlation is shown combine the approaches (as combining sources at
in annotations in the publications between systems least increased performance in terms of quality and
and authors. it is likely that adding automated systems will
Table 1 shows the evaluation results of the decrease annotation times) and distribute the effort
three sources as well as the performance of curators between all participants. A theoretical framework for
when basing their annotations on author data (left), combining human and machine annotations is
and the detailed results of the best submissions of presented in Leitner and Valencia (2008).
text-mining systems from the BioCreative II.5 partic- A potential outline of this architecture is shown in
ipants (right). Fig. 2 and explained in deep detail in the supplemen-
Last but not least, a detailed analysis of the tary material of the Nature Biotechnology publica-
core text-mining part, that is, BioCreative II.5 tion. Therefore, the further development of the
itself, that includes a comparison of the systems and BioCreative Meta-Server from a demonstration
applying various statistical approaches to the 134 server to a production-ready public service
result sets produced by the 15 participant teams is representing the main resource provided by the text-
part of the third publication in Transactions on Com- mining community is a major objective, and this also
putational Biology and Bioinformatics. This publica- encompasses the generation of an international stan-
tion documents the evaluation metrics (F1-Score, dard for annotating scientific texts, which will
AUC iP/R), the online setting of the challenge with increase the interoperability of text-mining systems
the BioCreative Meta-Server, and shows that combin- and make it easier to interface with tools humans use
ing the results of the text-mining systems with the to annotate the articles.
BioCreative II.5 and the FEBS Letters Experiment on Structured Digital Abstracts 105 B
Paper Biological
SDAs part of repository repository
submission and
review process
Authors
B Curators
B
Web- Publishers Databases
Review Curation
1 UI C 2
process Web- pipeline
Web-
services services
peer-reviewed
SDAs
A D
annotation annotation
assistance during assistance
authoring / reading during curation
BioCreative II.5 and the FEBS Letters Experiment on Struc- extracting annotations from scientific manuscripts (see the
tured Digital Abstracts, Fig. 2 The target architecture to chapter ▶ BioCreative Meta-Server and Text-Mining Interoper-
integrate authors and text-mining systems in the process of ability Standard in this book, too)
Leitner F, Krallinger M, Cesareni G, Valencia A (2010c) The developed (Krallinger et al. 2008a). Some of them
FEBS Letters SDA corpus: a collection of protein interaction have matured to industrial-standard quality sites
articles with high quality annotations for the BioCreative II.5
online challenge and the text mining community. FEBS Lett that are frequented by many users, such as iHOP
584(19):4129–4130 (Hoffmann and Valencia 2004) or Reflect (Pafilis
Orchard S et al (2007) The minimum information required et al. 2009). Others have developed powerful web
for reporting a molecular interaction experiment (MIMIx). service APIs (application programming interfaces),
Nat Biotechnol 25:894–898
as, for example, WhatIzIt (Rebholz-Schuhmann
et al. 2008), or the NaCTeM Service Systems
(Kano et al. 2010). Many more are available as indi-
BioCreative Meta-Server and vidual applications that can be downloaded and
Text-Mining Interoperability Standard installed. However, all this wealth of available soft-
ware, tools, services, and sites does not come without
Florian Leitner, Martin Krallinger and cost. Due to no real community agreement on annota-
Valencia Alfonso tion standards, document formats, and the semantics of
Structural Biology and BioComputing Programme, the generated data, it is painful to interface between
Spanish National Cancer Research Centre (CNIO), tools, especially if they are not from the same organi-
Madrid, Spain zation. Also, the power of even the best systems can be
still outstripped by a consensus result created from
many different systems that all provide the same kind
Definition of annotations (e.g., Smith et al. 2008). In addition, to
use one tool, it often is necessary to rely on results of
Over the past decade, text-mining systems have another. For example, before identifying gene names
matured to serve both bioinformaticians and biologists in an article, such a gene name “tagger” can signifi-
in a variety of ways. However, this also has the nega- cantly gain performance if its annotations are based on
tive impact that a plethora of protocols, sites, and tools the results of a pipeline that first tags the “part-of-
exist that are not compatible between each other. With speech” (PoS) of the words; PoS taggers annotate the
the BioCreative Meta-Server (BCMS) the first proto- grammatical sense of words, such as verb, noun, adjec-
type system to counterbalance this development was tive, etc.. However, while it might be easy to find high-
published. Because of uniting several text-mining sys- quality tools for one specific task, another area often
tems into one service using one protocol, both access can be lacking in terms of the number of systems that
and use of those resources could be significantly are available. Many tools only come with very basic
simplified. In addition, the BCMS provides command-line interfaces or are system libraries that
a straightforward means to combine results from mul- only computer experts know how to make use of. This
tiple text-mining systems, contributing an added ben- creates a very high entry barrier for many potential
efit that can lead to an even higher quality result than users. Even worse, in a few cases research groups
the individual annotations of the systems available decide to not make their tools publicly available; Yet,
through the BCMS. To push the capabilities of text- they might agree to make their pipeline available as
mining systems even further, a project was recently a web service, so that users can gain access to them but
launched to create a text-mining community consensus nobody can gain access to the source code the system
on an interoperability standard for the annotation of relies on.
texts. This project should facilitate the use of annota- All these issues led to a community understanding
tion systems and annotated texts for both text miners that the current status quo is ripe for improvement. In
and consumers of text-mining services. the area of sequence annotation and structure predic-
tion, these bioinformaticians have years ago already
begun to build distributed systems and meta-services.
Characteristics The idea is fairly simple: For distributed systems,
instead of having many different protocols and for-
Over the past decade, many text-mining and informa- mats, a community agrees to certain standards and
tion extraction systems for biomedical texts have been then builds their tools to those specifications
BioCreative Meta-Server and Text-Mining Interoperability Standard 107 B
(nowadays, this design principle is termed “service- scientific community. In essence, it is similar to
oriented architecture”). The earliest example of such as a structure prediction meta-server, but instead of
system is (Bio)DAS, a distributed sequence annotation adapting the interface to every possible text-mining
system (Dowell et al. 2001). On the other hand, as system, it defined a standard exchange format that
combining results leads to often superior consensus any system plugged into the BCMS had to follow. B
predictions, structural biology researchers at the same Thereby, it inherently has advantages similar to the
time begun to create so-called meta-servers (Bujnicki ones of the BioDAS system – namely, a uniform pro-
et al. 2001). In this case, the user contacts a central tocol and data model. The BCMS prototype requests
“service broker” or “meta-server” with the protein and manages annotations of four basic types and works
sequence for which he wishes to receive a structure on PubMed abstracts:
prediction. The meta-server, in turn, instead of calcu- 1. Classifying the abstract as to whether it describes
lating a protein structure on its own, forwards this protein–protein interactions
request to many different prediction servers via web 2. Identifying the mentions of genes and proteins in
services. Then the broker waits for the results to return the abstract
form those prediction servers, collecting each server’s 3. Listing of possible mappings to UniProt accessions
result. Once all results are in, the meta-server creates for the mentions
a superimposed consensus prediction from the individ- 4. Listing the most likely organisms the abstract is
ual the results, which usually tends to be a better esti- discussing
mate of the true protein structure than any single result. However, the current, publicly available version of
Then the user is informed that his result is ready and he the BCMS is a prototype. It only offers these annota-
can fetch the structure and accompanying information tions in a very limited scope: This particular service
directly from the meta-server. broker is limited to the approximately 22,000 PubMed
Both of these approaches would also solve plenty of abstracts used during the BioCreative II challenge.
the issues mentioned with current approaches in text- Therefore, only queries for that data set can be made
mining: The (Bio)DAS approach shows how to reduce and are distributed to the connected servers. In other
interoperability problems by defining a standard for the words, it is not possible to enter any new text to
data exchange. This makes it easy to chain the outputs annotate, request annotations other than the above
of one tool to another and to join or merge data from mentioned four base types, or even just query for any
several sources. The general web service approach other PubMed abstract.
allows organizations to make the fruits of their This prototype project, however, enabled the par-
labor available to the public without having to worry ticipants and developers of the servers to glean some
about loosing control over who has access to their very important insights into the endeavor of setting up
source code. By using the web, a user also does not a distributed and interoperable annotation system for
need to know anything about command-line interac- biomedical texts to and providing a public, automated
tion or programming and can run sequence annotation text-mining platform. These problems can be related to
tools and structure prediction services directly from his the three layers of interoperability (see Fig. 1): First,
browser in a graphical environment he is familiar with. there are low-layer syntactic issues that need appropri-
Last, meta-servers reap the fruits of consensus results, ate solutions, such as:
providing the user with possibly improved results. • Concurrency and thread management on all servers;
All this insight created the motivational basis to a fair queueing policy of requests on the broker/
develop the BioCreative Meta-Server (BCMS) meta-server side
(Leitner et al. 2008). This first prototype was directly • Fail-safe and nonrestrictive handling of the various
created after BioCreative II, a text-mining community possible representations of text in computers (so-
challenge where information extraction systems are called encoding schemes) on all participating
evaluated by an independent group of judges to mea- servers
sure the performance of those systems on some topic in • Compatibility issues with web service
the field of molecular biology (Krallinger et al. 2008b). implementations provided by different platforms
This version of the BCMS became the first attempt to and programming languages (“service
provide true “meta-services” for text-mining to the agnosticism”)
B 108 BioCreative Meta-Server and Text-Mining Interoperability Standard
1
JSON, CSV, Stand-off data,
Any online PC Some application
or inline XML inline XML, or
(annotations) CSV (anns.) B
HTML UI,
server mgmt, REST RPC or REST
system admin (PMIDs, doc) (PMIDs, docs)
REST REST
Website XML & JSON Request manager XML 2
PubMed
Query
retrieval of unknown or
system + entity outdated MEDLINE records
data names docs, jobs
MEDLINE user data & docs anns &
records & MEDLINE inline XML Persistent store
system data records
system data (users, servers),
Fair
MEDLINE records, entity names
queue
PL / Temporaty store 3
Persistent store Temporary store
SQL documents & annotations, logs
docs &
server data & anns validation of identifiers and
entity names docs jobs retrieval of unknown entity names
server data & anns
known entities
inline XML
XML
XML worker RPC Service broker REST 4
Text-mining
servers in
the WWW 5
Gene mention server Protein normalization server PPI relation server etc.
BioCreative Meta-Server and Text-Mining Interoperability manages communication with the text-mining servers, and val-
Standard, Fig. 2 The current design proposal for the final idates database identifiers, adding supplementary information
BCMS. Essentially, there are five units, from top to bottom: (1) (e.g., the protein or gene names) to the annotations; (5) And
Clients accessing the platform, either from a browser or pro- the text-mining servers, providing the actual annotations of any
grammatically via a web service; (2) The front end that provides type agreed upon by the community (i.e., BAIS) via web services
the web site and services, accepts documents, queues jobs, again. Blue arrows symbolize communication channels over the
returns annotations, and communicates with PubMed; (3) The Internet, black dashed arrows outline internal communication
storage layer that is responsible for the data management and pathways for the BCMS components
persistency; (4) The back end that processes inline annotations,
B 110 Bioinformatics
Cross-References Definition
▶ BioCreative II.5 and the FEBS Letters Experiment Biological activity is “the capacity of a specific molec-
on Structured Digital Abstracts ular entity to achieve a defined biological effect” on
a target (Jackson et al. 2007). It is measured in terms of
potency or the concentration of the molecular entity
References
needed to produce the effect (Pelikan 2004).
Bujnicki JM, Elofsson A, Fischer D, Rychlewski L (2001) Struc- A biological activity is determined by means of
ture prediction meta server. Bioinformatics 17(8):750–751 a biological assay.
Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L (2001) The
distributed annotation system. BMC Bioinform 2:7
Hoffmann R, Valencia A (2004) A gene network for navigating
the literature. Nat Genet 36(7):664 Cross-References
Kano Y, Dobson P, Nakanishi M, Tsujii J, Ananiadou S (2010)
Text mining meets workflow: linking U-compare with ▶ Biological Assay
Taverna. Bioinformatics 26(19):2486–2487
▶ Drug Target
Krallinger M, Valencia A, Hirschman L (2008a) Linking genes to
literature: text mining, information extraction, and retrieval ▶ Natural Product Resources
applications for biology. Genome Biol 9(Suppl 2):S8
Krallinger M, Morgan A, Smith L, Leitner F, Tanabe L,
Wilbur W, Hirschman L, Valencia A (2008b) Evaluation
of text-mining systems for biology: overview of the
References
Second BioCreative community challenge. Genome Biol
9(Suppl 2):S1 Jackson MJ, Esnouf MP, Winzor D, Duewer D (2007) Defining
Leitner F, Valencia A (2008) A text-mining perspective on the and measuring biological activity: applying the principles of
requirements for electronically annotated abstracts. FEBS metrology. Accredit Qual Assur 12(6):283–294
Lett 582(8):1178–1181 Pelikan EW (2004) Glossary of terms and symbols used in
Leitner F et al (2008) Introducing meta-services for biomedical pharmacology. University School of Medicine, Department
information extraction. Genome Biol 9(Suppl 2):S6 of Pharmacology and Experimental Therapeutics, Boston.
Pafilis E, O’Donoghue SI, Jensen LJ (2009) Reflect: augmented http://www.bumc.bu.edu/busm-pm/resources/glossary
browsing for the life scientist. Nat Biotechnol 27(6):508–510
Rebholz-Schuhmann D, Arregui M, Gaudan S, Kirsch H, Jimeno A
(2008) Text processing through Web services: calling Whatizit.
Bioinformatics (Oxford, England) 24(2):296–298
Smith L et al (2008) Overview of BioCreative II gene mention Biological Applications of Network
recognition. Genome Biol 9(Suppl 2):S2
Modules
Biological Activity
Definition
Riza Theresa Batista-Navarro
National Centre for Text Mining, Manchester A protein-protein interaction (PPI) network of a certain
Interdisciplinary Biocentre, Manchester, UK organism is a network that contains proteins and
physical interactions between protein pairs. In the
network, each node/vertex is a protein and each edge/
Synonyms link is a physical interaction.
An interactome refers to a complete PPI network
Bioactivity; Pharmacological activity that contains all protein physical interactions in
Biological Applications of Network Modules 111 B
a certain organism. An interactome is usually approx- Protein Function Prediction Based on Modular
imated by combining all known PPI data from large- Analysis
and small-scale experiments, expert curations, and One application of module identification is to predict
possibly computational predictions. functions for proteins in modules with unknown func-
A general greedy search is any algorithm or method tional annotations. Such protein function predictions B
that makes a locally optimal decision in order to retrieve are based on identified modules as well as known
or approximate a globally optimal solution. A greedy functional annotations, such as ▶ gene ontology or
search method often contains a candidate set, a selection Saccharomyces Genome Database annotations
criterion, a feasibility criterion, an objective function, (Ashburner et al. 2000; Cherry et al. 1998).
and a solution criterion. The candidate set contains all A straightforward method to predict protein func-
candidates that can be added to the solution. The selec- tions is to assign a function shared by the majority
tion criterion or function determines the best candidate of proteins/genes in a module to other unannotated
to add next into the solution. The feasibility criterion protein/gene in the same module, also considering
determines whether a candidate can be added to con- each of the assigned functions as a function of the
tribute for the solution. The objective function assigns whole module. In this way, proteins with unknown
and calculates a heuristic score for a solution or partial functions are assigned these shared functions within
solution. The solution criterion indicates whether the the same module. Alternatively, a hypergeometric test
final solution is achieved. (or one-tailed ▶ Fisher’s test) can be performed for
each function to calculate a p value indicating whether
or not a function is overrepresented by genes/proteins
Characteristics in a module:
▶ Biomarkers
▶ Fisher’s Test Definition
▶ Functional Enrichment Analysis
▶ Functional Modules and Complexes A biological assay is an experiment that determines
▶ Functional/Signature Network Module for Target a substance’s biological activity based on its effect on
Pathway/Gene Discovery a specific drug target, relative to that of a standard
▶ KEGG Pathway Database preparation (IUPAC 1997). It is also the process by
▶ Modules in Networks, Algorithms and Methods which the potency of an agent is measured in terms of
▶ Mutual Information the reactions of a specific drug target (Pelikan 2004).
Biological Disease Mechanism Networks 113 B
Cross-References edges (directed/undirected connections) and classified
into different subtypes.
▶ Biological Activity In other words, the complex biological processes
▶ Drug Target are presented through the precise interaction and reg-
▶ Natural Product Resources ulation of 1,000 of molecules. These biological B
networks are significantly different from any random
References networks and often exhibit ubiquitous properties in
terms of their structure and organization.
IUPAC (1997) Compendium of chemical terminology. In: Biological networks are developed and analyzed to
McNaught AC, Wilkinson A (eds) The “gold book”,
understand the pathophysiology of different human
2nd edn. Blackwell Scientific, Oxford
Pelikan EW (2004) Glossary of terms and symbols used in diseases. Such networks also help in identifying
pharmacology. University School of Medicine, Department novel biomarkers, candidate genes, pathway cross
of Pharmacology and Experimental Therapeutics. Boston. talk pattern, etc., leading to the progression/molecular
http://www.bumc.bu.edu/busm-pm/resources/glossary
characterization of complex diseases, i.e., type 2
Diabetes and Cancers.
Biological Clock
Characteristics
▶ Circadian Rhythm
Network Classification/Network Types
The high-throughput data-collection techniques
like microarrays, protein chips, or yeast two-hybrid
Biological Disease Mechanism Networks screens determine how and when these molecules
interact with each other. Various types of interaction
Shipra Agrawal1 and M. R. Satyanarayana Rao2 networks, including protein–protein interaction, and
1
BioCOS Life Sciences Pvt. Limited, Institute of metabolic, signaling, and transcription-regulatory net-
Bioinformatics and Applied Biotechnology, works emerge by integrating these data and interac-
Bangalore, Karnataka, India tions. Figure 1 describes interrelation across different
2
Chromatin Biology Laboratory, Molecular Biology types of biological networks, which are mostly
and Genetics Unit, Jawaharlal Nehru Center for constructed from high-throughput molecular data.
Advanced Scientific Research, Bangalore, Karnataka, The basic networks are required to study any biological
India phenomena or disease includes protein network, sig-
naling and metabolic network, co-expression and tran-
scriptional regulatory network, and protein-DNA
Synonyms interaction network (Fig. 1). All these networks are
analyzed to identify biologically relevant modular
Diseasome; Gene network; Intercatome; Protein net- structure of the networks (Please see the box for Mod-
work; Scale-free network; Transcriptional regulatory ular Network).
network The classifications of such biological networks are
mostly based on types of input data, information con-
nectivity, architecture topology, and overall functional
Definition interpretations.
Based on aforementioned criteria, some of the
The biological networks are constructed from any important networks are defined as follows:
type of molecular/biochemical and biological data 1. Protein–Protein Interaction Network
to interpret the overall functional and regulatory • It is a network of interacting proteins. This is
schema of any biological system (cell/tissue/organ undirected network, in which proteins are
and organism). These networks are represented represented by nodes and the interaction
through wiring diagram of nodes (genes/proteins) and between the proteins are represented by edges.
B 114 Biological Disease Mechanism Networks
Gx
D1
Gy
D2 Ga Disease-Phenotype
network
Gb
D3
Gc Phenomic data
Gd
Level E1
3 3A+B C
A 1
E2
of 1
1 1 Metabolic network
B B+C D C
1
D 1
Complexity 1 1
Metabolic data
D B+C
E3
Proteomic data
Biological Disease Mechanism Networks, which are finally merged to construct complex disease networks
Fig. 1 Biological/molecular networks and relative complexity (Please refer Network Complexity Box for details) (Note: D1,
It shows sequential increase in complexity in terms of both data D2, D3 exemplify related diseases and Gx, Gy, Ga, etc. represent
and information. The layers represent different network types reported candidate genes for corresponding diseases)
and cross-correlation of different types of molecular networks,
2. Signaling and Metabolic Networks enzymes. The nodes are either metabolites or
• Signaling network is molecular bridge of events enzymes or reactions. It is a directed and
between the cell and the outside environment. It weighted network. It is constructed from the
is directed and is composed of proteins or literature information on reactions, enzymes,
enzymes or factors that trigger or suppress the genes, substrate–enzyme concentration, product
expression of a number of genes. The signaling concentration, etc. (Ma and Zeng 2003;
event involves various proteins localizing to Albert 2005).
different compartments in the cell and finally 3. Co-expression Network and Transcriptional Regu-
controlling gene expression in the cell nucleus latory Network
(Hughey et al. 2009). • Co-expression networks are constructed from
• The signaling networks are mostly integrated the co-expression measures of genes across var-
with metabolic networks, which are biochemical ious tissue samples. Here, the nodes correspond
reactions along of substrates, metabolites, and to genes and edges represent the connection
Biological Disease Mechanism Networks 115 B
strength which is obtained from the promoter (in coming connectivity). The network is
co-expression similarity. It is also called as cor- undirected (Deplancke et al. 2006).
relation or association networks (Horvath and
Dong 2008).
Network Complexity
• The co-expression data is modeled to generate B
A rational integration of molecular networks
transcriptional regulatory networks, in
describes substantial understanding of the com-
which one gene may regulate transcription of
plexity of a biological phenomena/molecular
another gene either directly or indirectly.
mechanism. For example, a researcher con-
Here, the nodes are genes that are connected
structs biologically important networks sepa-
through directed edges. Arrows in the network
rately from genomics and proteomics data from
topology diagram indicate that they trigger
a diseased system. Then, these networks are cor-
the target gene activation, while bars in the
related and integrated to develop a single and
network indicate a repressive effect on the
meaningful disease model or network. The bio-
target gene (Keurentjes et al. 2007). This net-
logical significance and complexity of
work presents both physical and functional
a molecular network increases by integrating
interactions composed of genes controlled by
various types of molecular data/networks as
transcription factors. The network is directed
one base and holistic network.
and the node types are genes and transcription
factors. The transcriptional regulatory networks
are having several signature motifs including
feed-forward loop motif, bifan, and auto-
regulatory loops (Dobrin et al. 2004). Descrip- Modular Network
tions of these motifs are given in the following
paragraphs. M2
• Feed-forward loop motif – It consists of a pattern M1
of three genes which is composed of two tran-
scription factors and a target gene. One of the
M3
transcription factors regulates the other while
both together regulate the target gene.
• Autoregulatory loops – It consists of a regulator
gene that in turn binds to its own gene promoter,
thereby bringing autoregulation. M4
GBM-
specific
network gene-expression
GBM
WGCNA
Gene sets
Network
Topologic
Reconstr
al
analysis uction Survival
Gene associated
Coexpression
classifier Genes
module
Novel cell-cycle detection
regulators
Driver GBM Glioblastoma
Molecular mutations
Ladha et al., 2010
targets (DM)
Signaling pathway Cerami et al., 2010
Cross talks
de Tayrac et al., 2009
For DM
Torkamani & Schork, 2009
Horvath et al., 2006
Zhang et al., 2009
Biological Disease Mechanism Networks, Fig. 2 Graphical different colors correspond to specific research methods and
representation of different approaches, which have been used to results available in the literature
construct systems biology models in GBM. The arrows in
▶ Organelle and Functional Module Resources Adverse events; Biomarker; Knowledge base; Systems
biology
Definition
Knowledge base
Biomarker discovery
transfer RNA levels, as well as RNA editing and splic- Amplification plot
1.000 E+1
ing of message RNA. Alterations in the sequences and
expression of the RNA molecules are commonly used
biomarkers. The RNA biomarker discovery process 1.000
begins with the recruitment of individuals for
a biomarker study and the isolation of RNA from
1.000 E–1
subjects’ whole blood or peripheral blood mononu-
ΔRn
clear cells (PBMCs), or generation of lymphoblastic
cell lines (LCLs). Researchers typically screen the 1.000 E–2
known set of genes with candidate genes generated
either from published studies or unpublished pilot 1.000 E–3
studies done by the researcher that repeatedly show
expression alterations in the state of interest, as com-
pared to normal expression. These putative biomarkers 1.000 E–4
are usually compiled via gene-focused microarray
0 5 10 15 20 25 30 35 40 45
analysis of whole blood, LCL, or PBMCs. Once Cycle
a gene, gene list, or network of genes of interest is
compiled, high quality RNA from subjects that meet Biomarker Discovery, Typical Process, Fig. 1 A qPCR
amplification plot is the plot of fluorescence signal versus cycle
the researcher’s inclusion criteria is transcribed into
number. Figure 1 is an amplification plot of the HLA-CD74 gene
cDNA and expression data is generated via quantita- in brain, showing the change in fluorescence of SYBR Green dye
tive PCR, normalized with an appropriate reference (Y axis) plotted versus cycle threshold (X axis). Lower Ct number
gene. The qPCR data can be generated with real-time represents greater expression of the HLA-CD74 gene
amplification plots (Fig. 1) and is analyzed to evaluate
the ability of individual and composite gene expression
markers to differentiate cases from controls based on DNA
transcript abundance as well as to validate the micro- Genomic biomarkers can be characterized by varia-
array results. The primary endpoint of the RNA bio- tions in DNA: single nucleotide polymorphisms
marker discovery process is the area under the curve (SNPs), haplotypes, insertions, copy number variation,
(AUC) of the Receiver Operating Characteristic inversions, deletions, or methylated cytosine residues.
(ROC) curve (▶ Area Under the ROC Curve, Relevant SNPs can be determined from a subject’s
▶ Receiver Operating Characteristic (ROC) Curve). isolated DNA by direct sequencing or SNP genotyping
The ROC is a plot that measures classification perfor- assays which employ fluorescence technology to
mance of a model across the entire range of thresholds assess genotypes. Figure 3 shows an allelic discrimi-
(Dwinnell 2010). The ROC curve is generated by nation plot generated from a TaqMan SNP genotyping
graphed data points of the true positive rate (sensitiv- assay from Applied Biosystems (Foster City, CA).
ity) and the false positive rate (100 minus specificity). ▶ Fluorescent markers are detected at the locus of
The more the ROC curve breaks toward the top-left of interest, depending on genotype of the sample. The
the graph, the better the model is at separating cases blue cluster is a software call of homozygous Allele
from controls, as opposed to a random prediction Y, marked with FAM fluorescence, and the red cluster
shown by the dashed line (Fig. 2). Thus, AUC of the is a call of homozygous Allele X, marked with VIC
ROC represents the predictive ability of the gene fluorescence. Dual fluorescence, an indicator of
expression markers, and a greater AUC correlates to heterozygosity, is seen as the green cluster centered
a stronger predictive marker. If an expression profile between the two fluorescent extremes. The results of
based on the ROC can be developed for cases versus multiple SNP genotyping assays or sequencing
controls, the biomarker can be used for early detection, studies can be analyzed together to find haplotype
monitoring, and treatment decisions with predictive biomarkers.
medicine for disorders, and with the added benefit of Copy number variation (CNV) assays consisting of
greater cost effectiveness. a primer/probe reagent mixture can be purchased from
Biomarker Discovery, Typical Process 121 B
100
6.0
80
5.0 B
60
Sensitivity
4.0
Allele Y
40 3.0
Biomarker 1
2.0
20 Biomarker 2
Biomarker 3
1.0
0
0 20 40 60 80 100 0.0
100-Specificity
0.2 0.7 1.2 1.7 2.2 2.7
Biomarker Discovery, Typical Process, Fig. 2 Three bio- Allele X
markers showing the degree of sensitivity and specificity for
predicting cases and controls; biomarker 1 and 3 are nearly Biomarker Discovery, Typical Process, Fig. 3 An allelic
equal, and both are better predictors than biomarker 2 discrimination plot of genotype calls at Neuregulin1 SNP
rs3924999, showing homozygotes (blue and red ) and heterozy-
gotes (green)
bioscience companies, or can be created in the labora-
tory using primer and probe and standard dilutions of
calibrated DNA references. Figure 4 shows a table
generated by CopyCaller after an Applied Biosystems 2
CNV assay run on an ABI 7900HT PCR system
(Applied Biosystems, Foster City, CA). The assay
Copy number
microarray probes can measure methylation though rich in promise, is not always a particularly
changes across all regions of genomic DNA. Bio- fruitful endeavor. Several limiting factors to protein
markers of methylation are particularly useful, as biomarker discovery exist and include:
screening in at-risk populations can occur with • The relatively low abundance of some biomarkers
minimally invasive techniques. Multiple studies in the proteome that must be detected in a complex
show that DNA methylation of certain loci can be matrix and the lack of means to amplify these infre-
detected in blood, sputum, bronchoalveolar lavage, quent markers
and, potentially, exhaled breath-condensate (Anglim • The post-translational complexity of protein bio-
et al. 2008). markers in human blood and biological fluids mak-
ing detection somewhat ambiguous
Proteins • The variability in human population and
Protein biomarker discovery (▶ Biomarkers, Protein pathologies that make valid biomarkers very
Expression) can be defined as the process by which difficult to detect and apply to disease states (Rifai
the differential expression of proteins in diseased et al. 2006)
versus healthy or transitional states is first identified. Further, on a genome-wide basis, protein or multi-
Protein biomarker discovery can use model systems proteins levels are quite difficult to obtain and techni-
(i.e. mouse models, cell lines) or the analysis of cally not feasible at this time. However, this is not as
a variety of human biological fluids including blood, large of a problem for RNA diagnostics, which evalu-
urine, tissue lysates, and cerebrospinal fluid. Protein ate the expression of the genes in question. Because
biomarker discovery commonly employs two- gene expression can reflect both genetic and environ-
dimensional gel electrophoresis and mass spectrome- mental influences, it may be particularly useful for
try technology to separate and identify differential identifying risk factors for complex disorders which
expression of proteins in diseased and normal states, are thought to have a multi-factorial polygenic etiol-
via measurement of peptide abundance. ogy in which many genes and environmental factors
From this first pass separation and identification interact. In addition, multiple gene biomarkers can be
analysis, a list of candidate biomarker proteins is gen- used to identify disease risk and to predict or monitor
erated. These lists generally consist of hundreds of drug response. Lastly, novel RNA diagnostic tools can
candidate biomarkers, with a high rate of false posi- be employed with PBMCs, whole blood or LCLs,
tives. A biomarker candidate verification phase providing more options for researchers and
reduces the number of false positives and ensures that practitioners.
only the most promising putative biomarkers found in
discovery go on to the validation phase. Human plasma Other Biomarkers
(if not used in the discovery process) is typically ana- Besides the trilogy of RNA, DNA, and protein, the
lyzed during verification, as it is considered the most field of biomarkers routinely engages the use of
comprehensive illustration of the human proteome and other diverse measures such as metabolomics, lipids,
is in contact with all tissues and could be a sentinel of small molecules such as steroids, and toxic environ-
the disease processes (Anderson and Anderson 2002). mental residues in bodily tissues. Also to be consid-
Once a candidate biomarker has been verified, it can ered as biomarkers are the following imaging
move on to the clinical validation and commercializa- techniques: computed tomography, magnetic reso-
tion process. nance imaging, positron emission tomography, and
ultrasound. Electrophysiological measures such as
Caveats of Biomarker Discovery: DNA Versus RNA electroencephalogram, electrocardiogram, and elec-
Versus Protein tromyogram are considered informative biomarkers,
DNA diagnostics such as SNPs are useful but often do to accurately measure tissues of interest in disease
not reflect variants that have a biological role. More- or transitional states. Biomarker discovery in
over, DNA variants do not provide critical information these categories follows a process similar to that of
on the environmental factors that are important for blood-derived biomarkers, but with verification and
pathogenesis of these diseases except in cases of epi- validation stages and analyses that suit the biomarker
genetic biomarkers. Protein biomarker discovery, in study.
Biomarkers 123 B
Cross-References biologic processes, pathogenic processes, or pharma-
cologic responses to a therapeutic intervention
▶ Area Under the ROC Curve (Biomarkers Definitions Working Group 2001).
▶ Biomarkers, Clinical Relevance A biomarker can be used as an indicator of a change
▶ Biomarkers, Protein Expression or, if it meets the highest standard of proof, B
▶ DNA Methylation a surrogate endpoint that is expected to predict clin-
▶ Fluorescent Markers ical benefit (or harm, or lack of benefit or harm) based
▶ Receiver Operating Characteristic (ROC) Curve on epidemiologic, therapeutic, pathophysiologic, or
other scientific evidence.
Biomarkers are widely used in medicine, drug
References discovery and development, and safety assessment.
They serve as indicators for disease progression
Anderson NL, Anderson NG (2002) The human plasma (diagnosis and prognosis), therapeutic responses
proteome: history, character, and diagnostic prospects. Mol
(efficacy), and adverse side effects (safety) in organ-
Cell Proteomics 1(11):845–867
Anglim PP, Alonzo TA, Laird-Offringa IA (2008) DNA isms, organs, and/or cells. An ideal clinical biomarker
methylation-based biomarkers for early detection of should contain the following characteristics (Lesko
non-small cell lung cancer: an update. Mol Cancer 7:81 and Atkinson 1999): (1) clinical relevance, (2) sensi-
Dwinnell W (2010) ROC curves and AUC. http://
tivity and specificity to treatment effects, (3) reliabil-
matlabdatamining.blogspot.com/. Accessed 1 June 2010
Rifai N, Gillette MA, Carr SA (2006) Protein biomarker discov- ity, (4) practicality, and (5) simplicity.
ery and validation: the long and uncertain path to clinical Biomarkers need to be qualified to define
utility. Nat Biotechnol 24(8):971–983 their specific use and context (fit for purpose). How-
Shiraishi M, Hayatsu H (2004) High-speed conversion of cyto-
ever, most biomarkers in use today have not been
sine to uracil in bisulfite genomic sequencing analysis of
DNA methylation. DNA Res 11(6):409–415 evaluated in a comprehensive qualification process
(Goodsaid et al. 2008). This, unfortunately, means
that new biomarkers are being compared to older
ones for which a full understanding of their
context for use is not known. Currently, there is no
Biomarkers widely accepted framework for biomarker qualifica-
tion. The FDA has taken an initiative to develop
Donna L. Mendrick and Weida Tong a consistent and standardized qualification frame-
Division of Systems Biology, National Center for work for the acceptance of biomarkers for regulatory
Toxicological Research, US Food and Drug use. Such a regulatory driven effort will facilitate
Administration, Jefferson, AR, USA communication between regulatory agencies, phar-
maceutical companies, the research community, clin-
ical practice, and consumer participants for
Synonyms evaluation of the biomarker-surrogate-clinical end-
point relationship in different settings and
Biological marker; Molecular marker; Surrogate applications.
endpoint Current biomarker discovery increasingly is
relying on emerging molecular technologies that
aim to determine the causal and mechanistic relation-
Definition ships of molecular markers with clinical endpoints.
The representative technologies and associated bio-
A biomarker is a characteristic that is objectively marker types are summarized in Table 1, and most
measured and evaluated as an indicator of normal of them are high throughput or high content in
nature. These technologies can be applied indepen-
dently or in parallel; the latter is able to identify
Disclaimer: The views presented in this article do not necessarily multiple biomolecules at different level of biological
reflect those of the US Food and Drug Administration. complexity.
B 124 Biomarkers, Blood Derived
TN
between the different forms. The situation is further
FP complicated because the activity of proteins may be
FN [X] influenced by many factors, such as cofactors, forma-
tion of protein complexes, local pH conditions, and the
TP
presence of natural inhibitors, and often in is not the
diseased concentration but the activity of the protein that is in
strong relation with the disease state. This has lead to
the development of analytical methods providing
activity profiles of classes of proteins (Fig. 2).
Biomarkers, Protein Expression, Fig. 1 Concentration dis-
tribution of hypothetical protein biomarker in group of samples For diagnosis and treatment, follow-up biomarkers
obtained from healthy control group and diseased individuals. are mostly detected from easily accessible body fluids
The difference of the concentration distribution and the cut-off such as blood, urine, and saliva or from the cerebro-
value define the number of true positive (TN), false positive spinal fluid. However, in body fluids, the difference
(FP), true positive (TP) and the false negative (FN)
between the concentration of the least and most
abundant compounds is large (11–12 orders of magni-
tude in the case of human blood) (Schiess et al. 2009).
clinically meaningful endpoint, and it is a direct mea- Interesting biomarker candidates originating from the
sure of how a patient feels, functions, or survives and is diseased tissue(s) or organ(s) are mixed with com-
expected to predict the effect of the therapy. The main pounds from all other parts of the body, providing
difference between a surrogate marker and a biomarker a huge challenge for detection by analytical chemistry.
is that a biomarker is a meaningful “candidate” for Other biomarker discovery approaches first identify
a surrogate marker, and a surrogate marker is a test biomarker candidates directly in the diseased cells,
used as a measure of the effects of a specific treatment. tissues, or organs, followed by subsequent validation
Proteins are most suitable for biomarkers as their of the identified candidate compounds or their specific
expression is regulated in correlation with the general metabolites if they are present in sufficient amount in
state of living organisms and contributes considerably easy accessible body fluids, where the final diagnostic
to the display of the phenotype. In an experimental test is applied.
context, this means measuring protein concentration The most widely used techniques are antibody-
distribution in samples obtained at different biological based ELISA tests, protein arrays, and specific LC-
states. Figure 1 shows different concentration distribu- MS analysis. Research to identify new biomarkers is
tions of a hypothetical biomarker between two stages, composed from biomarker discovery process, where
corresponding to the healthy and diseased stages. a high number of proteins are identified in low number
Overlap between the two concentration distributions of samples using comprehensible quantitative analyti-
and the cut-off value define the Type I (number false cal techniques such as protein arrays, LC-MS, or
positive or FN) and Type II errors (number false neg- SELDI measurements. This is followed by identifica-
ative or FP) and specify the sensitivity and specificity tion of the most discriminating peaks between
of the test. The area under the receiving operation predefined class of samples (e.g., healthy and disease)
curve (AUROC) can be used to compare efficiency of and validation of the biomarkers on large number of
different diagnostic tests. samples using fast, targeted analytical methods such
Proteins have a complex life cycle from synthesis as MRM-based LC-MS or ELISA tests. The final val-
on the ribosome to the ubiquitin degradation idation comprises the exploration of the role of
mechanism. During the life span of a protein it may the biomarker compound(s) in the mechanism of the
undergo several chemical, isomerization, and sterical disease.
Biomarkers, Protein Expression 129 B
Albumin
1010
Transferrin
Alpha-2-macroglobulin
109 Alpha-1-acid glycoprotein 1 mg/ml
B
Apolipoprotein C-III
108 Clusterin
Apolipoprotein C-I
Complement component 7
107 Tetranectin
dynamic concentration range
100 pg/ml
interleukins, cytokines
10−1
plasma proteins
Biomarkers, Protein Expression, Fig. 2 Plasma protein con- the HUPO plasma proteome initiative and yellow dots represent
centration showing three main categories (classical plasma pro- currently utilized biomarkers (Figure taken from Schiess et al.
teins, tissue leakage products, signaling compounds e.g., (2009))
cytokines). Red dots indicate proteins that were identified by
Affinity-based protein arrays or two-dimensional Methods without using any chemical modifications are
gel electrophoresis with mass spectrometric identifica- called label-free, and methods using chemical reaction
tion are able to quantify whole intact proteins. Due to or incorporation of amino acids with non-natural iso-
the difficulties to separate intact proteins with reverse- tope compositions form one other type of protein quan-
phase chromatography compatible with MS detection, tification method.
LC-MS methods use protein fragments obtained with
cleavage of protease enzymes such as trypsin. In this
approach, which is also called as shotgun proteomics,
the better chromatographic separation is at the expense
References
of obtaining higher sample complexity, which
Biomarkers Definitions Working Group (2001) Biomarkers and
results in difficulty to obtain exact quantification of surrogate endpoints: preferred definitions and conceptual
the intact protein in the presence of proteins with framework. Clin Pharmacol Ther 69(3):89–95
high homology having few peptides in common or Katz R (2004) Biomarkers and surrogate markers: an FDA
perspective. NeuroRx 1(2):189–195
proteins having partly different type of post-translation
Schiess R, Wollscheid B, Aebersold R (2009) Targeted proteo-
modifications. Quantification of compound using mass mic strategy for clinical biomarker discovery. Mol Oncol
spectrometry can be also performed in several ways. 3(1):33–44
B 130 Biomarkers, Ranking
Definition
concentration range (often 11–12 orders of magni- case-control sampling of the same tissue and therefore
tude). This presents an enormous challenge for analyt- excludes the majority of the biological variance from
ical chemistry and statistical methods to select the analysis. However, disease cells may influence the
compounds with a real association with the disease. surrounding healthy cells, e.g., with proteins secreted
Tissue- or organ-derived discriminating compounds by the disease-altered cells. Therefore, validation by
should therefore be validated so that they reach periph- comparing disease-altered cells, the surrounding
eral body fluids either in intact or modified forms. One healthy tissue in diseased tissue, and healthy cells in
promising approach is the measurement of the healthy samples is required.
secretome molecular profile of in vitro cultivation of MALDI imaging mass spectrometry (Schwamborn
removed tissue samples. Generally, samples obtained and Caprioli 2010) is one other promising technique
from solid tissues are pure, but inevitable contamina- for finding tissue-derived compounds closely related to
tion from blood may result in alteration of the the disease. It enables direct in situ or cross-sample
measured tissue secretome molecular profile. comparison of the molecular profile of diseased
The most widely used sampling method for solid and healthy cells in tissue sections (Schwamborn and
tissues is laser capture microdissection (Murray and Caprioli 2010). Figure 1 presents the outline of MALDI
Curran 2005; Liu 2010), which enables highly accurate imaging mass spectrometry analysis from tissue
cutting of diseased altered and healthy tissue pieces section preparation to the statistical analysis of the
using precise laser shots. This technique enables acquired data.
B 132 Biomedical Decision Support Systems
References
Biomedical Named Entity Recognition,
Liu A (2010) Laser capture microdissection in the tissue Whatizit
biorepository. J Biomol Tech 21(3):120–125
Murray GI, Curran S (2005) Laser capture microdissection:
methods and protocols, Methods in Molecular Biology Dietrich Rebholz-Schuhmann
series, vol 293. Humana Press, Totowa European Bioinformatics Institute, Hinxton, UK
Schwamborn K, Caprioli RM (2010) Molecular imaging
by mass spectrometry–looking beyond classical histology.
Nat Rev Cancer 10(9):639–646
Definition
Definition
BioModels Characteristics
BioModels Database:
A Repository of
Mathematical Models of
Biological Processes,
Fig. 2 Type of models:
Categorization of models in
the curated branch of
BioModels Database based on
the GO terms present in the
annotation of the models. This
chart was generated by
enumerating models in the
database, whose annotations
refer either the GO terms listed
here or the child of the GO
terms listed here
▶ MIRIAM Guidelines
▶ MIRIAM URI Definition
▶ SBGN
▶ Systems Biology Markup Language (SBML) BioNLP Shared Task (BioNLP-ST, hereafter) is
▶ Systems Biology Ontology a series of shared evaluations and workshops focused
on biomolecular event extraction from literature. The
first BioNLP-ST evaluation was organized in 2009 by
References the Tsujii Laboratory of the University of Tokyo, with
a workshop held under the auspices of Biomedical
Juty N, Le Novère N, Laibe C (2012) Identifiers.org and Natural Language Processing Special Interest Group
MIRIAM registry: community resources to provide persis-
tent identification. Nucleic Acids Res 40:D580–D586
(SIGBIOMED) of the Association for Computational
Laibe C, Novère NL (2007) MIRIAM resources: tools to gener- Linguistics (ACL). The task targeted the extraction of
ate and resolve robust cross-references in systems biology. biomolecular events (bio-events), defined as changes in
BMC Syst Biol 1:58 the states of biomolecular objects following the defini-
Le Novère N (2006) Model storage, exchange and integration.
tion and event representation of the GENIA project. The
BMC Neurosci 7(Suppl 1):S11
Le Novère N, Finney A, Hucka M, Bhalla US, Campagne F, task data was based on the GENIA corpus (Kim et al.
Collado-Vides J, Crampin EJ, Halstead M, Klipp E, 2008) human-curated annotations of bio-events in
Mendes P, Nielsen P, Sauro H, Shapiro B, Snoep JL, PubMed abstracts. BioNLP-ST 2009 was the first
Spence HD, Wanner BL (2005) Minimum information
community-wide effort for the automatic extraction of
requested in the annotation of biochemical models
(MIRIAM). Nat Biotechnol 23:1509–1515 bio-events from literature (Kim et al. 2009).
Le Novère N, Bornstein B, Broicher A, Courtot M, Donizelli M, The event extraction task was quite new to much
Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL, of the community, as most bio-text mining (bio-TM)
Hucka M (2006) BioModels Database: a free, centralized
targeted simple binary relationships between
database of curated, published, quantitative kinetic models
of biochemical and cellular systems. Nucleic Acids Res 34: bio-entities, such as protein–protein interactions (PPI)
D689–D691 (Bunescu et al. 2004) and disease-gene associations
Li C, Courtot M, Le Novère N, Laibe C (2010a) BioModels.net (DGA) (Chun et al. 2006). However, BioNLP-ST’09
Web Services, a free and integrated toolkit for computational
met community-wide participation, with 42 teams
modelling software. Brief Bioinform 11:270–277
Li C, Donizelli M, Rodriguez N, Dharuri H, Endler L, signing up for initial registration and 24 teams submit-
Chelliah V, Li L, He E, Henry A, Stefan MI, Snoep JL, ting final results.
Hucka M, Le Novère N, Laibe C (2010b) BioModels data- At the time of writing, BioNLP-ST 2011, the
base: an enhanced, curated and annotated resource for
second event of the series, is being organized as
published quantitative kinetic models. BMC Syst Biol 4:92
Machné R, Finney A, M€ uller S, Lu J, Widder S, Flamm C (2006) a joint effort of several groups. The evaluation seeks
The SBML ODE solver library: a native API for symbolic to provide a variety of tasks and annotations centered
and fast numerical analysis of reaction networks. Bioinfor- around event extraction.
matics 22:1406–1407
Characteristics
a predicate (event trigger) and an argument (Protein). The failure of p65 translocation to the nucleus . . .
The Binding type is more complex in requiring T1 (Protein, 15-18)
the detection of an arbitrary number of arguments. T2 (Localization, 19-32)
E1 (Type:T2, Theme:T1, ToLoc:T3)
Regulation events always take a Theme argument
T3 (Entity, 40-46)
and, when expressed, also a Cause argument. Further, M1 (Negation E1)
a Regulation event may take an event as its primary
argument (theme or cause), creating structures linking BioNLP Shared Task, Fig. 1 Example event annotation. The
multiple events that represent causal chains. Such protein annotation T1 is given as a starting point. The extraction
of annotation in bold is required for Task 1, the Entity type
connections between events are a unique feature of t-entity T3 and the secondary argument ToLoc:T3 for Task 2,
the BioNLP task compared to other event extraction and M1 for Task 3
tasks such as ACE.
Format and Example e.g., binding and regulation, was a lot more challeng-
In the BioNLP-ST data sets, annotations to text are ing, achieving 40% of performance level.
provided in a stand-off style, as shown in the example
in Fig. 1. The annotations are of two general types: ones Continuation
identifying text spans referring to bio-entities (e.g., T1 After BioNLP-ST 2009, all the resources from the
and T3) and event triggers (e.g., T2), and ones expressing event were released to public, to encourage continuous
events (e.g., E1) and their modifications (e.g., M1). The efforts for further advancement. Since then, several
former type of annotations are represented by pairs, improvements have been reported (Bj€orne et al.
(type-specification, span-specification), and the latter by 2010; Miwa et al. 2010a, b; Poon and Vanderwende
n-tuples resembling predicate–argument structures, 2010; Vlachos 2010). For example, Miwa et al.
(predicate, argument1, argument2, etc.). (2010b) reported a significant improvement with bind-
ing events, achieving 50% of performance level.
Results
BioNLP-ST 2011
The final results enabled to observe the state-of-the-art
performance of the community on the bio-event The second event of BioNLP-ST series is organized for
extraction task. It showed that the automatic extraction 2011 (https://sites.google.com/site/bionlpst/). While
of simple events – those with unary arguments, its predecessor, BioNLP-ST 2009, relied on the
e.g., gene expression, localization, phosphorylation – GENIA corpus which only contained PubMed
could be achieved at the performance level of 70% abstracts on transcription factors in human blood
in F-score, but extraction of complex events, cells, the main theme of BioNLP-ST 2011 is
BioNLP Shared Task 141 B
generalization, which was pursued to three directions: Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J,
text types, event types, and subject domains. To Binns D, Harte N, Lopez R, Apweiler R (2004) The gene
ontology annotation (GOA) database: sharing knowledge in
achieve that, various tasks and annotations were col- uniprot with gene ontology. Nucl Acids Res 32(Suppl 1):
lected through contributions from several groups, and D262–266
five event extraction tasks were arranged in four tracks. Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G,
B
• GENIA (GE) Schneider MV, Castagnoli L, Cesareni G (2007) MINT: the
molecular INTeraction database. Nucleic Acids Res
• Epigenetics and Post-translational Modifications 35(Suppl 1):D572–574
(EPI) Chinchor N (1998) Overview of MUC-7/MET-2. In: Message
• Infectious Diseases (ID) understanding conference (MUC-7) proceedings, Fairfax
• Bacteria Track Chun HW, Tsuruoka Y, Kim JD, Shiba R, Nagata N, Hishiki T,
Tsujii J (2006) Automatic recognition of topic-classified
– Bacteria Biotopes (BB) relations between prostate cancer and genes using
– Bacteria Interaction (BI) MEDLINE abstracts. BMC-Bioinformatics 7(Suppl 3):S4
The GE task remains similar to the task of BioNLP- Hersh W, Cohen A, Lynn R, Roberts P (2007) TREC 2007
ST 2009, to play the role of pivot for generalization and genomics track overview. In: Proceeding of the sixteenth
text retrieval conference, Gaithersburg
to measure the progress of community with the Hirschman L, Krallinger M, Valencia A (eds) (2007) Proceed-
previous task. However, to avoid overfitting to the ings of the second BioCreative challenge evaluation work-
evaluation data set, the GE task is planned to arrange shop. CNIO Centro Nacional de Investigaciones
additional text sets, most of them coming from full Oncológicas, Madrid
Kim JD, Ohta T, Tsuruoka Y, Tateisi Y, Collier N (2004) Intro-
paper articles, so that generalization from abstracts to duction to the bio-entity recognition task at JNLPBA. In:
full papers can be measured. The EPI task is arranged Proceedings of the international joint workshop on natural
to evaluate generalization of event types with focus on language processing in biomedicine and its applications
events relevant to epigenetics and post-translational (JNLPBA), Geneva, pp 70–75
Kim JD, Ohta T, Tsujii J (2008) Corpus annotation for mining
protein modification without further subdomain biomedical events from literature. BMC Bioinformatics
restruction. The ID and BB tasks involve generaliza- 9(1):10
tion to new domains, ID targeting events relevant to the Kim JD, Ohta T, Pyysalo S, Kano Y, Tsujii J (2009) Overview of
biomolecular mechanisms of infectious diseases and BioNLP’09 shared task on event extraction. In: Proceedings
of natural language processing in biomedicine (BioNLP)
BB localization events of bacteria. NAACL 2009 workshop, Colorado, pp 1–9
BioNLP-ST 2011 also includes three supporting Miwa M, Pyysalo S, Hara T, Tsujii J (2010) A comparative study
tasks: Coreference, Entity relation, and Gene renaming. of syntactic parsers for event extraction. In: Proceedings of
Although these are not event extraction tasks them- BioNLP’10, Uppsala, pp 37–45
Miwa M, Sætre R, Kim JD, Tsujii J (2010b) Event extraction
selves, according to the analysis on the results of with complex event classification using rich features.
BioNLP-ST 2009, coreference resolution and entity rela- J Bioinform Comput Biol (JBCB) 8(1):131–146
tion detection are expected to play an important role for Nédellec C (2005) Learning language in logic – genic interaction
making a breakthrough in improving event extraction extraction challenge. In: Cussens J, Nédellec C (eds) Pro-
ceedings of the 4th learning language in logic workshop
performance. Gene naming has similar features with (LLL05), Bonn, pp 31–37
coreference; thus it is included as a supporting task. Poon H, Vanderwende L (2010) Joint inference for knowledge
extraction from biomedical literature. In: Proceedings of
NAACL-HLT’10, Los Angeles, pp 813–821
References Strassel S, Przybocki M, Peterson K, Song Z, Maeda K (2008)
Linguistic resources and evaluation techniques for evaluation
of cross-document automatic content extraction. In: Proceed-
Bader GD, Cary MP, Sander C (2006) Pathguide: a pathway ings of the 6th international conference on language resources
resource list. Nucleic Acids Res 34(Suppl 1):D504–506 and evaluation (LREC 2008), Marrakesh
Bj€orne J, Ginter F, Pyysalo S, Tsujii J, Salakoski T (2010) Vlachos A (2010) Two strong baselines for the BioNLP 2009
Complex event extraction at PubMed scale. Bioinformatics event extraction task. In: Proceedings of BioNLP’10,
26(12):i382–390 Uppsala, pp 1–9
Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani Voorhees E (2007) Overview of TREC 2007. In: The sixteenth
AK, Wong YW (2004) Comparative experiments on learning text REtrieval conference (TREC 2007) proceedings,
information extractors for proteins and their interactions. Gaithersburg
Artif Intell Med 33(2):139–155
B 142 Bio-Ontologies
differently in each research context. Resolving the ten- a machine-readable format, while other types of data
sion between stability and flexibility of classificatory (such as photographs) might need manipulation to be
categories is crucial to bio-ontologies’ success and is stored and retrieved through digital means. The extent
a core responsibility of curators, who engage in adapting to which curators need to manipulate data depends on
and updating bio-ontologies so that they mirror the the format in which data are produced and extracted,
research practices and knowledge of their users. The which is not always compatible with the software and
following activities are good examples (if not exhaus- format supported by any given bio-ontology.
tive) of steps in bio-ontology development that require
expert judgment and intervention by curators. Including Information About Data Provenance
Curators use metadata to classify information about
Data Mining from Publications data provenance. This procedure enables database
When extracting data from publications, curators have users to search through datasets on the basis of
to single out publications that they consider to be reli- context-independent criteria such as bio-ontology
able, updated, and representative for specific datasets. terms, while still allowing them to retrieve information
For instance, when gathering available data on a specific about the production of data when needed. Curators
gene, curators need to choose one or two publications have the responsibility of determining which informa-
that best represent data relevant to a given gene product tion about provenance are most useful to the classifi-
for the purposes of classification. They cannot compile cation of data for circulation and reuse.
data from each relevant publication, as it would be too
time consuming as well as generating inconsistent anno- Defining Terms
tations. Thus, curators choose what they see as the most Definitions that befit bio-ontology terms are not often
up-to-date and accurate publications on a specific gene found in biology textbooks or other publications, since
product, which as a consequence become “representa- those are usually steeped within specific research tra-
tive” publications for that entity. Further, once curators ditions or communities. Constructing a definition that
settle on a specific publication, they have to assess will be acceptable to all potential users of bio-
which data therein contained should be extracted and/ ontology, no matter which tradition they come from
or how the interpretation given within the paper matches and which organism they work on, constitutes a chal-
the terms and definitions already contained in the bio- lenging conceptual task for curators (Leonelli 2012).
ontology. Does the content of the paper warrant the Through activities such as the above, curators medi-
classification of given data under a new bio-ontology ate between the diverse assumptions and practices
term? Or can the contents of the publication be associ- characterizing the work of bio-ontology users and the
ated to one or more existing terms? These choices are need for bio-ontologies to conform to universal
unavoidable when extracting data from a publication requirements such as consistency, computability, ease
and they are impossible to regulate through fixed and of use, and wide intelligibility. Curators’ interventions
objective standards. The reasons why the process of data are crucial to the good functioning of bio-ontologies,
extraction requires manual curation are the same rea- and ideally need to be informed by a wide range of
sons why it cannot be divorced from subjective judg- expertise, including IT and programming skills, train-
ment: The choices involved are informed by a curator’s ing in more than one biological discipline (allowing
expertise and his or her ability to bridge between the them to bridge between different scientific contexts)
original context of publication and the context of bio- and familiarity with experimentation at the bench (so
ontology classification. that they understand observational statements made in
the context of specific experimental settings, as well as
Data Formatting anticipating the expectations of bio-ontologies users).
Without a minimal degree of homogeneity across for-
mats, there is no chance of making data searchable and
displaying them through the visualization tools used Cross-References
within databases. Formatting is also geared toward mak-
ing data computationally manageable: Data generated ▶ Data Mining
through high-throughput technologies are already in ▶ Data-intensive Research
Bio-PEPA 145 B
▶ Disease Ontology Definition
▶ Model Organism
▶ Ontology Lookup Service for Controlled Bio-PEPA (Ciocchetta and Hillston 2009) is an
Vocabularies and Data Annotation extension of the PEPA language (Hillston 1996) for
▶ Relationship Type Ontology dealing with biochemical signaling pathways. B
PEPA is a formal language for describing continuous
time Markov chain originally defined for the
References performance analysis of computer systems. PEPA
allows to quantitatively model and analyze
Ashburner M, Mungall CJ, Lewis SE (2003) Ontologies for large pathway systems and it is supported by a lot of
biologists: a community model for the annotation of genomic
software tools for analysis and stochastic simulations
data. Cold Spring Harb Symp Quant Biol 68:227–236
Baclawski K, Niu T (2006) Ontologies for bioinformatics. The are available. The combined use of PEPA and
MIT Press, Cambridge, MA the probabilistic model checker PRISM describe,
Bard JBL, Rhee S (2004) Ontologies in biology: simulate, and analyze biochemical signaling
design, applications and future challenges. Nat Rev Genet
pathways.
5:213–222
Brazma A, Krestyaninova M, Sarkans U (2006) Standards for A Bio-PEPA system is a formal, intermediate, and
systems biology. Nat Rev Genet 7:593–605 compositional representation of biochemical systems,
Hill DP, Smith B, McAndrews-Hill MS, Blake JA (2008) Gene on which different kinds of analysis can be carried out:
ontology annotations: what they mean and where they come
from. BMC Bioinformatics 9(5):S2
deterministic analysis of the ODE system, stochastic
Leonelli S (2008) Bio-ontologies as tools for integration in simulations (with the ▶ Stochastic Simulation Algo-
biology. Biol Theory 3(1):8–11 rithm), generation and analysis of the underlying
Leonelli S (2012) Classificatory theory in data-intensive science: continuous time Markov chain, and generation of the
the case of open biomedical ontologies. International Studies
input code for the PRISM model checker. Each of
in the Philosophy of Science 26(1):47–65
Rhee SY, Dickerson J, Xu D (2006) Bioinformatics and its the analysis carried on in the different derived formal-
applications in plant biology. Annu Rev Plant Biol isms can be of help for studying different aspects of
57:335–360 the biological model. Moreover, they can be used in
Rubin DL, Shah NH, Noy NF (2008) Biomedical ontologies:
conjunction in order to have a better understanding of
a functional perspective. Brief Bioinform 9(1):75–90
Smith B et al (2007) The OBO Foundry: coordinated evolution the system.
of ontologies to support biomedical data integration. Nat The Bio-PEPA extension modifies PEPA to deal
Biotechnol 25(11):1251–1255 with some features of biological models that are pecu-
liar of those kinds of systems (Calder and Hillston
2009):
• Functional rates: In contrast to PEPA,
individual processes are not able to define their
Bio-PEPA own rates for actions. Instead the rate associated
with an action is specified once, independently
Alida Palmisano1 and Corrado Priami2 of the processes in which the action occurs.
1
Department of Biological Sciences and Department The value of this rate can be specified to be
of Computer Science, Department of Biological a function that depends on the current state of the
Sciences Virginia Tech, Virginia Polytechnic Institute system.
and State University, Blacksburg, VA, USA • Stoichiometry: For each action, as well as its type,
2
Microsoft Research-University of Trento Centre for the stoichiometry or degree of involvement is also
Computational and Systems Biology and DISI, specified.
University of Trento, Povo, Trento, Italy • Parameterized processes: Bio-PEPA has
been designed to support the population-based
reagent-centric style of modeling and so a model
Synonyms consists of a number of sequential components
each representing a distinct species which evolve
PEPA quantitatively (increasing or decreasing amounts).
B 146 BioPortal
BioPortal
Mark A. Musen
Thus in order to capture the state of a system Stanford Center for Biomedical Informatics Research,
each component is parameterized recording its Stanford University, Stanford, CA, USA
current level.
• Differentiated prefix: For each action (reaction)
that a component is involved in it records its role Definition
within that reaction, e.g., reactant, product, inhibi-
tor etc. This enables the appropriate values to be BioPortal is an online repository of biomedical termi-
used in the functional rate associated with this nologies and ontologies maintained by the National
reaction. Center for Biomedical Ontology. BioPortal contains
Recently, some extensions of Bio-PEPA have the world’s largest collection of biomedical
been defined in order to represent some specific ontologies, which it makes available using
features of some biochemical networks. Specifically, a standard Web interface and a standard API at
the language has been extended to support http://bioportal.bioontology.org.
Bipartite Graph 147 B
Cross-References The BioSPI project was the main inspiration for
another framework designed around the pi-calculus
▶ Protégé Ontology Editor language: SPiM (Phillips and Cardelli 2007). The sim-
ulation algorithm is based on the ▶ stochastic simula-
tion algorithm and the language features a simple B
graphical notation for modeling a range of biological
BioSPI systems.
Synonyms
Cross-References
Bimodality
▶ Modules, Identification Methods and Biological
Function
Definition
a dX
= fp(X) - fd(X) fp(X) B
dt
Rates
fd(X)
fp(X) fd(X)
X
Production Degradation
Bistability, Fig. 1 Characteristics of bistable systems (Smits shows the change of X over time (dX/dt). (b) Plot of the functions
et al. 2006). (a) Schematic illustration of a bistable gene regula- fp(X) and fd(X) within the same graph. The intersection points
tion. The functions fp(X) and fd(X) represent the production and show the steady states
degradation rates of X, respectively. The differential equation
Cross-References Definition
▶ Epigenetics
▶ Methylation-sensitive Restriction Endonucleases Synonyms
BlenX, Fig. 2 Coding a basic enzymatic reaction in BlenX. is kb. If E and S are connected, they share a private channel
Upper part: boxes are pictured as different shapes where corners through which the synchronization of the output action x!() and
represent molecule interfaces (with their associate type). Lower input action y?() can happen at rate kc. If this happens the binder
part: The BlenX model of this reaction network is coded in three of S changes (at infinite rate) its type from DS to DP (see the
files: the program file (containing the definitions of the species), internal code of S), allowing the decomplexation of the two
the interfaces file (containing the affinities between the types), boxes (because the affinity between DE and DP is infinite for
and the declaration file (containing the definitions of the con- the decomplexation event, see the interfaces file). The final state
stants). The complexation between E and S happens because of of the system is the E species back in its initial state and a new
the affinity of DE and DS types at rate ka. This reaction is product species P
reversible, so the decomplexation rate between the same types
B 152 BLOcks SUbstitution Matrix (BLOSUM)
creating complex-on-the-fly (i.e., during the simula- BLOSUM-x matrix. The probability pi,j of a unique
tion): the complexation event is performed when two pair of amino acids at a site (Ai and Aj) is computed as
boxes have interfaces with compatible types and it is well as the probability pi of the unique amino acid to
p
used for modeling the creation of a private and perma- be Ai. Then the log odd ratio log pi;ji is computed and
nent (until the complementary decomplexation event) recorded in the (i,j) entry of the matrix. The BLOSUM
communication channel between two specific boxes. matrix comprises of information regarding the
Those two boxes will maintain their individual internal 20 amino acid properties similarity and yields delicate
behavior but they will be able to communicate on the evolutional and chemical association among the
shared channel: the only thing that the user needs to 20 amino acids.
define is a rate of complexation/decomplexation
between two binder types.
An example of a basic enzymatic reaction modeled Cross-References
in BlenX is shown in Fig. 2.
▶ TAP Translocator, In Silico Prediction
Cross-References
Definition
BLOcks SUbstitution Matrix (BLOSUM) Bone marrow-derived cells (BMDC) are derived from
hematopoietic stem cells that give rise to diverse cell
Joo Chuan Tong types present in blood and lymphatic organs, some of
Data Mining Department, Institute for Infocomm which are recruited to other organs.
Research, Singapore, Singapore
Department of Biochemistry, Yong Loo Lin School of
Medicine, National University of Singapore, Characteristics
Singapore
The term, bone marrow-derived cells (BMDC),
encompasses a broad class of undifferentiated cells
Definition known to originate from the bone marrow (▶ Bone
Marrow-derived Cells) that are dispersed among tis-
BLOSUM matices are calculated from blocks of sues. The plasticity of these cells has been implicated
highly conserved amino acid sequences with a certain in diverse processes from development to pathology
degree of similarity between these sequences. Matrix but remains poorly understood. Bone marrow trans-
constructed from no more than x% similarity is called plantation and lineage-tracing experiments have
Bone Marrow-derived Cells 153 B
provided strong experimental evidence that BMDC the ability to rescue lethally irradiated mice.
can generate distinct cell types, including cardiac mus- The uncommitted, or pluripotent, gives rise to five
cle, liver cell types, neuronal and nonneuronal cell progenitor cells that give rise to seven distinct cell
types of the brain, as well as endothelial cells and types: erythrocyte, lymphocyte, neutrophil, eosino-
osteoblasts. These multiple cell types could have orig- phil, basophil, monocytes, and platelets. For many B
inated from either HSC or MSC within bone marrow or of these cells differentiation occurs within the
from other less well-defined precursors. Injury may bone marrow, but for lymphocytes and monocytes,
mobilize BMDC into circulation and recruitment and maturation occurs after release into circulation
differentiation at damage sites (Orlic et al. 2001). and residence in tissues. In the case of lymphocytes,
Inflammation can also mobilize BMDC, and chronic this occurs in the tissues of the immune system.
inflammation may drive BMDC to actively participate Monocytes circulate for 3 days before migrating
in pathological process, including cancer (Houghton into tissues where they become tissue-resident
et al. 2004). macrophages. Dendritic cells, an important antigen-
Bone marrow, a tissue that replenishes the circulat- presenting cell residing in tissues, can be either
ing cells of the blood and immune system, contains lymphoid or myeloid in origin.
many cell types, including stroma, vascular cells, adi- BMDC are mobilized in cancer-bearing animals
pocytes, osteoblasts, and osteoclasts. These cells form and in humans (Wels et al. 2008). Many studies were
the microenvironment, or niche, in which mesenchy- initiated following the observation that recruitment of
mal stem cells (MSC) and hematopoietic stem cells a subset of hematopoietic and vascular progenitors
(HSC) reside. Both stem cells originate within the bone were required to assemble new vessels in tumors
marrow, but HSC mostly function to regenerate (Lyden et al. 2001). The key concepts are that BMDC
hematopoietic cells within the marrow and the circu- have temporally and spatially restricted roles in
lation, while MSC mostly relocate to produce a variety supporting in specific types of tumors, that recruitment
of cell types found in diverse organs. of even small numbers of BMDC can play a crucial
MSC replicate as undifferentiated cells and have the role in catalyzing tumor progression and that BMDC
potential to differentiate to lineages of mesenchymal may precede or presage cancer, implicating them in the
tissues, including bone, cartilage, fat, tendon, muscle, very earliest manifestations of cancer (Kaplan et al.
and marrow ▶ stroma (Pittenger et al. 1999). A MSC 2005).
population resides in bone marrow, but cells with The variety, rarity, and plasticity of different
similar molecular, phenotypic, and functional proper- BMDC subsets complicate their analysis and respec-
ties, at least in vitro, have been identified in a wide tive importance and specific functions in response to
range of tissues. MSC populations in adult tissues and injury, cancer, and pathology at large. An important
organs contribute to tissue cell turnover and also tool has been implementation of bone marrow trans-
respond to tissue damage, for example, in wound plantation in conjunction with various genetic reporter
healing (▶ Wound Response). systems and immunologic markers. One idea is that
HSC give rise to three main lineages, erythroid, BMDC can be used to treat injured tissues and promote
myeloid and lymphoid, which respectively form red repair, and another is that blocking BMDC recruitment
blood cells, monocytes, and immune cells (Morrison and/or action can prevent disease, but there is still
et al. 1997). Extensive single-cell transplantation of much to be learned about the recruitment, behavior,
highly purified stem cells demonstrated that the clonal and stability of these cells across development, during
contribution to the different blood cell lineages varies homeostasis, and in disease.
significantly and can be stably maintained through
serial passaging, providing evidence that the pool of
HSCs comprises at least two and possibly more distinct Cross-References
clonal subtypes imbued with differential lineage and
self-renewal potential (Dykstra et al. 2007). A great ▶ Adaptive Immune System
many functional and characterization studies have ▶ Bone Marrow-derived Cells
established a functional hierarchy of HSC using assays ▶ Fibroblasts
for growth potential under restrictive conditions and ▶ Stroma
B 154 Bonferroni Correction
Xiujun Zhang
Bonferroni Correction Institute of Systems Biology, Shanghai University,
Shanghai, China
Winston Haynes
Seattle Children’s Research Institute, Seatlle,
WA, USA Synonyms
Switching function
Definition
Synonyms
References
Kauffman network; N-K model
Dougherty E, Kim S, Chen Y (2000) Coefficient of determina-
tion in nonlinear signal processing. Signal Process
80:2219–2235
Kauffman S (1969) Metabolic stability and epigenesis in Definition
randomly constructed genetic nets. J Theor Biol
22:437–467 A Boolean network model is composed of two parts:
Kauffman S (1993) The origins of order: self-organization
(1) A directed and loops-allowed graph with N nodes
and selection in evolution. Oxford University Press,
New York representing the genes where each gene is connected
Pal R, Datta A, Bittner M, Dougherty E (2005) Intervention in with only K other genes (hence, Boolean network is
context-sensitive probabilistic boolean networks. Bioinfor- also called N-K model) (Jong 2002; Kauffman 1969;
matics 21:1211–1218
L€ahdesm€aki et al. 2003). Each gene only has two
Shmulevich I, Dougherty E, Kim S, Zhang W (2002)
From boolean to probabilistic boolean networks as states: 1 (on) means expressed and 0 (off) means silent.
models of genetic regulatory networks. Proc IEEE (2) Boolean functions and states of genes, where the
90:1778–1792 state of each gene i, i ¼ 1, 2, ···, N, are modeled as the
Shmulevich I, Dougherty E (2007) Genomic signal processing.
output of Boolean function fi with K (or at most K)
Princeton University Press, New Jersey
inputs specified by the graph. Once the network and the
Boolean functions are determined, the model’s state at
any discrete time step is uniquely determined by the
initial assignment of the states of the genes. Since the
Boolean Modeling of Cell Cycle Control number of the possible states is finite, the model will
inevitably fall into an attractor at last (point attractor or
▶ Cell Cycle Modeling Using Logical Rules cyclic attractor; see Fig. 1).
X T T+1 1 1 0
Z X
0 0
1 1 0 1 1
1 0 1
1 0 0
T T+1 T T+1
X Y X Y Z
Y Z
0 0 0 0 0
0 1 1 1 1 1
1 1
1 0 0
1 1 1
Boolean Networks, Fig. 1 Left: An example of Boolean net- fz(1, 0) ¼ 0; fz(0, 1) ¼ 0; fz(1, 1) ¼ 1. Right: The states trajectories
work. X, Y, and Z are three genes. The Boolean functions of them of two particular realizations of the Boolean network. The top
are given by the truth tables placed next to them. For example, trajectory falls into a cyclic attractor, and the bottom one falls
the Boolean function fz of gene Z at time T is fz(0, 0) ¼ 0; into a point attractor
B 158 Bootstrap Aggregating
XB
^yboot ¼ 1 ^yb (1)
B b¼1
Bootstrapping
Synonyms
1X B
biasf^yg ¼ ð^yb ^yÞ (2)
Bootstrap resampling B b¼1
Bootstrapping 159 B
Bootstrapping,
Fig. 1 Simplified schematic
of the standard bootstrap
process. From the data set D
comprising n points, B
bootstrap samples are B
generated by repeated uniform
sampling with replacement.
From each bootstrap sample,
an estimate of the parameter y
is calculated. The B estimates
are used to derive the
bootstrapped estimate ^yboot
0 if yi ¼ Cb(xi) and 1 otherwise. Equation (4) defines classifier is trained on about 63% of the cases only.
the bootstrapped estimate of the prediction error As a result of the small effective training set, the leave-
(Hastie et al. 2008). Here, B is the number of bootstrap one-out bootstrapped estimate ^eloob tends
samples, and n is the total number of cases. to overestimate the true prediction error (i.e., ^eloob
is biased upward). The 0.632 bootstrap
(▶ Bootstrapping, 0.632 Bootstrap) addresses
11X B X n
^eboot ¼ Lðyi ; Cb ðxi ÞÞ (4) this bias by weighing the leave-one-out bootstrapped
B n b¼1 i¼1
estimate and the bootstrapped resubstitution error,
^eresub , which is defined in (6).
The expected number of cases that are identical in
the data set D and the bth bootstrap sample is 0.368
times n. As the training sets overlap with the test set, 1X n
1 X
^eresub ¼ Lðyi ; Cb ðxi ÞÞ; (6)
the bootstrapped estimate of the prediction error, ^eboot , n i¼1 jSþi j b2S
þi
is biased downward (i.e., optimistically biased). For
▶ overfitted classifiers, the bootstrapped estimate of
the prediction error would then be smaller than their with S+i being the set of indices of bootstrap
true prediction error e. samples that contain the case i, and |S+i| is the number
To alleviate the optimistic bias of (4), we of those samples. Like any resubstitution error esti-
can exclude those cases that are contained in both mate, ^eresub is biased downward, i.e., it is smaller
the bootstrap samples and the original data set. than the true prediction error. To correct the downward
The resulting estimate is called the leave-one-out bias of the 0.632 bootstrap (▶ Bootstrapping, 0.632
bootstrap estimate of the prediction error (Hastie Bootstrap), Efron and Tibshirani (1997) developed
et al. 2008). the 0.632 +bootstrap (▶ Bootstrapping, 0.632+
Bootstrap).
1X n
1 X Figure 2 illustrates the relation between the
^eloob ¼ Lðyi ; Cb ðxi ÞÞ; (5) leave-one-out bootstrapped estimate and the
n i¼1 jSi j b2S
i bootstrapped resubstitution error in a simplified
example. To calculate the contribution of case 1 to
with S–i being the set of indices of bootstrap samples the leave-one-out bootstrapped estimate, the
that do not contain the case i, and |S–i| is the number of prediction of only model C2 is relevant because it is
those samples. Because of the random sampling, it is derived from a bootstrap sample that does not contain
possible that all bootstrap samples contain the case i, case 1. In contrast, to calculate the contribution
leading to |S–i| ¼ 0. To avoid a division by zero, the of case 1 to the bootstrapped resubstitution error,
number of bootstrap samples needs to be sufficiently the predictions of only models C1 and C3 are used
large so that at least one bootstrap sample does because only the bootstrap samples #1 and #3 contain
not contain the case i. Alternatively, we could omit case 1.
those cases that are included in all bootstrap samples
(Hastie et al. 2008). Note that, although the estimate Advantages and Disadvantages of the Bootstrap
^eloob is called the leave-one-out bootstrapped estimate, The bootstrap is one of the most commonly
on average, 0.368 times n cases are left out per boot- used resampling strategies for assessing the reliability
strap sample. The leave-one-out bootstrap estimate of of performance measures in classification tasks, specif-
the prediction error is not to be confused with the out- ically when the data sets are relatively small. This is
of-bag estimate of the error rate resulting from bagged generally the case for genomic data sets where the
classifiers. number of features (e.g., probe sets on a microarray) is
For each bootstrap sample, the expected number of orders of magnitude smaller than the number of
distinct training cases is 0.632 times n. So each specimens.
Bootstrapping 161 B
are questionable and classic statistical procedures
thus problematic. However, it is also possible to
include distributional assumptions into the sampling,
for example, we may assume that the data follow
approximately a normal distribution. Or, on the B
basis of our prior beliefs, we may assign different
prior probabilities to the individual cases for being
selected for a bootstrap sample. The bootstrap is then
a parametric method. Thus, the bootstrap can be
performed either with or without a predefined proba-
bility model (Davison and Hinkley 1997).
A particular advantage of the bootstrap is that its
implementation is straightforward. The language
and environment ▶ R, for example, provides several
functions for parametric and non-parametric
bootstrapping (see, e.g., the R library boot). Specifi-
cally, R provides functions for computing a variety of
bootstrapped confidence intervals (see, e.g., the
R function boot.ci). For instance, using the percentile
bootstrap, we derive a confidence interval for our
statistic of interest as follows: (1) we sample several
hundred bootstrap samples with replacement; (2) we
calculate the statistic for each sample; and (3) we use
the a/2 and (1 a/2) percentiles of the distribution to
obtain an empirical (1 a/2)-level confidence inter-
val. For many real-world data sets, the assumptions
underlying conventional confidence intervals are vio-
lated. Specifically for small data sets, the assumption
of an asymptotic distribution may not hold.
Bootstrapped confidence intervals are then an inter-
esting alternative.
The bootstrap requires multiple samplings of the
data set. The computational costs may be considered
a disadvantage, specifically if all possible subsamples
are generated. The exhaustive subsampling, however,
is generally not necessary. In the context of perfor-
Bootstrapping, Fig. 2 Illustration of the leave-one-out mance assessment, bootstrapped estimates of error
bootstrapped estimate and the bootstrapped resubstitution error. rates tend to be optimistically biased, specifically for
The data set contains n ¼ 10 cases; here, only the case indices are the leave-one-out bootstrap and the 0.632 bootstrap
shown. From the original data set D, three bootstrap samples are (Molinaro et al. 2005). For small data sets, the 0.632+
drawn, and three classifiers (C1, C2, and C3) are built
bootstrap performs competitively with leave-one-out
cross-validation and ten-fold cross-validation
The non-parametric bootstrap method does not (Simon 2007). Although bootstrapping is often
rely on any assumptions about the data distribution. recommended for small sample scenarios, it is no
Therefore, bootstrapping can be applied to real-world panacea for a lack of data (Jiang and Simon 2007;
data sets for which specific distributional assumptions Isaksson et al. 2008).
B 162 Bootstrapping, 0.632 Bootstrap
Cross-References
Bootstrapping, 0.632 Bootstrap
▶ Bootstrapping, 0.632 Bootstrap
▶ Bootstrapping, 0.632+ Bootstrap Daniel Berrar1 and Werner Dubitzky2
▶ Classification 1
Interdisciplinary Graduate School of Science and
▶ Cross-Validation Engineering, Tokyo Institute of Technology,
▶ Learning, Attribute-Value Midori-ku, Yokohama, Japan
▶ Model Testing, Machine Learning 2
Biomedical Sciences Research Institute, University of
▶ Overfitting Ulster, Coleraine, UK
▶ R, Programming Language
Definition
References
The 0.632 bootstrap weighs the bootstrapped
Davison AC, Hinkley DV (1997) Bootstrap methods
resubstitution error (▶ Bootstrapping), ^eresub , and the
and their application. Cambridge University Press,
New York leave-one-out bootstrapped estimate of the prediction
Dixon PM (2006) Bootstrap resampling. In: El-Shaarawi AH, error (▶ Bootstrapping), ^eloob , as follows:
Piegorsch WW (eds) Encyclopedia of environmetrics.
Wiley, New York, pp 212–220
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd
edn. Wiley-Interscience, New York. ^e:632 ¼ 0:368 ^eresub þ 0:632 ^eloob (1)
Efron B (1983) Estimating the error rate of a prediction rule:
improvement on cross-validation. J Am Stat Assoc 78
(382):316–331.
Efron B, Tibshirani R (1997) Improvement on cross-
This weighted average corrects the upward bias of
validation: the 0.632+ bootstrap method. J Am Stat Assoc the leave-one-out bootstrapped estimate. However,
92:548–560. ^e:632 can now be biased downward (Hastie et al.
Efron B, Tibshirani R (1998) An introduction to the bootstrap. 2008). Consider the example of a classification prob-
Chapman & Hall, London
Hastie T, Tibshirani R, Friedman J (2008) The elements of
lem with two balanced classes where the class labels
statistical learning. Springer Series in Statistics, New York/ are independent of the data set attributes. For this data
Berlin/Heidelberg set, the true prediction error of any classifier
Isaksson A, Wallman M, Goeransson H, Gustafsson MG cannot be smaller than 0.5. However, consider the
(2008) Cross-validation and bootstrapping are unreliable
in small sample classification. Pattern Recogn Lett 29:
0.632 bootstrap estimate for the 1-nearest neighbor
1960–1965 classifier (Hastie et al. 2008). Its resubstitution
Jiang W, Simon R (2007) A comparison of bootstrap methods error is zero because 1-NN uses the class label of
and an adjusted bootstrap approach for estimating the the duplicated training case to predict the class label
prediction error in microarray classification. Stat Med
26:5320–5334
of the corresponding test case in D. The leave-one-out
K€unsch HR (1989) The jackknife and the bootstrap bootstrapped estimate is 0.5 because for those
for general stationary observations. Ann Stat 17:1217– cases that are not included in the bootstrapped sam-
1241 ples, 1-NN performs like a random guesser. Thus,
Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error
estimation: a comparison of resampling methods. Bioinfor-
the 0.632 bootstrapped estimate of the prediction
matics 21(15):3301–3307 error is ^e:632 ¼ 0 þ 0:632 0:5 ¼ 0:316, which
Silverman BW, Young GA (1987) The bootstrap: to smooth or underestimates the true error rate of 0.5 in this
not to smooth? Biometrika 74:469–479 example. To correct the downward bias of the 0.632
Simon R (2007) Resampling strategies for model assessment and
selection. In: Dubitzky W, Granzow M, Berrar D (eds)
bootstrap, Efron and Tibshirani (1997) developed
Fundamentals of data mining in genomics and proteomics. the 0.632+ bootstrap (▶ Bootstrapping, 0.632+
Springer, Heidelberg Bootstrap).
Bottom-Up Hierarchical Clustering 163 B
Cross-References degree of ▶ overfitting. Hence, the weights for the
0.632+ bootstrap are not fixed as in the 0.632
▶ Bootstrapping, 0.632+ Bootstrap bootstrap but determined based on the degree of
▶ Bootstrapping ▶ overfitting. Efron and Tibshirani (1997) developed
the 0.632+ bootstrap to address the downward bias B
of the 0.632 bootstrap (▶ Bootstrapping, 0.632
Bootstrap).
References
Definition
▶ Interlevel Causation
Here, ^eresub is the bootstrapped resubstitution error
(▶ Bootstrapping), and ^e loob is the leave-one-out
bootstrapped estimate of the prediction error
(▶ Bootstrapping). ^e random is the estimate of the pre-
diction error under the assumption that the class Bottom-Up Hierarchical Clustering
labels are independent of the data set attributes
(▶ Learning, Attribute-Value). R is an estimate of the ▶ Hierarchical Agglomerative Clustering
B 164 Bromodomain
Definition
Cross-References
Bromodomains, initially identified in the Drosophila
protein Brahma, is an acetylated histone recognition ▶ Histone Post-translational Modification to
domain and found in various chromatin factors. Nucleosome Structural Change
ATP-dependent nucleosome-remodeling factors and
histone acetyltransferases frequently contain
bromodomains (Allis et al. 2006). Typical References
bromodomains are composed of approximately 110
amino acid residues. Tertiary structure analyses showed Allis CD, Jenuwein T, Reinberg D, Caparros M-L (2006)
Epigenetics, 1st edn. Cold Spring Harbor Laboratory Press,
that the bromodomain adopts a four-a-helix-bundle
New York
structure and has a cavity at the top of the molecule to
accommodate an acetylated lysine residue (Fig. 1).
Brownian Ratchet
Nobuo Shimamoto
Faculty of Life Sciences, Kyoto Sangyo University,
Kyoto, Japan
Definition
13
C Metabolic Flux Analysis Characteristics
13
Meghna Rajvanshi and Kareenhalli V. Venkatesh C metabolic flux analysis is a powerful technique
Department of Chemical Engineering, Indian Institute for modeling biological systems. 13C metabolic flux
of Technology Bombay, Powai, Mumbai, analysis is free of all the short comings of the stoichio-
Maharashtra, India metric ▶ Metabolic Flux Analysis (MFA), though it
needs more information in addition to extracellular
flux data to have the complete flux distribution.
Synonyms Stoichiometric MFA and ▶ Flux Balance Analysis
(FBA) are restricted in their ability to fully quantify
13
C Fluxomics the network because they cannot be applied in
cases involving the following: (1) metabolic pathways
where the energy metabolites, such as ATP, NADH,
Definition NADPH, are not balanced; (2) parallel reactions
where none of the competing branches are directly
13
C MFA is an experimental approach to compute coupled to measurable variable or metabolite;
intracellular flux pattern. It involves tracer studies (3) metabolic pathways containing cyclic reaction
where substrates are labeled with stable isotopes, chains not linked to a measurable flux; and (4) meta-
such as 13C, and the labeling pattern of metabolites is bolic cycles having bidirectional reactions (Wiechert
subsequently measured. Detailed flux distributions can et al. 1997).
13
C Metabolic Flux Analysis, Fig. 1 Different labeling states of a molecule containing 3 C atoms. Labeled C is shown as black
circles. There are eight positional isotopomers and four mass isotopomers (specified as (a–d))
Certain assumptions and rules need to be applied The basic procedure followed for 13C metabolic
while performing 13C metabolic flux analysis as listed flux analysis (Wiechert 2001; Kr€omer et al. 2009) is
below (Marx et al. 1996): to measure the labeling patterns of metabolites.
1. It is based on an assumption of a pseudo steady Carbon labeling experiment consists of growing the
state. organism of interest on 13C substrate such as glucose
2. It assumes that there is no preference of enzymes for labeled at a specific location. This labeled substrate is
labeled and unlabeled substrates (i.e., no isotope then utilized or consumed across the various path-
effect). ways finally leading to enrichment of the metabolites
3. Bioreactor size should be small to minimize the cost intracellular pool. Cells are harvested and are hydro-
of the experiment as labeled substrates are quite lyzed to get metabolite precursors like amino acids.
costly but it is also necessary that the volume reduc- The pattern of enrichment in these metabolites is
tion should not be to the extent where stable identified or estimated by mass spectrometry (MS)
stationary condition cannot be achieved. or NMR. The output from MS or NMR consists of
4. It is necessary that all the intracellular metabolites a spectral pattern. The pattern and the level of enrich-
are in steady state both in terms of metabolic fluxes ment is a fingerprint for a particular kind of pathway
and isotopomer distribution. Typically it takes 2–4 involved. The level of enrichment of certain metabo-
bioreactor residence times to reach the isotopic lite is a reflection of the permutation and combina-
equilibrium state. Time required for intracellular tions a labeled molecule has undergone. MS is
intermediates to reach isotopic steady state is com- becoming the method of choice for 13C analysis due
paratively less than the time taken by the macro- to its comparatively simpler data interpretation ability
molecules of biomass (proteins, RNA, DNA, cell even for multimolecular species. NMR on the other
wall) to reach isotopic steady state. The term hand does not require complex sample processing as
isotopomer implies “one of the different labeling MS but its spectral data becomes extremely convo-
states in which a particular metabolite can luted for molecules with more than three atoms.
be found.” There can be 2n different labeling A few deconvolution algorithms are available but
states of a molecule. For example, for a metabolite are mathematically involved and this restricts its
with 3 C atoms, there can be eight different usage to only few users with time-developed
positional isotopomers while only four different skill and experience. Thus, this intracellular labeling
mass isotopomers (Fig. 1). Knowledge of complete information, in combination with extracellular
isotopomer distribution yields information regard- fluxes, is used to compute the intracellular fluxes.
ing the enrichment pattern of different intracellular Several algorithms are available to decipher the
metabolites. This fractional enrichment information meaningful information from such spectra. OpenFlux
reflects how atoms in the reacting species interact (Quek et al. 2009), FiatFlux (Zamboni et al. 2005),
13
with each other to form product molecules. Thus, C MFA, etc., are some of the examples of the
fractional enrichment data along with C atom tran- software available. The enrichment information in
sition for all the species involved in a set of reac- terms of mass isotopomer distribution acts as an
tions defining the intracellular metabolism will input to the algorithm. Algorithm uses this mass
yield intracellular fluxes. isotomomer data to obtain the positional isotopomer
13
C Metabolic Flux Analysis 169 C
13
C Metabolic Flux
Analysis, Fig. 2 Schematic Growth on 13C labeled Substrate
flowchart of 13C-based
metabolic flux analysis
Assumed Fluxes
Isotopic Enrichment
Measurement
By Software
NMR or GC/MS Simulation
Simulated
Mass Isotopomer Distribution Isotopomer
Distribution
DIFFERENCE
data, which is further used to estimate the fluxes. Applications of 13C MFA
Software works by minimizing the error between
13
the simulated and experimental mass isotopomer C MFA has been widely used in analyzing microbial
distribution. One fundamental problem of CLE is metabolism and regulation involved in metabolic net-
that it is usually difficult to measure the labeling works. It is mainly used for analyzing central carbon
pattern of many important intracellular metabolites metabolism but recently has been applied for genome
due to their small pool size. To overcome this prob- scale networks as well. 13C MFA has good predictabil-
lem, the labeling patterns of amino acids are often ity for prokaryotic systems but application of 13C MFA
measured, since the amino acids reflect the labeling to eukaryotic systems, especially to plant and animal
states of a number of important intermediate metab- cells, is extremely difficult due to the intracellular
olites through their precursors. These labeling mea- compartmentalization. Information of compartmental-
surement data provide additional and yet independent ization has to be captured by model properly in order to
constraints on the intracellular fluxes and thus enable obtain proper fitting of predicted versus experimental
a more refined analysis of metabolic fluxes in the data (Wiechert 2001). Similarly application of 13C
complex metabolic network. An overall outline of MFA is obscure when multiple carbon substrates are
13
C-based metabolic flux analysis methodology is used. High cost of labeled substrates, need for special-
given in Fig. 2. ized instrumentations (NMR or MS) for determining
C 170 C Domain
isotopic labeling, and elaborate mathematical and sta- ▶ Metabolic Flux Analysis
tistical analysis for isotopomer modeling are some ▶ Spectroscopy and Spectromicroscopy
additional limitations of 13C MFA, which further
restrict its use (Tang et al. 2009). 13C MFA requires
excessive experimental data and is quite complicated References
as compared to other methods of flux analysis;
however, for certain cases, 13C MFA is the only Blank LM, Kuepfer L, Sauer U (2005) Large-scale 13C-flux
analysis reveals mechanistic principles of metabolic network
solution. For example as stated in the previous section,
13 robustness to null mutations in yeast. Genome Biol 6(6):R49
C MFA can resolve parallel pathways, cyclic meta- Iwatani S, Yamada Y, Usuda Y (2008) Metabolic flux analysis in
bolic pathways and bidirectional fluxes in the meta- biotechnology processes. Biotechnol Lett 30(5):791–799
bolic network. 13C MFA has been extensively used in Kr€
omer J, Quek L-E, Nielsen L (2009) 13C-Fluxomics: a tool for
measuring metabolic phenotypes. Australian Biochemist
studying metabolism of relevant organisms for bio-
40(3):17–20
technology processes, namely, production of amino Marx A, de Graaf AA, Wiechert W, Eggeling L, Sahm H (1996)
acids (lysine, glutamate, phenyl alanine, methionine, Determination of the fluxes in the central metabolism of
etc), vitamin (riboflavin), ethanol, glycerol, antibiotics Corynebacterium glutamicum by nuclear magnetic reso-
nance spectroscopy combined with metabolite balancing.
(penicillin G), and bioplastics (1,3-propanediol)
Biotechnol Bioeng 49:111–129
(Iwatani et al. 2008). It has also been used to test the Quek L-E, Wittmann C, Nielsen L, Kromer J (2009)
validity of the assumptions made in other methods of OpenFLUX: efficient modelling software for 13C-based
flux analysis and about the metabolic network. 13C metabolic flux analysis. Microb Cell Fact 8(1):25
Tang YJ, Martin HG, Myers S, Rodriguez S, Baidoo EEK,
MFA has been used in studying single gene disruption
Keasling JD (2009) Advances in analysis of microbial met-
mutant and establishing robustness in the metabolic abolic fluxes via 13C isotopic labeling. Mass Spectrom Rev
network (Blank et al. 2005). 13C MFA has proven to 28(2):362–375
be quite useful in cancer research to study unique Wiechert W (2001) 13C metabolic flux analysis. Metab Eng
3(3):195–206. doi:10.1006/mben.2001.0187
metabolic characteristics under diseased condition. It
Wiechert W, Siefke C, de Graaf AA, Marx A (1997) Bidirectional
is used to understand flux distribution through impor- reaction steps in metabolic networks: II. Flux estimation and
tant pathways for amino acids and fatty acids biosyn- statistical analysis. Biotechnol Bioeng 55(1):118–135
thesis in cancer cells (Yang et al. 2008). Other Yang C, Richardson A, Osterman A, Smith J (2008) Profiling of
central metabolism in human cancer cells by two-
important application of 13C MFA is in identification
dimensional NMR, GC-MS analysis, and isotopomer model-
of bottlenecks in metabolic network of the industrially ing. Metabolomics 4(1):13–29
important organisms. Identified target genes, upon Zamboni N, Fischer E, Sauer U (2005) FiatFlux–a software for
manipulation through genetic engineering techniques, metabolic flux analysis from 13C-glucose experiments.
BMC Bioinformatics 6:209
lead to enhanced productivity of the metabolite of
interest. 13C MFA has also been used in identifying
drug targets for various diseases. By studying flux
distribution in pathogens or diseased cells, it might be C Domain
possible to identify a pathway crucial for the survival
of the pathogen inside the host cells. Such pathways ▶ Constant (C) Domain
can be potential drug targets (Tang et al. 2009). Thus,
in spite of being a complex methodology for flux
analysis, 13C MFA has wide applicability in different
fields. It provides important insights about the charac- C Gene
teristics of a metabolic network, for which this tech-
nique is indispensable. ▶ Constant (C) Gene
Cross-References
C Type Domain
▶ Flux Balance Analysis
▶ Mass Spectrometry, Proteomics, and Metabolomics ▶ Constant (C) Domain
Canalization 171 C
Calibration Canalization
Canalization,
Fig. 1 Canalization (after
Waddington). (a) Epigenetic
landscape, figuring the
robustness of developmental
fates. (b) Genetic
underpinning of canalization
Cross-References Cross-References
References
References
Lehner B (2010) Genes confer similar robustness to environ-
mental, stochastic, and genetic perturbations in yeast. PLoS World Health Organization (2011) Cancer. Fact sheet 297.
One 5(2):e9035 World Wide Web. http://www.who.int/mediacentre/
Masel J, Siegal M (2009) Robustness: mechanisms and factsheets/fs297/en/. Accessed 18 May 2011
consequences. Tr Gen 25(9):395–403
Wagner A (2005) Robustness and evolvability and living
systems. Princeton University Press, Princeton
qualified medical observers, cancer is exceptionally Caucasian inhabitants. The topic has been addressed
rare among the primitive peoples. . .” more recently by Howard and Newby (2004) and
It is easy to dismiss such evidence as “anecdotal.” Irigaray et al. (2007).
However, with the reliably observed continuation of
the trend to increased cancer incidence in developed Low-Dose Epigenetic Phenomena and Cancer
countries, combined with the absence of any evidence There is increasing emphasis on the influence of low-
at all about substantial cancer incidence rates in dose intrauterine exposure of the fetus to ubiquitous
preindustrial societies, it seems reasonable to conclude environmental pollutants on the subsequent risks for
that cancer as a disease is related primarily to industri- cancer in later life. This rather subtle form of environ-
alization. Other arguments can be brought into play to mental chemical exposure, associated with long
counter this assertion. For example, average life expec- latency periods, has been prompted by the demonstra-
tancy is higher in developed countries, and cancer tion in animal models that doses of certain chemicals,
incidence is a function of age (though if one is living several orders of magnitude below the concentration
in a “soup” of carcinogenic influences from conception required to produce measureable effects in adults,
to grave, then length of exposure would be expected to appear to be able to perturb normal development. The
be significant, though not necessarily causal in itself). explanation for this is likely to be found in considering
It should be noted though that average life expectancy the physiological concentrations at which cell signal-
itself, as a measure, can be deceptive. It has been used ing molecules, such as hormones, operate. This is
to assert that life in earlier times was brutal and short typically very low, in parts per trillion. Moreover,
with most adults dying in their third decade. This is dose-response curves tend to be nonlinear and with
a very simplistic explanation. In preindustrial socie- an inverted U shape. This indicates that there is an
ties, the death rate in infancy was high, but if adoles- optimal concentration at which such cell signaling
cence was reached then, according to the same molecules operate, that is, there can be too little and/
medically qualified observers mentioned above, the or too much, as is typical with receptor-mediated
chances of living a reasonable life span in good health events.
were high and unlikely to end in the development of In adult animals, cell signaling molecules are
cancer. These observations are supported by the rec- involved in maintaining homeostasis in an intact
ognition among most epidemiologists that the major organism in which most structural and functional ele-
contribution to the rise in average life expectancy in ments are relatively stable. In the fetus, on the other
developed countries has been the reduction in infant hand, cell signaling molecule expression is intimately
mortality. tied up with control of cell proliferation and apoptosis
during windows of developmental activity. Perturba-
Nature Versus Nurture tion of the tightly regulated levels of key cell signaling
Epidemiological studies of monozygous twins are molecules can lead to subtle changes in the phenotype
regarded as the golden yardstick for deciding whether which can be expressed anatomically as tissue dysgen-
a disease is predominantly determined either by esis or physiologically as functional deficits. Such
genetic or, on the other hand, environmental factors. mechanisms appear to be highly conserved between
Lichtenstein et al. (2000) reported a study of the con- species, making animal studies potentially more com-
cordance of similar cancers in 45,000 pairs of identical parable across species than for adult high-dose
twins. The average rate of concordance was rather low toxicology.
10–15%. The conclusion from the study was that Hitherto, classical toxicology has mainly been pred-
environmental factors far outweigh genetic influences icated on adult animal studies. Teratological studies
in the etiology of cancer. Czene et al. (2002) reported have mainly been restricted to the detection of naked
similar conclusions. The concept should not be eye malformations. Why is this of relevance to the
surprising, because Doll and Peto (1981) showed, in study of environmental causes of cancer? It is known
their study of cancer incidence in native Japanese that any child born with a malformation is marginally
compared to émigré Japanese in Hawaii, that breast more likely than average to develop cancer in their
cancer rates were four times higher in the emigrants, lifetime. There is a connection, as yet not fully under-
approaching the incidence prevailing in the indigenous stood, between the ways that tissues are assembled in
Cancer and Environmental Influences 175 C
the fetus and the subsequent development of cancers. exposed in the womb to the EDS bisphenol A between
This process is normally associated with long gestational day 9 and postnatal day 1, at concentrations
latencies. 200 times lower than the current regulatory tolerable
daily intake. The exposed rats developed such tumors
Cell Signaling Disruption in the Fetus and Cancer in early adulthood, while there were no such changes in
The most widely studied group of chemicals capable of the breasts of the untreated control group. Morpholog-
affecting the function of natural cell signaling mole- ical changes to breast architecture in mice were shown, C
cules are classed as endocrine-disrupting substances Markey et al. (2001), at even lower doses.
(EDSs). There is general agreement that high-dose The Endocrine Society Scientific Statement was
therapeutic exposure to synthetic hormones at critical less emphatic about the role of EDS in male genital
stages of development can cause cancer, for example, tract tumors. However, the testicular dysgenesis syn-
intrauterine exposure to diethylstilboestrol (DES) led, drome (TDS) is of relevance. TDS consists of four
after a 20-year latency period, to clear cell adenoma interlinked conditions: cryptorchidism, hypospadias,
of the vagina in female offspring. However, a recurring oligospermia, and testicular germ cell cancer
argument against the causal link between environmen- (TGCC). The incidence of all four of these conditions
tal exposure to such environmental pollutant EDS and has been rising sharply over the past decades in many
subsequent carcinogenesis, predicated on classical developed countries. TGCC is typically a cancer of
high-dose toxicology, is that pollutants are simply not younger men, the peak age range for incidence is
present in sufficient environmental quantities to lead to between 25 and 40. When a child has a cryptorchid
a human exposure level that could cause cancer. testis, then it is either brought down into the scrotum by
Another aspect is that many EDS are not genotoxic, orchidopexy or removed (orchidectomy). The reason
which under the somatic mutation theory (SMT) (see for this is that an intra-abdominal undescended testis is
entry ▶ Cancer Theories), Boveri (1929), would put dysgenic and carries a higher risk of developing
a question mark over their ability to cause cancer. Such TGCC.
conclusions, however, are usually based on the logic of Skakkebaek et al. (2001) examined bilateral biop-
cancer induction toxicology performed on adult labo- sies, one from each testis, in cases of unilateral TGCC
ratory animals. demonstrated carcinoma in situ in both testes. The
Human evidence of links between exposure to envi- conclusion drawn was that TGCC cancer is
ronmental pollutants, at background, and cancer is a developmental disease. They also demonstrated that
beginning to emerge, as illustrated in a paper by hypofertile males were more likely to develop TGCC
Hardell and Eriksson (2003) which showed a positive than average for the population.
correlation between higher current levels of PCBs, Epidemiological evidence for the hypothesis that
HCB, and chlordanes (long lived pollutants) in human exposure to EDCs causes TGCC remains
mothers with sons who have developed testicular can- scarce. There are some indications that exposure to
cer. However, the majority of our current knowledge PCBs and mono-butyl phthalate may be implicated
concerning low-dose fetal exposure to EDS and sub- but the evidence is not conclusive. Migration studies
sequent cancer development comes from animal have shown that first generation migrants retain the
studies. incidence level of TGCC that prevails in their coun-
The Endocrine Society has published a Scientific tries of birth, while second and subsequent generations
Statement on endocrine-disrupting chemicals had rates similar to their adoptive country. This gives
(Diamanti-Kandarakis et al. 2009). When considering strong evidence that TGCC is a developmental condi-
the role played by EDS in the etiology of breast cancer tion and that it is mediated via environmental factors.
the report concludes that “Collectively, these data sup-
port the notion that endocrine disruptors alter mam- Mechanisms of Cancer Induction
mary gland morphogenesis and that the resulting It is clearly beyond the scope of this entry to discuss all
dysgenic gland becomes more prone to neoplastic the evidence on EDSs and cancer induction. However,
development.” The study in question, Murray et al. the point of the previous section has been to empha-
(2007), was the demonstration of the development of size that a low-dose mechanism mediated through
carcinoma in situ of the breasts in 33% of fetal rats fetal exposure can and does lead to tissue dysgenesis.
C 176 Cancer and Environmental Influences
This is essentially a non-genotoxic process. How then congeners and enantiomers (mirror image isomers)
does it relate to the somatic mutation theory (see entry are taken into account (Vetter and Luckas 1995,
▶ Cancer Theories)? Well, maybe not very much! An 1998). Other groups of organic chemicals also have
alternative hypothesis has been proposed by high numbers of variants. We simply do not possess
Sonnenschein and Soto (1999), the tissue organization the tools in toxicology to analyze complex mixtures.
field theory (TOFT), which provides a more logical Even testing mixtures of two chemicals at a time is
explanation of low-dose fetal exposure mechanisms quite laborious (Axelrad et al. 2002). There are some
(see entry ▶ Cancer Theories). Under this hypothesis, bio-monitoring methods being developed for estimat-
it is proposed that, as in other life forms, all cells in the ing the total dioxin-like activity or estrogen-like activ-
body are in a “default” state of proliferation, rather ity of mixtures (Murk et al. 1996; Soto et al. 1997), but
than “quiescence,” as previously assumed. In these do not identify individual components of the
multicellular organisms, where cells must exist in an mixture. However, many of these components are
ordered “society of cells,” the desire to proliferate has acknowledged to be carcinogenic or cancer promoters.
become continuously suppressed by the evolution of Under such circumstances, the power of epidemiol-
inhibitory juxtacrine and autocrine (i.e., local hor- ogy to determine outcome against exposure in human
monal) influences. This local control is intimately populations is rather weak and requires enormous bio-
bound up with local architecture of the tissue at monitoring programs involving hundreds of thousands
a microscopic level. The balance of the stroma to the of participants. Some such programs are getting under
parenchyma and the relationship of one cell type to way but will not report on cancer outcomes for several
another in 3D space are probably of high significance decades. In the meantime, Sir Bradford Hill
(Kaufman and Arnold 1996). The TOFT proposes that (1960) criteria on causation are worth revisiting. He
carcinogens alter tissue interactions, such as those gave a list of criteria that can be considered when
occurring between the stroma and the epithelium, that addressing the causation of disease in occupational
in turn cause the architectural changes found in carci- medicine. Now it is realized that they have much
nomas, which, in turn, facilitates an increased local cell wider significance and applicability to other fields of
proliferation rate. The control of the micro- medicine. It is not essential that all the criteria be
architecture of organs by epigenetic factors (external fulfilled to establish causation. We discuss each
influences), throughout the developmental period, is Bradford Hill criteria with respect to the etiology of
a delicate process. Some of these factors are known cancer from environmental pollution, in order of
but most are not fully understood. Hormones, which decreasing strength, as previously considered by
are known to be involved in the control of organogen- Nicolopoulou-Stamati, Howard and Gaudet (2004).
esis, act in concentrations in the parts per trillion 1. Temporal sequence: The suspected causal agent
ranges. Xeno-chemicals, which may mimic hormones, must appear in the environment prior to the effect
are present in the populations of industrialized coun- it is assumed to be causing. Preindustrialization
tries at similar concentrations. From our current levels of cancer were low. There has been a steady
knowledge base, they are likely to be inducing tissue increase in cancer incidence following the start of
dysgenesis in the fetus. industrialization. The incidence of certain hormone
related cancers has increased rapidly following the
Complexity introduction of EDSs into the general environment,
Another feature of human exposure to potential car- and there is universal exposure of the population.
cinogens is the extreme complexity of the mixture of This provides strong evidence of causality related
xeno-chemicals to which populations are exposed, to industrial development as measured by this
which consists of hundreds of chemical groups. If criterion.
these groups are broken down into individual com- 2. Experimental evidence: There is strong experimen-
pounds, then there are tens of thousands of individual tal evidence that many of the chemical pollutants in
chemicals in the mixture. For example, the organo- the environment and the food chain are carcino-
chlorine pesticide toxaphene, which in many classifi- genic in animal models. To this, we must add radi-
cations would be regarded simply as one chemical, has ation exposure as a major influence, where the
over 62,000 theoretically possible variants if all experimental evidence is overwhelming.
Cancer and Environmental Influences 177 C
3. Biological plausibility: Exposure to environmental exposure and pleural mesothelioma and radiation
pollution commences pre-conceptually and con- and leukemia. However, apart from the undisputed
tinues throughout all stages of life. This exposure rise in cancer incidence and its temporal sequence
therefore covers vulnerable windows of develop- considered in association with the rise in environ-
ment. There is very high biological plausibility mental pollutant exposure, added to the small num-
that exposure to a mixture of known carcinogens ber of published reports discussed in the biological
would be causally related to the postindustrial gradients section, there is in general weak associa- C
increase in cancer incidence. Furthermore, cancer tion. However, this is largely attributable the lack of
incidence would be age related as this is a measure specificity of association with a complex mixture of
of the length of exposure to these influences. influences.
4. Reasoning by analogy: We can reason that pollut- 8. Specificity: This is the weakest of the associations
ants that are known to cause cancer in animal with the Bradford Hill criteria with respect to the
models are likely to cause cancer in humans. This environmental causation of cancer. The reasons for
is indeed the basis of animal testing for the licensing the lack of specificity are clear; humans are exposed
of pharmaceutical and agrochemical agents, and to diffuse and multiple influences, including radia-
therefore the analogy is widely accepted by tion, chemicals, lifestyle choices, and changes in
governments. dietary habits. Epidemiology, one of the main
5. Coherence with biological background and tools for making specific associations in human
previous knowledge: We know that there has been disease, is a relatively weak tool when a mixture is
environmental contamination with bioactive sub- the causal factor under examination, given also that
stances which, prior to their invention and produc- there is very little fetal exposure data. This is par-
tion, had never been part of the environment. ticularly true if the effect is to alter the frequency in
Animal detoxification systems are generally well the population of a common condition. The power
adapted to natural biochemicals that they have of epidemiological studies is further weakened by
coevolved with, which is often not the case with the fact that there are no true control groups, that is,
novel xeno-chemicals. Bioaccumulation is an everybody has some degree of contamination with
example of such an incompatibility. In addition, such pollutants. Therefore, although this criterion
the effects of low-dose endocrine-disrupting cannot be met, there are perfectly good reasons for
chemicals on development, a newly discovered understanding why it cannot be met, and they
mechanism, have to be considered in the context should not necessarily be used as a reason for
of non-genotoxic mechanisms of carcinogenesis. avoiding the use of precaution. However, opponents
There is considerable prior biological knowledge of the theory of the environmental etiology of can-
to support a coherent theory for such exposures to cer adopt this very lack of specificity as the main
be able to lead to cancer. The causation of cancer plank of their arguments.
by environmental pollutants is in keeping with this It appears therefore that five out of the eight
criterion. Bradford Hill criteria support the argument that
6. Biological gradient (dose-response relationship): environmental pollutants are causally linked to the
Dose-response relationships to chemical pollutants rise in human cancer incidence. Of the remaining
have been demonstrated in animal studies, though three criteria, one offers some evidence, and the
in the case of EDS they frequently are non- other two are weak. However, the reasons for this
monotonic. Data on humans is weak, with many weakness are perfectly understandable, and the
gaps in knowledge. There are many confounding inability to detect any effect does not preclude its
factors, such as the complexity of the mixture to presence but is more likely a commentary on the
which the population is exposed. Dose-response unsatisfactory nature of the investigatory tools that
relationships to radiation exposure are strong and we have at our disposal.
well documented.
7. Strength of association: This is generally a weak Conclusions
criterion for general chemical pollution. Exceptions • The incidence of cancer in preindustrial societies
are, for example, the relationship between asbestos was low.
C 178 Cancer and Environmental Influences
• Since the beginning of the industrial revolution, Doll R, Peto R (1981) The causes of cancer. Oxford University
there has been an inexorable rise in human cancer Press, Oxford
Hardell L, Eriksson M (2003) Is the decline of the increasing
incidence, which is still continuing. incidence of non-Hodgkin lymphoma in Sweden and other
• The fact that environmental rather than genetic countries a result of cancer preventive measures? Environ
influences predominate in the causation of cancer Health Perspect 111:1704–1706
has been shown through human twinning studies. Hoffman FL (1923) Cancer and civilization, speech to Belgian
National Cancer Congress at Brussels (cited by Stefansson)
• Low-dose exposure to cell signaling disrupting Howard CV, Newby JA (2004) Could the increase in cancer
chemicals during windows of vulnerability in the incidence be related to recent environmental changes?
fetal and infant stages of development, resulting in In: Nicolopoulou-Stamati P et al (eds), Cancer as an
tissue dysgenesis, appears to be an important influ- Environmental Disease. Kluwer Academic Publishers,
pp 171–199
ence in adult cancer formation. Irigaray P, Newby JA, Clapp R, Hardell L, Howard CV,
• Cancer is a disease associated with industrializa- Montagnier L, Epstein S, Belpomme D (2007) Lifestyle-
tion. It is likely that there is a predominantly epige- related factors and environmental agents causing cancer: an
netic (non-genotoxic mechanism) influence; this overview. Biomed Pharmacother 61:640–658
Kaufman DG, Arnold JT (1996) Stromal-epithelial interactions
means that the process should be reversible if the in the normal and neoplastic development. In: Sirica AE (ed)
causative agents can be removed from the Cell and molecular pathogenesis. Lippincott-Raven, Phila-
environment. delphia, pp 403–432
• The proportion of cancers caused by “lifestyle fac- Lichtenstein P, Holm NV, Vrkasalo PK (2000) Environmental
and heritable factors in the causation of cancer. N Engl J Med
tors” as opposed to “external factors” remains to be 342:78–85
determined. However, the current state of knowl- Markey CM, Luque EH, Munoz De Toro M, Sonnenschein C,
edge indicates that the statement by the late Sir Soto AM (2001) In utero exposure to bisphenol A alters the
Richard Doll that only 1–4% of cancers can be development and tissue organization of the mouse mammary
gland. Biol Reprod 65:1215–1223
attributed to environmental pollution is a gross Murk AJ, Legler J, Denison MS, Giesy JP, van de Guchte C,
underestimate. Even using Doll’s estimate, the Brouwer A (1996) Chemical-activated luciferase gene
basis of which was never adequately explained expression (CALUX): a novel in vitro bioassay for Ah recep-
would mean that many millions of cancers are tor active compounds in sediments and pore water. Fundam
Appl Toxicol 33(1):149–160
already attributable to environmental chemical pol- Murray TJ, Maffini MV, Ucci AA, Sonnenschein C, Soto AM
lution. The likelihood is that a much higher propor- (2007) Induction of mammary gland ductal hyperplasia and
tion of human cancers, maybe a majority, is carcinoma in situ following fetal bisphenol A exposure.
attributable to external factors. Reprod Toxicol 23:383–390
Newby JA, Busby CC, Howard CV, Platt MJ (2007) The cancer
incidence temporality index: an index to show the temporal
changes in the age of onset of overall and specific cancer
Cross-References (England and Wales, 1971–1999). Biomed Pharmacother
61:623–630
Nicolopoulou-Stamati N, Howard CV, Gaudet BAJ (2004)
▶ Neoplasia Re-evaluation of priorities in addressing the cancer issue:
consclusions, strategies, prospects. In: Nicolopoulou-Stamati
P et al (eds) Cancer as an Environmental Disease. Kluwer
Academic Publishers, pp 171–199
References Skakkebaek NE, Meyts ER-D, Main KM (2001) Testicular dys-
genesis syndrome: an increasingly common developmental
Axelrad JC, Howard CV, McLean WG (2002) Interactions disorder with environmental impacts. Hum Reprod
between pesticides and components of pesticide formulations 16:972–978
in an in vitro neurotoxicity test. Toxicology 173:259–268 Sonnenschein C, Soto AM (1999) The society of cells. Taylor &
Bradford-Hill A (1966) The environment and disease: Francis, London, UK
association or causation? Proc R Soc Med 58:295 Soto AM, Fernandez MF, Luizzi MF, Oles Karasko AS,
Czene K, Lichtenstein P, Hemminki K (2002) Environmental Sonnenschein C (1997) Developing a marker of exposure to
and heritable causes of cancer among 9.6 million individuals xenoestrogen mixtures in human serum. Environ Health
in the Swedish Family-Cancer Database. Int J Cancer Perspect 105:647–654
99:260–266 Stefansson V (1960) Cancer: disease of civilization? An anthro-
Diamanti-Kandarakis E et al (2009) Endocrine-disrupting pological and historical study. Hill and Wang, New York
chemicals: an endocrine society scientific statement. Endocr Vetter W, Luckas B (1995) Theoretical aspects of
Rev 30(4):293–342 polychlorinated bornanes and the composition of toxaphene
Cancer Drug Discovery 179 C
in technical mixtures and environmental samples. Sci Total disease whose pathogenesis involves genetic muta-
Environ 160–161:505–510 tions as well as epigenetic changes, leading to altered
Vetter W, Scherer G (1998) Variety, structures, GC properties,
and persistence of compounds of technical toxaphene gene expression, signaling, and metabolic activity. The
(CTTs). Chemosphere 37:2525–2543 enormous individual variation in these changes adds to
the challenge of developing effective targeted drugs
for even a single known cancer type. Currently, the
most common drug treatment for cancer is through the C
Cancer Cell Cycle use of low-specificity cytotoxic chemotherapy drugs
that affect different aspects of cellular function and
▶ Cell Cycle, Cancer Cell Cycle and Oncogene whose effects are particularly damaging to fast-
Addiction dividing cells. In many cases, tumor cells either are
or become resistant to such drugs, resulting in limited
efficacy. Another, more recent approach has been to
develop drugs that target abnormalities specific to
Cancer Cenes
tumor cells, such as mutations in specific receptors.
Several such drugs are in use that affect intracellular
▶ Cancer Networks
signaling pathways altered in cancer (see, e.g.,
(Alvarez 2010)). They take advantage of our increased
knowledge about the link between genetic lesions and
Cancer Drug Discovery the capabilities normal cells acquire as they undergo
the transition to become malignant cells.
Reinhard Laubenbacher Two relatively recent developments in biology
Virginia Bioinformatics Institute, Virginia Tech, research promise a paradigm change in cancer drug
Blacksburg, VA, USA development: systems biology and cheap high-
throughput sequencing. The fundamental relevance
of systems biology to the understanding and treatment
Definition of cancer is the insight that genes and proteins do not
act in isolation, but rather as nodes in complex inter-
Development of anticancer drugs provides a crucial active networks that include multiple feedback mech-
component of the repertoire of therapeutic interven- anisms and redundancies. The design of effective
tions, in addition to surgery and radiation therapy. drugs to battle cancer will depend on the understanding
While existing drugs have led to important treatment of these networks and of the specific network alter-
advances, much remains to be done to discover new ations present in an individual tumor (Goodarzi et al.
drugs that effectively target key molecular mechanisms 2009; Erler and Linding 2010; Klipp et al. 2010). This
contributing to cancer pathogenesis. This is only possi- will allow the discovery of tailor-made combination
ble through an understanding of the complex nonlinear therapies that target specific network perturbations.
molecular networks that mediate drug actions and are High-throughput sequencing has made possible the
responsible for the de novo or acquired resistance of identification of the genetic changes in an individual
tumor cells to many existing drugs. The field of systems patient, which provide crucial information about how
biology is focused on precisely such networks and to target the network. Thus, systems biology is affect-
therefore holds the promise of changing the paradigm ing a paradigm shift away from single proteins as the
of drug discovery from one focused on individual pro- target of drugs toward the network as a target (Baggs
tein targets to one that targets networks instead. et al. 2010). The result will be what has been called
systems pharmacology (Yang et al. 2010), a systems
biology approach to drug design. Its hallmark will be
Characteristics combination therapy, guided by an analysis of the
complex signaling networks involved in malignant
As our understanding of the molecular basis of cancer changes (Wu et al. 2010). Mathematical modeling
grows, a picture emerges of a complex multi-faceted and computer simulation are essential tools both for
C 180 Cancer Drug Discovery
the understanding of the nonlinear dynamic networks (EGFR) or 2 (HER2) the ErbB receptor is activated,
involved, and for the study of effective control mech- followed by a cascade of phosphorylation events that
anisms (Feala et al. 2010). regulate apoptosis and cellular proliferation pathways,
The two case studies below illustrate the systems mediated by cyclin-dependent kinases. The drug
biology approach to cancer drug design. The first, trastuzumab, approved by the U.S. Food and Drug
focused on breast cancer, shows how one can use Administration in 1998, is a recombinant humanized
systems biology to address the limited efficacy of monoclonal antibody that binds with high affinity to
existing drugs by identifying new additional targets the extracellular domain of the HER2 receptor, and is
that can be used to amplify their effect. The second, applied in patients that exhibit HER2 overexpression.
focused on melanoma, shows how an understanding of However, at least two thirds of patients are de novo
signaling networks can lead the way to appropriate resistant to the effects of this drug. One possible expla-
choices for combination therapy. nation for this resistance on the cellular level could be
that cancer cells are able to exploit the redundancy of
CASE STUDY 1: Breast cancer drugs the signaling network between EGFR and the proteins
There are currently three types of drug therapy avail- controlling cell cycle progression, primarily the reti-
able for breast cancer patients: conventional cytotoxic noblastoma protein pRB, to avoid cell cycle arrest.
chemotherapy, hormone-based therapy, and targeted Thus, additional targets within this network need to
therapy based on monoclonal antibodies (Alvarez be identified in order to overcome resistance. In other
2010) The first two drug types are systemic, in the words, instead of individual proteins, the network has
sense that their effects are not specifically restricted to become the target.
to cancer cells. For instance, cytotoxic chemotherapy In (Sahin et al. 2009) this is done through the use of
drugs inhibit cell division in some way or target DNA a dynamic mathematical model of the network.
repair mechanisms or interfere with metabolism, and A computer implementation of the model enables sim-
are particularly damaging to fast-dividing cells. While ulation studies that can explore the effect of targeting
this includes cancer cells, in particular small, fast- other proteins in the network for their therapeutic
growing tumors, it also includes normal cells in, e.g., potential. New potential targets were identified,
the bone marrow, intestinal tract, and hair follicles. through in silico experiments that were then confirmed
What fundamentally distinguishes targeted therapy in the laboratory or the literature, including cyclin D1
drugs from the others is that they are based on cellular and CDK4, as well as transcription factors c-MYC and
abnormalities specific to some cancer cells, and ER-a. These could be exploited either instead of or in
thereby represent an entirely new and more sophisti- combination with HER2 to attack the entire network
cated class of chemotherapeutics. For breast cancer structure.
cells, an important molecular target that has been iden-
tified is the human epidermal growth factor receptor 2 CASE STUDY 2: Melanoma drugs
(HER2). Systems biology can play a key role in iden- Melanoma is a very aggressive cancer with few effec-
tifying such targets and understanding the network of tive treatment options. However, a recent increased
interconnected pathways that propagate their signals understanding of the genetic events underlying mela-
and their redundancy that helps cancer cells to develop noma pathogenesis and the signaling pathways
resistance to drugs that interfere with their action. The affected by mutations common in melanoma has
example of the drug trastuzumab that targets the HER2 opened the door to a systems biology approach to the
pathway and its role in controlling the cell cycle is discovery of targeted therapeutics. The review (Ko and
instructive. Fisher 2010) surveys some of these developments and
Members of a family of receptor tyrosine kinases, makes clear the need for combination therapies that
the ErbB family, are frequently overexpressed in epi- target the entire signaling network rather than individ-
thelial tumors and promote tumor cell proliferation in ual pathways, similar to the case of breast cancer
several cancers, including breast cancer. Four homol- described above.
ogous members of the HER receptor family have been Several signaling cascades have been discovered as
identified to date, HER1-4 (also known as ErbB1-4). important in melanoma. One is the RAS-RAF-MAPK-
After ligand binding to ErbB family members 1 ERK growth factor pathway. In virtual all human
Cancer Networks 181 C
melanoma cases this signaling cascade has been altered ▶ c-Myc
in some way, in particular through NRAS or BRAF ▶ Interaction Networks
mutations. Several agents targeting proteins in this ▶ Network-Based Biomarkers
pathway are available, such as the BRAF inhibitor ▶ Signal Transduction Pathway
PLX4032, currently in clinical trials. Another pathway, ▶ Systems Medicine
also subject to RAS signaling, is the PI3K-AKT-mTOR ▶ Systems Pharmacology, Drug-Target Networks
pathway, whose activation has the effect of suppressing C
apoptosis. Several small molecule inhibitors are avail-
able that target mTOR signaling. It is thought that none References
of the actors in these pathways stand alone in the
pathogenesis of melanoma, and it is therefore likely Alvarez RH (2010) Present and future evolution of
advanced breast cancer therapy. Breast Cancer Res
that no single-agent approach to this disease will be
12(Suppl 2):S1
successful in affecting a cure. The signaling network Baggs JE, Hughes ME et al (2010) The network as the target.
including the proteins in the two pathways mentioned Wiley Interdiscip Rev Syst Biol Med 2(2):127–133
here is complex, with feedback loops and redundant Erler JT, Linding R (2010) Network-based drugs and
biomarkers. J Pathol 220(2):290–296
pathways, thus, combination therapies targeting several
Feala JD, Cortes J et al (2010) Systems approaches and
network nodes at the same time hold the most promise algorithms for discovery of combinatorial therapies. Wiley
for success. Here too, detailed computer models of the Interdiscip Rev Syst Biol Med 2(2):181–193
network are essential tools for the discovery of effec- Goodarzi H, Elemento O et al (2009) Revealing global
regulatory perturbations across human cancers. Mol Cell
tive drug combinations. Probably, the computer model
36(5):900–911
will ultimately have to be enlarged to also take into Klipp E, Wade RC et al (2010) Biochemical network-
account other apoptosis pathways as well as growth based drug-target prediction. Curr Opin Biotechnol
factor signaling. 21(4):511–516
Ko JM, Fisher DE (2010) A new era: melanoma genetics and
therapeutics. J Pathol 223(2):241–250
Sahin O, Frohlich H et al (2009) Modeling ERBB receptor-
Discussion regulated G1/S transition to find novel targets for de novo
trastuzumab resistance. BMC Syst Biol 3:1
Wu Z, Zhao XM et al (2010) A systems biology approach to
The experience with cancer drugs that target individual
identify effective cocktail drugs. BMC Syst Biol
proteins suggests that combination therapies have the 4(Suppl 2):S7
greatest promise of being effective. To identify the Yang R, Niepel M et al (2010) Dissecting variability in responses
right combination of targets for a particular tumor it to cancer chemotherapy through systems pharmacology. Clin
Pharmacol Ther 88(1):34–38
is essential to understand the signaling network that is
perturbed through mutations and that mediates the
drugs’ action. It is the aim of systems biology to
provide the link between an organism’s genomic
sequence and its phenotype through an understanding Cancer Networks
of the various dynamic networks that connect the two.
Thus, a systems biology approach to drug discovery Eric Werner
and design of combinatorial therapies is indispensable. Department of Physiology, Anatomy and Genetics,
University of Oxford, Oxford, UK
Department of Computer Science, University of
Cross-References Oxford, Oxford, UK
Oxford Advanced Research Foundation, Fort Myers,
▶ Adult T-Cell Leukemia FL, USA
▶ Apoptosis
▶ Cdk Inhibitors
▶ Cell Cycle Checkpoints Synonyms
▶ Cell Cycle, Cancer Cell Cycle and Oncogene
Addiction Cancer cenes; Cenes; Developmental cancer networks
C 182 Cancer Networks
Definition a1
Φ2 → A2 Φ1 → A1 B C
a1 C
b
Cancer Networks, Fig. 2 A cancer meta-stem cell network is the conditions Fi may be locked into being satisfied, so that we
a second-order geometric network. It consists of a second-order have endless cell proliferation. In an ideal space with no resis-
loop at A2 linked to a first-order loop at A1 (Werner 2011b). tance, synchronous divisions, and when both conditions Fi are
A cell in state A2 is a cancer meta-stem cell that generates satisfied, this network generates a triangle
first-order linear cancer stem cells. In cancer, one or both of
ak a k−1 a1
Limited Exponential Growth with Geometric tumor formed by a third-order stem cell, we will see
Networks second-order stem cells and first-order stem cells
When the number of rounds of division n is less than with a majority being terminal or terminal progenitor
the number of loops k, i.e., 0 n k, then a geometric cells (Fig. 4).
stem network exhibits exponential growth since the First-order cancer stem cells are slow growing with
following holds: linear proliferation rates and can be relatively harmless
if there is no stochastic dedifferentiation. A second-
! ! ! ! !
X
n n n n n n order stem cell tumor grows more quickly (adding the
¼ þ þ þ ... þ triangular number) with an ideal proliferation rate
i 0 1 2 n
i¼0
given by the following formula, given n > 0:
¼2 n
(3)
nðn 1Þ
Cellsðn; 2Þ ¼ 1 þ n þ (4)
2
Metastatic Hierarchy
The stem cell network hierarchy generates a hierarchy A third-order stem cell tumor grows even more
of cancer metastases (Werner 2011b). A first-order quickly (adding the tetrahedral number) with an ideal
cancer stem cell generates terminal cells. proliferation rate as follows:
A second-order cancer stem cell generates first-order
stem cells which then generate terminal progenitor nðn 1Þ
Cellsðn; 3Þ ¼ 1 þ n þ
cells. A third-order stem cell generates second-order 2
stem cells which generate first-order stem cells which nðn 1Þðn 2Þ
þ (5)
generate terminal progenitor cells. Hence, in a given 6
C 184 Cancer Networks
a1
A B C
a2
Cancer Pathology,
Fig. 1 Classification scheme
for identification of masses or Tumor (mass of tissue)
“tumors”
Benign Malignant
“Cancer”
Neoplasia
Neural crest cells give rise to pigmented melanin- germ cells may give rise to tumors of embryonic
producing cells called melanocytes. A benign tumor of origin such as teratomas, choriocarcinomas, or even
melanocytes is called a “nevus” or melanocytoma. yolk sac tumors.
A malignant melanocytic cancer is called a melanoma
or malignant melanoma. Hematopoietic tumors
Special consideration and nomenclature is also Hematopoietic tumors are also referred to as “liquid
used for tumors of specialized tissue such as the tumors” and are of blood-based origin. This category
nervous system and the gonads. In the brain, tumors includes cells that make up the immune white blood
may be classified as neuroepithelial and non- cell system such as the lymphocytes and granulocytes
neuroepithelial tumors and have special stains to dif- and red blood cell systems such as erythrocytes and
ferentiate their origin. In the gonads, the totipotent megakaryocytes. Terms such as hyperplasia,
Cancer Pathology 187 C
Cancer Pathology, Table 1 Examples of tumor nomenclature by tissue of origin and behavior
Benign Malignant
Epithelial Adenoma Carcinoma
Papilloma
Polyp
Of tissue Skin Papilloma Squamous cell carcinoma
Basal cell carcinoma C
Gastrointestinal tract Polyp or adenoma Adenocarcinoma
Bladder Polyp Transitional cell carcinoma
Kidney Oncocytoma Renal cell carcinoma
Clear cell variant
Papillary carcinoma
Juxtaglomerlar Tumor
Renal pelvis squamous cell carcinoma
Liver hepatocytes Hepatoma Hepatocellular carcinoma
Liver bile duct Cholangioma Cholangiocarcinoma
Mesenchymal “oma” Sarcoma
Of tissue Connective tissue Fibroma Fibrosarcoma
Smooth muscle Leiomyoma Leiomyosarcoma
Skeletal muscle Rhabdomyoma Rhadomyosarcoma
Endothelial cells Hemangioma/angioma Hemangiosarcoma or angiosarcoma
Cartilage Chondroma Chondrosarcoma
Bone Osteoma Osteosarcoma
CNS Neuroepithelial
Astrocytes Astrocyoma Anaplastic astrocytoma
Glial cells Glioblastoma multiforme
Oligodendrocytes Oligodendroglioma Anaplastic olgiodendroglioma
Ependymal cells Ependymoma Anaplastic epednymoma
Choriod plexus Papilloma Carcinoma
Embryonic origin Medulloepithelioma
dysplasia, and neoplasia are similarly used to define or “trimming” the mold with the tissue; and then plac-
disorders of altered architecture of these lineages as ing the tissue on a glass slide. The tissue will then be
well. Cancer of lymphoid organs is termed lym- stained for visual examination. Preservatives include
phoma. Cancer of circulating bone marrow cells is 10% buffered formalin, which is the most common
termed leukemia. routinely used tissue preservative. Other preservatives
that may be used are methanol or 80% alcohol, 2% or
Diagnosis 4% paraformaldehyde, glutaraldehyde, and “special”
The diagnosis of a mass is based on the gross appear- fixatives such as Bouin’s fixative, which includes
ance and behavior of the tumor and the histological picric acid. Tissues may also be frozen in a container
appearance (Fig. 3). in liquid nitrogen. The tissue is then stained –
For microscopic assessment, the tumor is sampled hemotoxylin and eosin (or H&E) staining is most com-
during an in-life surgical procedure called a biopsy or monly used on a routine basis.
after death as an autopsy. The preparation of the tumor The determination of whether the mass is hyper-
includes fixation of the mass in a preservative; plastic, dysplastic, or neoplastic, of epithelial, mes-
processing the tissue through a series of alcohol dehy- enchymal, or mixed (or “other”) origin, and whether
dration steps to remove water and replace it with it has features of a benign or malignant tumor is based
a transparent harder material which is usually wax on the microscopic pattern using criteria including
paraffin; embedding the tissue in a mold; sectioning the tissue of origin, level of organization, or
C 188 Cancer Pathology
Cancer Pathology, Fig. 3 (a) Photograph of liver with a tan irregularly nodular and cavitated mass. A section of mass is dissected,
fixed in a preservative (formalin), and processed to make a tissue section and stained for microscopic examination. (b) Photomicro-
graph of a tissue section of the mass in (a) stained with hemotoxylin and eosin (H&E). The pattern and location is consistent with
a diagnosis of cholangiocellular carcinoma, a neoplasm arising from the biliary epithelium
Cancer Pathology,
Fig. 4 Photographs of
examples of (a) normal
(arrow) and hyperplastic (*)
and (b) dysplastic squamous
epithelium of mucosa; (c)
Normal mucosa intestinal
epithelium compared to (c)
adenoma or polyp
characterized by disorganized
proliferation of epithelium
supported by a fibrovascular
stalk of connective tissue; and
(d). Adenocarcinoma
characterized by disorganized
proliferation of pleomorphic
epithelium cells invading into
the subjacent submucosa and
muscular wall of the intestine
(box) and (e) higher
magnification of invading
carcinoma
differentiation and level of containment (Fig. 4). The tissue, and whether the pattern most closely resem-
mass is identified and described as to whether it has bling the tissue or cell of origin. For example, neo-
a connective tissue capsule or not, whether it is plastic epithelial cells tend to arrange as glands of
compressing adjacent tissue or invading adjacent acini or tubules, nests or islands. Neoplastic
Cancer Pathology 189 C
Cancer Pathology,
Fig. 5 Photograph of
a section of H&E stained
breast tissue diagnosed as
carcinoma in situ (a) with
intraductular but not invasive
masses of disorganized
pleomorphic epithelial cells
that are confined to the C
basement membrane boxed
and shown in (b)
Cancer Pathology,
Fig. 6 Photographs of
biopsies from (a) H&E stained
section of a poorly
differentiated neoplastic
cancer confirmed to be
a carcinoma based on
identification of (b) positive
brown cytoplasmic staining in
a section stained for
cytokeratin intermediate
filaments using
immunohistochemistry; (c)
H&E stained section of
a intersecting bundles of
spindle-shaped cells
confirmed to be a sarcoma
based on identification of (d)
positive brown cytoplasmic
staining in a section stained for
vimentin intermediate
filaments using
immunohistochemistry
Cancer Pathology,
Fig. 7 Photograph of a H&E
stained biopsy from an
enlarged lymph node
consistent with lymphoma and
confirmation of subtype of
a B-cell origin using
immunohistochemistry with
antibodies to CD79a, which is
expressed on B-cells, or CD3,
which is expressed on T-cells
but not B-cells
The stage of tumor is based on the size of the Metastasis followed by a numerical assignment for
primary tumor and the extent and spread to each category. The designations are: T1-T4; N1-N3;
regional lymph nodes or blood-borne metastasis. and M1, 2.
The staging of tumor is designated by the abbrevia- For carcinoma in situ, the stage would be designated
tion “TNM” for Tumor size, regional Node, and as T0.
Cancer Resources 191 C
References Cesario and Marcus (2011) on cancer systems biology,
bioinformatics, and medicine. This book describes
Rosai J (2011) Rosai and Akerman’s surgical pathology, systems approaches to cancer research and spans
10th edn. Mosby Elsevier, Edinburgh
a variety of topics from laboratory to clinical aspects
Striker T, Kumar V (2009) Neoplasia. In: Aster J, Abbas A,
Kumar V, Fausto N (eds) Robbins and Cotran pathological of cancer. In some publications (e.g., Yan 2010),
basis of disease, 8th edn. Saunders Elsevier, Philadelphia, cancer systems biology is included as a major book
pp 259–330 chapter. C
World Health Organization (WHO) Classification of
Several journals (e.g., BMC Systems Biology,
Tumours. Published by Interational Agency for Research
on Cancer. Cancer Discovery) provide information on latest
results about systems biology and cancer research.
BMC Systems Biology (http://www.biomedcentral.
com/bmcsystbiol/) is an open access journal published
Cancer Resources by BioMed Central. It includes original peer-reviewed
research articles on the functions of biological sys-
Jingky Lozano-K€uhne tems. The Cancer Discovery journal (http://
Department of Public Health, University of Oxford, cancerdiscovery.aacrjournals.org/) is a new cancer
Oxford, UK information resource that also publishes review arti-
cles aside from articles on major advances in cancer
research and clinical trials. Monographs and published
articles about cancer research are also provided by the
Definition International Agency for Research on Cancer (IARC)
which is part of the World Health Organization. Pub-
▶ Cancer resources provide information about cancer, lications of the IARC can be freely downloaded from
researches and clinical trials, and tools in cancer http://www.iarc.fr/.
research and analysis. Other cancer resources that are One comprehensive online resource on cancer is the
intended for patients and caregivers provide informa- Website of the National Cancer Institute (NCI, http://
tion on cancer pain management, support groups, www.cancer.gov/) which provides information on the
financial assistance, and palliative care. The selected different types of cancer, clinical trials, and cancer
cancer resources described here focus on resources for statistics. Beyond information, NCI also provides
health professionals, researchers, and students in the funding and support to cancer researches and
field of systems biology. These resources are available also conducts its own laboratory and clinical studies.
in different formats. There are printed publications, Another useful Website is the Cancer Systems
online references, databases, and also software pack- Biology Resource of the University of Utah (https://
ages for cancer data management and research cancersystemsbiology.utah.edu/) which provides links
analysis. to topics such as basic gene and protein information,
▶ Pathways, tissue distribution and ▶ Gene Expres-
sion information among others.
Characteristics There are numerous printed and online publications
on cancer in addition to the resources described above.
Printed Publications and Online References Many of them share common references and links as
Basic information about cancer can be found in many the ones already mentioned. One can also consult the
medical textbooks and other references. As a reference ▶ Disease Databases and specific cancer (e.g., http://
specific on ▶ Cancer Systems Biology, the book of www.breastcancer.org/) sites for searching additional
Wang (2010) provides not only basic concepts on details about cancer.
cancer and systems biology but also applications,
information on software tools, data resources, and Cancer Databases
a significant collection of research updates. This is Cancer databases may contain collated information
the first book written specially for computational and about different types of cancers, cancer genes, or
experimental biologists. Another reference is that of other clinical data related to cancer. As data sources,
C 192 Cancer Resources
they are useful especially for conducting research on while the Omics World provides links to resources on
analysis methods. One commonly used cancer genomics, genetics, experimental protocols, reagents,
database is the Cancer Chromosomes Database which instruments, and also software.
will be soon part of the dbVar database, a database of There are more data sources available online for
genomic structural variation. Both databases are cancer research. An attempt to collate and interconnect
maintained by the United States National Center for all these data was done by the National Cancer Institute
Biotechnology Information (NCBI). The Cancer (NCI) through the cancer Biomedical Informatics Grid
Chromosome Database is integrated into the (caBIG®, https://cabig.nci.nih.gov/). It was launched
NCBI ▶ Entrez system and is actually composed of by the National Cancer Institute to create a virtual
three sub-databases containing cytogenetic network of interconnected cancer data and people
(▶ Cytogenetics), clinical, and/or reference working together in cancer research. The caBIG® com-
information. munity Website has tools and facility for data sharing
The National Cancer Institute (NCI) provides and also workspaces where members can conduct vir-
access to cancer data and also tools for its analysis tual activities, discussions, and exchange of ideas.
in their statistics’ web page http://www.cancer.gov/ Membership to the network is open to the public
statistics/tools. For studies on cancer incidence and for free.
population data associated by age, sex, race, year of
cancer diagnosis, and geographic areas, their Surveil- Software for Data Management and Analysis
lance, Epidemiology, and End Results (SEER, NCI Many of the cancer data providers previously
2005) data are commonly used. For data that are mentioned also provide data management and analysis
more specific to genes and pathways, the data portal tools. For example, the SEER Website offers analysis
of the International Cancer Genome Consortium software for the SEER data and other cancer-related
(ICGC, http://www.icgc.org/) would be a useful databases for studying the impact of cancer in the
source. The Cancer Genome Atlas (TCGA, http:// population. The ICGC Data Portal also runs a freely
cancergenome.nih.gov/) also provides a data portal available software called BioMart which is used for
for researchers on cancer. The data available at data mining. The caBIG® Website offers quite
TCGA contains clinical information, ▶ Histopathol- a comprehensive collection of tools for cancer research
ogy slide images, and molecular information of high- (NCI 2009). One of the commonly availed tools in
quality tumor samples. For pathway analysis, the caBIG® is the caArray which is used for microarray
▶ KEGG Pathway database is a useful resource. For (▶ DNA Microarrays) data management system. The
interests in mutation data and sequencing (▶ DNA caIntegrator is another tool that allows users to set up
Sequencing) data, the COSMIC database (Forbes web portals for integrative search. For integrated
et al. 2008) which can be downloaded from http:// genomics, the geWorkbench tools can be used for
www.sanger.ac.uk/genetics/CGP/cosmic/ is a good visualization and analysis of gene expression and
reference. The Website also provides other cancer- sequence data. One can also find tools in caBIG® for
related data and information. For other data on molecular analysis, cancer genome-wide association
mutations, one can also check the SH2base site scan (▶ Genome-wide Association Study), clinical
(http://bioinf.uta.fi/SH2base/). trials management, and many more. Currently, more
Omics data or data in the field of biology ending in than 40 tools are available in caBIG®.
“–omics” (e.g., ▶ Pharmacogenomics, ▶ Proteomics) Other software packages and tools for specialized
are also commonly used in the study of cancer and analysis are also available online. Vital to cancer
systems biology. Omics data enable the study of com- systems biology research are computational platforms
plex pathways in cancer etiology. Example of online such as software for simulation, analysis, and method-
resources that provide information on omics data are ologies for modeling. For visualization, mining, anal-
the Biodatabase (http://biodatabase.org/) and the ysis, and modeling of biological networks, an
Omics World Websites (http://www.omicsworld. integrated software platform called VisANT
com/). The Biodatabase contains links to cancer gene (Hu et al. 2009) can be accessed at http://visant.bu.
databases, protein, genomic databases, and others, edu/. Additional resources on computational platforms
Cancer Stem Cell Kinetics 193 C
are also accessible at the Website of the Systems Language (SBML): a medium for representation and
Biology Institute (SBI, http://www.sbi.jp/index.htm) exchange of biochemical network models. Bioinformatics
19:524–531
based in Japan. SBI is a nonprofit private research National Cancer Institute (NCI) (2005) SEER surveillance, epi-
institution that promotes systems biology research demiology and end results program. National Institutes of
and its application to medicine and global sustainabil- Health, Bethesda
ity. They have played a major part in the development National Cancer Institute (NCI) (2009) The cancer biomedical
informatics grid caBIG resource guide. National Institutes of
of the Systems Biology Markup Language (SBML, Health, Bethesda C
Hucka et al. 2003) which is used in software packages Wang E (ed) (2010) Cancer systems biology, CRC mathematical
for graphical editing and simulation of molecular and computational biology series. CRC Press, Boca Raton
networks. Yan Q (ed) (2010) Systems biology in drug discovery and
development: methods and protocols. Humana Press/
One software which uses SBML is the CellDesigner Springer, New York
(http://celldesigner.org/), a structured diagram
editor for drawing gene regulatory and biochemical
networks (▶ Gene Regulatory Networks).
and practical shift among the supporters of the SMT Enderling H, Hahnfeldt P (2011) Cancer stem cells in solid
who adopted concepts of the TOFT, shared by other tumors: is ‘evading apoptosis’ a hallmark of cancer? Prog
Biophys Mol Biol 106:391–399
researchers (Xu et al. 2009). For example, for those Hahn WC, Weinberg RA (2003) Rules for making human tumor
who are committed to explaining carcinogenesis cells. N Engl J Med 347:1593–1603
through a sub-cellular (mutational) strategy, the causal Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next
role of the ▶ stroma in carcinogenesis is transformed generation. Cell 144:646–674
Marcum JA (2005) Metaphysical presuppositions and scientific
into the problem of how mutated genes can affect the practices: reductionism and organicism in cancer research.
interactions of the cancer cells harboring these genes Int Stud Philos Sci 19:31–45
with otherwise normal neighboring cells. This Moss L (2003) What genes can’t do. MIT Press, Cambridge, MA
“hybrid” theory has been criticized on the grounds Sonnenschein C, Soto AM (2008) Semin Cancer
Biol 18:372–377
that the premises of the SMT and the TOFT contradict Soto AM, Sonnenschein C (2012) Is systems biology
each other (Soto and Sonnenschein 2012). a promising approach to resolve controversies in cancer
research? Cancer Cell Int 12:12
Systems Biology, Heuristics and Theory Testing Xu R, Boudreau A, Bissell MJ (2009) Tissue architecture and
function: dynamic reciprocity via extra- and intra-cellular
Systems Biology offers an opportunity to explore the matrices. Cancer Metastasis Rev 28:167–176
▶ heuristic value and usefulness of the two main the-
ories of carcinogenesis. Mathematical modeling and
computer simulation enables researchers the boldness
to choose premises and temporarily reject data sets Cancer-initiating Cells
without having to commit prematurely to a program
of expensive and time-consuming ‘wet’ experiments. ▶ Cancer Stem Cell Kinetics
This exploratory role of Systems Biology may become
central to breaking the habit of fixing lacks of fit with
ad hoc explanations instead of taking a bold and
critical look at the premises that were adopted Canonical Model
(Enderling and Hahnfeldt 2011; Soto and
Sonnenschein 2012). Eberhard O. Voit
In an ultimate analysis, the passage of time will be The Wallace H. Coulter Department of Biomedical
the final arbiter in deciding which of the two main Engineering, Georgia Institute of Technology and
theories of carcinogenesis has been most useful in Emory University, Atlanta, GA, USA
organizing principles and constructing objectivity by
framing observations and experiments.
Definition
a b c d e
Canonical Network Motifs, Fig. 1 (a) Feed-forward loop; (b) 3-Cycle; (c) single-input motif; (d) multiple-input motif; (e) Bifan
Cross-References
Capacity to Evolve ▶ Cancer Pathology
Case-based Reasoning
Capillary Electrophoresis Mass
Spectrometry–based Metabolomics Eyke H€ullermeier, Thomas Fober and Marco
Mernberger
▶ CE-MS-based Metabolomics Philipps-Universit€at, Marburg, Germany
Definition
Carbon Assimilation
Case-based reasoning (CBR) is a computational prob-
▶ Photosynthesis lem solving methodology offering a principled
approach to knowledge engineering and knowledge-
based systems design. Being rooted in cognitive psy-
chology, CBR essentially seeks to formalize a problem
Carboxyfluorescein Succinimidyl Ester solving paradigm upon the basis of a simple rule of
(CFSE) thumb, namely, that “similar problems tend to have
similar solutions” (Kolodner 1993). More specifically,
▶ Lymphocyte Labeling, Cell Division Investigation the idea of CBR is to store and exploit the experience
from similar problems in the past, and to adapt then
successful solutions to the current situation. Thus,
the core of every case-based problem solver is the
Carcinogens
case base or case library, which is a collection of
memorized “chunks of experience” called cases.
▶ Xenobiotics
Moreover, the concept of similarity plays a key role
in CBR systems. Inference in CBR can be considered
as a specific form of analogical reasoning.
Carcinoma In Situ
RETAIN REUSE
prior
cases
Case Base
confirmed repaired solved suggested
solution case case solution
REVISE
been developed to enable the selective retention of As a means for modeling similarity functions, the so-
cases (Smyth and McKenna 2001). called local–global principle is widely used: The
similarity function is decomposed according to the
Representing and Structuring Knowledge in CBR vocabulary, in such a way that local similarity func-
A unified view of the knowledge contained in tions pertaining to individual attributes are used to
a (structural) CBR application employs the metaphor model the preference according to the corresponding
of a knowledge container, typically referring to four attributes. These local similarities are then aggregated
such containers: the vocabulary, the case base, the into the global similarity by means of an appropriate
similarity measure, and the adaptation knowledge. aggregation function.
The vocabulary (often called ontology today) is the
basis of all knowledge and experience representation
in CBR. The vocabulary defines the information enti- Cross-References
ties and structures (e.g., classes, relations, attributes,
data types) that can be used to represent cases, simi- ▶ Classification
larity measures, and adaptation knowledge. The case ▶ Evolutionary Algorithms
base is the primary form of knowledge in CBR. Tradi- ▶ Identification of Gene Regulatory Networks,
tionally, a case is considered as an instance in the Machine Learning
representation space defined by the vocabulary, for ▶ Knowledge Management
example, a vector in an attribute-value representation. ▶ Model Testing, Machine Learning
From a knowledge representation point of view, ▶ Ontology
vocabulary and case base representations are standard
applications of AI and database technology.
Since cases are selected based on their similarity to
the current problem, the similarity measure used in a References
CBR system encodes important knowledge of the
domain. It is usually formalized in terms of a function Aamodt A, Plaza E (1994) Case-based reasoning: foundational
issues, methodological variations, and system approaches.
that maps a pair of problem descriptions to a real AI Communications 7(1):39–59
number, often restricted to the unit interval, with the Bangpeng Y, Shao L (2010) ANMM4CBR: a case-based rea-
convention that a high value indicates a high similarity. soning method for gene expression data classification. Algo-
The semantics of such similarity degrees can be rithm Mol Biol 5(1):1–11
Bergmann R, Althoff KD, Breen S, G€ oker M, Manago M,
defined via the notions of preference and utility: Traph€oner R, Wess S (2003) Developing industrial case-
A higher degree of similarity suggests that a case based reasoning applications: the INRECA methodology.
is more useful for solving the current problem. Springer, Berlin
Causal Bayesian Networks 205 C
H€ullermeier E (2007) Case-based approximate reasoning. Formally, if one intervenes in a given causal
Springer, Berlin Bayesian network, say by setting a variable Xl to
Jurisica I, Glasgow J (2004) Applications of case-based reason-
ing in molecular biology. Artif Intell Mag 25(1):85–97 a fixed value xl, then the joint distribution is changed to
Kolodner JL (1993) Case-based reasoning. Morgan Kaufmann,
San Mateo Y
n
Lopez Mantaras R, McSherry D, Bridge D, Leake D, Smyth B, pðx1 ; :::; xl1 ; xlþ1 ; :::; xn j^
xl Þ :¼ pðxi jPa½xi Þ
Craw S, Faltings B, Lou Maher M, Cox MT, Forbus K, Keane i¼1
i6¼l C
M, Aamodt A, Watson I (2005) Retrieval, reuse, revision
and retention in case-based reasoning. Knowl Eng Rev
20:215–240 where x^l denotes the intervention in the given variable
Smyth B, McKenna E (2001) Competence models and the main- and thus random variables Xi with Xl as parent use the
tenance problem. Comput Intell 17(2):235–249 fixed value xl instead of the parent variable Xl. The
definition extends to sets of intervention variables.
X 1 ! X2 ! X4
Category Map
& " %
▶ Functor X3 ! X5
possible worlds in which c does not occur is such A tracer allows to track causal influence along
that e does not occur in it either. Causation cannot a “causal line,” which is a world-line consisting of
be directly analyzed in terms of counterfactual a series of contiguous regions of space-time, which
dependence. Such an analysis fails, e.g., in are connected by the transmission of the tracer.
situations of redundant causation, which are fre- A scientist who undertakes uncovering the causal
quent in biology. Assume that a chemical reaction structure of a complex situation may follow two strat-
R can be catalyzed by each of two enzymes, E1 and egies: She may either search for statistical correlations
E2, and take a particular situation in which enzyme among the variables characterizing the situation, or
E1 was active but not E2. In this situation, there is experimentally manipulate these variables. Each of
no counterfactual dependence of R on E1, for even these research strategies – analysis of observations
if E1 had not catalyzed the reaction, E2 would have. and experimentation – provides the central idea of
This type of situation is also a case of “preemption”: a philosophical strategy for analyzing the concept
The influence of E2 on R is preempted by the of causation.
influence of E1. To say that E1 preempts E2 from 3. The main hypothesis of probabilistic analyses of
causing R means that E2 would have caused R in the causality (Eells 1991) is that a variable (or factor,
absence of E1, but that E1 has blocked E2’s influ- or “event” in the language of probability theory)
ence on R. The problem for the counterfactual anal- A causes a variable B if and only if P(B|A) >
ysis is that the presence of the preempted cause E2 P(B|:A). Such theories are usually intended as ana-
removes the counterfactual dependence of R on its lyses of generic causation (definition), as opposed
cause E1. This problem can be solved by allowing to singular causation (definition). However,
that the causal influence from E1 to R runs through P(B|A) > P(B|:A) is insufficient to justify the judg-
a chain of intermediate steps. To cope with other ment that A causes B. It must also be excluded that
problematic situations, the counterfactual analysis A and B are effects of some common cause C,
has been modified in several other ways. which is called a screening factor (definition). Prob-
Furthermore, probabilistic processes have been abilistic analyses can also be adapted to account for
taken into account: What is counterfactually situations in which a factor diminishes the
dependent (directly or indirectly) on the cause is probability of one of its effects: If factor F, which
the probability of the effect, rather than the effect. enhances the risk of a disease M, is positively
Finally, our intuitive concept of causation makes correlated to factor S, which diminishes the risk
a difference between causes and background con- of M, it is possible that P(M|F) ¼ P(M|:F) or even
ditions. This can be made explicit in counterfac- P(M|F) < P(M|:F), because the negative influence
tuals: “c rather than c* caused e rather than e*” of S on M cancels out (first case) or even over-
can be analyzed as “if c* had occurred instead of compensates (second case) the positive influence
c, then e* would have occurred instead of e.” of F on M.
2. Another intuition has it that causation requires the 4. The most important recent analysis of causation is
existence of a continuous process that stretches intended as a model of another fundamental
from cause to effect. Salmon (1984) has combined research strategy for the discovery of causes in
Russell’s (1948) analysis of causal processes in complex situations, as it is used in sciences such
terms of world-lines (as they have been introduced as economics, sociology, epidemiology, or systems
in special relativity by Minkowski diagrams) and biology: To find out whether one variable X has
Reichenbach’s (1991) idea that causal processes are a causal influence on another variable Y in the
characterized by their capacity to transmit marks. system, one holds fixed all variables Z that cause
The paradigm case of this analysis is the propaga- Y but are not caused by X, and then manipulates
tion of a radio signal, which is a local modification X by an intervention (definition) (Woodward 2003).
of a structure spreading along a continuous world- X is a cause of Y if and only if this change of the
line from a sending event to a receiving event. value of X also changes the value of Y. Ideally, this
The use of tracers in biomedical research makes strategy leads to the discovery of the complete
sense in the framework of such process accounts: network of influences among all variables
Causation, Generic and Singular 209 C
characterizing the system. It can be expressed either References
by structural equations, which represent the value of
each variable as a function of all variables that have Quoted in Text
a direct causal influence on it, or equivalently by Eells E (1991) Probabilistic causality. Cambridge University
Press, Cambridge
oriented graphs, in which variables are represented Lewis D (1973) Counterfactuals. Basil Blackwell, Oxford
by nodes and influences by edges connecting these Price H, Corry R (eds) (2007) Causation, physics and the con-
nodes. However, this analysis cannot be used to stitution of reality: russell’s republic revisited. Oxford Uni- C
discover causal influences from scratch: In order versity Press, Oxford
Reichenbach H (1991) The direction of time. University of
to find out whether X causally influences Y, one
California Press, Berkeley
must already presuppose knowledge about which Russell B (1948) Human knowledge: its scope and limits. Allen
other variables influence Y. and Unwin, London, 5th edition, 1966
By requiring that causes of Y which are not effects Salmon W (1984) Scientific explanation and the causal structure
of the world. Princeton University Press, Princeton
of X must be held fixed during the intervention on X,
Woodward J (2003) Making things happen. Oxford University
the problem of preemption (mentioned above) is Press, Oxford
avoided: In the situation sketched above, in which
reaction R is caused by enzyme E1 but could also Recent Books and Overview Articles
have been caused by “back-up” enzyme E2, the value Beebee H, Hitchcock C, Menzies P (eds) (2009) The Oxford
of E2 is set to 0, because E2 does not intervene in the handbook of causation. Oxford University Press, Oxford
Schaffer J (2007) The metaphysics of causation. In: Zalta EN
actual situation. If E2 is “blocked,” then manipulation
(ed) The Stanford encyclopedia of philosophy, http://plato.
of E1 shows that E1 causes R: The value of variable stanford.edu/entries/causation-metaphysics/
R is a function of E1. Sosa E, Tooley M (eds) (1993) Causation. Oxford University
The interventionist analysis also correctly ana- Press, New York
lyzes situations in which causes diminish the proba-
bility of their effect, if a distinction between “total
cause” and “contributing cause” is introduced. If
F enhances the probability of M but also the proba- Causation
bility of S, which in turn diminishes the probability of
M, F may not be a total cause of M, if the positive ▶ Causality
(direct) and negative (indirect, through S) effects of
F on M just cancel out. However, by holding fixed S,
it is possible to discover that F is nevertheless
a contributing cause of M. Causation, Generic and Singular
Initially developed to analyze generic causation
(among variables), the interventionist framework has Max Kistler
recently been adapted to the analysis of token IHPST, Université Paris 1 Panthéon-Sorbonne,
causation: This “actual causation” is modeled by Paris, France
representing influences among particular values of
the variables in the network.
In comparing and evaluating such accounts, one Definition
must bear in mind that philosophical analyses may
pursue different goals: The counterfactual analysis is, Generic causation is a relation between two “factors,”
e.g., intended to be an a priori analysis of our common “properties,” or “events” (in the language of probabil-
sense concept of causation. This implies that it is ity theory) represented by variables. The meaning of
intended to be applicable to physically impossible the statement that smoking (A) (generically) causes
situations, such as fictions describing magical lung cancer (B) can be analyzed according to several
interactions. Process accounts have no such ambition. proposals: In the probabilistic framework, it means
Their aim is to discover the “real essence” of causality that, within a given population, belonging to set
as it is in the real world. A raises the probability of belonging to set B. In the
C 210 Causation, Origination
interventionist framework, it means that an interven- • Efficient cause is the primary exogenous source of
tion that changes the value of variable A, for example, change in an entity or system: “An earthquake
by making some nonsmokers smokers, changes the caused the tsunami.”
value of variable B. It is not possible to draw any • Formal cause is the relational organization which is
inference from the existence of a relation of generic responsible for properties of a system: “The charge
causation between factors A and B in a given popula- distribution on a water molecule causes water to be
tion to causal processes at the level of individuals wet.”
possessing properties A and B (or belonging to sets • Final cause is the purpose or meaning which moti-
A and B), in other words to the existence of relations of vates an action: “Hunger caused me to hurry home.”
singular causation: In a given population, A (smoking) Dynamical systems theory describes causation in
may cause B (having lung cancer), John may smoke terms of 6systems of first-order differential equations
(John is A) and have lung cancer (John is B), and yet, x_ ¼ f ðx; tÞ, where x is a vector of state variables of
John’s smoking may not be the (or even a) cause of his a dynamical system S, t is time, and f is the set of
cancer. The cause of John’s having cancer may be the structural relations which determine the dynamic
fact that he has inhaled asbestos dust. behavior x_ of S. This suggests several kinds of formal
causation:
• Upward causation: Change Dxi in an individual
Cross-References state variable causes a change Dx_ ¼ f ðx þ Dxi Þ in
the collective macro-behavior of S.
▶ Causality • Downward causation: A collective change Dx in all
variables causes a change Dx_i ¼ f i ðx þ DxÞ in
a specific micro-behavior.
• Reciprocal causation: A change Dxijj in either of
Causation, Origination two variables xi or xj causes a change
Dx_ ijj ¼ f ijj x þ Dxjji in the other’s behavior.
Niall Palfreyman These distinctions are of fundamental importance
Biotechnology and Bioinformatics, Weihenstephan- for understanding the origination of phenotypic nov-
Triesdorf University of Applied Science, Freising, elty, environmental effects, and disease. For example,
Germany one approach to carcinogenesis seeks efficient, upward
causes in the accumulation of genetic mutations within
a single originator cell (somatic mutation theory),
Definition while another seeks formal, downward cause from
tissue-organization fields (tissue-organization field
Causation is the physical process assumed by theory).
▶ causality to link one occurrence, the cause, to Physicalism traditionally reduces explanation to
a second occurrence, the effect, which is understood material and efficient causes. This reductive program
to occur as a consequence of the cause. A central stems from the explanatory benefits of Descartes’ and
challenge for the study of causation is origination: Newton’s partitioning of the world into two domains:
How are events in the physical world originally the physical domain of material and efficient cause
caused? operating on inert matter, and the epistemic domain
of final causes operating through human agents. The
epistemic domain was thus bracketed out of physical
Characteristics science, and formal cause assumed a twilight role as
action-at-a-distance fields.
Aristotle distinguishes in Metaphysics V 2 between Physicalism achieved its crowning triumph in the
four types of cause: theory of relativity. The special theory is founded on
• Material cause is the intrinsic physical constitution efficient time-like interaction between observers,
of an entity: “The charge of an electron causes it to while in the general theory, geometry becomes the
react to an electric field.” material cause of dynamics. However, mass-energy
Causation, Origination 211 C
equivalence and nonlinearity of the gravitational field Kim (in Bedau and Humphreys 2008) interprets
equations elevated the relativistic field concept to downward causation in two ways: synchronic (BðtÞ
a new status: formal cause as the bearer of material conditions bðtÞ at the same instant t) and diachronic
properties. This inclusion of formal cause into the (BðtÞ conditions bðt þ DtÞ at a later instant). He argues
physicalist fold was consolidated by quantum field that synchronic downward causation is logically inco-
theory, which shifts emphasis from particle-bound herent, but acknowledges (footnote 37, p. 153) that
properties and interactions as material and efficient diachronic downward causation in nonlinear dynami- C
causes to the formal cause of nonlocal field cal systems may be feasible.
configurations. Pursuing this idea further, Soto et al. (2008) argue
This trend is paralleled in biology, where living for diachronic downward causation. They propose
systems were traditionally explained in terms of effi- a materialistic non-physicalist ontology in which sys-
cient cause: Mayr’s proximate developmental causes tems of natural events operate historically to generate
and ultimate evolutionary causes. Yet this scheme is ontological novelty (novel qualities and novel struc-
incomplete on two counts: tures). Thompson (2007) maintains that diachronic
• Developmental systems theory (DST) downward causation is coherent in cases where BðtÞ
(▶ Developmental Systems Theory) stresses the is non-separable into individual micro-behaviors by
formal reciprocity of gene-organism-environment reason of its dependence on interactions between
relations, which complicates views of gene or envi- these behaviors. This form of emergence is clearly
ronment as efficient developmental causes (Oyama irreducible to individual micro-behaviors, yet even
et al. 2001). Thompson (pp. 429–30) seems doubtful of its ade-
• Acknowledging gene-plus-environment as efficient quacy as a basis for ontological origination. The prob-
evolutionary cause still neglects the role of the organ- lem is that dynamic co-emergence cannot escape
ism as the formal confluence of both – particularly determination by its component micro-behaviors
with regard to the origination of phenotypic and because even in a nonlinear dynamical system, it still
genotypic novelty (M€ uller and Newman 2003). supervenes on the formal organization of the structural
These issues have shifted the focus of biological relationships f.
explanation toward formal cause. Systems biology in Seeking a locus of origination in formal cause
particular, in studying the influence of structural and descends philosophically from the later Wittgenstein,
dynamical organization on ontogeny, claims an unam- who sought to define meaning in terms of an irreduc-
biguous allegiance to formal cause. ible closed set of formal language-usage relations, or
The rehabilitation of formal cause has prompted language game. Yet as Putnam (1980) demonstrated,
a search for the origination of action in formal cause. a closed formal system, by abandoning dependence on
A system-level behavior B of a deterministic dynami- external context, loses all causal relevance to that
cal system S is emergent if it is sufficiently irreducible context. By definition, it is then difficult to see how
to micro-behaviors b of S’s components as to constitute such a system can originate from its context.
an originator of efficient cause. Attempts to formalize Appeal to external context suggests a way to eman-
this notion of irreducibility have led to introduction of cipate formal cause from internal organization by sim-
the term supervenience: B supervenes on b if b logi- ply opening S up to interaction with its context. The
cally determines B; B is then emergent if it supervenes weasel’s coat-change is conditioned in part by the
on no set of micro-behaviors. changing autumnal day-length, and so does not super-
A litmus test for coherence of the emergence con- vene on the weasel’s internal organization. Yet mere
cept is its relation to downward causation, which per- openness is an inadequate basis for origination.
mits an emergent behavior BðtÞ of S at time t to Inserting a coin into a coffee-machine is a contextual
influence a micro-behavior b of S’s own components. interaction which results in a cup of coffee, yet we
For example, a weasel’s cells generate a white winter consider the coin, not the machine, the efficient cause
coat which is a formal cause of its survival, and thus of this activity, because a small variation, or defect, in
also of the very cells which generated it. To claim that the coin prevents coffee-making from occurring.
coloration is non-supervenient on these cellular behav- By contrast, the weasel’s color-change is self-
iors seems to invoke a contradictory circular causality. organizing (▶ Self-Organization): It proceeds even
C 212 Causation, Origination
Concentration
2
1.5
0.5
0
0 50 100 150 200 250 300 350 400 450 500
time
if clouds obscure the day-length. Self-organizing bifurcating into qualitatively distinct system behaviors
systems generate reliably similar outcomes BðtÞ in response to small quantitative changes in some
from dissimilar initial conditions; Fig. 1 illustrates critical parameter P. If these changes in P stem from
self-organized buffering of blood-glucose against S’s context, they constitute environmental cues which
variations in demand in the mammalian insulin- S amplifies into vigorous responses BðtÞ (see Fig. 2).
glucagon system. Swenson (2010) uses the term However, for Kant, meaning is also essentially per-
autocatakinetic (ACK) to describe systems which sonal: S must itself determine the dynamics of its own
self-organize using contextual resources. The weasel meaning-constructions.
is an ACK system which buffers its coat-change Since S’s behavior x_ is determined by its structure
against fluctuations in its context, so its winter f ðx; tÞ, we may ask who determines this structure. We
coat is neither supervenient on the weasel’s internal might assume the weasel’s coloration change to be
structure nor is it determined by its context. So is meaningful, relying as it does on an interpretation of
the weasel qua organism the causal origin of the day-length in terms of future snow. However, the
coloration change? structures generating this dynamic arise from propen-
This is the question of final cause, and goes to the sities of the weasel species-niche system, so the inter-
heart of physical-epistemic dualism. In his Critique pretation “snow” does not originate in this individual
of Judgment, Kant argued for a distinction between weasel. The status of organisms as final causes there-
physical and biological explanation, characterizing fore hangs on one question: Do individual organisms
physical systems as heteronomous (other-governed), determine their own structure?
and living organisms as the autonomous (self- To attribute this determination to efficient
governing) originators of final cause. Autonomous cause would be to accept Cartesian dualism; however,
systems are certainly ACK, and ACK systems have the ability of complex open systems to amplify min-
the capacity to assert their autonomy against the imal contextual variation makes their context
dictates of context; however, Kant’s definition of a rich source of internal dynamical variation. If this
autonomous systems (▶ Autonomy) as final causes variation is dynamically coupled to the very struc-
depends on their ability to construct meaning from tures which generate it, any consequent stabilization
environmental cues, enabling them to predict and of those structures would necessarily be compensated
respond to future events. by a corresponding increase in ▶ entropy production
ACK systems can potentially construct meaning, by the system. Swenson’s (2010) proposed Law of
since they necessarily exhibit ▶ complex behavior, Maximum Entropy Production (LMEP) stipulates
CD4+ T Cell Response 213 C
Causation, Origination,
Fig. 2 Complex response of
a protein-RNA switch to Protein,
varying protein influx 1 RNA
Protein influx
0.8
C
0.6
0.4
0.2
0
0 50 100 150 200 250 300 350 400
time
Characteristics
Definition
Two Families of CKIs Regulate Cell-Cycle
CD8+ cytotoxic T lymphocytes are a subgroup of Progression and Differentiation
T lymphocytes whose main function is to induce the In mammals, CKIs are included in one of two families
apoptosis of infected (e.g., viruses, bacteria, parasites, according to their structure and CDK targets:
or fungi), mutated (e.g., cancer), or foreign (e.g., trans- 1. The INK4 (INhibitors of CDK4) family of CKIs
plants) cells. includes p16INK4a, p15INK4b, p18INK4c, and
p19INK4d. They bind CDK4 and CDK6 and form
Cross-References binary complexes with these kinases, preventing
CDK interaction with cyclin D. The four INK4
▶ TAP Translocator, In Silico Prediction proteins share a similar structure composed of mul-
tiple ankyrin repeats (Cánepa et al. 2007).
References 2. The Cip/Kip (CDK-interacting protein/Kinase
inhibitor protein) family of CKIs comprises
Murphy K, Travers P, Walport M (2007) Janeway’s p21Cip1, p27Kip1, and p57Kip2. These proteins have
immunobiology, 7th edn. Garland Science, New York, both a positive and a negative role on CDK activity:
pp 364–368
they act as cofactors required for CDK4/6-cyclin
D complex assembly but inhibit CDK2-cyclin E,
CDK2-cyclin A, and CDK1-cyclin B activity.
CDK Inhibitors Cip/Kip inhibitors share a conserved N-terminal
domain, which binds to CDKs and cyclins, but
Livia Pérez-Hidalgo and Sergio Moreno differ in the rest of the sequence (Besson et al.
Instituto de Biologı́a Molecular y Celular del Cáncer, 2008; ▶ Cell Cycle of Mammalian Cells) (Fig. 1).
CSIC/Universidad de Salamanca, Salamanca, Spain In other organisms, proteins functionally related to
mammalian CKIs also play key roles in cell-cycle
regulation and in promoting cell-cycle exit during dif-
Definition ferentiation processes (See Table 1; De Clercq and
Inzé 2006).
CDK inhibitors are proteins that suppress CDK-cyclin In Drosophila melanogaster, there are two Cip/
protein kinase activity in the G1 phase of the cell cycle Kip-related CKIs, Dacapo and Roughex. Dacapo reg-
and promote G1 arrest in response to environmental or ulate the exit from the cell-cycle during embryonic
intracellular stimuli (▶ Cyclins and Cyclin-dependent development, and Roughex contributes to the estab-
Kinases). lishment of G1 through downregulation of CDK-cyclin
Oscillations in CDK-cyclin activity are accurately A complexes.
regulated in order to ensure the correct sequence of cell The Caenorhabditis elegans genome encodes two
cycle events (▶ Cell Cycle). This regulation occurs at CKIs, cki-1 and cki-2, belonging to the Cip/Kip family
several levels, including cyclin binding to CDKs, CDK of CDK inhibitors, and both are required for cell-cycle
phosphorylation, and CDK-cyclin activity inhibition exit into quiescence during development.
CDK Inhibitors 215 C
p16INK4a
p15INK4b
INK4 p21Cip1
p18INK4c
INK4d Cip/Kip p27Kip1
p19
p57Kip2
Cdk4/6
G0
CycD C
Cip1
Cdk2 p21
CycE Cip/Kip p27Kip1
p57Kip2
G1
Cdk1
CycB M S Cdk2
CycA
G2
p21Cip1
p27Kip1 Cip/Kip
p57Kip2 Cdk1
CycA
CDK Inhibitors, Fig. 1 Regulation of cell-cycle progression by CKIs in mammalian cells. Two families of CDK inhibitors in
metazoans regulate G1 progression: Cip/Kip proteins and INK4 proteins
In Xenopus, Cip/Kip-related protein p27Xic1 is primary function of Sic1 appears to be at this point of
involved in cell-cycle exit and differentiation in the cycle, Sic1 is also required at mitotic exit,
myogenesis and neurogenesis (▶ Cell Cycle of Early contributing with its accumulation, together with
Frog Embryos). Clb2 degradation, to the irreversibility of this transi-
Yeasts’ CKIs share no sequence homology with tion (▶ Cell Cycle Dynamics, Irreversibility). Far1 is
metazoan CKIs, but they are functionally and structur- induced in response to pheromones, leading to the G1
ally related. In the budding yeast Saccharomyces arrest that precedes mating (▶ Cell Cycle, Budding
cerevisiae, two CKIs regulate cell-cycle progression, Yeast).
Sic1 and Far1. Sic1 is required for the establishment of In the fission yeast Schizosaccharomyces
a G1 phase and to delay entry into S phase until pombe, a single Sic1-related inhibitor, Rum1,
▶ prereplicative complexes are built. In the absence carries out the functions in cell proliferation and
of Sic1, cells show defects in G1 progression, differentiation of budding yeast Sic1 and Far1,
and this impairs S phase dynamics. Cells start DNA respectively. Rum1 regulates progression through
replication from fewer origins, extending S phase the G1 phase of the cell cycle and is essential in
and entering mitosis with unreplicated chromosomes processes that require lengthening the G1 phase,
that inefficiently separate during anaphase. As such as the responses to pheromones and nutrient
a consequence, sic1-defective cells show increased starvation. Also, Rum1 is required to prevent CDK1
genomic instability. All these defects are rescued (Cdc2) activation at Start until the critical cell size is
by delaying CDK activation and S phase, indicating reached (Labib and Moreno 1996). (▶ Cell Cycle,
that the primary function of Sic1 is at the end of Cell Size Regulation; ▶ Cell Cycle, Fission Yeast)
G1 (Lengronne and Schwob 2002). Although the (Fig. 2).
C 216 CDK Inhibitors
Role of Mammalian CKIs in Tumorigenesis and p57Kip2 is required for embryonic development, and
Phenotypes of CKI Knockouts p57Kip2-null mice have major developmental defects
Cip/Kip Inhibitors and mostly die perinatally. This inhibitor shows
According to its function limiting cell-cycle progres- a tissue-specific pattern of expression both in the
sion, p27Kip1-deficient mice display cell proliferation adult and during embryogenesis.
defects: increased body size, multiple organ hyperpla-
sia, and pituitary tumors. Unlike other tumor suppres- INK4 Inhibitors
sor genes, p27Kip1 is rarely mutated in cancer but Members of the INK4 family display different patterns
appears frequently downregulated by increased degra- of expression: p16INK4a and p15INK4b are not expressed
dation, decreased transcription or cytoplasmic during embryonic development but are induced in
mislocalization. Besides, the simultaneous inactiva- response to oncogenic stress, and p18INK4c and
tion of other ▶ tumor suppressor genes enhances the p19INK4d are expressed during embryogenesis and at
tumor-prone phenotype of p27Kip1 knockout (▶ Cell later stages in adults, carrying out their function during
Cycle, Cancer Cell Cycle and Oncogene Addiction). organismal development. Mice deficient in any of
In contrast to p27Kip1, p21Cip1 knockout mice do not these proteins are viable.
display a hyperproliferative defect. They show p16INK4a is an important tumor suppressor that is
increased predisposition to tumor development in frequently mutated or downregulated in many different
combination with mutations in other tumor suppressor forms of human cancer in contrast to the other three
genes. As with p27Kip1, mutations in p21Cip1 gene are INK4 genes, which are rarely mutated in cancer. It is
rare in tumors. In addition, cells lacking p21Cip1 fail to not generally expressed during fetal development but
arrest in response to DNA damage (see below). accumulates in adults during ▶ senescence. In the
CDK Inhibitors 217 C
Budding yeast
Start
G1 S G2 M
Clb5,6
Cdc28
C
α-factor
Sic1
Cln1,2
Far1 P
Cdc28 SCFCdc4 P Ub
Proteasome
Sic1 Sic1 Ub Sic1
Ub
Fission yeast
Start
G1 S G2 M
Cig2/Cdc13
Cdc2
Rum1
Cig1
Cdc2 P
SCFPop1,Pop2 P Ub
Proteasome
Rum1 Rum1 Ub Rum1
Ub
CDK Inhibitors, Fig. 2 Regulation of the cell cycle by CKIs in inhibiting Cdc28-Cln1,2 complexes. Bottom: In fission yeast,
budding and fission yeast. Top: In budding yeast, Sic1 delays Rum1 regulates progression through G1 and induces a cell-cycle
entry into S phase through binding to Cdc28-Clb5,6 complexes. arrest in the absence of nutrients and in response to pheromones
At G1/S transition, Sic1 is phosphorylated by Cdc28-CIn1,2, by inhibiting the activity of Cdc2-Cig2 and Cdc2-Cdc13
thereby inducing ubiquitylation of Sic1 by SCFCdc4 and complexes. Phosphorylation of Rum1 by Cdc2-Cig1 targets it
proteasome-dependent degradation. In response to pheromones, to ubiquitylation by SCFPop1,Pop2, prior to degradation by the
Far1 induces cell-cycle arrest in G1 prior to cell fusion by proteasome
absence of p16INK4a, p15INK4b has an essential function contribution to the DNA-damage response in human
as a tumor suppressor. cells regulating multiple processes, such as cell cycle
The loss of p18INK4c and p19INK4d leads to male progression, ▶ apoptosis, and transcription (Cazzalini
reproductive defects, and mice lacking p19INK4d also et al. 2010).
display progressive hearing loss. In addition, inactiva- Regulation of cell cycle progression occurs by
tion of p18INK4c predisposes to cancer development inhibition of CDK2-cyclin E and CDK2-cyclin
and is a haploinsufficient tumor suppressor in mice. A complexes, required for phosphorylation of ▶ reti-
By contrast, p19INK4d null mice do not display tumors noblastoma, but also by binding to PCNA (proliferat-
nor are more susceptible to cancer development in ing cell nuclear antigen), which interferes with PCNA
response to carcinogens. function during DNA replication.
Another part of the response of p21Cip1 to DNA
Function of p21Cip1 in the DNA-Damage Response damage is transcriptional regulation, where p21Cip1
p21Cip1 increases following DNA damage in a has both a positive and a negative role. This regulation
p53-dependent manner, and it makes an essential occurs through different mechanisms. Inhibition of
C 218 CDK Inhibitors
CDK activity prevents retinoblastoma phosphoryla- and the G1/S transition. Mitogenic stimulation also
tion, thus leading to inhibition of E2F-dependent inhibits, through PI3K and AKT, GSK3b phosphory-
transcription. Besides, p21Cip1 associates with the lation of cyclin Ds, thereby inhibiting their nuclear
DNA-binding proteins E2F1, STAT3, and MYC, export and enhancing the nuclear accumulation of
suppressing their transcriptional activity, and dere- active cyclin Ds-CDK complexes. Inhibition of
presses p300-CREBBP targets. GSK3b also stimulates expression of cyclin Ds
In contrast to the role of p53 in the induction of through regulation of the transcription factors AP-1
apoptosis in response to DNA damage and oxidative and Myc.
stress, p21Cip1 protects cells from apoptosis, although Transforming growth factor (TGFb) antimitogenic
in some cases it may show a pro-apoptotic role. Since signal induce a G1 block, causing synthesis of CKIs
an active cell cycle is required for apoptosis, p21Cip1 p15INK4b and p21Cip1. p15INK4b levels increase through
may prevent apoptosis through its role in suppressing the Smad complex. Smad proteins inhibit expression
cell-cycle progression, but also it may inhibit apoptosis of Myc, and, after Myc has been repressed, expression
directly, blocking pro-caspase-3 processing or of p15INK4b increases (Morgan 2007).
preventing stress-induced apoptosis by binding to
ASK1/MEKK5 or JNK1/SAPK. Degradation of CKIs at the G1/S Transition
Irreversible entry into the cell cycle at Start is con-
Role of CKIs in Endocycles trolled by positive ▶ feedback loop that generate irre-
CKIs play an essential role in the control of CDK versible CDK activation (▶ Cell Cycle Dynamics,
activity in endocycling cells, which undergo multiple Irreversibility). Proteolytic degradation of CKIs at
rounds of DNA replication without an intervening G1/S, triggered by a starter kinase, is essential for
mitosis (Ullah et al. 2009). Start in yeast and mammalian cells (Morgan 2007;
p57kip2 triggers genome ▶ endoreplication during Novak et al. 2007).
differentiation of trophoblast stem (TS) cells into tro- In human cells, different pathways lead to p27kip1
phoblast giant (TG) cells. It is expressed at the end of phosphorylation, ubiquitylation, and proteasome-
S phase, preventing entry into mitosis and inducing the mediated destruction at G1/S:
accumulation of cells in G2. As CDK activity is low, 1. In late G1, p27kip1 is phosphorylated at T187 by
▶ APCCdh1 is activated instead of APCCdc20. One of CDK2-cyclin E and CDK2-cyclin A complexes.
the targets of APCCdh1 is geminin, an inhibitor of the This phosphorylation targets p27kip1 for
licensing factor Cdt1. The destruction of geminin and ubiquitylation by SCFSkp2 (▶ Skp1/Cul1/F-Box-
other targets, such as cyclin A, creates a window in G2 containing Complex (SCF)) and subsequent degra-
with low CDK activity, which permits the formation of dation by the 26S proteasome. The interaction
prereplicative complexes. p57kip2 disappears after between p27kip1 and Skp2 is enhanced by additional
phosphorylation by CDK2-cyclin E and degradation factors, such as Cks1 (CDK subunit 1). Also, inter-
by the 26S ▶ proteasome, to allow the onset of S phase. action between CDK2-cyclin A and T187-
Oscillations of p57kip2 and cyclin E protein levels phosphorylated p27kip1 stimulates p27kip1
Skp2
maintain the endocycles. In Drosophila, endocycles ubiquitylation by SCF .
in nurse cells are maintained through oscillations in 2. Phosphorylation of p27kip1 at S10 by AKT pro-
the levels of Cyclin E and the CKI Dacapo. motes its nuclear export through interaction with
the nuclear export protein Crm1. The stability of
Regulation of CKIs by Mitogenic and cytoplasmic p27kip1 is regulated by the E3 ligase
Antimitogenic Signals KPC (Kip1 ubiquitylation-promoting complex),
In mammalian cells, in response to mitogenic signals which targets it to degradation by the proteasome
and activation of the Ras-MAP kinase pathway, cyclin (Fig. 3).
Ds are synthesized, and stable cyclin Ds-CDK com- In addition, G1/S transition in cycling cells and
plexes containing the p27kip1 inhibitor as a cofactor after DNA-damage repair requires degradation of
form, thus taking p27kip1 from CDK2-cyclin E/A com- p21Cip. p21Cip1 is degraded at the proteasome by
plexes. This leads to the activation of these complexes ubiquitin-dependent and ubiquitin-independent
CDK Inhibitors 219 C
P KPC P
Proteasome
Kip1 p27Kip1
p27 p27Kip1
Ub
Ub
Ub
Cytoplasm
C
Nucleus Crm1
G0 G1 AKT
p27Kip1
S10
p27Kip1
T187
P SCFSkp2 P
Proteasome
p27Kip1 p27Kip1 p27Kip1
Ub
Ub
Ub
CDK2-CycE
G1 Late G1
CDK Inhibitors, Fig. 3 Regulation of p27Kip1 by phosphoryla- degradation at the proteasome. Phosphorylation of p27Kip1 by
tion and proteasome degradation. CDK2-cyclin E phosphory- AKT promotes p27Kip1 translocation to the cytoplasm and
lates p27Kip1, promoting its ubiquitylation by SCFSkp2 and KPC-dependent ubiquitylation
▶ Prereplicative Complex
▶ Proteasome Cell Compartment
▶ Retinoblastoma
▶ Senescence ▶ Organelle and Functional Module Resources
▶ Skp1/Cul1/F-Box-containing Complex (SCF)
▶ Tumor Suppressor Gene
Cell Cycle, Fig. 1 Regulation of Cdk activity during the cell induces cyclin removal at the end of the cell cycle. (b) Temporal
cycle. (a) Possible ways Cdk activity is regulated: CKI – pattern in Cdk activity during the cell cycle and cell cycle
Stochiometric Cdk inhibitor acts in G1, where Cdh1/APC transitions (Rp – Restriction point, exit – mitotic exit). In
complex induces cyclin degradation through the proteasome. S phase, the DNA replicates, in G2 two copies are present and
Synthesis of cyclin could be also regulated by transcriptional Cdk activity is low but not zero. Sharp increase in Cdk activity
factors. Wee1 and Cdc25 are the most important kinase/ induces mitosis, where the nuclei divide and the drop in the
phosphatase pair that regulates Cdk in G2 phase and Cdc20/APC activity triggers cell division
Crucial functions of the core cell cycle machinery in G2-phase). Such size control (▶ Cell Cycle, Cell
eukaryotic cells are the following: Size Regulation) clearly works in yeast cells, but
• The alternation of DNA replication (S-phase) and its role in higher eukaryotes is still not fully
mitosis (M-phase) is essential in normally prolifer- understood.
ating somatic cells. This is achieved by changing The basic controls are conserved in all eukaryotic
Cdk activity in each phase, and ensuring that the organisms from yeast to human. Indeed, human Cdks
cycle proceeds only in one direction. Incidental (or can rescue Cdk mutants of yeast cells and many other
regulated) drop in Cdk activity in G2 phase can core molecules are similarly highly conserved.
induce ▶ endoreplication, when two rounds of Although the cell cycle regulatory molecules of pro-
S phase happen without mitosis and two nuclear karyotes (▶ Cell Cycle, Prokaryotes) are highly dis-
divisions occur in ▶ meiosis, when the DNA con- similar, their interaction network and the resulting cell
tent is halved. These processes normally occur only cycle dynamics are very similar to the yeast system.
during development and embryogenesis, they are
avoided in normal somatic cells by careful control
of Cdk activity. Historical Discoveries of Cell Cycle Research
• Checkpoint mechanisms stop the cell cycle, by
freezing Cdk activity, if some previous event has Systematic analysis of cell cycle mutants in the 1970s
not been properly completed and ensure that repair by Lee Hartwell and Paul Nurse led to the discovery of
mechanisms are initiated to “solve” the problem. In Cdk while cyclin was discovered by Tim Hunt. These
higher eukaryotes, similar pathways eventually researchers received the Nobel Prize in 2001 for their
induce the suicide of the cell if the repair fails to breakthroughs in understanding cell cycle regulation
avoid the spreading of the mutant, which could lead (Nasmyth 2001).
to tumor formation (▶ Cell Cycle, Cancer Cell The first results toward the identification of
Cycle and Oncogene Addiction). a biochemical component governing cell cycle pro-
• Growth in cellular mass could be rate-limiting, thus, gression came from investigations on cell extracts
the DNA replication-division cycle is “slowed from frog eggs (▶ Cell Cycle of Early Frog Embryos).
down” by inserting gap phases (G1- and Studies in the early 1970s described the existence of
Cell Cycle 223 C
a maturation promoting factor (MPF) whose function The system-level interactions depicted in Fig. 2 cannot
was related to induction of quiescent cells into division be easily understood by reductionistic thinking. Math-
(Masui and Markert 1971). At the same time, yeast ematical and computational models can help to under-
genetic approaches lead to the discovery of the most stand such wiring diagrams and test if the current
important genes of cell cycle regulators: Hartwell knowledge on molecular interactions is sufficient to
identified cell division cycle (Cdc) genes in the bud- describe experimental observations (Csikasz-Nagy
ding yeast Saccharomyces cerevisiae (▶ Cell Cycle, 2009). Figure 3 shows simulation results of the wiring C
Budding Yeast) through a genetic screen of diagram of Fig. 2. It explains how the waves of the
temperature-sensitive mutants (Hartwell et al. 1973). different cyclins overlap with the inactivation of Cdk/
Paul Nurse isolated temperature-sensitive mutants of cyclin inhibitors (CKI, Cdh1 and Wee1).
the rod shaped fission yeast, Schizosaccharomyces
pombe (▶ Cell Cycle, Fission Yeast), and found cell-
cycle-blocked long cells, but also noticed unusually Initial Steps in Mathematical Modeling of the
small cells that seemed to be able to proliferate nor- Cell Cycle
mally (Nurse 1975). He named this mutant “wee”
(meaning small in Scottish) and realized that the Mathematical modeling of the cell cycle has been
related genes must have an important role in control- started already when scientists were only guessing
ling proper cell division. Nurse also figured out that the mechanisms that could drive the processes of the
one of the genes with wee phenotype can be mutated in cell division cycle. Early efforts considered the phe-
another domain to produce cell-cycle-blocked elon- nomenon as a black box but could still discover some
gated cells, discovered that this gene (Cdc2) is homo- of the rules that determine the observed physiology of
logues to Hartwell’s most important budding yeast cells. From the 1960s, we can find mathematical
gene, and found the human version of it (▶ Cell models that explain some key aspects of cell cycle
Cycle of Mammalian Cells). Tim Hunt contributed to regulation from phenomenological observations on
the previous work of Hartwell and Nurse when he cell size and cell cycle time distributions (Koch and
studied the control of mRNA translation and found Schaechter 1962). The great interest in the newly dis-
proteins whose abundance oscillated (Evans et al. covered chemical oscillations in the 60s made consid-
1983). He referred to these periodically degraded mol- erable contributions to research on theoretical physical
ecules as “cyclins.” Later discoveries showed that the chemistry and later to mathematical biology as well.
complex of cyclin and Cdc2 (generally now called Researchers, such as Albert Goldbeter, John J. Tyson,
Cdk) is responsible for the MPF activity in frog Arthur Winfree, and others, realized that the theories
embryos. After these breakthrough discoveries, several and tools to analyze chemical oscillations might help
cell cycle regulators and their functions have been to understand biological oscillations. They started to
identified in various organisms. We learned that simi- investigate calcium oscillations, circadian clocks, and
lar molecules control the cell cycle of plants, worms, many other biological oscillators, among them the
and flies and began to understand the intricacies of oscillations that drive cell division. As the first exper-
their regulation in the different organisms. Indeed, imental results on cell cycle regulation appeared (see
not only cyclin and Cdk proteins are conserved above), theoreticians started to create models to predict
among eukaryotes, but most of the cell cycle regulator how Cdk controls the cell cycle. Some of these models
proteins as well as their interactions (Nurse 1990). led to a better understanding of the basic dynamical
Here, we present the generic features of the cell cycle properties of the cell cycle regulatory network
regulatory network (Csikasz-Nagy et al. 2006) (Csikasz-Nagy 2009).
together with the regulators in most important cell
cycle test organisms (Fig. 2 and Table 1).
These interactions can drive the periodic appear- Predictive Models of the Cell Cycle
ances of the various cyclins. CycD is present through-
out the entire cell cycle, CycE, and CycA appears at the In the early 1990s, several hypotheses emerged
G1/S transition restriction point and CycB activity concerning the origin of cell cycle oscillations. Theory
increases at G2/M transition and drops at mitotic exit. tells us that for oscillations the system needed to
C 224 Cell Cycle
Cell Cycle, Fig. 2 Generic cell cycle regulatory network. Solid end lines stand for inhibition effects. Gray squares behind
arrows depict molecular transitions, molecules “sitting” on cyclins (CycA-D) represent their constantly bound Cdk partners
arrows and dashed arrows represent activations; dashed blunted
contain a negative feedback loop (auto-inactivation), a critical value to induce an abrupt activation of
some nonlinearity, and either a delay or a positive Cdk/CycB.
feedback loop (auto-activation) (Novak and Tyson Nonlinear positive feedback loops can create
2008). For the cell cycle oscillations, Albert Goldbeter multistability (Ferrell 2002) when, in a given parameter
proposed a molecular cascade mechanism that pro- range, the system can exist in more than one stable states
duces a limit cycle originating from a pure negative and only the history of the system determines in which
feedback of Cdk and its degradation machinery of these states we will find it (this is the concept of
(Goldbeter 1991). John J. Tyson and Bela Novak hysteresis). Transitions from one stable state to another
predicted a more complex mechanism producing can happen very abruptly when one of the controlling
a relaxation oscillator that moves around a hysteresis parameters reaches a critical value. To set back the
loop as a result of a combination of positive and neg- system to the original state, the same parameter has to
ative feedbacks (Novak and Tyson 1993). The Novak- decrease to a much lower value. This can ensure that
Tyson model proposed that the positive feedback loop transitions between the two stable states are quite robust,
between Cdk/CycB and Cdc25 and the double negative for small fluctuation these transitions are irreversible.
(thus also positive) loop between Cdk/CycB and Wee 1 Now we understand the importance of hysteresis and
create a situation where cyclin levels needs to reach irreversible switches (▶ Cell Cycle Dynamics,
Cell Cycle 225 C
Cell Cycle, Table 1 List of key cell cycle regulators in the most important cell cycle research model organisms
Function Name in Fig. 2 Budding yeast Fission yeast Xenopus laevis Mammalian
Starter CDK/cyclin complex CycD Cdc28/Cln3 Cdc2/Puc1 Cdk4,6/CycD Cdk4,6/CycD1–3
(G1/S transition inducers) CycE Cdc28/Cln1,2 - Cdk2/CycE Cdk2/CycE1,2
S-phase promoting factor (SPF) CycA Cdc28/Clb5,6 Cdc2/Cig2 Cdk1,2/CycA Cdk1,2/CycA1,2
M-phase promoting factor (MPF) CycB Cdc28/Clb1,2 Cdc2/Cdc13 Cdc2/CycB Cdk1/CycB1,2
Stoichiometric kinase inhibitor CKI Sic1 Rum1 Xic1 p27Kip1 C
Mitotic cyclin degradation Cdh1 Cdh1 (Hct1) Ste9 (Srw1) Fzr Cdh1
regulator (with APC)
Cyclin B, Cyclin A degradation Cdc20 Cdc20 Slp1 Fizzy p55Cdc
regulator (with APC)
Retinoblastoma protein Rb Whi5 - RBL1,2 Rb1
Cyclin E, Cyclin A transcription factor TFE Swi4/Swi6 Cdc10/Res1 XE2F E2F1–3
Mbp1/Swi6
CDK/cyclin B inhibitor kinase Wee1 Swe1 Wee1 Xwee1 Wee1
CDK/cyclin B activator phosphatase Cdc25 Mih1 Cdc25 Xcdc25 Cdc25C
Phosphatase working against the CDKs Cdc14 Cdc14 Clp1/Flp1 Xcdc14 Cdc14
Irreversibility) in the cell cycle: the cell must complete physiology (viability-lethality, cell size, lengths of dif-
one state before starting another event, avoiding the ferent cell cycle phases) in wild-type and 50 mutant
same phase to repeat twice (Novak et al. 2007). cells. The authors proposed that G1 and S/G2/M
As a result of the positive feedback in the Novak- phases are alternative states and cells switch between
Tyson model, the removal of some CycB cannot inac- them during the G1/S transition, when CDK activity
tivate Cdk immediately – showing hysteresis and abruptly increases. The bistability of the system is
bistability in the cell cycle (▶ Cell Cycle Dynamics, given by the antagonism between CDK/cyclin com-
Bistability and Oscillations). This prediction was inde- plexes and their inhibitors CKI (Sic1 in budding yeast)
pendently experimentally verified by two groups and Cdh1 (Fig. 4). The authors also proposed an exper-
(Pomerening et al. 2003; Sha et al. 2003), and it was iment which could verify the existence of this
also discovered that by removing the negative effect of bistability. In 2002, Fredrick R. Cross and his group
Wee1 on Cdk (thus disrupting the positive feedback performed experiments following these suggestions
loop), the oscillations became less robust and their and confirmed the prediction (Cross et al. 2002). In
amplitude was reduced (Pomerening et al. 2005). this groundbreaking study, Cross went much further
These insights highlighted the importance of positive and carried extensive experimental tests of various
feedbacks in cell cycle regulation. properties of the model while also performing some
Hysteresis and multistability was also proposed by crucial measurements to help further model develop-
mathematical modeling and verified later experimen- ment. This milestone study was one of the first that was
tally for the cell cycle regulation of yeast cells. The two fully devoted to test a mathematical model, and could
landmark papers by Katherine C. Chen and colleagues therefore be viewed as one of the first true systems
from the Tyson lab dealt with the cell cycle of budding biology studies. Two years later, the measurements
yeast cells. The models that were created in this work and corrections by Cross were implemented into
are now the influential mathematical models of cell a new version of Chen’s model (Chen et al. 2004). It
cycle regulation (Chen et al. 2004; Chen et al. 2000). was also extended to cover detailed regulation of
These models are based on an extensive literature data mitotic exit. The resulting model was tested against
collection on more than 100 budding yeast cell cycle the behavior of 131 mutants and marginally failed only
mutants. The final model can simulate the findings of on 11 of them – highlighting the aspects of the model
almost all of these experiments. In their first paper, that lacked sufficient experimental evidence. This
Chen et al. (2000) examined the molecular events model also predicted the presence of a mitotic exit
regulating the START transition of the cell cycle. regulatory phosphatase that was later experimentally
The model accounts for many details of the cells’ identified (Queralt et al. 2006).
C 226 Cell Cycle
Cell Cycle, Fig. 3 Simulation of a “generic” cell cycle model. induce Cdh1 inactivation and drive the cell into S-phase. Wee1
Temporal fluctuations in molecular levels of the most important keeps Cdk/CycB inactive in G2. Later, as the autocatalytic loops
cell cycle regulators (notations as in Table 1 and Fig. 2). Plots of Cdk/CycB activation turn on, Wee1 gets inactivated and the
show that the level of CycD is not fluctuating during the cell increase in Cdk/CycB activity induces M-phase. At the end of
cycle, still its activity together with the increasing CycE depen- mitosis, when the spindles are properly aligned, Cdc20 activa-
dent Cdk activity can induce CKI degradation and activation of tion is not halted anymore and the active Cdc20/APC induces
further CycE and CycA production (after inactivation of Rb and CycB degradation and CKI and Cdh1 activation to finish the cell
activation of E2F, as noted on Fig. 2). Cdk/CycA and Cdk/CycE cycle. The drop in Cdk/CycB activity induces cell division
Cell Cycle Modeling Techniques model can be defined. ODEs have been used for
many years to create mathematical models of eukary-
The most common approach to describe a system com- otic cell cycle regulation, since this was the technique
posed of biochemical reactions is to translate the inter- that was used to investigate chemical and physical
actions of a molecular wiring diagram into ordinary oscillators. Thus, all the developed tools of dynamical
differential equations (ODEs) (▶ Cell Cycle systems theory could be used to understand ODE
Modeling, Differential Equation). These can be either models of cell cycle regulation. Maybe to most fre-
solved analytically or, in the case of larger models, quently used technique is bifurcation analysis (▶ Cell
numerically by computational means. The solutions of Cycle Model Analysis, Bifurcation Theory), which
ODEs can be compared with existing experimental follows the steady state solutions of the system of
data, thus some of the kinetic rate constants of the differential equations. These steady states can identify
Cell Cycle 227 C
Cell Cycle, Fig. 4 Bistability in the budding yeast cell cycle. abruptly activated at the critical cell mass what drives the cell
The signal-response curve (left) shows how the steady state Cdk/ into S-phase. Small fluctuations cannot drive the system back to
Cyc activity depends on cell mass (solid and dashed lines stand G1 phase. Unless the cell mass sensing pathway activity drops
for stable and unstable steady states respectively). In early G1 below the lower boundary of the bistabile zone (gray back-
phase only one steady state exists, with low Cdk/Cyc activity. As ground), the cell will stay in the high Cdk activity S/G2/M
the cell grows the second stable state appears as well, but the state. The bistability is a result of an antagonistic, double nega-
original G1 steady state still remains until the cell mass reaches tive (thus positive) feedback loop, where Cdk inhibits Cdh1 and
a critical value where this disappears and only the high Cdk CKI activity, while Cdh1 induces cyclin degradation and CKI
activity state remains. As the cell grows (dotted lines) Cdk gets inhibits Cdk/Cyc complexes (Fig. 1a)
the phases of the cell cycle and cell cycle transitions complexity of mathematical models is growing as
could be associated with bifurcations. Indeed Fig. 4 is well. Detailed analysis and parameterization of such
a bifurcation diagram where cell mass is a bifurcation large systems can be difficult when the number of
parameter and Cdk activity is the state variable. ODEs is large. Simplification by logical modeling
The major problem with most biological systems is has been proposed to overcome this problem (▶ Cell
the fact that very few rate constants can be measured Cycle Modeling Using Logical Rules). Logical model-
directly. Parameter sensitivity analysis (▶ Cell Cycle ing has a long tradition in biology and recently some
Models, Sensitivity Analysis) is used to identify the applications to cell cycle research also appeared. These
parameters that have the highest influence on model models are based on Boolean algebra, where the activ-
behavior. This method could help to reduce the number ity of each component is represented by two states: ON
of parameters that need to be measured or estimated. and OFF, providing a method which is computation-
Several techniques exist to estimate the parameters of ally less expensive. Behaviors that originate from the
a particular model, from manual parameter “twid- topology of the system can be investigated this way. In
dling” based on “educated guesses” to highly sophis- their pioneering study, Li et al. (2004) showed that the
ticated parameter inference using software tools network of the budding yeast cell cycle regulation is
(Panning et al. 2008; Schmidt and Jirstrand 2006). robustly designed, as 86% of all possible initial states
Almost all of these have been used in various cell lead the system to the same stable cell cycle path. This
cycle models. Metabolic control analysis is mostly result underlines that the biochemical network of cell
used to identify key reactions of metabolism, but cycle regulation can function in a parameter-
recent studies showed that similar technique could be insensitive way.
used to identify key reactions of the cell cycle as well Fluctuations in the average behavior of a cell popula-
(▶ Cell Cycle Signaling, Metabolic Pathway). tion can be described by deterministic ODE models or
Our knowledge about the regulatory network of the logical models, but the behavior of individual cells cannot
cell cycle and its interactions with other cellular path- be simulated that way. Stochastic modeling approaches
ways is constantly growing. As a consequence, the (▶ Cell Cycle Modeling, Stochastic Methods) are
C 228 Cell Cycle
becoming more and more popular as they provide Restriction Point). Classical flow cytometry was first
a means to analyze single cells and find relevant results used in 70s to determine DNA content and with that,
in cooperation with novel single-cell experimental the cell cycle stage of cell populations. Now, with the
techniques such as quantitative flow cytometry and help of fluorescence proteins, it could be used in
fluorescence microscopy. Some cell cycle models are a much more sophisticated way to determine molecule
already using stochastic simulation techniques, such number changes during the cell cycle (▶ Cell Cycle
as Langevin-type stochastic ODEs or the exact sto- Analysis, Flow Cytometry). Fluorescence proteins and
chastic simulation algorithm (Gillespie 2007). The advanced microscopy techniques also allow analysis
effect of noise in nonsymmetrical cell division on cell of single cells and even localization of single mole-
size distribution on the population level has been also cules during the cell cycle (▶ Cell Cycle Analysis,
investigated. Much more detailed stochastic models of Live-Cell Imaging). The periodic oscillations in tran-
cell cycle regulation are appearing. Such models take scription during the cell cycle was an early target of
into account measured molecular levels, protein, and transcriptomics research (▶ Cell Cycle Analysis,
mRNA half-lives and several other noisy processes. Expression Profiling) indeed, one of the first genome-
Most of these models are based on earlier ODE models scale microarray experiments was performed on the
that are “unpacked” into elementary reactions to be budding yeast cell cycle. Protein level fluctuations are
able to run stochastic simulations on them. classically followed by Western blots, but now large-
A successful approach along these lines uses Petri scale mass spectrometry analysis is widely used to
nets (▶ Cell Cycle Modeling, Petri Nets) which allow follow protein level fluctuations during the cell cycle
to move between deterministic and stochastic simula- and to indentify the phosphorylation states of Cdk-
tions and to follow molecule numbers instead of con- regulated proteins (Aebersold and Mann 2003).
centrations. Some other modeling concepts from Advanced ChIP–chip techniques help to reveal the
computer science are adopted and adapted to model transcriptional network of cell cycle regulation (Bahler
biological systems. Rule-based modeling and espe- 2009) and, essentially, all molecular and systems
cially various process algebras (▶ Cell Cycle Model- biology methods are used in cell cycle research.
ing, Process Algebra) are able to handle combinatorial The collected data is stored in various databases on
complexity caused by complex formation and various protein interactions, transcriptional interactions, post-
protein modifications. These allow an easier handling translational modifications, etc (▶ Cell Cycle Data
of multisite phosphorylation and multi-component Analysis). There are some cell cycle specific databases
complex formations, which are both relevant issues that contain information on periodically transcribed
for cell cycle regulation. genes during the cell cycle (Gauthier et al. 2010) and
that introduce related mathematical models as well
(Alfieri et al. 2008). Furthermore, literature mining
System Level Cell Cycle Experiments tools are helping us in searching for specific experi-
mental results on our favorite proteins. Large-scale
Many models are parameterized by fitting data on gene perturbation experiments are also being used to
physiological observations: cell sizes, cell cycle investigate the cell cycle: studies on single- or double
phase distributions, and viabilities of mutants – after gene-deletion strains (Hillenmeyer et al. 2008) and phe-
painstaking literature mining for these data. The notype analysis of protein overexpressing cells are all
advanced large-scale measurements and the related giving important results to cell cycle research (▶ Cell
databases provide modelers data to help them formu- Cycle Analysis, Systematic Gene Overexpression).
lating larger, more detailed, and more precise models All these resources are being used to support the
in the near future. development of computational models of cell cycle
Results collected over the last 40 years, including regulation. The phenotypes of the deletion and
recent system-level experiments have extended our overexpression mutant strains provide physiological
knowledge and now allow us to construct much more constrains for such models, while time-course mea-
detailed interaction maps of key cell cycle regulatory surements of mRNA and protein levels provide data
steps (▶ Cell Cycle Transition, Detailed Regulation of to be fitted by the simulation of the models.
Cell Cycle 229 C
Models Beyond the Cell Cycle describing the crosstalk between the various upstream
and downstream pathways and the core cell cycle
Cell cycle of individual cells in multicellular organ- machinery.
isms (such as humans) is controlled by the environ-
ment as well as it is influenced by its neighboring cells.
Cell–cell interactions make the modeling of cell cycle Summary
in a cellular tissue a multilevel problem (▶ Multilevel C
Modeling, Cell Proliferation). Not only the internal A full systems biology workflow of experiment !
molecular interactions of each individual cell, but database ! model ! experiment was applied to
their incoming and outgoing signals to their neighbors understand the role of positive feedback in the cell
need to be tracked. Cell cycle modules could be intro- cycle regulation. To realize the entire loop took more
duced into agent-based models to simulate how cell than 10 years in the case of the frog egg studies, but it
cycle is controlled in individual interacting cells. Fur- took only 4 years in the case of the budding yeast cell
thermore, the core cell cycle engine should properly cycle control. We are hoping that with the emergence
regulate important downstream processes, such as of systems biology, stimulating interactions between
DNA replication and cell division. To carry out experimentalists and theoreticians could lead to more
a proper function, it is interlocked with several other detailed, more precise and realistic models. For
pathways. Various environmental effects (stress, light, instance, we will need more sophisticated experimen-
temperature, etc.) can influence progression through tal techniques to measure protein activity fluctuations
cell division cycles, and several drugs, external signal- in time – desirably in individual cells. At the same
ing molecules, and metabolites could also affect cell time, better modeling and parameter estimation soft-
proliferation (Csikasz-Nagy 2009). Many of these ware is needed to deal with the increasing amounts of
upstream and downstream pathways of cell cycle reg- data and to accommodate new types of measurements
ulation have been investigated in great detail by sys- on individual cells. Even though we have learned
tems biology techniques. The yeast osmotic pressure a great deal about cell cycle regulation in the last
sensing pathway is one of the most investigated path- 40–50 years, there are still gaps in our knowledge
ways; several experiments and computational models that need bridged, and experimentalists and theoreti-
are dealing with it (Klipp et al. 2005). Computational cians need to work together to accomplish this.
models of the mammalian system are focusing on the
regulation of the cell cycle by the low oxygen induced
hypoxia pathway, on the Src and NFkB pathways, as Cross-References
well as on the interactions between DNA damage,
apoptosis, and the cell cycle. A recent discovery on ▶ CDK Inhibitors
the connection between the circadian clock and the cell ▶ Cell Cycle Analysis, Expression Profiling
cycle in mammalian cells (▶ Cell Cycle, Coupled with ▶ Cell Cycle Analysis, Flow Cytometry
Circadian Clock) also initiated some computational ▶ Cell Cycle Analysis, Live-Cell Imaging
modeling work along these lines. Likewise, the role ▶ Cell Cycle Analysis, Systematic Gene
of micro RNAs in relation to cell cycle regulation are Overexpression
also being investigated by computational modeling ▶ Cell Cycle Arrest After DNA Damage
(▶ Cell Cycle Regulation, microRNAs). Also, some ▶ Cell Cycle Checkpoints
of the downstream processes induced by the cell ▶ Cell Cycle Data Analysis
cycle machinery have been studied by mathematical ▶ Cell Cycle Dynamics, Bistability and Oscillations
modeling. The regulation of cell growth, cell division, ▶ Cell Cycle Dynamics, Irreversibility
DNA replication and several aspects of mitosis are all ▶ Cell Cycle Model Analysis, Bifurcation Theory
described in some detail in the literature (Lygeros et al. ▶ Cell Cycle Modeling Using Logical Rules
2008; Mogilner et al. 2006), but none of these are ▶ Cell Cycle Modeling, Differential Equation
connected to a model of the core cell cycle regulatory ▶ Cell Cycle Modeling, Petri Nets
network. Thus, we still need to wait to see the results ▶ Cell Cycle Modeling, Process Algebra
C 230 Cell Cycle
▶ Cell Cycle Modeling, Stochastic Methods Ciliberto A, Shah JV (2009) A quantitative systems view of the
▶ Cell Cycle Models, Sensitivity Analysis spindle assembly checkpoint. EMBO J 28:2162–2173
Cross FR, Archambault V, Miller M, Klovstad M (2002) Testing
▶ Cell Cycle of Early Frog Embryos a mathematical model for the yeast cell cycle. Mol Biol Cell
▶ Cell Cycle of Mammalian Cells 13:52–70
▶ Cell Cycle Regulation, microRNAs Csikasz-Nagy A (2009) Computational systems biology of the
▶ Cell Cycle Signaling, Hypoxia cell cycle. Brief Bioinform 10:424–434
Csikasz-Nagy A, Battogtokh D, Chen KC, Novak B, Tyson JJ
▶ Cell Cycle Signaling, Metabolic Pathway (2006) Analysis of a generic model of eukaryotic cell-cycle
▶ Cell Cycle Signaling, Spindle Assembly Checkpoint regulation. Biophys J 90:4361–4379
▶ Cell Cycle Transition, Detailed Regulation of Evans T, Rosenthal ET, Youngblom J, Distel D, Hunt T (1983)
Restriction Point Cyclin: a protein specified by maternal mRNA in sea urchin
eggs that is destroyed at each cleavage division. Cell
▶ Cell Cycle Transition, Principles of Restriction 33:389–396
Point Ferrell JE (2002) Self-perpetuating states in signal transduction:
▶ Cell Cycle Transitions, G2/M positive feedback, double-negative feedback and bistability.
▶ Cell Cycle Transitions, Mitotic Exit Curr Opin Cell Biol 14:140–148
Gauthier NP, Jensen LJ, Wernersson R, Brunak S, Jensen TS
▶ Cell Cycle, Budding Yeast (2010) Cyclebase.org: version 2.0, an updated comprehen-
▶ Cell Cycle, Cancer Cell Cycle and Oncogene sive, multi-species repository of cell cycle experiments
Addiction and derived analysis results. Nucleic Acids Res 38:D699–
▶ Cell Cycle, Cell Size Regulation D702
Gillespie DT (2007) Stochastic simulation of chemical kinetics.
▶ Cell Cycle, Coupled With Circadian Clock Annu Rev Phys Chem 58:35–55
▶ Cell Cycle, Fission Yeast Goldbeter A (1991) A minimal cascade model for the mitotic
▶ Cell Cycle, Physiology oscillator involving cyclin and cdc2 kinase. Proc Natl Acad
▶ Cell Cycle, Prokaryotes Sci USA 88:9107–9111
Hartwell LH, Weinert TA (1989) Checkpoints: controls that
▶ Cyclins and Cyclin-Dependent Kinases ensure the order of cell cycle events. Science 246:629–634
▶ Cytokinesis Hartwell LH, Mortimer RK, Culotti J, Culotti M (1973) Genetic
▶ DNA Replication control of the cell division cycle in yeast: V. Genetic analysis
▶ Endoreplication of cdc mutants. Genetics 74:267–286
Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S,
▶ Lymphocyte Dynamics and Repertoires, Lee W, Proctor M, St Onge RP, Tyers M, Koller D,
Biological Methods Altman RB, Davis RW, Nislow C, Giaever G (2008) The
▶ Meiosis chemical genomic portrait of yeast: uncovering a phenotype
▶ Mitosis for all genes. Science 320:362–365
Klipp E, Nordlander B, Kruger R, Gennemark P, Hohmann S
▶ Modeling Approaches of Cell Cycle Regulation by (2005) Integrative model of the response of yeast to osmotic
Signaling Pathways for External Stress shock. Nat Biotechnol 23:975–982
▶ Multilevel Modeling, Cell Proliferation Koch AL, Schaechter M (1962) A model for statistics of the cell
division process. J Gen Microbiol 29:435–454
Li F, Long T, Lu Y, Ouyang Q, Tang C (2004) The yeast cell-
cycle network is robustly designed. Proc Natl Acad Sci USA
References 101:4781–4786
Lindqvist A, Rodriguez-Bravo V, Medema RH (2009) The deci-
Aebersold R, Mann M (2003) Mass spectrometry-based proteo- sion to enter mitosis: feedback and redundancy in the mitotic
mics. Nature 422:198–207 entry network. J Cell Biol 185:193–202
Alfieri R, Merelli I, Mosca E, Milanesi L (2008) The cell cycle Lukas J, Lukas C, Bartek J (2004) Mammalian cell cycle check-
DB: a systems biology approach to cell cycle analysis. points: signalling pathways and their organization in space
Nucleic Acids Res 36:D641–D645 and time. DNA Repair (Amst) 3:997–1007
Bahler J (2009) Global approaches to study gene regulation. Lygeros J, Koutroumpas K, Dimopoulos S, Legouras I,
Methods 48:217 Kouretas P, Heichinger C, Nurse P, Lygerou Z (2008) Sto-
Bloom J, Cross FR (2007) Multiple levels of cyclin specificity in chastic hybrid modeling of DNA replication across
cell-cycle control. Nat Rev Mol Cell Biol 8:149–160 a complete genome. Proc Natl Acad Sci USA
Chen KC, Csikasz-Nagy A, Gyorffy B, Val J, Novak B, Tyson JJ 105:12295–12300
(2000) Kinetic analysis of a molecular model of the budding Masui Y, Markert CL (1971) Cytoplasmic control of nuclear
yeast cell cycle. Mol Biol Cell 11:369–391 behavior during meiotic maturation of frog oocytes. J Exp
Chen KC, Calzone L, Csikasz-Nagy A, Cross FR, Novak B, Zool 177:129–145
Tyson JJ (2004) Integrative analysis of cell cycle control in Mogilner A, Wollman R, Civelekoglu-Scholey G, Scholey J
budding yeast. Mol Biol Cell 15:3841–3862 (2006) Modeling mitosis. Trends Cell Biol 16:88–96
Cell Cycle Analysis, Expression Profiling 231 C
Morgan DO (2006) The cell cycle: principles of control. New Definition
Science Press, London
Nasmyth K (2001) A prize for proliferation. Cell 107:689–701
Novak B, Tyson JJ (1993) Numerical analysis of Gene expression (or ▶ Transcriptome) profiling refers
a comprehensive model of M-phase control in Xenopus to the simultaneous measurement of the activities of
oocyte extracts and intact embryos. J Cell Sci most or all genes present in a cell, a tissue, or a whole
106(Pt 4):1153–1168 organism. The activity of a given gene is determined
Novak B, Tyson JJ (2008) Design principles of biochemical
oscillators. Nat Rev Mol Cell Biol 9:981–991 by the number of RNA transcripts (expression level) C
Novak B, Tyson JJ, Gyorffy B, Csikasz-Nagy A (2007) Irrevers- derived from this gene under a specific condition.
ible cell-cycle transitions are due to systems-level feedback. Using cells that are synchronized for the cell cycle
Nat Cell Biol 9:724–728 (▶ Cell Cycle, Synchronization), gene expression pro-
Nurse P (1975) Genetic control of cell size at cell division in
yeast. Nature 256:547–551 filing enables the large-scale identification of genes
Nurse P (1990) Universal control mechanism regulating onset of that are periodically expressed as a function of the
M-phase. Nature 344:503–508 cell cycle (▶ Cell Cycle-Regulated Gene Expression).
Panning T, Watson L, Allen N, Chen K, Shaffer C, Tyson J This approach thus provides global data of gene regu-
(2008) Deterministic parallel global parameter estimation for
a model of the budding yeast cell cycle. J Global Optim lation during the cell cycle, which supports a systems-
40:719–738 level understanding of cell-cycle control.
Pomerening JR, Sontag ED, Ferrell JE Jr (2003) Building a cell
cycle oscillator: hysteresis and bistability in the activation of
Cdc2. Nat Cell Biol 5:346–351
Pomerening JR, Kim SY, Ferrell JE Jr (2005) Systems-level Characteristics
dissection of the cell-cycle oscillator: bypassing positive
feedback produces damped oscillations. Cell 122:565–578 Methods for Gene Expression Profiling
Queralt E, Lehane C, Novak B, Uhlmann F (2006) Gene expression profiling has been enabled by the
Downregulation of PP2A(Cdc55) phosphatase by separase
initiates mitotic exit in budding yeast. Cell 125:719–732 development of large-scale technologies such as
Schmidt H, Jirstrand M (2006) Systems biology toolbox for ▶ DNA microarrays and, more recently, next-
MATLAB: a computational platform for research in systems generation sequencing-based approaches such as
biology. Bioinformatics 22:514–515 ▶ RNA-seq. These genome-wide approaches are
Sha W, Moore J, Chen K, Lassaletta AD, Yi CS, Tyson JJ,
Sible JC (2003) Hysteresis drives cell-cycle transitions in quite straightforward experimentally, and the biggest
Xenopus laevis egg extracts. Proc Natl Acad Sci USA challenge is often to extract biologically meaningful
100:975–980 information from the huge data sets generated. Com-
Sia RA, Bardes ES, Lew DJ (1998) Control of Swe1p degrada- putational data mining is therefore a central aspect of
tion by the morphogenesis checkpoint. EMBO J
17:6678–6688 gene expression profiling, including a range of special-
Zachariae W, Nasmyth K (1999) Whose end is destruction: cell ized approaches such as ▶ clustering.
division and the anaphase-promoting complex. Genes Dev Besides methods for gene expression profiling to
13:2039–2058 measure ▶ transcriptome levels during the cell cycle,
complementary large-scale technologies are available
for global insight into ▶ cell-cycle-regulated gene
expression. For example, ▶ chromatin immunoprecip-
Cell Cycle Analysis, Expression Profiling itation (ChIP) combined with ▶ DNA microarrays
(ChIP-chip) or sequencing (ChIP-seq) provide
J€
urg B€ahler and Samuel Marguerat a means to determine the genome-wide binding sites
Department of Genetics, Evolution and Environment of transcription factors controlling gene expression.
and UCL Cancer Institute, University College London, Moreover, recent developments in ▶ proteomics are
London, UK now allowing the global measurement of protein levels
and post-translational modifications as a function of
the cell cycle.
Synonyms
Gene Expression Profiling During the Cell Cycle
Expression profiling; Transcriptome profiling; Periodic control of transcription seems to be
Transcriptomics a universal feature of the cell-division cycle.
C 232 Cell Cycle Analysis, Expression Profiling
M S M S Mcm1p
3
Ace2p
2 Swi5p
Relative expression level
Cln3p
1 M
G1
0.5
Fkh2p G2
Mcm1p MBF
Ndd1p S SBF
0.2 Fkh1p
0 30 60 90 120 150 180 210 240 270
Minutes after synchronization
Cell Cycle Analysis, Expression Profiling, Fig. 1 Expression Cell Cycle Analysis, Expression Profiling, Fig. 2 Serial reg-
profiles of fission yeast transcripts peaking in levels before ulation of transcription factors during the budding yeast cell
M-phase (red) or S-phase (blue). Transcriptomes were analyzed cycle. Each group of transcription factors (specified in colored
by microarrays every 15 min after cell-cycle synchronization for frames) regulates transcription factors acting in the next cell-
two full cell cycles. Data from Rustici et al. (2004) cycle phase (shown in circle). Continuous and hatched lines
indicate transcriptional and post-translational regulation,
respectively
Gene expression profiling of proliferating cells after
▶ cell-cycle synchronization have identified hundreds
of periodically expressed genes, in organisms ranging in human, are not conserved at the sequence level but
from bacteria to humans (Breeden 2003). Periodically play analogous roles in controlling genes whose tran-
expressed genes often play specific roles in the cell script levels peak before S-phase.
cycle, and their expression levels typically peak just Cell-cycle-regulated gene expression often shows
before the phase during which they function. Particu- serial regulation, whereby transcription factors func-
larly rich genome-wide data are available for budding tioning during one cell-cycle phase up-regulate tran-
and fission yeasts, which provide complementary scripts that encode downstream transcription factors
models for cell-cycle-regulated ▶ gene expression functioning during the next phase (Fig. 2) (Futcher
(B€ahler 2005). About 10–20% of all yeast genes are 2002). Such a network of sequentially expressed tran-
periodically expressed during the cell cycle, many of scription factors may act as a cell-cycle oscillator that
them in three transcriptional waves that roughly coin- regulates the periodic gene expression program inde-
cide with major cell-cycle transitions: initiation of pendently of, and in addition to, cyclin-dependent
DNA replication, entry into mitosis, and exit from kinases (Simmons Kovacs et al. 2008).
mitosis (Fig. 1). Budding and fission yeasts show similarities as well
as intriguing differences in their ▶ transcriptional reg-
Control of Cell-Cycle-Regulated Gene Expression ulatory networks, revealing evolutionary plasticity of
Periodic gene expression is integrated with cell-cycle gene expression control (B€ahler 2005): Regulatory
progression by multiple feedback loops at both tran- circuits between conserved transcription factors have
scriptional and post-translational levels (B€ahler 2005; been rewired, and periodic transcription of many
Futcher 2002). Some cell-cycle transcription factors orthologous genes is not conserved, except for a core
are functionally conserved from yeast to human. For set of genes that may be universally regulated during
example, proteins of the forkhead family, such as Fkh2 the eukaryotic cell cycle and may play key roles in cell-
in yeast and FoxM in human, control genes whose cycle progression.
transcript levels peak before M-phase. Other transcrip- Although protein complexes involved in the cell
tional regulators, such as MBF/SBF in yeast and E2F cycle are largely conserved among eukaryotes, their
Cell Cycle Analysis, Flow Cytometry 233 C
Cross-References
P Protein
degradation ▶ Cell Cycle-Regulated Gene Expression
▶ Cell Cycle, Synchronization
▶ Chromatin Immunoprecipitation
Active complex
▶ Clustering
▶ DNA Microarrays C
Inactive complex ▶ Proteomics
▶ RNA-Seq
Periodically ▶ Transcriptional Regulatory Network
expressed subunit
▶ Transcriptome
a b
Relative frequency
2 2
Genomes
1 1
c d
2
DNA content
Frequency
Cell Cycle Analysis, Flow Cytometry, Fig. 1 Relationship a function of the fractional cell cycle time. See (Walker 1954) for
between cell cycle-related DNA expression and cytometry of the logic. Panel (c) shows simulated DNA content measure-
DNA content. Panel (a) represents DNA expression over the life ments that would be measured for cells programmed to express
of a cell with the values G1 ¼ 5.8 h, S ¼ 6.8 h, and G2 + DNA as in A. To generate these data, the X axis of A is
M¼2.8 h, which are values that we have previously measured transformed as Xt ¼ (2*exp(-0.6931*Xi/Tc))*(X/Tc),
for NIH3T3 cells growing in 10% serum at a density of 87–121 Tc ¼ cell cycle time (15.4), Xt ¼ transformed X, Xi ¼ original
cells/cm2 (Disalvo et al. 1995). Panel (b) shows the age distri- X. Xts are unitless fractions of Tc. Panel (d) is a histogram of the
bution of a population of asynchronously growing cells, data in Panel (c)
accounting for the generation of two cells at birth, plotted as
C 236 Cell Cycle Analysis, Flow Cytometry
are generated along the adjusted “expression” curve to subsequent Gaussian standard deviation terms are cal-
simulate cytometric measurements, we get Fig. 1c. If culated from that using the coefficient of variation
we then create a histogram from the simulated data (standard deviation/mean) multiplied by the mean of
(Fig. 1d), one obtains a simulated DNA content distri- the subsequent Gaussian. The same logic is used to
bution for a proliferating cell population with an aver- “broaden” the ends of either a trapezoid series or
age cell cycle time of 15.6 h. This model does not a single polynomial function. Broadening either starts
account for any phase specific cell death. For further at the center of each peak or some fractional value
reading, see Walker (1954), Mitchison (1971), Watson toward the S phase side of each peak (see Fig. 3b).
(1991), Bagwell (1993). The purpose is to account for statistical overlap of each
component (G0/G1, S, and G2/M). Although the
Extension of the Logic Gaussian series is the most appealing from
The illustrated logic of Fig. 1 was first enunciated in a statistical principles point of view, in practice the
1954 (Walker 1954) and later developed within the approach is the least robust and is generally used for
context of data from flow cytometry by several groups. perturbed populations. The polynomial model works
The logic can be extended to multiple parameters. well for asynchronous, unperturbed populations, since
Figure 2 shows simulated plots for DNA and cyclin it robustly accounts for the theoretical, expected shape
A2 (Fig. 2f) compared to real data (Fig. 1a) from which of S phase. The trapezoid model is a good compromise
the fraction profiles (Fig. 2b, c) were derived and noise between the other two, able to robustly fit S phases of
added (Fig. 2d, e). By reversing the logic shown in almost any shape. Most commercial software contains
Fig. 1, the programmed cell cycle expression (Fig. 2g, h) some form of DNA content distribution fitting. Figure 3
can be derived for any parameter, provided a set of shows a typical analysis using a polynomial fit to
continuous measurements can be made covering the S phase.
cell cycle in a nonredundant manner. It is relatively
easy to see that the logic can be extended to an unlimited Limitations
number of parameters and that from primary cytometric Cell populations like liver parenchyme cells in vivo,
data, expression profiles for any measurable biochemi- most mammalian cell cultures, and mammalian tumor
cal feature can be derived. This has distinct implications cells often undergo endoreduplication (failure to
for mathematical models of cell cycle regulation (e.g., undergo mitosis and cytokinesis during a mitotic
Csikasz-Nagy et al. 2006), since the expression profiles cycle). Tumor cells and mammalian cell cultures also
are not different in a relative sense from model outputs fail to undergo binary division (cytokinesis) and thus
of state variable levels over time. create cycling cell populations with two or more nuclei
that cycle together. Additionally, as tumor cells and
DNA Content Analysis for Simple Proliferating tumorigenically transformed mammalian cell cultures
Populations evolve, multiple stable stem lines with fractional
Practical use of DNA content measurements is to cal- genomes (aneuploidy) are present in the populations.
culate the fractions of G0/G1, S, and G2/M cells. Most of these features can be dealt with using single
Because the error in the measurements are normally parameter DNA content analysis with sound logic,
distributed (given that cell fixation and staining, and a theoretical underpinning, and robustness obtained
instrument operation are within standard practice), by years of practice and widespread use. At some
a typical analysis is performed by fitting a complex point, these analyses become tortured. ModFit LT
function composed of summed normal distributions (Verity House, Topsham, ME) and MultiCycle AV
centered on the primary and secondary peaks (G0/G1 (Phoenix Flow Systems, San Diego, CA) are two soft-
and G2/M) and an S phase component that is either ware packages that support complex models.
(1) a series of Gaussian functions, (2) a series of trap-
ezoids with Gaussian “broadened” ends, or Further Reading
(3) a polynomial with “broadened” ends. Generally, There are many good reviews and chapters describing
the terms (mean and standard deviation) of the Gauss- DNA content analysis (Dean 1987; Gray et al. 1990;
ian functions are defined by the G0/G1 peak and all Watson 1991; Bagwell 1993; Rabinovitch 1993).
Cell Cycle Analysis, Flow Cytometry 237 C
a b c
Cyclin A2 content
DNA content
Cyclin A2
Cyclin A2
Cyclin A2
Frequency Frequency DNA
g h
Cyclin A2 expression
DNA expression
Cell Cycle Analysis, Flow Cytometry, Fig. 2 Relationship described in Frisa and Jacobberger (2009). A lengthy, more
between cell cycle-related expression and cytometry of two detailed description is described in Jacobberger et al. (2011b).
parameters. Panel (a) shows data from exponentially growing Panels (g) and (h) are the calculated cell cycle–related expres-
RKO cells that were stained for cyclins A2, B1, DNA content, sion of DNA and cyclin A2, calculated by reversing the logic in
and phospho-S10-histone H3 (Yan et al. 2004). The cell popu- Fig. 1. Panel (f) is a bivariate plot of the data in Panels (d) and
lation frequency profiles for DNA content and cyclin A2 calcu- (e). Note the similarity of the simulated data (f) and the
lated from the data shown in Panel (a) are shown in Panels (b) real data (a)
and (c). Calculation of these profiles was performed generally as
b
Cell count
Cell count
DNA content DNA content
Cell Cycle Analysis, Flow Cytometry, Fig. 3 Common DNA function (black line), extending that line to some predefined
content analysis. The data are from a human T lymphoma cell point under the Gaussian distributed data in either direction
line (Molt4). (a): The model is composed of three components, (here, the means of each Gaussian component), then “broaden-
two Gaussian functions (red) and a “broadened” polynomial ing” or “tailing” the curve with partial Gaussian functions using
(blue, hatched) that are used to fit the areas under the data that the coefficients of variation from Gaussian components (red
can be attributed to cells with 2C and 4C DNA (G1 and G2 + M) components in a) and the amplitudes at the ends of the polyno-
and for S phase (G1< S<G2). (b): Illustrates the fitting of the mial. (ModFit LT was used to fit the distribution in a; Graphpad
central S phase data (circles) to a second-order polynomial Prism was used to generate b)
one wavelength when intercalated into double- quantify DNA and RNA. Later efforts rely on DNA-
stranded nucleic acids and at another wavelength specific dyes (DAPI, Hoechst 33,258) coupled with the
when bound to single-stranded nucleic acids. This RNA binding dye, Pyronin Y. These approaches can
characteristic can be exploited by treating fixed distinguish deep G0 (non-cycling) cells from cycling
cells with RNase to remove RNA and acid to partially cells in addition to the typical G1, S, and G2/M anal-
denature residual DNA. The level of denaturability ysis of DNA content. The acridine approach requires
of DNA changes within the cell cycle, and by measur- some skill and knowledge of chemistry. Ribosomal
ing fluorescence at the two wavelengths, then plotting RNA is essentially what is measured. For further read-
total fluorescence (adding the two measurements) ing, see Darzynkiewicz et al. (1980), Darzynkiewicz
versus the fluorescence for single-stranded DNA, and Traganos (1990).
one can identify five cell cycle compartments –
G1 can be subdivided in two, G1A or G1B (early DNA and Protein Content
and late), S, G2, and M. For further reading, see Using dyes that covalently couple with primary amines
Darzynkiewicz et al. (1980), Darzynkiewicz and or dyes that preferentially bind protein, protein content
Traganos (1990). measurements can be coupled with DNA content. In
general, these approaches are not often used. Protein
DNA and RNA Content content correlates with RNA content. In principal, the
Early protocols used metachromatic dyes like acridine approach should be useful for studying imbalanced
orange which fluoresces green when intercalated into growth (an imbalance between the mitotic cycle and
double-stranded nucleic acids and red when bound to the process of synthesizing approximately twice the
single-stranded nucleic acids or other polyanions. cell’s mass). For further information, see Crissman and
Because DNA is double stranded and cellular RNA is Steinkamp (1987), Darzynkiewicz and Traganos
largely single stranded, acridine orange can be used to (1990).
Cell Cycle Analysis, Flow Cytometry 239 C
DNA Content and BrdU pulse (the single cell pulse as viewed on an oscillo-
The fluorescence from the DNA binding dyes, Hoechst scope) provides an ability to eliminate aggregates from
33,342 or 33,258, is quenched when the dye is bound to the assay, and measurement of excitation light scatter
▶ 5-Bromo-2-deoxyuridine (BrdU) substituted DNA. provides a discriminator based on cell size and granu-
By coupling this dye with a dye that does not quench larity or surface geometric complexity. However, these
and fluoresces at different wavelengths, this feature are largely utilitarian. Conversely, measuring epitopes,
has been exploited elegantly by two laboratories to the expression profiles of proteins or protein modifica- C
measure the phase fractions of successive cell cycles. tions that oscillate within the cell cycle and that oscil-
For further reading, see Poot et al. (1994). late out of phase with each other provides an analytical
paradigm for creating cell states that can be used to
DNA Content and Immunofluorescence parse, in a very fine manner, the effects of various
Immunofluorescence coupled with DNA content treatments on the cell cycle, e.g., drug effects.
cytometry opens up a very broad ability to probe biol-
ogy related to the cell cycle. Immunofluorescence DNA, Cyclin A2, and Phospho-S10-Histone H3 (pH3)
probes can home in on a population of interest in Many epitopes that report the levels of proteins or
complex mixtures, e.g., markers for keratin can isolate other parent molecules can be coupled to provide addi-
the epithelial component in a solid tumor (Glogovac tional sub-compartmentalization of the cell cycle or
et al. 1996). Other cell processes can be queried with other information. The triad of DNA, cyclin A2, and
respect to the cell cycle, e.g., activation of DNA repair pH3 deserves special mention because their joining
pathways using the gH2Ax epitope (Tanaka et al. into one assay provides the ability to deal with
2007). Below, two common immunofluorescence endoreduplication/binucleate cycles (see Limitations,
approaches are described. above). Figures 4 and 5 show the principal features of
this analysis. Figure 5 shows the capture of the primary
DNA Content and BrdU BrdU-substituted DNA can (stem line) cycle (2C ! 4C). Figure 5a shows
be detected by antibodies. This is commonly used to a “region,” R2, set around mitotic cells. Figure 5b is
either prove that the S phase cells detected by DNA “NOT gated” on R2 and a region was set around the
content analysis were synthesizing DNA at the time of 2C ! 4C interphase cells (R3). The data of both 5A
sampling (other possibilities are that the cells were and 5B have been gated to include singlet cells and
dead, dying, or quiescent) or to determine the average exclude aggregates as in Fig. 4. Figure 5c shows all the
phase times by pulse-chase or continuous labeling events including singlet interphase cells (green), singlet
experiments. For further reading, see Gray et al. mitotic cells (blue), and debris/dead cells/outliers and
(1990), Gray et al. (1987), Dean (1987), Terry and aggregates (red), which are not used in the analysis.
White (2001). Figure 5e is a look at the discarded data. The labels
2X, 3X, and 4X point to G1 aggregates (containing 2, 3,
DNA and a Mitotic Marker Many proteins are spe- or 4 cells) together with endoreduplicated/multinucleate
cifically phosphorylated or phosphorylated more often cycling cells for which each event is a single cell.
when cells enter mitosis. Phospho-specific antibodies Finally, Fig. 5c shows cyclin A2 versus pH3 for the
can be raised to these epitopes. These antibodies cou- 2C ! 4C stem line. Regions are contiguously set
ple well with DNA content to provide G0/G1, S, G2, around clusters defined either by changes in cell fre-
and M analysis. Figure 4 shows typical data. For a short quency or by changes in the “direction” of the data
but comprehensive review, see Darzynkiewicz (2008). (arrows). The arrows show “movement” of an ideal
For methods, see Juan et al. (2001). cell through the data space. Div is the interface where
cell division occurs. Figure 5e is particularly relevant to
Multiparametric Cell Cycle Analysis Systems Biology, since it is easy to see that the expres-
Almost all cytometry assays of the cell cycle are sion pattern of pH3 and cyclin A2 form a unidirectional
multiparametric. Commonly, even for DNA content closed loop, and this renders the ability to extract
alone, measuring the pulse height and the integrated programmed expression profiles of these and other
C 240 Cell Cycle Analysis, Flow Cytometry
a 2,000 b
M
104
DAPI (Peak height) (× 100)
1,500
Single
cells
pH3-A488
1,000 103
500
102
0
0 500 1,000 1,500 2,000 2,500 0 500 1,000 1,500 2,000 2,500
DAPI (Integral) (× 100) DAPI (Integral) (× 100)
c d 3,000
1,500 2,500
2,000
Number
Interphase
Number
1,000
1,500
1,000
500
500
M
0 0
Cell Cycle Analysis, Flow Cytometry, Fig. 4 DNA content are colored blue. Panel (c) shows the distribution of pH3 for
analysis that differentiates mitotic cells. Exponentially growing the 2 C ! 4 C interphase (green line) and M cells (blue line).
HeLa cells were fixed and stained for DNA content, cyclin A2, Quantification of the fractions of 2 C ! 4 C interphase and
and histone H3 phosphorylated at serine 10 (pH3). Panel (a) is M cells can be done from this plot. Panel (d) shows the DNA
a plot of the DNA signal trace* peak height versus the integrated content for all of the 2 C ! 4 C cells (yellow line), the 2 C ! 4 C
area under the peak. This plot is used to reduce the fraction of interphase cells (green line), and the 2 C ! 4 C M cells (blue
aggregate cells in the analysis. See Rabinovitch (1993) for line). Both plots in panels (c) and (d) were obtained by using the
a discussion of the logic, theory, and practical considerations gate: [(Single cells AND R3) OR M]. (R3 is a region on cyclin
supporting this method. Singlet events, falling within the region A2 versus DNA, not shown in this figure, but shown in Fig. 5).
labeled “Single cells,” are colored green. Debris (events with Quantification of the fractions of 2 C ! 4 C G1, S, and G2 are
less than a G1 level of fluorescence) and aggregate events are performed on a plot of the green line in Panel (d), using the
colored red. Mitotic cells are selected in Panel (b) by selecting methods illustrated in Fig. 3.*the digitized electrical measure-
the cells within the cluster with 4 C DNA content and high ment as a function of time as a cell or particle moves through the
expression of pH3 (rectangle label “M”). The mitotic events laser beam
state variables. By calculating the mean levels of these Cell Cycle Analysis and Extension of the Logic.
or other parameters and the frequency of cells within For further reading, see Jacobberger et al. (2008), Soni
each region, the programmed expression profile can be et al. (2008), Avva et al. (2011), Jacobberger et al.
obtained by the principles outlined in sections Basis for (2011a, b).
Cell Cycle Analysis, Flow Cytometry 241 C
a b c
R3
R2 104
104 104
cyc A2-PE
pH3-A488
pH3-A488
Div
103 103 C
103
0
102
102
500 1,000 1,500 2,000 500 1,000 1,500 2,000 –103 0 103 104
DAPI (Integral) (× 100) DAPI (Integral) (× 100) cyc A2-PE
d e
105 105
cyc A2-PE
103 103 4X
Debris 3X
0 2X
0
Cell death ?
–103 –103
0 500 1,000 1,500 2,000 2,500 0 500 1,000 1,500 2,000 2,500
DAPI (Integral) (× 100) DAPI (Integral) (× 100)
Cell Cycle Analysis, Flow Cytometry, Fig. 5 Complex cell entire 2 C ! 4 C cycle starting from a state that is equivalent to
cycle analysis. The data file is the same as in Fig. 4. Panel (a) “birth/early G1” and one equivalent to the end of the cell cycle.
represents the same view as Panel (b) in Fig. 4, except that these These states are separated by “Div.” The arrows show the uni-
data have been gated on the “Single cells” region. Panel (b) directional movement of cells through the state space defined by
shows cyclin A2 versus DNA content, Boolean gated on “Single DNA content, cyclin A2, and pH3. Panels (d) and (e) are
cells” AND NOT “M,” which equals 2 C ! 4 C and 4 C ! 8 C identical plots of cyclin A2 versus DNA content, showing the
interphases. The gate, (“Single cells” AND R3) OR R2, 2 C ! 4 C interphase population (green dots) and mitotic cells
recombines interphase with mitosis, including mitotic cells that (blue dots) in Panel (d) and the events that are not used with these
are excluded in the singlet region (arrow, Fig. 4a). This gate has data removed from the plot (Panel e)
been applied to Panel (c), which provides a bivariate view of the
a
1. Add liquid agar (~60 °C) 2. Add another (thinner) 3. Remove top cover
cover glass and wait for glass and add cells
the agar to cool
Cover glass
To the microscope
Pump
Imaging
Cell Cycle Analysis, Live-Cell Imaging, Fig. 1 Two ways to Cut out the region of agar containing the cells and flip it 180
prepare yeast cells for imaging: (a). Add 3–4 ml of 1.5% low- onto the imaging cover glass/dish used for microscope imaging.
melt agar, heated to 60 C, to a cover glass. Leave it to cool (b). Schematic of a microfluidic device. Cells are grown in
a few minutes and then sandwich the agar with a cover slip and a flowcell where media influx and temperature are controlled
wait until the agar cools completely. Once the agar is cool, externally
remove the top cover glass and add the cells on top of the agar.
Data Acquisition 2005). The best ▶ fluorescent markers used for cell
There are two general ways to monitor cells through cycle analysis have the following qualities:
time: Either track the individual cells through time or • Bright enough to yield a good signal at low intensity
sample large numbers of cells without replacement illumination that does not induce photodamage.
(population-based measurements) after cell cycle syn- Enhanced fast folding GFP, or Venus (yellow) are
chronization. On one hand, single-cell analysis does both excellent; mCherry (red) is much dimmer and
not require perturbative synchronization methods, has slower maturation kinetics.
reveals sharper transitions, and can be used to measure • Clearly mark one/several cell cycle events. In gen-
cell-to-cell variation due to stochastic fluctuations. eral, measurements of changes in protein localiza-
On the other hand, population-based methods tion or degradation are preferable to measurements
(e.g., flow cytometry, coulter counter) can be used to of changing concentration because even fast matur-
rapidly acquire large numbers of measurements, and ing GFP proteins fold in ~10–15 min.
can be used in conjunction with standard biochemical • Do not interfere with the natural function of the
techniques. fluorescently labeled protein.
Image acquisition is normally done using ▶ fluo- Commonly used markers for some cell cycle tran-
rescence microscopy in combination with an addi- sitions can be found in Table 1:
tional imaging technique (e.g., bright field, phase A novel technique for live-cell imaging developed
contrast, or differential interference contrast) to deter- by the Pines group relies on a FRET (fluorescence
mine the location of the cells. Depending on the cellu- resonance energy transfer) sensor to measure
lar process studied, a variety of ▶ fluorescent markers CDK-Cyclin B activity in single live cells (Gavet and
may be used. Recent technical advances by the Tsien Pines 2010) Similarly, the Kapoor lab used FRET
group have greatly expanded the quality and color probes to measure spatial and temporal gradient
spectrum of available fluorophores (Shaner et al. of aurora B activity during mitosis in live cells
Cell Cycle Analysis, Live-Cell Imaging 245 C
a Htb2-mCherry Htb2-mCherry Htb2-mCherry Htb2-mCherry Htb2-mCherry Htb2-mCherry
Whi5-GFP Whi5-GFP Whi5-GFP Whi5-GFP Whi5-GFP Whi5-GFP
b c Cln2 induced
60 SCD 60 SCD
Nuclear Whi5 [A.U.]
20 20
SCD +αfactor
0 0
0 60 120 180 240 300 –60 0 60 120 150 210
Time [min] Time [min]
Cell Cycle Analysis, Live-Cell Imaging, Fig. 2 (a). Compos- cell grown in SCD. Whi5-GFP enters the nucleus at anaphase
ite phase and fluorescence images of budding yeast cells grown and exits the nucleus at the point of commitment to cell division
in a flowcell with synthetic complete glucose medium (SCD). in mid-G1. (c). Example nuclear Whi5-GFP time course of a cell
This strain contains integrated Htb2-mCherry (a nuclear marker) arrested in a-factor (added at time ¼ 0) and then forced out
and Whi5-GFP fusion proteins (marking early G1, see table1). of arrest using the inducible MET3-promoter to express the
Note that the cells arrest after addition of a-factor (mating G1-cyclin CLN2 (see text)
pheromone). (b). Example nuclear Whi5-GFP time series of a
(Fuller et al. 2008). The ability to measure protein Although standard algorithms and commercial
activity, rather than protein concentration, in individ- products exist for image analysis, each application
ual cells is a great advantage because of the various tends to be unique enough to require customization.
inhibiting and activating post-translational modifica- As the image processing needs of the scientific com-
tions affecting function. Thus, we expect FRET-based munity increase in the coming decade, we expect com-
techniques to become increasingly important. mercial products to improve and be more widely
applied.
Analysis
Image Analysis An example of Cell Cycle Analysis Using Live-Cell
To obtain single-cell data it is necessary to segment Imaging
(finding the cell outline) and track individual cells over A striking example of the potential in single-cell cell
time. Segmentation is normally done on the entire cell or cycle analysis can be seen in the work by Di Talia et al.
on some subcellular compartment of special interest like to measure size control in budding yeast (Di Talia et al.
the nucleus (Sigal et al. 2006). In general, segmentation 2007). Yeast exhibit size homeostasis: Small cells
starts by manipulating the original image to enhance the must grow more than larger cells through their cell
difference between “cell” and “non-cell” regions. The cycle (▶ Cell Cycle, Cell Size Regulation). To exam-
exact nature of this step depends on the imaging and the ine this outstanding problem, Di Talia and colleagues
markers used. Next, a threshold is applied to distinguish used yeast labeled with a Myo1-GFP fusion protein
cells from non-cells. After the cells are identified, (used to divide the cell cycle into G1 and S/G2/M
parameters useful for cell cycle data analysis such as intervals) and ACT1pr-DsRed, a red fluorescent
size, average fluorescence, and timing of signal appear- protein under the control of the strong and constitu-
ance/disappearance can be extracted. tive ACT1-promoter (used to measure cell size).
C 246 Cell Cycle Analysis, Live-Cell Imaging
Cell Cycle Analysis, Live-Cell Imaging, Table 1 Some commonly used cell cycle markers
Organism/
Cell cycle event/phase Cellular process/localization Gene(s) cell type Ref
Cytokinesis/G1 Disappearance of the septin ring Cdc10, Myo1 Budding (Di Talia et al.
initiation yeast 2007)
Start transition Whi5 export from the nucleus Whi5 Budding (Di Talia et al.
(mid G1) yeast 2007)
S-phase initiation Reappearance of the septin ring Cdc10, Myo1 Budding (Di Talia et al.
yeast 2007)
Metaphase to anaphase Nuclear separation (also gives Htb2 or any other nuclear marker Budding Many
transition lineage information) yeast
Cell cycle transitions, Clb2 degradation/Cdc14 Clb2, Cdc14 Budding (Drapkin et al.
mitotic exit localization yeast 2009)
Mitosis Anaphase spindle status Tub1 Budding (Drapkin et al.
yeast 2009)
G1 & G2 Sensor nuclear Proliferating cell nuclear antigen Human cell (Hahn et al. 2009)
(PCNA)-based biosensor line
S Sensor nuclear at replication foci PCNA-based biosensor Human cell (Hahn et al. 2009)
line
M Sensor located throughout the cell PCNA-based biosensor Human cell (Hahn et al. 2009)
line
End of M-phase to Sensor nuclear Truncated human DNA helicase Human cell (Hahn et al. 2009)
G1/S B (HDHB)-based biosensor line
S & G2, cell cycle Sensor cytoplasmic HDHB-based biosensor Human cell (Hahn et al. 2009)
transitions, G2/M line
M Sensor located throughout the cell HDHB-based biosensor Human cell (Hahn et al. 2009)
line
G1 to S & G1 Licensing Fragment of Cdt1 bound to mKO Metazoan (Sakaue-Sawano
(red/orange) cells et al. 2008)
S, G2 & M Licensing inhibition Fragment of geminin bound to Metazoan (Sakaue-Sawano
mAG (green) cells et al. 2008)
cell cycle regulation. On the other hand, many genes There are thus few examples of the systematic screen-
those are known to be involved in other processes than ings of the genes that cause the cell cycle defects upon
the cell cycle were also identified, though for all overexpression. In the fruit fly, there is a system (GAL4
genetic experiments it is difficult to circumvent driving systems) by which the target gene can be
obtaining indirect factors. overexpressed in organ specific manner. Using 2,300
Along with the extension of our understanding of the fly lines in which each target locus (gene) is
cell cycle mechanisms, an integrative mathematical overexpressed, 32 genes were isolated by the “small
model of the cell cycle has been constructed (Chen eye phenotype” that is an indicator of the cell cycle
et al. 2004). Robustness against parameter fluctuations defects (Tseng and Hariharan 2002). It is estimated
is useful to evaluate the plausibility of biological models that only less than 10% genes are studied in this analysis.
(Morohashi et al. 2002). By measuring limits of gene
overexpression systematically, promoter of the cell The Future of the Cell Cycle Analysis Using
cycle of real cell against (gene expression) parameter Systematic Gene Overexpression
fluctuations can be measured. genetic Tug-Of-War Genetic screening using overexpression enables us to
(gTOW) method is used for this purpose. In gTOW, isolate genes that do not show any cell cycle defect
instead of promoter swapping, the target gene with its upon gene disruption. For example, cyclin genes that
native promoter is cloned into a plasmid of which the make gene family in the budding yeast does not show
copy number increases in the leucine-limiting condition. significant phenotype upon disruption, but they show
If overexpression of the target gene causes the cell cycle strong cell cycle defect upon gene overexpression.
defect, the copy number of the plasmid (and the gene) However, while the gene knockout experiment gives
per cell must be less than the overexpression limit. the unified result because the target gene is completely
Using gTOW, limits of overexpression of about 30 cell inactivated, gene overexpression experiments usually
cycle regulator genes were measured, and the data was do not give unified results because there are many ways
used to evaluate and refine the integrative mathematical to overexpress the target gene (i.e., promoter swapping
model (Moriya et al. 2006; Kaizu et al. 2010). using different promoters, copy number increase),
which lead to the different expression levels. In fact,
Fission Yeast (Schizosaccharomyces pombe) there are surprisingly few overlaps among the gene sets
The fission yeast is another model organism in which isolated in the three budding yeast genome-wide
the regulatory system of the cell cycle has been exten- screening described above (Niu et al. 2008).
sively studied. In this yeast, the target gene can be To obtain the unified result in the overexpression of
artificially overexpressed using nmt1 promoter that is every gene, we need to systematically measure the
induced in thiamine-limiting condition. Although the overexpression limit that causes certain cell cycle
induction of nmt1 promoter is relatively slower defect. An experiment to continuously observe the
(18 h) than the budding yeast GAL1 promoter cell cycle defect upon gradual increase of the target
(1 h), defects of the fission yeast cell cycle can be gene expression will be a one suitable way. Such
easily observed with the elongated morphology (cdc experiment can be done in the model organisms such
phenotype) under the microscope. Until now, 21 genes as the budding yeast (e.g., Charvin et al. 2008). In
were isolated by the systematic screening with cdc higher eukaryotes, the similar technology needs to be
phenotype of the transformants of a cDNA library developed using cultured cell first.
(Tallada et al. 2002). Technically, genuine exhaustive
genomic screening and gTOW can be performed in the
fission yeast as well. Cross-References
Charvin G, Cross FR, Siggia ED (2008) A microfluidic device Cell cycle arrest after DNA damage describes the
for temporally controlled gene expression and long-term
interconnection between two complex signaling pro-
fluorescent imaging in unperturbed dividing yeast cells.
PLoS ONE 3(1):e1468 cesses – DNA damage sensing and the cell cycle – by
Chen KC, Calzone L, Csikasz-Nagy A, Cross FR, Novak B, a variety of biochemical interactions. Damage may
Tyson JJ (2004) Integrative analysis of cell cycle control in arise from various sources, including radiation, chem- C
budding yeast. Mol Biol Cell 15(8):3841–3862
ical agents, or errors during DNA synthesis or cell
Kaizu K, Moriya H, Kitano H (2010) Fragilities caused by
dosage imbalance in regulation of the budding yeast cell division. The resulting damage is sensed by
cycle. PLoS Genet 6(4):e1000919 a signaling network that halts the cell cycle by modu-
Moriya H, Shimizu-Yoshida Y, Kitano H (2006) In vivo robust- lating cyclin/Cdk activity. Cell cycle arrest can be
ness analysis of cell division cycle genes in Saccharomyces
transient to allow repair of DNA damage, or can persist
cerevisiae. PLoS Genet 2(7):e111
Morohashi M, Winn AE, Borisuk MT, Bolouri H, Doyle J, indefinitely as a senescence-like state. This essay
Kitano H (2002) Robustness as a measure of plausibility describes mechanisms of DNA damage-induced cell
in models of biochemical networks. J Theor Biol cycle arrest, their dynamics, and their effect on even-
216(1):19–30
tual cell fate. It also discusses mathematical modeling
Niu W, Li Z, Zhan W, Iyer VR, Marcotte EM (2008) Mecha-
nisms of cell cycle control revealed by a systematic and approaches used to gain insight into these processes.
quantitative overexpression screen in S. cerevisiae. PLoS
Genet 4(7):e1000120
Sopko R, Huang D, Preston N, Chua G, Papp B,
Kafadar K, Snyder M, Oliver SG, Cyert M, Hughes TR,
Characteristics
Boone C, Andrews B (2006) Mapping pathways and
phenotypes by systematic gene overexpression. Mol Cell Arrest Mechanisms: Connecting DNA Damage
21(3):319–330 Signaling to the Cell Cycle
Stevenson LF, Kennedy BK, Harlow E (2001) A large-scale
Progression through the cell cycle can be viewed as
overexpression screen in Saccharomyces cerevisiae iden-
tifies previously uncharacterized cell cycle genes. Proc Natl having two fundamental goals: the duplication of
Acad Sci USA 98(7):3946–3951 genetic information during the synthesis phase
Tallada VA, Daga RR, Palomeque C, Garzón A, Jimenez J (S phase), and the division of this material among
(2002) Genome-wide search of Schizosaccharomyces
daughter cells during mitosis (M phase) (▶ Cell
pombe genes causing overexpression-mediated cell cycle
defects. Yeast 19(13):1139–1151 Cycle of Mammalian Cells). Damage to the cell’s
Tseng AS, Hariharan IK (2002) An overexpression screen DNA interferes with both goals, leading to the possi-
in Drosophila for genes that restrict growth or cell- bility of erroneous genetic material being replicated or
cycle progression in the developing eye. Genetics
partitioned among daughters. Such errors can drive
162(1):229–243
mutagenesis, and play a central role in cancer
progression.
The first line of defense against such errors is to stop
the cell cycle when damage is sensed, preventing the
Cell Cycle Arrest After DNA Damage initiation of DNA replication or cell division. This
process, known as cell cycle arrest, must satisfy three
Jared Toettcher criteria: it must be activated quickly after damage is
University of California, San Francisco, CA, USA delivered, to prevent cell cycle progression in the pres-
ence of damage; it must remain active as long as
damage is present; and finally, if cells resume cycling
Synonyms after arrest, they must reenter the cell cycle in the same
phase from which they exited. Violation of any of these
Cell cycle arrest mechanisms; DNA damage response; conditions could lead to mutagenic errors. A group of
Extrinsic DNA damage checkpoints; p53 and the cell molecular interactions that transmit the presence of
cycle damage to halt the cell cycle is known as a cell cycle
C 250 Cell Cycle Arrest After DNA Damage
b
a ATM
Plk1
Damage DSBs (e.g. IR) SSBs; other (e.g. UV) Chk2
Wip1
γH2AX RPA Cdc25
Wee1 14-3-3σ
Sensors MRN Ku80 TopBP1
Effector
kinases Mcm
Chk2 p38 Chk1 Mdm2
Cdh1
TFs p53
CycA p27
DNA damage sensing
Cell cycle arrest
CycE p21 Cell cycle
DNA repair
Arrest mechanisms
Apoptosis
E2F1 Rb
CycD
Cell Cycle Arrest After DNA Damage, Fig. 1 DNA damage (black boxes). (b) Multiple mechanisms of arrest connect DNA
signaling and its connection to the cell cycle. (a) DNA damage damage to the cell cycle. Pictured here are arrest mechanisms
signaling connects damage sensors to cell fate through multiple implicated in the response to DNA double-stranded breaks. Post-
levels of effectors. The presence of damage leads to the local translational catalytic steps (solid lines), stoichiometric inhibitor
activation of sensor proteins at the damage sites. These sensors binding (dotted lines) and transcriptional activation or repression
in turn activate kinase cascades leading to growth arrest (indi- (dashed lines) for each arrest mechanism are indicated
cated by red boxes), DNA repair (green boxes) or apoptosis
arrest mechanism. Many arrest mechanisms are (▶ Cell Cycle Checkpoints). DNA damage leading
known, and can act to halt the cell cycle in G1, S, primarily to double-stranded breaks (DSBs) is sensed
G2, or M phase. by the Mre11-Nbs1-Rad50 (MRN) protein complex
binding at the site of damage. The MRN complex
The DNA Damage Response: Multiple Branches activates an effector kinase cascade including the
and Timescales phosphoinositide 3-kinase (PI3K) ataxia telangiectasia
Damage to the cell’s DNA is sensed by proteins of mutated kinase (ATM) and checkpoint kinase 2
three different classes: sensors that interact directly (Chk2). A positive feedback loop whereby ATM phos-
with damaged DNA; effectors that undergo post- phorylates histone H2AX at the site of damage leading
translational modification, allowing them to mediate to additional ATM activation ensures a strong damage
a downstream signaling response; and in some cases response even to a single DSB. A second PI3K, DNA-
transcription factors that mediate longer-term tran- PK, is also activated at the site of DSBs.
scriptional responses (Fig. 1a). Other forms of damage, including DNA adducts
The first two classes – sensors and effectors – act and single-stranded breaks (SSBs), are sensed by
quickly to halt the cell cycle, and are often considered a homologous, parallel kinase cascade. Again, DNA
together as the mediators of extrinsic cell cycle binding proteins such as RPA localize at damage sites,
checkpoints (Reinhardt and Yaffe 2009). They are and proceed to activate kinases such as ATM-related
discussed in more detail in the corresponding Essay kinase (ATR) and thousand-and-one amino acid
Cell Cycle Arrest After DNA Damage 251 C
kinase (TAO). Subsequent kinase cascades activate cycle regulators (e.g., cyclins, Cdc25 phosphatases,
checkpoint kinase 1 (Chk1) and the mitogen-activated Plk1), and overexpressing or disrupting them individ-
protein kinase p38. This is by no means an exhaustive ually to induce or abrogate arrest. They act on both G1
list of components involved in sensing and transmit- and G2 cell cycle components by a variety of biochem-
ting damage signals, and many additional regulators ical mechanisms, including stoichiometric inhibition,
affect the activity of the components described here. In enzymatic modification, and transcriptional induction/
addition, specific programs of DNA repair induced by repression of cell cycle proteins. C
each damage type depend highly on the subset of Kinase, phosphatase and protease activity provide
damage sensors and transducers activated as well as the fastest mode of regulation on the cell cycle
the current cell cycle phase (▶ DNA Repair). (Fig. 1b). These connections proceed from DNA dam-
In contrast to the sensor-effector machinery, much age effector kinases such as ATM, Chk1, and Chk2 to
of which is broadly conserved from yeast to mammals, phosphorylate various cell cycle targets. For instance,
the slower transcriptional arm of DNA damage- to signal G1 arrest, ATM can phosphorylate cyclin D
induced cell cycle arrest appears to be restricted to to target it for degradation. ATM can also act on
multicellular organisms. It plays a similar role in G2 cells by phosphorylating Plk1, preventing cyclin
species ranging from Drosophila melanogaster to B/Cdk1 activation. The dual specificity phosphatases
humans. Its central player is the stress-induced tran- Cdc25A, Cdc25B, and Cdc25C are key substrates of
scription factor p53. In mammals, the DNA damage the DNA damage effector kinases. Chk1 phosphoryla-
kinases p38, ATM, ATR, DNA-PK, Chk1, and Chk2 tion of Cdc25A leads to its degradation and G1 arrest,
can all phosphorylate p53, protecting it from homeo- while phosphorylation by Chk1/2 of Cdc25B/C leads
static degradation by ubiquitin ligases such as Mdm2. to their inhibition by 14-3-3 proteins and G2 arrest.
A variety of other post-translational modifications can In addition, stoichiometric inhibitors can play
tune p53 stability and transcriptional activity (acetyla- a crucial role in cell cycle regulation. The transcrip-
tion, methylation, SUMOylation, etc.). Elevated levels tional induction of these stoichiometric cell cycle
of modified p53 lead to both the induction and repres- inhibitors, as well as direct transcriptional repression
sion of various genes involved in the cell cycle, apo- of cyclins, constitutes a slower but potentially longer-
ptosis, and DNA repair. lived response to DNA damage (Fig. 1b). For instance,
p53 activation is itself regulated by multiple posi- p53 can induce expression of the CDK inhibitor p21
tive and negative feedback loops that serve to integrate and the Cdc25C inhibitor 14-3-3s (▶ CDK Inhibitors).
information between cell survival and stress pathways p53 also plays a role in the transcriptional repression of
or maintain low levels of p53 under unstressed condi- cyclins A and B.
tions (Harris and Levine 2005). Moreover, this feed- It should be noted that not all of these mechanisms
back regulation has consequences for p53’s temporal described in this section have been shown to be active
activation. DSB induction across a broad range of in all cell types, or in response to all stimuli. We still
mammalian contexts – from cultured fibroblasts to lack a comprehensive picture of which combinations
live mice – leads to a series of pulses of p53 activation of arrest mechanisms control an individual cell’s
on a timescale of hours (Batchelor et al. 2009). It has response to damage. However, from those few studies
been shown that oscillation depends on two negative addressing combinations of arrest mechanisms in the
feedback connections: p53 transcriptionally induces its same cells, a key motif emerges: that of fast (post-
negative regulator Mdm2 as well as the phosphatase translational) p53-independent arrest initiation acting
Wip1, which can inactivate p38, ATM, and Chk2 in parallel with slower (transcriptional) p53-dependent
(Batchelor et al. 2008). arrest maintenance. For instance, in G1 arrest induced
by ionizing radiation, cyclin D is promptly
The Molecular Logic of DNA Damage Signaling to ubiquitinated in a p53-independent fashion, followed
the Cell Cycle by the slower induction of the p53 transcriptional tar-
Many connections between DNA damage signaling get p21 (Agami and Bernards 2000). Similarly, G2
and the cell cycle have been identified. These connec- arrest is induced quickly by Cdc25C phosphorylation
tions have largely been uncovered through experi- by Chk2, followed by p53-dependent downregulation
ments identifying interacting partners of key cell of cyclins A and B (Toettcher et al. 2009).
C 252 Cell Cycle Arrest After DNA Damage
Understanding Cell Cycle Arrest via Computation challenging to compute the sensitivity of the times at
Differential equation models of the cell cycle typically which cell cycle transitions occur. If the time of cell
undergo autonomous oscillation under normal cycle transition can be associated with a local maxi-
conditions (▶ Cell Cycle Modeling, Differential Equa- mum or minimum of the ith species concentration
tion). Cell cycle arrest arises naturally from such (e.g., cyclin E/Cdk2 to mark the G1/S transition, or
a model, when some parameter or modeled species cyclin A/Cdk1 to mark S/G2), the sensitivity of this
is modified such that oscillation ceases and time t to model parameters p can be represented as
a stable steady state is reached. In contrast to the
checkpoint-centered view in which the cell cycle dt Hx f i @x Hp f i
¼
“checks” the status of damage signaling before com- dp ðHx f i Þ f @p ðHx f i Þ f
mitting to S or M phase, this dynamical systems-
@x
centered view holds that damage signaling modifies where @p is the standard concentration sensitivity of
the cell cycle oscillator such that it settles to a stable, state variables x with respect to model parameters
arrested state. Upon removal of the arrest stimulus, (Rand et al. 2006). This quantitative approach enables
the arrested state may either persist or oscillation may the generation of models that match a variety of exper-
resume. Notably, there is no guarantee that cycling imental observations.
continues from the same phase at which it arrested. Second, the modeler must be able to generate exper-
Modeling approaches have the advantage that cell imentally relevant predictions. Many experimental
cycle arrest mechanisms can be tested individually – or studies are based on measurements from large numbers
in combination – for their ability to promptly elicit of cells. These might be either population-averaged
arrest, maintain arrest, and resume cycling after dam- techniques such as Western blots, or population distri-
age repair. An attempt to characterize all 24 possible bution measurements such as flow cytometry (▶ Cell
arrest mechanisms in a recent model of the eukaryotic Cycle Analysis, Flow Cytometry). To afford compar-
cell cycle (Csikász-Nagy et al. 2006) identified five ison to data, populations of models must be simulated
classes of arrest states corresponding to G1, S, G2, and representing collections of cells. However, there is
M phase arrest, as well as a G1-like state with G2 DNA a subtlety associated with properly sampling
content (Toettcher et al. 2009) (Table 1). Such models populations of dividing cells: since a freely cycling
can also be fit to experimental data to determine which cell population contains twice as many newborn cells
classes of arrest mechanisms govern the response as cells that are just about to divide, the distribution of
in vivo. These experimentally driven approaches are cell ages in the population is skewed toward younger
complementary to theoretical investigations of cells. Generally, the probability of finding a cell at age
models’ oscillation and steady state behavior using t since its last division can be represented by the
techniques such as bifurcation analysis (▶ Cell Cycle relation
Model Analysis, Bifurcation Theory).
The modeler faces two challenges in developing pðtÞ ¼ ð2 ln 2Þ2t=T
quantitative models of DNA damage and the cell
cycle, discussed in this and the following paragraph. where T is the cell cycle length (Mitchison 1971).
First, it is crucial to inform a model by fitting its Thus, the modeler must sample the model’s initial
parameters to account for a number of experimental states according to this distribution. A corollary to
observations. Typical data might include the amount this observation is that the fraction of freely
time spent by freely cycling cells in each cell cycle cycling cells observed in G1, S, and G2 (e.g., by
phase, whether certain cell cycle mutants are able to flow cytometry) is not equal to the amount of
cycle, and how protein levels change during arrest. time spent by a cell in those phases, but must be
Fitting this data accurately and efficiently requires corrected accordingly.
computing the sensitivity of protein levels and the
times of cell cycle transitions from a model (▶ Cell Consequences of Cell Cycle Arrest
Cycle Models, Sensitivity Analysis). While it is Cell cycle arrest is typically considered to be
straightforward to compute the sensitivity of protein a temporary and reversible state, persisting until dam-
concentrations at fixed timepoints, it is more age is repaired, after which cycling is resumed. This is
Cell Cycle Arrest After DNA Damage 253 C
Cell Cycle Arrest After DNA Damage, Table 1 Computational analysis of individual arrest mechanisms affecting eight key cell
cycle species by stoichiometric inhibition, enzymatic inactivation, or transcriptional repression. Entries are grouped first by the arrest
state induced by the each mechanism, and second by the species inhibited
Arrest type Inhibited species Stoichiometric Enzymatic Transcriptional
G1 Cyclin D ✓ ✓ ✓
Cyclin E ✓ ✓ ✓
S Cdc25 ✓ ✓ ✓
C
Cyclin B ✓ ✓
G2 E2F ✓ ✓ ✓
Cyclin A ✓
M APCCdh1 ✓ ✓ ✓
Cdc20 ✓ ✓ ✓
G1 cyclins; 8N DNA Cyclin A ✓ ✓
Cyclin B ✓
✓ Predicted to cause cell cycle arrest
indeed in many contexts, especially for low levels of ▶ Cell Cycle Modeling, Differential Equation
damage where repair is concluded quickly. However, ▶ Cell Cycle Models, Sensitivity Analysis
two alternatives to this course of action can be ▶ Cell Cycle of Mammalian Cells
observed: adaptation, in which cells resume cycling ▶ Cell Cycle, Cancer Cell Cycle and Oncogene
even in the presence of damage; and senescence, Addiction
where cells remain arrested permanently, even after ▶ DNA Repair
damage is repaired. Adaptation plays a role in bacteria ▶ Endoreplication
and yeast, but is largely detrimental to multicellular
organisms due to the risk of mutagenesis it carries.
In contrast, senescence is the safest course of action,
References
provided nearby undamaged cells can compensate for
the loss of the damaged cell’s replicative capacity. Agami R, Bernards R (2000) Distinct initiation and maintenance
A growing body of work suggests a connection mechanisms cooperate to induce G1 cell cycle arrest in
between senescence and DNA damage (Toettcher response to DNA damage. Cell 102:55–66
Bartkova J, Rezaei N, Liontos M, Karakaidos P, Kletsas D,
et al. 2009; Bartkova et al. 2006; Di Micco et al.
Issaeva N, Vassiliou LV, Kolettas E, Niforou K,
2006). A senescence-like arrest state can result from Zoumpourlis VC, Takaoka M, Nakagawa H, Tort F,
DSB-inducing irradiation (Toettcher et al. 2009), and Fugger K, Johansson F, Sehested M, Andersen CL,
the DSB response is required for oncogene-induced Dyrskjot L, Orntoft T, Lukas J, Kittas C, Helleday T,
Halazonetis TD, Bartek J, Gorgoulis VG (2006) Oncogene-
senescence (▶ Cell Cycle, Cancer Cell Cycle and
induced senescence is part of the tumorigenesis barrier
Oncogene Addiction) (Bartkova et al. 2006; Di imposed by DNA damage checkpoints. Nature 444:633–637
Micco et al. 2006). During this response, even G2- Batchelor E, Mock CS, Bhan I, Loewer A, Lahav G (2008)
arrested cells appear to enter a G1-like arrest state, Recurrent initiation: a mechanism for triggering p53 pulses
in response to DNA damage. Mol Cell 30:277–289
with high levels of cyclin E and p21, and low levels
Batchelor E, Loewer A, Lahav G (2009) The ups and downs of
of cyclins A and B. When such an arrest state is p53: understanding protein dynamics in single cells. Nat Rev
disrupted (e.g., by deletion of p21) cells can undergo Cancer 9:371–377
a second round of DNA replication, a phenomenon Csikász-Nagy A, Battogtokh D, Chen KC, Novák B, Tyson JJ
(2006) Analysis of a generic model of eukaryotic cell-cycle
known as endoreduplication (▶ Endoreplication).
regulation. Biophys J 90:4361–4379
Di Micco R, Fumagalli M, Cicalese A, Piccinin S, Gasparini P,
Luise C, Schurra C, Garre M, Nuciforo PG, Bensimon A,
Cross-References Maestro R, Pelicci PG, d’Adda di Fagagna F (2006) Onco-
gene-induced senescence is a DNA damage response trig-
gered by DNA hyperreplication. Nature 444:638–642
▶ Cell Cycle Checkpoints Harris SL, Levine AJ (2005) The p53 pathway: positive and
▶ Cell Cycle Model Analysis, Bifurcation Theory negative feedback loops. Oncogene 24:2899–2908
C 254 Cell Cycle Arrest Mechanisms
Loewer A, Batchelor E, Gaglia G, Lahav G (2010) Basal dynam- transduction cascades blocking or slowing down cell
ics of p53 reveal transcriptionally attenuated pulses in cycle progression at specific stages. Checkpoints are
cycling cells. Cell 142:89–100
Mitchison JM (1971) The biology of the cell cycle. Cambridge triggered by sensor proteins detecting, directly or indi-
University Press, Cambridge rectly, cell cycle perturbations and transmitting the
Rand DA, Shulgin BV, Salazar JD, Millar AJ (2006) Uncovering signal, through the action of protein kinases, to effector
the design principles of circadian clocks: mathematical anal- proteins that stop cell cycle progression until the signal
ysis of flexibility and evolutionary goals. J Theor Biol
238:616–635 activating the checkpoint has been turned off. These
Reinhardt HC, Yaffe MB (2009) Kinases that control the cell mechanisms have been highly conserved during evo-
cycle in response to DNA damage: Chk1, Chk2, and MK2. lution, and checkpoint defects result in genome insta-
Curr Opin Cell Biol 21:245–255 bility, which is frequently associated to tumor
Toettcher JE, Loewer A, Ostheimer GJ, Yaffe MB, Tidor B,
Lahav G (2009) Distinct mechanisms act in concert to medi- development. The checkpoint controls are elicited
ate cell cycle arrest. Proc Natl Acad Sci USA 106:785–790 through molecular events regulating their activation,
maintenance, and inactivation resulting, respectively,
in cell cycle arrest, maintenance of the arrest for
a certain time and recovery of cell cycle progression.
Cell Cycle Arrest Mechanisms These surveillance mechanisms can be divided into
intrinsic regulatory pathways, ensuring the orderly
▶ Cell Cycle Arrest After DNA Damage progression of cell cycle events under physiological
conditions, and extrinsic pathways that are activated in
response to specific clues, such as damage to DNA or
cellular structures. The intrinsic checkpoints act by
Cell Cycle Checkpoints controlling the activity of cell cycle dependent kinases
(CDKs) mainly at the G1/S boundary and at the meta-
Flavio Amara, Marco Muzi-Falconi and Paolo Plevani phase to anaphase transition in mitosis; such mecha-
Dipartimento di Scienze Biomolecolari nisms are described in other entries of the
e Biotecnologie, Università di Milano, Milan, Italy encyclopedia.
stalled replication forks and, in multicellular eukary- RAD53/CHK2 and CHK1 activation influence various
otes, it may promote apoptosis when the damage is DNA metabolic pathways, like homologous recombi-
irreparable. nation, origin firing during S phase, nuclease activity,
and gene expression.
MEC1/ATR DNA Damage Checkpoint Signaling Another essential player of checkpoint activation is
In Saccharomyces cerevisiae, the two main players of represented by the 9-1-1 complex. This heterotrimeric
the DNA damage checkpoint activated in response to complex show limited sequence homology to the
many DNA helix-distorting lesions are two kinases PCNA homotrimeric clamp and it is therefore often
encoded by the MEC1 and RAD53 genes. They are referred to as the checkpoint clamp (Table 1 and
the orthologues of ATR and CHK2 in human cells Fig. 1). The 9-1-1 complex is loaded onto DNA by
(Table 1). a checkpoint clamp loader, a form of replication factor-
In budding yeast, MEC1 acts as the apical kinase C (RFC), in which the RFC1 subunit is substituted by
in the checkpoint cascade and likely senses RAD24 (S. cerevisiae) or RAD17 (mammalian cells)
abnormal amounts of single-stranded (ss) DNA (Majka and Burgers 2003). The 9-1-1 clamp promotes
stretches covered by the replication protein A (RPA), checkpoint activation in vivo by influencing MEC1/
which are generated by processing of various DNA ATR recruitment and its substrate specificity; the 9-1-1
lesions (Fig. 1). clamp has a role in modulating chromatin binding of
RPA-coated ssDNA is thought to recruit the apical RAD9/53BP1 and subsequent RAD53/CHK2 phos-
checkpoint kinase (MEC1-DDC2 or TEL1, in budding phorylation events.
yeast; ATR-ATRIP or ATM in mammalian cells), and
the 9-1-1 complex (RAD17-MEC3-DDC1 in budding Checkpoint Signaling in Response to Double
yeast) triggering the signal transduction cascade (Zou Strand Breaks (DSBs)
and Elledge 2003). As a consequence of this activation, When the DNA integrity is challenged by discontinu-
MEC1 phosphorylates, directly or indirectly, a number ities in the helix backbone, as those generated by
of factors (e.g. DDC2, H2A, DDC1, RAD9, DPB11/ double strand breaks, the checkpoint human kinase
TOPB1, RAD53/CHK2, CHK1, etc.) and these subse- ATM is loaded near the DSBs and, together with
quent phosphorylation events are used to follow ATR, activate the checkpoint response. ATM activa-
the signal through the cascade (Fig. 1). Activated tion requires the presence of the MRN complex, which
RAD53/CHK2 amplifies the signal through has a DNA tethering capacity and possesses both endo-
autophosphorylation events and inhibits the cell cycle and exo-nucleolytic activities. Although the nuclease
machinery by phosphorylating various targets, leading activity of the MRX/MRN complex is not critical for
to cell cycle arrest (Harrison and Haber 2006). checkpoint activation, the presence of a physically
C 256 Cell Cycle Checkpoints
Damaged DNA:
DNA adducts Double Strand Break (DSB)
Helix distorting lesions
DNA damage
G1 S G2/M
NER
Damage
processing
and ssDNA
production
Gapped Stalled replication
DNA fork Processed DSB
9-1-1
H3
DDC2 H2A
DNA adduct MEC1 Checkpoint
Helicase activation
Ser129 Chromatin
RPA Lys79 DBP11 remodelling
Nuclease
RAD9
Phosphorylation
Methylation RAD53
assembled complex and the action of other nucleases manner in response to a variety of genotoxic treatments
and helicases (such as EXO1 and SGS1) are important. and the modified gH2A molecules localize near a DSB
In S. cerevisiae TEL1, the ATM orthologue, partic- (van Attikum and Gasser 2009); the same modification
ipates only marginally in the checkpoint response occurs on the Ser139 conserved residue of the mamma-
induced by a single DSB, but its role becomes relevant lian H2AX histone variant. It is assumed that RPA-
in the presence of multiple DSBs and/or when the coated ssDNA is required to activate the checkpoint,
initiation of DSB ends processing is defective. In this but the MEC1/ATR and TEL1/ATM-dependent phos-
organism, a single irreparable DSB triggers the MEC1 phorylation of the H2A/H2AX histones (Fig. 1) is rele-
pathway, which is activated by extensive resection of vant for checkpoint maintenance, since in its absence
the DSB DNA ends (Lazzaro et al. 2009). If the break the checkpoint signal is turned off prematurely. In yeast,
is not rapidly repaired, nucleolytic activities produce one of the major roles of H2A phosphorylation is the
long ssDNA regions that, via the ssDNA binding pro- recruitment of chromatin remodeling complexes near
tein RPA, recruit MEC1 activating the checkpoint the DNA lesions. Another histone modification impor-
cascade (Fig. 1). tant for the activation of the DNA damage checkpoint is
methylation of Lys79 of histone H3 (H3-Lys79Me) by
Chromatin Remodeling and Checkpoint the specific methyl transferase DOT1. In fact, the main
Maintenance mediator proteins in the checkpoint cascade (RAD9 in
Chromatin modifications contribute to checkpoint func- S. cerevisiae and 53BP1 in mammals) can be recruited
tion. In fact, histone proteins are also modified in near a chromatin damaged site through two parallel
response to DNA damage. In S. cerevisiae, the Ser129 pathways. The first one is dependent upon H3-Lys79
residue of H2A is phosphorylated in a MEC1-dependent methylation, while the second pathway acts through
Cell Cycle Checkpoints 257 C
the function of the DPB11/TOPB1 adaptor proteins bi-orientation are stochastic processes taking
(Fig. 1). a variable amount of time to complete. During that
time individual chromosomes may be detached from
Turning Off the DNA Damage Checkpoint Signal the microtubules or be connected only to one spindle
In cells with repairable DNA damage, the checkpoint pole. The spindle assembly checkpoint (SAC) delays
arrest is maintained until repair is completed. Then, the metaphase to anaphase transition until the sister
cells can resume the cell cycle turning off the check- chromatids are properly attached to the spindle in C
point signal in a process known as recovery. Instead, if a bipolar orientation. In budding yeast, cell cycle arrest
an irreparable lesion is present, cells eventually over- at the G2/M transition is mediated by inhibition of the
ride the cell cycle arrest in a process called adaptation. CDC20-anaphase promoting complex (APC) ubiquitin
In mammals, it has been proposed that adaptation to ligase, thus preventing proteolysis of the securin PDS1
damage is linked to cancerogenesis and, likely, this until complete bi-orientation is achieved (Fig. 2).
event is normally prevented by inducing the apoptotic Checkpoint proteins (MAD1, MAD2, BUB1,
pathway. Inactivation of the RAD53 kinase is required BUBR1, BUB3, and MPS1) all accumulate at unat-
to recover from checkpoint-mediated cell cycle arrest tached kinetochores and form various complexes,
in S. cerevisiae. Various protein phosphatases, includ- many of which can inhibit the APC (Fang et al. 1998)
ing PTC2, PTC3, PPH3, and GLC7, have been found (Fig. 2). APC is a multiprotein complex that targets
to be important for checkpoint inactivation. Interest- several proteins for degradation during mitosis through
ingly, their relative contribution to the recovery pro- the associated specificity factor CDC20. Securin and
cess seems to be influenced by the type of genotoxic cyclins are ubiquitylated by CDC20-APC; therefore, to
stress causing checkpoint activation (DSBs, alkylating delay anaphase onset in the presence of spindle
agents, hydroxyurea). The same phosphatases act on defects, the checkpoint must block CDC20-APC medi-
phosphorylated gH2A and, not surprisingly, homolo- ated PDS1 degradation. Experimental evidence sug-
gous phosphatases influence checkpoint inactivation gests that in response to spindle defects, MAD2
also in human cells. exchanges from a MAD1/MAD2 complex to
While it was somehow expected that reversal of a CDC20/MAD2 complex sequestering CDC20 away
checkpoint signaling would involve the action of phos- from the APC and blocking PDS1 degradation (Fig. 2).
phatases acting on the master effector kinases of the In S. cerevisiae, spindle misorientation is detected
cascade, it was interesting to learn that POLO-like by the spindle positioning checkpoint (SPOC) which
kinases (such as PLK1 and CDC5) also play an impor- prevents mitotic exit. The target of this control is the
tant role in switching off the checkpoint. It is likely that mitosis exit network (MEN), and more specifically the
recovery and adaptation, being two sides of the same activation of the TEM1 GTPase (Adames et al. 2001).
coin, are going to share common players and mecha- TEM1 cycles between GDP- and GTP-bound states,
nisms. However, genetic screenings in budding yeast regulated by the putative guanine nucleotide exchange
revealed that some mutations allow to distinguish factor (GEF) LTE1 and the two-component GTPase
between the two processes. For example, the cdc5-ad activating protein (GAP) BFA1/BUB2. The last one
allele and deletions of certain recombination genes recruits TEM1 to the bud-directed spindle pole, where
(such as KU70/80, TID1 and RAD51) prevent the TEM1 is kept inactive until the pole crosses the neck
adaptation process without affecting recovery. into the bud. GTP-TEM1 then binds to the protein
kinase CDC15, which phosphorylates and activates
S. cerevisiae Spindle Assembly and Spindle the protein kinase DBF2. MOB1 binds to DBF2 and,
Position Checkpoints in a poorly understood manner, MOB1/DBF2 stimu-
Chromosome segregation at mitosis is controlled by lates the release of the CDC14 phosphatase from the
two surveillance mechanisms: the spindle assembly nucleolus and contributes to cytokinesis. CDC14
and the spindle positioning checkpoints. Accurate seg- dephosphorylates CDH1, leading to the activation of
regation requires bipolar attachment of sister chroma- CDH1-APC complex, which triggers cyclin degrada-
tids to the mitotic spindle, which is mediated by tion and exit from mitosis. A possible crosstalk
a proper connection between kinetochores and spindle between the SAC and SPOC is likely, but not yet
microtubules. Kinetochore capture and microtubules fully demonstrated (Lew and Burke 2003).
C 258 Cell Cycle Checkpoints
BFA1
BUB2
TEM1 TEM1
MPS1 MAD2
CDC15
BUB3 BUB1 CDC20 LTE1
BUB3 BUBR1 BUB3 BUBR1
CDC15
MAD1 MAD2 TEM1
MOB1
CDC14 DBF2
APC CDH1
Phosphate
PDS1
CDC28 GDP
CLB2
Separase
GTP
Cell Cycle Checkpoints, Fig. 2 The spindle assembly and spindle positioning checkpoints
the gene and protein lists obtained through analysis of to genes and proteins involved in human and yeast cell
different types of high-throughput data (de Lichtenberg cycle (see also ▶ Cell Cycle, Biology) processes, one
et al. 2008). of the most studied complex system in biology (see
also ▶ Complex System). The database, which is
accessible at the web site http://www.itb.cnr.it/
Cross-References cellcycle, has been developed in a systems biology
context, since it also stores the cell cycle mathematical
▶ Cell Cycle Analysis, Expression Profiling models (see also ▶ Mathematical Model, Model The-
▶ Cell Cycle of Mammalian Cells ory) published in the recent years, with the possibility
▶ Cell Cycle, Budding Yeast to simulate them directly from the web interface. Cell
▶ Cell Cycle, Fission Yeast cycle information from humans is primary considered
▶ Cyclins and Cyclin-dependent Kinases since this resource aims to support biomedical studies
▶ DNA Microarrays in the context of cancer research. Then the database
▶ Fisher’s Test content is extended toward the budding yeast cell cycle
▶ Functional Modules and Complexes because of the genomic similarity between human and
▶ Gene Expression budding yeast and the large number of models avail-
▶ Model Organism able for this organism. According to this choice, the
▶ Oscillation Amplitude data integration (see also ▶ Data Integration) concerns
▶ Periodic Oscillation all genes and proteins involved in the cell cycle models
▶ Receiver Operating Characteristic (ROC) Curve of both the budding yeast S. cerevisiae and the
H. sapiens. This information is taken from the most
recent literature and plays a crucial role to contextual-
References ize behavior of each cell cycle component.
CellCycle Database
• Genes
Data Sources • Proteins
• Mathematical Models
• Entrez Gene
• PDB
• GenBank • Transpath
• GEO • InterPro Linked Resources SW Resources
• SCPD • Mint
• MGC • Bind • Ensembl Genome Browser • XPPAUT
• DBTSS • Biogrid • CYGD
• SBML Parser
• Transfac • Kegg • Unigene
• Gene Ontology • GNUPLOT
• QPPD • Omim
• BioModels • Intact • Java
• SGD
• PubMed • Reactome • MathML Player/Fonts
• Uniprot
Cell Cycle Database, Fig. 1 Architecture of the Cell Cycle resources (Linked Resources). The Cell Cycle Database com-
Database. The core of the resource is a data warehouse system, prises user-friendly interfaces which show information and, in
which collects data from the most relevant bioinformatics general, provide the functions of the resource, even relying on
resources (Data Sources) and provides links to external third party free software
administrator can update the database content by Furthermore, a repository of the most recent
gene or protein name through the web interface. published cell cycle models to allow the exploration
When a new entry is inserted in the core table, all of their mathematical structure through SBML (Sys-
the external tables will be updated in cascade, tems Biology Markup Language) (Hucka et al. 2003)
while when a new entry is inserted in one of the components and their mathematical simulation is
external table no inward updating occurs. As a result available.
all tables of the database are updated according to the
infrastructure, which is designed for automated data Functionality
integration. The database stores 26 models of the cell cycle inter-
This database is able to collect the most important action networks and it allows the mathematical simu-
information related to cell cycle genes and proteins, lation over time of the quantitative behavior of each
which are drawn from the analysis of the cell cycle model component. To accomplish this task, a web
information available in literature and the existing interface for browsing information related to cell
pathway databases. The database also contains pro- cycle genes, proteins, and mathematical models is
tein-protein interactions taken from several resources available.
making the information on the cell cycle interaction Models and proteins information are directly linked
network as complete as possible. in the resource: for each protein we provide
C 262 Cell Cycle Database
information on the models in which it is involved. In the beginning of the page an instruction calls a XSL
the protein report a list of the published models is stylesheet, which allows the formulas to be viewed
available with a direct link to the specific model report. correctly. This technology allows using the browser
In this framework a pipeline, which allows users functions to find strings inside mathematical expressions
to deal with the mathematical part of the models, in and to change their size, operations which are usually
order to solve, using different variables, the ordinary not possible using images to represents formulas.
differential equation systems that describe the biolog- The simulation section allows users to simulate
ical process, has been implemented. Up to now among a model using the software XPPAUT (Ermentrout
the 26 models stored in the database, 13 models have 2002) and to plot results on the fly in order to capture
the related SBML file and for 12 of them it is possible the dynamical properties of the biological process. In
to run simulations. order to enable the simulation, this section lists the
The flexibility of this database allows the addition of model species (state variables, typically protein con-
mathematical data, which are used for simulating the centrations), its parameters (such as kinetic constants),
behavior of the cell cycle components in the different its algebraic rules, and XPPAUT internal options,
models. The resource is useful both to retrieve informa- using default values. Users can change the initial
tion about cell cycle model components and to analyze values in order to analyze the different natures of the
their dynamical properties. The Cell Cycle Database system dynamics, performing sensitivity and bifurca-
can be used to find system-level properties, such as tions analysis. Once the computation is completed
stable steady states and oscillations, by coupling struc- users can download XPPAUT input and output files
ture and dynamical information about models. and plot results. The web interface allows users to plot
both time courses and phase diagrams for each model
Accessibility after the simulation step. Results are shown with
The web interface allows to retrieve model-related images exported by GNUPLOT (Williams and Kelley
information and a pipeline has been implemented to 1998), the popular portable command-line function
deal with the mathematical part of the models, in order plotting software.
to solve the ordinary differential equations systems
that describe the biological processes. This pipeline Model Annotation
essentially consists in the XHTML-MML-XSL style As the primary data source we consider the BioModels
sheet implementation, as it will be discussed below. In Database (Le Novere et al. 2006) from which we
fact, using this system it is possible to visualize the collect the mathematical model specifically developed
mathematical description of the model and to run sim- for yeast and mammalian cell cycles. We also integrate
ulations by introducing modifications of the initial other published models, which are not stored in the
conditions of state variables and parameters. BioModels Database, manually retrieving them from
Each model is presented in a report structured in literature or from the CellML repository (Cuellar et al.
three sections: the publication data, the SBML data 2003).
structure, and the numerical simulation part. The first Each model is presented in a report, which is struc-
section contains the detailed publication data, the dia- tured in three sections: the publication data, the SBML
gram of the model, the related XML file, and the list of data structure, and the numerical simulation part. The
all the proteins involved in the model, which are linked first section contains the detailed publication data
to the related Cell Cycle Database protein report. (such as the authors, PubMed ID, the abstract, and
In the SBML data structure section, users can explore journal information), the diagram of the model and
the SBML components of the selected model including the related XML file, if available, and the list of all
its mathematical expressions. Mathematical formulas the proteins involved in the model, which are linked to
within the SBML models are expressed using the Math- the related Cell Cycle Database protein report. The
ematical Markup Language (MML). To visualize the protein identifiers are chosen according to the UniProt
expressions on the web our resource relies Database annotation.
on a XHTML + MML page, following the w3c specifi- More in detail, the Cell Cycle Database contains the
cations. In this page MML is in-lined with HTML and at literature information related to each model, the input
Cell Cycle Dynamics, Bistability and Oscillations 263 C
for the simulation software, and the XML file coded References
with SBML specifications (Hucka et al. 2003). We
choose SBML since it is an internationally supported Ausbrooks R et al. W3C Recommendation. Mathematical
Markup Language (MathML) version 2.0, 2 edn. [http://
and widely used language for metabolic networks,
www.w3.org/Math/]. Accessed 21 October 2003
cell-signaling pathways, regulatory networks, and Cuellar AA, Lloyd CM, Nielsen PF, Bullivant DP, Nickerson
many other biological pathways. However, there DP, Hunter PJ (2003) An overview of cellML 1.1, a biolog-
are published cell cycle models not yet implemented ical model description language. SIMULATION: Trans Soc C
Model Simul Int 79(12):740–747
in SBML: for this reason, some SBML models
Ermentrout B (2002) Simulating, analyzing, and animating
included in the database are manually generated dynamical systems: a guide to xppaut for researchers and
using the JigCell Model Builder software (Vass et al. students, SIAM 2002, Philadelphia
2004), a model editor which allows the construction Hucka M et al (2003) The systems biology markup
language (SBML): a medium for representation and
of biochemical reaction networks in SBML
exchange of biochemical network models. Bioinformatics
format, and are validated using the Systems Biology 19(4):524–531
Workbench SBML validator. Mathematical formulas Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M,
within the SBML models are expressed using Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL,
Hucka M (2006) BioModels database: a free, centralized
Mathematical Markup Language (MathML or MML)
database of curated, published, quantitative kinetic models
(▶ Systems Biology Markup Language (SBML); of biochemical and cellular systems. Nucleic Acids Res 34:
▶ MathML) (Ausbrooks et al. 2003). D689–D691
Vass M, Allen N, Shaffer CA, Ramakrishnan N, Watson LT,
Tyson JJ (2004) The jigcell model builder and run manager.
Model Curation
Bioinformatics 20(18):3680–3681
As discussed before, the Cell Cycle Database contains Williams T, Kelley C (1998) GNUPLOT: an interactive plotting
models from Biomodels, literature, and authors’ program, version 3.7 organized by: David Denholm
websites, where available. Concerning the models
from Biomodels, the model curation is ensured from
the database reliability, which we trusted in. To ensure
the model curation for the literature models, we tested
and simulated them in order to reproduce the results Cell Cycle Dynamics, Bistability and
shown in the related publication papers. Oscillations
Cross-References
Definition
▶ Cell Cycle, Biology
▶ Complex System A molecular regulatory system can maintain itself
▶ Data Integration indefinitely at a steady state where, for every time-
▶ Mathematical Model, Model Theory varying species, the total rate of formation of the
C 264 Cell Cycle Dynamics, Bistability and Oscillations
chemical is exactly balanced by its total rate of (X1 active, X2 inactive) and (X1 inactive, X2 active).
removal. In mathematical terms, we can write The intermediate steady state (X1 semi-active, X2
semi-active) is unstable.
dxi Multistability plays an important role in the theory
¼ fi ðx1 ; x2 ; :::; xn Þ ri ðx1 ; x2 ; :::; xn Þ; of cell cycle progression. It is proposed that the
dt (1)
i ¼ 1; :::; n characteristic states of cell cycle arrest (G1-arrest,
G2-arrest, metaphase-arrest) correspond to alternative
stable steady states of the underlying chemical reaction
where xi is the concentration of species i, and fi(x1, network controlling the activities of cyclin-dependent
x2, . . ., xn) and ri(x1, x2, . . ., xn) are its rates of formation kinases (CDKs). In this view, cells progress through
and removal. At a steady state, dxi/dt ¼ 0 for all i, the cell cycle (G1 ! S/G2 ! M ! G1 ! . . .) by
irreversible transitions from the G1 steady state to the
fi ðx1 ; x2 ; :::; xn Þ ¼ ri ðx1 ; x2 ; :::; xn Þ; i ¼ 1; :::; n: (2) S/G2 steady state (the G1-S transition, also called
“Start” or the “Restriction Point”), from the S/G2
For any well-specified chemical reaction network, steady state to the M steady state (the G2-M transition),
there always exists at least one steady state solution; and from the M steady state to the G1 steady state (the
call it x0 ¼ x01 ; x02 ; :::; x0n . Such a steady state can be metaphase-anaphase transition, also called “exit from
classified as either stable or unstable with respect mitosis”). In this view, cell cycle checkpoints work by
to small perturbations away from the steady state. If stabilizing one of these three steady states and
all small perturbations (in any direction in the preventing the transition to the next phase of the cell
n-dimensional state space of the reaction network) cycle.
eventually return to x0, then this steady state is said Bistability is intimately connected to the
to be (locally asymptotically) stable. If there exist existence of spontaneous limit cycle oscillations in
small perturbations (in some particular directions in regulatory networks with both positive feedback (to
state space) that depart from x0 and never return, then create alternative stable steady states) and negative
the steady state is said to be unstable. feedback (to induce spontaneous switching between
Since the rates of formation and removal of any the two stable steady states). This combination of
chemical species in a reaction network are algebraic positive and negative feedbacks is precisely the case
functions of species concentrations, Eq. 2 is a system in the regulatory system governing the eukaryotic
of n nonlinear algebraic equations, which in general cell cycle (Tyson and Novak 2008). In somatic
may have multiple solutions in the positive orthant: xi cells, checkpoint signals prevent spontaneous
0 for all i. In this case, we can denote the different cycling, but in some circumstances these checkpoints
steady states as x0j, where j ¼ 1, . . ., m. A typical are removed, and the cell cycle proceeds as
situation is the case of three steady states (m ¼ 3), a spontaneous, unfettered, limit cycle oscillation.
where two of the steady states are stable and one is For example, during early embryogenesis, the fertil-
unstable. This case is called bistability. In general, ized egg undergoes a series of rapid cell divisions
there may be more than two stable steady states, in without growth, until it reaches the “mid-blastula
which case we refer to tristability or multistability. We transition,” when checkpoint proteins are expressed
may describe a system with m > 1 as having multiple and the cell cycle regains the characteristic G1-S, G2-
steady states when we are not sure of the total number M, and M-G1 transitions of somatic cells. It is also
of stable steady states. possible to create mutant yeast cells that lack check-
Bistability is a typical feature of reaction networks point controls; these cells divide faster than they
with positive feedback, although one must be aware grow, getting smaller and smaller each cycle until
that positive feedback is not always readily apparent in they die. These observations suggest that, under
reaction networks as they are conventionally drawn. most circumstances, periodic cell divisions are
Double-negative feedback is a particularly common governed not by spontaneous limit cycle oscillations
motif for bistability: X1 inhibits X2, and X2 inhibits but by multistability and irreversible transitions
X1. In this case, the two stable steady states are (Tyson and Novak 2008).
Cell Cycle Dynamics, Bistability and Oscillations 265 C
Cdc20 APC
WeelP
Cdc20-APC
PPase
Wee1
Cyclin
C
CDK-cyclin PCDK-cyclin CDK-cyclinUUU
(MPF) (preMPF)
CDK CDK
Cdc25P
PPase Checkpoint
signal
Cdc25
Cell Cycle Dynamics, Bistability and Oscillations, CDK-cyclin dimers (called MPF). The CDK subunit is phos-
Fig. 1 CDK regulatory network for frog egg cell cycles. In phorylated by Wee1 to form a less active form of MPF, called
this network diagram, a solid arrow represents a biochemical preMPF. The inhibitory phosphate group can be removed by
reaction, and a dashed arrow represents a catalytic effect on Cdc25. Wee1 and Cdc25 activities are also regulated by phos-
a reaction. A T-shaped arrow represents the association of two phorylation, catalyzed by MPF: Wee1P is the less active form,
proteins to form a complex. Black balls on the crossbar indicate and Cdc25P is the more active form. Cyclin B is labeled for
a reversible binding reaction. The small gray balls represent proteolysis by an E3 ubiquitin ligase, the APC, which works in
proteolytic fragments of a protein. Newly synthesized cyclin collaboration with a targeting subunit, Cdc20. The synthesis of
B combines with CDK subunits (in excess) to form active Cdc20 is activated by MPF
a b
MPF activity
MPF activity
0
qinact qact Total cyclin
Total cyclin
c
MPF activity
Total cyclin
Cell Cycle Dynamics, Bistability and Oscillations, Cdc20 is produced and cyclin is degraded faster than it is syn-
Fig. 2 Signal-response curves: MPF activity as a function of thesized. The bistable switch flips back to the low-activity state
total cyclin. (a) Bistable switch. If cyclin synthesis and degra- once [total cyclin] drops below the inactivation threshold. The
dation are blocked, then the total cyclin concentration ([MPF] + white circle represents an unstable steady state. (c) Damped
[preMPF]) in a frog egg extract can be manipulated experimen- sinusoidal oscillations. If the CDK phosphorylation site (a tyro-
tally. For [total cyclin] between the two thresholds, yinact and sine residue) is mutated to phenylalanine, then the positive
yact, the control system has two stable steady states separated by feedback loops involving Wee1 and Cdc25 are disengaged and
an unstable steady state. The dotted line shows how MPF activity bistability is lost. The system now undergoes damped oscilla-
will respond to slowly increasing [total cyclin] and then slowly tions to a stable steady state (black circle). If there is enough time
decreasing [total cyclin]. (b) Relaxation oscillations. If cyclin is delay in the negative feedback loop through Cdc20-APC, then
steadily synthesized, then [total cyclin] will increase when MPF the control system might exhibit sustained oscillations, but they
is in the low-activity state, because Cdc20 is unavailable, but are quite distinct from the relaxation oscillations in panel B
once the system flips to the state of high MPF activity, then
bistable switch to be reset to the state of low CDK generate spontaneous oscillations of MPF activity,
activity. Repeated cycles of DNA synthesis and mito- unconstrained by requirements of cell growth, DNA
sis correspond to repeated flipping of the switch from synthesis, DNA damage, or mitotic spindle functions.
a stable steady state of low CDK activity (interphase) The spontaneous oscillations are driven by periodic
to a stable steady state of high CDK activity (M phase) bursts of cyclin degradation, which are in turn gener-
and back again. (Physicists and engineers call this ated by the periodic activation of MPF by dephosphor-
behavior a “hysteresis loop”.) ylation of the CDK subunit. These spontaneous
Novak and Tyson showed, furthermore, that under oscillations result from an interplay between the
the special conditions in an early frog embryo or in bistable switch and a negative feedback loop (MPF
a frog egg extract, the CDK-cyclin control system can activates the cyclin degradation machinery which
Cell Cycle Dynamics, Bistability and Oscillations 267 C
destroys cyclin subunits, causing a loss of MPF activ- This notion became the basis of a successful kinetic
ity); see Fig. 2b. (Physicists and engineers call this model of the budding yeast cell cycle by Chen et al.
behavior a “relaxation oscillator.”) (2000). The model made many predictions that were
The mathematical model of Novak and Tyson not tested in Fred Cross’s laboratory. In particular, Cross
only gave a systematic account of a variety of experi- et al. (2002) tested the idea that, under “neutral” con-
mental observations on fission yeast, budding yeast, ditions, budding yeast cells could persist indefinitely in
and frog eggs, but also made a series of striking either the G1 state, with low CDK activity, or the C
predictions: S-G2-M state, with high CDK activity, depending on
1. The “cyclin threshold for MPF activation” that had the immediately prior history of how the cells are
been observed by Solomon et al. (1990) is only one- brought into the neutral conditions. This experiment is
half of a hysteresis loop. There should be a separate, explained in the article on “Cell cycle, budding yeast.”
distinctly lower cyclin threshold for MPF inactiva-
tion (Fig. 2a). Irreversibility
2. For cyclin levels well above threshold, the time lag The concept of bistability provides an immediately
for MPF activation should be roughly constant (as obvious and intuitively satisfying explanation of the
observed by Solomon et al. (1990)), but as cyclin irreversibility of progression through the cell cycle. If
level approaches the activation threshold from cell cycle transitions are a result of passing from one
above, the time lag for MPF activation should stable steady state to another, then it is clear that the
increase dramatically. reverse transition cannot follow the same path. Once
3. The cyclin level for MPF activation should be an the transition is made, then a completely different set
increasing function of unreplicated DNA in an of circumstances must be brought into play to accom-
extract. plish the reverse transition. For example, in Fig. 2a the
4. If the bistable switch is disabled by interfering with activation of MPF is accomplished by increasing total
the inhibitory phosphorylation of CDK, then MPF cyclin concentration above the threshold, yact, where
oscillations may still be possible, but they will be the unstable steady state merges with and annihilates
faster, more smooth (not abrupt bursts of MPF the steady state of low MPF activity and forces the
activity), and probably damped (Fig. 2c). system to switch to the steady state of high MPF
Predictions 1 and 2 are generic properties of activity. If subsequently the cyclin level is caused to
bistable systems with hysteresis. Prediction 3 was decrease, the control system does not jump back to the
a novel proposal for the mode of action of lower steady state at yact, which would be the case for
a checkpoint signal that delays entry into mitosis. a “reversible” transition. Rather, the cyclin level must
Prediction 4 relates to classic properties of oscillations decrease below a much smaller threshold, yinact, before
driven by a “simple negative feedback loop” in com- the down-jump occurs.
parison to relaxation oscillations in a “substrate- In the budding yeast case, the switch from the low-
depletion relaxation oscillator.” CDK state to the high-CDK state is driven by Cln-
dependent kinase activity, but the switch back is driven
Experimental Verification by a completely different mechanism, dependent on
The first three predictions of the Novak-Tyson model the activities of Cdc20-APC (degradation of B-type
were confirmed in frog egg extracts independently and cyclins) and Cdc14 (a CDK-counteracting phospha-
simultaneously by Sha et al. (2003) and Pomerening tase). In this case, the up-jump is a one-way switch: it
et al. (2003); see Fig. 3. Slightly later, Pomerening is induced by Cln-dependent kinase, but after the
et al. (2005) confirmed prediction 4 in a thorough and switch is made, the Cln-kinase activity can drop to
careful study of MPF oscillations in frog egg extracts; 0 and the CDK activity will remain high. Down-jump
see Fig. 4. occurs by means of a different one-way switch. In the
In 1996, Kim Nasymth (1996) proposed that, at its “neutral” condition (Cln ¼ Cdc20 ¼ Cdc14 ¼ 0), the
heart, the budding yeast cell cycle is a repetitive alter- control system can persist indefinitely in either the
nation between two “self-maintaining” states (i.e., sta- low-CDK or the high-CDK state. For more details,
ble steady states): a G1 state with low CDK activity see the articles on “Cell cycle, budding yeast” and
and an S-G2-M state with high CDK activity. “Cell cycle dynamics, irreversible transitions.”
C 268 Cell Cycle Dynamics, Bistability and Oscillations
0min
90min
60min
140min
M M M
Cell Cycle Dynamics, Bistability and Oscillations, is supplemented with variable amounts of nondegradable cyclin
Fig. 3 Experimental confirmation of bistability in frog egg at t ¼ 0, but cycloheximide is not added until t ¼ 60 min. By
extracts. From Sha et al. (2003), used by permission. (a) A frog t ¼ 60 min, the nuclei in each sample have been driven into
egg extract, containing all the proteins in Fig. 1 except for cyclin, mitosis 1 by a combination of the nondegradable cyclin added at
is prepared in the presence of sperm nuclei (indicators of MPF t ¼ 0 and the degradable cyclin subunits synthesized from the
activity) and cycloheximide (a drug that prevents the synthesis extract’s endogenous mRNA. As the extracts try to exit from
of cyclin protein from endogenous mRNA). In the absence of mitosis 1, the endogenous cyclin subunits are degraded by
cyclin subunits, the extract is blocked in interphase (t ¼ 0), as Cdc20-APC, but the exogenous D90 cyclin subunits resist deg-
indicated by the compact nucleus with intact nuclear membrane radation. Cycloheximide prevents any further cyclin synthesis in
and dispersed chromatin (stained blue). At t ¼ 0 the extract is the extracts. At t ¼ 140 min the extracts are assayed for the cell
supplemented with a measured amount of nondegradable (D90) cycle phase of the nuclei. Cyclin concentrations greater than
cyclin, and 90 min later the extract is observed to see if the nuclei 20 nM are sufficient to maintain the nuclei in a mitotic state.
are in interphase (dispersed chromatin, intact membrane, low [Total cyclin] ¼ 16 nM is below the threshold (blue down-
MPF activity) on in mitosis (condensed chromatin, breakdown triangle) for inactivation of MPF and exit from mitosis. Cyclin
of nuclear membrane, high MPF activity). Cyclin concentrations concentrations of 24 and 32 nM are clearly in the bistable region:
less than 35 nM are insufficient to induce entry into mitosis, the nuclei can persist stably in interphase or in mitosis,
but [total cyclin] ¼ 40 nM is above the threshold (blue up- depending on whether they are prepared initially in interphase
triangle) for mitotic entry. (b) In a second experiment, the extract or mitosis
80
35S-cyclin 100
60
40 80 C
20 MPF activity 60
0
Time (min) 0 20 40 60 80 100 40
35S-cyclin 20
MPF 0
activity 0 20 40 60 80 100 120
35S-cyclin (% of maximum)
Cell Cycle Dynamics, Bistability and Oscillations, endogenous Cdc25wt (wild-type) protein, is added 200nM of
Fig. 4 Experimental confirmation of relaxation oscillations in either Cdc25wt protein or Cdc2AF protein, which cannot be
frog egg extracts. From Pomerening et al. (2005), used by phosphorylated and inhibited by Wee1. Hence, in the extract
permission. (a) In a “cycling” extract, cyclin subunits are con- on the right there are roughly equal amounts of Cdc2wt and
tinuously synthesized (blue curve), but MPF activity (red curve) Cdc2AF subunits, compared to the “control” extract on the left
is low until total cyclin exceeds the threshold for MPF activa- which contains 400 nM Cdc2wt subunits. The mutant kinase
tion. The abrupt activation of MPF drives nuclei into mitosis subunits compromise the positive feedback loops in the model
(M). As the extract exits from mitosis, cyclin is degraded by (Fig. 1) and change the properties of MPF oscillations. Com-
Cdc20-APC, and MPF activity falls as a result. (b) The data in pared to the blue curve, the MPF oscillations in the red curve are
panel A is projected onto state space (MPF activity versus total faster, more sinusoidal and noticeably damped, exactly as
cyclin level), as in panel A of Fig. 2bB. (c) Confirmation of predicted by the mathematical model
prediction 4. To a cycling extract, containing 200 nM
Cross-References References
▶ Cell Cycle Dynamics, Irreversibility Chen KC, Csikasz-Nagy A, Gyorffy B, Val J, Novak B,
Tyson JJ (2000) Kinetic analysis of a molecular
▶ Cell Cycle Model Analysis, Bifurcation Theory
model of the budding yeast cell cycle. Mol Biol Cell
▶ Cell Cycle Modeling, Differential Equation 11:369–391
▶ Cell Cycle of Early Frog Embryos Cross FR, Archambault V, Miller M, Klovstad M (2002) Testing
▶ Cell Cycle, Budding Yeast a mathematical model for the yeast cell cycle. Mol Biol Cell
13:52–70
▶ Cell Cycle, Fission Yeast
C 270 Cell Cycle Dynamics, Irreversibility
Murray AW, Kirschner MW (1989) Dominoes and clocks: the transitions (Novak et al. 2007). By “irreversibility” we
union of two views of the cell cycle. Science 246:614–621 understand that, once cells have committed to a new
Nasmyth K (1996) At the heart of the budding yeast cell cycle.
Trends Genet 12:405–412 round of DNA replication (at the G1/S transition), they
Novak B, Tyson JJ (1993) Numerical analysis of a comprehen- do not typically slip back into G1 phase and do
sive model of M-phase control in Xenopus oocyte extracts a second round of DNA replication. Similarly, once
and intact embryos. J Cell Sci 106:1153–1168 cells have committed to mitosis (at the G2/M transi-
Pomerening JR, Sontag ED, Ferrell JE Jr (2003) Building a cell
cycle oscillator: hysteresis and bistability in the activation of tion), they do not typically slip back into G2 phase and
Cdc2. Nat Cell Biol 5:346–351 try for a second mitotic division.
Pomerening JR, Kim SY, Ferrell JE Jr (2005) Systems-level Like most “rules” in biology, this one has important
dissection of the cell-cycle oscillator: bypassing positive exceptions (Morgan 2007). Some cells carry out
feedback produces damped oscillations. Cell 122:565–578
Sha W, Moore J et al (2003) Hysteresis drives cell-cycle transi- repeated rounds of DNA replication without interven-
tions in Xenopus laevis egg extracts. Proc Natl Acad Sci ing mitoses, becoming polyploid. During meiosis,
USA 100:975–980 a germ line cell undergoes two meiotic divisions with-
Solomon MJ, Glotzer M et al (1990) Cyclin activation of out an intervening S phase. But these exceptions only
p34cdc2. Cell 63:1013–1024
Tyson JJ, Novak B (2008) Temporal organization of the cell reinforce the centrality of the general rule of somatic
cycle. Curr Biol 18:R759–R768 cell cycles, namely, irreversible progression through
the cell cycle phases (G1, S, G2, M) in strict order.
The irreversibility of the three transitions is inti-
mately connected with cell cycle checkpoints, which
Cell Cycle Dynamics, Irreversibility halt further progression through the cell cycle when-
ever serious problems are detected (Murray 1992). For
John J. Tyson1 and Béla Novák2 example, DNA damage incurred during G1 phase will
1
Department of Biological Sciences, Virginia block the G1/S transition until the damage is repaired.
Polytechnic Institute and State University, Failure to fully replicate and ligate DNA molecules
Blacksburg, VA, USA during S phase will block the G2/M transition. Incom-
2
Department of Biochemistry, Oxford Centre for plete alignment of replicated chromosomes on the
Integrative Systems Biology, University of Oxford, mitotic spindle will block the M/G1 transition. When
Oxford, UK a checkpoint is lifted, the cell proceeds irreversibly to
the next phase of the cell cycle.
These irreversible transitions and checkpoints are
Synonyms controlled by complex molecular regulatory networks
with both positive and negative feedback. Positive
Bistability; Checkpoints; Hysteresis feedback creates a bistable switch, and negative
feedback allows the switch to be flipped from one
stable state to another (Tyson and Novak 2008).
Definition These systems-level properties of the regulatory net-
work are crucial to irreversible progression through the
The cell cycle is the sequence of events whereby a cell cell cycle, and when they are disturbed by mutation,
replicates its chromosomes and partitions the identical drugs, or disease, then cells make mistakes in DNA
sister chromatids to two separate nuclear compart- replication and partitioning, often with fatal conse-
ments (Morgan 2007). In eukaryotic cells, the phases quences for the cell or for the multicellular organism
of DNA synthesis (S phase) and mitosis (M phase) are harboring the rogue cells.
temporally distinct and separated by gap phases (G1 –
unreplicated chromosomes, and G2 – replicated chro-
mosomes). To maintain the proper ploidy of a cell Characteristics
lineage, generation after generation, it is essential
that S and M phases are strictly alternating. This alter- Physiology and Molecular Biology
nation is enforced by the irreversibility of three crucial Progression through the cell cycle is governed by a set
transitions in the cell cycle: the G1/S, G2/M, and M/G1 of cyclin-dependent kinases (CDKs) that initiate DNA
Cell Cycle Dynamics, Irreversibility 271 C
S-G2 phase M phase (see Fig. 1). During S and G2 phases, the cell
is producing M-phase cyclins, but CDK/cyclin-M
High CDKS activity
activity is low because of inhibitory phosphorylation
of the CDK subunit. At the G2/M transition, the
G2 / M M phase kinases that impose this inhibition must be inactivated,
TF↑
P-CDKM and the phosphatases that remove this inhibition must
ULE↓ CDKM
CKI↓ be activated, thereby allowing the cell to transit C
M / G1 abruptly into mitosis. At the M/G1 transition, these
Exit two switches must be reversed as well.
G1 / S
Start The scenario just described is a generic account of
TF↓ molecular events at the three basic transitions of the
ULE↑
cell cycle. In any specific type of cell (e.g., budding
CKI↑
yeast, fission yeast, mammalian cell) there may be
G1 phase variations on this general scheme, but most cell types
Low CDK activity
studied in depth show evidence of CDK-governing
toggle switches that are flipped on and off at alternat-
Cell Cycle Dynamics, Irreversibility, Fig. 1 Three check- ing transitions of the cell cycle.
points/irreversible transitions in the eukaryotic cell cycle. G1 The greatest exception to this rule is the newly
phase: low CDK activity. S-G2 phase: CDK-cyclinS activity is fertilized egg, a gigantic cell that undergoes rapid
high, CDK-cyclinM activity is low. M phase: CDK-cyclinM
activity is high. At the G1/S transition, cyclin synthesis is turned S-M cycles, lacking gap phases and checkpoints. It is
on, cyclin degradation is turned off, and CDK inhibitors are arguable whether these cell cycles are governed by
destroyed. At the M/G1 transition, these switches are reversed. irreversible toggle switches or a simple periodic oscil-
In S-G2 phase, CDK-cyclinM activity is kept low by phosphor- lator. Nonetheless, it is true that early embryonic cell
ylation of the CDK subunit. At the G2/M transition, this inhib-
itory phosphorylation is removed cycles often make serious mistakes (producing poly-
ploid or aneuploid cells) that lead ultimately to death of
the embryo. These errors seem to be a price that organ-
replication and mitosis and that inhibit the transition isms are willing to pay in order to hurry through the
from metaphase to anaphase-telophase-cytokinesis most vulnerable phase of their life cycle. In this con-
(“exit from mitosis”). G1 phase cells are uncommitted text, these errors reinforce the notion that toggle
to the cell cycle because they lack the necessary CDK switches and irreversible transitions are crucial to
activities. They are devoid of S- and M-phase cyclins maintaining genomic integrity of proliferating eukary-
because the transcription factors that would drive pro- otic cells.
duction of these cyclins are inactive, and the ubiquitin-
ligating enzymes (ULE) that label these cyclins for Bistability and Hysteresis
proteolysis are active. In addition, G1 phase cells From a theoretical perspective, irreversible transitions
contain generous amounts of CDK inhibitors (CKIs) are intimately related to bistability of molecular regu-
that bind to and inhibit CDK-cyclin dimers. latory networks. The article on “cell cycle dynamics,
At the G1/S transition (called “Start” in budding bistability and oscillations” discusses the molecular
yeast and the “Restriction Point” in mammalian cells), mechanisms underlying bistability and hysteresis at
a cell must do three things: (1) turn on the synthesis of the G2/M transition. Here we describe the molecular
S- and M-phase cyclins, (2) turn off their degradation, basis of irreversibility at Start and Exit. Figure 2 illus-
and (3) get rid of the G1-phase CKIs (See Fig. 1). At trates the generic interactions among CDK-cyclin, its
the M/G1 transition (“Exit”), these three switches must transcription factors, its ubiquitin-ligating enzymes,
be reversed. As cells proliferate, a lineage (mother- and its stoichiometric inhibitors. In each case, CDK
daughter-granddaughter) toggles back and forth is involved in a positive feedback interaction with
between states of low CDK activity (G1) and high a “friend” (TF) or a double-negative feedback interac-
CDK activity (S-G2-M). tion with an “enemy” (ULE or CKI). These interac-
Embedded inside this fundamental toggle switch is tions create the possibility of bistability: two stable
a secondary toggle switch governing the transition into steady states (“nodes”) separated by an unstable steady
C 272 Cell Cycle Dynamics, Irreversibility
SK SK
EP EP
CDK cyclin CDK-cyclinUUU
CDK CDK
EP*
SK
CKI-CDK-cyclin
EP
Cell Cycle Dynamics, Irreversibility, Fig. 2 CDK regulatory CKI and CDK-cyclin. These interactions create a bistable
network for a generic eukaryotic cell. Cyclin synthesis is regu- control system, with a stable steady state of low CDK activity
lated by a transcription factor (TF), and cyclin degradation is (G1 phase) and an alternative stable steady state of high CDK
regulated by a ubiquitin-ligating enzyme (ULE). CDK-cyclin activity (S-G2-M phase); see Figs. 1 and 3. Starter kinase activ-
dimers can also be inhibited by binding to an inhibitor protein ity (SK) tips the control system toward the high CDK state, and
(CKI). CKI stability is controlled by phosphorylation: PCKI is exit phosphatase activity (EP) tips the system toward the low
a good substrate for a different ubiquitin-ligating enzyme, which CDK state. SK and EP are related to CDK by negative feedback
labels CKI for degradation by proteasomes. Because TF is acti- loops. SK upregulates CDK-cyclin by flipping the switch to the
vated by phosphorylation by CDK-cyclin, these two components high CDK state, but high activity of CDK-cyclin downregulates
are involved in a positive feedback loop. Because ULE is the synthesis of SK. On the other hand, high activity of CDK-
inactivated by phosphorylation by CDK-cyclin, these two com- cyclin activates EP, which flips the switch to the low CDK state
ponents are involved in a double-negative feedback loop, as are and consequently EP is inactivated
state (a “saddle point”). The stable steady states are TF toward its “off” state, allowing the bistable switch
characterized by either low or high CDK activity to flip off. The M/G1 transition is irreversible because,
(Fig. 3). The CDK control system can persist in either even though EP activity disappears in early G1, the
of these stable steady states, corresponding to G1 or control system remains in the stable G1 steady state
S-G2-M phases of the cell cycle, respectively. (Tyson and Novak 2008).
For a G1 cell to commit to a new round of DNA
replication and division, it must be promoted from the Irreversibility and Proteolysis
G1 steady state to the S-G2-M steady state. This is the For a time it was fashionable among molecular biolo-
job, generically speaking, of a “starter kinase” (SK), gists to attribute irreversibility of cell cycle transitions
typically a special cyclin-CDK pair (Cln3-Cdc28 in to the proteolysis of key cell cycle regulators at each
budding yeast, cyclin D-Cdk4/6 in mammalian cells). transition. Not only are cyclins dramatically degraded
Transient activation of SK can induce a transition from at the M/G1 transition (Minshull et al. 1989), but also
G1 to S-G2-M. Even if SK activity disappears after the CKIs are degraded at the G1/S transition (Schwob et al.
transition, the control system will remain in the stable 1994) and CDK-inactivating kinases are degraded at
steady state of high CDK activity. In this sense, the the G2/M transition (Michael and Newport 1998). As
G1/S transition is irreversible. To exit mitosis and simple and appealing as this idea might be, it is clearly
return to G1, the cell must transit from the high- to insufficient to explain the irreversibility/directionality
the low CDK steady state. This is the job of generic of cell cycle progression.
“exit phosphatases” (EP) that are activated during ana- First of all, proteolysis is not irreversible in a kinetic
phase and telophase of the cell cycle. These phospha- sense; its opposing process is protein synthesis. In
tases push ULE and CKI toward their “on” states and general, the rates of protein synthesis and degradation
Cell Cycle Dynamics, Irreversibility 273 C
CDK In conclusion, the irreversibility of cell cycle tran-
S / G2 / M sitions, which is crucial to maintaining the genomic
integrity of proliferating cells, is a systems-level
property of the molecular networks that regulate
CDK activity in eukaryotic cells. Positive feedback
loops in the control system generate multiple
stable steady states (G1, S-G2 and M states), and C
abrupt, irreversible transitions between these stable
G1 steady states are induced by pro-proliferative signals
(growth factors, size increase, successful completion
EP SK of prior cell cycle events) and inhibited by checkpoint
signals (indicators of potentially fatal mistakes in
Cell Cycle Dynamics, Irreversibility, Fig. 3 Bistability, irre-
versibility, and hysteresis. The steady states of the control system the genome replication process). Irreversibility is
in Fig. 2 are related to SK and EP activities by this bifurcation not a property of any single gene, protein, or reaction
diagram. When SK ¼ EP ¼ 0, the control system has two stable but rather an emergent property of systems-level
steady states (black circles) separated by an unstable steady state
interactions.
(white circle). The cell can be in either G1 phase or S-G2-M
phase, depending on its recent history. A newborn cell, by
definition, is in G1 phase. For it to start a new round of DNA
synthesis and degradation, a starter kinase (SK) must be acti- Cross-References
vated. Activation of SK is controlled by checkpoint signals, such
as growth factors, DNA damage sensors, size sensors, etc. Once
SK flips the switch into the CDK-active state, it is no longer ▶ Cell Cycle Dynamics, Bistability and Oscillations
needed: SK activity can drop to 0, and the control system remains ▶ Cell Cycle Model Analysis, Bifurcation Theory
in the CDK-active state. For the cell to exit mitosis, undergo ▶ Cell Cycle Modeling, Differential Equation
cytokinesis, and return to G1 phase, an exit phosphatase (EP)
▶ Cell Cycle of Early Frog Embryos
must be activated. EP activation depends on other checkpoint
signals, such as complete replication of DNA and successful ▶ Cell Cycle, Budding Yeast
alignment of all replicated chromosomes on the mitotic spindle ▶ Cell Cycle, Fission Yeast
Characteristics
Cell Cycle Kinases
Historical Background
▶ Mitotic Kinases Since the days of Isaac Newton, ordinary differential
equations (ODEs) have been used throughout the
physical and life sciences to describe the temporal
development of dynamical systems: from the solar
Cell Cycle Model Analysis, Bifurcation system, to the clock radio, to the regulation of DNA
Theory replication and cell division. Initially the focus was on
ODEs that could be solved exactly in terms of
John J. Tyson “elementary” functions of high school algebra and
Department of Biological Sciences, Virginia trigonometry or “special” functions of mathematical
Polytechnic Institute and State University, physics. But in the 1890s Poincaré (1899) introduced
Blacksburg, VA, USA the “qualitative” theory of dynamical systems, i.e.,
systems of n nonlinear ODEs
Synonyms dx
¼ fðx; pÞ; xðtÞ ¼ fx1 ; :::; xn g ¼ Variables;
dt (1)
Nonlinear dynamical systems theory
p ¼ fp1 ; :::; pm g ¼ Parameters
ible; e.g., a Hill function. (b) The threshold for gene inactivation SNIC ¼ saddle-node invariant-circle
2
(yinact) is lower than the threshold for gene activation (yact): The
signal-response curve has a region of bistability and functions
like a toggle switch. (c) The gene cannot be inactivated by
lowering the signal strength even to zero: The control system
functions as a one-way switch These examples suggest that the signal-response
curves often measured by cell physiologists are none
other than one-parameter bifurcation diagrams in the
It is possible that yinact < 0, in which case the signal, parlance of applied mathematicians. If we may associ-
S ¼ [S] (a positive number), cannot be made small ate abrupt, qualitative changes in signal-response char-
enough to flip the switch off. In this case, the control acteristics of living cells with bifurcations in vector
system is said to be a one-way switch. By increasing S, fields of nonlinear dynamical systems, then it is natural
the switch can be turned on, but it cannot be turned off to ask how many different types of generic bifurcations
by decreasing S. are exhibited by dynamical systems and what do they
There are many convincing examples of toggle look like? Are there hundreds of different types of
switches in molecular and cell biology generally bifurcations to match the seemingly boundless variety
(Tyson et al. 2003) and in cell cycle regulation partic- of cellular behaviors? Or are all the peculiarities of
ularly (▶ Cell Cycle Dynamics, Bistability and cellular signal processing simply variations on a few
Oscillations). common themes? (Table1)
In another common physiological situation, The answer is the latter. In addition to the saddle-
a cellular process begins to oscillate when node and Hopf bifurcations illustrated in Fig. 1, there
a stimulating signal gets large enough (Goldbeter are only a few other common, generic, one-parameter
1996). In this case, the signal-response curve exhibits bifurcations: subcritical Hopf, cyclic fold, saddle-loop,
a Hopf bifurcation, as in Fig. 1b. and SNIPER bifurcations (Fig. 3).
Cell Cycle Model Analysis, Bifurcation Theory 277 C
a b Hopf
c Hopf
Subcritical
xi Hopf xi
CF SNIPER
xi
SN
C
SL
pj pj pj
Cell Cycle Model Analysis, Bifurcation Theory, Fig. 3 The other common types of bifurcation points. (a) Subcritical Hopf
bifurcation and cyclic fold (CF) bifurcation. (b) Saddle-loop (SL) bifurcation. (c) Saddle-node infinite-period (SNIPER) bifurcation
References
Synonyms
a b
Cln3 Cln3
Clb5_6 Clb5_6
Clb1_2 Clb1_2
Cdh1 Cdh1
Mcm1_SFF Mcm1_SFF
Cell Cycle Modeling Using Logical Rules, Fig. 1 Boolean regulators. Right: Boolean model adapted from the previous one
models of the Budding yeast cell cycle. Left: Original Boolean using specific logical formulae (see Table 1). Both graphs have
model presented in (Li et al. 2004). This model was simulated been drawn using GINsim (Naldi et al. 2009). Green normal
synchronously (see Fig. 2) using a generic summation rule. In arrows stand for activation, red T-arrows stand for inhibition,
this framework, self-inhibiting arrows had to be introduced to yellow T-arrows represent the “self-degradation” loops intro-
account for degradation of components that only have positive duced in (Li et al. 2004)
authors had to include “self-degradation” loops logical operators. In the absence of Cln3, this model
(in yellow in Fig. 1, left), which do not represent has a unique stable state corresponding to the G0
true auto-inhibitory processes. Using a synchronous phase. Starting with initial conditions enabling the
(deterministic) updating strategy, they identified initiation of the cell cycle (with Cln3 ON), this sim-
seven stable states as the sole attractors of the system plified model yields a cyclic attractor, which recapit-
and further established that the large majority of ulates the sequence of events along the cell cycle (see
Boolean states lead to the same stable state, Fig. 2, right).
which corresponds to the G1 phase. Furthermore, Using this second approach, several groups have
starting with initial conditions matching the entry designed more sophisticate models encompassing
into cell cycle, as shown in Fig. 2 (left), the system additional regulatory components (e.g., checkpoints
follows a trajectory consistent with the succession (▶ Cell Cycle Checkpoints)) and recapitulating
of molecular events and phases along budding yeast the behavior of the systems following various
cell cycle. kinds of perturbations (e.g., single or multiple loss-
A second approach consists in defining of-function mutations) (Irons 2009; Fauré et al.
a specific logical rule for each node in the network. 2009). Interestingly, these studies show that various
Fig. 1 (right) shows a transposition of the model by Li kinetic assumptions lead to similar consistent
et al. (2004) into a logical regulatory graph, while dynamical behaviors, indicating that the control net-
Table 1 lists the logical rules associated with each work is sufficiently robust to cope with stochastic
component of the network. Notice that the network is fluctuations of synthesis/degradation rates. The
somewhat simplified (elimination of the auto- multi-level model presented by Fauré et al. is argu-
inhibitory loops), while we have now different types ably the most comprehensive logical cell cycle
of regulatory rules combining NOT, AND, and OR model to date.
Cell Cycle Modeling Using Logical Rules 281 C
Cell Cycle Modeling Using
Cdc20_Cdc14
Cdc20_Cdc14
Logical Rules, Fig. 2 Left:
Mcm1_SFF
Mcm1_SFF
simulation of the original
model by Li et al. (2004) (cf.
Cln1_2
Clb5_6
Clb1_2
Cln1_2
Clb5_6
Clb1_2
Phase
Phase
Cdh1
Cdh1
Figure 1, left). Starting from an
Swi5
Swi5
MBF
MBF
Step
Step
Cln3
Cln3
SBF
Sic1
SBF
Sic1
initial state corresponding to
G1, simulation yields
a sequence of states that 1 G1
follows the normal course of
1 START C
the cell cycle through the 2 G1
2 G1
different phases G1, S, G2, and
M, until reaching a stable G1 3 S
state. The initial state 3 G1
corresponds to an “excited”
4 G1 4 G2
G1 stable state, with the Cln3
variable set to 1. Right:
simulation of the revised 5 S 5 M
model, with the input Cln3
always ON (cf. Figure 1, 6 G2 6 M
right). Black cells denote
activity of the corresponding 7 M 7 G1
components, white cells
denote inactivity 1 G1
8 M
9 M
10 M
11 M
12 M
13 Stationary G1
Cell Cycle Modeling Using Logical Rules, Table 1 Specific Toward the Modeling of Mammalian Cell Cycle
logical rules associated with the revised version of the model by Networks
Li et al. (2004) presented in Figure 1 (right). “&”, “|”, and “!” Proper modeling of the regulatory network controlling
stand for the Boolean operators AND, OR, and NOT,
respectively mammalian cell cycle remains a daunting challenge.
Most published models concentrate on the control of
Components Logical rules
the G1-S transition (see Fauré et al. 2006, for a generic
MBF Cln3 & !Clb1_2
Boolean model). Various groups are currently devel-
SBF Cln3 & !Clb1_2
oping models of human cell cycle in connection with
Cln1_2 SBF
Cdh1 !(Cln1_2 | Clb5_6 | Clb1_2) | Cdc20_Cdc14
cancer studies, with the aim of identifying intervention
Swi5 ((Mcm1_SFF | Cdc20_Cdc14) & !Clb1_2) | points to block uncontrolled proliferation. One diffi-
(Mcm1_SFF & Cdc20_Cdc14 & Clb1_2) culty is that the actors to consider depend to some
Cdc20_Cdc14 Clb1_2 | Mcm1_SFF extent on the cellular context (cell type, environmental
Clb5_6 (MBF | !Sic1) & !Cdc20_Cdc14 signals, presence of mutations).
Sic1 (Swi5 & Cdc20_Cdc14) | For example, Sahin et al. developed a Boolean
!(Cln1_2 | Clb5_6 | Clb1_2) model of the control of the G1-S transition by the
Clb1_2 !(Cdh1 | Sic1 | Cdc20_Cdc14) | EGF pathway to predict potential targets for the treat-
(Clb5_6 & Mcm1_SFF)
ment of trastuzumab-resistant breast cancer (Sahin
Mcm1_SFF Clb1_2 | Clb5_6
et al. 2009). Predictions yielded by loss-of-function
C 282 Cell Cycle Modeling, Differential Equation
▶ Cell Cycle
▶ Cell Cycle Checkpoints Synonyms
▶ Cell Cycle, Budding Yeast
▶ Cell Cycle, Fission Yeast Kinetic equations; Rate equations; Reaction-diffusion
▶ Cyclins and Cyclin-dependent Kinases equations
Cell Cycle Modeling, Differential Equation 283 C
Definition
F
The physiological properties of cells – their growth, X
division, movement, signaling, metabolism, differentia-
tion, and death – are all controlled by gene-protein G H
regulatory networks of considerable complexity.
F
Thanks to the revolutionary advances of molecular XP C
genetics in the latter part of the twentieth century,
much is known now about the genes and proteins that Cell Cycle Modeling, Differential Equation, Fig. 1 A typical
constitute these networks and about their interactions, as molecular regulatory network. In this diagram, letters denote
chemical species (proteins) and solid arrows represent chemical
well as the meso-scale topology of particular regulatory reactions transforming substrate(s) into product(s). A letter
networks and the global topology of genome-wide sur- sitting next to an arrow denotes the enzyme catalyzing the
veys of gene, mRNA and protein interactions. The anal- reaction. Dashed arrows represent “influences” of one protein
ysis of genome-wide (“omics”) data is still very much on another. Barbed arrows denote “activation” and blunt arrows
denote “inhibition.” The dynamics of this reaction network is
dominated by statistical methods, but at the local and represented by the system of five ODEs in Eq. 2
meso-scale of network complexity it is possible to build
detailed, accurate, and predictive models of the dynam-
ics of network behavior by using differential equations. each of these reactions (e.g., the total concentration of
The applicability of differential equations to model- the enzyme that catalyzes reaction j), and kj ¼ the rate
ing network dynamics in general, and cell cycle regu- constant(s) needed to express the material flux through
lation in particular, is based on the following logic reaction j. Similarly, Ril ¼ rate of the l-th reaction that
(Tyson et al. 2001). A molecular regulatory network removes species i, etc.
can be described, depending on the level of detail Regulatory networks like Fig. 1 are sometimes
available from experimental investigations, by called “wiring” diagrams, in analogy to the schematic
(1) a system of biochemical reactions or (2) an “influ- diagrams of electrical devices. Just like the dynamical
ence” diagram or (3) a hybrid of the two types. These behavior of an electrical device can be predicted (in
types of descriptions are illustrated in Fig. 1. Realistic practice) from Kirchoff’s Laws (ODEs for the voltages
networks are, of course, much more complex than the at various points in the circuit), so the dynamics of
example given. Whether the network is described by a molecular regulatory network can be predicted (in
chemical reactions or influences or a hybrid of the two, principle) from ODEs (Eq. 1). Unfortunately, the anal-
the network diagram is trying to tell us how one bio- ogy to electrical engineering goes no further. For an
chemical species is affecting the rates of production or electrical device, we can obtain a schematic wiring
removal of another species. As such, the network dia- diagram from the manufacturer, as well as
gram can be converted into a set of ordinary differen- a specification of the numerical values of the parame-
tial equations (ODEs), one ODE for each time-varying ters that characterize each component (resistances,
biochemical concentration: capacitances, etc.). For the living cell, we must guess
the wiring diagram, and we must estimate the rate
d½Xi X constants from the very experiments we are trying to
¼ Pij ½X... ; ½M... ; kj explain. In essence, we must “reverse engineer” the
dt j
X cell’s circuitry by performing well-designed experi-
Ril ð½X... ; ½M... ; kl Þ (1) ments to probe the input-output (signal-response) char-
l acteristics of the cell under normal and contrived
conditions (including mutations which scramble the
In this equation, [Xi] ¼ concentration of species i, wiring diagram in controlled ways).
Pij ¼ rate of the j-th reaction that produces species i, For the many ways that differential equations have
[X. . .] ¼ concentrations of the time-varying biochem- been used in molecular and cell biology, see the classic
ical species that participate in each of these reactions, books by Edelstein-Keshet (1988), Murray (1989),
[M. . .] ¼ constant concentrations of the time-invariant Goldbeter (1996), Fall et al. (2002), and Keener and
biochemical species (“modifiers”) that participate in Sneyd (2009).
C 284 Cell Cycle Modeling, Differential Equation
1 2
a 1.8 b
0.8 1.6 XT(t)
1.4
0.6 1.2
X(t)
1
0.4 0.8
0.6
0.2 0.4 X(t)
SSS 0.2
0 0
0 1 2 3 4 5 0 5 10 15 20 25 30 35 40 45 50
t t
0.6
c
0.5
0.4
SSS
X(t)
0.3
0.2 USS
0.1 SSS
0
0 2 4 6 8 10
t
Cell Cycle Modeling, Differential Equation, Fig. 2 Repre- returning to the steady state. This behavior is called “excitability.”
sentative simulations of ODEs (Eq. 2). The parameter values and (b) A stable limit cycle oscillation. In addition to X(t) ¼ [MPF],
initial conditions for these simulations are given in Tables 1 and 2. we also plot XT(t) ¼ X(t) + XP(t) ¼ [total cyclin]. The period of
(a) A unique stable steady state (sss). Small perturbations away oscillation is 29 min. Compare to Fig. 1V in Pomerening et al.
from the steady state return immediately. A larger perturbation, (2005). (c) Two coexisting stable steady states separated by an
X(0) ¼ 0.5, exhibits a transient pulse of MPF activity before unstable steady state (uss)
CycB-Cdk1
Cell Cycle Modeling, Petri Nets, Fig. 1 Petri net encoding of the interactions between Cdh1 and the CycB-Cdk1 dimer
Starting from the initial marking, the net can reach responsible for cell cycle arrest in G1 phase, is studied
new markings, which represent the evolution of the through the techniques of p-invariant and t-invariant
state of the modeled variables. In our context, the firing analysis (Murata 1989).
of transitions usually corresponds to the occurrence of
reactions and the changes in the marking of the places Petri Nets with Stochastic Firing Times
caused by the firing corresponds to the updates of the The quantitative information about the speed of reac-
abundance of reactants and products of reactions. tions is introduced into Petri Net models by various
It is quite immediate to realize that several concepts extensions of the firing semantics. The first one we
need a more precise definition to reach a complete consider here proposes a stochastic formulation of
characterization of the firing semantics. Consider for reaction occurrence times (Marsan 1990), where the
instance the case when multiple transitions are competition among concurrently enabled transitions is
enabled: obviously, rules are needed for defining the broken down through a random choice modulated by
order of firings, as different orders may result in dif- the speed of the competing transitions. More precisely,
ferent final markings of the net. each transition firing occurs with a time that follows
a probability distribution law, and in a set of simulta-
Qualitative Modeling with Petri Nets neously enabled transitions the smallest random time
The earliest definition of Petri Nets introduced a non- determines which transition will fire first. This firing
deterministic choice of the firing, which is the same as semantics is called race policy. In the case of contin-
to say that each order is possible, compatibly with the uous-time distributions of the firing times, no simulta-
enabling rules of the firing. Basically, a non- neous firings of transitions are thus allowed. Petri Nets
deterministic firing policy assigns an equal occurrence adhering to this semantics of firing are commonly
priority to each transition, irrespective of the time called Stochastic Petri Nets.
dimension of net evolution, as determined by the rela- Of particular interest in Systems Biology are the
tive speed with which reactions take place in the Stochastic Petri Nets whose firing times are distributed
modeled system. This type of firing semantics is useful as negative exponential ▶ random variables, in which
when no quantitative information is available about the case the marking evolution process over time is equiv-
speed of reactions, and can nevertheless provide a view alent to a discrete-space continuous-time Markov pro-
of the possible markings a given Petri Net model can cess (▶ Cell Cycle Modeling, Stochastic Methods;
reach. Thus, it can help in showing that some features ▶ Stochastic Processes, Fokker-Planck Equation).
of the model exist no matter which speed the real This class of models lend themselves to naturally rep-
kinetics of reactions possesses. An example of this resent the stochastic molecular kinetics as described by
Petri Net qualitative modeling can be found in Gillespie (1977), and their evolution can be studied
(Sackmann et al. 2006), where the mating pheromone through ▶ Gillespie stochastic simulation algorithm
response pathway of budding yeast, which is (Gillespie 1977).
Cell Cycle Modeling, Petri Nets 289 C
0.25
Cell mass distribution (A.U.)
0.2
Wild type sic1Δ
0.15
0.1 C
0.05
0.0
0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75
Cell Cycle Modeling, Petri Nets, Fig. 2 Cell mass spread in wild type and sic1D mutant of budding yeast
Among the modeling works that resort to stochastic degradation times of the key regulator Cdh1, and not
methods (▶ Cell Cycle Modeling, Stochastic only their average values, are critical in determining
Methods), we found in the literature a few applications the overall variability of cell cycle statistics. This
of Stochastic Petri Nets to the modeling and analysis of result provides an important argument for supporting
yeast cell cycle. The work in (Mura and Csikász-Nagy the validity of pursuing a stochastic modeling of cell
2008) provides a model of budding yeast (▶ Cell cycle regulation networks.
Cycle, Budding Yeast) cell cycle regulation network.
The model validation is performed with a comparison Hybrid Petri Nets
of simulated results against a set of experimental Hybrid Functional Petri Nets are another popular
results about wild type as well as budding yeast mutant timed extension of Petri Nets that has been used to
cells. The authors demonstrate how the Stochastic model cell cycle regulation. This modeling formalism
Petri Net model can be used to obtain statistics of the allows the coexistence of both continuous and dis-
key cell cycle parameters such as duration and cellular crete processes. Both the set of places and that of
mass, beyond the average one. This information pro- transitions is divided in a discrete and a continuous
vides insights on cell cycle characteristics. For part. Enabled discrete transitions fire after a deter-
instance, Fig. 2 compares the simulated cell mass dis- mined delay, consuming and producing discrete num-
tribution of wild type budding yeast with that of the bers of tokens as in regular Petri Nets. Enabled
sic1D mutant, where the Cdk inhibitor of cyclins is continuous transitions fire continuously at a given
removed. The chart clearly shows that the spread of the rate. This latter feature provides a convenient abstrac-
cell mass distribution is quite different in the two tion mechanism for directly representing into the
modeled organisms. The mutant mass distribution model processes that are difficult to model with
exhibits a higher variability, in agreement with exper- a discrete number of tokens, such as very high molec-
imental observations. Thus, the stochastic model ular concentrations, as well as external driving vari-
reveals a characteristic of the phenotype of sic1D ables such as cell mass.
cells that could not be anticipated by deterministic The cell cycle regulatory network of fission
models. yeast (▶ Cell Cycle, Fission Yeast) has been modeled
In (Csikász-Nagy and Mura 2010), Stochastic Petri and analyzed by Hybrid Functional Petri Nets in
Nets are used to explore the effects of intrinsic noise (Fujita et al. 2004). This study demonstrate the versa-
propagation on the variability of cell cycle duration tility of the Hybrid Functional Petri Net modeling
and cell mass at division time in fission yeast (▶ Cell formalism in representing a broad range of molecular
Cycle, Fission Yeast). In particular, the authors show regulation mechanisms and the intuitiveness of
that the variance of RNA synthesis and RNA the modeling process. A comprehensive model of the
C 290 Cell Cycle Modeling, Process Algebra
denotes the substitution of the free occurrences of c Cell Cycle Modeling, Process Algebra, Table 2 Summary on
by the actually received name b. tools and case studies of process algebras used for modeling
biological systems
Since names can be transmitted in interactions, the
restriction operator plays a special role in name- Simulation Case studies
passing algebras. Restricted names cannot be used as Biochemical BioSPI, SPiM Autoreactive lymphocyte
pi-calculus recruitment, gene regulation,
transmission media; they can, however, be used as cell cycle, Rho GTP-binding
transmitted data and, once transmitted, they become in phagocytosis, FGF
private resources shared by the sender and the receiver. pathway
The mentioned operators are those common to var- Bio-PEPA PEPA ERK signaling pathway, cell
ious process algebras; in addition to these, each calcu- workbench, cycle, epidemiological
eclipse plug-in models, genetic networks,
lus adopts a few specific operators (see specific entries NF-kB
▶ Stochastic pi-calculus, ▶ κ-Calculus, ▶ Bio-PEPA, BlenX CoSBiLab Cell cycle, circadian clock,
▶ BlenX for more details) (BetaWorkbench) actin polymerization, NF-kB,
MAPK, EGFR
Process Algebras, Comparing Different Languages k-calculus Kappa factory Cellular signals
Table 2 schematically reports a comparison between
different process algebras with the availability of sim-
ulation platforms (usually implementing some variants
of the ▶ Stochastic simulation algorithm) and the and make them active; then the dimers CYCLIN/CDK,
investigation of some complex case studies. the activator CDC14, and the CDH1 are involved in
a negative feedback loop: CYCLIN/CDK turns on
Process Algebras, Applications in Biology (Cell CDC14, which activates CDH1, which inhibits the
Cycle) CYCLIN/CDK activity, destroying the cyclin sub-
As the best of our knowledge, the only three process units. The model includes also another possibility of
algebras used to study different aspects of the cell inhibition of CYCLIN/CDK: the stoichiometric bind-
cycle are ▶ BioSpi, ▶ Bio-PEPA, and ▶ BlenX. ing with CKI.
Here we just sketch the main characteristic of the So the main events that the code simulates are: the
chosen model with respect to the peculiar feature of dimers CYCLIN/CDK formation (by the usage of pri-
those languages just to give to the reader a flavor of vate channels between the two processes), phosphory-
how different process algebras handle various aspect lation, and dephosphorylation of CDH1 by CDC14
of the cell cycle, usually studied in the deterministic (by changing of the current state of the process) and
framework. We refer the reader to the original publi- the protein degradation (by changing the current state
cations for more detailed discussions. of the process to the constant empty process). All the
events occur on global channels at different suitable
A Cell Cycle Model in BioSpi rates (with the exception of the dimer formation that
Lecca and Priami (2003) chose to implement the con- occurs on a private channel between the two pro-
trol mechanism of the cell cycle (modeled by Novak in cesses). The different reactions in which the compo-
1999) that is accounting for the antagonistic interac- nents of the system are involved are implemented as
tion between cyclin-dependent kinase dimers and the a multiple non-deterministic choice, that is then turned
anaphase-promoting complex: The APC extinguishes into a probabilistic one by the BioSpi simulator
cdk activity by destroying its cyclin partners, whereas (See ▶ stochastic simulation algorithm for more
cyclin/cdk dimers inhibit APC activity by phosphory- details about the logic behind the simulator engine).
lating Cdh1 (the interaction is also mediated by
a cyclin-dependent kinase inhibitor). A Cell Cycle Model in Bio-PEPA
The system is composed by six concurrent pro- Ciocchetta and Hillston (2009) chose the model
cesses, corresponding to the main five species of pro- presented by Goldbeter in 1991 and was later extended
teins, which regulate the cell cycle: CYCLIN, CDK, by Gardner et al. in 1998. Broadly speaking, the model
CDH1, CKI, CDC14 plus an auxiliary process describes the negative feedback loop obtained by
CLOCK. First, cyclin subunits bind to CDK monomers the fact that the cyclins promote the activation of
Cell Cycle Modeling, Process Algebra 293 C
a cdk (cdc2) which in turn activates a cyclin protease appropriate values at each simulation step and put
which in turn promotes cyclin degradation. In order to them in the classical race condition with the ele-
represent a control mechanism for the cell division mentary reaction rate, following the ▶ stochastic
cycle, Gardner and colleagues introduced a protein simulation algorithm strategy.
that initiates and concludes the cell division phase • Continuous variables can be defined in the input file
and modulates the frequency of oscillations. and their value is updated by the simulator so that
The Bio-PEPA model of the system has been built the user can define functional rates that refer to C
following this simple sequence of steps: them and that are updated by the simulator when-
• Definition of the initial and the maximum concen- ever the continuous variable is recalculated.
trations (information derived from the paper) and The BlenX model has been used for performing
choice of the appropriate level/step size that define stochastic simulations of wild type and mutants cells,
the level of granularity at which the user wants to showing how the effect of the noise is important to
study the system. characterize partially viable mutants at the border of
• Definition of functional rates and parameters of the life and death.
model. Concerning the kinetic laws, some of the
reactions in the input model follow mass-action
kinetics, whereas all the others have Michaelis– Cross-References
Menten kinetics.
• Definition of species components (i.e., the reagents/ ▶ Bio-PEPA
products of the different reactions) and of the model ▶ BioSPI
component (i.e., the composition of the species ▶ BlenX
components). ▶ κ-Calculus
The Bio-PEPA model has then been analyzed, and ▶ Stochastic Pi-calculus
some interesting properties about different parameter ▶ Stochastic Simulation Algorithm
sets and their effect on the underlying Continuous
Time Markov Chain have been investigated.
References
A Cell Cycle Model in BlenX
Palmisano et al. (2009) illustrates an easy translation to Calder M, Hillston J (2009) Process algebra modeling styles
BlenX of one of the most popular deterministic model for biomolecular processes. Lect Notes Comput Sci
5750:1–25
of the budding yeast cell cycle developed by Novak Ciocchetta F, Hillston J (2009) Bio-PEPA: a Framework for the
and Tyson in 2003. This model contains species Modeling and Analysis of Biological Systems. Theor
involved in reactions at very different levels of abstrac- Comput Sci 410(33–34):3065–3084
tion: There are elementary reactions following mass- Guerriero ML, Prandi D, Priami C, Quaglia P (2009) Process
calculi abstractions for biology, process calculi abstractions
action kinetics and more complex interaction mecha- for biology in algorithmic bioprocesses. Springer, Heidelberg
nisms abstracted as cooperative reactions with Hoare CAR (1985) Communicating sequential processes, Prentice
Michaelis–Menten and Hill kinetics. Moreover, con- hall international series in computer science. Prentice Hall, UK
tinuous variables (e.g., the mass of the cell) are intro- Lecca P, Priami C (2003) Cell cycle control in eukaryotes:
a BioSpi model. Proceedings of BioConcur 03. Electron
duced and they influence the rate of many other Notes Theor Comput Sci 180(3):51–63
reactions. Milner R (1999) Communicating and mobile systems: the pi-
In BlenX, it is easy to code all the features listed calculus. Cambridge University Press, Cambridge
above considering the following ideas: Palmisano A, Mura I, Priami C (2009) From odes to language-
based, executable models of biological systems. Proceedings
• Species can be modeled as simple boxes, with inter- of the pacific symposium on biocomputing (PSB 2009)
action capabilities defined only between the species Plotkin GD (1981) A structural approach to operational seman-
involved in pure mass-action reactions. tics. Tech Rep DAIMI FN-19, Computer Science Depart-
• Complex mathematical formula used as kinetic ment, Aarhus University, Aarhus, Denmark
Priami C (2009) Algorithmic systems biology. An opportunity
laws can be simply copied in the input file of for computer science. Commun ACM 52:80–88
the model and assigned as rate of BlenX events. Regev A, Shapiro E (2001) Cellular abstractions: cells as
The simulator takes care of calculating their computation. Nature 419:343
C 294 Cell Cycle Modeling, Stochastic Methods
Yi concentration
Sensitivity analysis is the name of a family of mathe-
matical methods that investigate the relation between ΔYi
the values of the parameters and the output of models.
Local sensitivity analysis explores the effect of small
parameter changes, while the methods of global sensi- C
tivity analysis may explore large parameter domains.
Characteristics
where t is time, Y is the n-vector of variables, p is the solving the following initial value problem (Tomović
m-vector of parameters, Y0 is the vector of the initial and Vukobratović 1972):
values of the variables, and f is the right-hand-side of
the differential equations. S_ ¼ JS þ F Sðt1 Þ ¼ 0 (3)
Local sensitivity analysis
means the calculation of
partial derivative @Yi @pj , which is called the first where S(t) ¼ {sij(t)} ¼ {@Yi @pj } is the time-
order local sensitivity coefficient. The meaning of dependent local sensitivity matrix, matrix J is the
a local sensitivity coefficient is visualized in Fig. 1. Jacobian (J ¼ {@fi =@Yk }), and matrix F contains the
The concentration–time curve, as obtained by the solu- derivatives of the right-hand-side of the ODE with
tion of the original model, is plotted with solid line. If respect to the parameters (F ¼ {@fi @pj }). Matrices J
parameter j is changed by Dpj at time t1, then the and F are functions of time t2. The sensitivity
solution of the modified model will deviate (see
coefficients
are usually used in normalized
the dashed line) from the original solution. Denote pj =Yi @Yi @pj or semi-normalized pj @Yi @pj
DYi the change in the calculated solution at time t2. form. The normalized sensitivity coefficients are
The finite difference approximation can be used for the dimensionless and show the percentage change of
calculation of the local sensitivity coefficient: model solution i as a result of +1% change of the
value of parameter j.
@Yi DYi
(2)
@pj Dpj Global Sensitivity Analysis
Textbooks about the various methods of global sensi-
Local sensitivity coefficient @Yi @pj depends on the tivity analysis and their applications in science have
time of parameter perturbation t1 and the time of been published by Saltelli et al. (2004, 2008). The most
observation t2. If the time of perturbation t1 is fixed, frequently applied methods of global sensitivity anal-
then the local sensitivity coefficient is a function of ysis are the Monte Carlo method with Latin hypercube
time t ¼ t2t1. sampling, application of pseudo random numbers (e.g.,
The error of the finite difference approximation a Sobol’ sequence), Fourier Amplitude Sensitivity Test
cannot be controlled effectively, therefore the local (FAST), and High Dimensional Model Representation
sensitivity function sik(t) is usually calculated by (HDMR). These methods are able to determine the
C 298 Cell Cycle Models, Sensitivity Analysis
uncertainty of each model output (like the concentra- the sensitivity matrix. However, in several chemical
tion at a given time or the period time) knowing the kinetic systems the following relations have been
uncertainty of each parameter. The ultimate informa- observed (see Rabitz 1989; Zsély et al. 2003):
tion (usually obtained by the combined application of 1. Local similarity: Value
several methods) is the calculation of the joint proba-
bility density function (pdf) of the model outputs and sik ðtÞ
lij ðtÞ ¼ (6)
the contribution of each parameter to it knowing the sjk ðtÞ
joint pdf of the parameters.
depends on time t and the model results Yi and Yj
Raw and Cleaned Local Sensitivity Functions selected, but is independent of the parameter pk
The concentration–time curves of a cell cycle model perturbed.
are always nearly periodic and in most models the 2. Scaling relation: Equation
concentration–time curves are truly periodic having
elapsed a transition time. In this case one period time ðdY =dtÞ s ðtÞ
t later the concentrations are identical: i ¼ ik (7)
dYj =dt sjk ðtÞ
YðtÞ ¼ Yðt þ tÞ (4) is valid for any parameter pk. Existence of scaling
relation includes the presence of local similarity.
3. Global similarity: Value
It is possible to investigate how period time t
depends on the values of the parameters. Using
a local sensitivity approach, this information is sik ðtÞ
mikm ¼ (8)
provided by the period time sensitivities @t @pj . sim ðtÞ
One might assume that the sensitivity functions
of a model having periodic solution are also is independent of t (within an interval).
periodic. However, it is true only for the local sensi- Zsély et al. (2003) have shown that the existence of
tivity functions belonging to parameter pk, if the low-dimensional manifolds in variable space may
corresponding period time sensitivity is zero cause local similarity. Global similarity emerges if
(∂t/∂pk ¼ 0). If ∂t/∂pk is not zero, then the maxima local similarity is present and the sensitivity differen-
of the sensitivity–time curves are continuously tial equations are pseudo-homogeneous. The latter is
increasing. It has been shown by several authors related to the existence of excitation periods, like the
(see (Tomović and Vukobratović 1972), (Edelson and autocatalytic runaways of concentrations.
Thomas 1981), (Larter 1983), (Zak et al. 2005))
that In cell cycle models low-dimensional manifolds
the original (“raw”) sensitivity functions @Yi @pj and excitation periods may occur (Lovrics et al.
can be separated
to truly
periodic (“cleaned”) sensitiv- 2006), and accordingly all the three types of relations
ity functions @Yi @pj t and a secular term: between the sensitivity functions were found (Lovrics
et al. 2008). If the maxima of the sensitivity functions
are continuously
@Yi @Yi t @t @Yi increasing,
then the cleaned sensitiv-
¼ (5) ity functions @Yi @pj t should be investigated.
@pj @pj t t @pj @t
Detection of the global similarity of sensitivity
functions has a special importance. If sensitivity
These authors suggested several methods for the functions @Yi @pj and @Yi =@pk are globally similar,
accurate
determination of the period time sensitivity then changing parameter j causes exactly the same
@t @pj and the cleaned sensitivity functions from the effect on concentration–time curve Yi ðtÞ during the
raw sensitivity functions (Fig. 2). whole time period than a (different extent) change of
parameter k. This means that the effect of the change
The Similarity of the Local Sensitivity Functions of parameter j can be compensated by an appropriate
In the case of a general mathematical model, no rela- modification of the value of parameter k. As
tion is expected among the rows and/or the columns of a consequence, an infinite number of parameter sets
Cell Cycle Models, Sensitivity Analysis 299 C
Cell Cycle Models, 1.2
Sensitivity Analysis,
Fig. 2 Raw (a) and cleaned
(b) local sensitivity functions 0.8
of a cell cycle model
0.4
−0.4
−0.8
−1.2
−1.6
0 50 100 150 200 250 300 350 400
time / min
4
1
cleaned sensitivity function
−1
−2
−3
−4
−5
−6
0 50 100 150 200 250 300 350 400
time / min
No Checkpoints in Response to Unreplicated DNA, laevis eggs (Lohka et al. 1988). This activity was
DNA Damage, or Mitotic Defects ascribed to a heterodimeric complex of proteins com-
After fertilization, frog early embryos continue their prising the cyclin-dependent kinase 1 (CDK1), and its
rapid cell cycles unimpeded, even if exposed to agents activating counterpart, cyclin B. In 1995, the enzyme
that block the processes required for DNA replication responsible for targeting cyclin B for proteolysis by the
and cell division (Morgan 2006). Treatment with DNA proteasome – called the anaphase-promoting complex
synthesis inhibitors prevents S-phase completion, but (APC) – was also purified from Xenopus eggs and
cell cleavages will continue. In fact, the presence of clams (King et al. 1995; Sudakin et al. 1995).
a nucleus is not a requirement: enucleation of fertilized Together, the M-phase stimulatory CDK1 protein
eggs has been shown inconsequential to oscillations of kinase and the CDK1-inhibiting APC ubiquitin ligase
the biochemical clock that drives the frog early embry- were found to be the system that drives the clock-like
onic cell cycle. Poisons including nocodazole and taxol oscillations during the early embryogenesis of frogs, as
inhibit the assembly and disassembly of the mitotic well as other amphibians, marine invertebrates (such
spindle, respectively, but even these cell division inhib- as clams and starfish), and insects. Cycling extracts
itors fail to block the activities of the oscillating derived from parthenogenetically activated Xenopus
enzymes that drive the cycles. Frog early embryos eggs exhibit rapid and sustained oscillations of CDK1
treated with these drugs do not undergo cleavages, but activity, and provide a useful tool to study the dynam-
do exhibit waves of cortical/cell surface contractions ical behaviors of this enzyme and others within the
that mirror the timing when mitosis would be expected frog early embryonic oscillatory system (Murray
to occur. DNA-damaging agents – including UV irradi- 1991). CDK1 activity is assayed in these cytoplasmic
ation – are also insufficient to halt progression through extracts by measuring its phosphorylation of
the cell cycle in these early embryos. Altogether, these a preferred substrate, histone H1 (Fig. 2).
traits exemplify the fact that the cell cycle and its control
system in frog early embryos is a clock-like, autono- Positive Feedback Generates Bistability in the
mous oscillator that is not susceptible to perturbations Response of CDK1 to Cyclin, and Confers
that affect the genome, or the subcellular structures that a Relaxation Oscillator Design in the Frog Early
are necessary for cell division. Embryo
The activation of CDK1 is under the control of inhibi-
Frog Early Embryonic Cell Cycles Are Driven by tory kinases and activating phosphatases (Morgan
Alternating Oscillations of CDK1 and APC 2006). Upon binding of cyclin, phosphorylation of
Activities CDK1 on residues of its ATP-binding site (Thr14/
During the late 1980s, extensive physiological and Tyr15) by the kinase Wee1 (as well as another, called
biochemical studies led to the purification of the Myt1) inhibits its activity. Activation of the phosphatase
M-phase-promoting factor from unfertilized Xenopus Cdc25 counteracts this inhibition, but also initiates two
C 302 Cell Cycle of Early Frog Embryos
critical positive-feedback loops. CDK1 phosphorylates (when leaving the off-state), than the threshold level of
and inhibits the Wee1 and Myt1 kinases – causing cyclin that must be surpassed to switch CDK1 from on
a double-negative (positive)-feedback loop – while to off (Pomerening et al. 2003; Sha et al. 2003). This
also further activating additional Cdc25, and inducing steady-state behavior is known as hysteresis, and this
a second positive-feedback loop (Fig. 3). means that there is bistability in response of CDK1
This activation, in turn, stimulates the APC- to cyclin stimulus: depending on where the system
mediated negative-feedback loop, and induces mitotic starts – either in the off or on state – it can exist in
exit and interphase by resetting CDK1 to an inactive either of those two states for a range of cyclin concen-
state through targeting the cyclin subunit for proteoly- trations (Fig. 4).
sis. Together, these positive- and negative-feedback In other words, within the hysteretic range of cyclin
subcircuits provide the basis for CDK1 activation and stimulus, if CDK1 starts in the on state (mitosis) and
inactivation, respectively, during the cell cycle. the stimulus decreases, CDK1 will remain active
Systems-level studies into the dynamical nature of (though its activity would be decreased, since there
CDK1 activation – that included a fusion of both would be less cyclin present) and the cell would remain
computation and experiment – revealed that the in mitosis. If the system starts in the off state and the
steady-state threshold for CDK1 activation is higher stimulus increases to the same level as mentioned
previously, CDK1 would remain off and the cell
would not leave interphase. Combining positive feed-
back and bistability with the APC-driven negative-
feedback loop, the CDK1-APC oscillator in frog
early embryos behaves as a relaxation oscillator –
producing rapid and spike-like bursts of CDK1
activity – followed by its rapid APC-mediated inacti-
vation. Lessening of this positive feedback through the
addition of a non-phosphorylatable (Wee1-insensitive)
mutant of CDK1 was found to cause damped CDK1
activity oscillations (Pomerening et al. 2005). This
caused defects in cell cycle progression – namely, the
Cell Cycle of Early Frog Embryos, Fig. 3 Positive-feedback premature inhibition of DNA synthesis – by rendering
loops in CDK1 activation egg extracts incapable of sustaining periods of low
mammalian cells, such as cancer cells in tissue culture, oncogene addiction (link to article by Frick et al.).
move through the cell cycle in 12–24 h. Cells with the Nutrient availability, a prerequisite for growth and
potential to proliferate but which are not proliferating proliferation, is communicated via the mTor pathway
are in the G0 phase of the cell cycle and are described (Hay and Sonenberg 2004). The EGFR signaling path-
as quiescent. As in yeast, progression through the way has received considerable attention from systems
cell cycle is driven and regulated by cyclin– and computational biologists and highly detailed mass-
cyclin-dependent kinase (CDK) pairs responsible for action kinetic models have been published (Chen et al.
promoting and regulating each of the four transitions 2009). These models are being used to generate molec-
G0-G1, G1-S, S-G2, G2-M. Mammalian cells have a ular targets for anticancer therapeutics (Schoeberl et al.
larger and more diverse set of cyclins than yeast and 2009).
these cyclins allow for tissue and context-specific
regulation of the cell cycle (see article by Malumbres). Cell Cycle Quality Control
For proliferation to succeed both daughter cells must
Cell Cycle Entry receive accurate and complete copies of the genome.
It is extremely difficult to monitor cell growth and Replicating the genome by unwinding, transcribing,
proliferation in vivo. As such, much of our understand- and accurately segregating chromosomes (S phase)
ing of the mammalian cell cycle is from experiments causes numerous sites of DNA damage such as single
performed in tissue culture where it is possible to and double strand breaks. In addition, there are numer-
control cellular exposure to many of the variables ous additional sources of DNA damage including free
that determine whether cells proliferate including radicals from the cell’s own metabolism and environ-
exposure to growth factors and nutrient availability. mental sources of damaging agents such as back-
Removal of nutrients and growth factors, aka serum ground radiation. DNA damage is detected and
starvation, drives many immortal cell lines out of the responded to by an extensive molecular machinery of
cell cycle and into quiescence (G0). Quiescence is proteins that recognize DNA damage and signal its
characterized by hypophosphorylated Rb that binds presence and extent to the cell and DNA repair
E2F1 thereby blocking transcription of genes essential machinery. In mammalian cells the protein p53
or cell cycle entry. Addition of nutrients and growth functions as a signal integration node coupling DNA
factors activates signal transduction pathways that damage to cell cycle arrest, senescence, and apoptosis.
upregulates gene expression that results in phosphory- In addition, the signaling kinases of the DNA damage
lation of Rb and causes Rb to no longer bind E2F1. response, Chk1 and Chk2, directly couple the DNA
Released E2F1 upregulates the expression of Cyclin D damage response to the cell cycle machinery. For more
and Cdk1, and this cyclin-CDK pair regulates the details on how DNA damage is coupled to the cell
transition from G0 to G1. Once cells have passed the cycle, see Cell cycle signaling, DNA damage by
restriction point (link to article by You et al.) then they Toettcher in this volume.
move into S and are committed to moving through the Cell fate after DNA damages is a function of the
entire cell cycle even if serum starved again. amount of damage, the dynamics of the damage, and
A complex set of signal transduction pathways interplay with extracellular cues. Low levels of tran-
communicates extracellular cues to the cell cycle sient DNA damage induce a temporary cell cycle arrest
machinery. A canonical example is the extracellular that permits repair. In the G2 phase of the cell cycle the
growth factor (EGF) pathway. EGF binds its receptor, cell possesses two copies of each chromosome, which
EGFR, activating the mitogen-activated protein kinase allows the cell to correct damage generated in S phase
(MAPK aka ERK) pathway, which in turn activates through coupled homologous recombination and
transcriptional programs (via c-Myc and CREB) that repair. High and/or persistent levels of DNA damage
induce cell cycle entry. The Akt pathway also trans- signal through p53 to induce either senescence or apo-
duces extracellular growth factor signal into cell cycle ptosis. Differentiated mammalian cells do not express
entry. Many components of the MAPK and Akt path- telomerase and as such each round of replication trun-
ways are oncogenes. Mutations in these proteins can cates their telomeres. Telomeres completely eroded by
drive oncogenesis, and cancer cells often exhibit excessive rounds of replication are recognized by the
Cell Cycle Ontology (CCO) 305 C
DNA damage machinery which induces a persistent of ErbB signaling pathways as revealed by a mass action
DNA damage response that drives the cell into a state model trained against dynamic data. Mol Syst Biol 5:239
Hay N, Sonenberg N (2004) Upstream and downstream of
of permanent cell cycle referred to as replicative senes- mTOR. Genes Dev 18:1926–45
cence. These cellular responses are unique departures Schoeberl B, Pace EA, Fitzgerald JB, Harms BD, Xu L, Nie L,
from the mammalian cell cycle and function as barriers Linggi B, Kalra A, Paragas V, Bukhalid R, Grantcharova V,
to tumorigenesis (Bartkova et al. 2006). Interestingly, Kohli N, West KA, Leszczyniecka M, Feldhaus MJ,
Kudla AJ, Nielsen UB (2009) Therapeutically targeting
cell fate determination after DNA damage is strongly ErbB3: a key node in ligand-induced activation of the ErbB C
influenced by extracellular cues, such as cytokines and receptor-PI3K axis. Sci Signal 2:ra31
growth factors, implying that a multivariate systems
approach is necessary for understanding how cells
couple extracellular signal transduction to the DNA
damage response and the cell cycle machinery to
commit to these cell fates. Cell Cycle Ontology (CCO)
References Characteristics
domain is constantly evolving, and covers a variety of 7. Biological species relationships as described by the
organisms. This poses a range of requirements: NCBI taxonomy (▶ NCBI BioProject Genome
1. CCO integrates cell cycle knowledge across Resources)
several model organisms: H. sapiens, S. cerevisiae, 8. Protein-protein interactions involving cell
S. pombe, and A. thaliana. cycle proteins from the IntACT database
2. It presents a common conceptual view of this (▶ Protein–Protein Interaction Databases)
knowledge, permitting sophisticated querying in
a variety of forms, across cell cycle knowledge. Core Concepts of CCO
3. CCO accommodates frequent changes in data, by Protein, Modified protein, Gene, Molecular interac-
frequent rebuilds through an automated pipeline. tion, Molecular function, Biological process, Cellular
4. CCO performs automatic mapping of biological component, and Organism. All these entities in the
entities. CCO (proteins – including their modified forms,
genes, interactions, etc.) are modeled as classes since
Contents of CCO they gather shared commonalities that are present in all
The cell cycle ontology covers the following topics: the individuals (instances) they represent (Fig. 1).
1. The cell cycle proteins
2. The genes coding for cell cycle proteins Upper Level Ontology
3. The molecular function of cell cycle proteins An upper level ontology (ULO) connects the inte-
4. The cellular localization of cell cycle proteins grated resources. A ULO is an ontology that structures
5. Cellular processes associated with cell cycle general types of concepts (such as a process) in generic
proteins as well as specific domains to provide an integration
6. Post-translational modifications of cell cycle scaffold for including other ontologies. The CCO-
proteins ULO is based on the basic formal ontology (BFO,
7. Protein-protein interactions among cell cycle Grenon et al. 2004) to ensure interoperability of the
proteins CCO with ontologies from OBO. The CCO-ULO has
8. Phylogenetic information for the supported organ- been customized for the CCO by inclusion of a few
isms through orthology relationships among cell high-level concepts, such as “cell cycle protein” (see
cycle proteins Fig. 2).
The cell cycle ontology integrates the following
resources: Construction of CCO
1. Genes, proteins, and their post-translational modi- The cell cycle ontology is constructed in several steps:
fications from the UniProt database 1. The process starts by selecting portions from vari-
2. The cell cycle, cell division, cell proliferation, DNA ous OBO source ontologies. This “pre-cell cycle
replication branches of the gene ontology biological ontology” constitutes a backbone for the complete
process branch (▶ Gene Ontology) CCO and the four species-specific sub-ontologies
3. The complete molecular function branch from the thereof.
gene ontology 2. The OBO relations ontology is fully incorporated,
4. The complete cellular component branch from the together with the “interaction type” branch from the
gene ontology molecular interaction ontology from OBO.
5. The relation ontology provided by the Open Bio- 3. A specific taxonomy is built on the basis of
medical Ontologies Consortium (OBO) (Smith the NCBI taxonomy for H. sapiens, S. cerevisiae,
et al. 2007) S. pombe, and A. thaliana.
6. The gene ontology annotations (Camon et al. 2004) 4. Protein “hubs” that connect their relevant annota-
of UniProt proteins for the cell cycle (these are the tion data (such as the molecular functions in
attachments of GO Cellular Component concepts to which they participate) are generated. All proteins
protein to describe their location); GO biological described with cell cycle concepts in gene
process and molecular function concepts to ontology annotation (GOA) files (Camon et al.
describe their functionality 2004) are named “core cell cycle proteins.”
Cell Cycle Ontology (CCO) 307 C
core cell G1/S
Saccharomyces transition of
cycle SWI4_yeast
cerevisiae mitotic cell
nucleus protein CCO:G0002318
organism CCO:C0000252 cycle
CCO:B0000000
CCO:T0000016 CCO:P0000012
participates_in
derives_from Orthology
located_in encoded_by
cluster1517 C
is_a protein
swi4-ssa1 CO:O0001289
physical
interaction
CCO:I0002887 is_a
SWI4_YEAST-
participates_in Phosphoserine
SWI4_YEAST transforms_into 159
CCO:B0000111 CCO:B0009551
participates_in transforms_into
swi6-mpg1
physical SWI4_YEAST-
interaction Phosphoserine
CCO:I0003305 806
CCO:B0009552
transforms_into
participates_in transforms_into
participates_in
ho-491 SWI4_YEAST-
physical Phosphoserine
interaction 1003
CCO:I0005128 CCO:B0009553
swi4-2 SWI4_YEAST-
physical Phosphoserine
interaction 1007
CCO:I0005527 CCO:B0009554
Cell Cycle Ontology (CCO), Fig. 1 Protein-centric knowledge in CCO. The local neighborhood of the cell cycle protein wee1
(human) is shown. Entities (color coded) and relationships (arrows) illustrate the types of knowledge in CCO
These proteins are added to the CCO as the chil- (Li et al. 2003). This tool is used to infer evolution-
dren of the concept core cell cycle protein (CCO: ary relationships between classes of proteins. The
B0000000) and used as seeds for the data integra- proteins from the clusters containing at least one
tion process. core cell cycle protein are added to the CCO as
5. The “core cell cycle” protein section of the CCO is subclasses of the class cell cycle protein. All the
expanded to include the proteins known to interact imported entries from the sources are cross-
with the core cell cycle proteins, as documented in referenced so that the data can be traced back to
IntACT (Kerrien et al. 2007). These new protein its source.
classes are included as subclasses of the class cell 7. The four organism-specific ontologies and the com-
cycle protein. Then, protein information (such as posite cell cycle ontology are checked and made
synonym names, encoding genes, and cross- available producing the official release of the
references) is fetched from the UniProt knowledge system.
base. In addition, post-translational modification 8. Now that the CCO is available with all the knowl-
data, when available in UniProt, are also added by edge in place, semantic enrichment can occur. We
creating new concepts defined by their specific use the Ontology PreProcessor Language (OPPL,
modification. http://oppl2.sourceforge.net/) to further transform
6. Clusters of putative orthologs of the four species the CCO with application of ontology design
are generated with the OrthoMCL clustering utility patterns.
C
Data from
308
cell cycle
Data from GO UniProt
(GO)
Data from
biological OrthoMCL
perdurant
protein
Data from MI is_a
is_a modified
is_a protein
is_a is_a cell cycle
biological modified
entity gene product is_a protein
cell cycle
protein
is_a
cellular S. cerevisiae
is_a Data from GO
component
S. pombe
Data from
Organism
NCBI
A. thaliana
H. sapiens
Cell Cycle Ontology (CCO), Fig. 2 Upper level ontology (ULO) for the CCO. The ULO is a hierarchical scaffold including generic concepts (e.g., cell cycle gene) which connects
the integrated resources. The dashed rectangles represent the type of data residing below the parental concepts: the node labeled “data from GO” shows where the concept from
GO’s cellular component ontology goes under the concept cellular component (e.g., nucleus), the node “data from UniProt” under the concept protein shows where protein data from
UniProt resides, and so forth
Cell Cycle Ontology (CCO)
Cell Cycle Progression in Bacteria 309 C
Cell Cycle Ontology (CCO), Table 1 Numbers of classes, ▶ NCBI BioProject Genome Resources
breadth, and depth of the CCO hierarchy ▶ Ontology
Structure and cohesion of CCO ▶ Protein–Protein Interaction Databases
Metric Asserted CCO
No. of classes 89,532
No. of leaf classes 85,235
Max. depth 33 References C
Mean depth 15.35
Mean number of subclasses per 4.61 Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I,
class De Baets B, Stevens R, Mironov V, Kuiper M (2009) The
cell cycle ontology: an application ontology for the represen-
Max. number of subclasses 24,021
tation and integrated analysis of the cell cycle process.
No. of properties 52 Genome Biol 10:R58
Property with maximum usage has_source used 65 110 times Bug WJ, Ascoli GA, Grethe JS, Gupta A, Fennema-Notestine C,
Mean usage per property 6,673.69 Laird AR, Larson SD, Rubin D, Shepherd GM, Turner JA,
Mean property usage per class 3.87 Martone ME (2008) The NIFSTD and BIRNLex vocabular-
ies: building comprehensive ontologies for neuroscience.
Neuroinformatics 6:175–194
Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J,
Binns D, Harte N, Lopez R, Apweiler R (2004) The gene
9. Finally, validation and verification processes are ontology annotation (GOA) database: sharing knowledge in
carried out while building the CCO to ensure its uniprot with gene ontology. Nucleic Acids Res 32(Database
issue):D262–266
soundness. The ontologies generated in the previ-
Grenon P, Smith B, Goldberg L (2004) Biodynamic ontology:
ous phase are manually and automatically checked applying BFO in the biomedical domain. Stud Health
by ontology editors, validators, and reasoners. In Technol Inform 102:20–38
addition, the pipeline log execution files are Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A,
Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley
inspected in detail. These files are sufficiently
R, Kohler C, Khadake J, Leroy C, Liban A, Lieftink C,
detailed to point out any possible problem. This Montecchi-Palazzi L, Orchard S, Risse J, Robbe K,
has allowed us to assemble a fully automated pipe- Roechert B, Thorneycroft D, Zhang Y, Apweiler R,
line that uploads the ontologies and their exports in Hermjakob H (2007) IntAct - open source resource for
molecular interaction data. Nucleic Acids Res 35(Database
the different formats to the CCO website and to all
issue):D561–565
related and supporting repositories (Table1). Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification
of ortholog groups for eukaryotic genomes. Genome Res
Examples of Other Application Ontologies 13:2178–2189
Malone J, Holloway E, Adamusiak T, Kapushesky M, Zheng J,
The experimental factor ontology (EFO, Malone et al.
Kolesnikov N, Zhukova A, Brazma A, Parkinson H (2010)
2010), developed by the European Bioinformatics Modeling sample variables with an experimental factor
Institute (EBI), represents sample variables from ontology. Bioinformatics 26:1112–1118
gene expression experimental data. The NIFSTD Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W,
Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, OBI Con-
ontology, developed by the NeuroInformatics Frame-
sortium, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone
work (NIF, Bug et al. 2008), has produced an inventory SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S (2007)
of Web-based neuroscience resources, integrating an The OBO foundry: coordinated evolution of ontologies to
ontology with separate modules covering major support biomedical data integration. Nat Biotechnol
25:1251–1255
domains of neuroscience: anatomy, cell, subcellular,
Stevens R, Lord P (2008) Application of ontologies in bioinfor-
molecule, function and dysfunction, and concepts for matics. In: Staab S, Studer R (eds) Handbook on ontologies
describing experimental techniques, instruments, and in information systems. Springer, Berlin
other resources.
Cross-References
Cell Cycle Progression in Bacteria
▶ Cell Cycle
▶ Gene Ontology ▶ Cell Cycle, Prokaryotes
C 310 Cell Cycle Regulation, microRNAs
Characteristics
Cell Cycle Regulation, microRNAs
Cell Cycle Genes Directly Regulated by microRNAs
Baltazar D. Aguda1 and Gregory Riddick2 Shown in Fig. 1 are some of the experimentally
1
Mathematical Biosciences Institute, Ohio State validated microRNAs targeting key cell cycle genes
University, Columbus, OH, USA involved in the early phases (G1 and S) of the mam-
2
Neuro-Oncology Branch, National Cancer Institute, malian cell cycle (Chivukula and Mendell 2008;
Bethesda, MD, USA Yu et al. 2010). Myc and E2F are transcription
factors that induce expression of cell cycle genes
such as cyclins (e.g., cyclin E) and cyclin-dependent
Synonyms kinases (e.g., CDK4). Note that Myc and E2F
promote each other’s expression. As shown in
Post-transcriptional regulation of gene expression Fig. 1, these transcription factors also induce the
expression of miR-17/20 (which is the abbreviated
name for two microRNAs, namely, miR-17-5p and
Definition miR-20a). In return, miR-17/20 target and inhibit the
expression of Myc and E2F, thus forming negative
MicroRNAs are short (~21 nucleotides) non-coding
RNAs that are endogenous in the genomes of ani-
mals and plants, and have been shown to induce p16 24-1
cleavage or suppression of translation of their
target mRNAs. Thus, microRNAs are generally let-7 15a/16
17/20
let-7 Myc
known as post-translational negative regulators of 34a cycD CDK4,6 34a
let-7
gene expression, although there are a few recent 17 / 20 15a / 16
reports that some microRNAs act as positive regu-
lators. The mRNA target is recognized through
binding of a seed sequence in the microRNA to 17/20 Rb 17/20
E2F
a complementary sequence in the mRNA. Using 34a
E2F2 1 C
2
m
Cell Cycle Regulation, microRNAs, Fig. 2 Modeling the neg- the row of seven microRNAs at the bottom of the figure. Two
ative feedback loop between miR-17/20 and Myc/E2F. (a) members of the cluster, miR-17-5p and miR-20a, target and
Shown inside the gray box are the proteins Myc, E2F1, E2F2, inhibit Myc and E2F1-3. (b) A reduced model involving an
and E2F3 inducing each other’s expression. All of these proteins autocatalytic protein module, p, and a microRNA module, m
also induce the expression of the miR-17-92 cluster pictured by
Cross-References Definition
Separase
b
Cohesin Mad2
Securin
Separase
Cohesin
Cyclin
B
Cdc20
Cohesin
APC/C
Cohesin
Cohesin
Mad1/ APC/C
Mad2
Cdc20
Cell Cycle Signaling, Spindle Assembly Checkpoint 319 C
Cell Cycle Signaling, a b
Spindle Assembly
Checkpoint, Mad2
Fig. 2 Proposed positive Mad2
feedback loops in the SAC
*
Cdc20
network. (a) The Mad2- Cdc20
template model (De Antoni Cdc20
et al. 2005) proposes that Mad2
Cdc20:Mad2 catalyzes the APC/C C
Cdc20 APC/C
formation of more Cdc20:
Mad2 complexes similarly to Mad2
* Mad2
Mad1:Mad2. (b) Activated
APC/C:Cdc20 has been
proposed to induce the
formation of more APC/C: c
Cdc20 complexes, possibly
via the ubiquitination of Cohesin
Cdc20. (c) A double-negative
feedback loop involving the Mad2
SAC and APC/C:Cdc20 has
also been suggested. The Cohesin
double negative behaves like
a positive feedback loop and
could be important for SAC Cdc20
maintenance and Cohesin Mad1/
disengagement Mad2
APC/C
Cohesin
Mad2
Cohesin *
proposed that the MCC itself (or better, one of its Cell Cycle Signaling, Spindle Assembly Checkpoint,
constituents, the Mad2:Cdc20 complex) is capable of Table 1 Concentrations of molecular species belonging to the
SAC pathway in HeLa cells
propagating the formation of new MCC complexes,
the so-called Mad2-template model (Fig. 2a and De Species Value Reference
Antoni et al. 2005). This idea, never confirmed in vivo, APC/C 80 nM Tang et al. (2001)
was initially criticized on the basis that such a loop 65 nM Sudakin et al. (2001)
Bub1 100 nM Howell et al. (2004)
could make the checkpoint independent from kineto-
BubR1 90 nM Tang et al. (2001)
chores and thus potentially impossible to be switched
52 nM Sudakin et al. (2001)
off, but it should be noticed that the positive feedback
127 nM Fang (2002)
loop does not introduce energy in the system, and thus
Cdc20 100 nM Tang et al. (2001)
is conceivable there exists regimes of autocatalysis 285 nM Fang (2002)
where the non-kinetochore-mediated MCC production Mad1 20 nM Shah et al. (2004)
is still dependent on the kinetochore-mediated cataly- 1/4 of Mad2 Luo et al. (2004)
sis (Simonetta et al. 2009). Mad2 120 nM Tang et al. (2001)
Checkpoint inactivation: Positive feedback loops 400 nM Sudakin et al. (2001)
were also proposed to explain the rapid inactivation 100–300 nM Luo et al. (2004)
of the SAC (systems level property (2)). First, it was 200 nM Shah et al. (2004)
suggested that active APC/C:Cdc20 could help the 100 nM Howell et al. (2000)
transformation of APC/C:MCC complexes into active 230 nM Fang (2002)
APC/C:Cdc20 via the ubiquitination of Cdc20 (Fig. 2b
and Reddy et al. 2007). This positive feedback loop
could accelerate the release of APC/C inhibition once specifically Mad2 molecules, at an unattached kineto-
MCC production is shut off. Later, it was proposed that chore (1,500 molecules/unattached kinetochore)
a double-negative feedback loop comprising APC/C: (Howell et al. 2000). Subsequent to this work,
Cdc20 and the checkpoint could explain the a number of groups have measured Mad2 turnover at
rapid checkpoint inactivation after the last kinetochore the unattached kinetochore (t1/2 10–25 s) (Howell
has attached (Fig. 2c and He et al. 2011; Zeng et al. et al. 2000; Shah et al. 2004) providing an upper bound
2010). While the SAC inactivating APC/C has been to the MCC generation rate (35–65 molecules/
well documented, the mechanism whereby APC/C kinetochore/s).
inactivates the SAC has not been so thoroughly exam- This generation rate must be matched or be greater
ined. It should be noted that SAC components are than the dissociation rate of the MCC:APC/C complex
substrates of APC/C (e.g., Mps1), and Cyclin B itself to permit robust inhibition. The dissociation rate, at
is widely recognized as required for SAC activation, this time, has only been estimated from measurements
and it is of course a substrate of APC/C:Cdc20. Impor- of mitotic timing. Observations of live cells have
tantly, the presence of feedback loops involved in SAC provided detailed measurements of anaphase onset
inactivation needs to be reconciled with its reported after the establishment of complete bipolar attachment.
reversibility (system level property (3)). While the numbers vary, the consensus is that in
mammals from last kinetochore attachment anaphase
Quantitative Measurements ensues in 25 min, and from the loss of Mad2 at the
The circuits depicted above provide only a qualitative last unattached kinetochore anaphase begins 10 min
description of the SAC pathway. In this section, we later. From these measurements and known concentra-
report the most relevant experimental measurements tions of SAC and APC/C proteins (see Table 1), we can
that are needed to understand how the SAC might estimate that the dissociation rate of the MCC:APC/C
actually work. Given the basic structure of SAC is near 0.0013/s. For 100,000 APC/C molecules
signaling, the most relevant parameters would seem in the cell, this is a molecular dissociation rate of
to be the balance of MCC production and MCC:APC/C 130 molecules/s. This does not match the MCC
dissociation. Currently only the former has been production rate from the kinetochore, a fact pointing
measured quantitatively. This work stems from the to the little knowledge we have about the ways cells
measurement of the number of SAC molecules, inactivate the SAC. Although different mechanisms
Cell Cycle Transition, Detailed Regulation of Restriction Point 321 C
have been proposed to explain SAC switching off (the Shah JV et al. (2004) Dynamics of centromere and kinetochore
stripping Mad1:Mad2 away from unattached kineto- proteins; implications for checkpoint signaling and silencing.
Curr Biol 14(11):942
chores, the degradation of Cdc20, the phosphorylation Simonetta M, Manzoni R, Mosca R, Mapelli M, Massimiliano L,
of Mad2, the ubiquitination of Cdc20), no consensus Vink M, Novak B, Musacchio A, Ciliberto A (2009) The
has been reached yet, making SAC silencing a major influence of catalysis on mad2 activation dynamics. PLoS Biol
field of investigation. 7(1):e10
Sudakin V, Chan GK, Yen TJ (2001) Checkpoint inhibition of
the APC/C in HeLa cells is mediated by a complex of C
BUBR1, BUB3, CDC20, and MAD2. J Cell Biol 154(5):925
Tang Z, Bharadwaj R, Li B, Yu H (2001) Mad2-Independent
References inhibition of APCCdc20 by the mitotic checkpoint protein
BubR1. Dev Cell 1(2):227
Clute P, Pines J (1999) Temporal and spatial control of cyclin B1 Zeng X, Sigoillot F, Gaur S, Choi S, Pfaff KL, Oh DC,
destruction in metaphase. Nat Cell Biol 1:82–87 Hathaway N, Dimova N, Cuny GD, King RW (2010) Phar-
De Antoni A, Pearson CG, Cimini D, Canman JC, Sala V, macologic inhibition of the anaphase-promoting complex
Nezi L, Mapelli M, Sironi L, Faretta M, Salmon ED, induces a spindle checkpoint-dependent mitotic arrest in
Musacchio A (2005) The mad1/mad2 complex as the absence of spindle damage. Cancer Cell 18:382–395
a template for mad2 activation in the spindle assembly
checkpoint. Curr Biol 15:214–225
Doncic A, Ben-Jacob E, Barkai N (2005) Evaluating putative
mechanisms of the mitotic spindle checkpoint. Proc Natl
Acad Sci USA 102(18):6332–6337
Fang G (2002) Checkpoint protein BubR1 acts synergistically
Cell Cycle Transition, Detailed
with Mad2 to inhibit anaphase-promoting complex. Mol Biol Regulation of Restriction Point
Cell 13(3):755
He E, Kapuy O, Oliveria RA, Uhlmann F, Tyson JJ, Novák B Laurence Calzone
(2011) System-level feedbacks make the anaphase switch
irreversible. Proc Natl Acad Sci USA 108(24):10016–10021
Institut Curie – INSERM U900 – Mines ParisTech,
Howell BJ, Hoffman DB, Fang G, Murray AW, Salmon ED Paris, France
(2000) Visualization of Mad2 dynamics at kinetochores,
along spindle fibers, and at spindle poles in living cells.
J Cell Biol 150:1233–1250
Howell BJ et al. (2000) Visualization of Mad2 dynamics at
Synonyms
kinetochores, along spindle fibers, and at spindle poles in
living cells. J Cell Biol 150(6):1233 G1 checkpoint; R-point
Howell BJ et al. (2004) Spindle checkpoint protein dynamics at
kinetochores in living cells. Curr Biol 14(11):953
Hoyt MA, Totis L, Roberts BTS (1991) Cerevisiae genes
required for cell cycle arrest in response to loss of microtu- Definition
bule function. Cell 66:507–517
Li R, Murray A (1991) Feedback control of mitosis in budding The restriction point is an irreversible transition occur-
yeast. Cell 66:519–531
Luo X et al. (2004) The Mad2 spindle checkpoint protein has two
ring in late G1 phase after which the cell commits to
distinct natively folded states. Nat Struct Mol Biol 11(4):338 DNA replication and cell cycle progression. Using
Musacchio A, Salmon ED (2007) The spindle-assembly check- a systems biology approach, detailed molecular maps
point in space and time. Nat Rev Mol Cell Biol 8:379–393 are built and reveal a highly complex architecture
Reddy SK, Rape M, Margansky WA, Kirschner MW
(2007) Ubiquitination by the anaphase-promoting complex
governing the regulation of this transition.
drives spindle checkpoint inactivation. Nature 446:921–925
Rieder CL, Cole RW, Khodjakov A, Sluder G (1995) The
checkpoint delaying anaphase in response to chromosome Characteristics
monoorientation is mediated by an inhibitory signal produced
by unattached kinetochores. J Cell Biol 130:941–948
Sear RP, Howard M (2006) Modeling dual pathways for the Basic Concepts of the Restriction Point
me-tazoan spindle assembly checkpoint. Proc Natl Acad Progression and coordination of the cell cycle are
Sci USA 103:16758–16763 governed by complex molecular processes (▶ Cell
Shah JV, Botvinick E, Bonday Z, Furnari F, Berns M,
Cleveland DW (2004) Dynamics of centromere and kineto-
Cycle). Throughout the cycle, different safety mecha-
chore proteins; implications for checkpoint signaling and nisms verify that proper conditions are met for cell
silencing. Curr Biol 14:942–952 cycle events such as DNA replication, chromosome
C 322 Cell Cycle Transition, Detailed Regulation of Restriction Point
alignment, etc. (▶ Cell Cycle Checkpoints). The kinases CDK4 or CDK6 to trigger the G1/S transition.
restriction point (▶ Cell Cycle Transition, Principles On the other hand, it suppresses the activity of ▶ Cdk
of Restriction Point) is one of these checkpoints. It inhibitors (or CKI, such as p16INK4a or p15INK4b),
blocks entry into S phase upon growth-limiting which prohibit the association of the CyclinD1/
constraints. CDK4,6 complexes. However, the transition is not
In the absence of mitogens, the cell enters a G0 (or governed by the sole activation of these complexes.
quiescent) state. In the presence of mitogens, growth is As long as proper intracellular conditions are not met,
stimulated and the cell advances through G1 phase. a checkpoint prevents entry into S phase until the cell
During the time preceding the restriction point transi- is ready.
tion, the cell is responsive to both positive (e.g., mito-
genic growth factors) and negative (e.g., transforming RB, the Guardian of the Restriction Point Gate
growth factor b) external factors. However, when it The tumor suppressor gene retinoblastoma (RB) plays
passes through the restriction point in late G1, the cell the role of gatekeeper. It is the main target of
commits to cell cycle events and proceeds to DNA CyclinD1/CDK4,6 complexes. During the G1 phase,
synthesis. After that point, even if the mitogens are RB sequesters a family of key ▶ transcription factors,
removed, the cell completes its cycle before stopping the E2Fs, that mediates the transcription of many genes
at the following round (▶ Cell Cycle Transition, involved in cell cycle events and in the apoptotic
Principles of Restriction Point). pathway (DeGregori and Johnson 2006). Upon mitotic
What is the molecular machinery behind the cell stimulation, CyclinD1/CDK4,6 complexes start phos-
decision to either halt and enter a quiescent state or phorylating RB, which releases and activates E2Fs.
replicate its DNA and progress through M phase? E2Fs activation terminates G1 phase and leads to
entry into S phase. CyclinE1, one of E2F target
Biochemical Interactions Involved in the genes, binds to CDK2 and collaborates with
Restriction Point Transition CyclinD1/CDK4,6 to maintain the hyperpho-
To recapitulate all the key players and understand the sphorylated state of RB, thereby ensuring the irrevers-
intricate processes governing this decision, a thorough ibility of the transition. The processes described above
description of the molecular machinery controlling the are illustrated in Fig. 1.
restriction point is proposed. The study of RB in the context of the restriction
point is critical for several reasons. First, RB inactiva-
Processing the Information tion and the passage through the restriction point occur
A specific family of kinases has first been identified as concomitantly (Bartek et al. 1996). Furthermore, RB is
cell growth sensors (▶ Cyclins and Cyclin-dependent an anti-oncogene and its activity is deregulated in
Kinases). They link external stimuli to the intracellular virtually all types of cancers (▶ Cell Cycle, Cancer
machinery. These sensors are composed of a kinase Cell Cycle and Oncogene Addiction) (Weinberg
subunit, Cdk, and a cyclin partner, which determines 2006), including familial and sporadic forms (osteosar-
the localization and the function of the complex. If comas, breast carcinomas, small cell lung carcinomas,
these complexes can sense growth signals, how exactly bladder carcinomas, melanomas, etc.).
is the signal transmitted to them? The deficiency of the RB pathway is observed in
The sequence of events leading to the activation of cancers when (1) the external stimulation is constant,
these kinases can be characterized as follows: The for example, when the growth factors are always pre-
binding of growth factors to their receptors lead to sent or when their receptors are mutated and as a result
the activation of cytoplasmic proteins such as Ras, the signal is always considered to be on, (2) the cell is
which passes on the signal to a cascade that ultimately insensitive to antimitogenic factors (e.g., TGFb), or
turns on the transcription of cell cycle genes (▶ Tran- (3) when processes that control the passage through
scription). The growth factor-mediated activation of the restriction point are perturbed by mutations or
these genes operates through two distinct paths. On diverse dysfunctions (Sherr 1996). More precisely, in
one hand, it leads to the accumulation of the proto- cancer samples, dysfunctions of RB itself have been
typical G1 cyclin, CyclinD1, in the nucleus and pro- found to arise from the deletion of the gene, from
motes its association with the cyclin-dependent epigenetic mutations, from the presence of viral
Cell Cycle Transition, Detailed Regulation of Restriction Point 323 C
Mitogens p130 (RBL2) is present in quiescent cells; p130 has
(Ras) 22 sites of phosphorylation.
• E2F transcription factors
G1 phase Activators of transcription: E2F1, E2F2, E2F3a.
CyclinD1 CDK4,6 INK4 Inhibitors of transcription: E2F3b, E2F4, E2F5,
E2F6, E2F7a, E2F7b, E2F8.
The E2F transcription factors are active as dimers. C
R-point RB E2F1 to E2F6 dimerize with DP1 or DP2.
E2F7 and E2F8 are active as homodimers.
• Association between RB family members and the
E2Fs
E2F
E2F1, E2F2 and E2F3 bind RB.
S phase
E2F4 binds RB, p107 or p130.
CyclinE1 CDK2
E2F5 binds p130 (and to some extent p107).
E2F6 binds proteins from the polycomb group.
Cell Cycle Transition, Detailed Regulation of Restriction E2F7 and E2F8 form homodimers and do not bind
Point, Fig. 1 Simple representation of the interactions involved to other known proteins.
in the restriction point transition
E2F3_targets E2F1_targets
E2F8_targets E2F7_targets E2F6_targets E2F5_targets E2F2_targets
E2F4_targets
E2F1–3 Apoptosis_entry
E2F4–5
E2F6–8
RB
CycC:CDK3
p16/p15 CycD1:CDK4/6
CycE1:CDK2
CycA2:CDK2
p27/p21
CDC25C
CycH:CDK7
CycB1:CDC2
APC
Weel
Cell Cycle Transition, Detailed Regulation of Restriction modifications of the major proteins involved in RB regulation.
Point, Fig. 2 Detailed map of the retinoblastoma (RB) network Blue rectangles refer to tumor suppressor genes and red rectan-
emphasizing the passage through the restriction point. The map gles to proto-oncogenes. Lower panel: mRNA and gene targets
is constructed using standard SBGN language in CellDesigner of the eight E2F transcription factors. A readable and clickable
(http://www.celldesigner.org). It comprises 80 proteins, 176 version of the map can be found at http://bioinfo-out.curie.fr/
genes, 165 interactions, and established from circa 350 publica- projects/rbpathway/interactive/rb_network.html
tions. Upper panel: biochemical reactions regulating the
restriction point (Calzone et al. 2008). In the latter transformation distinguished. Some links can be bro-
map, the focus is put on the biochemical reactions ken and alternative – or back-up – pathways, etc.,
and transformations (edges) of species (nodes) that proposed.
play an important role in the regulatory process
(Fig. 2). Modular View of the Comprehensive Map
To improve the readability of the map, one can sim-
Use of the Map plify it without losing information about regulations.
Beyond the role of integrating knowledge in an unam- For that purpose, from the initial comprehensive map,
biguous form, these molecular maps can be used to groups of components can be identified from data
interpret biological data or reason on unexpected ▶ clustering, when the data is available, or from
behaviors of the system in specific conditions. The graph theory techniques (▶ Module Network). Here,
map can be viewed as an electronic circuit and inter- the reaction network of Fig. 2 is transformed into an
rogated based on its sole topology. It can be compared influence network (Fig. 3) in which each node corre-
to a map of the New York subway which allows to find, sponds to a set of connected species in the comprehen-
for example, the simplest way from one station to sive map. The sets of proteins are grouped into
another or suggest an alternative route if one station modules. The nodes are connected either by positive
is closed. All the important partners or catalyzers for or negative arrows based on the available biological
a particular activation can be identified. All or one information derived from the complete map. The
path(s) between two species can be isolated and the resulting graph is an abstraction of the initial complex
necessary partners or intermediaries of a and detailed map.
Cell Cycle Transition, Detailed Regulation of Restriction Point 325 C
Cell Cycle Transition, Detailed Regulation of Restriction connected by influence arrows, either positive or negative,
Point, Fig. 3 Modular view of RB pathway. The nodes of this deduced from the initial network and based on biological knowl-
graph represent modules – or groups of interacting proteins – edge of the nature of the interactions
derived from the comprehensive map (Fig. 2). The nodes are
The different versions of the RB network proposed modeling can be applied to influence network (▶ Cell
here (Figs. 1–3) can then be translated into mathemat- Cycle Modeling Using Logical Rules), etc.
ical formulas. Ordinary differential equations are often More generally, the construction of the map of
used for a reaction-based network (▶ Cell Cycle a particular process and the gathering of biological
Modeling, Differential Equation), whereas discrete knowledge from heterogeneous sources is the first
C 326 Cell Cycle Transition, Principles of Restriction Point
step toward the elaboration of a mathematical model Matsuoka Y, Villéger A, Boyd SE, Calzone L, Courtot M,
(Novák and Tyson 2004; Alfieri et al. 2009). Mathe- Dogrusoz U, Freeman TC, Funahashi A, Ghosh S, Jouraku A,
Kim S, Kolpakov F, Luna A, Sahle S, Schmidt E, Watterson
matical modeling provides a formal tool to answer S, Wu G, Goryanin I, Kell DB, Sander C, Sauro H, Snoep JL,
specific biological questions. More importantly, it Kohn K, Kitano H (2009) The systems biology graphical
allows to test and predict, in silico, different perturba- notation. Nat Biotechnol 27(8):735–741
tions of a biological process that would be too complex Novák B, Tyson JJ (2004) A model for restriction point control
of the mammalian cell cycle. J Theor Biol 230(4):563–579
to understand by simple intuitive reasoning. Sherr CJ (1996) Cancer cell cycles. Science 274(5293):
1672–1677
Weinberg R (ed) (2006) The biology of cancer. Garland Science,
Cross-References New York, pp 255–305
▶ CDK Inhibitors
▶ Cell Cycle
▶ Cell Cycle Checkpoints Cell Cycle Transition, Principles of
▶ Cell Cycle Modeling Using Logical Rules Restriction Point
▶ Cell Cycle Modeling, Differential Equation
▶ Cell Cycle Transition, Principles of Restriction Tae Lee1, Guang Yao2 and Lingchong You3
1
Point School of Biology and School of Physics, Georgia
▶ Cell Cycle, Cancer Cell Cycle and Oncogene Institute of Technology, Atlanta, GA, USA
2
Addiction Department of Molecular and Cellular Biology,
▶ Clustering University of Arizona, Tucson, AZ, USA
▶ Cyclins and Cyclin-dependent Kinases 3
Biomedical Engineering Department, Duke
▶ Module Network University, Durham, NC, USA
▶ Pathway
▶ Transcription
▶ Transcription Factor Synonyms
Rate
G1 d
Restriction point
M
C
[X]
b
Rate
G2 S
d
[X ]
Cell Cycle Transition, Principles of Restriction Point, Cell Cycle Transition, Principles of Restriction Point,
Fig. 1 The cell cycle (Cell cycle, phases of the) and the R-point Fig. 2 Rates of synthesis (f(X)) and degradation (d(X)) deter-
mine the dynamics of the system described in Equation 1. Sup-
pose d(X) is always first order in terms of X and is sufficiently
of the cell cycle associated with different starvation small. (a) If f has ultrasensitive dependence on [X], the system is
conditions. That is, cells arrested at quiescence by bistable with two stable steady states (filled circles) separated by
different starvation regimens resumed DNA synthesis an unstable one (open circle). (b) Otherwise, the system is
at apparently the same time point. With sequential monostable with one unstable steady state and a stable one. If
d(X) is too large, the system is monostable for both cases, with
applications of different starvation regimens, cells did a single trivial steady state ([X] ¼ 0)
not escape from any previous arrest condition (indicat-
ing overlapping arrest points). Pardee and colleagues
further demonstrated that the R-point is located several entry (Thron 1997; Aguda and Tang 1999; Novak
hours before the S-phase. Passing this critical time and Tyson 2004). While these studies differed in
point, cells commit to proliferation and continue to molecular details, their consensus is that the R-point
transit the cell cycle even when the exogenous growth is regulated by a bistable switch (▶ Bistability), which
condition was removed (Yen and Pardee 1978). The can generate all-or-none and hysteretic (history-
transition between proliferative and quiescent states dependent) response.
was further analyzed by Zetterberg and colleague A bistable switch can result from one or more pos-
(Zetterberg and Larsson 1985), who measured the itive feedback loops with sufficient nonlinearity
timing of the irreversible commitment to proliferation (Ferrell 1996). For example, consider that X regulates
in individual cells by time-lapse cinematograph, and its own synthesis via a positive feedback loop,
found the R-point occurs at a remarkably constant time
after mitosis in exponentially growing cells. dX
¼ f ðXÞ dðXÞ; (1)
dt
R-point Governed by a Bistable Switch
The R-point has two main characteristics: first, an where f and d denote the synthesis and degradation
initial high threshold on growth-stimulating signals, rates of X, respectively. When the synthesis of X is
which ensures a tight control of cell growth; second, ultrasensitive (e.g., due to cooperativity, multistep,
a low-maintenance mechanism, which ensures that zero-order, or inhibitor ultrasensitivity (Ferrell
once committed, cell proliferation will be completed 1996)), the f(X) curve and the d(X) curve can cross at
even in the absence of sustained growth signals. dt ¼ 0, Fig. 2a). This will generate
three steady states ( dx
Theoretical studies have suggested potential mecha- two stable steady states (and thus, a bistable system)
nisms underlying the R-point control of cell cycle separated by an unstable one. When the system at the
C 328 Cell Cycle Transition, Principles of Restriction Point
Serum
Maintenance-threshold
Myc 10–1
E2F
10–2
CycD / CDK4,6
Activation threshold
10–3
0 0.5 1
Rb Rb-P Serum
Cell Cycle Transition, Principles of Restriction Point, E2F2, and E2F3a). E2F is a transcriptional factor that
Fig. 3 A simplified Rb-E2F network. It contains two positive can activate a large battery of genes encoding activities
feedback loops: (1) auto-catalysis of E2F, and (2) mutual- involved in DNA replication and cell cycle progres-
inhibition feedback involving E2F, CycE/Cdk2, and Rb sion. In quiescent cells, E2F protein is bound and
repressed by tumor suppressor Rb, and transcription
from the E2F promoter is also inhibited by Rb. Upon
unstable steady state is subjected to minor perturba- sufficient growth stimulation, Myc and cyclin D
tions, the system will move away from the unstable (CycD) are induced. CycD associates with and acti-
steady state towards the stable ones. Without ultrasen- vates Cyclin dependent kinases (Cdk4,6), which phos-
sitivity in the synthesis of X, the system is always phorylate Rb and release E2F. Myc directly binds to
monostable. For example, it can have an unstable the E2F promoter, facilitating E2F binding to nearby
steady state and a stable one (Fig. 2b). sites. E2F then activates its own transcription, forming
a positive feedback loop. In addition, E2F also acti-
Molecular Interactions at the G1/S Transition: The vates Cyclin E, which forms a complex with Cdk2 to
Rb-E2F Switch further phosphorylate Rb and relieve its repression of
The molecular events leading to the traverse of the E2F, constituting another positive feedback loop. The-
R-point involve a sophisticated network of molecules oretical studies suggested that the positive feedback
and interactions (defined as the Rb-E2F network, mediated by E2F and the ultrasensitivity created by the
▶ Cell Cycle Transition, Detailed Regulation of Rb repression of E2F could potentially generate
Restriction Point) (Calzone et al. 2008). Here we pre- a bistable switch (Fig. 4, ▶ Cell Cycle Model Analysis,
sent a highly simplified model that focuses on the cell Bifurcation Theory) (Thron 1997).
cycle entry events (Yao et al. 2008), as shown in Fig. 3. Recent experimental analysis has demonstrated that
In this simplified model, the node Rb represents the the Rb-E2F network indeed functions as a bistable
entire pocket protein family (RB, p107, and p130), and switch and generates bimodal and history-dependent
the node E2F represents all E2F activators (E2F1, E2F activities (Yao et al. 2008). Mammalian cells in
Cell Cycle Transitions, G2/M 329 C
quiescence remain in the OFF state of E2F when Yen A, Pardee AB (1978) Exponential 3t3-cells escape in mid-
deprived of growth stimulation. Following sufficient G1 from their high serum requirement. Exp Cell Res
116:103–113
growth stimulation, these cells eventually activate Zetterberg A, Larsson O (1985) Kinetic-analysis of regulatory
E2F and commit to proliferation. Afterward, E2F events in G1 leading to proliferation or quiescence of swiss
stays at the ON state to drive the completion of the 3t3 cells. Proc Natl Acad Sci USA 82:5365–5369
cell cycle, even after the growth stimuli are removed.
For a given serum concentration, the transition C
timing has been found to be highly variable, which
can be described with a stochastic model (Lee et al. Cell Cycle Transitions, G2/M
2010). A similar bistable switch mechanism was
identified in yeast for the control of the Start point Judit Zámborszky
(Charvin et al. 2010). The Microsoft Research – University of Trento Centre
for Computational and Systems Biology, CoSBi,
Trento, Italy
Cross-References
▶ Bistability Synonyms
▶ Cell Cycle Checkpoints
▶ Cell Cycle Dynamics, Irreversibility Entry into mitosis; G2/M transition; Mitotic entry
▶ Cell Cycle Model Analysis, Bifurcation Theory
▶ Cell Cycle of Mammalian Cells
▶ Cell Cycle Transition, Detailed Regulation of Definition
Restriction Point
▶ Cell Cycle, Budding Yeast G2/M transition (or mitotic entry) is the progression
▶ Mitosis from G2-phase to M-phase (also referred to as
“mitosis”) of the mitotic cell division cycle.
References
Characteristics
Aguda BD, Tang Y (1999) The kinetic origins of the
restriction point in the mammalian cell cycle. Cell Prolif
Eukaryotic cells govern a repetitive series of events
32:321–335
Calzone L, Gelay A, Zinovyev A, Radvanyi F, Barillot E (2008) that result in self-reproduction (Morgan 2006). The
A comprehensive modular map of molecular interactions in cell cycle (▶ Cell Cycle) consists of four different
RB/E2F pathway. Mol Syst Biol 4:173 stages: G1-, S-, G2-, and M-phase by which
Charvin G, Oikonomou C, Siggia ED, Cross FR (2010) Origin of
a growing cell replicates all its components and divides
irreversibility of cell cycle start in budding yeast. PLoS Biol
8(1):e1000284 into two nearly identical daughter cells (▶ Cell Cycle,
Ferrell JE (1996) Tripping the switch fantastic: how a protein Physiology). The alteration between cell cycle phases
kinase cascade can convert graded inputs into switch-like is sophisticatedly controlled, ensuring appropriate
outputs. Trends Biochem Sci 21:460–466
timing and order of sequential action (G1-phase,
Lee T, Yao G, Bennett D, Nevins J, You L (2010) Stochastic E2F
activation and reconciliation of phenomenological cell-cycle G1/S transition, DNA replication, G2-phase, G2/M
models. PLoS Biol 8:9 transition, mitosis, and mitotic exit (▶ Cell Cycle
Novak M, Tyson JJ (2004) A model for restriction point control Transitions, Mitotic Exit)). Transitions are tightly
of the mammalian cell cycle. J Theor Biol 230:563–579
monitored by “surveillance mechanisms” (checkpoints
Pardee AB (1974) Restriction point for control of normal
animal-cell proliferation. Proc Natl Acad Sci USA (▶ Cell Cycle Checkpoints)) (Kastan and Bartek
71:1286–1290 2004). Checkpoints are control mechanisms governing
Thron CD (1997) Bistable biochemical switching and the control crucial irreversible transitions of the cell division cycle
of the events of the cell cycle. Oncogene 15:317–325
(▶ Cell Cycle Dynamics, Irreversibility). Monitoring
Yao G, Lee TJ, Mori S, Nevins JR, You LC (2008) A bistable
Rb-E2F switch underlies the restriction point. Nat Cell Biol the environmental conditions (nutrient, hormones,
10:476–U255 etc.) and the stage of the DNA (if there are lesions or
C 330 Cell Cycle Transitions, G2/M
Active form
P
CDK / Cyclin B
CDK
Cyclin B M G2
Active form Mitosis
Cdc25
(▶ Cell Cycle Dynamics, Irreversibility, ▶ Cell Cycle
Active form Dynamics, Bistability and Oscillations, ▶ Cell Cycle
Cell Cycle Transitions, G2/M, Fig. 2 Representation of two
Model Analysis, Bifurcation Theory). Bistability (and
interconnected positive feedback loops regulating entry into hysteretic transitions) of the Wee1-CDK-Cdc25 con-
mitosis during cell cycle. Dashed arrows symbolize activations trol system in frog egg cells has been experimentally
and dashed lines with an end -| assist for inhibitory reactions tested in Jill Sible’s lab (Sha et al. 2003) and in Jim
Ferrell’s lab (Pomerening et al. 2003).
type of positive feedback regulation than the CDK- The switches of cell cycle phases are triggered by
Wee1 link (Fig. 2). transient signals that emerge when the checkpoint con-
In sum (Fig. 3), during the G2-phase of the cell ditions are satisfied. The irreversibility of these transi-
cycle, CDK activity is kept low with inhibitory phos- tions ensures that when the generated signal
phorylation events carried out by the Wee1 kinase. As disappears, cells do not step back into their previous
the activity of CDK/Cyclin B increases in time, it stage resulting in aberration of cells (▶ Cell Cycle
activates its helper phosphatase Cdc25 which in return Dynamics, Irreversibility). For instance, if cells escape
removes the inhibitory phosphate groups from CDK. the checkpoint in G2-phase, they might end up with
The sudden increase of CDK activity triggers inacti- nonidentical daughter cells with extra or less DNAs
vation of Wee1 kinase by its phosphorylation. The than the mother cell, rolling over the stability of the
active CDK generates mitotic events (e.g., chromo- genome that causes fatal errors (Kastan and Bartek
some condensation, nuclear envelope breakdown, 2004).
chromatid segregation, assembly of mitotic spindle Upon DNA damage, the checkpoint proteins are
(▶ Mitosis, ▶ Cell Cycle, Physiology)) and cells often recruited to DNA lesions by repair complexes
enter the M-phase. that generate signals to activate the checkpoint
The two positive feedback loops (▶ Feedback response pathway (▶ Cell Cycle Arrest After DNA
Regulation, ▶ Positive Feedback) described previ- Damage). Transducers transmit and amplify the signal
ously act synergistically and provide an abrupt to downstream targets switching on the DNA-repair
increase in the activity of CDK making the G2/M apparatus and blocking the cell cycle machinery
transition an irreversible, switch-like event (Kastan and Bartek 2004). In case of activated
C 332 Cell Cycle Transitions, G2/M
Cell Cycle Transitions, G2/M, Table 1 Different names used for functional homologues of the regulatory elements in the G2/M
transition network
Generic name Function Mammals Budding yeast Fission yeast Xenopus laevis
CDK Cyclin-dependent kinase Cdc2 Cdc28 Cdc2 Cdc2
Cyclin B Regulatory subunit of CDKs in mitosis CycB Clb1,2 Cdc13 CycB
Wee1 CDK/Cyclin B inhibitory kinase Wee1 Swe1 Wee1, Mik1 Wee1, Myt1
Cdc25 CDK/Cyclin B activatory phosphatase Cdc25 Mih1 Cdc25 Cdc25
CAK Cyclin-dependent kinase activating kinase CAK Cak1 Csk1 MO15
chromosomes are fully replicated and properly controller for the progression of the cell cycle) is
aligned on the mitotic spindle (the “spindle assembly a dimer of a protein kinase, encoded by the CDC28/
checkpoint” or SAC). Budding yeast cells have a cdc2+/CDK1 gene, and a regulatory subunit called
“morphogenetic checkpoint” which blocks mitosis if cyclin B which had been identified years earlier by
the bud is not properly formed, and a spindle align- Tim Hunt. Because the cyclin subunit is required for
ment checkpoint that blocks cell division if the activity of the protein kinase, the kinase is called
daughter cell nucleus is not properly pushed into the cyclin-dependent kinase, or CDK for short. The dis-
bud compartment. covery of the composition of MPF was the beginning
of the modern era of molecular biology of cell cycle
Molecular Cell Biology control in eukaryotic cells. In 2001 the Nobel Prize in
Genetic analysis of the molecular machinery control- Physiology or Medicine was shared by Hartwell,
ling the major events of the cell cycle was begun in Nurse, and Hunt.
budding yeast by Leland Hartwell, who identified In the 1990s a host of cell cycle genes were identi-
dozens of genes that induced the “cell division cycle” fied in budding yeast, including:
phenotype. By Hartwell’s definition, a CDC gene is CLN1,2: encoding G1 cyclins important for START and
a gene that, when mutated, arrests all cells in a specific bud emergence
stage of the cell cycle. For example, cdc7 mutant cells CLB1-6: encoding B-type cyclins essential for DNA
do not replicate their DNA or divide, but bud emer- synthesis and mitosis
gence and growth continue as normal. Mutant cdc CDC20: encoding a ubiquitin-ligase component essen-
genes are isolated as temperature-sensitive alleles tial for sister chromatid separation and cyclin deg-
that on replicate plates form colonies at 25 C but not radation in anaphase
at the restrictive temperature (36 C). Presumptive CDC14: encoding a phosphatase essential for mitotic
mutant cells on the high temperature plate must be exit
examined microscopically to determine that the cells CDH1, SIC1: encoding proteins that stabilize G1 phase
have uniformly arrested at a particular morphological of the cell cycle
stage of the cell cycle. Hartwell’s first set of CDC
genes defined two parallel pathways (Hartwell et al. The Heart of the Budding Yeast Cell Cycle
1974): one pathway relating to bud emergence, isotro- Kim Nasmyth (1996) proposed to think of the budding
pic growth, and nuclear migration, and the other relat- yeast cell cycle as two alternative, self-maintaining
ing to DNA replication and mitosis. These two states: G1 (unreplicated chromosomes, growing but
pathways diverge at START and come together finally not committed to proliferate), and S/G2/M (committed
at mitotic exit. The gene with the earliest effect, whose to chromosome replication and mitosis). START is the
mutation blocked cells before the START transition, was transition from G1 to S/G2/M, and mitotic exit is the
CDC28. transition from S/G2/M back to G1. The molecular
Hartwell’s work set off a landslide of activity correlates of these two states are G1: high activity of
identifying CDC genes in a variety of organisms, Cdh1 and Sic1, no kinase activity associated with
finding other genes that interacted with CDC genes, Clb1-6; and S/G2/M: low activity of Cdh1 and Sic1,
and later cloning these genes and characterizing their high Clb-dependent kinase activity. Finally, Nasmyth
protein products and the interactions among these recognized that a possible reason for these alternative,
proteins (for review, see Morgan 2007). Worthy of self-maintaining states was the antagonism between
note was Paul Nurse’s discovery of a fission yeast Cdh1 and Sic1 on one side (destroying the activity of
gene, cdc2+, which seemed to encode the same pro- the Clb-dependent kinase complexes) and the B-type
tein as CDC28 in budding yeast. Nurse went on to cyclins, Clb1-6, on the other side (inactivating Cdh1
show, quite unexpectedly, that a human gene, later and Sic1).
called CDK1, encodes a protein that can rescue cdc2 Nasmyth stopped short of identifying his idea with
mutant fission yeast cells. In 1989 a consortium of a bistable control system flipping back and forth
geneticists and biochemists demonstrated that MPF between two stable steady states as cell-cycle-
(mitosis promoting factor in frog eggs, a master progression signals pushed the control system past
Cell Cycle, Budding Yeast 339 C
SK EP
CDK
Growth Chromosome
damage alignment
problems problems
Enemies
C
CDK
G2 M A
T
G1 G1
SK EP
SK EP
Enemies CDK
G1 S G2 M A T G1
Cell Cycle, Budding Yeast, Fig. 1 The heart of the budding G1-like steady state merges with and is annihilated by the
yeast cell cycle. Adapted from Tyson and Novak (2008). (Top unstable steady state at a saddle-node bifurcation. Exit from
center) Cyclin-dependent kinase (CDK) and its enemies (Cdh1 mitosis (middle right) is induced by EP (Cdc14 phosphatase),
and Sic1) are involved in an antagonistic (double negative) which activates the proteins that destroy CDK activity. When the
feedback loop, which can persist in either of two stable steady upper steady state merges with the unstable steady state, the cell
states: a G1-like state with low CDK activity and prominent returns to the G1 state. Because EP activity depends on CDK,
enemy forces, and an M-like steady state with high CDK activity after CDK falls, EP activity also disappears, but now the enemies
and the enemies in retreat. These two stable steady states are have the upper hand. The two blue arrows together form
represented by black circles along the CDK axis (middle center). a “hysteresis loop” that switches the CDK-control system from
The pairs of circles side-by-side are meant to represent the same OFF to ON and back again during each cell cycle. (Bottom) Pro-
steady state under conditions when both the starter kinase (SK) gression around the hysteresis loop corresponds to periodic
and exit phosphatase (EP) are close to zero. The two stable changes in the activities of CDK, enemies, SK, and EP. At
steady states are separated by an unstable steady state (open START, a burst of SK activity flips the CDK switch ON, and at
circle). A newborn cell in G1 (middle left) can be induced to EXIT, a burst of EP activity flips the CDK switch OFF. (Top left
enter S/G2/M by activation of SK (Cln-dependent kinase), and right) Checkpoints function by inhibiting either SK or EP.
which phosphorylates and weakens Cdh1 and Sic1, allowing Growth requirements and damage repair processes typically halt
CDK activity to rise and trigger S phase (follow the blue arrow the cell cycle in G1 phase by preventing up-regulation of SK
for the cell’s passage through the cell cycle). Among other activity. Misalignment of chromosomes on the spindle blocks
duties, CDK downregulates SK, but the bistable switch remains exit from mitosis by preventing activation of EP
in the upper state. Notice that, as SK activity increases, the stable
C 340 Cell Cycle, Budding Yeast
C
Cell Cycle, Cancer Cell Cycle and
Oncogene Addiction
Abbott LH, Michor F (2006) Mathematical models of targeted Growth cycle Chromosome cycle
cancer therapy. Br J Cancer 95(9):1136–1141
Hanahan D, Weinberg RA (2000) The hallmarks of cancer. Cell The two cycles are not coordinated ® Cell is not viable
100(1):57–70 Cell division
Jones PA, Baylin SB (2002) The fundamental role of epigenetic
G1
events in cancer. Nat Rev Genet 3(6):415–428
Cell M S C
Minna JD, Dowell J (2005) Erlotinib hydrochloride. Nat Rev
growth G2
Drug Discov Suppl 4:S14–S15
Pao W, Chmielecki J (2010) Rational, biologically based treat-
ment of EGFR-mutant non-small-cell lung cancer. Nat Rev The two cycles are coordinated ® Cell is viable
Cancer 10(11):760–774 Cell division
Sawyers CL (2003) Opportunities and challenges in the devel-
opment of kinase inhibitor therapy for cancer. Genes Dev G1
17(24):2998–3010 Cell
M S
Vogelstein B, Kinzler KW (1993) The multistep nature of can- growth -
G2
cer. Trends Genet 9(4):138–141
Weinstein IB (2002) Cancer. Addiction to oncogenes – the
Achilles heal of cancer. Science 297(5578):63–64
Cell Cycle, Cell Size Regulation, Fig. 1 Checkpoints ensure
size homeostasis via negative signal(s) from the growth cycle to
the chromosome cycle
Cell Cycle, Cell Size Regulation size control mechanisms operate, which ensure that at
specific checkpoints the cell cycle progression is
Ákos Sveiczer1 and Anna Rácz-Mónus2 blocked, unless a critical size has been reached. Such
1
Department of Applied Biotechnology and Food ▶ cell cycle checkpoints exist both in G1 and G2
Science, Budapest University of Technology and phases, which regulate the onset of S and M phases,
Economics, Budapest, Hungary respectively (Alberts et al. 2009). However, in
a specific cell type, both size checkpoints do not nec-
essarily work efficiently: one might dominate, while
Synonyms the other becomes cryptic. In the absence of both size
controls, cells would be smaller and smaller from cycle
Size checkpoint; Size control to cycle and finally probably lose viability (Fantes and
Nurse 1981; Sveiczer et al. 1996). The reason is that
growth (cytoplasmic) cycle was normally longer than
Definition chromosome (nuclear) cycle; therefore the cell would
not double its components and its size before cytoki-
Cell size regulation during the cell cycle is nesis (Fig. 1). By contrast, size control extends the
a mechanism coordinating growth and division. It duration of either G1 or G2 phase by blocking the
ensures that cell size distribution of a population next phase transition, ensuring a compensation mech-
remains constant in consecutive generations. anism, that is, the larger the cell at birth is, the shorter
its cycle will be. With other words, size homeostasis is
achieved via negative signal(s), which are generated
Characteristics by the growth cycle and affect progression of the
chromosome cycle. This entry summarizes how size
Size homeostasis is a general characteristic feature of checkpoints act in the most important model organisms
populations of unicellular organisms, at least in steady- studied so far, namely, budding yeast, fission yeast,
state conditions. It means that an "average" cell must early embryonic cells, and finally mammalian cells. In
double its size and all its components between two every case, we will briefly discuss the methods used
successive divisions. At the individual cellular level, in size control studies and the conclusions drawn.
C 344 Cell Cycle, Cell Size Regulation
References
activity of WEE1 oscillate and are high during mPER1, which leads to its subsequent degradations
the evening. These, in turn, determine the duration of (Sahar and Sassone-Corsi 2009). In other words,
G2-phase. Cells divide in late evening when WEE1 CHK2-induced phosphorylation triggers premature
level decreases in order to allow cells to enter into the degradation of mPER1, which creates phase shift in
M-phase. the circadian clock. The DNA damage–induced phase
WEE1 undergoes complex regulation, because cell shifts are unique because DNA damage predominantly
cycle–controlled WEE1 is in concert with another creates phase advances, but not much of delays.
regulation that is periodically imposed by the circadian This unique phase response was investigated with
clock. The period of cell division cycle varies as tem- computational modeling and revealed possible
perature and/or nutrient conditions change while the molecular criteria for such phenotype: (1) An autocat-
period of circadian clock remains relatively constant. alytic positive feedback mechanism is required
This fact sets up a stage for in silico studies for the cell in addition to the time-delayed negative feedback
cycle and the circadian clock as coupled oscillators. loop and (2) CHK2 phosphorylates and triggers
Computational modeling and analysis of this particular subsequent degradations of mPERs that are not
coupling via WEE1 revealed: (1) phase-lock and bound to their transcription factors, BMAL1/CLOCK
synchronization of cell division cycle with circadian (Hong et al. 2009). This in silico study provides
rhythms when the mass doubling time (MDT, or the testable hypotheses for the detailed mechanism of
period of cell cycle) is close to 24 h, (2) quantized cell DNA damage–induced phase shifts of circadian
division cycles when the MDT is different than 24 h, rhythms, which is important in understanding why
and (3) circadian influenced cell size control DNA damage signaling cascade influences the
(Zamborszky et al. 2007). If the MDT is close to circadian clock. The implications of such an interac-
24 h, the circadian-regulated WEE1 synchronizes cell tion are hypothesized that cell cycle machinery utilizes
cycle to the phase of circadian rhythms regardless of circadian rhythms to activate WEE1 by prematurely
initial phase of cell cycle. However, this coupling degrading mPERs, which subsequently releases nega-
results in multi-modal cell cycle distributions or quan- tive feedback on BMAL1/CLOCK. The BMAL1/
tized cell cycle lengths if the MDT deviates from 24 h, CLOCK-activated WEE1 will delay cell cycle
because the circadian clock influences different phases progress upon DNA damage, and provide more time
of cell cycle in each generation due to their period for DNA repair.
differences. The circadian clock may lengthen There are more clues of coupling between the cell
the MDT via WEE1 depending on its influences on cycle and the circadian clock. The transcript of c-Myc
particular cell cycle phase. This periodic break on cell oscillates with a circadian period, and BMAL1/NPAS2
cycle by the circadian clock enforces cell size control suppresses its transcription (Sahar and Sassone-Corsi
by regulating the duration of growth at the G2 phase. 2009). Its molecular details, however, remain
These multi-modal distributions are recently reported unexplored. Additional cell cycle components such as
in cyanobacteria cell cycle system under the influence Cyclin D1, Gadd45a, and Mdm-2 show rhythmic tran-
of the circadian clock with single cell measurements scripts, but their detailed regulations are unknown
(Yang et al. 2010). (Sahar and Sassone-Corsi 2009). The rhythmic Cyclin
The above seemingly unidirectional coupling was D1 transcripts suggest a role of the circadian clock at
recently shown to be conditionally bidirectional. G1-to-S transition in addition to G2-to-M transition via
Circadian rhythms show unique phase shifts upon WEE1. Furthermore, rhythmic Gadd45a and Mdm-2
DNA damage, which involves cell cycle checkpoint transcripts suggest potential roles of the circadian
regulator, CHK2. DNA damage induces DNA damage clock in DNA damage responses. It is possible that
responses that slow down cell cycle, arrest cell cycle, there are other molecular coupling components.
or lead to programmed cell death, apoptosis. These may be clock-controlled genes (CCGs) like
In mammalian system, ionizing radiation causes Wee1, conditionally affecting the circadian clock like
double-strand breaks that relays signal transduction CHK2, or novel parts of the circadian clock. Bidirec-
cascades via ATM and CHK2. Both ATM and CHK2 tional communications of these two oscillators require
interact with mPER1, and CHK2 phosphorylates further investigations.
Cell Cycle, Fission Yeast 349 C
In contrast to previous data, a recent effort to study
this coupling indicated that this connection might Cell Cycle, Fission Yeast
depend on cell types (Yeom et al. 2010). Yeom and
colleagues reported the circadian-independent cell Ákos Sveiczer and Anna Horváth
divisions in rat-1 fibroblasts. They showed that Department of Applied Biotechnology and
about 13-h cell division cycles at 37 C had no phase Food Science, Budapest University of Technology
correlations with the 24-h circadian rhythms when and Economics, Budapest, Hungary C
they followed 3–4 generations of cell divisions with
bioluminescence assay. Their data, however, also
indicate variability in cell cycle lengths, which may Synonyms
reflect quantized cell cycles as shown in
cyanobacteria studies (Yang et al. 2010). For accurate Cell division cycle; Mitotic cycle; Schizosac-
conclusions, larger data set of cell cycle lengths is charomyces pombe
necessary for careful study of coupling between the
cell cycle and the circadian clock.
Definition
13–14 mm till the end of the mitotic cell cycle by tip G1/S
−
growth (MacNeill and Fantes 1995). Although starva- checkpoint
• Cryptic size
tion blocks cell proliferation and induces either sexual Metaphase/ Cell division
anaphase control
differentiation (mating, meiosis, and sporulation) or checkpoint
dormancy, all these processes are not discussed in
S
this essay. G1
Fission yeast divides symmetrically, producing two
nearly identical progenies. As the sister cells behave − M
Rum1
S
− − (DNA replication)
Ste9
C − −
AP
M
Exit o
f Slp1 − Cdc2 −
C
AP Cdc13
Wee1
+
Cdc25
+
M G2/M G2
(Mitosis)
The Cdc2/Cdc13 dimer gets fully activated, and its albeit with the help of the remaining MPF activity.
functions help condense chromosomes and remodel (Note that Cdc13 is the only known essential cyclin
microtubules in fission yeast cells. As a consequence, in fission yeast, i.e., the onset of DNA replication does
the cells pass the G2/M transition (▶ Cell Cycle Tran- not require Cig2.) This transition is under size control
sitions, G2/M), therefore this Cdk/cyclin is also known in small wee mutants, but it is cryptic in wild type cells
as the M-phase promoting factor, or MPF. In anaphase (see above). The length of G1 influences when cell
the MPF activity abruptly starts to decrease, as the division occurs (Mitchison 1989): either in G1 (wee
anaphase promoting complex (APC), with the help of mutant) or rather in S phase (wild type) (for simplicity,
its auxiliary subunit Slp1, ubiquitinates Cdc13 cyclin, Fig. 4 shows a case, where cytokinesis comes about in
marking it for degradation by the proteasome (Morgan G1). As far as DNA replication starts, the cell returns to
2007). Losing Cdk activity leads to exit of M phase, the very point from which our cell cycle journey
i.e., the cell gets into the next G1 phase in a binucleated started in the previous generation.
state (▶ Cell Cycle Transitions, Mitotic Exit). During
early G1, MPF activity is very low for two reasons. Acknowledgment Research in our group is supported by the
First, APC (with the help of another auxiliary protein Hungarian Scientific Research Fund (OTKA K-76229) and by
the New Széchenyi Plan (Project ID: TÁMOP-4.2.1/B-09/1/
Ste9) continuously ubiquitinates newly synthesized
KMR-2010-0002).
Cdc13 proteins. Second, Cdc2/Cdc13 dimers are
bound to a stoichiometric inhibitor of Cdk, the Rum1
protein (▶ Cdk Inhibitors). In late G1, another cyclin Cross-References
(Cig2) is formed, also making dimers with Cdc2. This
complex is called the starter kinase, because it helps ▶ CDK Inhibitors
the cell pass through the G1/S (or Start) transition, ▶ Cell Cycle Modeling, Differential Equation
Cell Cycle, Physiology 353 C
▶ Cell Cycle Transitions, G2/M normal physiological (as opposed to pathological) con-
▶ Cell Cycle Transitions, Mitotic Exit ditions. Chromosomal replication (▶ DNA Replica-
▶ Cell Cycle, Cell Size Regulation tion) and ordered segregation of the replicated
▶ Cell Cycle, Physiology chromosomes (in eukaryotes by ▶ mitosis or ▶ meio-
▶ Cyclins and Cyclin-dependent Kinases sis) are fundamental aspects of cell cycle physiology
▶ Cytokinesis that are common to all proliferating cells. Other events
▶ DNA Replication such as cellular growth and division (▶ Cytokinesis) C
▶ Endoreplication are features of many cell cycles, but are not universal.
▶ Mitosis Each of these processes can be regulated by physio-
logical cues from within and outside the cell. Elucida-
tion of cell cycle physiology at the systems level
References therefore requires both an understanding of the univer-
sal chromosomal cycle and consideration of the ways
Hayles J, Fisher D, Woolard A, Nurse P (1994) Temporal in which other processes are coordinated with these
order of S-phase and mitosis in fission yeast is determined
chromosomal events.
by the state of the p34cdc2/mitotic B cyclin complex. Cell
78:813–822
MacNeill SA (2002) Genome sequencing: and then there were
six. Curr Biol 12:R294–R296 Characteristics
MacNeill SA, Fantes PA (1995) Controlling entry into mitosis in
fission yeast. In: Hutchison C, Glover DM (eds) Cell cycle
control. Oxford University Press, Oxford, pp 63–105 Coordination of cell cycle events with other biological
Mitchison JM (1989) Cell cycle growth and periodicities. In: processes such as macromolecular synthesis, cell
Nasim A, Young P, Johnson BF (eds) Molecular biology of growth, and differentiation is a key aspect of the phys-
the fission yeast. Academic, New York, pp 205–242
iology of all organisms (Alberts et al. 2010; Morgan
Mitchison JM (1990) The fission yeast, Schizosaccharomyces
pombe. Bioessays 12:189–191 2007). This is particularly evident in multicellular
Mitchison JM, Nurse P (1985) Growth in cell length in the fission organisms, where the development and maintenance
yeast Schizosaccharomyces pombe. J Cell Sci 75:357–376 of complex structures, tissues, and organs are only
Morgan DO (2007) The cell cycle. Principles of control. New
possible through timely and highly regulated cell pro-
Science, London
Nishitani H, Nurse P (1995) p65cdc18 plays a major role control- liferation, differentiation, and, in many cases, ▶ apo-
ling the initiation of DNA replication in fission yeast. Cell ptosis. Understanding biological systems of this
83:397–405 complexity constitutes a monumental challenge.
Sveiczer A, Novak B, Mitchison JM (1996) The size control of
There is therefore much to be gained by studying
fission yeast revisited. J Cell Sci 109:2947–2957
Sveiczer A, Tyson JJ, Novak B (2004) Modelling the fission comparatively simple and genetically well-defined
yeast cell cycle. Brief Funct Genom Proteom 2:298–307 multicellular organisms such as Drosophila
melanogaster (▶ Model Organism), by studying the
cell cycles of early cleaving embryos, where there is
no appreciable cell growth or differentiation (▶ Cell
Cell Cycle, Physiology Cycle of Early Frog Embryos), and by studying cell
cycle regulation in unicellular models (▶ Cell Cycle,
Chris J. Norbury Archaea; ▶ Cell Cycle, Budding Yeast; ▶ Cell Cycle,
Sir William Dunn School of Pathology, Fission Yeast; ▶ Cell Cycle, Prokaryotes), including
University of Oxford, Oxford, UK cell lines derived from multicellular organisms (▶ Cell
Cycle of Mammalian Cells).
Consideration of what constitutes “normal” physi-
Definition ological conditions can be an important aspect of cell
cycle studies. While the conditions experienced by
The physiology of the ▶ cell cycle describes the pro- a cell within a multicellular organism can often be
cesses that underlie and regulate the cycle under assumed to be physiologically relevant, it is usually
C 354 Cell Cycle, Prokaryotes
In fact, cell division depends on the position of the ends up in the swarmer compartments and it is absent
two oriCs which are able to inhibit septum formation; in the stalked cell that can immediately starts a new
several proteins such as MinC and MinD, which are round of replication.
localized at the poles, together with the DNA- All cell cycle regulation factors are schematized in
associated proteins SlmA in E. coli and Noc in Fig. 2 where we illustrate the multilevel regulation of
B. subtilis are ensuring that cell division can take the Caulobacter cell cycle. Two main oscillators are
place only in the midcell (Margolin 2006). working during cell cycle progression: (1) the tran-
scriptional and epigenetic circuit (CtrA-DnaA-GcrA-
Cell Cycle Regulation in C. crescentus CcrM); (2) the phosphorylation/proteolysis and
C. crescentus, a gram negative bacterium of the transcription circuit (CckA-CtrA-DivK). The latter
alpha subdivision of proteobacteria, has recently also involves coordination of CtrA proteolysis through
become a model organism for bacterial cell cycle the regulation of DivK activity.
studies (Ryan and Shapiro 2003). In this organism Transcription of ctrA is controlled by circuit com-
each cell division is asymmetric producing always posed by DnaA and GcrA, and the DNA
two functionally and morphologically different methyltransferase CcrM. DnaA, which is also
cells, the replicating “stalked” cell type and the veg- involved in initiation of DNA replication, as
etative “swarmer” type (Fig. 1). After each initiation discussed for E. coli and B. subtilis, is a key element
of DNA replication, the replication fork is kept in cell cycle regulation in C. crescentus and it also
blocked so that the Caulobacter cell cycle can follow controls the transcription of about 40 genes involved
a pattern of once-and-only-once replication per divi- in nucleotide biogenesis, cell division, and polar mor-
sion (G1, S, and G2 phases are temporally phogenesis. DnaA also activates the transcription of
distinguished). the gcrA gene. GcrA controls the de novo transcrip-
Many factors are known to regulate cell cycle tion of ctrA after proteolysis and genes involved in
progression and most of them are members of the DNA metabolism and chromosome segregation,
family of two-component system proteins, comprised including those encoding DNA gyrase, DNA
of histidine kinases, histidine phosphotransferases, helicase, DNA primase, and DNA polymerase III.
and their response regulator substrates. Among The transcriptional loop of ctrA is closed by CcrM.
those proteins CtrA is the master regulator of the In fact, CtrA activates the transcription of ccrM,
Caulobacter cell cycle; it is an essential response which encodes for a DNA methyltransferase whose
regulator whose activity as a transcription factor abundance is cell cycle dependent. CcrM is able to
varies as a function of the cell cycle. CtrA controls activate the dnaA promoter region through methyla-
various functions during cell cycle progression by tion, closing the positive feedback composed by
activating or repressing expression of genes involved CtrA, DnaA, and GcrA.
in cell division (ftsZ), flagellum biogenesis genes, CtrA must be phosphorylated to bind DNA and its
stalk biogenesis regulatory genes, pili biogenesis phosphorylation depends on cell cycle progression.
genes, and chemotaxis genes. CtrA also blocks the An essential phosphorelay, composed by the hybrid
initiation of DNA replication through binding of the histidine kinase CckA and the histidine
replication origin, oriC. phosphotransferase ChpT, is responsible for CtrA
CtrA activity and stability varies during the cell phosphorylation (Biondi et al. 2006). The membrane
cycle. Oscillation of CtrA levels, peaking at the hybrid histidine kinase CckA is dynamically local-
predivisional stage before cell division, is achieved ized during cell cycle progression acting as a kinase
by different mechanisms: transcription, proteolysis, for CtrA when it is localized at the poles, while it acts
and phosphorylation control. The result of this oscilla- as a phosphatase in the delocalized form. DivK,
tion is that CtrA is present and phosphorylated in which is a single domain response regulator that con-
swarmer cells, then is degraded at the G1/S transition trols CckA localization, plays an essential role as
and then transcribed again and phosphorylated in the a positive regulator of cell cycle progression because,
predivisional stage. Eventually, at cell division, CtrA when phosphorylated, it indirectly delocalizes CckA
Cell Cycle, Prokaryotes 357 C
DivK
divK
Gene expression
Transcription, DNA-binding
DivJ~P PleC + Pi
Post-translational/protein interactions
Phosphotransfer events DivJ PleC
Methylation DivK~P C
CckA-HK~P CckA-HK
ChpT~P ChpT
gcrA GcrA
P1 P2 CtrA
ctrA CpdR CpdR~P
CtrA~P Proteolysis
(ClpX)
ClpP
clpP
CcrM Origin of replication
ccrM
RcdA
rcdA
Pili biogenesis
Cell Cycle, Prokaryotes, Fig. 2 The circuit controlling cell cycle progression in C. crescentus. Colors of arrows have different
colors depending on their interaction type (as indicated in the legend). See text for details
2 σF σE
σE
3 σF σE
σG
4 σG σK σK
Forespore Mother cell
phosphorylation of the master regulator of sporulation, Spo0A, which integrates several environmental sig-
Spo0A. Activated Spo0A leads to a polar septation that nals using its kinases sensing apparatus, is then able to
initially produce a larger mother cell, which eventually start a series of steps leading to the formation of a spore
lised, and a smaller forespore surrounded by special and the decay of the mother cell. This multistep pro-
envelops (Piggot and Hilbert 2004). cess is carried on by different sigma factors acting in
In B. subtilis, five histidine kinases (KinA, the prin- the mother cell or in the forespore. Each sigma factor is
cipal kinase, to E) are controlling the phosphorylation indeed able to activate via many other sporulation
state of Spo0A, through the phosphorylation of Spo0F, factors the following sigma factor in a different
which then transfers to the histidine compartment.
phosphotransferase Spo0B and eventually to the Activated Spo0A and Sigma H are present
response regulator Spo0A as illustrated in Fig. 3. and functional in the step right before the asymmetric
Cell Cycle, Synchronization 359 C
septation and they together activate many Definition
genes including sigma F and sigma E precursors,
respectively, acting in the forespore and in the This term refers to experimental methods to obtain
mother cell compartment. In order to have a fully a population of cells that are in the same cell-cycle
functional Sigma E, Sigma F is required. At this stage (Davis et al. 2001). Cell-cycle synchronization
step, Sigma E activates other factors in order to allows the analysis of biological processes that are
have an active Sigma G in the forespore compart- specific to certain cell-cycle stages and that would C
ment, which is eventually required to activate otherwise be confounded or averaged out in asynchro-
Sigma K in the mother cell. Those last two nous populations. Cell-cycle synchronization can be
factors are required for full maturation of the achieved in two general ways (Fig. 1). (1) Using “block
spore, which is characterized by a special and release” approaches where all cells are first
envelope that is responsible for its special resistance arrested in a defined cell-cycle stage (“block”), and
to stresses. then allowed to cycle again in a synchronous manner
(“release”). This approach involves adding/removing
specific molecules to/from a culture or by using cells
References with conditional mutant alleles of cell-cycle regula-
tors. (2) Isolating cells that are in a given stage of the
Biondi EG, Reisinger SJ, Skerker JM, Arif M, Perchuk BS,
Ryan KR, Laub MT (2006) Regulation of the bacterial
cell cycle based on a phenotypic characteristic such as
cell cycle by an integrated genetic circuit. Nature size (centrifugal elutriation) or age (“baby machine”).
444(7121):899–904
Margolin W (2006) Bacterial division: another way to box in the
ring. Curr Biol 16(20):R881–R884
Piggot PJ, Hilbert DW (2004) Sporulation of Bacillus subtilis. Division
Curr Opin Microbiol 7(6):579–586 Selective ‘Block / release’
Ryan KR, Shapiro L (2003) Temporal and spatial regulation in method method
prokaryotic cell cycle progression and development. Annu Growth
Rev Biochem 72:367–394
Wang JD, Levin PA (2009) Metabolism, cell growth
Asynchronous
and the bacterial cell cycle. Nat Rev Microbiol
7(11):822–827
culture
Zakrzewska-Czerwińska J, Jakimowicz D, Zawilak-Pawlik A,
Messer W (2007) Regulation of the initiation of chromo-
somal replication in bacteria. FEMS Microbiol Rev
31(4):378–387
J€
urg B€ahler and Samuel Marguerat
Department of Genetics, Evolution and Environment
and UCL Cancer Institute, University College London,
London, UK
Cross-References Cross-References
▶ Cell Cycle Analysis, Expression Profiling ▶ Cell Cycle Analysis, Expression Profiling
References
Davis PK, Ho A, Dowdy SF (2001) Biological methods for cell- Cell Cycle-regulated Kinase Complexes
cycle synchronization of mammalian cells. Biotechniques
30:1322–1326, 1328, 1330–1331
▶ Cyclins and Cyclin-dependent Kinases
J€
urg B€ahler and Samuel Marguerat
Cell Cycle–regulated Transcription
Department of Genetics, Evolution and Environment
▶ Cell Cycle-regulated Gene Expression
and UCL Cancer Institute, University College London,
London, UK
Definition
Cell Division
This term describes a specific mode of gene regulation
▶ Cell Cycle of Mammalian Cells
where transcript levels peak at a specific cell cycle
phase, creating periodic profiles of gene expression
across several cycles (Fig. 1). Periodic gene expression
is not only found among cell cycle–regulated genes but
also among genes regulated during other periodic pro-
Cell Division Cycle
cesses such as the circadian cycle.
▶ Cell Cycle, Biology
▶ Cell Cycle, Fission Yeast
G1 S G2 M G1 S G2 M
Level of gene expression
Time
Cell Membrane Proteins, Fig. 1 Scheme of prominent membrane domains, and both ends of the protein reside within
features of a biological membrane: Hydrophilic polar head the same volume separated by a membrane. Another peripheral
group (a) and a hydrophobic oily fatty acid group (b) are oriented membrane protein associated with the membrane via a partially
to form a lipid bilayer (c) that seals the fatty acid chains between hydrophobic polypeptide chain is shown in (g). A peripheral
two layers of polar head groups. A bitopic, type I integral mem- membrane protein (e) with attached lipid modification (h) is
brane protein (d) has parts on both sides of the membrane. present only on the inner side of the membrane. A lipidated
A polytopic, type II membrane protein (f) has more than one bitopic integral membrane protein is shown in (i)
where amino and carboxyl terminals are located at the acylation forms. While myristoylation is an event
opposing sides of the separating membrane (Fig. 1d), where a myristoyl fatty acid group (Fig. 1h) is perma-
the peptide chain of type II starts and ends both on the nently attached exclusively to a glycine amino acid at
same side of the membrane (Fig. 1f). The membrane the beginning of the newly formed polypeptide
incorporation of an integral membrane protein takes sequence protein (Fig. 1e), the attachment of a palmi-
place already during protein biosynthesis in the lumen tate fatty acid group during palmitoylation to any
of the endoplasmic reticulum (ER) following the cysteine residue of a finished protein is a reversible
detection of a signal peptide by the translation post-translational modification. Isoprenylation is yet
machinery. another form of post-translational lipid modification
Rather than by a membrane-traversing peptide where an isoprenyl lipid group, either a farnesyl or
sequence, peripheral membrane proteins associate geranylgeranyl lipid, is attached to a cysteine amino
with the membranes through hydrophobic interactions acid resides at the end of the protein. In general,
between the membrane lipids and a hydrophobic a lipidated membrane protein can attach itself either
polypeptide sequence (Fig. 1g) or via a lipid molecule on the outer membrane of a cytosolic organelle, or on
(Fig. 1e). The attachment of a lipid group to a soluble the intracellular leaflet of the plasma membrane, and
protein can provide sufficient hydrophobicity to facil- thereby executes its function in the cytosolic volume of
itate membrane attachment, and the reversible nature a cell. Yet another family of lipid-modified proteins
of some lipid modifications can act as a regulatory is formed by glycosylphosphatidylinositol (GPI)-
switch to concentrate a protein from soluble, cytosolic anchored proteins synthesized directly on a GPI-lipid
form to a membrane-bound form. A signal sequence, residing on the ER membrane. GPI-anchored mem-
which is recognized by lipid transfer enzymes (such as brane proteins are transported to the outer surface of
palmitoyl-, myristoyl-, and prenyl transferases), leads the plasma membrane and since their peptide sequence
to the covalent attachment of a palmitoyl or a myristoyl is exposed toward the extracellular space, they execute
fatty acid, an isoprenyl fatty acid lipid group, or their function toward the extracellular matrix, i.e.,
a glycosylphosphatidylinositol-(GPI) anchor attached outside the cells.
to a cysteine or glycine amino acid. Timing of Provided that the required targeting signal is present
acylation and the position of the attached fatty acid within the peptide sequence, an integral membrane
group are the major differences between the various protein can additionally be lipidated in a regulated
Cell Membrane Proteins 363 C
manner, for example, with a palmitate fatty acid. Such of epidermal growth factor (EGF) to the extracellular
a lipid molecule provides a membrane protein affinity binding pocket of EGFR induces a conformational
toward specialized membrane regions, such as change to the cytoplasmic tail of the receptor, which
caveolae or other cholesterol- and sphingolipid-rich then leads to the assembly of signaling complex and
lipid microdomains (also termed as lipid rafts, Ikonen subsequent downstream events, such as remodelling of
and Simons 1997). Thus, reversible lipidation may act the cellular cytoskeleton. Yet another functional mode
as a switch in signaling efficiency by moving an inac- for membrane proteins is the selective transport of C
tive protein from the bulk membrane inside molecules across the membrane via a channel protein
a specialized signaling platform. An example of such or protein complex. As the hydrophobic lipid bilayer is
switchable integral membrane protein is the Neuronal not readily passable for nonpolar solutes, their trans-
Cell Adhesion Molecule (NCAM) that can not only be port is mediated by a physical pore or a channel formed
palmitoylated to intracellular cysteine residues by by one polytopic membrane protein, or as a complex of
palmitoyl transferase, but also de-palmitoylated upon several bi- and/or polytopic membrane proteins.
need. The location and function of NCAM critically Within such a complex, the hydrophobic amino acid
depends on its lipidation status and subsequent resi- side chains orient themselves toward the lipid environ-
dence in cholesterol-rich membrane regions. ment, and the hydrophobic side chains together form
the passage. Traffic through the passage can be pas-
Mixed Membrane Attachment Types sive, i.e., the solutes flow from higher concentration
Hydrophobic amino acids can be used to build poly- toward lower concentration across the membrane; or
peptide chains with one-sided hydrophobicity. Upon active, where the transfer against a chemical gradient is
orientation of the hydrophobic side of the protein coupled to the use of energy, such as ATP. So-called
toward a membrane, it can be protected from aqueous cotransporters couple the chemical concentration gra-
environment. This facilitates membrane association dient to the transport of molecules against a gradient.
and results in a conformation where a part of the
protein rather lays “flat” on the plane of the membrane Membrane Proteins Move Rapidly in a Regulated
(Fig. 1g) while other parts of the protein can freely Manner
move around and possibly also obtain a lipid modifi- Experimental work has shown that membrane proteins
cation (below). Presence of such a amphiphatic are in constant diffusional movement along the mem-
domain in a soluble protein can provide sufficient brane plane and around their own axis. An integral
affinity for membrane attachment; alternatively, via membrane protein can not only freely rotate around
such a domain an integral membrane protein can inter- itself at high speed but is also able to diffuse across
act with certain lipid groups on the membrane, and a distance, corresponding to the periphery of a cell at
possibly use it for targeting along the membrane several times per second. Equally, the proteinaceous
plane to regions with specific properties. part of a lipid-modified peripheral membrane protein
can freely rotate around its lipid anchor and also along
Operation Modes of Membrane Proteins the membrane plane at high speed. However, in cellu-
Basically, two action modes exist for membrane pro- lar plasma membranes, the lateral movement of inte-
teins, and the type of membrane attachment is indica- gral and peripheral membrane proteins is restricted by
tive for the functional mode. The activity of specialized domains below and upon the membranes.
a monotopic membrane protein is restricted to one An example of restricted movement of cell membrane
side of a membrane (cis-mode). For example, proteins can be found from epithelial cell layer lining
a lipidated kinase bound on the cytoplasmic side of the digestive system. Here, the surrounding membrane
the lipid bilayer only phosphorylates other cytoplasmic of the epithelial cells is divided by a tight junction to
proteins (soluble or membrane bound). In contrast, basolateral and apical sides both having a unique mem-
a membrane-traversing bi- or polytopic membrane brane protein composition that does not mix with the
protein is likely to operate on both sides of the mem- other side.
brane in trans-mode. One concrete example are the Recent data shows that cellular cytoskeleton just
family of receptor tyrosine kinases, e.g., the receptor beneath the cell plasma membrane forms “picket
for epidermal growth factor (EGFR). Here, the binding fences,” or “pockets”. Where the poles are membrane
C 364 Cell Membrane Toponomics
proteins interacting with cytoskeletal components, enormous number of distinct proteins in order to orga-
such as actin fibers (Kusumi et al. 2011). The latter, nize the myriad of different membrane functionalities
in turn, form the boundaries of the pockets. Once inside (▶ Cell Membrane Proteins). These proteins serve
such a pocket, a protein is free to move around in a variety of essential functions, e.g., establishing and
lateral direction; however, due to the restricting cyto- controlling cell adhesion, inducing intracellular signal-
skeleton, energy is required for an integral membrane ing (by behaving as receptors), maintaining the ion
protein to jump from one pocket to another one during homeostasis (by acting as ion channels), as well as
a so-called hop-diffusion. Thus, the lateral movement controlling environmental adaptation. A fundamental
along the membrane plane is reduced and controlled by biological question is how the approx. 8,000 distinct
the cellular cytoskeleton. cell surface proteins are spatially organized as supramo-
Another cellular system provides restriction lecular functional units (toponome) in one and the same
toward lateral movement. Clusters of cholesterol and cell membrane, and how these units are altered in
sphingolipids form lipid microdomains with specific chronic diseases, such as cancer and Alzheimer’s
properties, which are significantly different from the disease. This field of research is referred to as
bulk of the membrane. Such “lipid rafts” attract both membrane toponomics. It is a subdiscipline in
integral and lipid–modified peripheral proteins to form ▶ toponomics related to the functional detection of pro-
specialized signaling platforms. Interestingly, raft asso- tein networks of cell membranes in morphologically
ciation can be stimulated and modulated upon post- intact cells or tissue sections by using TIS (▶ TIS
translational attachment of specific lipid molecules, as Robot) microscopy. It has been shown that quantitative
described above for the NCAM cell adhesion molecule. information on the subcellular topology and differential
assembly of distinct proteins and their spatial relation-
ships on the cell surface are straightforward to the
References detection of lead proteins hierarchically controlling
cell function: For instance, the systematic mapping of
Alberts B (1994) Molecular biology of the cell, 3rd edn. Garland cell surface proteins in rhabdomyosarcoma cells
Publishing, New York
(Schubert 2010) led to understanding on how tumor
Ikonen E, Simons K (1997) Functional rafts in cell membranes.
Nature 387:569 cells establish multi-protein cluster networks composed
Jennings ML (1989) Topography of membrane proteins. Annu of different classes of CD proteins required for cell
Rev Biochem 58:999 polarization/migration and tumor morphogenesis
Kusumi A et al (2011) Hierarchical mesoscale domain organi-
(Fig. 1). The CD system is a collection of a variety of
zation of the plasma membrane. Trends Biochem Sci 36:604
distinct molecular classes present on the surface of
leucocytes and, in part, also on the surface of many
other cell types (so-called Cluster of Differentiation
Cell Membrane Toponomics (CD) antigens; Zola 2001; Barclay et al. 1995). CD
antigens are held to belong to the best characterized
Anne Gieseler molecules of the cell surface membrane. An example
Molecular Pattern Recognition Research (MPRR) of co-mapping of many CD proteins has revealed a
Group, Otto-von-Guericke-University Magdeburg tumor cell specific surface toponome (Fig. 1): Underly-
Medical Faculty, Magdeburg, Germany ing experiments have shown that blocking the lead pro-
tein CD 13 (present in all multi-protein clusters, or,
▶ CMPs) leads to disassembly of the corresponding
Definition protein network and loss of function of the cell to polar-
ize. This underscores the relevance of cell surface
Any cell is a collection of highly coordinated subcellu- toponomics in understanding chronic disease and finding
lar compartments formed by unit membranes. Unit novel drug-target candidates. Analyzing the toponome
membranes are highly organized as lipid bilayers with of cell surface proteins will provide access to the mech-
a hydrophilic external and internal part and anisms underlying self-organization and cell control in
a hydrophobic intermediate part (Lodish et al. 2000). health and disease (Schubert et al. 2011; ▶ Clinical
This basic structure allows the cell to integrate an Aspects of the Toponome Imaging System (TIS)).
Cell Membrane Toponomics 365 C
Cell Membrane Toponomics, Fig. 1 Illustration showing of the supramolecular structure of the single molecular compo-
a scheme of the molecular protein species combined in various nents within these clusters is shown from (c–g). The colors
clusters along the cell surface of cell extensions (a and b) of “red,” “dark blue,” “yellow” “light blue,” “green” and corre-
a rhabdomyosarcoma tumor cell. The sequential arrangement spond to the colors in the toponome maps in (a) and (b). Scale
and geometry of the corresponding multi-protein clusters is bar: 2 mm (Modified after Schubert 2010)
highly similar in both these extensions. Note that the scheme
C 366 Cell Proliferation
Characteristics
Cell Proliferation
1. Highly efficient: flow cytometer can examine hun-
▶ Modeling, Cell Division and Proliferation dreds of cells in a second and decide which cell or
particle is interested and charge it to be separated;
this is almost the most high-speed sorter now days.
2. High purity: If only the program has been set by the
Cell Proliferation Process user, FACS can get a pure group of the interested
cells since the technique is based on the reaction of
▶ Cell Cycle, Biology antigen and antibody.
3. Accurate number control of sorted cells or particles:
FACS can even put a single cell or particle into
committed container, so it is easy to control the
interested cells number according to the user.
Cell Selection
Basic Instruments for Cell Sorting
▶ Lymphocyte Dynamics and Repertoires, Biological Cell sorting of FACS needs the help of flow cytometry
Methods which includes the following: the fluidics system, the
optics and detection, the signal processing, and elec-
trostatic cell sorting.
a b
catcher catcher
waste waste
harvest tube harvest tube
sheath sheath
sample sample
Cell Sorting, Fig. 1 Cell sorter that sorting cells by catcher shifting
CellML Model Curation, Fig. 1 (a) Published figure showing the results of the original model (Goldbeter (2005) Proc Biol Sci) and
(b) the simulation output from the CellML version of the model
CellML Model Curation, Fig. 2 A comparison of the verbose raw MathML equation and the same equation rendered in COR
CellML Model Curation 375 C
CellML model was translated from the published paper a more descriptive set of ▶ curation flags. Initially,
or from the original model code. these will be based on the ▶ MIRIAM standard: the
There are four defined ▶ curation levels: Minimum Information Requested in the Annotation
• Level 0: Not curated. of Biochemical Models (Le Novere et al. 2005). This
• Level 1: The CellML model description is consis- is an agreed set of rules for curating quantitative
tent with the mathematics in the original published models of biological systems which ensure
paper. a consistent standard of curation between models in C
• Level 2: The CellML model has been checked for: distinct databases.
– Typographical errors
– Consistency of units Conclusions
– Completeness, such that all parameters and ini- We encourage model developers to publish their com-
tial conditions have been defined putational code in a common format, such as CellML,
– Overconstraints, such that the model does not concurrent with the manuscript of their paper. Indeed,
contain equations or initial values which are certain journals are already encouraging modelers to
either redundant or inconsistent submit their model code in SBML. However, until
– Consistency of simulation output with the code submission becomes an obligatory step getting
published results, such that running the model a model reviewed and published, the processes of
in an appropriate simulation environment repro- model translation and ▶ curation will remain
duces the results in the original paper essential.
• Level 3: The model has been checked to the extent
that it is known to satisfy physical constraints such
as conservation of mass, momentum, charge,
etc. This level of ▶ curation needs to be conducted References
by a specialized domain expert, who is not the
Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H,
original model author.
Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A,
However, flaws have been identified in this Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V,
▶ curation rating system. The system is not hierarchi- Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH,
cal – that is, a model assigned level 2 does not Hunter PJ, Juty NS, Kasberger JL, Kremling A,
Kummer U, Le Novere N, Loew LM, Lucio D, Mendes P,
necessarily meet the criteria for level 1. In fact,
Minch E, Mjolsness ED, Nakayama Y, Nelson MR,
the two levels are often mutually exclusive because Nielsen PF, Sakurada T, Schaff JC, Shapiro BE,
in order to get the CellML model to replicate the Shimizu TS, Spence HD, Stelling J, Takahashi K,
published results (level 2) we have to adjust Tomita M, Wagner J, Wang J (2003) The systems biology
markup language (SBML): a medium for representation and
the published model description (level 1). Further,
exchange of biochemical network models. Bioinformatics
we have observed that the current “star system” can 19:524–531
lead to uncertainty as to how to interpret the Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M,
▶ curation status of a model. For example, it is Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL,
Hucka M (2006) BioModels database: a free, centralized
often a misconception that unless a model has been
database of curated, published, quantitative kinetic models
assigned three stars, it is not working. The reality of biochemical and cellular systems. Nucleic Acids Res 34:
is that domain experts are usually very busy individ- D689–D691
uals who do not have the time to spend on Le Novere N, Finney A, Hucka M, Bhalla US, Campagne F,
Collado-Vides J, Crampin EJ, Halstead M, Klipp E,
checking other people’s models. Also, since most
Mendes P, Nielsen P, Sauro H, Shapiro B, Snoep JL,
of the CellML models have been derived from Spence HD, Wanner BL (2005) Minimum information
published papers, we assume that the model has requested in the annotation of biochemical models
already undergone peer review and the reviewers (MIRIAM). Nat Biotechnol 23:1509–1515
Lloyd CM, Halstead MD, Nielsen PF (2004) CellML: its future,
were satisfied. present and past. Prog Biophys Mol Biol 85:433–450
To solve this problem of misinterpretation, we are Lloyd CM, Lawson JR, Hunter PJ, Nielsen PF (2008) The
proposing to replace the current star system with cellML model repository. Bioinformatics 24:2122–2123
C 376 CellML Model Repository
a newer version, the new model has to be committed to ▶ Distributed Version Control System (DVCS)
the local clones of an existing ▶ workspace with ▶ JWS Online
a comment on what has been changed, and then pushed ▶ Systems Biology Markup Language (SBML)
back into the repository. It is then the responsibility of
the curators to check the quality of the model before
they ▶ expose (or publish) it. Alternatively, if the References
model being added is a completely new one, the crea-
tion of a new ▶ workspace is required. Depending on Hines ML, Morse T, Migliore M, Carnevale NT, Shepherd GM
(2004) ModelDB: a database to suport computational neuro-
the permissions granted to the user, they can either
science. J Comput Neurosci 17:7–11
create a new ▶ workspace themselves, or they have Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H,
to ask the curators to create one on their behalf. The Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A,
same pushing, commenting, exposure, and publication Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V,
Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH,
process then takes place.
Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U,
Le Novere N, Loew LM, Lucio D, Mendes P, Minch E,
Private Models and Password Protection Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF,
Sometimes model developers may want to utilize the Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD,
Stelling J, Takahashi K, Tomita M, Wagner J, Wang J (2003)
▶ Distributed Version Control System (DVCS) feature
The systems biology markup language (SBML): a medium for
of the CellML model repository, to track the changes representation and exchange of biochemical network models.
made to their model during the development process, Bioinformatics 19:524–531
without making their model publicly available until it Le Novere N, Bornstein B, Broicher A, Courtot M, Donizelli M,
Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL,
is complete. They can work in a private ▶ workspace
Hucka M (2006) BioModels database: a free, centralized
and choose not to ▶ expose their model until the very database of curated, published, quantitative kinetic models
end. Until they are made publicly available, models of biochemical and cellular systems. Nucleic Acids Res 34:
will not come up in search listings. Alternatively, D689–D691
Lloyd CM, Lawson JR, Hunter PJ, Nielsen PF (2008) The
a modeler may want to provide only a few, select
cellML model repository. Bioinformatics 24:2122–2123
individuals with access to their model, such as collab- Olivier BG, Snoep JL (2004) Web-based kinetic modelling using
orators, or reviewers of a journal article. If this is the JWS online. Bioinformatics 13:2134–2144
case the model can be ▶ expose but user access to
this exposure can be controlled by the repository
administrator granting permission to only a few spe-
cific users.
CellML
Persistent Model References
Each model entry in the repository has a unique iden- Andrew K. Miller1, Randall D. Britten1, Catherine M.
tifier in the form of a unique URL. Even if an older Lloyd1, David P. Nickerson1 and Poul M. Nielsen2
1
version of a model is superseded by a newer published Auckland Bioengineering Institute, University of
version, the URL of the old model is retained, with Auckland, Auckland, New Zealand
2
a note that that particular version of the model has been Department of Engineering Science, University of
superseded, or “expired.” This feature allows Auckland, Auckland, New Zealand
a modeler to refer to a specific model version, at
a specific point in time, and they can be confident
that a link in a journal article, or in correspondence Definition
with a colleague, will remain unbroken.
CellML (Hedley et al. 2001; Cuellar et al. 2006) is
a format for specifying mathematical models, building
Cross-References upon Extensible Markup Language (XML). It is
supported by a range of software tools and databases,
▶ BioModels Database: A Repository of and is most commonly used to specify mathematical
Mathematical Models of Biological Processes models of biological systems.
CellML 379 C
Characteristics used to represent parameters, state variables, and
named constants.
Systems biology researchers often need to exchange In CellML, the physical units of all variables and
mathematical models. One approach is to publish constant numbers (which can be dimensionless) must
a paper containing the mathematical equations in the be specified. If units are specified, they may either be
model. However, the process of manually converting one of the standard SI base or derived units, or units
a model to mathematical equations, and then back defined in the model. New units can be defined in terms C
into a computer modeling format is both time con- of other units with specified multipliers and offsets
suming and error prone. Another approach is to (e.g., a millimeter can be defined as a thousandth of
exchange programs for solving the model in a meter, nanomolar can be defined as a billionth of
a procedural language (such as Fortran or MATLAB). a mole per liter), and new base units can also be
This approach has several major drawbacks: it tends defined.
to result in models where the numerical algorithms A variable in one component can be connected to
used are intermixed with the model, it is difficult to a variable in another component. A connection
compose two models to make a model incorporating between two variables means that the two variables
features of each model, and it is hard to use the model represent the same quantity. If the two connected vari-
for any purpose other than the one it was originally ables are in the same physical units, then the connec-
coded for. tion is treated as equality. If they are in different, but
These problems are avoided by the CellML format dimensionally compatible units, then the CellML
(http://www.cellml.org), which acts as a container for processing software will automatically apply the
mathematical expressions encoded in Mathematical appropriate units conversion factor.
Markup Language (MathML), and so can represent In a CellML model, components can be arranged in
a wide variety of models. CellML tools are generally an encapsulation hierarchy. This can be used so that
focused on supporting the class of problems known some components (the encapsulation children)
as systems of differential algebraic equations describe the internal details of another component,
(DAEs). which provides a higher level interface (the encapsu-
CellML has been used to describe a wide range of lation parent). The internal details of each encapsula-
models relevant to systems biology. At the time of tion child component can in turn be described in the
writing, the CellML Model Repository included 500 next level of the encapsulation hierarchy, and so on to
models (including electrophysiology, gene regula- whatever depth is required.
tion, metabolism, and signal transduction models To ensure that there is a distinction between the
(Cooling et al. 2007; Nickerson and Buist 2008; internal and externally exposed parts of a model,
Terkildsen et al. 2008) through to endocrine models). components have two types of interface, the public
CellML allows models to be built in a reusable and interface and the private interface. A connection
modular fashion, and this has been exploited to build between variables in encapsulation siblings uses the
a modular library of synthetic biology models public interface of both components. A connection
(Cooling et al. 2010). between an encapsulation parent and an encapsula-
All mathematical expressions in a CellML model tion child uses the public interface of the encapsula-
are contained within components. A component is tion child and the private interface of the
a container that may represent concrete or abstract encapsulation parent. Variables can have different
concepts. For example, a component could represent visibility on the public and private interfaces. Their
an ion channel, a compartment in a cell, the surround- visibility is directional, and is set to either “in” or
ing environment of a model, or the collection of adjust- “out,” which indicates that the variable is visible on
able parameters. that interface, and may be connected to a variable of
Each CellML component may contain variables, opposite directionality. Visibility defaults to “none,”
which are named values that may or may not meaning that the variable is not visible on that
change with respect to other variables. Every named interface.
value in the mathematics within a component In large systems biology models, the same mathe-
needs to be declared as a variable; variables are matical constructs are often used multiple times, both
C 380 CellML
within a model, and across different models. For exam- licensed under Free/Open Source licenses so that it
ple, a particular mathematical formulation for an ion can be freely used. In addition, bindings that
channel might be used to represent both sodium and allow this implementation to be accessed from other
potassium channels in a cell model, and might be languages are available (e.g., for Python and Java).
reused in other models. Duplicating the markup for Modelers can use CellML as the primary source
the definition of an ion channel component multiple language for models, while leveraging numerical
times would be a repetitive and error-prone approach, tools in other environments. This is achieved using an
and improvements to the component would need to be API extension module that allows tools to convert
added manually to all copies. To address this, CellML models from CellML into procedural programming
1.1 uses a mechanism called importing. Importing languages like C or MATLAB.
allows components and units to be imported by linking In addition, modelers can make use of several pop-
to the Uniform Resource Identifier (URI) of the ular stiff and non-stiff numerical integrators, using an
imported model document from within the importing API module that allows simulations of models to be
model document. When a component is imported, run with numerical integrators from the GSL, CVODE,
the entire encapsulation hierarchy underneath it is and IDA packages.
implicitly imported, including the connections Another API module can be used to validate that
between components in that encapsulation hierarchy, models follow the rules of the CellML specification, as
so that complex submodels made up of multiple well as good modeling practices. For example, the
components can be imported. validation service can generate a warning if the units
Using the structures described so far, CellML in a mathematical expression are inconsistent.
allows the mathematics underlying a model to be There is also a range of tools for use by modelers:
described. However, for a model to be useful, it is one of these is OpenCell, which is based on the CellML
necessary to describe the correspondence between API. It is intended to be used by people creating or
parts of the model and parts of the real-world system working with CellML models. Models can be edited
being modeled. For this reason, CellML models can and validated, simulations can be run and results can be
include information, known as metadata, about plotted on graph axes. OpenCell is specifically
the model and its parts (Beard et al. 2009). This meta- designed to support working with multiple models
data is normally represented within the CellML model simultaneously, and allows the results from different
in Resource Description Format (RDF). Metadata models to be compared graphically on the same set of
specifications exist for basic information about a axes.
model, such as the details of the publication it is OpenCell also supports session files, which
based on, and the chemical species represented by a are a stored description of the complete OpenCell work-
concentration variable. There are also metadata speci- ing environment including the list of open models and
fications for stipulating parameters to numerical inte- information on which graphs to display. Session files
grators, and how to graphically display simulation can also include embedded figures; these figures can be
results. made interactive so that clicking on part of a figure will
A number of tools and libraries capable of bring up a corresponding trace on the graph.
processing CellML models have been developed COR (http://cor.physiol.ox.ac.uk) is another popu-
(Garny et al. 2008); a listing can be found at http:// lar software application for modelers, focused on the
www.cellml.org/tools. editing and use in simulation of CellML, with similar
The CellML API (Miller et al. 2010) is a description features to OpenCell. Other end-user tools that do not
of a programming interface for working with CellML focus on CellML, but include support for it are JSim,
models. Software developers can use the CellML API VirtualCell, and insilicoIDE.
when incorporating support for the CellML format into An important bioengineering application of CellML
their software. The API includes a core for performing is for multiscale organ simulation. Two simulation
basic model manipulations, as well as a number of systems that allow part of the mathematical model to
optional extension modules. The API comes with an be specified by means of CellML are Chaste and
efficient full reference implementation in C++, OpenCMISS.
Cellular Automata 381 C
Cross-References
Cellular Automata
▶ CellML Model Repository
▶ Systems Biology Markup Language (SBML) Carlo Cosentino, Basilio Vescio and Francesco Amato
Department of Clinical and Experimental Medicine,
Magna Graecia University of Catanzaro,
References Catanzaro, Italy C
Beard DA, Britten R, Cooling MT, Garny A, Halstead MDB,
Hunter PJ, Lawson J, Lloyd CM, Marsh J, Miller A et al
(2009) CellML metadata standards, associated tools and
Synonyms
repositories. Phil Trans R Soc A: Math Phys Eng Sci
367(1895):1845 Cellular spaces; Cellular structures; Homogeneous
Cooling M, Hunter P, Crampin EJ (2007) Modeling hypertrophic structures; Iterative arrays; Tessellation automata;
IP3 transients in the cardiac myocyte. Biophys
Tessellation structures
J 93(10):3421–3433
Cooling MT, Rouilly V, Misirli G, Lawson J, Yu T, Hallinan J,
Wipat A (2010) Standard virtual biological parts:
a repository of modular modeling components for synthetic Definition
biology. Bioinformatics 26(7):925
Cuellar A, Nielsen P, Halstead M, Bullivant D, Nickerson D,
Hedley W, Nelson M, Lloyd C (2006) CellML 1.1 Specifi- A cellular automaton (pl. cellular automata, abbrev.
cation. CellML Specification. http://www.cellml.org/specifi- CA) is a discrete model studied in computability
cations/cellml_1.1 theory, mathematics, physics, complexity science
Garny A, Nickerson DP, Cooper J, Santos RW, Miller AK,
(▶ Complexity), theoretical biology, and microstruc-
McKeever S, Nielsen PMF, Hunter PJ (2008) CellML and
associated tools and techniques. Phil Trans R Soc A: Math ture modeling. They are used to build parallel comput-
Phys Eng Sci 366(1878):3017 ing (▶ Parallel Computing, Data Parallelism)
Hedley WJ, Nelson MR, Bullivant DP, Nielsen PF architectures as well as to model and simulate
(2001) A short introduction to CellML. Phil Trans R Soc
physical systems. Moreover they can be used to inves-
Lond A: Math Phys Eng Sci 359(1783):1073
Miller AK, Marsh J, Reeve A, Garny A, Britten R, Halstead M, tigate and reproduce the emergence of patterns
Cooper J, Nickerson DP, Nielsen PF (2010) An overview of (▶ Spatiotemporal Pattern Formation), self-replication
the CellML API and its implementation. BMC Bioinformat- and self-similarity properties in natural systems.
ics 11(1):178
Nickerson D, Buist M (2008) Practical application of CellML
1.1: the integration of new mechanisms into a human
ventricular myocyte model. Prog Biophys Mol Biol Characteristics
98(1):38–51
Terkildsen JR, Niederer S, Crampin EJ, Hunter P, Smith NP
History
(2008) Using physiome standards to couple cellular func-
tions for rat cardiac excitation-contraction. Exp Physiol CA were first developed in the 1940s by Stanislaw
93(7):919–929 Ulam, who was investigating the growth of crystals,
using a simple lattice network as his model, and by
John Von Neumann, who was working on the problem
of self-replicating systems (Von Neumann 1966).
Cell-to-Cell Communication In 1946, Norbert Wiener and Arturo Rosenblueth
developed a cellular automaton model of excitable
▶ Cellular Communication media. Their specific motivation was the mathematical
description of impulse conduction in cardiac systems.
Their original work continues to be cited in modern
research publications on cardiac arrhythmia and
Cellular Aging excitable systems (Wiener and Rosenblueth 1946). In
the 1960s, cellular automata were studied as
▶ Senescence a particular type of dynamical systems and the
C 382 Cellular Automata
Cellular Automata, a b
Fig. 1 (a) Moore
neighborhood; (b) Von
Neumann neighborhood
connection with the mathematical field of symbolic of two-dimensional CA, the most used neighborhoods
dynamics was established for the first time (Hedlund are Moore neighborhood and Von Neumann neighbor-
1969). hood (Fig. 1). More formally, if we refer to a certain
In the 1970s, K. Zuse proposed that the physical cell in the grid as a location identified by discrete
laws of the universe are discrete by nature, and that the coordinates (x, y), being (x0, y0) the central cell and
entire universe is the output of a deterministic compu- being r the neighborhood radius, we can define its
tation on a giant cellular automaton (Zuse 1982). In Moore neighborhood as
those same years, a two-state, two-dimensional cellu-
lar automaton named “Game of Life” became very NM ðx0 ; y0 Þ ¼ fðx; yÞ : jx x0 j <¼ r; jy y0 j <¼ rg;
widely known, particularly among the early computing (1)
community. The Life CA is able to perform computa-
tions, and after much effort, it has been shown that the while its Von Neumann neighborhood is
Game of Life can emulate a universal Turing machine
(Rendell 2002). NV ðx0 ; y0 Þ ¼ fðx; yÞ : jx x0 j þ jy y0 j <¼ rg:
In the 1980s, Stephen Wolfram started a
(2)
systematical investigation of a very basic but essen-
tially unknown class of cellular automata, which he
Given N, the total number of cells, and being a ¼ (x, y)
called “elementary cellular automata.” The unexpected
a generic cell, we can define the neighboring function
complexity of the behavior of these simple automata
as (Sipper 1997) g : N N ! 2 N N, defined as
led Wolfram to suspect that ▶ complexity in nature
may be due to similar mechanisms (Wolfram 2002).
gðaÞ ¼ fa þ d1 ; a þ d2 ; . . . ; a þ dn g; (3)
Cellular Automata Structure
for all a 2 N N, where di 2 N N, i ¼ 1, . . ., n,
A CA consists of a regular grid of elements, named
is fixed.
cells, each in one of a finite number of states, such as
“On” and “Off” or “0” and “1.” The grid can be in any
finite number of dimensions. Rule
Each cell contains a copy of the finite automaton
Neighborhood (V, v0, f), where V is the set of cellular states, v0 is
For each cell, a set of cells called “neighborhood” is a particular state called quiescent state, and f is the
defined, consisting of the cell itself, together with the transition function f : V n ! V. This function is
surrounding cells within a certain distance. In the case subject to the constraint f(v0, v0, . . ., v0) ¼ v0, that
Cellular Automata 383 C
8
is, a cell whose neighborhood is entirely quiescent < 0; S<2_S>3
remains quiescent. The state vt(a) of a cell a at time vtþ1 ðaÞ ¼ nt ðaÞ; S¼2 (4)
:
t is precisely the state of its associated automaton at 1; S¼3
time t. Each cell a is connected to the n neighbor cells
a + d1, a + d2, . . ., a + dn. The neighborhood state
function ht : I I ! V n is defined by ht(a) ¼ (vt(a), A compact representation of this rule is“B3/S23”:
vt(a + d2), . . ., vt(a + dn)), assuming that d1 ¼ (0, 0). This essentially means that, for the next generation, C
Now we can relate the neighborhood state of a cell a a cell has a new birth (B) if its neighborhood consists of
at time t to the state of that cell at time t + 1 by exactly 3 alive cells and survives (S) if there are two or
f(ht(a)) ¼ vt + 1(a). The function f is referred to as three living cells in its Moore neighborhood (r ¼ 1).
the CA rule and is usually given in the form of a rule
table, specifying all possible pairs of the form (ht(a), Properties
vt + 1(a)). Such a pair is termed a transition or rule- CA are actually simple computer programs capable of
table entry. An allowable assignment of states to all a remarkable range of behaviors. Some CA have been
cells in the space is called a configuration. By applying proven to be universal computers. Others exhibit prop-
the transition function to an allowable configuration, erties familiar from traditional science, such as
a new configuration is generated and reiterating this thermodynamic behavior, continuum behavior, con-
process gives out a succession of configurations, called served quantities, percolation, sensitive dependence
evolution. A rule (and its corresponding CA) is said to on initial conditions, and others. They have been used
be totalistic when the value of a cell at time t depends as models of traffic, material fracture, crystal growth,
only on the sum of the values of the cells in its biological growth, and various sociological, geologi-
neighborhood (possibly including the cell itself) at cal, and ecological phenomena. Another feature of
time t 1. simple programs is that making them more compli-
cated seems to have little effect on their overall
Evolution ▶ complexity and this argument has been used as evi-
An initial state (time t ¼ 0) is selected by assigning dence that simple programs are enough to capture the
a state for each cell. A new generation is created essence of almost any complex system (▶ Complex
(advancing t by 1), according to rule f. For example, System). Von Neumann showed that some CA can be
the rule might be that the cell is “On” in the next universal, in the sense that they can perform every
generation if exactly two of the cells in the neighbor- computation, just like a universal Turing machine.
hood are “On” in the current configuration, otherwise The Game of Life, given a proper initial configura-
the cell is “Off” in the next generation. In uniform CA, tion, can generate gliders (Fig. 2), or patterns that are
the rule for updating the state of cells is the same for able to replicate and propagate themselves through the
each cell, while it may change in nonuniform CA. grid. This property has revealed to be useful for
Typically, the rule does not change over time, and is performing calculations, and a universal Turing
applied to the whole grid simultaneously. machine has been implemented in Life, thus demon-
strating the universality of this CA (Rendell 2002).
20
15
10
−5
Cellular Automata, Fig. 2 A glider in the Game of Life: This
pattern can replicate and propagate itself through the CA grid 0 5 10 15 20 25 30 35 40
Applications
The following are some of the many applications of
cellular automata for the simulation of physical systems.
Hydrodynamic modeling: Differential equations,
such as the Navier-Stokes equation, capture important
macroscopic aspects of fluid dynamics; however, their
implementation on a digital computer is not the equa-
tion themselves, but finite models obtained from them
by truncation and roundoff. It is possible to simulate
analogous macrodynamics starting directly from
discrete microscopic models, cellular automata that
idealize the motion and collisions of individual parti-
cles. Figure 3 shows the simulation of a flow through
a circular cylinder obtained by means of a lattice-gas
cellular automaton (Wolf-Gladrow 2005) (Lattice-Gas
Cellular Automaton Models for Biology).
Growth processes: This is useful for the study of
tumor growth dynamics in tissues. Cellular automata Cellular Automata, Fig. 4 A CA model of tumor growth
are used to model cell behavior and interactions at the
microscopic level, in order to simulate and evaluate
growth dynamics, vascularization or response to ther-
apy. Figure 4 was originated by a model that takes into needs of the tumor, cell-doubling time, and an imposed
account four cell types: healthy cells, proliferating spherical symmetry (Kansal et al. 2000).
tumor cells, non-proliferating tumor cells, and necrotic Pattern formation: Figures 5 and 6 show example
tumor cells, using a simple set of rules and a set of four patterns found in nature (Spatio-temporal pattern for-
microscopic parameters that account for the nutritional mation), together with their reproductions obtained by
Cellular Communication 385 C
Cellular Automata,
Fig. 5 (a) Snowflakes and
(b) a snowflake pattern
reproduced by means of a two-
dimensional CA on
a hexagonal grid
a b
Cellular Automata,
Fig. 6 (a) Cone shell (conus
textile) and (b) pattern
generated by the rule 30 CA
means of cellular automata evolution. Snowflakes Toffoli M, Margolus N (1987) Cellular automata
(Fig. 5) are obtained by running a two-dimensional machines: a new environment for modeling. MIT Press,
Boston
CA on a hexagonal grid, while cone shell patterns Von Neumann J (1966) Theory of self-reproducing automata.
(Fig. 6) are generated by a rule 30 CA. University of Illinois Press, Urbana
Wiener N, Rosenblueth A (1946) The mathematical formulation
of the problem of conduction of impulses in a network of
connected excitable elements, specifically in cardiac muscle.
Cross-References Arch Inst Cardiol Mex 16:205–265
Wolf-Gladrow DA (2005) Lattice-gas cellular automata and
▶ Complex System lattice boltzmann models – an introduction. Springer, Berlin
▶ Complexity Wolfram S (2002) A new kind of science. Wolfram Media,
Champaign
▶ Lattice-Gas Cellular Automaton Models Zuse K (1982) The computing universe. Int J Theo Phys
▶ Ordinary Differential Equation (ODE) 21:589–600
▶ Parallel Computing, Data Parallelism
▶ Spatiotemporal Pattern Formation
Definition References
Cell communication is an important process by which Alberts B, Bray D, Hopkin K, Johnson AD, Lewis J, Raff M,
Roberts K, Walter P (2004) Essential cell biology. Garland
a cell sends or receives information from other cells.
Science, New York
Individual cells, such as yeast, need to sense and
respond to their environment, and communicate
with other cells to have any kind of “social life.” For
example, when a yeast is ready to mate, it secrets Cellular Potts Model
a small protein called a mating factor. Neighboring
yeast cells of opposite “sex” can detect this chemical Anja Voß-B€ohme1, J€orn Starruß2 and Walter de Back2
1
mating call and respond by halting their cell cycle Center for Information Services and High
progress and reaching out toward the cell that emitted Performance Computing (ZIH), Technical University
the signal. In a multicellular organism, cells must Dresden, Dresden, Germany
2
interpret the multitude of signals they receive from Centre for High Performance Computing (ZIH),
other cells to help coordinate their behaviors, such as Technical University Dresden, Dresden, Germany
cell fate decision in embryo development (Alberts
et al. 2004).
In typical cell communications, the signaling Synonyms
cell produces a particular type of signal molecule
that is detected by the target cell. The target cell CPM; Glazier–Graner–Hogeweg model; Potts model,
possesses receptor proteins that recognize and cellular/extended
respond specifically to the signal molecule. Signals
can act over long or short range, depending on the
way in which the signal is transported. For example, Definition
hormones produced in endocrine cell are secreted
into the bloodstream and are often distributed A cellular Potts model (CPM) is a spatial lattice-based
widely throughout the body. In embryo development, formalism for the study of spatiotemporal behavior of
signal proteins (morphogens) disperse from localized biological cell populations. It can be used when the
synthesis sites to form concentration gradients details of intercellular interaction are essentially deter-
across the developing tissue. Neuronal signals mined by the shape and the size of the individual cells
are transmitted along axons to remote target cells. as well as the length of the contact area between
Cells with membrane-to-membrane interface can neighboring cells.
engage in contact-dependent signaling (Alberts et al. Formally, a cellular Potts model is a time-discrete
2004). Markov chain (▶ Markov Chain). It is a lattice model
Each cell can only respond to a limited set of where the individual cells are simply connected domains
signals selectively depending on the specific receptor of nodes with the same cell index. A CPM evolves by
proteins. Most extracellular signal molecules cannot updating the cells’ configuration by one pixel at a time
pass through the plasma membrane. They bind to based on probabilistic rules. These dynamics are
specific cell-surface receptor proteins to induce the interpreted to resemble membrane fluctuations, where
intracellular signal transduction pathway. Receptor one cell shrinks in volume by one lattice site and
proteins act as transducers, binding to signaling mol- a neighboring cell increases in volume by occupying
ecules, and converting the signal from one physical this site. The transition rules follow a modified
form to another. Most cell-surface receptor proteins ▶ Metropolis algorithm with respect to a Hamiltonian.
belong to one of three large families: ion-channel-
linked receptors, G-protein-linked receptors, or Characteristics
enzyme-linked receptors. These families differ from
each other by the nature of signals they generate when Problem
the extracellular signal molecule binds to them The biological structure and function typically result
(Alberts et al. 2004). from the complex interaction of a large number of
Cellular Potts Model 387 C
Cellular Potts Model,
Fig. 1 Cell-surface
interaction in the Cellular Medium
Potts Model. Three cells, each
JT1,M JT1,M JT1,M JT1,M
one covering several lattice
sites, interact with each other
at the cell surfaces. The JT1,M JT1,M JT1,M
strength J of the interaction
depends on the cell types, type JT1,M JT1,M JT1,M C
T1 depicted in red, type T2 in
green. There are also JT1,T1
Celltype 1 Celltype 1
interactions between the cells
and the medium (white) JT1,T2 JT1,T2
JT1,T2 JT1,T2
JT2,M JT2,M
components. When ▶ spatiotemporal pattern forma- expression of molecules that regulate intercellular
tion in cellular populations or tissues is considered, adhesion are responsible for cell sorting. Since then,
one is often interested in concluding characteristics of this formalism has been elaborated and applied to
the global, ▶ collective behavior of cell configurations study a wide range of morphogenetic phenomena in
from the individual properties of the cells and the developmental biology.
details of the intercellular interaction. However, even
if the basic cell properties and interactions are per- The Model
fectly known, it is possible that – due to the complex State Space
structure of the system – the collective traits cannot be A CPM assigns a value ðxÞ from a set
directly extrapolated from the individual properties. W ¼ f0; 1; :::; ng to each site x of a countable set S,
Therefore, appropriate mathematical models need to cp. Fig. 1. The set S resembles the discretized space
be designed and analyzed that help to accomplish this and is often chosen as a two- or three-dimensional
task on a theoretical basis. Cellular Potts models con- regular lattice. The set W ¼ f0; 1; :::; ng contains the
stitute a modeling framework that is applicable when so-called cell indices, where n 2 N is the absolute
the details of intercellular interaction are essentially number of cells that are considered in the model. The
determined by the shape and the size of the individual state of the system as a whole is described by config-
cells as well as the length of the contact area between urations Z 2 X ¼ Ws. Given a configuration Z 2 X,
neighboring cells. a cell is the set of all points in S with the same cell
This model class has been developed by Glazier and index, cellw :¼ fx 2 S : ðxÞ ¼ wg; w 2 Wnf0g:The
Graner (1993) in the context of cell sorting. The latter value 0 is assigned to a given node, if this node is not
refers to the observed segregation of heterotypic cell occupied by a cell but by medium. Each cell is of
aggregates into spatially confined homotypic cell clus- a certain cell type, which determines the migration
ters. The CPM was introduced to explore the tissue- and interaction properties of the cell, the set of all
scale consequences of the differential adhesion possible cell types being denoted by L. Denote by
hypothesis (▶ Differential Adhesion Hypothesis) that t : W ! L the map that assigns each cell its cell
holds that cell-type-dependent disparities in the type. A cell with index w 2 W has volume (for the
C 388 Cellular Potts Model
Kronecker symbol d it holds that dðu; vÞ ¼ 1 if u ¼ v Again mt , the target surface length, and at , the strength
and dðu; vÞ ¼ 0 otherwise) of the surface constraint, are parameters, t2L. Thus,
the typical structure of a CPM-Hamiltonian is
X
Vw ðÞ :¼ dðw; ðxÞÞ;
H ¼ Hs þ Hv þ H0 ; (4)
x2S
Limitations and Merits Glazier JA et al. CompuCell3D, an open source modeling envi-
From a theoretical perspective, the CPM is poorly ronment, www.compucell3d.org
Glazier JA, Graner F (1993) Simulation of the differential adhe-
understood. Hence, the analysis of CPM models can sion driven rearrangement of biological cells. Phys Rev
effectively only be performed by numerical simula- E 47(3):2128–2154
tion. Important mathematical methods, such as rigor- Glazier JA, Balter A, Poplawski NJ (2007) Magnetization to
ous spatiotemporal limit procedures to derive the laws morphogenesis: a brief history of the Glazier-Graner-
Hogeweg model. In: Anderson ARA, Chaplain MAJ,
that guide the behavior of certain macroscopic vari- Rejniak KA (eds) Single cell-based models in biology and
ables, are not yet available. Since the classical Metrop- medicine, Mathematics and biosciences in interaction.
olis algorithm (▶ Metropolis algorithm) is modified in Birkh€auser Verlag, Basel, pp 79–106
the CPM, these models differ in essential aspects from Marée AFM, Jilkine A, Dawes A, Grieneisen VA, Edelstein-
Keshet L (2006) Polarization and movement of keratocytes:
classical equilibrium models. In addition, CPMs have a multiscale modelling approach. Bull Math Biol
been criticized because their calibration is often 68(5):1169–1211. doi:10.1007/s11538-006-9131-7
nontrivial. Cellular behaviors are specified in an indi- Ouchi NB, Glazier JA, Rieu J, Upadhyaya A, Sawada Y
rect or phenomenological manner via the Hamiltonian (2003) Improving the realism of the cellular Potts model
in simulations of biological cells. Physica A 329(3–4):451–
and the modified Metropolis algorithm. Consequently, 458
the relation between the parameters that control the Savill NJ, Hogeweg P (1997) Modelling morphogenesis: from
dynamics of the CPM and the biological-physical single cells to crawling slugs. J Theor Biol 184(3):229–235
quantities they represent often remains allusive. Starruß J, Bley T, Sogaard-Andersen L, Deutsch A (2007) A new
mechanism for collective migration in Myxococcus xanthus.
Despite these limitations, the CPM formalism has J Stat Phys 128(1–2):269–286
found applications in many topics, mainly in the field Zajac M, Jones GL, Glazier JA (2003) Simulating convergent
of developmental biology. Its spatial and cell-centered extension by way of anisotropic differential adhesion.
nature renders it suitable for the study of phenomena J Theor Biol 222:247–259
where a mesoscopic description of individual cell
shape and motility is important. It provides a flexible
modeling framework that allows incorporation of
problem-specific extensions. Moreover, coupling the
CPM to auxiliary model formalisms enables the explo- Cellular Spaces
ration of the complex interplay between several factors
at different biological scales, acting at the intracellular, ▶ Cellular Automata
the intercellular, and the tissue level.
Tamaki Fujimori
Human Metabolome Technologies Inc., Tsuruoka,
References
Yamagata, Japan
Balter A, Merks RMH, Poplawski NJ, Swat M, Glazier JA
(2007) The Glazier-Graner-Hogeweg model: extensions,
future directions, and opportunities for further study. In: Synonyms
Anderson ARA, Chaplain MAJ, Rejniak KA (eds) Single
cell-based models in biology and medicine, Mathematics
and biosciences in interaction. Birkh€auser Verlag, Basel, Capillary electrophoresis mass spectrometry–based
pp 151–167 metabolomics
CE-MS-based Metabolomics 391 C
Definition property. The migration speed of metabolites is deter-
mined by hydrated ionic radius and valence. Because
Capillary electrophoresis (CE) mass spectrometry even the metabolites with the same mass can be sep-
(MS) is an analytical technique. CE is superior in arated based on the nature of CE, the separation of
separation efficiency of ionic metabolites. MS can pro- structural isomers is possible. For example, glucose
vide high sensitivity. The combination of CE with MS 1-phosphate, glucose 6-phosphate, and fructose
can lead to achieve the analytical platform with high 6-phosphate can be easily separated in CE. The CE C
separation capacity, resolution, and sensitivity. CE-MS analysis is also advantageous in terms of small vol-
has been often used for measurements of ionic metab- ume requirement (a few nanoliters) for sample
olites in metabolomics. CE-MS analysis is not suited to injection.
measurements of neutral metabolites. CE-MS analysis
is a complementary tool to reversed-phase LC-MS TOFMS
analysis which is specific to measurements of neutral Time-of-flight mass spectrometry (TOFMS) is often
metabolites. used as MS in CE-MS analysis. Accelerated ions by
applied voltage reach the detector under high vacuum.
Ions with low m/z reach the detector earlier than those
Characteristics with high m/z. The value of m/z is calculated on the
basis of the arrival time. TOFMS provides relatively
Selection of Analytical Machines high resolution, which enables estimation of composi-
Most of the metabolites in primary metabolism are tion formula.
ionic, hydrophilic, and polar. For example, metabo-
lites of glycolysis, TCA cycle, amino acid metabo- Peak Annotation
lism, and nucleotide synthesis show these Detected peak information including migration time
characteristics. Therefore, CE-MS analysis is suitable in CE and m/z is obtained in CE-TOFMS analysis.
for measurements of metabolites in primary metabo- Peaks of putative metabolites are assigned to those of
lism. Metabolomics using CE-MS analysis could lead known metabolites according to the peak informa-
to understandings of biochemical and biological tion. Human Metabolome Technologies Inc. pos-
mechanisms in primary metabolism. On the contrary, sesses library information about migration time and
CE-MS analysis is not suited to measurements of m/z of about 900 metabolites. These metabolites can
neutral metabolites because they cannot be separated be identified and quantified in reference to library
in CE. LC-MS is suitable for measurements of neutral information. Most amino acids, organic acids, sugar
metabolites. CE-MS analysis is a complementary tool phosphates, and nucleotides are included in the
to LC-MS analysis. Whether the usage of CE-MS or library.
LC-MS is selected in metabolomics depends on the
target metabolites. Metabolomics Application
CE-MS based metabolomics is applied to studies in
CE many fields including basic sciences, medical sci-
CE separates polar metabolites on the basis of their ences, nutrition, agriculture, and toxicology.
mass-to-charge ratios. The separation capacity of CE is Metabolomics could lead to elucidating of gene,
superior to that of gas chromatography (GC) or liquid enzyme, biomarker, and metabolic network. Specifi-
chromatography (LC). Although the sensitivity is low cally, CE-MS based metabolomics can depict the
in CE, it can be sufficiently compensated by high details of primary metabolism which consists of
sensitivity in MS. Low-molecular-weight metabolites many ionic metabolites. Several applications are
in biological samples can be analyzed in two modes for introduced below.
cationic and anionic metabolites. Soga et al. identified an oxidative biomarker in
After sample injection, the voltage is applied to hepatotoxicity by CE-MS analysis. The changes in
the capillary. Cationic metabolites migrate toward the liver metabolites following acetaminophen-induced
cathode side and anionic metabolites migrate toward hepatotoxicity were examined. Ophthalmic acid
the anode side on the basis of electrochemical increased in conjunction with glutathione
C 392 Cene
1 X X
lðGÞ ¼ splðu; u0 Þ; (1)
jV j ðjV j 1Þ u2V u0 2Vnfug
CFSE
where spl(u,u0 ) gives the number of edges in a shortest
▶ Modeling, Cell Division and Proliferation path between vertices u and u0 . In case of an
unconnected graph, the characteristic path length is
infinite, as the number of edges between two
unconnected vertices is considered infinite. In this
Chain Type case, formula (1) is often modified to sum over just
all connected vertex pairs. Alternatively, other mea-
▶ IMGT-ONTOLOGY, ChainType sures such as the harmonic mean or the average inverse
path length can be used.
and
Chemical Master Equation
Hao Ge1 and Hong Qian2 vji ¼ the change in the number of Si molecules
1
School of Mathematical Sciences and Centre for caused by one Rj reaction; ðj ¼ 1; ; M; i ¼ 1; ; NÞ:
Computational Systems Biology, Fudan University, (2)
Shanghai, China
2
Department of Applied Mathematics, University of
Washington, Seattle, WA, USA The stoichiometry vector {vji} can be obtained from
the difference between the numbers of a molecular
species that are consumed and produced in the
Synonyms reaction.
Exact description of the rate function associates
Gillespie algorithm; Stochastic chemical kinetics with a reaction that can be either from phenomenolog-
ical models for stochastic chemical kinetics or from
Definition more fundamental molecular physics based on the
concept of elementary reactions. In general, the func-
Chemical master equation is the stochastic counterpart tion rj has the mathematical form:
of the chemical kinetic equation based on the law of
mass action. It describes the kinetics of chemical rj ðxÞ ¼ kj hj ðxÞ: (3)
reactions in a rapidly stirred tank with small volume
in terms of stochastic reaction times giving rise to
fluctuating copy numbers of reaction species. Here kj is the specific probability rate constant for
Consider a system of fixed volume V at constant reaction Rj, which is defined such that kjdt is the
temperature T. Let there be well-stirred mixture of probability that a randomly chosen combination of
N 1 molecular species {S1, . . . ,SN} and M 1 the reactants of Rj will react accordingly in the next
reactions {R1, . . ., RM}. One specifies the dynamical infinitesimal time interval dt. kj is intimately related to
state of this system by X(t) ¼ (X1(t), . . ., XN(t)), where the rate constants in the traditional law of mass action
Xi(t) is the copy number of molecular species Si in the kinetics.
system at time t. The function hj(x) in Eq. 3 measures the number
One describes the time evolution of X(t) from some of distinct combinations of Rj reactant molecules
given initial state X(t0) ¼ x0. Both single-molecule available in the state x. It is a combinatorial factor
experimental measurements and theoretical investiga- that can be easily obtained from the reaction Rj, i.e.,
QN xk !
tions have shown that X(t) is a stochastic process hj ðxÞ ¼ k¼1 mjk !ðxk mjk Þ!. In the transitional mass-
because the time at which a particular reaction occurs
action kinetics, this term is related to the product of
is random.
the concentrations of all reactants.
The chemical master equation kinetics assumes that
In general, for a chemical reaction
the system is well stirred such that at any moment each
reaction occurs with equal probability at any position in
space. Furthermore, it assumes that for each reaction Rj, Rj : mj1 S1 þ þ mjN SN ! nj1 S1 þ þ njN SN ; (4)
there is a corresponding rate function rj and a stoichi-
ometry vector vj ¼ (vj1, . . ., vjN), which are defined as
one has
rj ðxÞdt ¼ the probability; given XðtÞ ¼ x;
that one reaction Rj will occur some where
vji ¼ nji mji ;
in the next infinitestimal time interval ½t; t þ dtÞ;
Y
N
xk ! (5)
ðj ¼ 1; ; MÞ: rj ðxÞ ¼ kj
m
k¼1 jk
!ðx k mjk Þ!
(1)
Chemical Master Equation 397 C
If for any k and j, have xk >> mjk, then and now derive its evolutionary equations, i.e., the
approximately chemical master equation.
We take a time increment dt and consider the
variation between the probability of X(t) ¼ x and of
Y
N
xk mjk
r j ð xÞ ¼ k j (6) X(t + dt) ¼ x. This variation is
k¼1
mjk !
Pðx; t þ dtÞ Pðx; tÞ ¼ Increasing of the probability in dt C
Then Decreasing of the probability in dt
(9)
0 1
rj ðxÞ B n1 C N
B kj V C Y xk mjk XN We take dt so small such that the probability of
¼BN C ; n¼ mjk : (7) having two or more reactions in dt is negligible com-
V @Q A k¼1 V
mjk ! k¼1 pared to the probability for only one reaction. Then
k¼1 increasing of the probability in dt occurs when
a system with state X(t) ¼ x vj reacts according to
is precisely the reaction flux per unit volume in the Rj in (t, t + dt), the probability of which is rj(x vj)dt.
mass-action kinetics; the (xk/V) is the concentration of Thus,
0 k V n1
species k, and the term in the parenthesis, kj ¼ Q
j
N ,
mjk ! X
M
k¼1 Increasing of the probability in dt ¼ rj ðx vj ÞPðx vj ; tÞdt:
j¼1
is the reaction rate constant in the traditional chemical
kinetics. Here, kj is regarded as “stochastic reaction (10)
0
constant,” while kj is the corresponding reaction
constant for Law of Mass Action. Similarly, when a system with state X(t) ¼ x reacts
The reaction rate constant is deduced only from according to any reaction channel Rj in (t, t + dt), the
experiments, or calculated based on Kramers-Marcus probability P(x, t) will decrease. Thus,
theory. It is usually a function of temperature but
is independent of the volume of a reaction system. X
M
However, the stochastic reaction constant kj depends Decreasing of the probability in dt ¼ rj ðxÞPðx; tÞ dt:
j¼1
on the system volume and the temperature.
From the rate function given from above, the (11)
state vector X(t) is a Markov jump process on the
nonnegative N-dimensional integer lattice. Substituting Eqs. 10 and 11 into Eq. 9, we obtain
Any Markov process can be described by two
completely different, complementary mathematical X
M
Pðx; t þ dtÞ Pðx; tÞ ¼ rj ðx vj ÞPðx vj ; tÞ dt
methods. One follows its stochastic trajectories and j¼1
one consider its probability distribution function (12)
X
M
changing with time. The Gillespie algorithm gives rj ðxÞPðx; tÞ dt;
the former, and the chemical master equation is for j¼1
the latter.
In the chemical master equation perspective, one no which yields, with the limit dt ! 0, the chemical
longer asks what are the copy number (or concentra- master equation:
tion) of species i at time t, but rather what is the
probability of the system having xi copies of species @ XM
Pðx; tÞ ¼ rj ðx vj ÞPðx vj ; tÞ dt
Xi at time t. @t j¼1
We focus on the probability
X
M
rj ðxÞPðx; tÞ dt: (13)
Pðx; tÞ ¼ PrfXðtÞ ¼ xg; (8) j¼1
C 398 Chemical Master Equation
interval, even though it can be as long as one likes: The r0 ðxÞ ¼ rk ðxÞ (18)
k¼1
e is a function of T; greater the T, larger the e.
Compare the solution to the chemical master The key step of this simulation method is to
equation with initial distribution concentrated at x0, generate the pair of numbers (t, j) in accordance with
P(x,t|x0) and the deterministic kinetics c(t,x0); if the the probability Eq. 18: First generate two random
latter approaches to a steady-state value then the numbers a1 and a2 from the uniform distribution in
former cumulates probability at the steady state. In the unit interval, and then take
general, there is an agreement between the steady
states of the deterministic kinetics and the peaks of lnða1 Þ
the stationary distribution of the chemical master t¼ ;
r0 ðxÞ
equation.
However, in the limit of infinitely large volume, the X
j
situation is quite different: The peaks of the stationary j ¼ the smallest integer satisfying ri ðxÞ > a2 r0 ðxÞ
distribution of chemical master equation may not i¼1
be consistent with the stable fixed points of the (19)
Chi-Squared Test 399 C
Below are main steps in the stochastic simulation Vellela M, Qian H (2007) Quasistationary analysis of
algorithm. a stochastic chemical reaction: Keizer’s paradox. Bull Math
Biol 69:1727–1746
1. Initialization: Let the initial state as X ¼ X0, and Vellela M, Qian H (2009) Stochastic dynamics and
t ¼ t0. nonequilibrium thermodynamics of a bistable chemical sys-
2. Simulation: Generate a pair of random numbers tem: the Schl€
ogl model revisited. J R Soc Interface 6:925–940
(t, j) according to the probability density function
(17). C
3. Update: Increase the time by t, and replace the ChIP
molecule numbers by X + vj.
4. Iterate: Go back to Step 2 unless the end of the ▶ Chromatin Immunoprecipitation
simulation procedure.
The exact stochastic simulation algorithm consists
with the chemical master equation and gives an exact
sample trajectory of the real system. ChIP-on-Chip Assay
Nobuo Shimamoto
Faculty of Life Sciences, Kyoto Sangyo University,
Cross-References
Kyoto, Japan
▶ Fokker–Planck Equation
▶ Law of Mass Action
▶ Markov Chain
Definition
▶ Stochastic Differential Equation
ChIP-on-CHIP Assay is a method to determine the
location of the genomic DNA to which a DNA-binding
protein binds. The protein is cross-linked to DNA,
References usually in vivo, and the DNA is fragmented.
The DNA fragments cross-linked with the protein are
Deard DA, Qian H (2008) Chemical biophysics: quantitative then immunoprecipitated, and hybridized to DNA chip
analysis of cellular systems, Cambridge texts in biomedical
engineering. Cambridge University Press, Cambridge
or directly sequenced to determine the location.
(Chap 11)
Ge H, Qian H (2009) Thermodynamic limit of a nonequilibrium
steady-state: Maxwell-type construction for a bistable Cross-References
biochemical system. Phys Rev Lett 103:148103
Gillespie DT (1977) Exact stochastic simulation of coupled
chemical reactions. J Phys Chem 81:2340–2361 ▶ Transcription in Bacteria
Kurtz TG (1972) The relationship between stochastic and
deterministic models for chemical reactions. J Chem Phys
57:2976–2978
Lei JZ (2010) Stochastic modeling in systems biology. J Adv
Chi-Squared Test
Math Appl (to appear)
Liang J, Qian H (2010) Computational cellular dynamics Larissa Stanberry
based on the chemical master equation: a challenge for Bioinformatics and High-throughput Analysis
understanding complexity. J Comput Sci Technol (Rev)
25:154–168
Laboratory, Seattle Children’s Research Institute,
McQuarrie DA (1967) Stochastic approach to chemical kinetics. Seattle, WA, USA
J Appl Probab 4:413–478
Qian H, Bishop LM (2010) The chemical master equation
approach to nonequilibrium steady-state of open biochemical
systems: linear single-molecule enzyme kinetics and
Synonyms
nonlinear biochemical reaction networks. Int J Mol Sci
11(9):3472–3500 Pearson’s chi-squared test
C 400 Chondrocytes
Definition References
The chi-squared test of goodness of fit evaluates Agresti A (2002) Categorical data analysis. Wiley, Hoboken
whether the observed frequencies differ from theoret-
ical values.
Chondrocytes
Conclusions
The chi-square test provides evidence of association
between the variables; however, it does not indicate Chorioallantoic Membrane (CAM) Assay
how the variables relate to each other. To truly under-
stand the nature of the association, the chi-squared test Marsha A. Moses
alone is not sufficient and should be followed by more Department of Surgery/Harvard Medical School,
detailed analysis, i.e., residual analysis, decomposi- Vascular Biology Program/Children’s Hospital
tion, odds ratios, etc. The chi-squared test relies on Boston, Boston, MA, USA
asymptotic distribution and is applicable only when
the cell frequencies are sufficiently large. For small
sample sizes, the Fisher’s Exact Test can be used. Definition
Furthermore, the chi-squared test is invariant with
respect to reodering of rows and columns. Hence, if The chorioallantoic membrane (CAM) assay is one of
one of the variables is ordinal, the appropriate statistics the earliest in vivo angiogenesis assays to be devel-
that respect the ordinality should be chosen. oped. The CAM of fertilized chicken eggs is
Chromatin Immunoprecipitation 401 C
challenged with either potential stimulators or
inhibitors of neovascularization by placement of Chromatin
polymers or other delivery systems impregnated
with the test substances at a site on the CAM imme- Vani Brahmachari and Shruti Jain
diately above the subectodermal plexus (Ausprunk Dr. B. R. Ambedkar Center for Biomedical Research,
et al. 1975). Zones of vessel clearance indicate University of Delhi, Delhi, India
anti-angiogenic activity, whereas increased vessel C
development in a spokewheel configuration indicates
angiogenic stimulation (Fernandez et al. 2003; Synonyms
Moses et al. 1990).
Compact DNA, Nucleosomes
Cross-References
Definition
▶ Neovascularization
The genome which is a linear DNA molecule is very
long relative to the size of the nucleus in the cells. For
References example, it is estimated that all the 23 pairs of
chromosomes present in the human cells if completely
Ausprunk DH, Knighton DR, Folkman J (1975) Vascularization stretched and lined up end to end, it will extend over
of normal and neoplastic tissues grafted to the chick chorio-
2 m, which is of many orders more than the size of the
allantois. Role of host and preexisting graft blood vessels.
Am J Pathol 79:597–618 nucleus in the cell. However, the DNA is wrapped
Fernandez CA, Butterfield C, Jackson G, Moses MA (2003) Struc- around a ball of proteins called the histones and further
tural and functional uncoupling of the enzymatic folded to be accommodated in the nucleus. This
and angiogenic inhibitory activities of tissue inhibitor of
metalloproteinase-2 (TIMP-2). J Biol Chem 278:40989–40995
compact DNA-protein complex is called chromatin.
Moses MA, Sudhalter J, Langer R (1990) Identification of an The DNA-protein complex forms repetitive units called
inhibitor of neovascularization from cartilage. Science ▶ nucleosomes. The compaction state of chromatin is
248:1408–1410 correlated to the transcriptional activity of DNA. The
chromatin provides the platform for interaction with
various regulatory factors that control transcription.
Choristoma
Cross-References
Barbara J. Davis
Section of Pathology, Tufts Cummings School of ▶ Epigenetics
Veterinary Medicine Biomedical Sciences, ▶ Nucleosomes
North Grafton, MA, USA
A heterotopic rest of cells in normally organized tissue Vani Brahmachari and Shruti Jain
in an abnormal site – common to find pancreatic tissue Dr. B. R. Ambedkar Center for Biomedical Research,
in the liver. University of Delhi, Delhi, India
Cross-References Synonyms
Chromatin
Immunoprecipitation,
Fig. 1 Outline of chromatin Sonicated to break up the cross-
immunoprecipitation (ChIP). Protein-DNA cross-linked linked chromatin to 500-200base
Immunoprecipitation with pair fragments
with formaldehyde &
anti-H3K4me is shown as an These fragments are not
Chromatin isolated precipitated as they do
example. The modification on
not have modified histone
histone is marked as a star H3 (marked as star).
Immunoprecipitated with
antibodies directed against histone
H3 methylated at lysine 4
Definition References
Chromatin immunoprecipitation (ChIP) is a type of Collas P (2010) The current state of chromatin immunoprecipi-
tation. Mol Biotechnol 45:87–100
immunoprecipitation technique used to analyze the pro-
tein-protein and DNA-protein interactions (Collas 2010).
It aims to determine whether specific proteins, like tran-
scription factors are associated with specific genomic
regions like promoter elements or regulatory elements.
It can also be used to determine the specific histone Chromatin Proteins
(▶ epigenetics) modifications associated at the target
sites in the genome, indicating gene expression profiles. ▶ Histones
The steps involved are as follows (Fig. 1):
(a) The cells are taken and the protein-DNA
complexes (chromatin) are crosslinked by formal-
dehyde treatment.
(b) The DNA-protein complexes are then sheared and Chromodomain Helicase DNA Binding
DNA fragments associated with the protein(s) of (CHD)
interest are selectively immunoprecipitated using
antibody against the specific protein. Tetsuro Kokubo
(c) The associated DNA fragments are purified and Department of Supramolecular Biology, Graduate
sequences are identified by amplification or DNA School of Nanobioscience, Yokohama City
sequencing reactions. University, Yokohama, Kanagawa, Japan
Cross-References Definition
Chromodomain Helicase DNA Binding (CHD), Table 1 Remodeler composition and orthologous subunits
Organisms
Family and composition Yeast Fly Human
SWI/ Complex SWI/SNF RSC BAP PBAP BAF PBAF
SNF ATPase Swi2/Snf2 Sth I BRM/Brahma hBRM or BRGI
BRGI
Noncatalytic Swi1/Adr6 OSA/ BAF250/
homologous subunits eyelid hOSAI
Polybromo BAF 180
BAP170 BAF 200
Swi3 Rsc8/Swh3 MOR/BAP155 BAF155, BAF170
Swp73 Rsc6 BAP60 BAF60a or b or c
Snf5 Sfh1 SNRI/BAP45 hSNF5/BAF47/INII
BAPI11/dalao BAF57
Arp7, Arp9 BAP55 or BAP47 BAF53a or b
Actin β-actin
a b
Unique
ISWI Complex ISW1a ISW1b ISW2 NURF CHRAC ACF NURF CHRAC ACF
ATPase Isw1 Isw2 ISWI SNF2L SNF2Hc
Noncatalytic Itc1 NURF301 ACF1 BPTF hACFI/
homologous subunits WCRF180
CHRAC14 hCHRAC17
CHRAC16 hCHRAC15
NURF55/ RbAp46
p55 or 48
Unique Ioc3 Ioc2, NURF38
Ioc4
CHD Complex CHD1 CHD1 Mi-2/NuRD CHD1 NuRD
ATPase Chd1 dCHD1 dMi-2 CHD1 Mi-2α/CHD3,
Mi-2β/CHD4
Noncatalytic dMBD2/3 MBD3
homologous subunits dMTA MTA1,2,3
dRPD3 HDAC1,2
p55 RbAp46 or 48
p66/68 p66α,β
Unique DOC-17
(continued)
C 404 CIA
References
Synonyms
Asf1; CIA CIA/Asf1, Fig. 1 Crystal structures of CIA/Asf1 and its histone
complex. (a) Crystal structure of CIA (PDB code: 1ROC). (b)
Crystal structure of the CIA–histone-H3–H4 complex (PDB
code: 2IO5). CIA, H3, and H4 are shown in red, blue, and
Definition green, respectively
00:00
22:30
Midnight 02:00
Bowel movements supperssed Deepest sleep
21:00
Melatonin secretion starts
04:30
19:00 Lowest body temperature
Highest body temperature
Highest blood pressure 18:30
18:00 06:00
Circadian Rhythm, Fig. 1 Diagram of the circadian patterns of a typical human (Smolensky and Lamberg 2000)
Classification 407 C
References data. Roughly speaking, this means that C should be
able to classify so far unseen instances x 2 X correctly.
Forger DB, Peskin CS (2003) A detailed predictive model of the Apart from predictive accuracy, other criteria may of
mammalian circadian clock. Proc Natl Acad Sci USA
course play a role. For example, interpretable classi-
100:14806–14811
Reppert SM, Weaver DR (2002) Coordination of circadian fiers such as rule-based models (▶ Rule-based
timing in mammals. Nature 418:935–941 Methods) are often preferred to “black box” models
Smolensky M, Lamberg L (2000) The body clock guide to better delivering predictions which are not comprehensible C
health: how to use your body’s natural clock to fight illness
by a human user. Besides, the efficiency and scalability
and achieve maximum health. Henry Holt and Company,
New York of the underlying learning algorithms is of major
importance.
means that the true risk of a classifier C is much higher classifier that discriminates this class (considered as
than its empirical risk. the positive class) from all other classes (considered as
The danger of “over-fitting” typically increases negative). At prediction time, each binary classifier
with the flexibility (capacity) of the underlying predicts whether the query input is positive or not.
▶ hypothesis space H, that is, the set of candidate Tie breaking techniques (typically based on confidence
models from which a classifier C is chosen. This flex- estimates for the individual predictions) are used in the
ibility depends on the classification method or, more case of a conflict, which may arise when no class or
precisely, on the underlying model representation. more than one class is predicted.
▶ Linear models, for example, separate classes by
fitting a hyperplane in the instance space. They are Ordinal and Hierarchical Classification
less flexible than artificial neural networks, ▶ decision In standard classification, the set of class labels Y does
trees, or nearest neighbor methods, which can repre- not have any internal structure. In many applications,
sent highly nonlinear decision boundaries. The restric- however, the classes are not completely unordered.
tion to certain types of decision boundaries is called the Two important special cases are a total order
representation bias of a classification method. It (y1 y2 . . . yk ) and a hierarchy, giving rise,
largely determines the inductive bias of the method, respectively, to ordinal classification (also called ordi-
which, informally speaking, produces a preference for nal regression in statistics) and hierarchical classifica-
specific types of models. A bias of that kind is needed tion. As examples consider, respectively, learning to
to enable a choice between a potentially infinite num- predict the expression of a gene on the scale
ber of mappings X ! Y which explain the training Y ¼ {underexpressed, normal, overexpressed} and
data equally well. Implementing a suitable inductive the functional class of a protein according to the hier-
bias and finding a proper trade-off between the com- archical EC nomenclature for enzymes.
plexity of a model and its performance on the training From a learning point of view, the structure of Y is
data are key prerequisites for successful learning additional information that a learner should try to
(Bishop 2006). exploit, and this is what existing methods for ordinal
and hierarchical classification essentially seek to do.
Binary Versus Multi-class Classification The basic assumption in this regard is that the structure
The simplest type of classification problem is of Y is reflected in the topology of the class distribution
binary (dichotomous) classification, in which Y con- in the instance space X . Moreover, corresponding
sists of only two classes, typically referred to as the algorithms normally seek to minimize loss functions
positive and negative class, respectively. For such other than the simple 0/1 loss, which is arguably inap-
problems, a multitude of efficient and theoretically propriate for a structured output space Y. For example,
well-founded classification methods exists. In fact, despite being wrong, predicting the functional class
the representation of models is often geared toward a-amylase (EC 3.2.1.1) for a protein that actually
the binary case, and sometimes even restricted to belongs to the class of b-amylases (EC 3.2.1.2) is still
this problem class. For example, methods such as better than predicting a phosphorylase (EC 2.4.1.1) –
▶ logistic regression and support vector machines the former two classes are both glycosidases, and
produce decision boundaries in the form of a separat- hence functionally related.
ing hyperplane that can only divide the instance space
into two parts. Feature Representation
Other methods, such as decision trees and the naı̈ve Instances x 2 X are normally described in terms of an
Bayes classifier, are not restricted in this way. Instead, attribute-value representation or “feature vector”,
they are directly applicable to the case of multi-class which means that x is a vector ðx1 ; . . . ; xm Þ and X the
(polytomous) classification, that is, problems involv- Cartesian product X 1 . . . X m , with X i the domain
ing more than two classes. Algorithms that are inher- of the i-th attribute (i ¼ 1; . . . ; m). In fact, many
ently binary can still be used in a multi-class setting, methods for classification as well as widely used soft-
namely, through class binarization techniques. ware systems, such as the WEKA machine learning
A popular example is the one-against-rest binarization, toolbox (Hall et al. 2009), expect exactly this type of
where one takes each class in turn and learns a binary representation.
Classification 409 C
Prior to actually training a classifier, this represen- methods can be used on a “meta level,” more or less
tation is normally subjected to a number of independently of the underlying classifier.
preprocessing steps, including normalization, dimen- An important example of this type of approach is
sionality reduction (like ▶ Principal Component Anal- ▶ ensemble learning, where a potentially large number
ysis (PCA)), and ▶ feature selection or weighing. The of classifiers is trained instead of a single one (Rokach
selection of a small to moderate number of attributes 2010). At prediction time, each member of the ensem-
(features) is important for many learning algorithms, ble is queried, and the predictions thus obtained are C
the performance of which may deteriorate in the pres- combined in one way or the other, for example, by
ence of irrelevant, noisy, or simply too many features. means of a simple voting scheme or so-called stacking,
This aspect is especially critical in applications like where the optimal combination itself is formalized as
gene expression analysis, where the number of features a classification problem. In order to generate
(the genes) may largely exceed the number of a (diverse) ensemble of classifiers, different methods
instances (e.g., different types of tissue). can be used, including resampling methods such as
▶ bagging and boosting.
Classification of Structured Data
Especially in the life sciences, however, a feature rep- Classification in Systems Biology
resentation of the above kind is often not natural, and Systems biology offers a multitude of applications for
representing an object in terms of a fixed number of classification methods. For example, classification has
predefined features may come along with a considerable already been used for the construction and analysis of
loss of information. Instead, other representations, such metabolic networks, signaling and protein interaction
as sequences and graphs, appear to be more appealing. networks, gene regulatory networks, the analysis of
A small molecule, for example, is naturally represented microarray data, and the prediction of the influence
in terms of a graph, with each atom corresponding to of toxins on biochemical pathways (Muggleton 2005;
a node, and a bond between a pair of atoms modeled as Larranaga et al. 2006; Fogel et al. 2007). The use of
an edge; likewise, biological networks are commonly powerful predictors for protein–protein or protein–
represented as graphs or hypergraphs. Another example ligand interactions can help to reduce and organize
is modeling molecular structures on the sequence level, the “wet” laboratory work necessary to verify such
which comes down to representing them in terms of interactions in order to construct interaction networks.
a sequence over a finite number of symbols, where Moreover, classification tasks arise when extracting
different sequences can differ in length. the information contained in huge biological networks
Dedicated classification algorithms for handling this (Rapaport et al. 2007).
type of data have been developed in the recent years.
Especially interesting in this regard is kernel-based
machine learning (▶ Learning, Kernel-based), includ- Cross-References
ing support vector machines, which allows for general-
izing (“kernelizing”) conventional methods by means of ▶ Bagging
a mathematical construct called kernel function (Shawe- ▶ Data Mining–based Transcriptional Regulatory
Taylor and Christianini 2004). A kernel function K is Network Construction
a mapping X X ! satisfying special properties, ▶ Decision Tree
and Kðx; x0 Þ can often be interpreted as a degree of ▶ Ensemble
similarity between the instances x and x0 . A kernel of ▶ Feature Selection
that kind can also be defined for structured data objects ▶ Learning, Kernel-Based
like graphs and sequences, thereby making kernel-based ▶ Learning, Supervised
algorithms amenable to this type of data. ▶ Linear Model
▶ Logistic Regression
Ensemble Learning ▶ Overfitting
In recent years, several methods for improving predic- ▶ Principal Component Analysis (PCA)
tive performance have been developed that go beyond ▶ Regression Analysis
the learning of a single classifier. Instead, such ▶ Rule-based Methods
C 410 Classification Assessment and Optimization
Clinical Aspects of the Toponome Imaging System (TIS), Fig. 1 A TIS comparison of CMPs between cancerous colon cells
(on the left) and normal colon cells in the same patient (on the right) (Bhattacharya et al. 2010)
Diseased and nondiseased tissues will show unique dermatitis exhibit distinct protein clusters: Some of
and different CMP patterns, and thus the patterns can these clusters are specific to either psoriasis or atopic
act like “fingerprints” in further comparative analysis dermatitis, whereas others are common to both condi-
of other tissue. An example of these different CMP tions (Schubert et al. 2006). If the clinician knows the
patterns can be seen in Fig. 1. The pictures show the unique “fingerprint” of each disease, only a small skin
CMP clusters displayed in color over a visual image sample is needed to give a clear diagnosis. Disease
(Phase) and over a fluorescent image (DAPI). The left- fingerprinting will also aid the clinician in situations
hand side shows colon cancer tissue, and the right- of uncertain or limited clinical symptoms, helping to
hand side normal colon tissue (both tissues are from prevent situations of misdiagnosis.
the same patient). In this example, 21 proteins were Another example of the clinical possibilities for
used (given acronyms such as PCNA, Bax, etc.) and it using TIS is in cancer studies. In cancer, the ability to
can be seen for each color of protein cluster which look at the tissue for the visible hallmarks of malig-
proteins contribute (shown as a 1 in the table). Lead nancy alongside molecular phenotype is critical, in
proteins (L) are present in all the main CMPs, while particular, as it enables the examination of single
other proteins may be absent in the clusters (A). Wild cells, which may be cancer cells that are invading,
card proteins (W) are present in some but not all dividing, or could be immune or stromal cells. Because
protein clusters and may have key significance in the TIS can detect individual cell protein patterns, a very
cell networks. small number of affected cells can give a positive
Using TIS analysis is a clinically powerful method cancer result; this enhances the chances of early detec-
because, as well as revealing which proteins are pre- tion, especially because cancer cells show very distinc-
sent and where, it indicates how they are co-located, tive differences in their topology in comparison to
giving further information about particular cells and normal, noncancerous cells. While much is known
subcellular regions. Importantly, being able to use about the biochemical pathways in cancers, an under-
a large number of biomarkers allows subtle differences standing is needed in cell organization of functions,
between similar samples to be determined. For exam- and this can be achieved using ▶ toponome analysis to
ple, compared to normal skin, psoriasis and atopic create insight into the key protein networks driving the
C 414 Clinical Aspects of the Toponome Imaging System (TIS)
disease. Schubert et al. (2009) analyzed prostate cancer can be found, for example, by also using mass spec-
tissue, determining a mosaic of distinct protein clusters trometry or gene analysis. Despite having pre-decided
that were specific to the observed cancerous cells. on the proteins to investigate, TIS remains a tool for
Using disease-specific CMP knowledge, two routes discovery science because previously unsuspected pro-
can now be taken: The first to target lead proteins in tein interactions may still be identified. Combining TIS
the development of the cancer with specifically with new knowledge from genomics, transcriptomics,
designed drug therapies, the second to aid the early and proteomics allows various differentially expressed
detection of cancer development. In instances where proteins to be studied in the context of intact anatomy.
two diseases can have shared but distinct disease path- The promise that TIS can detect clear differences
ways, the subtle differences in lead proteins can aid between normal and unhealthy tissue, even if there
understanding of the triggers for pathogenic events that are few cells and the differences are subtle, opens up
result in cancer. In addition, with the CMP clusters not the prospect of easy, early-stage cancer detection using
likely to occur by chance, a correct diagnosis can still blood or, for colon cancer, stool samples. There is
be made even if the cells are at the early stages of further potential that cancer stem cells will become
cancer formation. As technological advances are detectable, and drug therapies will be developed to
made, these protein clusters can be looked for in specifically target stem cells restricting cancer growth.
other body samples, such as from the blood, enabling With greater understanding of the cell’s network path-
a less invasive method of early cancer detection. ways, drug design can be made more disease specific
Sometimes the knowledge of genomic loci involved and enable more precise targeting, whether this is in
in the pathogenesis of a disease is limited. Because the the form of an analgesic or perhaps a unique drug for
TIS technique can furnish a toponomic picture of what a rare disease.
is occurring in the cell, proteins can be detected assem-
bling on the cell surface. This surface assembly is part
of a complex communication network involved with Cross-References
many different cell functionalities and, by monitoring
these protein networks, key proteins can be determined ▶ TIS Robot
and disease-altered network dynamics detected. This is ▶ Toponome Analysis
valuable in situations where diseases are rare, or there
is limited knowledge of cell protein pathways in
a disease. Given sufficient typical protein clusters, by
References
using comparisons with protein clusters from known
networks, it is possible to predict disease behavior and Bhattacharya S, Mathew G, Ruban E, Epstein D, Krusche A,
effective therapies. Hillert R, Schubert W, Khan M (2010) Toponome Imaging
Using a systems biology approach, by combining System: In situ protein network mapping in normal and
cancerous colon from the same patient reveals more than
the TIS results with proteome analysis and
five-thousand cancer specific protein clusters and their sub-
transcriptome analysis, an interpretation of multiple cellular annotation by using a three symbol code. J Proteome
levels of gene expression can take place leading to Res 9:6112–6125
further analysis, for example, comparing the effect of Linke B, Pierre S, Coste O, Angioni C, Becker W, Maier T,
Steinhilber D, Wittpoth C, Geisslinger G, Scholich K (2009)
drug interactions on different patients. Drugs are gen- Toponomics analysis of drug-induced changes in
erally designed to influence protein interactions and so arachidonic acid-dependent signalling pathways during
the effects of the drug can be easily detected using TIS: spinal nociceptive processing. J Proteome Res 8:4851–4859
Changes in the protein networks of synapses due to the Schubert W, Bonnekoh B, Pommer A, Philipsen L,
B€ockelmann R, Malykh Y, Gollnick H, Friedenberger M,
analgesic drug dipyrone have already been observed
Bode M, Dress A (2006) Analyzing proteome topology and
(Linke et al. 2009). function by automated multidimensional fluorescence
While TIS is a powerful and widely applicable microscopy. Nat Biotechnol 24:1270–1278
stand-alone tool, it also works well in complement Schubert W, Gieseler A, Krusche A, Hillert R (2009) Toponome
mapping in prostate cancer: Detection of 2000 cell surface
with other tools as part of a systems biology pipeline.
protein clusters in a single tissue section and cell type specific
The experimenter specifies which proteins or other annotation by using a three symbol code. J Proteome Res
molecules to study using TIS, and new target proteins 8:2696–2707
Closure, Causal 415 C
Closed World Assumption, Table 1 Sample instances about
Clinical Decision Support Systems alumni and their degrees obtained
Alumnus Degree obtained
▶ Biomedical Decision Support Systems Delani PhD in Molecular Biology
Anna PhD in Ecology
Peter MSc in Informatics
Dalila PhD in Genetics C
Clinical Systems Pathology
▶ Clustering Coefficient
Closure, Causal
Closed Chromatin
Definition
▶ Heterochromatin
In biological systems, closure refers to a holistic fea-
ture such that their constitutive processes, operations,
and transformations (1) depend on each other for their
Closed World Assumption production and maintenance and (2) collectively con-
tribute to determine the conditions at which the whole
C. Maria Keet ▶ organization can exist.
KRDB Research Centre, Free University of According to several theoretical biologists, the con-
Bozen-Bolzano, Bolzano, Italy cept of closure captures one of the central features of
biological organization since it constitutes, as well as
evolution by natural selection, an emergent and dis-
Definition tinctively biological causal regime. In spite of an
increasing agreement on its relevance to understand
The Closed World Assumption (CWA) is the assump- biological systems, no agreement on a unique defini-
tion that that what is not known to be true, is false, tion has been reached so far.
so that absence of information is interpreted as
negative information. It assumes that complete infor-
mation about a given state of affairs is provided, Characteristics
which is useful for constraining information and
validating data in an application such as a relational The concept of closure plays a relevant role in biolog-
database. This is contrasted with the Open World ical explanation since it is taken as a naturalized
Assumption. grounding for many distinctive biological dimensions,
C 416 Closure, Causal
as purposefulness, normativity, and functionality taken as a necessary but not sufficient condition to
(Chandler and Van De Vijver 2000). define biological organization. Biological systems, in
The contemporary application of closure to the bio- fact, constitute a subclass of autonomous systems,
logical domain comes from a philosophical and theoret- which realize a specific form of operational closure,
ical tradition tracing back at least to Kant who claimed, which Varela labels, with Humberto Maturana,
in the Critique of Judgment, that biological autopoiesis (▶ Systems, Autopoietic) (Varela 1979).
systems should be understood as natural purposes The specificity of operational closure as autopoiesis is
(Naturzwecke), i.e., systems in which the parts are recip- that, unlike other possible forms, it describes the sys-
rocally causes and effects of the others, such that the tem at the chemical and molecular level, and supposes
whole can be conceived as organized by itself, self- relations of material production among its constituents.
organized. The essence of living system is a form of A crucial distinction is usually made between orga-
internal and circular causality between the whole and the nizational/operational and material closure, where the
parts, distinct from both efficient causality of the phys- latter indicates the absence or incapacity to interact.
ical world and the final causality of artifacts (Kant 1985). While being organizationally closed, biological sys-
One of the most influential contemporary charac- tems are structurally coupled with the environment,
terizations of closure in the biological domain has been with which they exchange matter, energy, and infor-
provided by Francisco Varela (1979). In his account, mation. The concept of biological closure implies then
he builds on an algebraic notion, according to which “a a distinction between two causal levels, an open and
domain K has closure if all operations defined in it a closed one – an issue which have been more explic-
remain within the same domain. The operation of itly addressed by the account proposed by Robert
a system has therefore closure, if the results of its Rosen (Rosen 1991).
action remain within the system (Bourgine and Varela Rosen’s account is based on a rehabilitation and
1992, p. xii).” reinterpretation of the Aristotelian categories of cau-
Applied to biological systems, closure is realized as sality and, in particular, on the distinction between
what Varela labels operational (or organizational) efficient and material cause. Let us consider an
closure, which designates an organization of processes abstract mapping f between the sets A and B, such
such that “(1) the processes are related as a network, so that f: A¼> > B. Represented in a relational diagram,
that they recursively depend on each other in the gen- we have (Fig. 1):
eration and realization of the processes themselves, When applied to model natural systems, Rosen
and (2) they constitute the system as a unity recogniz- claims that the hollow-headed arrow represents mate-
able in the space (domain) in which the processes rial causation, a flow from A to B, whereas the solid-
exist” (Varela 1979, p. 55). headed arrow represents efficient causation,
It should be noted that Varela himself has proposed, a ▶ constraint exerted by f on this flow.
over time, slightly different definitions of operational Rosen’s central thesis is that “a material system is
closure. In addition, more recent contributions have an organism [a living system] if, and only if, it is closed
introduced a theoretical distinction between organiza- to efficient causation” (Rosen 1991, p. 244), whereas
tional and operational closure. Whereas “organiza- a natural system is closed to efficient causation if and
tional” closure indicates the abstract network of only if its relational diagram has a closed path that
relations that define the system as a unity, “operational” contains all the solid-headed arrows. It is worth noting
closure refers to the recurrent dynamics and processes that, unlike the varelian tradition, Rosen takes closure
of such a system (Thompson 2007). as the definition of biological organization.
In Varela’s view, operational closure is closely According to Rosen, the central feature of
related to ▶ autonomy, the central feature of living a biological system consists in the fact that all compo-
organization. More precisely, he enunciates the “Clo- nents having the status of efficient causes are materially
sure Thesis,” according to which “every autonomous produced by and within the system itself. At the most
system is operationally closed” (Varela 1979, p. 58). In general level, closure is realized in biological systems
principle, the class of autonomous systems realizing among three classes of efficient causes corresponding
operational closure is larger than the class of biological to three broad classes of biological functions (▶ Func-
systems. As a consequence, operational closure is tion, Distributed) that Rosen denotes as metabolism
Closure, Causal 417 C
f at least some of the constraints that make work possible.
The cycle is a thermodynamic irreversible process,
which dissipates energy and requires a coupling
between exergonic (spontaneous, which release energy)
and endergonic (non spontaneous, which require
energy) reactions, such that exergonic processes are
constrained in a specific way to produce a work, C
A B which can be used to generate endergonic processes,
which in turn generate those constraints canalizing
Closure, Causal, Fig. 1 Relational diagram, descriprion see exergonic processes. In Kauffman’s terms: “Work
text
begets constraints beget work” (Kauffman 2000).
A complementary account of closure has been pro-
f posed by Howard Pattee, who focused on its informa-
tional dimension (Pattee 1982). In his view, biological
organization consists of the integration of two
intertwined dimensions, which cannot be understood
separately. On the one side, the organization realizes
a dynamic and autopoietic network of mechanisms and
A B Φ
processes, which defines itself as a topological unit,
Closure, Causal, Fig. 2 Relational diagram, descriprion see structurally coupled with the environment. On the
text other side, it is shaped by the material unfolding of
a set of symbolic instructions, stored and transmitted as
genetic ▶ information.
(f: A¼>> B), repair (F: B¼>> f), and replication (B: According to Pattee, the dynamic/mechanistic and
f¼>>F) (Fig. 2). informational dimensions realize a distinct form of
By providing a clear-cut theoretical and formal closure between them, which he labels semantic clo-
distinction between material and efficient causation, sure. By this notion, he refers to the fact that while
Rosen’s characterization explicitly spells out that bio- symbolic information, to be such, must be interpreted
logical organization consists of two coexisting causal by the dynamics and mechanisms that it constrains, the
regimes: closure to efficient causation, which grounds mechanisms in charge of the interpretation and the
its unity and distinctiveness, and openness to material “material translation” require that very information
causation, which allows material, energetic, and infor- for their own production. Semantic closure, as an
mational interactions with the environment. interweaving between dynamics and information, con-
More recently, the scientific work on biological stitutes then an additional dimension of organizational
closure has been developed in various directions closure of biological systems, complementary to the
(Chandler and Van De Vijver 2000). In particular, operational/efficient one.
a thriving research line has specifically focused on
the critical nature of systems realizing closure, which
must maintain a continuous flow of energy and matter Cross-References
with the environment in conditions far from thermo-
dynamic equilibrium. To capture this dimension of ▶ Autonomy
closure, Stuart Kauffman has proposed the notion of ▶ Constraint
Work-Constraint cycle (Kauffman 2000). ▶ Emergence
The Work-Constraint cycle represents an interpreta- ▶ Explanation in Biology
tion of organizational closure that links the idea of ▶ Explanation, Functional
“work” to that of “▶ constraint,” the former being ▶ Function, Distributed
defined, as “constrained release of energy into relatively ▶ Holism
few degrees of freedom.” A system realizes a Work- ▶ Information, Biological
Constraint cycle if it is able to use its work to regenerate ▶ Organization
C 418 Cluster Analysis
References Characteristics
Bourgine P, Varela FJ (1992) Toward a practice of autonomous The main idea behind TIS (TIS robot) (▶ Clinical
systems. MIT Press/Bradford Books, Cambridge, MA
Aspects of the Toponome Imaging System (TIS);
Chandler JLR, Van de Vijver G (eds) (2000) Closure: emergent
organizations and their dynamics. Ann NY Acad Sci 901 ▶ TIS Robot) data clustering is to find a way to interpret
Kant I (1987/1790) Critique of judgment. Hackett, Indianapolis ▶ toponome images without applying thresholds. Thus,
Kauffman S (2000) Investigations. Oxford University Press, a toponome image recorded with an antibody library of
Oxford
N tags is considered a N-dimensional data of fluores-
Pattee HH (1982) Cell psychology: an evolutionary approach
to the symbol–matter problem. Cogn Brain Theory cence grey value patterns g(x,y) ¼ (g1, . . ., gN)(x,y) with
5(4):325–341 (x, y) as pixel coordinates and gj(x,y) denoting the grey
Rosen R (1991) Life itself: a comprehensive inquiry into the value for the j-th tag at pixel (x, y). ▶ Clustering pro-
nature, origin, and fabrication of life. Columbia University
vides a valuable method for partitioning this data into
Press, New York
Thompson E (2007) Mind in life: biology, phenomenology, groups of similar grey value patterns. This partitioning
and the sciences of mind. Harvard University Press, has two aims: First, clustering achieves a data reduc-
Cambridge, MA tion, so that data sets can be analyzed and compared on
Varela FJ (1979) Principles of biological autonomy. North
a meta-level. Second, one wants to address the fact that
Holland, New York
similar co-location patterns may also belong to the
same functional group or to the same hierarchically
organized network. If cluster analysis is to be applied
Cluster Analysis to TIS data, the following aspects have to be
considered.
Tim W. Nattkemper Like in any other clustering application, it is not
Biodata Mining Group, Faculty of Technology, possible to define a criterion for an optimum cluster
Bielefeld University, Bielefeld, Germany result, since clustering tries to achieve two opposing
goals: connectedness and compactness (Handl et al.
2005). Transferred to the context of toponomics data
Synonyms ▶ clustering this can be seen as the conflict between
the two ideas:(1) to generate large clusters that repre-
Vector quantization sent one frequent co-fluorescence pattern with some
variation, and (2) to form small compact clusters that
contain only those proteins which are utmost similar.
Definition Cluster analysis algorithms can be divided into the
following two groups.
A large and high-dimensional TIS (TIS robot) (▶ Clin- Hierarchical cluster analysis methods organize the
ical Aspects of the Toponome Imaging System (TIS); input data (i.e., the fluorescence grey value patterns of
▶ TIS Robot) data set is reduced to a small set of protein one or two TIS images) into a tree structure exposing
co-location prototypes without the application of thresh- the relationships from the most similar to the most
olds. Different indices are proposed to evaluate the different fluorescence grey value patterns. In contrast,
▶ clustering result or to estimate an appropriate number partitioning or vector quantization cluster algorithms
of clusters in a data set. To allow visual diagnostics and successively assign each pattern g(x,y) to one distinct
to apply biological expert knowledge, a special visual- group, i.e., cluster. In the clustering result, each of the
ization icon is proposed. This way, important molecular clusters is characterized by a typical representative or
co-location patterns can be identified and localized in the cluster center, which is usually referred to as the
the image, which helps in different fields of systems prototype or the codebook vector. The most popular
biology (▶ Systems Biology Pathway Exchange partitioning cluster method is the ▶ K-means
(SBPAX)) like ▶ pathway analysis or protein network approach (Lloyd 1982). The objective is to find a set
analysis or other ▶ toponomics applications. of K prototypes so that the distances between each
Cluster Analysis 419 C
data item and its closest prototype are minimized. a certain range (e.g., [0; 1]) using a linear scaling or
Although the obtained clustering result may represent some nonlinear scaling function such as tan h (gj(x,y)),
only a locally optimal solution rather than the global so for all prototype components it is cj(k) 2 [0; 1]. An
optimum, the strategy is often used due to its algorith- essential first step in the evaluation is the visual inspec-
mic simplicity and efficiency, and has been recently tion of the prototypes, so a graphical display is needed
voted in the list of top ten algorithms in data mining to support visual ▶ data mining. A visual inspection of
(Wu et al. 2008). For large data sets such as toponome the prototypes allows the identification of interesting C
data, the consideration of each data item for prototype protein co-location patterns in the data and
adaptation can be computationally expensive, which a comparison with ▶ combinatorial molecular pheno-
motivates an online K-means version applying the types (CMPs), without the need of analyzing single
winner-takes-all (WTA) rule. It is common to repeat- images which eases the knowledge discovery process.
edly run the algorithm with different initializations If interesting prototypes are found, the associated data
(Hastie et al. 2001) and to take that solution as items can be analyzed in a subsequent step following
the clustering result, which shows the smallest the Shneiderman mantra of “Overview first, zoom in,
intra-class variance. Neural Gas clustering claims to and filter, details on demand.” However, suitable pro-
be an enhancement as it takes into account totype visualization is not as straightforward as it
a “neighborhood ranking” of all proteins that are seems. One way to display multivariate data are
assigned to a cluster – an advantage bought by an glyph or icon displays. According to Colin Ware “A
increase in computational running time. In TIS cluster glyph is a graphical object designed to convey multiple
analysis, one should usually favor partitioning cluster data values” (Ware 2004, p.145). Each data feature is
algorithms, since the tree plot for an entire image mapped to a different graphical attribute of the glyph
should be too large for a visual analysis. Thus, the such as size, shape, or color. We apply a glyph-based
following part of this entry focuses on partitioning visualization approach, which has been developed to
cluster algorithms. suit the needs of non-binarized signal co-location anal-
A number of quality measures have been proposed ysis (Loyek et al. 2011a, b; K€olling et al. 2012). This
to evaluate the outcomes of cluster algorithms. glyph approach combines visualization aspects known
Because of its opposing goals, no definite criterion from bar charts and star glyphs and is to some extent
can be formulated that describes an optimal clustering inspired by the sequence logo display, which represents
of a dataset. This pertains not only to the applied patterns in nucleotide or amino acid sequences. In
cluster algorithm but also to the “true” number of a sequence logo, for each position of a set of aligned
clusters of a dataset. Proposed measures that base sequences, e.g., nucleotide sequences, the four nucleo-
solely on the clustering itself and the underlying tides are arranged on top of each other sorted according
dataset range from early approaches (Calinski and to their frequency at that position. The character height
Harabasz 1974; Davies and Bouldin 1979) up to represents the frequency of the according nucleotide.
novel instruments (Yeung et al. 2001; Maulik and Through this visualization, a rapid identification of
Bandyopadhyay 2002). Another kind of assistance in prominent sequence patterns can be achieved as high
choosing a cluster algorithm was recently proposed in frequent nucleotides can directly be “read” from the
Yeung et al. (2001). They delineated an instrument logo. To construct a glyph for one signal co-location
called Figure of Merit (FOM) to evaluate cluster solu- ck (k ¼ 1,. . ., K), a horizontal box is drawn for each data
tions. The idea of their method is to integrate a kind of feature (see Fig. 1a). The height, as well as the length, of
bootstrapping approach and thereby estimate the pre- each box is scaled according to the feature’s value. To
dictive power of a cluster algorithm. increase differentiation between neighboring boxes,
One important aspect in TIS cluster analysis is the they are alternatingly colored in black and light shaded
evaluation of the outcome, i.e., the cluster result, which grey. This follows Ware’s suggestion for star glyphs or
is, in case of partitioning methods, represented by whisker plots to increase the number of dimensions by
the set of K prototypes {c(k)}k ¼ 1,. . ., K. Usually, changing length and width of the bars as well as using
the grey values of the images are transformed to different luminance levels. Furthermore, by employing
C 420 Cluster Analysis
Cluster Analysis, Fig. 1 (a) Generation of a glyph for one the assigned prototype color, are given. For comparison, two
cluster prototype c(k): Each box is scaled in height and width classic bar plot displays for two 15-dimensional cluster proto-
according to the features’ values. The prototype names (P1, . . ., types are displayed. Since this glyph display is much more
PN) are written in the boxes for fast association of the proteins to compact compared to a standard box plot (b), more prototypes
the boxes. The relative and absolute abundance of the prototype, can be displayed on the same screen space and a comparative
i.e., how many pixel are associated to that prototype, as well as analysis is easier
length as an attribute for data representation, a well- course). The grey box with the overlaid number shows
suited graphical parameter to encode quantitative data the relative abundance, below the absolute abundance
has been chosen, as has already been discussed for the of pixels is displayed, which are assigned to this proto-
bar plot. To allow for a fast identification of prominent type according to the best match criterion. This meta-
proteins, the protein names are directly incorporated information is very valuable for prototype analysis.
into the visualization. To this end, the associated protein If this glyph visualization is compared to a classic bar
name is written in each bar and scaled in height and graph (see Fig. 1b), it is evident that in the glyph display
length analogue to the bar itself. With this strategy, the association of proteins to individual graphical attri-
prominent protein co-localization can easily be identi- butes is much easier. Furthermore, besides being able to
fied by “reading” the glyph analogous to the reading of rapidly identify the dominant proteins, an advantage
a sequence logo. Figure 1a displays the construction of of the glyph display is that only features with high
the glyph display and shows two glyph examples on the values allocate space, whereas low value features are
right. In addition to the special box plot display of the squeezed in contrast to Fig. 1a). Thereby, space is only
feature vector components, the top bar of the glyph allocated proportional to the importance of the protein
shows the color which is assigned to the glyph, and and the total size of the glyph reflects the amount of
meta-information as the associated data set and the information provided by the prototype. In some appli-
prototype number can be displayed here as well. How cations, this might not be a desirable feature so that bar
color will be assigned to each prototype will be graphs, or glyphs with constant bar width would better
explained below. On the right, the glyph displays infor- be suited.
mation about the abundance of the feature combination Different strategies to assign color to the prototypes
in one selected TIS image (or in a set of TIS images of and the corresponding glyphs can be proposed. If one
Cluster of Differentiation 421 C
wants to achieve preservation of topologies between References
the cluster positions in the N- dimensional co-location
space and the color space, i.e., similar clusters get Calinski T, Harabasz J (1974) A dendrite method for cluster
analysis. Commun Stat Theory Methods 3:1–27.
similar colors, dimensional reduction can be applied.
doi:10.1080/03610927408827101. http://dx.doi.org/10.1080/
Using for instance Sammon mapping or ▶ principal 03610927408827101
component analysis (PCA), the N-dimensional proto- Davies DL, Bouldin DW (1979) A cluster separation measure.
types are mapped into a 3D color space where each IEEE Trans Pattern Anal Mach Intell PAMI-1:224–227 C
Handl J, Knowles J, Kell D (2005) Computational cluster
prototype is associated with a three-dimensional color
validation in post-genomic data analysis. Bioinformatics
vector h ¼ (h1, h2, h3).. A whole pseudo-color image 21:3201–3212
can be rendered by finding for each feature vector g(x,y) Hastie T, Tibshirani R, Friedman JH (2001) The elements of
its best matching prototype Nearest Neighbor statistical learning. Springer, New York
K€
olling J, Langenk€amper D, Abouna S, Khan M, Nattkemper
Method and its color. The pixel position x, y is plotted
TW (2012) WHIDE—a web tool for visual data mining
in the corresponding color. The most common color colocation patterns in multivariate bioimages. Bioinformat-
space is the RGB space, where color is produced by ics 28(8):1143–1150. doi:10.1093/bioinformatics/bts104.
additively mixing red, green, and blue primaries. Lloyd S (1982) Least squares quantization in PCM. IEEE Trans
Inf Theory 28:129–137
Another widely used space is the HSV space, where
Loyek C, Bungkowski A, Nattkemper TW (2011a) Web2.0
color is described by means of hue (H), saturation (S) paves new ways for collaborative and exploratory analysis
and value (V). of chemical compounds in spectrometry data. J Integr
Bioinform 8(2):158. doi:10.2390/biecoll-jib-2011-158
Loyek C, Langenk€amper D, K€ olling J, Niehaus K,
Nattkemper TW (2011b) A Web2.0 strategy for the collabo-
Clustering Algorithms and Tools rative analysis of complex bioimages. In: Advances in intel-
In general, there is a large variety of clustering algo- ligent data analysis X: 10th international symposium, IDA
rithms and tools; open source tools widely used in 2011, Porto, October 29–31, 2011, Proceedings. Lecture
notes in computer science (Applications, incl. Internet/
computational biology include Weka (Weka, Machine
Web, and HCI). Springer, Berlin/Heidelberg, pp 258–270
Learning Tool) and R (R, Data Analysis Tool) Maulik U, Bandyopadhyay S (2002) Performance evaluation of
▶ R, Programming Language. some clustering algorithms and validity indices. IEEE Trans
Pattern Anal Mach Intell 24:1650–1654
Ware C (2004) Information visualization: perception for design.
Morgan Kaufmann, San Francisco
Wu X, Kumar V, Ross GJ, Yang Q, Motoda H, Mclachlan G,
Cross-References Ng A, Liu B, Yu P, Zhou ZH, Steinbach M, Hand D,
Steinberg D (2008) Top 10 algorithms in data mining.
Knowl Inf Syst 14:1–37
▶ Clinical Aspects of the Toponome Imaging System
Yeung K, Haynor D, Ruzzo W (2001) Validating clustering for
(TIS) gene expression data. Bioinformatics 17:309–318
▶ Clustering
▶ Clustering, k-Means
▶ Data Mining
▶ Information Theory and Toponomics Cluster of Differentiation
▶ Knowledge Acquisition
▶ Learning, Supervised Rohan Dhiman
▶ Model Testing, Machine Learning Center for Pulmonary and Infectious Disease Control,
▶ Model Training, Machine Learning University of Texas Health Science Center at Tyler,
▶ Principal Component Analysis (PCA) Tyler, TX, USA
▶ Subset Surprisology and Toponomics
▶ TIS Robot
▶ Topology and Toponomics Definition
▶ Toponome Analysis
▶ Toponomics It stands for cluster of differentiation. It is
▶ Unsupervised Learning Methods a method conceptualized in 1st international
C 422 Clustering
workshop and conference on human leukocyte of the other clusters. From different clustering types,
differentiation antigens to provide nomenclature to there are three clustering patterns as below.
all the cell surface molecules present on white
blood cells. It includes various molecules some
which either act as ligands or receptors (Zola et al.
2005; Bernard and Boumsell 1984; Fiebig et al.
1984).
Cross-References
▶ Fluorescent Markers
References
Clustering Coefficient
a
Local clustering coefficient
CL CL
IQU IQ
E UE
v
kv =2 kv =2 kv =3 kv =3 kv =4
Cv =0 Cv =1 Cv =0 Cv =0.33 Cv =1
b
Global clustering coefficient
Example
Triplet Triangle LOCAL DENSITY
(3) (Δ) τΔ(v )
cL(V ) =
τ3(v )
TRANSITIVITY
3τΔ(G )
cLT(G) =
τ3(G )
CLUSTERING COEFFICIENT
1
cL(G ) = Σ cL(v)
V⬘ v∈V⬘ G(V=7,E=15)
Clustering Coefficient, Fig. 1 The density of connections in definition of the clustering coefficient relates to the proportion
a graph can be approached by local and global definitions of the of triplets that form triangles in a graph. The clustering coeffi-
clustering coefficient. (a) The local clustering coefficient repre- cient corresponds to the mean value of the local clustering
sents the density of connections among the neighbors of a node, coefficient (referred to as local density), while the transitivity
and ranges from 0 to 1. The higher the value, the more the node is of a graph gives the probability that the direct neighbors of
part of a densely connected cluster of nodes. A value of 1 a node are connected
indicates that the node is part of a clique. (b) The Global
In hierarchical clustering, it is necessary to measure the original n objects with k-dimension vectors.
the similarity or dissimilarity of two groups to decide Then use k-means algorithm to partition all these
which two groups should be merged or how a group vectors into k clusters. Similarly, there are also revised
should be divided. Several methods have been devel- algorithms like normalized spectral clustering
oped. Single linkage clustering defines distance according to Shi and Malik (2000) and normalized
between groups as the distance between the closest spectral clustering according to Ng et al. (2002).
pair of objects, one from each group, while complete Spectral clustering is successful mainly based on
linkage is opposite, defining that as the distance the fact that it does not make strong assumptions on the
between the most distant pair of objects. Average link- form of the clusters, but at the same time spectral
age clustering defines that as the average of distances clustering can be quite unstable under different choices
between all pairs of objects, where each pair is made up of the parameters or neighborhood graphs. It should be
of one object from each group. Besides, there are also used with care.
some other approaches like average group linkage, In addition to these classical clustering methods,
Ward’s hierarchical clustering method, and so on. there are also some other related methods including
Hierarchical clustering is also one of the most com- fuzzy c-means clustering (Bezdek and Ehrlich 1984),
monly used clustering methods. It has the advantage graph-theoretic methods (Sharan and Shamir 2000),
that any valid measure of distance can be used. Com- biclustering (Madeira and Oliveira 2004), probabilistic
paring with k-means, the observations themselves are clustering (Nikhil et al. 2005), Markov clustering
not necessary. Hierarchical clustering does not really (Dongen 2000), support vector clustering (Ben-Hur
produce groups, and the user should artificially decide et al. 2001), and so on. Clustering methods are useful
where to split the tree into groups. Besides, hierarchi- in exploring data, but most methods are still very ad
cal clustering is also sensitive to noise and outliers. hoc, depending on choosing proper similarity metric
and parameters.
Spectral Clustering
As described by Luxburg (2007), spectral clustering is
one of the most popular modern clustering algorithms. Cross-References
The basic idea of spectral clustering is to make use of
the spectrum of the graph Laplacian of data to perform ▶ Clustering
dimension reduction for clustering in low dimension. ▶ Clustering, k-Means
The first step in spectral clustering is to transform ▶ Heuristic Optimization
similarities or distances into a similarity graph to ▶ Heuristic Optimization
model the local neighborhood relationships between ▶ Hierarchy
objects. There are several methods to construct the ▶ NP-Hard
graph. e-neighborhood graph connects two objects ▶ Pearson Correlation Coefficient
when their distance is smaller than given e; k-nearest
neighbor connects an object with k nearest neighbors,
and kernel-based fully connected graph connects each
References
pair of objects with a weighted edge. Finally,
a similarity graph matrix W is obtained to represent Ben-Hur A, Horn D, Siegelmann HT, Vapnik V (2001) Support
local relationships between objects. The graph vector clustering. J Mach Learn Res 2:125–137
Laplacian is defined as Bezdek JC, Ehrlich R (1984) FCM: the fuzzy c-means clustering
algorithm. Comput Geosci 10(22):191–203
Dongen SV (2000) Graph clustering by flow simulation. Ph.D.
L ¼ D W; thesis. University of Utrecht
Fowlkes EB, Mallows CL (1983) A method for comparing two
P
n hierarchical clusterings. J Am Stat Assoc 78(384):553–584
where D is a diagonal degree matrix with dii ¼ wij : Inaba M, Katoh N, Imai H (1994) Applications of weighted
j¼1 voronoi diagrams and randomization to variance-based
Based on the graph Laplacian, given number k-clustering. Proceedings of 10th ACM symposium on
k of clusters, unnormalized spectral clustering algo- computational geometry. 332–339. doi:10.1145/
rithms compute the first k eigenvectors to represent 177424.178042
Clustering, Hierarchical 427 C
Luxburg UV (2007) A tutorial on spectral clustering. Stat clustering is simple to apply and in comparison to other
Comput 17(4):395–416 clustering methods makes no parametric assumptions
Madeira SC, Oliveira AL (2004) Biclustering algorithms for
biological data analysis: a survey. IEEE Trans Comput Biol about the size, shape, or quantity of clusters (Jain and
Bioinform 1(1):24–45 Dubes 1988).
Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis
and an algorithm. In: Dietterich T, Becker S, Ghahramani Dissimilarity Measure
Z (eds) Advances in neural information processing, systems
14. MIT, Cambridge, pp 849–856 Unlike k-means clustering (▶ Clustering, k-means), C
Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic hierarchical clustering does not require one to specify
fuzzy c-means clustering algorithm. IEEE Trans Fuzzy Syst the number of clusters a priori. In contrast, it uses
13(4):517–530 a dissimilarity measure.
Sharan R, Shamir R (2000) CLICK: a clustering algorithm with
applications to gene expression analysis. In: Proceedings of The choice of the dissimilarity measure depends on
ISMB’00, pp 307–316 the data type (nominal, ordinal, ratio etc.). Common
Shi J, Malik J (2000) Normalized cuts and image segmentation. examples include Euclidean distance, Mahalanobis
IEEE Trans Pattern Anal Mach Intell 22(8):888–905 distance, Manhattan distance, and others.
Linkage Method
Recursive splitting or merging of the clusters can be
Clustering, Hierarchical represented by a binary tree known as dendrogram.
The tree provides a simple and convenient visualiza-
Larissa Stanberry tion of data hierarchy. Based on the dissimilarity
Bioinformatics and High-throughput Analysis measure, the data are grouped into a hierarchical tree
Laboratory, Seattle Children’s Research Institute, called dendrogram using a linkage method. The three
Seattle, WA, USA most popular linkage methods in hierarchical agglom-
erative clustering are
• Single linkage, or the nearest neighbor, method.
Definition Here, the dissimilarity between two groups is
defined as the minimum pairwise distance of points
Hierarchical clustering represents a collection of in the two groups.
methods which seeks to partition the data into groups • Complete linkage, or the furthest neighbor, where
by building a hierarchy of clusters. the dissimilarity between two clusters is taken as the
maximum distance between points in the clusters.
• Average linkage uses the average dissimilarity
Characteristics between the groups.
If the grouping structure is compact with
The goal of a clustering method is to partition the data well-separated clusters, all of the three clustering
into distinct groups where each group contains objects methods produce a similar result.
with similar characteristics. Hierarchical clustering Single linkage has certain advantages as compared
refers to the collection of methods which builds to other methods. In particular, it is known to correctly
a hierarchy of clusters. The hierarchical clustering identify the underlying structure of the data. This is not
methods are subdivided into two categories necessarily the case for the other linkage methods,
• Agglomerative, where each datum initially repre- unless points of high density are sufficiently separated.
sents a cluster and the two clusters are merged For example, when sets of original points become
together at each step of the hierarchy. Hence, the close or overlap, the complete, average, and K-means
hierarchy is built from the bottom up. clustering (▶ Clustering, k-means) yield several large
• Divisive, where initially all data represents a single clusters, giving an impression of distinct grouping in
cluster that is split as one moves down the the data regardless of the density. It has been shown
hierarchy. that the large clusters produced by these algorithms
Agglomerative clustering has been extensively depend on the range, but not on the true density of the
studied and applied in numerous studies. Hierarchical data set. Single linkage behaves differently, producing
C 428 Clustering, Hierarchical
Clustering, Hierarchical,
Fig. 1 The binary tree for 1.6
the simulated example 7 2
1.4
1
9 1.2
2
1.0
height
0 4
3
0.8
1 0.6
−1 5
10
10 3 4 7 9 6
8 0.4 1
6
−2 0.2 5 8
long-chained clusters that are difficult to interpret. The data set consisting of ten two-dimensional points
“chaining” effect indicates the lack of spatial separa- labeled from 1 to 10 (Fig. 1). Initially every point is
tion in case of touching or overlapping clusters. a cluster. The first step is to compute all pairwise
In addition, the single linkage is fully consistent for distances between the points. Here, we use the Euclid-
separating two disjoint, high-density clusters in one ean. The two closest points are 5 and 8 and will be
dimension, and is fractionally consistent for data in merged to form a new cluster. The next smallest dis-
higher dimensions. The fractional consistency means tance is between points 1 and 5, but 5 has already been
that if two disjoint population groups exist, there will merged with 8, thus 1 is added to a cluster {5, 8}. Next,
be two distinct single linkage clusters containing points 3 and 4 are merged together and so on. The
a positive fraction of the sample points from the process continues until all the nodes form a single
corresponding population groups. Hence, the single cluster. The result is the single linkage dendrogram
linkage is in a sense conservative, as it will not neces- shown in the left plot. The original points are marked
sarily detect all clusters, but will identify modal on the tree.
regions separated by a sufficiently deep valley The single linkage clusters can be obtained by cut-
(Hartigan 1975, 1977). ting the tree at a certain level. In the example, deleting
the largest edge will yield to two clusters a singleton
Cluster Identification {2} and a cluster containing all the remaining nine
The clusters are typically defined by setting an appro- points. Cutting, for example, at 0.9 would give three
priate inconsistency threshold, which is equivalent clusters {2}, {3,4,7,9}, and {10,6,1,5,8}.
to cutting the dendrogram at a certain level. For that To demonstrate the performance of the hierarchical
the length of each link in a cluster tree is compared clustering, we classify the famous iris data set that
with the length of neighboring links below it. If the gives the measurements (in cm) of the four variables,
length of the link does not differ significantly from the including sepal length and widths, and petal length and
length of the neighboring links, it means that objects, width. The data was collected on the three species of
joined at this level of the hierarchy, have similar Iris setosa, versicolor, and virginica and is freely avail-
characteristics. Large differences would indicate able as a part of R package.
some dissimilarity between the objects, and the link Figure 2 shows the dendrogram trees for the single,
is said to be inconsistent. Alternatively, one can con- complete, and average linkage. The three iris species
sider an adaptive rule that determines inconsistency are color coded to show their relative positions in the
threshold for each link. tree. The average and single-linkage trees look similar
misplacing only a few versicolor species into the
Example virginica branch. The complete linkage performs the
To gain a better understanding, consider the following worst as it identifies half of the virginica species to be
example of the single linkage algorithm for a small closer to the setosa and half to the versicolor. The
Clustering, Hierarchical 429 C
1.5
Height
1.0
0.5
0.0
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
v
v
g
v
g
g
g
v
v
v
v
v
v
v
g
g
g
v
v
v
v
v
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
v
v
v
v
v
v
v
v
v
g
v
g
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
Single Linkage C
6
Height
0
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
vv
gvv
vv
vv
vv
vv
vv
g v
gv
g v
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
gv
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
gs
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
ss
s
Complete Linkage
4
3
Height
2
1
0
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
s
sv
s
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
v
g v
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
gv
g
gv
g
v
v
v
v
gvv
v
v
v
v
gv
g
g v
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
g
Average Linkage
Clustering, Hierarchical, Fig. 2 Binary trees for the iris data for the single-, complete and average linkage methods. The three iris
species were color-coded
misclassification error is about 9% for the average and ELKI package implements a single-linkage clustering
single linkage and is 16% for the complete linkage among others. Orange is a free data-mining software
method. that can be used for clustering. Hierarchical Clustering
Explorer is another useful tool for data visualization
Divisive Clustering and cluster analysis.
Divisive clustering was studied and used considerably
less extensively. The divisive method can be accom-
plished by recursively subdividing groups into two
clusters. The splitting continues until every cluster is Cross-References
a singleton. The divisive clustering does not necessar-
▶ Clustering, k-Means
ily leads to the convenient representation of the data as
a binary tree, although some methods were proposed to ▶ Clustering, Model-Based
▶ Model Testing, Machine Learning
address this issue.
Characteristics Example
Here we demonstrate the performance of the K-means
K-means algorithm clustering algorithm on the famous iris data set
The K-means algorithm is based on the dissimilarity that gives the measurements (in cm) of the four vari-
measure given by squared Euclidean distance, i.e., for ables including sepal length and widths, and petal
p-dimensional data vectors x1 ¼ (x11 . . .,,x1p) and x2 ¼ length and width. The data was collected on the
(x21, . . ., x2p) three species of Iris setosa, Iris versicolor, and Iris
virginica. The data set is freely available as a part of
R package.
X
p
Figure 1 shows the results of clustering the
dðx1 ; x2 Þ ¼ ðxi1 x2i Þ2 data into four groups using K-means method with 25
i¼1
random initializations. Here, s, v, g stand for the
three different species setosa, versicolor, and
In K-means, the clusters are determined by mini- virginica. The cluster assignments are marked by the
mizing the loss function three different color. Clearly, K-means exactly iden-
tifies setosa species, but makes some misassignments
by classifying some versicolor with virginica and
X
K X vice versa. This is expected, since the setosa species
LðCÞ ¼ Nk dðxi ; xk Þ; are well separated, while the other two have some
k¼1 CðiÞ¼k overlap across different measurements. The
misclassification error for the K-means classification
is about 11%.
where k ¼ 1, . . ., K indexes clusters,
xk ¼ ð
x1k ; . . . ; xpk Þ is the mean vector of cluster k, Software
and Nk is the total number of points in cluster k. The The K-means clustering algorithm is available in
loss function L(C) is minimized when, within each a number of software packages both free and commer-
cluster, the average distance from any point to the cial including R, S-plus, SAS, Apache Mahout,
cluster mean is minimized. Matlab, and Mathematica.
Clustering, k-Means 431 C
2.0 2.5 3.0 3.5 4.0 0.5 1.0 1.5 2.0 2.5
v v v
v v vv v v vvvv vvvvv
7.5
vv v v
vv v v vvvvv v vv v v
g vvgvg
v ggv v v gg v v
v gg g vvv
g gggggv vvvvv
g
g
gg gv vv v
gg vvv
6.5
g
vvvgv v
vg
vg g
ggggg vv v v ggg
g vvvvvvv
g g
v
g v g vv
vgg vv gggvvvvvvvv v g vg v
v v vvv
Sepal.Length g
v gg ggv g
v
v ggg
g vvvg vv ggg
g
g vg
g v
v
vg g
v
ggvv v g gggg gv g g g vv
vggg
g g
gg
g s s s s ss ggg ggg vvv sss gggg
gg vv
v
gggg v g ss s s s ggggggg gggg
5.5
g sss sss s
s
sssss g ss s
sssssss
gg C
g ggg ssss
s
ss sss
sss ggg g gg g
v ss
ss
sss
ss s s
ssss
sss s g v ss
ssss
ss s g v
ss s s s sssss s
sss
4.5
s sss s sss ss
s s s
s s s ss sss
2.0 2.5 3.0 3.5 4.0
ss s
s s s vv ssss vv ss
ssss v v
s sssssssss v s ss
s
s
s v ss v
s s sssss s g vg v sss
ss
ss
gg vv sssss s g
g vvv
s ss s g vg vvvg v sss gg g
ggggv
v
vvvv ss s gg g v
sss s sssss g g g g
vvg vvvvgg
vvg
g v vv vv Sepal.Width sss
s ggggg vvvv
v s
ss ggg
gg
g
v vv v
vgv vvvv
vv
gg
g ggvgggvg
v
gg vv v sssss g ggggg gg vvvvv v
vv v
v v sss ggg v
v
v v
g ggggg
g vv
v v vv
g
v
g ggggggggg
vv
vgv
vv v v vv gg
g
gggg
v vv
ggg g vvvvv v
g gg
v gv v v g v gg gggg v gv vv g gvg vvvv v
g gg
s g g g vg
g s g g g gg v s g
g gg v
g g g
7
v v v vvvv
vv v v
vvv v v vvvv vv v
v vvv v v
v v v v vvv
6
vvvvvv vvvv v vv vv vvvv
vvv v v v vv v v v
vvvv
vvg
vv gvvg vvvv g
vvg
vg
gvg v g
vvv g
v g vgvg
v v
gggvvvv vv
5
g gg vg
vgg
vgg g g
ggggg
ggvg
v v
g gg g
v ggg g ggg g gggg gg vg g g g gggg
g g
ggggggg gg gg
g gggg
gggg g gg Petal.Length
4
ggg
g gg
gg g gg g g g
g
g g g
3
2
s ss s s ss
s s ss s
sssss
s sss
ssssssssss s sssss
ssssssss ss s sss s
ss
s
s sss s s s s s s s ss sss s
1
0.5 1.0 1.5 2.0 2.5
s sss sss ss
sssss sssss s ss
sss s ss sss s s s ssssss
s ss ssssss s
s ssss ss
ss ssss
sss
ss s s ssssss
ss
s s
ss
s
sssss
s
Clustering, k-Means, Fig. 1 Color-coded K-means cluster assignments for the three different iris species setosa, s, versicolor, v,
and virginica, g
7.5
gg gg gg
gg g
g gggg gg g g
g v
vg vvggg vv g g
g vvg gvvgvgg vvvvvvgggggg v
vvvvv vg
g
g gg ggg
6.5
gvvg
g g g
g v gggg g g
vv g gg
v gg vg
vg vvvvgvgggggg
gg vv g g gg
vvv gg ggg
g
Sepal.Length g
vg
v gvvvvg v g vv gg
vvv vvvg gv g v vvg
vg
v g g
g
g
gg g v v v vgg v vvv g
v
gvv
vvvg
vvvvv s s s s ss v v
vvv g
vv vvvv sss vv vvv v gg
vv g g
vvv vvv g v
5.5
vvvv v ss ss s s
v v s ssssssssssss s
sssss
ssss
ssss v vv v
s
s s
ss
s
ss
sss
vvvv v
v v
C
v vvg sssssss g ss
sss s v g
sssss s s s ss v ss v
ss s s s sssss s
ss
4.5
s sss s s
sss ss
s s s
s s s s
ssss ss
2.0 2.5 3.0 3.5 4.0
s s
s sss s gg ssss gg sss gg
s ssss ss s
g s ssss g ss
s s g
s s ssss s v g gg ggg g sssssss
s vv gg g g s
ss
ss s v gg gg
s ss s s v vg g
vg ggvvgg gg
vgg Sepal.Width sss
ss vvvvvvvg
vvggggggg s
s s v
vvv g g g g
v g gg
sss s sssss v v v g
vggv g g sssssss gvgggg
ggggggg ss vvvv vvgvg
gggggg
vvvg vvvg
g vgvvvvv gg g
gg v vv
v vvvvv vv
vvvgvg ss
ss v
vvg
v vvg v vg gg g v vv ggg
v ggg
g g
g
v v v g
v v gg g
vv vvvg
ggg g
gv v vvvgv g v g v vvvvvvv vg vg g gggg
v
s v v g v s v vv vv g s vvvv vv v
vv v g
v
v v v
g gg gggg
7
gg g g
ggg g g gggg gg g
g ggg g g
g g g gg g ggg
6
ggggg gg gg gg gggg ggg g g gggg
gg ggg ggg g ggg g ggg g
v
gggg
gg vg vg g gv g
gvg g
vg
vg g v gg
vg g gg
5
v v g
vvvg v
g vvvv vvvvvvvvvvvvvv v g v
vv v vvvvvvvvv
v
vvvvvv
vvv
vgv
vvv
vvvvvv vvvvvvvv v vvvv
vv v
4
v vv v Petal.Length v vv
vv v vv v v
vv
v vv
v v v
3
2
s s s s ss sss ssss
s sss
sssssssssss s sssssssssssssssssss s s
sss
ss
s s
s sss s s s s s s ss
s
s
1
0.5 1.0 1.5 2.0 2.5
g gg gg g ggg g g gg gg
gg g gggggg gg
ggg g g g gg g g g ggggg gg
ggg ggg ggg g g gg g
g g
g g
g
g
g
g
g gg
g g ggggg gggg
gg gg
gg g
g vgggggg g
g gg g gggg
g gg
v g v ggg
g
vg g ggg
v v vg
v vvv vv v g
v vg
v v vgvvg
g g v
vvvvvvv v vvvvvgg
v v
v vvv g vvvv vvvv v v vvvvvvvvvvvvvv v vvvvvvvvvvvv
g
v v
vvvv vv v vvvvvvvvvvv v
Petal.Width
vv v vv v v vvvvvvv
ss sss sss s
ss s ssss ssss s
s s s s ssss
ssss ss
ss sss
sssss
sssss s s sssssssssss sss
sssssss s
ssss
ss
sss
ss
ssss
4.5 5.5 6.5 7.5 1 2 3 4 5 6 7
Clustering, Model-based, Fig. 1 Scatter plot of the three clusters identified in the iris data using model-based clustering. Clusters
are color-coded and the three species are labeled, i.e., setosa, s, versicolor, v, and virginica, g
The package is well documented and provides a good agreement between the true data and clustering
enhanced display and visualization tools. results. The misclassification rate is 3.3%.
Example
Figure 1 shows the classification of the iris data using Cross-References
the model-based clustering approach with
unconstrained covariance model as implemented in ▶ Bayesian Information Criterion (BIC)
MCLUST. The species are labeled with letters and ▶ Clustering
the three clusters are color-coded. Clearly, there is ▶ K-Means
C 434 CMP Motif
▶ Model Testing, Machine Learning The term dual is used in the formal sense of category
▶ Model Validation (▶ Mathematical Structure, Category) theory, namely,
that the defining operations are reversed.
References
References
SY
Cross-References Y
▶ Transcription in Eukaryote
AND
Z
References
Coherent Type I FFL, Fig. 1 The coherent type-1 FFL with an
Kaiser K, Meisterernst M (1996) The human general co-factors. AND input function at the Z promoter. SX and SY are input
Trends Biochem Sci 21(9):342–345 signals for X and Y
C 438 Cohesin
References Definition
Alon U (2007) Network motifs: theory and experimental Collaborative and distributed biomedical applications
approaches. Nat Rev Genet 8:450–461
on the World Wide Web or biomedical
Mangan S, Alon U (2003) Structure and function of the
feed-forward loop network motif. Proc Natl Acad Sci USA “collaboratories” (Olsen et al. 2008) enable and social-
100:11980–11985 ize multimodal activities critical to biomedical
research, using social media technologies. They enable
near-real-time group discussion of methods and
Cohesin findings, sharing of reagents, cooperative construction
of databases, and publication and annotation of new
Rosella Visintin scientific communications.
IEO, European Institute of Oncology,
Milan, Italy
Characteristics
Definition
Collective Behavior
The combinatorial molecular phenotype (CMP)
Andreas Deutsch is calculated from the alignment of a stack of
Center for Information Services and High Performance binary images originating from fluorescence signals
Computing (ZIH), Technical University Dresden, (gray values) generated by a Toponome Imaging
Dresden, Germany System (TIS) (▶ Clinical Aspects of the Toponome
Imaging System (TIS)). A CMP is a bijective
binary vector denoting the presence or absence (1 or
Definition 0) of any single out of N marker molecules
co-mapped by TIS (▶ Clinical Aspects of the
The term collective behavior describes the macroscopic Toponome Imaging System (TIS)) at a certain pixel/
behavior observed in systems that consist of a large voxel position or at multiple positions in the
number of interacting components. Collective behavior image. A CMP vector includes a certain number of
can result from self-organization, that is, the emergence pixel coordinates, at least one, which determine(s)
of patterns without external influence, and the formation the CMP frequency for statistical analysis. Several
of complex spatiotemporal patterns even from relatively distinct CMPs can be grouped as a so-called CMP
simple interactions. Examples of collective behavior motif (Fig. 1). A motif describes a set of CMPs by
include the sorting of cells by differential adhesion, the using a three-symbol code (Schubert 2007). The sym-
formation of fruiting bodies by microorganisms, and bol L (or 1) describes a marker or protein that all
organization of fish into huge swarms. CMPs of the group have in common. Symbol
A (or 0) is used when a certain marker is always
absent (anti-colocated) in the CMPs of the group. If
Colony-Forming Fibroblastic Cells a marker is variably present within a CMP group, the
symbol in the corresponding motif is a wild card
▶ Single Cell Assay, Mesenchymal Stem Cells W (or *).
A CMP at pixel position x,y is defined as binary
vector as follows:
Combinatorial Diversity
Combinatorial Molecular Phenotypes (CMPs), Fig. 1 The figure shows the stepwise generation of binary images, as well as the
calculation of CMP vectors and the three-symbol code of the resulting CMP motif
References
Combinatorial Transcription Regulatory
Friedenberger M, Bode M, Krusche A, Schubert W (2007) Network
Fluorescence detection of protein clusters in individual
cells and tissue sections by using toponome imaging system: Yong Wang
sample preparation and measuring procedures. Nat Protoc
2(9):2285–2294 Academy of Mathematics and Systems Science,
Schubert W (2007) A three-symbol code for organized Chinese Academy of Sciences, Beijing, China
proteomes based on cyclical imaging of protein locations.
Cytometry A 71(6):352–360
Schubert W (2010) On the origin of cell functions encoded in the
toponome. J Biotechnol 149(4):252–259 Synonyms
Schubert W, Bonnekoh B, Pommer AJ, Philipsen L,
B€ockelmann R, Malykh Y, Gollnick H, Friedenberger M, Combinatorial regulation; Transcriptional factor
Bode M, Dress AW (2006) Analyzing proteome topology cooperativity
and function by automated multidimensional fluorescence
microscopy. Nat Biotechnol 24(10):1270–1278
Schubert W, Bode M, Hillert R, Krusche A, Friedenberger M
(2008) Toponomics and neurotoponomics: a new way to Definition
medical systems biology. Expert Rev Proteomics
5(2):361–369
Schubert W, Gieseler A, Krusche A, Serocka P, Hillert R (2011) A regulatory network consists of regulators such as
Next-generation biomarkers based on 100- parameter func- transcription factors or kinases that control the expres-
tional super-resolution microscopy TIS. N Biotechnol. sion or activity of their target genes (Chen et al. 2009).
doi:10.1016/j.nbt.2011.12.004 Compared to more commonplace contexts, these reg-
ulators can be thought of as managers in a social or
corporate setting controlling their common subordi-
nates. Usually, these regulators are not working alone
Combinatorial Regulation and, almost always, these multiple regulators are
partnering together to control their targets. Combinato-
▶ Combinatorial Transcription Regulatory Network rial regulation is not a well-defined term in Biology and
C 442 Combinatorial Transcription Regulatory Network
(Hannenhalli and Levy 2002) predicted TF synergism genomic features, which provide further biological
by the co-localization of transcriptional factor–binding insights into the different types of TF cooperativity.
site (TFBS) on the genome at the specific distance
apart. (Das et al. 2004) created a multivariate adaptive
Cross-References
regression splines model to correlate the occurrences
and interactions of TFs to the logarithm of the ratio of
▶ Gene Regulatory Networks
gene expression levels. (Yu et al. 2006) identified
▶ Regulation
interacting binding motif pairs considering their orien-
tation and distance. Similarly, (Chang et al. 2006) used
promoter sequences and gene expression data to detect References
cooperativity. Recently, (Datta and Zhao 2008) used
a log-linear model to infer cooperative binding. The Balaji S, Babu MM, Iyer LM, Luscombe NM, Aravind L (2006)
second class is the target gene (TG)-based methods. Comprehensive analysis of combinatorial regulation using
(Banerjee and Zhang 2003) assumed that a TF pair is the transcriptional regulatory network of yeast. J Mol Biol
360:213–227
cooperative if their TGs are significantly co-expressed Banerjee N, Zhang MQ (2003) Identifying cooperativity among
while (Nagamine et al. 2005) considered TG interac- transcription factors controlling the cell cycle in yeast.
tions instead of TG co-expression. (Tsai et al. Nucleic Acids Res 31:7024–7031
2005) directly selected TF pairs with significant num- Chang Y-H, Wang Y-C, Chen B-S (2006) Identification of
transcription factor cooperativity via stochastic system
ber of common TGs and further applied the ANOVA model. Bioinformatics 22:2276–2282
test to determine synergy. (Balaji et al. 2006) applied Chen L, Wang RS, Zhang XS (2009) Biomolecular networks:
a network transformation procedure to ChIP-chip data methods and applications in systems biology. Wiley,
and analyzed combinatorial regulation among multiple Hoboken
Das D, Banerjee N, Zhang MQ (2004) Interacting models of
TFs. The third class is the transcription factor activity cooperative gene regulation. Proc Natl Acad Sci 101:16234–
(TFA)-based methods. (Yang et al. 2005) inferred the 16239
TFAs from gene expression data then assessed TF Datta D, Zhao H (2008) Statistical methods to infer cooperative
cooperativity by their TFA correlation. Wang binding among transcription factors in Saccharomyces
cerevisiae. Bioinformatics 24:545–552
(2007) used the similar framework to study combina- Hannenhalli S, Levy S (2002) Predicting transcription factor
torial regulation in yeast cell cycle. synergism. Nucleic Acids Res 30:4278–4284
In addition to the above case studies and the Nagamine N, Kawada Y, Sakakibara Y (2005) Identifying coop-
unsupervised framework using information from erative transcriptional regulations using protein-protein
interactions. Nucleic Acids Res 33:4828–4837
a single data source such as TF-binding motif, target Parisi F, Wirapati P, Naef F (2007) Identifying synergistic reg-
gene, and TF activity, Wang (2009) used Bayesian ulation involving c-Myc and sp1 in human tissues. Nucleic
networks, a supervised learning approach, to system- Acids Res 35:1098–1107
atically integrate diverse data sources at different Pilpel Y, Sudarsanam P, Church GM (2001) Identifying regula-
tory networks by combinatorial analysis of promoter ele-
levels to predict and analyze TF cooperativity. Specif- ments. Nat Genet 29:153–159
ically, they design a Bayesian network structure to Tsai H-K, Lu HH-S, Li W-H (2005) Statistical methods for
capture the dominant correlations among features and identifying yeast cell cycle transcription factors. Proc Natl
TF cooperativity, and introduce a supervised learning Acad Sci 102:13532–13537
Wang J (2007) A new framework for identifying combinatorial
framework with a well-constructed gold standard regulation of transcription factors: a case study of the yeast
dataset. This framework allows us to assess the predic- cell cycle. J Biomed Inform 40:707–725
tive power of each genomic feature, validate the supe- Wang Y, Zhang X-S, Chen L (2009) Predicting eukaryotic
rior performance of our Bayesian network compared to transcriptional cooperativity by Bayesian network
integration of genome-wide data. Nucleic Acids Res
alternative methods, and integrate genomic features. 37(18):5943–5958
Data integration reveals 159 high-confidence predicted Yu X, Lin J, Masuda T, Esumi N, Zack DJ, Qian J (2006)
cooperative relationships among 105 TFs, most of Genome-wide prediction and characterization of interactions
which are subsequently validated by literature search. between transcription factors in Saccharomyces cerevisiae.
Nucleic Acids Res 34:917–927
The existing and predicted transcriptional Yang Y-L, Suen J, Brynildsen M, Galbraith S, Liao J (2005)
cooperativities can be further grouped into three cate- Inferring yeast cell cycle regulators and interactions using
gories based on the combination patterns of the transcription factor activities. BMC Genomics 6:90
Community Genomics 445 C
(1) facilitating integration across datasets and disci-
Combinatorics of Sets plinary approaches to the study of the same organism
and (2) facilitating cross-species comparison across
▶ Subset Surprisology and Toponomics available datasets (Howe and Rhee 2008; Leonelli
and Ankeny 2012). Examples of community databases
are The Arabidopsis Information Resource, gathering
data about Arabidopsis thaliana (Koornneef and C
Committee Meinke 2010); and FlyBase, gathering data about Dro-
sophila melanogaster (Drysdale and FlyBase Consor-
▶ Ensemble tium 2008).
Cross-References
Common Cold
▶ Data-Intensive Research
▶ Viral Respiratory Tract Infections
References
Community Detection
Community Genome
Synonyms
▶ Metagenome
Model organism database
Definition
Community Genomics
Database collecting diverse types of data documenting
the same (set of) organisms, with the purpose of ▶ Metagenomics
C 446 Community Metabolomics
Definition
Community Metabolomics
A fundamental goal of biology is to understand the cell
▶ Metametabolomics as a system of interacting components which can be
represented as a network (Chen et al. 2009). The dis-
covery, analysis, and understanding of interactions
between molecules as well as the networked system
Community Proteomics
have received significant attentions in recent years. In
a network, each node corresponds to a molecule and an
▶ Metaproteomics
edge indicates a direct/indirect interaction between the
molecules. As the number of available molecular net-
works for various species rapidly increases, compara-
Community Structure tive analysis of them across species is proving to be
increasingly important. The comparative study of
▶ Modular Organization of Gene Regulatory Networks molecular networks can be topologically coarse-level
comparison such as the comparative analysis of degree
distribution, clustering coefficient, diameter, and rela-
Community Transcriptomics tive graphlet frequency distribution, or can be the
discovery of conserved subgraphs among networks.
▶ Metatranscriptomics Actually, the later one which is known as network
alignment problem in bioinformatics field is a well-
known concept for molecular network studies and will
be the key of this entry. We should note here that the
Compact Chromatin problems similar to network alignment have also been
studied in other fields like graph theory, data mining,
▶ Heterochromatin and computer vision. For example, the problem of
matching a query image to an existing image in the
database has been formulated as a graph-matching
problem in computer vision field, where each image
Compact DNA
is represented as a graph/network.
▶ Chromatin
Characteristics
PKSDIDVDLCSELMAKACSE-GV
PKS +D+DLCSEL+ KAC++ +
PKSSLDIDLCSELIIKACTDCKI
C
Biological networks
Conserved Matched
Species 2 interactions protein pairs
(Condition/type 2)
High-scoring
conserved subnetworks
Search
algorithm
Comparative Analysis of Molecular Networks, Fig. 1 A general framework for pairwise network alignment, with permission
from Sharan and Ideker (2006)
the subgraph which may correspond to the sequence taking into account both the sequence homology
similarity or function similarity of molecules (Ogata between the aligned molecules and the probabilities
et al. 2000). The goal is to maximize the overlap that interactions in the subnetworks are true or not
between the aligned networks and ensure that the pro- (Kelley et al. 2004), while the NetworkBLAST tool
teins mapped to each other are evolutionarily or func- detects conserved protein clusters rather than only
tionally related. Moreover, according to the number of paths across pairwise or multiple networks, by
input networks, the network alignment problem can be deploying a likelihood-based scoring scheme that
formulated as pairwise or multiple alignments. While weighs the denseness of a subnetwork versus the
according to the scope of node mapping desired, the chance of observing such subnetwork at random
alignment can be local or global which is analogous to (Sharan et al. 2005). By using this tool to compare
sequence alignment or structure alignment problem. multiple networks across different species, Suthram
We may also want to align a known or even unknown et al. (2005) explored whether the divergence of Plas-
small network to a large network or network database modium at the sequence level can be embodied at the
to find the conserved ones which is named as the level of the structure of its protein interaction network.
network querying problem. They found that Plasmodium has only three conserved
The majority of methods used for alignment of complexes versus yeast, and no conserved complexes
molecular networks have focused on local alignments against fly, worm, and bacteria. But yeast, fly, and
(Berg and L€assig 2004, 2006; Kelley et al. 2004; worm share an abundance of conserved complexes
Flannick et al. 2006; Liang et al. 2006) which attempts with each other. In another method MaWISh, the
to find local region similarity and find node mappings authors formulated network alignment as a maximum
independently for different regions. There have been weight induced subgraph problem and implemented an
many algorithms for local alignment in the literature. evolution-based scoring scheme to discover conserved
The PathBLAST tool searches for high-scoring linear modules. The method extends the concepts of evolu-
or tree substructures between two large networks, by tionary events in sequence alignments to that of
C 448 Comparative Analysis of Molecular Networks
duplication, match, and mismatch in network align- rankings with other ranking algorithms (Liao et al.
ments and evaluates the similarity between network 2009). The Graemlin method has also been extended
structures through a scoring function that accounts for to allow global network alignment which is based
these evolutionary events (Koyut€ urk et al. 2005), while on a learning algorithm that uses a training set of
the Graemlin is the first method which can identify known network alignments and their phylogenetic
dense conserved subnetworks of arbitrary structure. relationships to learn parameters for its scoring
This method scores a module by computing the log- function, and by automatically adapting the learned
ratio of the probability that it is subject to evolutionary objective function to any set of networks (Flannick
constraints and the probability that it is under no con- et al. 2008).
straints, while taking into account phylogenetic rela- Another aspect of network alignment studies is
tionships between species whose networks are being alignment for multiple networks which has gained
aligned (Flannick et al. 2006). Another example is great progress recently. Several of above methods
MNAligner (Li et al. 2007), which is designed for have also been developed for multiple cases. For
general molecular networks and combines both molec- example, Graemlin developed by Flannick et al.
ular similarity and topological similarity. This method (2006), which uses a probabilistic function for topol-
can detect conserved subnetworks in an efficient math- ogy matching, and can be applied to search for con-
ematical programming manner without requiring spe- served functional modules among multiple protein
cial structures on the networks. interaction networks. The C3Part-M method (Deniélou
Local alignments can be ambiguous, in which one et al. 2009) addressed the problem of multiple network
node can have different counterparts in different con- alignment with an exact and generic approach by
served modules. By contrast, a global network align- avoiding the explicit construction of the network align-
ment provides a unique alignment for each node in the ment multigraph, and then can deal with a large num-
smaller network to exactly one node in the larger ber of networks. By using microarray data from
network. We have to note that this may lead to multiple conditions and species, various comparative
suboptimal alignments in some local regions. Gener- studies have been conducted so as to reveal transcrip-
ally, the local network alignment algorithms have not tional regulatory modules, predict gene functions, and
been able to identify large subnetworks that have been uncover evolutionary mechanisms (Zhou and Gibson
conserved during evolution due to the algorithmic 2004). For example, Yan et al. (2007) have developed
complexity or modeling aspect (Berg and L€assig a graph-based data-mining algorithm NeMo to detect
2004; Kelley et al. 2004). Recently, global network frequent co-expression modules among a large number
alignment methods have been studied and designed for of gene co-expression networks across various condi-
aligning molecular networks (Singh et al. 2007; tions. They found a large number of potential tran-
Flannick et al. 2008; Zaslavskiy et al. 2009). Unlike scriptional modules, which are activated under
the local alignment algorithms, the IsoRank method multiple conditions.
(Singh et al. 2008) aims to maximize the overall match In addition to the network alignment method
between the two networks. The method is based on discussed above, the network querying technique is
spectral graph theory which computes scores of becoming a new major network comparative analysis
aligning pairs of nodes according to the score of their tool. The goal of network querying is to find a small
respective neighbors from different networks. In other network against a large-scale network or a database of
words, the score of a protein pair depends on the score large-scale networks which is a NP-hard problem
of their neighbors. And then sequence scores are incor- (Fig. 2). The network querying problem has been stud-
porated into these “topological” scores in the pairwise ied in the past few years and a few search tools have
alignment scores. Finally, IsoRank constructs the been developed, such as PathBLAST (Kelley et al.
node alignment with the repetitive greedy strategy of 2003, 2004), QPath (Shlomi et al. 2006), TORQUE
identifying among all protein pairs the highest scoring (Bruckner et al. 2009), and QNet (Dost et al. 2007).
pair (Singh et al. 2008). Recently, the IsoRankN PathBLAST (Kelley et al. 2003, 2004) developed by
has extended based on the notion of node-specific Kelley et al. can query a small pathways with length no
Comparative Analysis of Molecular Networks 449 C
Comparative Analysis of
Molecular Networks,
Fig. 2 Illustration of
metabolic pathways querying
in a pathway database,
redrawn from (Pinter et al.
2005)
C
Ferro A, Giugno R, Pigola1 G, Pulvirenti A, Skripin D, Bader Zhang S, Zhang XS, Chen L (2008) Biomolecular network
GD, Shasha D (2007) NetMatch: a cytoscape plugin for querying: a promising approach in systems biology. BMC
searching biological networks. Bioinformatics 23:910–912 Syst Biol 2:5, Jan 18
Flannick J et al (2006) Graemlin: general and robust alignment Zhou XJ, Gibson G (2004) Cross-species comparison of
of multiple large interaction networks. Genome Res genome-wide expression patterns. Genome Biol 5(7):232
16(9):1169–1181
Flannick J et al (2008) Automatic parameter learning
for multiple network alignment. RECOMB LNBI
4955:214–231
Kelley BP et al (2003) Conserved pathways within bacteria and Competitive Repopulation
yeast as revealed by global protein network alignment. Proc
Natl Acad Sci USA 100:11394–11399 ▶ Single Cell Assay, Hematopoietic Stem Cell
Kelley BP, Yuan B, Lewitter F, Sharan R, Stockwell BR,
Ideker T (2004) PathBLAST: a tool for alignment of
protein interaction networks. Nucleic Acids Res 32
(Web Server issue):W83–W88
Koyut€urk M, Grama A, Szpankowski W (2005) Pairwise local Compiler
alignment of protein interaction network guided by models of
evolution. RECOMB LNBI 3500:48–65
Li Z, Zhang S, Wang Y, Zhang X, Chen L (2007) Alignment of José Román Bilbao Castro
molecular networks by integer quadratic programming. Bio- Supercomputing-Algorithms Group, University of
informatics 24(4):594–596 Almerı́a, Almerı́a, Spain
Liang Z, Xu M, Teng M, Niu L (2006) NetAlign: a web-based
tool for comparison of protein interaction networks. Bioin-
formatics 22:2175–2177
Liao CS, Lu K, Baym M, Singh R, Berger B (2009) IsoRankN: Definition
spectral methods for global alignment of multiple protein
networks. Bioinformatics 25:i253–i258
Ogata H, Fujibuchi W, Goto S, Kanehisa M (2000) A heuristic
A compiler is a computer program that translates a
graph comparison algorithm and its application to detect programmer-written code into something a processor
functionally related enzyme clusters. Nucleic Acids Res can “understand.” The programmer writes what is called
28:4021–4028 a “source code” in a specific language. Such a language
Pinter RY, Rokhlenko O, Yeger-Lotem E, Ziv-Ukelson M
(2005) Alignment of metabolic pathways. Bioinformatics
is easier to understand to the programmer than a
21:3401–3408 “machine language” composed of 1s and 0s. The com-
Sharan R, Ideker T (2006) Modeling cellular machinery piler then facilitates programming by allowing a much
through biological network comparision. Nature Biotechnol higher level of abstraction. Compiler development is
24:427–433
Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, Uetz P,
a very important field of study as compiled programs’
Sittler T, Karp RM, Ideker T (2005) Conserved patterns of performances will strongly depend on the compiler
protein interaction in multiple species. Proc Natl Acad Sci ability to interpret the high level code and optimize it
USA 102:1974–1979 for the underlying machine it will be executed on.
Shlomi T, Segal D, Ruppin E, Sharan R (2006) QPath: a method
for querying pathways in a protein-protein interaction net-
work. BMC Bioinformatics 7:199
Singh R, Xu J, Berger B (2008) Global alignment of multiple Cross-References
protein interaction networks with application to functional
orthology detection. PNAS 105(35):12763–12768
Suthram S, Sittler T, Ideker T (2005) The plasmodium protein
▶ Multicore Computing
network diverges from those of other eukaryotes. Nature
438:108–112
Yan X, Mehan M, Huang Y, Waterman MS, Yu PS, Zhou XJ
(2007) A graph based approach to systematically reconstruct
human transcriptional regulatory modules. Bioinformatics Complementarity Determining Region
23(13):i577–i586
Zaslavskiy M, Bach F, Vert JP (2009) Global alignment of
protein-protein interaction networks by graph matching ▶ Complementarity Determining Region (CDR-
methods. Bioinformatics 25(12):i259–i267 IMGT)
Complementarity Determining Region (CDR-IMGT) 451 C
IMGT ®, the international ImMunoGeneTics informa-
Complementarity Determining Region tion system ® (http://www.imgt.org) (▶ IMGT® Infor-
(CDR-IMGT) mation system).
In a V-LIKE-DOMAIN (V domain of ▶ Immuno-
Marie-Paule Lefranc globulin Superfamily (IgSF) other than IG and TR),
Laboratoire d’ImmunoGénétique Moléculaire, Institut the three loops BC, C’C” and FG correspond,
de Génétique Humaine UPR 1142, Université structurally, to the three CDR-IMGT of a C
Montpellier 2, Montpellier, France V-DOMAIN, and are delimited by anchors at the
same positions, 26 and 39, 55 and 66, 104 and 118,
respectively.
Synonyms Amino acid positions of the CDR-IMGT of a
V-DOMAIN have always the same number according
CDR-IMGT; Complementarity determining region to the ▶ IMGT unique numbering for V domain
(Lefranc et al. 2003). This allows to define in an
▶ IMGT Collier de Perles the lengths of the CDR-
IMGT. These characteristics, based on the ▶ IMGT-
Definition ONTOLOGY concepts, are managed in the ▶ IMGT ®
information system databases and tools. Starting from
A Complementarity determining region (CDR-IMGT) amino acid sequences, the CDR-IMGT lengths are
is a loop region of a V-DOMAIN (▶ Variable (V) obtained using the IMGT/DomainGapAlign tool
Domain), delimited according to the IMGT unique (http:www.imgt.org) (Lefranc 2009; Ehrenmann et al.
numbering for V domain (Lefranc et al. 2003). There 2010).
are three CDR-IMGT in a V-DOMAIN: CDR1-IMGT The lengths of the three CDR-IMGT lengths of
(loop BC), CDR2-IMGT (loop C0 C00 ), and CDR3- a V-DOMAIN are shown between brackets and sep-
IMGT (loop FG). arated with dots. For example, the CDR-IMGT
In a V-DOMAIN (V domain of the immunoglobu- lengths of the V-DOMAIN displayed in Fig. 1 are
lins (IG) or antibodies and T cell receptors (TR)), [8.10.12]. The CDR-IMGT lengths of a basic
the amino acids of the CDR-IMGT bind to an V domain without gaps are [12.10.13]. For CDR3-
antigen (▶ Epitope) and confer the specificity to IMGT with more than 13 amino acids, additional
the IG and TR (▶ Paratope, ▶ IMGT-ONTOLOGY, positions are added at the top pf the loop (Lefranc
SpecificityType). The first two CDR-IMGT are part of et al. 2003).
the V-REGION (encoded by a ▶ variable (V) gene), The CDR-IMGT lengths of therapeutic
whereas the CDR3-IMGT corresponds to the junction antibodies are required for the World Health
and results from the rearrangement between a V gene Organization/International Nonproprietary Names
and a ▶ joining (J) gene (V-J rearrangement) or (WHO/INN) and are included in the INN definitions
between a V gene, a ▶ diversity (D) gene, and a J (Lefranc 2011). The standardized delimitation
gene (V-D-J rearrangement) (▶ Immunoglobulin Syn- of the CDR-IMGT has a crucial importance in anti-
thesis). The two anchors, 2nd-CYS 104 and J-TRP or body engineering and antibody humanization by
J-PHE 118 (▶ Framework region (FR-IMGT)), are CDR grafting: it allows to precisely delimit the
part of the JUNCTION but do not belong to the CDR amino acids from the original antibody
CDR3-IMGT. (murine, rat, etc.) that need to be grafted on an
Analysis of the JUNCTION and of the CDR3- human antibody framework, in order to preserve the
IMGT of V-DOMAIN nucleotide sequences is specificity of the original antibody in the therapeutic
performed by IMGT/JunctionAnalysis, integrated in monoclonal antibody, while decreasing its immuno-
IMGT/V-QUEST, the nucleotide sequence analysis genicity (Lefranc 2009, 2011; Ehrenmann et al.
tool (Lefranc 2009; Ehrenmann et al. 2010) of 2010).
C 452 Complementarity Determining Region (CDR-IMGT)
Complementarity Determining Region (CDR-IMGT), representation. (e) Spacefill 3D representation. The CDR1-
Fig. 1 Complementarity determining regions (CDR-IMGT) of IMGT, CDR2-IMGT and CDR3-IMGT regions are colored in
a V-DOMAIN. The CDR-IMGT correspond to three loops in red, orange and purple, respectively (IMGT Color menu). The
IMGT Colliers de Perles (on one layer, and on two layers with CDR-IMGT lengths are [8.10.12]. The antiparallel beta strands
hydrogen bonds) and in three-dimensional (3D) structures. (A to G) correspond to the ▶ Framework Region (FR-IMGT).
(a) Amino acid sequence (http://www.imgt.org) of an antibody Anchors positions, shown as squares in B and C (26 and 39, 55
V-DOMAIN (VH of the IG-Heavy chain) with delimitation of and 66, 104 and 118), and as spheres in D, belong to the
the three CDR-IMGT. (b) IMGT Collier de Perles on one layer. FR-IMGT. Hydrophobic amino acids (hydropathy index with
(c) IMGT Collier de Perles on two layers. Hydrogen bonds positive value) and tryptophan (W) found at a given position in
between the amino acids of the C, C0 , C00 , and F and G strands more than 50% of analysed IG and TR sequences are shown
and those of the CDR-IMGT are shown. (d) Ribbon 3D in blue in B and C
Complex Behavior 453 C
Cross-References understand living systems as being somehow both
lawful and yet also unpredictable. Yet in her book of
▶ Diversity (D) Gene the same name, Mitchell (2009) notes the current lack
▶ Epitope of a consensual definition of what precisely constitutes
▶ Framework Region (FR-IMGT) a complex system. She cites definitions of complexity
▶ IMGT Collier de Perles as the algorithmic information content of a system
▶ IMGT Unique Numbering (i.e., the minimum coding length of an algorithm C
▶ IMGT ® Information System capable of reconstructing the system), as the fractal
▶ IMGT-ONTOLOGY dimension of a system, and as the level of composi-
▶ Immunoglobulin Superfamily (IgSF) tional hierarchy possessed by a system’s structure. She
▶ Immunoglobulin Synthesis concludes from this potpourri of definitions that
▶ Joining (J) Gene “notions of complexity [. . .] have many interacting
▶ Paratope dimensions and probably can’t be captured by
▶ IMGT-ONTOLOGY, SpecificityType a single measurement scale.”
▶ Variable (V) Domain These definitions reflect a marked ambiguity in
▶ Variable (V) Gene what precisely we mean when we describe life
as being complex. Definitions in terms of
algorithmic information content regard complexity
References as complicatedness: Humans are more complex than
bacteria because they possess a more complicated
Ehrenmann F, Duroux P, Giudicelli V, Lefranc M-P (2010) Stan- behavioral repertoire. On the other hand, definitions
dardized sequence and structure analysis of antibody using
in terms of compositional hierarchy tackle a more
IMGT ®. In: Kontermann R, D€ ubel S (eds) Antibody engi-
neering, vol 2, 2nd edn. Springer-Verlag Berlin, Heidelberg, elusive quality of living systems as organized:
pp 11–31, chapter 2 Humans are more complex than bacteria because
Lefranc M-P (2009) Antibody database and tools: the IMGT ® their wide behavioral repertoire is organized around
experience. In: Zhiqiang An (ed) Therapeutic monoclonal
overarching purposes.
antibodies: from bench to clinic. Wiley, Hoboken,
New Jersey, pp 91–114, chapter 4 A synthesis of these quite differing criteria has
Lefranc M-P (2011) Antibody nomenclature: from IMGT- emerged from Kauffman’s (1993) analysis of the
ONTOLOGY to INN definition Mabs 3:1–2 behavior of a family of dynamical systems called NK
Lefranc M-P, Pommié C, Ruiz M, Giudicelli V, Foulquier E,
Boolean networks. When the connectivity K of
Truong L, Thouvenin-Contet V, Lefranc G (2003) IMGT
unique numbering for immunoglobulin and T cell receptor a Boolean network is below a certain threshold, the
variable domains and Ig superfamily V-like domains. Dev network exhibits ordered behavior, converging to
Comp Immunol 27:55–77 a temporally stable, globally coordinated attractor
state, while at higher values of K, the behavior
becomes chaotic, veering unpredictably between topo-
logically uncoordinated states. At intermediate values
Complex Behavior of K the network enters the complex regime, in which
the network divides into connected regions of compar-
Niall Palfreyman atively stable, topologically coordinated states, yet
Biotechnology and Bioinformatics, Weihenstephan- intermittently redistributing these coordinated states
Triesdorf University of Applied Science, Freising, and regions.
Germany Such complex behavior is certainly visually remi-
niscent of living systems – it is neither random nor
predictably ordered, but is characterized by dynamic
Definition metastability. A dynamical system is metastable with
respect to some parameter l if its dynamics remain
Since the 1990s, the term complexity has played an relatively stable over large variational regions of l, but
increasingly prominent role in our attempts to bifurcate abruptly across specific critical values of l.
C 454 Complex EGI and Enriched Complex EGI
0.5
−0.5
−1
0 1 2 3 4 5 6 7 8 9 10 11 12
Cross-References
n1
References
n1 n1
A1eidxy A1eidxz
n1
A1
n1
A1eidyz
Society conference on computer vision and pattern recogni- complex systems. The effects of an element’s behavior
tion, Maui, HI, pp 580–585 are fed back to in such a way that the element itself is
Kang S, Ikeuchi K (1993) The complex EGI, a new representa-
tion for 3D pose determination. IEEE Trans Pattern Anal altered (Kitano 2007). Emergent behavior in complex
Machine Intell 15(7):707–721 systems arises if all constituents of the system observed
on one level cannot explain the system properties on
a coarser or higher level (Walleczek 2000).
Complex System
References
Roberta Alfieri and Luciano Milanesi
Institute for Biomedical Technologies – CNR Kitano H (2007) Towards a theory of biological robustness. Mol
Syst Biol 3:137
(Consiglio Nazionale delle Ricerche), Segrate,
Rocha LM (1991) Systems modeling: using metaphors from
Milan, Italy nature in simulation and scientific models, Los Alamos
National Laboratory, Nov 1999
Walleczek J (2000) Self-organized biological dynamics and
nonlinear control. Cambridge University Press, Cambridge
Synonyms
that is nonlinear (e.g., because the variable of interest is remains stable under a certain range of disturbances
squared). Nonlinear interactions frequently involve of the parts of the system.
positive or negative ▶ feedback.
Nonlinear dynamical equations are characterized by Emergence
nonadditivity, which means that numerical combina- The behavior of complex systems is often said to be
tions of solutions are in general not solutions. The emergent (▶ Emergence) in the sense of being
feature of nonadditivity is one reason why complex unexpected, unpredictable, or unexplainable on basis
systems are said to be “more than the sums of their of the knowledge about the parts of the system in
parts.” To put it another way, complex systems are not isolation (more on this in section “Emergence and the
aggregative systems (Wimsatt 2007) because their Limits of Reductionism”).
behavior does not remain invariant under interchanging
their parts, under changes in the number of parts, and Different Kinds of Complexity
under decomposing and rearranging their parts, and the A second way to characterize the phenomenon of com-
interactions among their parts are not linear. plexity is to distinguish different kinds of complexity.
At least two ▶ classifications of complexity are worth
Sensitivity to Initial Conditions being mentioned here. The first one has been intro-
The behavior of some complex systems is sensitive to duced by Mitchell (2003, 2009). She distinguishes
initial conditions, that is, their behavior differs largely three different kinds of complexity in biology: consti-
in the long run even if there are only minuscule differ- tutive complexity, dynamic complexity, and evolved
ences at the beginning. This is due to the fact that the complexity. “Constitutive complexity” refers to the
nonlinearity of the interactions between the system’s complexity of the structure that biological systems
parts allows small differences in the system state to be (like organisms) display. The structure of biological
amplified (▶ Amplification) into large differences in systems is complex if the system as a whole is being
the subsequent system trajectory. formed of numerous parts in nonrandom, non-simple
▶ organization (see also Simon 1962; Wimsatt 2007).
Hierarchical Organization “Dynamic complexity” concerns the complexity of the
Complex systems typically possess a hierarchical processes that biological systems are engaged in, for
nature (▶ Hierarchy), that is, they are parts of higher- instance, the developmental or evolutionary processes
level systems and they consist of parts that are them- that organisms undergo. Finally, “evolving complex-
selves (lower-level) systems which, in turn, may also ity” refers to the domain of alternative adaptive solu-
be composed of subsystems, and so on. For instance, tions that are available for certain adaptive problems. If
organisms consist of organ systems that are composed there exists a wide diversity of forms in life which have
of tissues which consist of cells, etc. This feature is evolved as solutions to the same adaptive problem,
also referred to as the multilevel character of complex there is said to be much evolving complexity.
systems. Recognizing the hierarchical nature of com- More recently, Kuhlmann (2011) has argued that it
plex systems is important for understanding how com- is important to distinguish compositional complexity
plexity can evolve (Simon 1962). from dynamical complexity. Although he uses similar
words as Mitchell, the two kinds of complexity he
Self-organization identifies are different from hers. This difference
Complex systems are self-organizing systems. might (at least partly) be due to the fact that Kuhlmann
▶ Self-organization means that an initially disordered is more interested in complex systems from physics
system becomes more organized or ordered because of and socio-economics, rather than from biology. What
the interactions of the parts of the system. The process Kuhlmann means by compositional complexity of
of self-organization is spontaneous, that is, it is not a system is the complicated organization of the setup
centrally controlled by any agent or subsystem. conditions of a system, that is, the fact that a system
consists of many parts and that the individual behav-
Robustness iors of the parts as well as their organization determine
The organization or order of a complex system the overall behavior of a system. Somehow surpris-
is said to be robust (▶ Robustness), that is, it ingly, Kuhlmann emphasizes that, in this case, the
Complexity 459 C
parts of compositionally complex systems interact developed which decouple the concept of
with each other in a linear fashion, which is why the a mechanistic explanation from the concept of
behavior of the system is a summation of the behaviors a reductive explanation because they allow higher-
of its parts. Contrary to this, in the case of dynamical level and contextual factors to figure center stage in
complexity, most facts about the nature of the parts of a mechanistic explanation (e.g., Bechtel 2008).
the system as well as their initial arrangement have no
bearing on the behavior of the system. Rather, what Causality and Complex Systems C
makes these kinds of systems complex is that although A second philosophical question concerns the causal
they are compositionally simple and the rules that structure of complex systems (▶ Causality). Do any
determine their dynamics are simple (but nonlinear), challenges and constraints for a philosophical theory of
they show patterns that are factually unpredictable and causation arise from the peculiarities of the causal
qualitatively unexpectable. structure of complex systems?
A possible challenge might be the existence of
Further Philosophical Issues downward causation, which is a topic that is also
Explaining Complexity frequently discussed in science itself. Downward cau-
One important philosophical question that arises in sation encompasses cases, in which entities from
the context of complex system research concerns the a higher level of organization causally affect entities
nature of scientific ▶ explanation. Do the explana- on a lower level (▶ Interlevel Causation). Downward
tions of the behavior of complex systems constitute causation between a whole (i.e., the higher-level
a special kind of explanations, as, for instance, Mitch- entity) and its parts (i.e., the lower-level entities) is
ell (2009) and Kuhlmann (2011) argue? Or do they regarded as particularly problematic because a central
belong to one or more of the established kinds of assumption in most theories of causation is that cause
explanation (like causal-mechanistic explanation, and effect must not be identical. However, if it is true
covering-law explanation, functional explanation, that the relation between a whole and the set of its
mathematical explanation, etc.)? A good starting organized parts is one of identity, as one could argue,
point for addressing this question is the widespread the whole cannot be causally related to its parts.
claim of biologists that reductionistic-mechanistic Other challenges and constraints of a theory of
research strategies (▶ Reduction; ▶ Mechanism) are causation that may arise from the investigation of
inappropriate or, at least, insufficient to investigate complex systems concern the context-sensitivity of
and explain the behavior of complex systems (e.g., their behavior and the nonlinearity of the interactions
Gallagher and Appenzeller 1999). This suggests that between their parts. The latter often involve cyclic
explanations that are developed in sciences of com- causal relations like ▶ feedback, which constitute
plex systems are non-reductive (▶ Explanation, a challenge for some theories of causation (e.g., for
Reductive) and non-mechanistic. However, it is far causal graph theories).
from clear what exactly this claim amounts to and
whether it is true for all explanations in this field (e.g., Emergence and the Limits of Reductionism
whether it also applies to computational explanations One of the features that are characteristic of complex
that can be found, for instance, in systems biology). systems is that their behavior is said to be emergent
One reason why the explanation of the behavior of (see section “What Is a Complex System”). However,
complex systems might be non-reductive is that these despite its ubiquity, the notion of ▶ emergence is left
explanations frequently appeal not only to the parts of notoriously vague. Most scientists use the term “emer-
the system, but also to contextual factors and higher- gence” in an epistemic sense, that is, they call
level factors (which is why they are said to be a behavior (or property) of a system emergent if it is
multilevel explanations; Mitchell 2009). Moreover, unpredictable, unexpectable, or unexplainable on the
the behavior of complex systems cannot be explained basis of present knowledge about the behavior (or
by referring only to the parts of the system in isolation properties) of its parts in isolation.
(which is, in turn, typical for reductive explanations; Others want to understand emergence ontologically
Kaiser 2011). However, in the last 15 years, several and link it to irreducibility. According to them, study-
accounts of mechanistic explanation have been ing emergent behaviors of complex systems reveals the
C 460 Complication
limits of reductionism (▶ Reduction). More precisely, Bechtel W, Richardson RC (2010) Discovering complexity.
it discloses the conditions under which the (exclusive) Decomposition and localization as strategies in scientific
research. MIT Press, Cambridge, MA
application of reductive research strategies (like Gallagher R, Appenzeller T (1999) Beyond reductionism.
decomposition, simplification of the system’s context, Science 284:7
and investigating parts in isolation; Wimsatt 2007; Hooker C (ed) (2011) Philosophy of complex systems. Elsevier,
Kaiser 2011) is not adequate any more. In other Amsterdam (¼ Handbook of the philosophy of science; 10)
Kaiser MI (2011) Limits of reductionism in the life sciences.
words, the emergent behavior of a complex system Hist Philos Life Sci 33:389–396
cannot be understood by decomposing the system Kuhlmann M (2011) Mechanisms in dynamically complex
into its parts, by examining the parts in isolation (i.e., systems. In: McKay Illari P, Russo F, Williamson J (eds)
separated from the original system), and by ignoring Causality in the sciences. Oxford University Press, Oxford,
pp 880–906
the context of the system. This is, for instance, due to Ladyman J, Lambert J, Wiesner K (2012) What is a complex
the fact that the parts of a complex system are orga- system? Eur J Philos Sci (online first)
nized or “integrated” (Bechtel and Richardson 2010) in Mitchell SD (2003) Biological complexity and integrative
such a complicated way that their behavior is co- pluralism. Cambridge University Press, Cambridge
Mitchell SD (2009) Unsimple truths. science, complexity, and
determined by the system’s organization. Furthermore, policy. University of Chicago Press, Chicago/London
the behavior of several complex systems heavily Simon HA (1962) The architecture of complexity. Proc Am
depends on certain parts of their context (i.e., it is Philos Soc 106:467–482
non-robust under variations of these contextual Wimsatt WC (2007) Re-engineering philosophy for limited
beings. Piecewise approximations to reality. Harvard
factors), which is why the context of the system cannot University Press, Cambridge, MA
be ignored altogether or grossly simplified.
Cross-References Complication
▶ Amplification ▶ Complexity
▶ Causality
▶ Classification
▶ Emergence
▶ Explanation in Biology Components of Variance
▶ Explanation, Reductive
▶ Feedback Regulation ▶ Experimental Design, Variability
▶ Gene Regulatory Networks
▶ Hierarchy
▶ Holism
▶ Interlevel Causation Composite Motifs
▶ Mechanism
▶ Organization Jianhua Ruan
▶ Reduction Department of Computer Science, University of Texas
▶ Robustness at San Antonio, San Antonio, TX, USA
▶ Self-Organization
Definition
References A set of transcription factor binding motifs, usually
located close to each other on the chromosome, asso-
Bechtel W (2008) Mental mechanisms. philosophical perspec-
tives on cognitive neuroscience. Taylor and Francis, New ciated with a cooperative set of transcription factors. It
York/London is also called cis-regulatory modules.
Computational Autopoiesis 461 C
demarcation were accepted, then autopoietic organi-
Computational Autopoiesis zation would be a required property of all properly
biological systems, that is, of the entire domain of
Barry McMullin systems biology. Computational autopoiesis is
The Rince Institute, Dublin City University, Dublin, concerned with testing the conjecture by building
Ireland computational universes (virtual worlds) specifically
designed to facilitate embedding of systems with C
autopoietic organization and investigating the extent
Definition to which these can then exhibit lifelike phenomenol-
ogy within those universes (including dynamic
Autopoiesis (“self-production”) denotes a particular self-maintenance, reproduction, and evolutionary
class of “circular” system organization. A system is growth of complexity). Computational autopoiesis is
said to be autopoietic if its organization satisfies two commonly classified as a subfield within ▶ Artificial
separate, but intertwined, “closure” properties: Life.
• Closure in production: The system is composed of
components which give rise to (realize or instanti- The Minimal Model
ate) processes of production which, in turn, collec- Partly because the definition of autopoiesis was still
tively produce more of those same components informal and qualitative, Maturana and Varela offered
• Closure in space: The self-construction of what they called a concrete “minimal model” of
a boundary between the system and the world in autopoietic organization. With this they put forward
which it is embedded, yet from which it distin- a relatively abstract, but still “chemical-like,” world,
guishes itself incorporating a minimal set of chemical species and
Computational autopoiesis denotes the study of reactions that would still suffice to allow their organi-
autopoietic systems realized in computational form – zation into an autopoietic system. With Ricardo Uribe,
that is, embedded in artificial, computer-generated, they took the highly original (for the time) further step
virtual worlds. of actually implementing this minimal model in the
form of a computer simulation of their abstract chem-
ical world (Varela et al. 1974).
Characteristics The minimal (computer-simulated) chemistry takes
places in a discrete, two-dimensional, space. Each
Motivation position in the space is either empty or occupied by
The term autopoiesis was coined in the early 1970s by a single particle. Particles generally move in random
Chilean biologists Humberto Maturana and Francisco walks in the space. There are three distinct particle
Varela. They introduced it in an attempt to character- types (chemical species), engaging in three distinct
ize what they saw as a decisive organizational dis- reactions:
tinction between living and nonliving systems – • Production: Two substrate (S) particles may react,
namely, that even the simplest living system (such in the presence of a catalyst (K) particle, to form
as a bacterial cell) has the capacity both to regenerate a link (L) particle.
all its own components and also to demarcate itself, as • Bonding: L particles may bond to other L particles.
a distinct system, from its surrounding environment. Each L particle can form (at most) two bonds, thus
In their view, this entails an essential self-referential allowing the formation of indefinitely long chains,
circularity in the organization of living systems; and which may close to form membranes. Bonded
they conjectured that this self-referential circularity is L particles become immobile.
at least a necessary (though hardly sufficient) condi- • Disintegration: An L particle may spontaneously
tion for the distinctive phenomenology of biological disintegrate, yielding two S particles. When this
systems. This conjecture is clearly relevant to what is occurs, any bonds associated with the L particle
now called systems biology insofar as if this are destroyed also.
C 462 Computational Autopoiesis
Elaborations
The earliest systematic elaborations of the minimal
model of computational autopoiesis were developed
by Milan Zeleny (1978). In particular, he reported
such phenomena as growth, change in shape, oscilla-
tion in chemical activity, and self-reproduction of
autopoietic entities. Prima facie, these represented
important demonstrations of progressively richer, life-
like, phenomena within the framework of computa-
tional autopoiesis. However, the algorithmic
Computational Autopoiesis, Fig. 1 Minimal autopoietic sys- (“chemical”) basis for these phenomena was not fully
tem embedded in 2D computational model. Small hollow detailed in the published reports. Technological limi-
squares are substrate (S); solid disk is catalyst (K); large hollow tations of the time meant that the source code (the
squares with bonds are membrane/link (L) particles
definitive documentation) was not widely distributed,
and, indeed, all copies are now believed to have been
Chains of L particles are permeable to S particles lost. The qualitative descriptions that are available
but impermeable to K and L particles. Thus, a closed suggest that, in at least some cases, the original
chain, or membrane, which encloses K or L particles constraints of local interaction, random motion of
effectively traps such particles. particles, and time-independent particle dynamics
The basic autopoietic system embedded in this may have been substantially relaxed. If that were so,
world (Fig. 1) consists of a closed chain (membrane) the interest of these results would be seriously
of L particles enclosing one or more K particles. diminished – in the sense that the more complex
Because S particles can permeate through the mem- “macroscopic” phenomena may have been, in effect,
brane, there can be ongoing production of L particles. implicitly programmed into individual microscopic
Since these cannot escape from the membrane, this particles. In any case, this particular line of elaboration
will result in the buildup of a relatively high concen- was not pursued further.
tration of L particles. On an ongoing basis, the mem- Some years later, Breyer et al. (1998) described
brane will rupture as a result of disintegration of another series of developments beyond the minimal
component L particles. Because of the high concentra- autopoiesis model. They first relaxed the original
tion of L particles inside the membrane, there should restrictions on the motion of bound L particles to
be a high probability that one of these will drift to the allow the formation of flexible chains and, indeed,
rupture site and effect a repair, before the K particle(s) membranes. Allowing multiple, doubly bonded,
escape, thus reestablishing precisely the conditions L particles to occupy a single lattice site further
allowing the buildup and maintenance of that high enhanced membrane motion. Adopting more sophisti-
concentration of L particles. cated “bond rearrangement” interactions then allows
Practical computer simulation experiments chain fragments formed within the cell to be dynami-
demonstrated that such dynamic, self-maintaining, cally integrated into the membrane. These mechanisms
autopoietic systems can indeed both emerge and per- together obviate the need for the chain-based bond
sist as discrete, self-demarcated, composite individuals inhibition mechanisms – since now, even if
embedded within this abstract world. In the original L particles spontaneously bond in the interior of the
Computational Autopoiesis 463 C
cells, the oligomers so formed remain mobile and, Related Concepts
through bond rearrangement, can still successfully Computational autopoiesis is distinct from, but clearly
function to repair membrane ruptures. In this way, related to several other, parallel, developments.
the model was reported as successfully supporting The chemoton concept of Tibor Gánti (2003),
cell growth. though apparently developed quite independently,
More recently, a variation of the minimal model has shares significant features with cellular autopoiesis.
been presented which demonstrates systematic motion It is again a proposal for an abstract, “minimal” cell, C
of the autopoietic cell in a manner reminiscent of consisting of a collectively autocatalytic network of
chemotaxis (Suzuki and Ikegami 2008). reactions which is enclosed within a membrane,
A specific criticism of the minimal model is that it which is also generated and maintained by the reac-
provides no mechanism for production of further tion network. It differs from the minimal autopoiesis
K particles (within a functioning autopoietic cell). model in explicitly including a “genetic subsystem.”
This is argued to be both a logical and practical defi- It is also rather more detailed in its analysis of the
ciency. Logical because, prima facie, it conflicts with required chemical dynamics (kinetics, etc.) and aims
the stated requirement of autopoiesis for closure of at supporting self-reproduction by growth and fission
production. But perhaps more importantly, from even in the minimal version. The chemoton has been
a practical point of view, in the absence of ongoing subject to various computer simulation studies. How-
production of K, then self-reproduction of this form of ever, these have been based on an ordinary differen-
autopoietic entity is impossible; and consequently, tial equation approach rather than spatially
there is also no possibility for spontaneous Darwinian distributed molecular (particle) models. This means
evolution among variant lineages of such entities. that, in particular, the distributed, spatial, dynamics
Breyer et al. (1998) did address this issue to the of membrane growth, fission, and individuation have
extent of presenting a number of specific mechanisms not been substantively modeled in the chemoton
for production of further K particles but did not pursue context.
that to the demonstration of autopoietic entities actu- Another related line of work involves much more
ally capable of sustained reproduction (i.e., which physicochemically realistic computer models of mini-
could continue up to the limit of available resources mal cellular structures. This line of attack poses some
of space and particles). scaling difficulties, as the computational demands of
Further substantive progress in demonstrating evo- physicochemical realism rise very rapidly. On the
lution of computational autopoietic systems has (to other hand, the cost of computation continues to fall,
date) not generally focused on direct elaboration of so this may well be a very fruitful domain of further
the original minimal model but instead has tended to research in the near future.
shift to computer modeling at a more coarse-grained Finally, there are specific conceptual connections
level. This has been at least partially driven by reasons between the idea of autopoiesis and Robert Rosen’s
of computational cost in the simulations. In the original metabolism-repair (MR) systems and “closure under
model, each lattice site can generally be occupied by at efficient causation” (Rosen 1991). However, it must be
most one particle: It is fine grained to the single mol- noted that Rosen explicitly argued that his form of
ecule level. In the so-called lattice artificial chemistry closure could not, even in principle, be embedded within
(LAC) models, by contrast, each single lattice site is a purely computational system. By contrast, computer
permitted to host many particles, of many distinct realization has been an explicit exemplar of autopoietic
molecular species. Using this approach, a variety of closure from the very start. The relationship between
additional phenomena have been reported, including autopoiesis (especially in its computational realizations)
self-reproducing autopoietic cells, with (limited) heri- and closure under efficient causation is, accordingly, an
table variation and Darwinian evolution; for example, issue of continuing active research and debate.
selection between varieties of catalyst particles having
different rates of membrane particle generation (Ono
and Ikegami 2002). Some example phenomena, Cross-References
including self-reproduction, have also been reported
in three-dimensional LAC models. ▶ Artificial Life
C 464 Computational Creativity
References Characteristics
Breyer J, Ackermann J, McCaskill J (1998) Evolving reaction- Understanding brain processes behind creativity and
diffusion ecosystems with self-assembling structures in thin
modeling them using computational means is one of
films. Artif Life 4:25–40
Gánti T (2003) The principles of life. Oxford University Press, the grand challenges for systems biology. Computa-
Oxford tional creativity is a new field, inspired by cognitive
McMullin B (2004) Thirty years of computational autopoiesis: psychology and neuroscience. In many respects,
a review. Artif Life 10:277–295. doi:10.1162/
human-level intelligence is far beyond what artificial
1064546041255548, http://alife.rince.ie/bmcm-alj-2004/
Ono N, Ikegami T (2002) Selection of catalysts through intelligence can provide now, especially in regard to
cellular reproduction. In: Standish R, Bedau MA, Abbass the high-level functions, involving thinking, reason-
HA (eds) Proceedings of the 8th international conference ing, planning, and the use of language. Intuition,
on artificial life (ALIFE 8). MIT Press, Sydney, Australia,
pp 57–64. http://web.archive.org/web/20040518183545/
insight, imagery, and creativity are important aspects
http://parallel.hpc.unsw.edu.au/complex/alife8/proceedings/ of all these functions. Computational models show
sub2844.pdf great promise both in elucidating mechanisms behind
Rosen R (1991) Life itself. Columbia University Press, such high-level mental functions, and in applications
New York
Suzuki K, Ikegami T (2008) Shapes and self-movement in
requiring intelligence (Duch 2007).
protocell systems. Artif Life 15(1):59–70. doi:10.1162/ Creativity, defined by Sternberg (1998) as “the
artl.2009.15.1.15104, http://dx.doi.org/10.1162/artl.2009.15. capacity to create a solution that is both novel and
1.15104 appropriate,” has often been understood in a narrow
Varela FJ, Maturana HR, Uribe R (1974) Autopoiesis: the orga-
nization of living systems, its characterization and a model.
sense, with a focus on big discoveries, inventions, and
Biosystems 5:187–196 creation of novel theories, arts, and music, but it also
Zeleny M (1978) APL-AUTOPOIESIS: experiments in self- permeates everyday activity, thinking, understanding
organization of complexity. In: Progress in cybernetics and language, providing flexible solutions to everyday
systems research, vol III. Hemisphere, Washington,
pp 65–84
problems (R. Richards, in Runco and Pritzke 2005,
pp. 683–688). Creativity research has been pursued
mostly in the domain of philosophy, education, and
psychology, with many research results published in
two specialized journals: Creativity Research Journal
and Journal of Creative Behavior. The “Encyclopedia
Computational Creativity of Creativity” (Runco and Pritzke 2005), written by
167 experts, does not contain any testable neurological
Włodzisław Duch or computational models of creativity. MIT Encyclo-
Department of Informatics, Nicolaus Copernicus pedia of Cognitive Sciences (Wilson and Keil 1999)
University, Toruń, Poland contains only a single page (of about 1,100 pages)
School of Computer Engineering, Nanyang about creativity, does not mention intuition at all, but
Technological University, Singapore, Singapore devotes six articles to logic, appearing in the
index almost 100 times. Logic has never been too
successful in modeling real-thinking processes that
Synonyms rely on intuition and creativity. The interest in research
on computational and neuroscience approaches to
Ideation; Ingenuity; Innovation creativity is thus quite recent.
intuition is the basis of nonverbal communication, and problem solving, although the problem of find a good
can be seen as the phenomenological and behavioral way to represent knowledge domain in which genetic
correlate of knowledge gained through implicit learn- processes operate may be as hard as the original prob-
ing (Lieberman 2000). lem. Other approaches to insight include the “small
The measurement of intuition is based on several world” network model of Schilling (2005) and the
tests and inventories (for example, the Myers-Briggs work on fluid concepts and creative analogies
Type Inventory, or Accumulated Clues Task), but (Hofstadter 1995), with some applications to design.
there is little correlation between them, so the concept A direct attempt to model creative processes in the
of intuition is not well defined from an operational brain is still not feasible, but inspirations from the
point of view. Significant correlations were found BVSR models may be used in a number of ways. The
between the Rational-Experiential Inventory (REI) results of the experimental and theoretical research in
intuition scale and some measures of creativity (Raidl this domain can be summarized in three points:
and Lubart 2001). Such tests reflect rather complex 1. Space: creativity involves neural processes that are
cognitive processes, and it is not clear which brain realized in the space of quasi-stable neural activi-
processes are behind these measures. ties, leading to patterns of activity that reflect rela-
From a computational point of view, it is much tions between concepts in some domain.
easier to create predictive models of data than to pro- 2. Blind variation: priming by concepts that represent
vide explanations (Duch 2007). In particular, it is dif- the problem leads to distributed fluctuating neural
ficult to explain decisions made by neural networks or activity constrained by the strength of associations
similarity-based systems. Using such systems for between patterns of neural activity coding concepts;
learning from partial observations can constrain the this process is responsible for imagination and the
search for solutions, avoiding combinatorial explosion flexible formation of transient novel associations.
that is a main problem in AI, making the reasoning 3. Selective retention: filtering of interesting results,
process feasible. discovering partial solutions that may be useful to
reach goals, amplifying or forming new associa-
Creativity from the Computational Perspective tions; in biological systems, this may involve emo-
Psychology and neuroscience agree that creativity is tional arousal.
a product of ordinary cognitive processes. The lack of The blind-variation process may require some
understanding of detailed mechanisms involved in cre- structuring to be effective. Brainstorming, free associ-
ative activity makes the development of creative com- ations, random stimulation, or lateral thinking have not
puting rather difficult. The need for everyday creativity been very successful in the generation of creative ideas
has been almost completely neglected by the artificial in advertising and product innovation (Goldenberg and
intelligence research community and may be credited Mazursky 2002). However, structured approaches,
for failures of many AI programs. Early attempts to based on higher-order rules and templates, led to
model intuition, insight, and inspiration from the AI excellent results. Computer-generated ideas based on
point of view have been summarized by Simon (1995). templates were rated significantly higher both for
His work has mostly been directed at understanding creativity and originality than the non-template
historical discoveries of scientific laws, as well as the human ideas. The associative processes may in this
search for new scientific knowledge of this kind in case have been guided by general rules. Connectionist
astronomy, physics, chemistry, and biology. Simon models for the generation of ideas within the brain-
made no attempts to connect search-based AI storming context can successfully predict factors that
approaches to processes in the brain. Research in auto- enhance brainstorming productivity. The model of Iyer
matic genetic programming (Koza et al. 2003) can be et al. (2009) is perhaps the most sophisticated, with
credited for useful patentable inventions in automated features, concepts, and cognitive control components
synthesis of antennas, analog electrical circuits, con- as separate neural layers. Ideas emerge in a multilevel,
trollers, metabolic pathways, genetic networks, and modular semantic space from itinerant attractor
other areas. These inventions have been mostly dynamics (i.e., activity of neuronal changes in the
restricted to optimized versions of known designs. itinerant way, attracted toward quasi-stable states
Genetic programming may be capable of creative where it slows down) shaped by context, synaptic
Computational Creativity 467 C
learning, and ongoing evaluative feedback. This model In computational models, these cognitive processes
generates novel ideas by multilevel dynamical search may be implemented in large-scale neural models, but
in various contexts, capturing the interplay between even the simplest approximations give interesting
semantic representations, working memory, focusing results (Duch 2006; Duch and Pilichowski 2007). The
attention on various features, and using reinforcement algorithm involves three major components:
signals. The model is quite useful for the elucidation of 1. An autoassociative memory (AM) structure, trained
the mechanisms of creativity. For example, it has on a large lexicon to learn statistical properties at C
found interesting associations for tourist activities the morphological level, providing the model of
such as “outing and vacation,” depending on the infor- a neural space and storing background knowledge
mation of the context. that is modified (primed) by keywords.
The simplest domain in which creativity is fre- 2. Blind-variation process (imagery), forming new
quently manifested and can be studied in experiments strings from combinations of substrings found in
as well as computational models is the invention and keywords and their synonyms, with probabilistic
understanding of neologisms. Poems by Lewis Carroll constraints provided by the AM to select only lex-
are full of neologisms, but novel words are also in ically plausible strings.
great demand for products, web sites, or company 3. Selective retention ranking the quality of the strings
names. In languages with rich morphological and representing neologisms from a phonological and
phonological compositionality (Latin-based, Slavic, semantic point of view.
and other families), out-of-vocabulary words appear Filters should estimate phonological plausibility
fairly often in conversations. In most cases, the mor- and “semantic density,” or the number of potential
phology of these words gives sufficient information to associations with commonly known words, calculat-
understand their meaning. Given keywords or a short ing the number of substrings in the lexical tokens that
description from which keywords are extracted primes may serve as morphemes in each new candidate
the brain at the phonetic and semantic level. The string. Many factors may be included here: general
structure of the language is internalized in the neural similarity between morphemes, personal biases,
space. Priming leads to blind variation at the level of tweaking phonology for neologisms with interesting
word constituents (syllables, morphemes), creating phonetics. The implementation of this algorithm led
a large number of transient resonant configurations to the generation of neologisms with highest ranks
of neural cell assemblies that remain unconscious. that have actually been used as company or domain
This process explores the space of possibilities that names in about 75% of cases. For example, starting
agree with internalized constraints on the statistical from an extended list of keywords, portal, imagina-
probabilities of phonological structure (phonotactics tion, creativity, journey, discovery, travel, time,
of the language) and morphological structure. Imag- space, infinite, the best neologisms included creatival
ery processes are approximated in a better way by (used by creatival.com), creativery (used by
taking keywords, finding their synonyms to increase creativery.com). Novel neologisms (not found by
the pool of words, breaking words into syllables and the Google search engine) included discoverity, asso-
morphemes, and combining the fragments in all pos- ciated with discovery of something true (verity), and
sible ways. Words that share properties with many linked to many morphemes: disc, disco, discover,
other words (i.e., patterns that code them in the brain verity, discovery, creativity, verity, and through pho-
overlap strongly with patterns for other words) have nology to many others. Another interesting word
a higher chance to win the competition for access to found is digventure, with many associations to dig
awareness. Many variants of words are created around and venture.
the same morphemes. The same word can be used with These examples show that computational
many meanings because the context creates specific approaches to creativity can, at least in restricted
brain activation patterns for this word. Creative brains domains, lead to results that are comparable with
spread the activation to more words associated with human ingenuity, and that blind-variation selective
initial keywords, produce more combinations, retention ideas based on the generalization of evolu-
selecting the most interesting ones using phonologi- tionary processes may be as useful in cognitive science
cal, affective, and semantic filters. as they are in life sciences.
C 468 Computational Data Analysis
References
Computational Data Analysis
Boden M (1991) The creative mind: myths and mechanisms.
Abacus, London
▶ Data-intensive Research
Campbell DT (1960) Blind variation and selective retention in
creative thought as in other knowledge processes. Psychol
Rev 67:380–400
Duch W (2006) Computational creativity. IEEE World congress
on computational intelligence, 16–21 July. IEEE Press, Van-
couver, pp 1162–1169
Computational Identification of
Duch W (2007) Towards comprehensive foundations of com- microRNA Targets
putational intelligence. In: Duch W, Mandziuk J (eds)
Challenges for computational intelligence. Springer, Berlin/ ▶ MicroRNA, Target Prediction
Heidelberg/New York, pp 261–316
Duch W, Pilichowski M (2007) Experiments with computational
creativity. Neural Inform Process Lett Rev 11(4–6):123–133
Duch W, Matykiewicz P, Pestian J (2008) Neurolinguistic
approach to natural language processing with applications Computational Linguistics
to medical text analysis. Neural Netw 21(10):1500–1510
Goldenberg J, Mazursky D (2002) Creativity in product innova-
tion. Cambridge University Press, Cambridge ▶ Natural Language Processing
Hofstadter DR (1995) Fluid concepts and creative analogies:
computer models of the fundamental mechanisms of thought.
Basic Books, New York
Iyer LR, Doboli S, Minai AA, Brown VR, Levine DS, Paulus PB
(2009) Neural dynamics of idea generation and the effects of Computational Methods for Mapping
priming. Neural Netw 22:674–686 B Cell Epitopes
Kounios J, Jung-Beeman M (2009) Aha! The cognitive neuro-
science of insight. Curr Dir Psychol Sci 18:210–216
▶ B Cell Epitope Prediction
Koza JR et al (2003) Genetic programming IV: routine human-
competitive machine intelligence. Kluwer, New York
Lieberman MD (2000) Intuition: a social cognitive neuroscience
approach. Psychol Bull 126(1):109–137
Mednick SA (1962) The associative basis of the creative process.
Psychol Rev 69:220–232 Computational Methods for
Myers DG (2002) Intuition: its powers and perils. Yale University Transcriptional Regulatory Networks
Press, New Haven
Pulverm€uller F (2003) The neuroscience of language On brain
circuits of words and serial order. Cambridge University Jianhua Ruan
Press, Cambridge Department of Computer Science, University of Texas
Raidl M-H, Lubart TI (2001) An empirical study of intuition and at San Antonio, San Antonio, TX, USA
creativity. Imagination, Cognition Personality 20(3):
217–230
Runco M, Pritzke S (eds) (2005) Encyclopedia of creativity,
vol 1–2. Elsevier, Burlington Definition
Schilling MA (2005) A “Small-orld” network model of cogni-
tive insight. Creativ Res J 17(2–3):131–154 The transcriptional level of gene expression is
Simon HA (1995) Explaining the ineffable: AI on the topics of
intuition, insight and inspiration. In: Proceedings of the 14th controlled, to a large extent, by specific interactions
international joint conference on artificial intelligence, between transcription factors (TFs) and the promoter
Montreal, pp 939–948 sequences of their target genes. The interactions
Simonton DK (2010) Creative thought as blind-variation and between TFs and target genes are combinatorial in
selective-retention: combinatorial models of exceptional cre-
ativity. Phys Life Rev 7:156–179 nature, and are typically many-to-many, i.e., each TF
Sternberg RJ (ed) (1998) Handbook of human creativity. controls many genes, and a gene can be controlled by
Cambridge University Press, Cambridge many TFs, forming complex transcriptional regulatory
Sternberg RJ, Davidson JE (1995) The nature of insight. MIT networks (TRNs). To understand gene functions in dif-
Press, Cambridge, MA
Wilson R, Keil F (eds) (1999) MIT encyclopedia of cognitive ferent biological processes, it is necessary to reconstruct
sciences. MIT Press, Cambridge, MA such regulatory networks from experimental data.
Computational Methods for Transcriptional Regulatory Networks 469 C
Characteristics expressed genes”) are often considered co-regulated.
A number of statistical methods have been developed
Data to identify differentially expressed genes, for example,
Input data used for computationally reconstructing Statistical Analysis of Microarray (SAM) (Tusher et al.
genome-scale TRNs may include gene expression data 2001), and Rank Products (Breitling et al. 2004).
measured by DNA microarrays or RNA-seq, promoter Statistically, genes exhibiting similar expression patterns
sequences that the cis-regulatory elements may be across multiple experimental conditions are more likely C
located, and protein-DNA interaction data obtained co-regulated than genes co-expressed under very specific
from high-throughput experiments such as chromatin conditions. Therefore, unsupervised methods perform
immunoprecipitation (ChIP) combined with microarrays better when the data set contains a large number of
(ChIP-chip), chromatin immunoprecipitation combined experimental conditions. Genes can then be grouped
with next-generation sequencing (ChIP-Seq), and into clusters, where genes in the same cluster have
universal protein-binding microarrays (PBM). similar expression patterns. This clustering-based
These data sets provide different, orthogonal, infor- method is generally considered a more reliably way of
mation and should be combined whenever possible. identifying co-regulated genes than methods based on
ChIP-based protein-DNA interaction data provides differentially expressed genes. Many clustering
the most direct evidence of physical interactions algorithms have been introduced from the data mining
between regulators and DNA. However it is relatively field, including hierarchical clustering (Eisen et al.
costly and may be limited by the availability of anti- 1998), k-means clustering (Tavazoie et al. 1999), self-
bodies specific to the regulators of interest. Further- organizing maps (Tamayo et al. 1999), and graph-
more, protein-DNA interaction is dynamic and is theoretic algorithms (Xu et al. 2002) (see Belacel et al.
affected by many factors, while ChIP-based data only (2006) for a survey). Bi-clustering methods have also
captures the interaction under a particular condition been introduced to find clusters of genes that are
and at a specific time point. Gene expression data co-expressed under subsets of conditions (Cheng and
combined with promoter sequences can be combined Church 2000; Tanay et al. 2002; Turner et al. 2005;
to reveal dynamic changes of transcriptional regulation Madeira and Oliveira 2004).
under different conditions. However, the simple
assumption that such method usually relies on, i.e., Identifying Common cis-Regulatory Elements
co-expressed genes are transcriptionally co-regulated, After obtaining a list of putatively co-regulated genes,
does not always hold. the next logical step is to find the common regulators, or
In addition, using promoter sequences one can only to find common cis-regulatory elements. The latter is
infer the binding preferences of a transcription factor. more often used, simply because the former relies on
The identity of the actual transcription factor needs to be protein-DNA interaction data which is still relatively
resolved, which is often not easy. The PBM technique rare and may not be available for the particular condition
may fill this gap by explicitly measuring the affinity of interest. In contrast, finding cis-regulatory elements
between regulators and all possible DNA subsequences only needs genomic sequence data – it attempts to find
of a fixed length. common short subsequences from the promoter regions
of the co-regulated genes (“motif finding”). Motif find-
Unsupervised Methods ing can also be classified into two categories: de novo
Identifying Putatively Co-regulated Genes motif finding and motif scanning. While de novo motif
An unsupervised method for reconstructing transcrip- finding does not require input of known motifs, motif
tional regulatory networks usually starts with identifying scanning relies on databases of known cis-regulatory
potentially co-regulated genes. Although co-regulated elements. With motif scanning, one searches for the
genes can be directly obtained from protein-DNA occurrence of a list of known cis-regulatory elements
interaction data, the most common source is gene expres- (with some tolerance of mismatches) from the promoters
sion data analysis, which relies on the simple assumption of the co-regulated genes, and then reports the ones with
that co-expressed genes are likely co-regulated. For the highest number of occurrence (or more accurately,
example, genes that show similar transcriptional with the highest statistical significance). The de novo
responses to the same treatment (“differentially motif finding method does not rely on databases of
C 470 Computational Methods for Transcriptional Regulatory Networks
regulators
motifs
C
1 1 0 1 0 0 1
0 1 0 1 1 0 0
1 0 0 0 1 0 1
1 0 0 0 0 1 0
0 1 1 0 0 0 0
genes
0 0 1 0 1 0 0
1 1 0 0 1 1 0
0 0 0 1 1 0 0
0 0 1 1 0 1 0
0 0 1 0 0 0 0
A 1 0 0 1 0 1 1
conditions
B
Computational Methods for Transcriptional Regulatory are modeled by the motifs on their promoters. Box C, the expres-
Networks, Fig. 1 Schematic representation of existing super- sion levels of a single gene under multiple conditions are modeled
vised machine-learning methods for modeling transcriptional reg- by the expression levels of putative regulators. Box D, the expres-
ulation. The bottom right matrix represents gene expression levels. sion levels of multiple genes under multiple conditions are
The bottom left matrix represents motif occurrences on promoter modeled by the expression levels of putative regulators. Box E,
sequences. The top matrix is the expression levels of regulators. the expression levels of multiple genes under multiple conditions
Box A, the expression levels of multiple genes under a single are modeled by the motifs on their promoters and the expression
condition are modeled by the motifs on their promoter. Box B, levels of putative regulators
the expression levels of multiple genes under multiple conditions
transcriptional regulation of gene expressions over example, Qian et al. (2003) applied support vector
several time points simultaneously. In these two machines (SVMs) to predict the targets of 36 yeast
methods, motifs will be associated with a cluster of transcription factors by identifying subtle relationships
genes that have similar expression patterns; this is between their expression profiles. Soinov et al. (2003)
similar to the unsupervised methods. However, in and Segal et al. (2003a) used decision trees and regres-
these methods clustering of genes and identification sion trees for similar purposes. The method in Soinov
of cluster-specific motifs are performed et al. (2003) used decision tree to identify possible
simultaneously. regulators for several cell-cycle genes individually,
The second type of supervised methods attempt to while the method in Segal et al. (2003a) proposed
model the expression levels of target genes by the a more sophisticated procedure suitable for whole-
expression levels of other genes, e.g., TFs and other genome analysis. The method first clusters genes
regulators (Fig. 1, Boxes C and D). In Box C, the according to their expression patterns, and then builds
expression levels of a single gene under multiple con- a regression tree for each cluster to represent their
ditions are modeled as a function of the expression common regulation program. The procedure then iter-
levels of putative regulators under the same set of atively refines the clusters and the trees.
conditions. That is, each condition is treated as an Finally, the third type of supervised methods model
instance, and the expression levels of putative regula- gene expression levels using both putative binding
tors under the condition are its attributes. Similar to the motifs on promoter sequences and the expression
first type of methods, both rule-like and regression- levels of putative regulators (Fig. 1, Box E). For exam-
based methods have been commonly applied. For ple, Middendorf et al. (2004) used an ensemble of
C 472 Computational Methods for Transcriptional Regulatory Networks
decision trees to model gene expression levels by com- (Ihmels et al. 2002; Bar-Joseph et al. 2003). Hybrid
bining putative binding motifs and the expression methods that combine gene clustering and classifica-
levels of putative TFs. The method by Ruan and tion iteratively have also been developed (Segal et al.
Zhang (2006), Bi-Dimensional Regression Tree 2003b; Beer and Tavazoie 2004).
(BDTree), is an extension to the multivariate regres-
sion tree approach of Segal (1992) and Phuong et al.
(2004). A multivariate regression tree recursively References
splits genes into groups, where the genes in each
group have similar selected attributes (motifs) and Bailey TL, Elkan C (1994) Fitting a mixture model by expecta-
tion maximization to discover motifs in biopolymers. Proc
responses (Segal 1992). Phuong et al. (2004) extended
Int Conf Intell Syst Mol Biol 2:28–36
the method to handle multiple responses, so that Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F,
the instances in each group have a similar pattern of Gordon DB, Fraenkel E, Jaakkola TS, Young RA, Gifford DK
responses across multiple conditions. The basic idea (2003) Computational discovery of gene modules and
regulatory networks. Nat Biotechnol 21:1337–1342
of BDTree, as suggested by its name, is to extend
Beer MA, Tavazoie S (2004) Predicting gene expression from
the multivariate regression tree approach to both sequence. Cell 117(2):185–198
dimensions of the expression matrix (Fig. 1, Box E). Belacel N, Wang Q, Cuperlovic-Culf M (2006) Clustering
On one dimension, each gene is treated as an instance, methods for microarray gene expression data. OMICS
10:507–531
where the attributes are the binding motifs on its
Breitling R, Armengaud P, Amtmann A, Herzyk P (2004) Rank
promoter sequence, and the responses are its expres- products: a simple, yet powerful, new method to detect
sion levels under different conditions. On this dimen- differentially regulated genes in replicated microarray exper-
sion, the approach recursively partitions genes so iments. FEBS Lett 573(1–3):83–92
Buhler J, Tompa M (2002) Finding motifs using random pro-
that those in the same subset have some common
jections. J Comput Biol 9:225–242
binding motifs and similar expression patterns across Bussemaker HJ, Li H, Siggia ED (2001) Regulatory element
the conditions. On the other dimension, each condition detection using correlation with expression. Nat Genet
is treated as an instance, while the attributes are 27:167–171
Cheng Y, Church GM (2000) Biclustering of expression data.
the expression levels of candidate regulators under
Proc Int Conf Intell Syst Mol Biol 8:93–103
the condition, and the responses are the expression Conlon EM, Liu XS, Lieb JD, Liu JS (2003) Integrating regula-
levels of genes under that condition. BDTree recur- tory motif discovery and genome-wide expression analysis.
sively partitions the conditions so that the expression Proc Natl Acad Sci USA 100:3339–3344
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster
levels of the regulators and target genes within each
analysis and display of genome-wide expression patterns.
subset of conditions are similar. The final model, Proc Natl Acad Sci U S A 95:14863–14868
represented by a tree, can be considered as a set of Hu YJ, Sandmeyer S, McLaughlin C, Kibler D (2000) Combi-
rules, each of which has the form of “if a gene has natorial motif analysis and hypothesis generation on
a genomic scale. Bioinformatics 16:222–232
binding motif A, it will be up-regulated when TF B is
Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N
up-regulated.” (2002) Revealing modular organization in the yeast tran-
scriptional network. Nat Genet 31:370–377
Other Methods Keich U, Pevzner PA (2002) Finding motifs in the twilight zone.
Bioinformatics 18:1374–1381
Besides the supervised and unsupervised methods,
Keles S, van der Laan M, Eisen MB (2002) Identification of
several other approaches have been proposed to explic- regulatory elements using a feature selection method. Bioin-
itly identify synergies among regulators or regulatory formatics 18:1167–1175
elements without clustering or classifying genes. For Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF,
Wootton JC (1993) Detecting subtle sequence signals:
example, Pilpel et al. (2001) analyzed the combinato-
a gibbs sampling strategy for multiple alignment. Science
rial effects of motif pairs on gene expression profiles 262:208–214
and identified many significant motif combinations. Liu X, Brutlag DL, Liu JS (2001) Bioprospector: discovering
Also, several methods have been developed to conserved DNA motifs in upstream regulatory regions of co-
expressed genes. Pac Symp Biocomput 2001:127–38
combine gene expression clustering with additional
Madeira SC, Oliveira AL (2004) Biclustering algorithms for
information, such as promoter sequences, cellular biological data analysis: a survey. IEEE/ACM Trans Comput
functions, and genomic localization data Biol Bioinform 1:24–45
Computational microRNA Biology 473 C
Middendorf M, Kundaje A, Wiggins C, Freund Y, Leslie C
(2004) Predicting genetic regulatory response using classifi- Computational microRNA Biology
cation. Bioinformatics 20(Suppl 1):I232–I240
Pavesi G, Mauri G, Pesole G (2001) An algorithm for finding
signals of unknown length in DNA sequences. Bioinformat- Julio Vera and Ulf Schmitz
ics 17:S207–S214 Department of Systems Biology and Bioinformatics,
Phuong TM, Lee D, Lee KH (2004) Regression trees for regu- Institute of Computer Science, University of Rostock,
latory element identification. Bioinformatics 20(5):750–757
Pilpel Y, Sudarsanam P, Church GM (2001) Identifying regula- Rostock, Germany C
tory networks by combinatorial analysis of promoter ele-
ments. Nat Genet 29:153–159
Qian J, Lin J, Luscombe NM, Yu H, Gerstein M (2003) Predic- Synonyms
tion of regulatory networks: genome-wide identification of
transcription factor targets from gene expression data. Bio-
informatics 19:1917–1926 Kinetic modeling of microRNA regulation; Mathemat-
Roth FP, Hughes JD, Estep PW, Church GM (1998) Finding ical modeling of microRNA regulation; MicroRNA;
DNA regulatory motifs within unaligned noncoding MicroRNA clusters; MicroRNA databases and Web
sequences clustered by whole-genome mRNA quantitation.
Nat Biotechnol 16:939–945 resources; MicroRNA gene prediction; MicroRNA
Ruan J, Zhang W (2006) A bi-dimensional regression tree synthesis; MicroRNA target hubs; MicroRNA target
approach to the modeling of gene expression regulation. prediction; MicroRNA target regulation;
Bioinformatics 22:332–340 Regulatory networks
Segal MR (1992) Tree-structured methods for longitudinal data.
J Am Stat Assoc 87:407–418
Segal E, Shapira M, Regev A, Pe’er D, Botstein D, Koller D,
Friedman N (2003a) Module networks: identifying regula- Definition
tory modules and their condition-specific regulators from
gene expression data. Nat Genet 34:166–176
Segal E, Yelensky R, Koller D (2003b) Genome-wide discovery MicroRNAs (miRNAs) are small regulatory RNAs of
of transcriptional modules from DNA sequence and gene ~22nt length that regulate the activity and stability of
expression. Bioinformatics 19(Suppl 1):i273–i282 a large number of messenger RNAs (Bartel 2004;
Simonis N, Wodak SJ, Cohen GN, van Helden J (2004) Com- Ambros 2004; Filipowicz et al. 2008). miRNAs bind
bining pattern discovery and discriminant analysis to predict
gene co-regulation. Bioinformatics 20(15):2370–2379 to specific messenger RNAs and target them for cleav-
Sinha S, Tompa M (2002) Discovery of novel transcription age, deadenylation, or translational repression. In all
factor binding sites by statistical overrepresentation. Nucleic three processes, protein synthesis of the targeted mes-
Acids Res 30:5549–5560 senger RNA is prevented. Until today, a vast amount of
Soinov LA, Krestyaninova MA, Brazma A (2003) Towards
reconstruction of gene networks from expression data by human miRNAs has been detected (1.524 human
supervised learning. Genome Biol 4:R6 miRNAs are listed in the ▶ miRBase database, r18).
Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Many of them are involved in the regulation of relevant
Dmitrovsky E, Lander ES, Golub TR (1999) Interpreting cellular processes, including cell signaling, metabo-
patterns of gene expression with self-organizing maps:
methods and application to hematopoietic differentiation. lism, apoptosis, and developmental processes (Ambros
Proc Natl Acad Sci USA 96:2907–2912 2004).
Tanay A, Sharan R, Shamir R (2002) Discovering statistically In addition to their function in cancer, numerous
significant biclusters in gene expression data. Bioinformatics studies have validated the role of miRNAs as
18(Suppl 1):S136–S144
Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM key modulators in other high prevalence diseases,
(1999) Systematic determination of genetic network archi- including cardiovascular disorders like fibrosis
tecture. Nat Genet 22:281–285 and atherosclerosis, neurodegenerative diseases, auto-
Turner HL, Bailey TC, Krzanowski WJ, Hemingway CA immunity, and infectious processes including pulmo-
(2005) Biclustering models for structured microarray data.
IEEE/ACM Trans Comput Biol Bioinform 2:316–329 nary infections (▶ microRNA, Disease and Therapy).
Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of On the other hand, strategies have been established
microarrays applied to the ionizing radiation response. Proc that use miRNAs as therapeutic agents by either
Natl Acad Sci USA 98(9):5116–5121 downregulating those that are abnormally
Xu Y, Olman V, Xu D (2002) Clustering gene expression data
using a graph-theoretic approach: an application of minimum overexpressed or replacing silenced ones with stable
spanning trees. Bioinformatics 18:536–545 nucleotide constructs.
C 474 Computational microRNA Biology
Networks Involving miRNA Regulation program or be used as a backup option to ensure the
MicroRNAs are often embedded in complex gene and activation of those programs.
signaling networks composed of transcription factors, MiRNAs are involved in positive feedback loops; in
miRNAs, and signaling proteins. Interestingly, the simplest of these structures, the expression of
miRNA-regulated networks are often enriched in reg- a miRNA is inhibited by one of its target proteins
ulatory motifs like feedback and feed-forward loops. (Fig. 3 top left). This motif can induce bistable expres-
Moreover, they exhibit other complex motifs that are sion of both the miRNA and its target, while under
specific to miRNA regulation. certain conditions this system transforms a transient
Feedback Loops. Often miRNAs regulate the activation signal into a sustained cellular response,
expression of signaling proteins that are involved in driven by the miRNA target or downstream signal
feedback loop structures. In other cases, the expression mediators. This confers robustness to those systems
of miRNAs can be regulated by their own targets or against fluctuations in gene expression and noise in
their interaction partners (▶ MicroRNA Regulation, the activation signal (Inui et al. 2010; ▶ MicroRNA
Signaling Pathways). MiRNAs embedded in feedback Regulation, Feed-Forward Loops). MiRNAs also par-
loop systems can reinforce a given transcriptional ticipate in negative feedback loops; in the simplest
C 476 Computational microRNA Biology
C
(miR-17-5p, miR-20a) post-transcriptionally regulate developed that could help to uncover many more puta-
E2F1, which is itself a transcription factor that cross tive miRNA genes. In ▶ miRNA gene prediction,
activates c-Myc. Thus, a complex regulatory network sequence-based approaches try to find genomic regions
of miRNAs and TFs is formed, including negative and which may reveal stem-loop structures when being
positive feedback loops (Aguda et al. 2008). expressed. Such methods are often complemented by a
MicroRNA Target Hubs. MiRNA target hubs are conservation analysis of candidate sequences and their
genes targeted by many miRNAs (▶ Target Hub, predicted structures. Additional approaches investigate
Fig 4. right). These miRNAs might belong to the structural properties, like stability and robustness, of the
same miRNA cluster or are subject to totally different stem loop in order to reduce the space of candidates.
and independent transcriptional regulation. Under the Machine-learning approaches use known miRNA pre-
assumption that miRNA target hubs are regulated by at cursors as training set and exploit their features as
least 15 miRNAs, Shalgi and coauthors (2007) found criteria to classify new candidates. Transcriptiomics
470 of such genes in the human genome. It has been based approaches search in ▶ RNA-seq data for short
suggested that miRNA target hubs are integrated in transcripts that can be mapped to genomic loci that may
larger regulatory networks, involving transcription form stem loops upon expression. Nowadays, the search
factors, signaling proteins, and miRNAs, enriched for miRNA genes in newly sequenced genomes benefits
with many of these regulatory motifs introduced from the large body of known miRNA genes, which can
above (feed-forward and feedback loops). This multi- be used in a homology search in order to come up with
plicity of miRNA target hub regulation can lead to an initial set of miRNA candidates (Mendes et al. 2009).
highly nonlinear features like miRNA-mediated cross
talk, cooperation between different transcription fac- Target Prediction
tors, and generation of tissue-specific expression pat- The first step in the functional characterization of
terns (Lai et al. 2012). miRNAs and their association with biological pro-
cesses is the identification of miRNA-regulated
Bioinformatic Tools and Algorithms for mRNA targets. Shortly after the first miRNA-target
miRNA Investigation pairs have been identified experimentally, computa-
Advancements in the understanding of miRNA biol- tional algorithms for the prediction of more targets
ogy have always been paved by bioinformatic tools have been developed (▶ MicroRNA Target Predic-
and algorithms. Computational methods are being used tion). Soon, patterns in miRNA-target recognition
for the prediction of miRNA genes and their targets, emerged and were exploited to improve target predic-
for the analysis of high-throughput expression experi- tion results. A central criterion for target candidates is
ments and functional assessment of miRNA targets. the sequence complementarity between a potential
Furthermore, bioinformatic approaches are used for ▶ target site and the mature miRNA, with special
data integration and knowledge transfer. emphasis on the ▶ seed region. Other criteria that
suggest the presence of functional miRNA-mRNA
MiRNA Gene Identification pairs are the evolutionary conservation of the target
Around the beginning of this century, it became clear site sequence, the thermodynamic stability of the puta-
that there exist more miRNAs than just lin-4 and let-7. tive miRNA-mRNA duplex, and the multiplicity of
And it was found that many of them are conserved in miRNA binding sites in the 30 UTR of the mRNA
different species. At that time, methods based on target. For many algorithms, target predictions were
▶ non-coding RNA prediction algorithms were made accessible in public databases. Some examples
C 478 Computational microRNA Biology
Computational microRNA Biology, Table 1 MicroRNA Web resources. It has to be noted that URLs might change for unforeseen
reasons. Similarly, users should be aware of the date of the last update of the database to avoid the use of outdated information
Relevant microRNA Web resources
Name Description URL
miRBase Primary miRNA registry www.mirbase.org
Pre-miRNA and mature miRNA sequence and structure information
microRNA.org miRanda predicted miRNA targets www.microrna.org
miRecords Database of validated miRNA-target interactions http://mirecords.biolead.org
miRGator v2.0 miRNA expression profiles http://mirgator.kobic.re.kr
Functional characterization of miRNAs
miR2Disease miRNA disease relationships www.mir2disease.org
miRWalk Target predictions from eight algorithms http://mirwalk.uni-hd.de
Validated targets
Predicted functional associations
responses in the mRNA and protein expression levels categories. In this way, the effective inhibitory level
of the target gene are affected by parameter settings, of a miRNA can in multivalued logic be described by
which are thought as being target gene several stages, for example, null, low, high, and max-
specific. Others have shown how to employ kinetic imum inhibition, while the expression of the target
modeling to investigate the regulation of miRNA- might be categorized into silenced, low, high, and
regulated networks that are of biomedical interest. maximal expression.
For example, Lee and collaborators (2010) devel- Graph Analysis of miRNA Networks. MicroRNA-
oped a network model that characterized miR-204 mRNA networks can be described by using
as tumor suppressor miRNA and uncovered previ- graphs that capture the processes by which mature
ously unknown connections between network topol- miRNAs control the translation of target mRNAs.
ogy and expression dynamics. Additionally, they In these graphs, elements involved in the network
validated 18 target genes related to tumor are represented as nodes. Edges that connect
progression. two nodes account for a relationship (interaction)
Logical Modeling of miRNA Regulatory Networks. between them, for example, the repression of
When microRNAs are embedded in regulatory net- a mRNA by a given miRNA (▶ MicroRNA-mRNA
works, the size limits the applicability of the kinetic Regulation Networks). Using this approach, miRNA-
modeling approach. Logical models represent mRNA regulation networks have proven to display
a suitable alternative for larger networks. These can, topological properties like fat-tail degree distribution,
for example, be regulatory networks involving dozens abundance of cybernetic motifs, and modular
of transcription factors, miRNAs, and other types of structures.
functional molecules (e.g., repressors, coactivators,
corepressors, ribonucleoproteins). The logical
approach can be classified into Boolean logic and Cross-References
multivalued logic (▶ MicroRNA-embedding Regula-
tion Networks, Logical Modeling). Boolean logic ▶ MicroRNA Biogenesis, Regulation
works on the basis of binary categories. For instance, ▶ MicroRNA Clusters
a miRNA can only be treated as absent or present, ▶ MicroRNA Families, Cancer Progression
while a transcription factor as inactive or active. ▶ MicroRNA Gene Prediction
Their time evolution is restricted to transitions ▶ MicroRNA Regulation, Feedback Loop
between two generic states encoded by 0 and 1. ▶ MicroRNA Regulation, Feed-Forward Loops
Multivalued logic instead generalizes the Boolean ▶ MicroRNA Regulation, Signaling Pathways
approach and allows the use of multiple ordinal ▶ MicroRNA Target Prediction
C 480 Computational Modeling
▶ MicroRNA Web Resources Lee Y et al (2010) Network modeling identifies molecular func-
▶ microRNA, Disease and Therapy tions targeted by miR-204 to suppress head and neck tumor
metastasis. PLoS Comput Biol 6(4):p.e1000730
▶ MicroRNA, Seed Levine E, Ben Jacob E, Levine H (2007) Target-specific and
▶ MicroRNA-embedding Regulation Networks, global effectors in gene regulation by MicroRNA. Biophys
Logical Modeling J 93(11):L52–L54
▶ MicroRNA-mRNA Regulation Networks Mendes ND, Freitas AT, Sagot MF (2009) Current tools for the
identification of miRNA genes and their targets. Nucleic
▶ MiRBase Acids Res 37(8):2419–2433
▶ Non-coding RNA, Prediction Osella M et al (2011) The role of incoherent microRNA-
▶ RNA-seq mediated feedforward loops in noise buffering. PLoS
▶ Target Cleavage Comput Biol 7(3):p.e1001101
Shalgi R et al (2007) Global and local architecture of the mam-
▶ Target Hub malian microRNA-transcription factor regulatory network.
▶ Target Identification, microRNA PLoS Comput Biol 3(7):p.e131
▶ Target Site Takamizawa J et al (2004) Reduced expression of the
let-7 microRNAs in human lung cancers in association
with shortened postoperative survival. Cancer Res
64(11):3753–3756
References Tsang J, Zhu J, van Oudenaarden A (2007) MicroRNA-mediated
feedback and feedforward loops are recurrent network motifs
in mammals. Mol Cell 26(5):753–767
Aguda BD et al (2008) MicroRNA regulation of a cancer net-
work: consequences of the feedback loops involving miR-
17-92, E2F, and Myc. Proc Natl Acad Sci USA
105(50):19678–19683
Ambros V (2004) The functions of animal microRNAs. Nature
431(7006):350–355
Bartel DP (2004) MicroRNAs: genomics, biogenesis, mecha- Computational Modeling
nism, and function. Cell 116:281–297
Carrington JC, Ambros V (2003) Role of microRNAs in plant
▶ Modeling Formalisms, Lymphocyte Dynamics and
and animal development. Science (New York, NY)
301(5631):336–338 Repertoires
Enright AJ, Griffiths-Jones S (2008) MiRBase: a database of
microRNA sequences, targets and nomenclature. In:
Appasani K (ed) MicroRNAs: from basic science to disease
biology. Cambridge University Press, Cambridge,
pp 157–171
Filipowicz W, Bhattacharyya SN, Sonenberg N (2008) Computational Modeling Languages
Mechanisms of post-transcriptional regulation by
microRNAs: are the answers in sight? Nat Rev Genet
9:102–114 ▶ Cell Cycle Modeling, Process Algebra
Garzon R, Calin GA, Croce CM (2009) MicroRNAs in cancer.
Annu Rev Med 60:167–179
Hurst DR, Edmonds MD, Welch DR (2009) Metastamir: the
field of metastasis-regulatory microRNA is spreading.
Cancer Res 69(19):7495–7498
Inui M, Martello G, Piccolo S (2010) MicroRNA Computational Models of Mechanisms
control of signal transduction. Nat Rev Mol Cell Biol
11:252–263 ▶ Mechanism, Dynamic
Iorio MV et al (2005) MicroRNA gene expression deregulation
in human breast cancer. Cancer Res 65(16):7065–7070
Lai X, Schmitz U, Gupta S, Bhattacharya A, Kunz M,
Wolkenhauer O, Vera J (2012) Computational analysis of
target hub gene repression regulated by multiple and coop-
erative miRNAs. Nucleic Acids Res 40:8818–8834
Lee RC, Feinbaum RL, Ambros V (1993) The C. elegans
Computer Modeling
heterochronic gene lin-4 encodes small RNAs with antisense
complementarity to lin-14. Cell 75:843–854 ▶ Lymphocyte Dynamics and Repertoires, Modeling
Concavity Tree for Protein–Protein Interaction Analysis 481 C
the root is a simple characterization of the whole
Computer Simulations object boundary, i.e., its ▶ convex hull. The next
level describes the set of objects obtained by
▶ Partial Differntial Equations, Numerical Methods subtracting the object from the convex hull. All
and Simulations deeper levels are elaborated in the same way, itera-
tively. A node that represents a convex shape corre-
sponds to a leaf in the tree, so it does not have any C
child.
Figure 1 shows an example of a shape (a), its
Computer-Based Patient Records (CPR) convex hull, concavities, and meta-concavities (b),
and its corresponding concavity tree (c). The shape
▶ Electronic Health Records generates five concavities as reflected in level one of
the tree. The four leaf nodes in level one correspond to
the highlighted triangular concavities shown
in (d), whereas the non-leaf node corresponds to the
concavity shown in (e). Similarly, the nodes in levels
Concavity Tree for Protein–Protein two and three correspond to the meta-concavities
Interaction Analysis highlighted in (f) and (g), respectively. Typically,
each node in a concavity tree stores information perti-
Virginio Cantoni1,2, Alessandro Gaggia1, nent to the part of the object the node is describing
Riccardo Gatti1 and Luca Lombardi1 (a feature vector for example), in addition to tree meta-
1
Department of Computer Engineering and Systems data (like the level of the node; height, number of
Science, University of Pavia, Pavia, Italy nodes, and number of leaves in the subtree rooted at
2
Computational Biology, KTH Royal Institute of the node, etc.).
Technology, Stockholm, Sweden
Characteristics
Definition
One of the most successful approaches for protein
First introduced by Sklansky (1972), further shape analysis and description is the structural one.
researched by Bachelor (1979, 1980) and Borgefors A complex shape can be segmented into its compo-
and Sanniti di Baja (1992), a concavity tree is a data nent (e.g., a pockets set), and each pocket can be
structure used for describing non-convex subsequently decomposed into simpler regions, and
two-dimensional shapes. It is a rooted tree in which the complete description is given in terms of the
Concavity Tree for Protein–Protein Interaction Analysis, Fig. 1 An object (a), its convex hull with green concavities (b), the
actual shape of each node (c) and the corresponding three levels concavity tree (d)
C 482 Concavity Tree for Protein–Protein Interaction Analysis
D1
B331
B321
B33
B111 B311
B31 B32
D B3
D2 B3
B11
B
B1 B1
A
C B2
A1
B12
B22
A2 C1 B21 B2
B211
A B C D
Concentration Control Coefficient
A1 A2 C1 D1 D2
Emma Saavedra and Rafael Moreno-Sánchez
Department of Biochemistry, National Institute for
B1 B2 B3 Cardiology “Ignacio Chávez”, Mexico City, Mexico
PDB: 3IX0
fX;Y ðx; yÞ
Concentration Graph Model fY ðyjX ¼ xÞ ¼ ; (2)
fX ðxÞ
▶ Graphical Gaussian Model where fX;Y ðx; yÞ gives the joint density of X and Y,
while fX ðxÞ gives the density of X.
Cross-References
Concept Extraction Task
▶ Causal Relationship
▶ Template Filling, Text Mining ▶ Conditional Independence
References
Katsuhisa Horimoto
Synonyms Computational Biology Research Center, National
Institute of Advanced Industrial Science and
Conditional probability distribution; Conditional Technology, Koto-ku, Tokyo, Japan
probability density function; Conditional probability
mass function
Synonyms
Cross-References
Conformational Epitope
▶ Identifiability
▶ Maximum Likelihood Estimation ▶ Discontinuous Epitope
▶ Structural and Practical Identifiability Analysis
References
Conjugation Reactions
Meeker WQ, Escobar LA (1995) Teaching about approximate
confidence regions based on maximum likelihood
▶ Phase II Enzymes
estimation. The Am Stat 49(l):48–53
Press WH, Teukolsky SLA, Flannery BNP, Vetterling WMT
(1990) Numerical recipes: FORTRAN. Cambridge
University Press, Cambridge
Connectionist Model
▶ IMGT-ONTOLOGY, ConfigurationType
Connective Tissue
▶ Stroma
Confocal and Multiphoton Microscopy
Xiaohua Wu
Department of Pediatrics, Herman B Wells Center for Connectivity Theorem
Pediatric Research, Indiana University School of
Medicine, Indianapolis, IN, USA Emma Saavedra and Rafael Moreno-Sánchez
Department of Biochemistry, National Institute for
Cardiology “Ignacio Chávez”, Mexico City, Mexico
Definition
Cross-References Cross-References
Conservation Analysis
of the system. Namely, constraints cannot be expressed behalf.” A system acts on its own behalf if it is able
in the same language than the microscopic description to use its work to regenerate at least some of the
of matter. In fact, what a constraint does is to selec- constraints that make work possible. When this occurs,
tively ignore microscopic degrees of freedom in order a Work-Constraint cycle is realized. In physical terms,
to obtain a simplification in the prediction or explana- it requires very specific conditions to occur. Actually,
tion of movement. The concept of constraint means the cycle is inevitably a thermodynamic irreversible
a selective loss of detail or a predetermined rule about process, which dissipates energy and requires cou-
what is going to be ignored. plings between exergonic (spontaneous, energy releas-
Therefore, for Physics constraining forces are, ing) reactions and endergonic (non-spontaneous,
unavoidably, linked to a new hierarchical level of energy requiring) ones, such that exergonic processes
description. When in Physics a constraint equation is are constrained in a specific way to produce work that
added to the equations of movement, we are always may be used to generate endergonic processes, which
dealing with two languages at the same time. The in turn generate those constraints canalizing exergonic
language of the equation of movement relates the processes. In Kauffman’s terms, “Work begets con-
detailed trajectory or the state of the system with straints beget work” (Kauffman 2000).
dynamical time, whereas the language of the constraint In evolutionary biology, the concept of constraint
does not deal at all with the same kind of system, but has mainly been introduced as a challenge by develop-
with another situation in which the dynamical detail mental approaches regarding the scope of selection and
has been purposely ignored. In other words, constric- the extent to which ▶ adaptation remains the main
tion forces are not detailed forces of the individual explanatory factor (see ▶ Explanation, Developmen-
particles but forces of the collections of particles or, tal). Besides the specific issue of connecting develop-
sometimes, forces of simple units averaged out in time. mental with evolutionary accounts, the concept of
In any case, some form of statistical averaging process constraint plays an important role, for instance, in the
has substituted microscopic detail. In Physics, then, in criticisms to adaptationism, in the discussion regarding
order to describe a constraint, the detailed dynamical the relevance of stasis and other macroevolutionary
description has to be abandoned. A constraint requires, patterns, or in the morphologically oriented proposals
therefore, an alternative description. of structuralism or ▶ complexity theories (see Orzack
Another relevant contribution to naturalize the con- and Sober 2001).
cept of constraint in the biological domain has been In an already classic paper that may be considered
provided by theoretical biologist Stuart Kauffman, to be an attempt to build a “consensus” position on the
who has recently proposed the idea of the “Work- subject, a developmental constraint is defined as “a
Constraint cycle” (Kauffman 2000, 2003). The Work- bias on the production of variant phenotypes or
Constraint cycle is supposed to capture what Kauffman a limitation on phenotypic variability caused by the
takes as a central feature of all biological organisms, structure, character, composition, or dynamics of the
namely, the fact of acting “on their own behalf” developmental system” (Maynard Smith et al. 1985,
(Kauffman 2003, 1089). Whereas this idea appears to 266). The origin of this bias or limitation may be
be in accordance with common intuition, Kauffman’s attributed to various sources (materiality, genetic
scientific challenge consists in giving a naturalized and dynamics, evolutionary pathways, complexity regime,
consistent account of it. The concept of a Work- etc.) but what is agreed upon is that they have an
Constraint cycle plays precisely this role. The main impact in evolution. A distinction is also drawn
idea is to link the idea of action to that of “work,” the between universal constraints, deriving from general
latter being defined, following Atkins, as “constrained laws of physics or from invariant properties of some
release of energy into relatively few degrees of free- material or complex systems, and local constraints,
dom” (Kauffman 2003, 1094). In this definition, the confined to particular taxa.
concepts of work and constraint are related: work is Amundson (1994), while accepting that constraint
constrained release of energy. This connection gives “implies some sort of restriction on variety or on
a way to interpret the slogan acting “on their own change,” claims that the key rests on the answer to
Constraint 493 C
the question: “What is being constrained?” Thus, he Cross-References
identifies two possible answers depending on whether
the explanandum is adaptation or form. We would ▶ Adaptation
therefore have to distinguish between understanding ▶ Autonomy
the effect of developmental principles on evolution as ▶ Complexity
constraints on adaptation, that is, as restrictions ▶ Constraint, Holonomic
imposed by embryology on the adaptive optimality of ▶ Constraint, Non-holonomic C
adult organisms on the potential of adaptation, or con- ▶ Explanation, Developmental
straints on form, in the sense that in the morphospace ▶ Explanation, Evolutionary
there are morphologies that cannot be achieved by the ▶ Explanation, Functional
process of development. ▶ Functional
This distinction is orthogonal to the previous one ▶ Hierarchy
between local and universal kinds of constraints since ▶ Interlevel Causation
both of them may operate, irrespectively, on the pros- ▶ Organization
pects of reaching more optimal adaptations and on the
scope of generation of organic form.
Sometimes, other kinds of constraints are men- References
tioned in the context of contending evolutionary expla-
nations, even if they are not generally accepted as Allen TFH, Starr TB (1982) Hierarchy. Perspectives for
specific constraints. For instance, historical constraints ecological complexity. The University of Chicago Press,
Chicago
are often brought up to contrast with developmental
Amundson R (1994) Two concepts of constraint: adaptationism
ones though maintaining the same feature of imposing and the challenge from developmental biology. Philos Sci
restriction on adaptation, but they are due to more 61:556–578
“evolutionary” common factors such as contingency Blitz D (1992) Emergent evolution. Qualitative novelty and the
levels of reality. Kluwer, Dordrecht
or accident creating some kind of bias. Ecological
Kauffman S (2000) Investigations. Oxford University Press,
constraints can also be sometimes evoked. Oxford
Regarding these uses and besides those distinctions Kauffman S (2003) Molecular autonomous agents. Phil Trans
about how to understand the concept, there is another R Soc A 361:1089–1099
Maynard Smith J, Burian R, Kauffman S, Alberch P, Campbell J,
point worth mentioning, since it allows connecting the
Goodwin B, Lande R, Raup D, Wolpert L (1985) Develop-
concept of developmental constraint with the more mental constraints and evolution: a perspective from the
general idea defined in the beginning of this entry. mountain lake conference on development and evolution.
According to Schwenk and Wagner (2003), this col- Q Rev Biol 60(3):265–286
Orzack SH, Sober E (eds) (2001) Adaptationism and optimality.
lective definition highlights the separation between
Cambridge University Press, Cambridge
constraint and selection by distinguishing the “gener- Pattee HH (1968) The physical basis of coding and reliability in
ation of variation from the operation of selection on biological evolution. In: Waddington CH (ed) Towards
that variation” (p. 54). In this sense, the force of selec- a theoretical biology, vol 1, Prolegomena. Edinburgh Uni-
versity Press, Edinburgh, pp 67–93
tion would assume the role of the general law upon
Pattee HH (1972) Laws and constraints, symbols and languages.
which some rules are imposed and, here too, some In: Waddington CH (ed) Towards a theoretical biology,
potential degrees of freedom are reduced due to what vol 4, Essays. Edinburgh University Press, Edinburgh,
Polanyi was calling boundary conditions and what pp 248–258
Polanyi M (1968) Life’s irreducible structure. Science
Pattee called constraints in their more general
160:1308–1312
approaches. Salthe SN (1985) Evolving hierarchical systems. Their structure
and representation. Columbia University Press, New York
Schwenk K, Wagner GP (2003) Constraint. In: Hall BK, Olson
Acknowledgments Funding for this work was provided by WM (eds) Keywords in evolutionary developmental biology.
grants GV/EJ IT505-10 from the Government of the Basque Harvard University Press, Cambridge, MA, pp 52–61
Country, and FFI2011-25665 from the Ministerio de Economı́a Sommerfeld A (1952) Mechanics, 2nd edn. Academic,
y Competitivad (MEC) and FEDER funds from the E.C. New York
C 494 Constraint, Holonomic
▶ Constraint Synonyms
Constraint, Non-holonomic
Definition
Jon Umerez and Matteo Mossio
IAS-Research Philosophy of Biology Group, With the advent of high-throughput technology, there
Department of Logic and Philosophy of Science, has been a growing need to develop computational
University of the Basque Country (UPV/EHU), frameworks that contribute to analyze these data for
Donostia – San Sebastian, Spain unveiling the biological principles underlying meta-
bolic networks. Constraint-based modeling is
a paradigm in systems biology that contributes to this
Definition latter aim by considering physical, enzymatic, and
topological constraints underlying the phenotype in
Non-holonomic constraints are variable auxiliary con- a metabolic network. In global terms, this method can
ditions that limit in time the number of degrees of be stratified into four steps: metabolic reconstruction
freedom of the system. They are, then, dynamical of an organism, mathematical representation of the
structures that establish time-dependent relations metabolic network, in silico analysis, and experimental
Constraint-based Modeling 495 C
Constraint-based
Modeling, Constraint-based modeling.
Deterministic
Fig. 1 Constraint-based
modeling. Constraint-based Mathematical Topology Stochastic / Boolean
modeling is a paradigm in Network dx = s.v
representation dt
systems biology which
consists of four main steps. Maximize z = Σc .x
2 2
that all the reactions included in the reconstruction are certain reactions along the reconstruction. Other
such that they are charge and mass balanced, this issue constraint belonging to this classification is related
is essential for ensuring proper in silico results and with the second law of thermodynamics, the latter
interpretations (Thiele and Palsson 2010). being a fundamental law to regulate the direction of
the biological process.
Mathematical Representation of the Metabolic • Enzymatic: when one has information about the flux
Network capacity that one enzyme can be able to carry out.
The relation between the genotype and the phenotype For instance, uptake of oxygen is usually of 20
is a central issue when studying microorganisms in nMol•gDW/Hr.
systems and synthetic biology approaches (Palsson • Additional constraints can be imposed to delimit
2006). At a macroscopic level, the phenotype state in the metabolic capacities, for instance, regulatory
a cell is well characterized by the activity of metabolic constraints when the transcriptional regulatory net-
pathways, which in turn, mirror the genetic and protein work is known in one organism (Covert et al. 2004).
activity required for facing external environment. For Taking into account these constraints brings as
this reason, the study of the metabolic capacities in a consequence a reduction of the feasible metabolic
a genome scale reconstruction becomes relevant for space, this point is an important step toward a more
exploring the feasible phenotype space in an organism. accurate modeling of genome scale metabolic recon-
To this end, the set of reactions that form the metabolic struction, see Fig. 1 (black region). A variety of
reconstruction are mathematically represented through approaches have been suggested to explore the pheno-
the ▶ stoichiometric matrix, S. Thus, by assuming that type capacities of a metabolic reconstruction and link
metabolic concentrations are at steady state condition, their outputs to experimental and high-throughput data
the feasible metabolic space of flux distribution can be (Price et al. 2004). Among these approaches, ▶ flux
calculated by: balance analysis (FBA) is a paradigm in systems biol-
ogy for exploring the metabolic phenotype in an organ-
S v¼0 (1) ism and predicting its response under external
perturbation (Orth et al. 2010). In this biased approach,
where v represents a vector whose entries are the the metabolic activity in a reconstruction is obtained,
metabolic fluxes of the reactions included in the recon- first, by defining a function that represents certain type
struction. Hence, from a mathematical perspective, of biological purpose – called objective function, Z, and
the problem associated to the feasible phenotype then, identifying the metabolic flux distribution that
state is close related to solve the null space of the maximizes it. Depending on the case of study, this
stoichiometric matrix (Palsson 2006). objective function can represent a specific physiological
state (as growth rate or bacterial nitrogen fixation) and it
In Silico Analysis is calculated through linear optimization,i.e.,
Physical, thermodynamical, and enzymatic con-
" #
straints. Even though the feasible metabolic capacities X
in the metabolic network are quantitatively well Maximize Z ¼ c k Xk
k¼1
defined through Eq. 1, the number of possibilities
remains still huge and additional considerations are
such that:
required to limit their spectrum. Thus, in order to
limit even more this space, one imposes these main X
constraints when available: Si;j vj ¼ 0 i ¼ 1 . . . :m
• Physical: when we have information that certain
reactions occur in well-defined compartmentalized a j v j bj j ¼ 1...n
organelle inside the cell, for instance, mitochondria
or cytoplasm. where Xk is a metabolite participating in the objective
• Thermodynamical: when we have information function and ck is a weight factor. Here m indicates
related with the irreversibility or reversibility of the number of metabolites and n the number of
Constraint-based Modeling 497 C
reactions along the reconstruction. Note that the latter capable of integrating high-throughput data and com-
two equations indicate that the solution is subject to putational simulations. This integrative description at
a steady state condition and is constrained by thermo- genome scale is useful for: (1) understanding the fun-
dynamic and enzymatic limits. This approach has damental activity of metabolism, (2) predicting pheno-
been successfully and extensively applied to explore type metabolic under internal or external
the consequences that gene perturbations produce on perturbations, and (3) designing more informative
phenotype in archaea, eukarya, and bacteria. The experiments at various biological layers. C
number of papers where FBA has contributed to
understand the metabolic activity in microorganism
has been rising recently; a representative set of Cross-References
applications in some organism can be reviewed in
Resendis-Antonio et al. (2007, 2012), Chang et al. ▶ Flux Balance Analysis
(2012) and Varum Mazumdar et al. (2009). ▶ Stoichiometric Matrix
In parallel with this approach, a variety of methods
have been proposed for extending the analysis of the
metabolic phenotype inherent to a genome scale References
metabolic reconstruction (Price et al. 2004). The appli-
Caspi R, Altman T, Dale JM, Dreher K, Fulcher CA, Gilham F,
cability of these methods ranges from sampling
Kaipa P, Karthikeyan AS, Kothari A, Krummenacker M,
the entire phenotypes in the null space of the stoichio- Latendresse M, Mueller LA, Paley S, Popescu L, Pujar A,
metric matrix to calculating the metabolic phenotype Shearer AG, Zhang P, Karp PD (2010) The MetaCyc data-
when knockout or gene deletion occurs in the base of metabolic pathways and enzymes and the BioCyc
collection of pathway/genome databases. Nucleic Acids Res
metabolic reconstruction (Segre et al. 2009). In addi-
38:D473–D479
tion, there are some algorithms devoted to explore the Chang RL, Ghamsari L, Manichaikul A, Hom EFY, Balaji S,
relation between dynamic behavior of metabolites, Fu W, Shen Y, Hao T, Palsson BO, Salehi-Ashtiani K,
topology of network, and biological functionality Papin JA (2012) Metabolic network reconstruction of
Chlamydomonas offers insight into light-driven algal
(Resendis-Antonio 2009; Jamshidi and Palsson 2008).
metabolisms. Mol Syst Biol 7:518
Covert MW, Knight EM, Reed JL, Herrgård MJ, Palsson BØ
Experimental Assessment of In Silico Predictions (2004) Integrating high-throughput and computational data
Constraint-based modeling lets us simulate real bio- elucidates bacterial networks. Nature 429(6987):92–96
Feist AM, Herrgard MJ, Thiele I, Reed JL, Palsson BO
logical processes and explore the resulting phenotypes
(2009) Reconstruction of biochemical networks in microbial
when changes in parameters occur due to genetic organisms. Nat Rev Microbiol 7(2):129–143
mutations or environmental perturbations. The struc- Jamshidi N, Palsson BØ (2008) Formulating genome-scale
tural organization of this approach is such that it lets us kinetic models in the post-genome era. Mol Syst Biol 4:171
Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M,
to move toward experimental assessment with high-
Katayama T, Kawashima S, Okuda S, Tokimatsu T,
throughput technology. Thus, transcriptome data, pro- Yamanishi Y (2008) KEGG for linking genomes to life and
teome, and metabolome profiles – and other omics – the environment. Nucleic Acids Res 36:D480–D484
become valuable data for refining the model and Mazumdar V, Snitkin E, Amar S, Segre D (2009) Metabolic
network model of a human oral pathogen. J Bacteriol
assessing the biological hypothesis emerging from in
191(1):74–90
silico analysis (Resendis-Antonio et al. 2012). This Orth JD, Thiele I, Palsson BØ (2010) What is flux balance
feedback, established between experiment design and analysis? Nat Biotechnol 28:245–248
computational evaluation, is a valuable and needed Palsson BØ (2006) Systems biology: properties of reconstructed
networks. Cambridge University Press, Cambridge/New York
task for completing our understanding of how micro-
Price ND, Reed JL, Palsson BO (2004) Genome-scale models of
organisms organize and control their metabolic microbial cells: evaluating the consequences of constraints.
response at specific environments. Nat Rev Microbiol 2:886–897
Reed JL, Famili I, Thiele I, Palsson BO (2006) Towards
multidimensional genome annotation. Nat Rev Genet
Conclusions
7(2):130–141
Constraint-based modeling of metabolic networks Resendis-Antonio O (2009) Filling kinetic gaps: dynamic
contributes to establish a systems biology platform modeling of metabolism where detailed kinetic information
C 498 Continuous Model
is lacking. PLoS One 4(3):e4967. doi:10.1371/journal. In Systems Biology, ordinary differential equations
pone.0004967 (ODEs) are the most established continuous modeling
Resendis-Antonio O, Reed JL, Encarnacion S, Collado-Vides J,
Palsson BØ (2007) Metabolic reconstruction and modeling representation. For example, one uses ODEs to model
of nitrogen fixation in Rhizobium etli. PLoS Comput Biol the concentration of each metabolite which is calcu-
3:10 e192 lated by a single ODE encapsulating all the reactions
Resendis-Antonio O, Checa A, Encarnación S (2010) Modeling where the metabolite is synthesized or consumed, with
core metabolism in cancer cells: surveying the topology
underlying the warburg effect. PLoS One 5(8):e12383. fluxes determining the transformations to and from
doi:10.1371/journal.pone.0012383 other metabolites in the metabolic network.
Resendis-Antonio O, Hernandez M, Salazar E, Contreras S, The use of continuous models is the main technique
Martinez-Batalla G, Mora Y, Encarnacion S (2012) Systems of the quantitative sciences. The antonym of continu-
biology of bacterial nitrogen fixation: high-throughput tech-
nology and its integrative description with constraint-based ous model is logic model (Please refer to the definition
modeling. BMC Syst Biol 5:120 of logic model).
Segre D, Vitkup D, Church G (2009) Analysis of optimality in
natural and perturbed metabolic networks. Proc Natl Acad
Sci U S A 99(23):15112–15117
Thiele I, Palsson BO (2010) A protocol for generating a
high-quality genome scale metabolic reconstruction.
Nat Protoc 5(1):93–121
Contractile Actomyosin Ring (CAR)
▶ Actomyosin Ring
Continuous Model
Synonyms
Control
ODE
▶ Constraint
Definition
Cross-References
William Bechtel For a set of points S the Convex Hull H(S), in 2D space,
Department of Philosophy and Center for is the smallest convex polygon P that encloses S,
Chronobiology, University of California, San Diego, smallest in the sense that there is no other polygon P0
La Jolla, San Diego, CA, USA such that S P0 P (Fig. 1).
Cross-References
Core Promoter Elements
▶ Convex Programming
▶ Model Falsification Tetsuro Kokubo
▶ Semidefinite Programming Department of Supramolecular Biology, Graduate
School of Nanobioscience, Yokohama City
University, Yokohama, Kanagawa, Japan
References
Definition
Convex Programming
The core promoter elements are cis-acting functional
Lin Wanglin sequences embedded within the core promoters.
School of Computer Science and Information Several representatives in the RNAPII system are
Engineering, Tianjin University of Science and summarized in Fig. 1 (Juven-Gershon and Kadonaga
Technology, Tianjin, China 2010; Smale and Kadonaga 2003). The first-
discovered and best-characterized is the TATA box,
which serves as a recognition site for TBP. This ele-
Synonyms ment is widely conserved from yeast to human, but is
found in only a small fraction (approximately 10–20%)
Convex optimization of all RNAPII promoters. Importantly, TBP is essential
for transcription, regardless of whether the TATA box
is present or absent in the core promoter. Therefore,
TBP must play some indispensable but as-yet
Definition unidentified role(s) in transcription, other than simple
binding to the TATA box. In some human promoters,
A following nonlinear programming is called a convex the TATA box is flanked by BREu and/or BREd, both
programming if f ðxÞ; gi ðxÞ; i ¼ 1; ; m; are convex of which are recognition sites for TFIIB. These ele-
functions in Rn : ments function positively or negatively depending on
the architecture of promoters and/or transcription fac-
Minimize f ðxÞ tors present in the system.
subject to gi ðxÞ 0; i ¼ 1; ; m; (1) The DNA region encompassing the transcription
Ax ¼ b; initiation site(s) often contains the initiator (Inr)
element (Fig. 1). Inr consensus sequences appear to
where A is an p n matrix. be different in human (YYANWYY) and Drosophila
(TCAKTY) (underlined A indicates initiation site/+1).
In addition, even in human, there are multiple Inr
consensus sequences ranging from a very relaxed
References motif (YR) to a very strict motif (sINR;
GSCGCCATYTTG) (Dikstein 2011), depending on
Zhang XS (2000) Neural networks in optimization. Kluwer, the estimation method. Therefore, it is likely that
Dordrecht there are a variety of core promoter elements
Core Promoter Elements 503 C
initiation site
+27~+29
−40 +40 C
BREu TATA BREd Inr
Core Promoter Elements, Fig. 1 S(G/C), K(G/T), R(A/G), (motif ten element), DPE (downstream promoter element),
M(A/C), W(A/T), Y(T/C), V(G/A/C), D(G/A/T), BRE (TFIIB DCE (downstream core element), XCPE1/2 (X gene core pro-
recognition element), BREu (upstream BRE), BREd (down- moter element 1/2)
stream BRE), TATA (TATA box), Inr (Initiator), MTE
encompassing the initiation site(s), with only a fraction Nearly all of the core promoter elements are recog-
belonging to the original Inr. Consistent with this, nition sites for GTFs. As previously discussed, TATA
some human genes contain XCPE1 or XCPE2, which box and BREu/d are recognized by TBP and TFIIB,
were originally identified in the core promoter of hep- respectively. Importantly, TFIID (transcription factor
atitis B virus X gene. These elements are not only IID), a large protein complex comprising TBP and 14
structurally but also functionally different from Inr. TAFs (TBP-associated factors), is involved in the
In higher eukaryotes, some core promoter elements recognition of Inr, MTE, DPE, Bridge, and DCE. Spe-
are located downstream of transcription initiation cifically, Inr, MTE/DPE/Bridge and DCE may be rec-
site(s) (Fig. 1). Although MTE and DPE were origi- ognized by TAF1/TAF2, TAF6/TAF9, and TAF1,
nally identified as separate elements, it was recently respectively. However, it is still unclear which factor
proposed that they comprised two distinct functional recognizes XCPE1/2. Native promoters contain only
subregions, one of which is included in common in one or a few of these core promoter elements, probably
these two elements. The novel core promoter element because a weaker level of basal transcription is more
that contains the 50 -subregion of MTE and the advantageous for regulation by activators. In fact, an
30 -subregion of DPE is designated as “Bridge.” DCE artificial core promoter containing TATA box, Inr,
is another downstream core promoter element that also MTE, and DPE is highly active even in the absence
comprises three subregions (SI, SII, and SIII). DCE is of activators.
not related to MTE, DPE, and Bridge, because the Core promoter elements such as Inr that
distance from the transcription initiation site(s) is overlap with the initiation site(s) could function as
highly restricted for the latter three elements but not a platform for PIC assembly. However, these ele-
for the former element. Conversely, the TATA box ments may also contain as-yet uncharacterized but
occurs frequently with the former element but not crucial information that enables RNAPII to initiate
with the latter elements. transcription productively. Genetic studies indicate
C 504 Cores
Cross-References
Cores
▶ Neovascularization
José Román Bilbao Castro
Supercomputing-Algorithms group, University of
Alméria, Alméria, Spain
References
▶ Correlation Coefficient
Synonyms
Yellow body
Correlation Coefficient
Definition Fuyan Hu
Institute of Systems Biology, Shanghai University,
The corpus luteum is a temporary endocrine structure Shanghai, China
formed during the mammalian menstrual cycle. In the
early phase of the human menstrual cycle, 20–25 pri-
mary follicles containing oocytes begin to develop into Synonyms
secondary follicles, among which only one will mature
into a Graafian follicle containing a mature egg. During Coefficient of correlation; Correlation; Correlational
ovulation, the egg is released upon rupture of the statistics; Correlativity
Graafian follicle, in which the follicular cells then
undergo reprogramming to form the corpus luteum
(Rizzo 2006). During this process, the follicular cells Definition
exit cell cycle and undergo terminal differentiation into
estrogen- and progesterone-producing luteal cells. If A correlation coefficient is a statistic index which
fertilization and implantation do not occur, the corpus measures the degree to which two variables are related.
luteum soon degenerates to form the corpus albicans Generally, it varies from 1 (perfect negative correla-
(white body), and a new menstrual cycle is initiated tion) through 0 (no correlation) to +1 (perfect positive
(Stocco et al. 2007). Intense angiogenesis occurs during correlation). If a negative correlation is detected,
the formation of the corpus luteum, which is regulated whenever one variable has a high (low) value, the
by key angiogenic factors such as vascular endothelial other will have a low (high) value. If it is a positive
growth factor (VEGF) and matrix metalloproteinase-2 correlation, whenever one variable has a high (low)
(MMP-2) (Stocco et al. 2007; Zhang et al. 2005). value, so does the other.
The correlation coefficient of x1 and x2 is given by
the formula:
Cross-References
covðx1 ; x2 Þ E½ðx1 m1 Þðx2 m2 Þ
¼ (1)
▶ Neovascularization s1 s2 s1 s2
C 506 Correlation Network
where m1 is the arithmetic mean of x1 , m2 is the arith- correlation between the values on the two axes in
metic mean of x2 , covðx1 ; x2 Þ is the covariance of x1 a two-dimensional plot is quantified by the correla-
and x2 , s1 is the standard deviation of x1 , and s2 is the tion coefficient. Also, correlating values of one vari-
standard deviation of x2 . able with corresponding values at different times is
called autocorrelation.
Cross-References
▶ Causal Relationship
References ▶ Conditional Independence
Correlational Statistics
▶ Relevance Networks
Correlativity
Katsuhisa Horimoto
Computational Biology Research Center, National
Institute of Advanced Industrial Science and CoSBiLab
Technology, Koto-ku, Tokyo, Japan
▶ BlenX
Synonyms
▶ Objective Function
Definition
Cross-tissue network
Tissue B Tissue C
References
Cross-Validation
Drake TA (2010) Genes and pathways contributing to
obesity: a systems biology view. Prog Mol Biol Transl Sci Joo Chuan Tong
94:9–38 Data Mining Department, Institute for Infocomm
Schadt EE (2009) Molecular networks as sensors and drivers
of common human diseases. Nature 461:218–223 Research, Singapore, Singapore
Sieberts SK, Schadt EE (2007) Moving toward a Department of Biochemistry, Yong Loo Lin School of
system genetics view of disease. Mamm Genome Medicine, National University of Singapore,
18:389–401 Singapore
Definition
C
Cyclin-dependent Kinase 1 (CDK1)/
Cumulative Causation Anaphase-Promoting Complex (APC)
Oscillator
▶ Positive Feedback
▶ Cell Cycle of Early Frog Embryos
Curation
Cycling Egg Extract
Sabina Leonelli
ESRC Centre for Genomics in Society, University of ▶ Xenopus laevis Egg Extract
Exeter, Exeter, Devon, UK
Definition
Cyclins and Cyclin-dependent Kinases
Ensemble of activities involved in developing and
maintaining a database (Howe and Rhee 2008; Stein Marcos Malumbres
2008; Leonelli 2010). Spanish National Cancer Research Centre (CNIO),
Madrid, Spain
Cross-References
Synonyms
▶ Bio-Ontologies
▶ CellML Model Curation Cell cycle-regulated kinase complexes
Definition
References
Cyclins are proteins initially identified by their differ-
Howe D, Rhee SY (2008) The future of biocuration. Nature ential protein levels during different phases of the
455:47–50
▶ cell cycle. This characteristic depends on specific
Leonelli S (2010) Packaging data for re-use: databases in
model organism biology. In: Howlett P, Morgan MS (eds) amino acid sequences that are recognized by the pro-
How well do facts travel? The dissemination of tein degradation machinery. Cyclins are known to
reliable knowledge. Cambridge University Press, function as the regulatory subunit of heterodimeric
Cambridge, MA
protein kinase complexes that also contain a catalytic
Stein LD (2008) Towards a cyberinfrastructure for the biological
sciences: progress, visions and challenges. Nat Rev Genet subunit with serine/threonine kinase activity, known as
9(9):678–688 cyclin-dependent kinase (Cdk).
C 510 Cyclins and Cyclin-dependent Kinases
Cyclins and Cyclin-dependent Kinases, Fig. 1 Representa- indicate interactions whose biological significance is not clear.
tive interactions between mammalian cyclins and Cdks. CDK5 Evolution distances in cyclins and Cdks are represented at
may interact with some cyclins (e.g., D-type cyclins) although different scale
this binding does not result in kinase activation. Dotted lines
response to mitogenic signaling (Malumbres and both activating and inactivating phosphorylations. In
Barbacid 2005). It is not clear whether most of the mammals, CDK7 is the catalytic subunit of the com-
other Cdks play direct roles in the regulation of the plex known as Cdk-activating protein (CAK). CAK
cell cycle. CDK5 for instance is activated by cyclin- phosphorylates several Cdks at a critical threonine
like proteins that are almost uniquely expressed in residue at the T-loop of the kinase domain, and this
brain cells. As indicated above, CDK7-9 and phosphorylation is required to unblock the active-site
CDK11 are involved in transcription and RNA cleft. On the other hand, two additional kinases,
processing and the information available for most WEE1 and MYT1, are able to phosphorylate two
other Cdks is too scarce to assign a clear role in the residues located in the roof of the kinase ATP-binding
control of the cell cycle. site and these phosphorylations inhibit Cdk kinase
The activation of cyclin-Cdk complexes is tightly activity. Dephosphorylation of these sites is mediated
controlled at different levels including phosphoryla- by CDC25(A-C) phosphatases. Both WEE1 and
tion and dephosphorylation events, as well as folding CDC25 enzymes are critical regulators of Cdk acti-
and subcellular localization. Cdks are regulated by vation in response to multiple signals such as those
C 512 Cyclins and Cyclin-dependent Kinases
involved in the DNA damage response (Morgan monitored by several ▶ cell cycle checkpoints
2007; Jackson and Bartek 2009) (▶ Cell Cycle Arrest which sense possible abnormalities and generally
After DNA Damage). Cdks are also regulated by result in the inhibition of Cdk activity and cell cycle
direct binding to several ▶ Cdk inhibitors of the arrest.
Ink4 (p16Ink4a, p15Ink4b, p18Ink4c and p19Ink4d) and In general, Cdks are considered as the major
Cip/Kip (p21Cip1, p27Kip1 and p57Kip2) families. engines that drive entry and progression throughout
Ink4 proteins bind CDK4 and CDK6 thus preventing the different phases of the eukaryotic cell cycle. Not
the interaction of the monomeric Cdk with their acti- surprisingly, their activity is deregulated in human
vating cyclins. Cip/Kip inhibitors, on the other hand, cancer (▶ Cell Cycle, Cancer Cell Cycle and
generally bind and inhibit cyclin-Cdk complexes gen- Oncogene Addiction) and inhibiting Cdk activity is
erating inactive trimeric complexes (Sherr and Rob- now considered as an attractive therapeutic approach
erts 1999) against tumor cell proliferation (Malumbres and
The typical phosphorylation sequence recognized Barbacid 2009).
by Cdks is [S/T]*PX[K/R], where [S/T]* indicates
the phosphorylated serine or threonine residue, P is
proline, X represents any amino acid, and [K/R] rep- Cross-References
resents a basic amino acid lysine or arginine two
positions downstream of the phosphorylated residue. ▶ Cdk Inhibitors
Cdks exert their effects in the cell cycle by phosphor- ▶ Cell Cycle
ylating a large number of proteins in the cell, thus ▶ Cell Cycle Arrest After DNA Damage
regulating many aspects of cellular architecture and ▶ Cell Cycle Checkpoints
metabolism. In response to mitogenic signals, cyclin ▶ Cell Cycle Transition, Principles of Restriction
D-CDK4/6 complexes phosphorylate and inhibit the Point
retinoblastoma protein (pRB), a transcriptional ▶ Cell Cycle Transitions, G2/M
▶ repressor that defines the ▶ restriction point in ▶ Cell Cycle Transitions, Mitotic Exit
mammals by preventing the synthesis of cell cycle ▶ Cell Cycle, Budding Yeast
proteins. pRb exerts this function by inhibiting E2F ▶ Cell Cycle, Fission Yeast
transcription factors and recruiting chromatin ▶ DNA Replication
remodeling complexes thus repressing transcription ▶ Mitosis
at specific sites. pRB may be also phosphorylated by ▶ Mitotic Kinases
many other Cdks such as CDK1, CDK2, CDK3, ▶ Repressor
CDK9. Upon pRB inactivation, many proteins ▶ Transcription
required for the synthesis of DNA and mitosis are
synthesized, and cells become committed to ▶ DNA
replication and progression throughout the following References
phases of the cell cycle. CDK2 and CDK1, in com-
plex with E-, A-, and B-type cyclins, are responsible Jackson PK (2008) The hunt for cyclin. Cell 134:199–202
for the phosphorylation of a significant number of Jackson SP, Bartek J (2009) The DNA-damage response in
human biology and disease. Nature 461:1071–1078
proteins. CDK1 (▶ Mitotic Kinases), the major Cdk Malumbres M, Barbacid M (2005) Mammalian cyclin-
activity during G2 and M, phosphorylates more than dependent kinases. Trends Biochem Sci 30:630–641
70 proteins involved in DNA condensation, formation Malumbres M, Barbacid M (2009) Cell cycle, Cdks and cancer:
of the spindle, Golgi remodeling, nuclear envelop a changing paradigm. Nat Rev Cancer 9:153–166
Malumbres M, Harlow E, Hunt T, Hunter T, Lahti JM,
breakdown, etc. (Malumbres and Barbacid 2005).
Manning G, Morgan DO, Tsai LH, Wolgemuth DJ (2009)
Mitotic exit (▶ Cell Cycle Transitions, Mitotic Exit) Cyclin-dependent kinases: a family portrait. Nat Cell Biol
requires the proteasome-dependent degradation of A- 11:1275–1276
and B-type cyclins, thus switching off Cdk activity Morgan DO (2007) The cell cycle: principles of control. New
Science Press, London
and allowing DNA decondensation, spindle disas-
Sherr CJ, Roberts JM (1999) CDK inhibitors: positive and neg-
sembly, reformation of the nuclear envelop, ative regulators of G1-phase progression. Genes Dev
etc. Normal progression throughout the cell cycle is 13:1501–1512
Cytokinesis 513 C
Cyclosome Cytokinesis
Characteristics
References
Definition Farkas AM, Kilgore TM, Lotze MT (2007) Detecting DNA:
getting and begetting cancer. Curr Opin Investig Drugs
Damage associated molecular patterns (DAMPs) are 8(12):981–986
molecules released by stressed cells. Interactions Janeway C (1989) Immunogenicity signals 1,2,3 . . . and 0.
Immunol Today 10(9):283–286
between PRRs and DAMP initiate and perpetuate
Rubartelli A, Lotze MT (2007) Inside, outside, upside down:
immune response similar to pathogen-associated damage-associated molecular-pattern molecules (DAMPs)
molecular pattern molecules (PAMPs) that drive and redox. Trends Immunol 28(10):429–436
Characteristics
Cross-References
The SEEK is accessible via a web-based client and
built on an open source philosophy. It has been adopted ▶ JWS Online
by projects outside the ▶ SysMO consortium, includ- ▶ MIRIAM
ing The Virtual Liver project (http://seek.virtuelle- ▶ SBGN
leber.de/), EraSysBio + (http://www.erasysbio.net), ▶ SysMO
and UniCellSys (http://www.unicellsys.eu/) by the ▶ Systems Biology Markup Language (SBML)
time of writing. Development of the SEEK follows
a rapid, incremental cycle that is strongly user oriented
by virtue of frequent interactions, in site visits and References
workshops with a focus group of users, the
Wolstencroft K, Owen S, du Preez FB, Krebs O, Mueller W,
▶ SysMO-PALS (Project Area Liasons), who are rep-
Goble C, Snoep JL (2011) The SEEK: a platform for sharing
resentatives from each ▶ SysMO project and are data and models in systems biology. Methods Enzymol
a mixture of experimentalists, modelers, and 500:629–655, Elsevier, Amsterdam
Data Integration and Visualization 519 D
biology–related activities, from functional characteri-
Data Collection zation of genomic and proteomic data to the develop-
ment of mathematical models of biological processes,
▶ Data Integration requires an integrated view of all relevant data useful
to accomplish those tasks.
References
Data Deluge Stein LD (2003) Integrating biological databases. Nat Rev Genet
4(5):337–345
▶ Data-intensive Research
Roberta Alfieri and Luciano Milanesi Steve R. Pettifer and Teresa K. Attwood
Institute for Biomedical Technologies – CNR Faculty of Life Sciences and School of
(Consiglio Nazionale delle Ricerche), Segrate, Milan, Computer Science, University of Manchester,
Italy Manchester, UK
Synonyms Synonyms
Definition Definition
In systems biology, data integration is an important Data integration is the process of bringing together
approach to better understand the main features of information from multiple, diverse sources such that
a biological process, because it represents a way to it can be interrogated as a whole to provide holistic
combine interesting information related to the reaction knowledge that is greater than the sum of its parts. In
involved in specific network. One possible method to particular, data integration aims to seamlessly expose
develop an integration system is the data warehousing information inherent in the relationships between con-
approach (Stein 2003), which allows the integration of cepts. There are numerous technological challenges
information stored in different biological databases. relating to the scalability of data-integration systems,
The necessity for data integration is widely approved as well as complex issues concerning both the nature of
in the bioinformatics and systems biology community the data itself and the means by which the data may be
since bioinformatics data are currently spread across understood by humans. Visualization is the process of
different databases and they are stored in a wide vari- making data human intelligible, enabling human intu-
ety of formats. Moreover, the achievement of ition and expert knowledge to be applied in areas
interesting results in most bioinformatics and systems where algorithmic interrogation is unrealistic.
D 520 Data Integration and Visualization
Systems biology attempts to take a “universal” Many data-integration issues arise from the need to
view of complex biological phenomena, treating reconcile disparate source database schemas; the
them as an integrated whole rather than as individual, more diverse the schemas, the harder the problem.
independently functioning components: it is as much Traditional (relational) databases explicitly encode
about understanding the interactions between biolog- schemas into the database architecture; this requires
ical entities as it is about the entities themselves. This the schemas to be “decomposed” when loaded into
approach necessitates studies at a variety of levels, a warehouse, or at run-time in a federated system.
from individual small molecules and macromolecular Schemas can be very simple, with all information
complexes to their interactions in a variety of interre- encoded in a single monolithic table of records – this
lated contexts: e.g., the specific biochemical path- makes some queries trivial, but is inflexible, making
ways in which they participate, the molecular re-purposing the data difficult. Alternatively, they may
networks those pathways may form, and, ultimately, encode records with finer levels of granularity – this
the complete organisms to whose evolution and func- provides greater flexibility, but imposes on users the
tioning those networks and pathways contribute. need to formulate significantly more complex queries.
Supporting systems perspectives with information Contemporary approaches attempt to obviate the need
technology is challenging in terms of both data inte- for schemas, relying on the use of ontologies to give
gration and visualization: managing the individual meaning and structure to database contents. Here,
components in isolation is hard enough; integrating “triple stores” and “schemaless” databases provide
this information to provide “system-wide” views is a generic underlying technology, and shared ontol-
even more daunting. ogies for reconstituting those data are managed by
the community.
Various community-driven initiatives are evolving
Characteristics data-integration standards (e.g., BioSharing,
BioDBcore, BioPax) in close collaboration with
Issues with Data-Integration Technologies publishers, journals, database curators, software devel-
Traditionally, data integration has involved either opers, and so on (Field et al. 2009; Gaudet et al. 2011;
data warehousing or database federation. In data Demir et al. 2010). These are intended to help users
warehousing, data are extracted from various sources locate and access information dispersed within data-
(databases, “flat files,” and other information- bases; to help shape the data-preservation/manage-
management systems), transformed into a common ment/sharing policies implemented by journal editors
schema (and also possibly audited for quality), and and funders; and to encourage software developers to
finally loaded into what is logically a single (usually embrace and extend community-endorsed standards,
relational) data-store for querying by end-users. This like the systems biology markup language (SBML,
process is often referred to as “ETL” (extract, trans- Hucka et al. 2004) and CellML (Garny et al. 2008).
form, load). Federating databases, however, retains Alongside ontologies and schemas, numerous “mini-
the original data sources and schemas at their original mum information” standards have evolved as a means
locations, instead providing a distributed query mech- of establishing a baseline for the information deposited
anism that interrogates the sources in unison, as if in data repositories (Taylor et al. 2008). These check-
they were logically a single resource. In essence, lists and guidelines aim to improve the quality and
federated systems perform a similar process to ETL integrity of recorded data, leading to improved oppor-
but “on the fly.” Warehousing is a predominantly tunities for data integration.
centralized approach, and federation is inherently
distributed; thus, the same issues of balancing space
complexity (storage) and time complexity (computa- Issues with the Data
tional load) apply as with other distributed architec- Irrespective of data-integration technologies, there are
tures. Similarly, the usual pros and cons associated also issues at the data level, especially with identity
with scalability, run-time performance, and data and nomenclature. Getting humans to agree on the
integrity must be taken into consideration when meaning of words is hard; getting them to understand
designing data-integration platforms. when they are using the same name to mean different
Data Integration and Visualization 521 D
Data Integration and Visualization, Table 1 The variety of names for ‘the same’ gene and its protein product in three different
species, including the many different protein alternative names and gene synonyms
Species Protein Alternative name Gene Synonym UniProtKB:ID
Saccharomyces DNA replication licensing cell division control MCM4 CDC54, MCM4_YEAST,
cerevisiae factor MCM4 protein 54 HCD21 P30665
Drosophila DNA replication licensing protein disc proliferation dpa MCM4_DROME,
melanogaster factor MCM4 abnormal Q26454
Mus musculus DNA replication licensing CDC21 homolog, Mcm4 Cdc21, MCM4_MOUSE,
factor MCM4 P1-CDC21 Mcmd4 P49717
D
IUPAC international chemical identifier (InChiTM) are disparate data drawn from a field that is itself growing
often sufficient to define a molecule’s topological in complexity. It has become clear that the ad hoc
structure (but the use of InChi as a chemical identifier mechanisms and file formats that have served the
is hotly contested, with many arguing that “semantic- field for decades are no longer sufficient, and greater
free” identifiers offer a more reliable foundation for use of ontologies, standards, and identification
data integration). A plethora of other file formats schemas will be required to help extract knowledge
explicitly capture atomic coordinates for small mole- from our growing data collections. With increasing
cules; for large molecules like proteins, where the complexity, however, our ability to integrate data
atomic structure is typically determined experimen- meaningfully using ontologies and schemas is being
tally using techniques like X-ray crystallography, for- pushed to its limits. Ultimately, if we are to be able to
mats such as that defined by the protein data bank grasp the subtleties of communication, we will need to
(PDB, Berman et al. 2000) are often used for data be able to more effectively capture the relationships
exchange. As well as encoding the empirically deter- between data and the biomedical literature (i.e., how
mined coordinate data, this format supports basic data are described in scientific articles). Tools for
annotation of the underlying sequence. managing this relationship will become crucial in
Sequence visualization techniques usually involve future.
displaying ordered lists of amino acids (in a protein) or
nucleotides (in a chromosome or genome). Here, the Cross-References
main challenge is to provide mechanisms for mapping
regions of interest (topological domains, coding/non- ▶ Alignment, Protein Interaction Networks
coding regions, etc.) onto a sequence, and to allow ▶ Applied Text Mining
comparison between multiple sequences (e.g., to deter- ▶ Biological Network Model
mine their similarity). The data formats for sequences ▶ Biological System Model
are relatively simple compared to those for structure ▶ Bio-Ontologies
and network visualization: e.g., the FastA and PIR ▶ Controlled Vocabulary
formats primarily comprise strings of characters from ▶ Data Integration
the relevant sequence alphabet; the “general feature ▶ Data Mining
format” (GFF) and the schema of the distributed anno- ▶ Directed Acyclic Graph
tation system (DAS) also encode regions of interest ▶ Distributed Data Access
that may be mapped to the sequence. Typically, ▶ Gene Ontology
sequences are depicted visually as horizontal rows of ▶ Graph
color-coded blocks, representing residues or nucleo- ▶ Graph Algorithms in Network Analysis
tide bases. Such visualizations allow users to scroll ▶ Graph Alignment, Protein Interaction Networks
back and forth, zoom in and out, reorder the sequences ▶ Graphical Model
or their features, render them in 3D, etc. Until recently, ▶ Interoperability
users would tend to align tens or hundreds of ▶ Knowledge
sequences; however, developments in next-generation ▶ MIRIAM
sequencing (NGS) are likely to drive this into the ▶ MIRIAM URI
thousands, pushing the limits of sequence visualization ▶ Ontology
into new dimensions. ▶ Ontology Structure
▶ SBML
Conclusions ▶ Text Mining
Capturing life science data for the purposes of data
integration and visualization – whether for sequence, References
structure, or network analysis – is challenging. As
techniques like NGS create more data than ever before, Berman HM, Westbrook J, Feng Z et al (2000) The protein data
bank. Nucleic Acids Res 28:235–242
the problems become more complex. In attempting to
Demir E, Cary MP, Paley S et al (2010) The BioPAX community
paint holistic pictures, systems biology must take into standard for pathway data sharing. Nat Biotechnol 28:
account the increasing convolutions of integrating 935–942
Data Integration, Breast Cancer Database 523 D
Field D, Sansone SA, Collis A et al (2009) Omics data sharing. Characteristics
Science 326:234–246
Garny A, Nickerson DP, Cooper J et al (2008) CellML and
associated tools and techniques. Philos Transact A Math Motivation
Phys Eng Sci 366:3017–3043 Breast cancer is one of the most common cancer types:
Gaudet P, Bairoch A, Field D et al (2011) Towards BioDBcore: Approximately, it affects 1 out of 10 women and rep-
a community-defined information specification for biologi- resents the 25% of all tumors that hit women. From
cal databases. Nucleic Acids Res 39:D7–D10
Hucka M, Finney A, Bornstein BJ et al (2004) Evolving a lingua a scientific point of view, it is increasingly believed
franca and associated software infrastructure for computa- that using a systems biology perspective it is possible
tional systems biology: the Systems Biology Markup to develop better strategies for cancer treatment; this D
Language (SBML) project. Syst Biol 1:41–53 consideration is due to the fact that the complex behav-
Laibe C, Le Novère N (2007) MIRIAM resources: tools to
generate and resolve robust cross-references in systems ior of living systems can be hard to predict from the
biology. BMC Syst Biol 1:58 properties of individual parts (such as genes, proteins,
Le Novère N, Hucka M, Mi H et al (2009) The systems biology and cells). In this context, a multilevel integration of
graphical notation. Nat Biotechnol 27:735–741 the available knowledge regarding both biological
Taylor CF, Field D, Sansone SA et al (2008) Promoting coherent
minimum reporting guidelines for biological and biomedical components and pathways is a crucial task in order to
investigations: the MIBBI project. Nat Biotechnol 26: promote the system perspective.
889–896
Functionality
The G2SBC Database contains several types of molec-
ular alterations associated with breast cancer. These
Data Integration, Breast Cancer alterations encompass the genome (mutations and
Database SNPs), the transcriptome (RNA expression level and
splicing variations), and the proteome (protein expres-
Ettore Mosca, Ivan Merelli and Luciano Milanesi sion level, sequence, structure, and localization).
Institute for Biomedical Technologies – CNR Molecular alteration data are integrated with informa-
(Consiglio Nazionale delle Ricerche), Segrate, tion concerning the molecular network layer (path-
Milan, Italy ways and interactions, retrieved from databases as
KEGG Pathway and BioGRID, respectively) and the
cellular layer (breast cancer types and tissue images
Synonyms from the Human Protein Atlas, mathematical models
of carcinogenesis, tumor growth, and tumor response
Genes-to-systems breast cancer database from literature).
The Web site (implemented employing PHP
and JavaScript) by which the database can be
Definition accessed is mainly divided into three sections. The
first concerns the query system, which allows to
The Genes-to-Systems Breast Cancer Database retrieve data from the three levels of biological enti-
(shortly, G2SBC Database) is a freely available Web ties: molecular components, the molecular systems,
resource that collects data about genes, transcripts, and and the cellular layer.
proteins reported in the scientific literature as altered in The second section concerns the analysis tools
breast cancer cells; alterations encompass different available by means of the Web interface. In particular
types of mutations and expression variations of tran- there are two tools that rely on the application of
script and proteins. These data are integrated in graph theory to biological networks for highlighting
a multilevel database (from genes, transcripts, and pro- interactions, protein complexes, and hubs.
teins to molecular networks, cell populations, and tis- Concerning the gene-annotation enrichment analysis,
sues), which is coupled with a series of analysis tools the G2SBC Database provides a tool that enables
concerning cellular biochemical pathways, protein– the functional annotation of gene lists provided by
protein physical interactions, protein structures, and the user or retrieved exploring the data from the
mathematical models of cell behavior. database.
D 524 Data Management
PTPμ
PTPs
VEPTP PTP1B
LAR SHP-1
Lastly, the G2SBC Database maintains a model- ▶ Gene Set and Protein Set Expression Analysis
oriented section, which involves two aspects. The ▶ Interaction Networks
first one concerns the interaction among cell cycle ▶ KEGG Pathway Database
regulation and breast cancer: due to this connection it
is possible to retrieve the breast cancer genes involved
in cell cycle control and simulate the associated math- References
ematical models. The second regards the mathematical
models related to carcinogenesis, tumor growth, and Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M
(2010) KEGG for representation and analysis of molecular
response to treatments.
networks involving diseases and drugs. Nucleic Acids Res
The data integration approach employed to develop 38(Database issue):D355–D360
this resource enables the collection of a large amount Mosca E, Alfieri R, Merelli I, Viti F, Calabria A, Milanesi L
of records and the Web tools provided allow to infer (2010) A multilevel data integration resource for breast can-
cer study. BMC Syst Biol 4:76
nontrivial knowledge: one example is the integration
Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L,
of protein expression data available for different types Oughtred R, Livstone MS, Nixon J, Van Auken K,
of breast cancer with the cellular pathways, Fig. 1. Wang X, Shi X, Reguly T, Rust JM, Winter A, Dolinski K,
Tyers M (2011) The BioGRID interaction database:
2011 update. Nucleic Acids Res 39(Database issue):D698–
Accessibility
D704
The G2SBC Database is freely accessible at http:// Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K,
www.itb.cnr.it/breastcancer. An extensive help section Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S,
is available and contains some use cases covering the Wernerus H, Bj€ orling L, Ponten F (2010) Towards
a knowledge-based human protein atlas. Nat Biotechnol
different sections of the G2SBC Database.
28(12):1248–1250
Cross-References
There is some confusion on the meaning of the term Data Mining Methods
data mining and its relation to overlapping terms such Data mining is to a large extent defined by the methods
as machine learning and statistics. As a field of science, that are used in the field. The methods and their goals
data mining sits between information technology and are varied, but can be roughly divided into two main
statistics. Data mining originated from knowledge dis- categories: they either try to model the data (learn the
covery research in information science, and it has global structure of the data) or try to find patterns of
traditionally had very little interaction with statistics. interest (learn local structures from the data). From
This history also explains much of the critique toward a computational point of view, data mining problems
data mining, to a large extent from the statistical com- are often NP-hard, meaning that they do not have an
munity. Early data mining methods have been criti- optimal solution algorithm and need approximated
cized for so-called “data fishing”: searching for solutions. This inherent complexity is also one of the
complex patterns without controlling the probability reasons why data mining is seen as an interesting area
of them being just random artifacts in the data (Hand for computer science research.
et al. 2001). The most common data mining tasks can be divided
However, the difference between statistics and data into clustering (unsupervised learning), classification
mining is vague: statistics is especially interested in (supervised learning), regression (▶ Regression
inference and prediction, typically based on some Analysis), association mining, and text mining
a priori hypotheses and supporting experimental (▶ Text Mining) (Hand et al. 2001; Hastie et al. 2009).
D 526 Data Mining
Data Mining, Fig. 1 Visualization of a hierarchical clustering of gene expression samples. Genes are clustered so that they have
similar expression levels in the sampled conditions. Visualization was produced with Chipster software
D 528 Data Mining–based Transcriptional Regulatory Network Construction
of the original data. Only those original results that are Cross-References
unlikely in randomized data are selected.
Validation is hard to do well, and it often takes ▶ Bayesian Inference
considerable effort to program a validation approach ▶ Classification
that is suitable for the particular data at hand. Ready- ▶ Clustering, Hierarchical
made solutions in most software programs often do not ▶ Clustering, k-Means
perform satisfactorily. ▶ Clustering, Model-Based
▶ Data Mining–based Transcriptional Regulatory
Tools and Data Management Network Construction
It is common to store structurally simple data sets as ▶ Linear Regression
flat files, such as comma separated tables. More com- ▶ Model Testing, Machine Learning
plex data sets are typically stored in SQL databases, in ▶ Model Training, Machine Learning
XML files or in custom data storage systems. Data ▶ Model Validation, Machine Learning
warehouse technologies commonly used in the indus- ▶ Overfitting
try are less often seen in bioscientific data mining. ▶ R, Programming Language
For preprocessing and analysis, most commonly ▶ Text Mining
used tools are either generic scripting languages (e.g.,
Perl and Python), interactive data analysis environ-
ments (e.g., R (▶ R, Programming Language) and References
Matlab), and graphical tools (e.g., Weka). Knowledge
discovery professionals in the industry often use spe- Baldi P, Brunak S (2001) Bioinformatics: the machine learning
approach, 2nd edn. MIT Press, Massachusetts
cialized data mining packages, but in systems biology
Chambers J (1993) Greater or lesser statistics: a choice for future
in academia, a set of open source command line tools is research. Stat Comput 3:182–184
favored. As the need for data mining methods is grow- Hand DJ, Mannila H, Smyth P (2001) Principles of data mining.
ing due to developments in measurement technology, MIT Press, Massachusetts
Hastie T, Tibshirani R, Friedman J (2009) The elements of
commercial companies have started to provide special-
statistical learning: data mining, inference, and prediction,
ized graphical tools, which have been later followed by 2nd edn. Springer, New York
open source developments. These tools aim to provide
advanced data analysis capabilities to users without
computational background.
Analysis workflows differ, but they all contain
some variants of the following steps: preprocessing, Data Mining–based Transcriptional
data exploration, and main analysis. In preprocessing, Regulatory Network Construction
the data is formatted correctly, filtered for unneces-
sary or dubious content, and often normalized. Explo- Xing-Ming Zhao
ration ranges from outputting simple statistics to Institute of System Biology, Shanghai University,
elaborate visualizations. The main analysis is often Shanghai, China
constructed in the form of a statistical test, although
sometimes a less strict approach is used. Especially
for systems biology work, an additional step is taken Synonyms
after the main analysis: result integration and expla-
nation. In the integration and explanation step, the Classification; Data mining; Regression
results are often validated (Model Cross-validation,
Machine Learning) against independent data sources,
such as the many biological databases that are pub- Definition
licly available. The intent is to find support for the
results and also to find sound explanations that help to Data mining is a new field that aims to analyze large
understand the results as part of the larger “systems datasets and extract knowledge from data, and it is an
view”. interdisciplinary field involving statistics, pattern
Data Mining–based Transcriptional Regulatory Network Construction 529 D
recognition, information theory, and so on. Generally, feature selection techniques include maximum rele-
data mining techniques can extract informative pat- vance/minimum redundancy (MRMR), Entropy, and
terns from data and construct decision-making sys- Mutual information, etc. The selected features can help
tems. The most popular techniques in data mining to interpret the model and important patterns.
include clustering, classification, regression, etc. Data
mining is being widely used in various fields, including Data Mining Algorithms
image processing, marketing relationship, and medi- Various data mining methods can be grouped into
cine (Fayyad et al. 1996). three groups, that is, supervised methods, unsupervised
Recently, data mining has been playing an impor- methods, and semi-supervised methods, among which D
tant role in bioinformatics, based on which various the supervised and unsupervised methods are widely
tools have been developed to construct transcriptional used in transcriptional regulatory network construction.
regulatory networks. Instead of determining individual In supervised methods, there are some regulations that
protein-gene regulations with traditional biological are known in advance and are therefore used as training
experiments, the integration of emerging high- set to infer the patterns for prediction of new regula-
throughput data and data mining is able to predict tions. The supervised methods widely used to construct
transcriptional regulations in an automatic way. In transcriptional regulation networks include artificial
general, the regulation relationship can be predicted neural network, decision tree, genetic algorithm, and
as a classification or regression problem, which aims to support vector machine. On the other hand, if there are
extract regulation patterns from the experimental data. not any known regulations available, the unsupervised
methods will be helpful, among which the most popular
one is clustering.
Characteristics
Artificial Neural Networks
Data Representation and Preprocess Artificial Neural Network (ANN) model is designed to
In data mining, the data are usually formulated as mimic the real biological neural network that can pro-
vectors, where each sample is represented as one vec- cess information and make decision. In general, the
tor. Therefore, the similarity between samples can be ANN model consists of three parts, including input
estimated. In constructing transcriptional regulatory layer, output layer, and hidden layer. There are some
network, one of the most commonly available data nodes in each layer that work like the neurons, and the
source is microarray data that describes the expression nodes are interconnected to simulate the information
profile of genes under different conditions. The gene flow between neurons. The structure of the ANN
expression data can be easily used for data mining. model reflects the mapping from input variables to
Other kinds of data, for example, ChIP-on-chip (also output variables. The ANN model can handle large
known as ChIP-chip) data, can also be represented as dataset and is robust against noisy data. However, the
numeric vectors. ANN model works as black box and it is difficult to
However, it is not a trivial task to extract informa- interpret the results obtained in some cases.
tive patterns due to the noise inherited in the experi- Veiga et al. (2008) designed a feed-forward (FF)
mental data. Some data mining techniques are and bi-fan (BF) regulatory motif prediction model for
therefore used to reduce noise and sometimes also Escherichia coli based on multilayer perception artifi-
help to reduce the dimensionality of the data, such as cial neural networks (ANNs). The regulatory motifs
principal component analysis (PCA). Since the gene predicted consist of transcription factors and regulated
expression data can be measured as time series data, genes, and are highly enriched. Hart et al. (2006) found
some signal processing methods, for example, Fourier that single-layer feed-forward ANN models can effec-
transform, are helpful to reduce noise and transform tively discover gene network structure by integrating
the data into another description space so that the global in vivo protein-DNA interaction data (ChIP/
signal in the data can be easily detected. Another Array) with genome-wide microarray data. The ANN
important issue in data mining is feature selection, models were successfully applied to construct the yeast
which aims to select important patterns so that the cell cycle transcriptional regulatory network, which is
performance of the classifier can be improved. Popular composed of hundreds of genes.
D 530 Data Mining–based Transcriptional Regulatory Network Construction
Cross-References
D
▶ Chromatin Immunoprecipitation Data Sampling
▶ Entropy
▶ Feature Selection Haiying Wang and Huiru Zheng
▶ Maximum Relevance/Minimum Redundancy School of Computing and Mathematics, Computer
(MRMR) Science Research Institute, University of Ulster,
▶ Mutual Information Jordanstown, UK
Synonyms
References
Sampling; Statistical sampling
de Hoon MJL, Imoto S, Kobayashi K, Ogasawara N,
Miyano S (2003) Inferring gene regulatory networks
from time-ordered gene expression data of Bacillus
subtilis using differential equations. Pac Symp Biocomput Definition
8:17–28
Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data
mining to knowledge discovery in databases. AI Mag
In machine learning (▶ Model Validation, Machine
17:37–54 Learning), data sampling (Cochran 1977) is a widely
Hart CE, Mjolsness E, Wold BJ (2006) Connectivity in the yeast accepted process concerned with the selection of an
cell cycle transcription network: inferences from neural net- unbiased subset of data, which are representative of
works. PLoS Comput Biol 2(12):e169
a larger population, for the purposes of constructing
Kikuchi S, Tominaga D, Arita M, Takahashi K, Tomita M
(2003) Dynamic modeling of genetic networks using predictive models with machine learning algorithms.
genetic algorithm and s-system. Bioinformatics The size of the sample is the number of data items in
19(5):643–650 a sample, typically denoted as an integer number N.
Kumar M, Gromiha M, Raghava G (2007) Identification of
The major benefit of applying data sampling in the
DNA-binding proteins using support vector machines and
evolutionary profiles. BMC Bioinformatics 8:463 context of machine learning is it can effectively speed
Qian J, Lin J, Luscombe NM, Yu H, Gerstein M (2003) Predic- up the modeling process, which allows the analyst to
tion of regulatory networks: genome-wide identification of build a model and make prediction with relatively little
transcription factor targets from gene expression data.
cost and effort. To achieve this, a sample is required to
Bioinformatics 19(15):1917–1926
Ruan J, Deng Y, Perkins EJ, Zhang W (2009) An ensemble contain the essence and reflect the characteristics of the
learning approach to reverse-engineering transcriptional reg- entire dataset. The key to meet this fundamental
ulatory networks from time-series gene expression data. requirement is randomness, i.e. allowing each data
BMC Genomics 10(Suppl 1):S8
item in the database to have the same probability of
Seema S, Ramanatha KS (2010) Inference of gene regulatory
network using modified genetic algorithm. In: Proceedings of being selected.
the international symposium on biocomputing, Calicut, Types of data sampling commonly used in machine
Kerala, pp. 1–8 learning include the following:
Veiga DF, Vicente FF, Nicolás MF, Vasconcelos AT
(2008) Predicting transcriptional regulatory interactions
• Simple Random Sampling
with artificial neural networks applied to E. coli multidrug Each data item has the same chance of being
resistance efflux pumps. BMC Microbiol 8:101 selected in the data set.
D 532 Data Storing and Querying
Characteristics
Cross-References
Role
Technical advances in biological information capture
▶ Data Integration and Visualization
have yielded intimidating quantities of data pertaining
to many areas of biology, including biochemical sig-
naling. In parallel, continuing developments in soft-
References ware and simulators have helped researchers to
develop and explore increasingly detailed biological
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A models of complex signaling pathways. Model data-
et al (2009) InterPro: the integrative protein bases are an emerging category of bioinformatics
signature database. Nucleic Acids Res 37(Database Issue): tools that serve the intersection of these trends
D211–D215
UniProt Consortium (2011) Ongoing and future developments
(Ghosh et al. 2011).
at the universal protein resource. Nucleic Acids Res DOQCS was designed to integrate several modeling
39(Database issue):D214–D219 requirements to provide a resource for model
Database of Quantitative Cellular Signaling (DOQCS) 535 D
Database of Quantitative Cellular Signaling (DOQCS), enzyme, and reaction tables. The identity of reactants (substrates
Fig. 1 Entity Relationship diagram for the DOQCS database. and products of reactions and enzymes) and hence the connec-
The accession table is used as an entry point into the model. The tivity of the model is stored in the “Molecule List” table. The
accession number is the unique identifier for each model. Most channel and geometry information are stored in respective
other tables in the database refer to this (broken line). Similarly, tables. Original and converted model files are stored as large
each pathway has a unique pathway number used by other tables text objects in the “File Format” table
(the black line). Model parameters are stored in the molecule,
Database of Quantitative Cellular Signaling (DOQCS), Fig. 2 Screenshot of overview page of DOQCS accession, showing
directory-tree-like hierarchy structure, block diagram and basic annotations
As before, DOQCS is implemented using Linux/ Original and converted model files are stored as
Apache/MySQL/PHP. Large Text Objects in the File Formats table.
The accession table specifies the primary model
entry into the database and holds substantial contextual Database Interface
data as well as a summary diagram for each model. The The web interface to DOQCS facilitates navigation
contextual data include ▶ MIRIAM compliant annota- either through links (URL-based navigation) or
tions. Further fields including tissue, cell type data in through searches (form-based navigation) (Fig. 2).
addition to fields on curation level and model type The URL-based navigation matches the conceptual
annotation are added. organization of the database into accessions, pathways,
The Pathway table provides additional overview and biochemical entities. These levels are explicitly
information about the models, including reaction dia- represented using a directory-tree-like hierarchy in the
grams and annotations. web interface. The root of the tree is the accession,
The remaining tables provide low-level model folders include different pathways within the acces-
details: molecular identity, rate constants, and sub- sion, and the molecules, reactions, and enzymes are
strate/product lists. represented as entries within the pathways. A click on
Compartmental details are stored in Geometry any level of the tree brings up the details pertaining to
table. that level of the database, as mentioned above.
Databases for Kinetic Models 537 D
Overview data is provided on the accession and path- Cross-References
way pages, and detailed biochemical parameters are
found on the reaction and molecule pages. ▶ Gillespie Stochastic Simulation
Each of these pages includes richly hyperlinked ▶ MIRIAM
results from the navigation. For example, each pathway ▶ Ordinary Differential Equation (ODE)
page has links to all related pathways in the database. ▶ Relational Database
Further, each biochemical entity page has links to all the ▶ Systems Biology Markup Language (SBML)
reactions or other molecules that interact with it.
The second major navigation option is through D
searches. This uses a query-based form present at the References
top of each page of DOQCS. In additional to simple
queries as previously described (Sivakumaran et al Bhalla US (2002) Use of Kinetikit and GENESIS for modeling
signaling pathways. Methods Enzymol 345:3–23
2003), additional several complex searches
Ghosh S, Matsuoka Y, Asai Y, Hsin KY, Kitano H (2011)
are possible for textual pattern matching within Software for systems biology: from tools to integrated plat-
various subfields, as it is common on most databases. forms. Nat Rev Genet 12(12):821–832
More sophisticated searches can also be carried out Gillespie DT (2007) Stochastic simulation of chemical kinetics.
Annu Rev Phys Chem 58:35–55
based on connectivity or functional criteria.
Sivakumaran S, Hariharaputran S, Mishra J, Bhalla US
Typical connectivity criteria specify that a given (2003) The database of quantitative cellular signaling: man-
molecule is a substrate of another. A functional crite- agement and analysis of chemical kinetic models of signaling
rion might be that a given molecule acts as an enzyme. networks. Bioinformatics 19(3):408–415
Vayttaden SJ, Bhalla US (2004) Developing complex signaling
models using GENESIS/Kinetikit. Sci STKE 219:l4
Data Sources and Curation
Except for few models that were explicitly developed
for the database by the curation team, all other
models are published models which have been tested
before putting online and in the accession table citation Database Search
information is included as a link to PubMed. This
procedure frequently reveals ambiguities in model rep- ▶ Protein Identification Analysis
resentation, and in some complex models, the original
implementation is difficult to replicate. In all cases, the
extent of the fit between published and internally tested
representations is reported as an important entry in the Databases for Kinetic Models
validation field in the table.
Jacky L. Snoep
File Formats Department of Biochemistry, Stellenbosch University,
DOQCS supports three file formats: Kinetikit, SBML, Matieland, South Africa
and MATLAB. All models in the database have Manchester Institute for Biotechnology, University of
been implemented using GENESIS/Kinetikit simula- Manchester, Manchester, UK
tor (Bhalla 2002; Vayttaden and Bhalla 2004) and Molecular Cell Physiology, Vrije Universiteit
are available in its internal model file format Amsterdam, Amsterdam, The Netherlands
which includes annotation information. In cases
where the original publication includes model
files, these are also presented unchanged for down- Definition
load. Most models have also been converted to
MATLAB and SBML. All converted models have Over the last decade, systems biology (SB) has been
been tested for equivalent function to the GENESIS/ one of the fastest growing fields in the life sciences.
Kinetikit form. There are many interpretations and definitions for the
D 538 Databases for Kinetic Models
SB field (Westerhoff and Alberghina 2005), but most Models have increased in size for two reasons:
scientists agree that an SB study combines experimen- firstly because larger systems are being studied and
tal and theoretical (including modeling) approaches. secondly because the models are more detailed. In
As such the increase in number of SB studies correlates some cases, the modelers are actually attempting to
with the increase in mathematical models for biologi- build replicas of the real system. In the latter case,
cal system, which is evident from a recent inventory where the model variables have a well-defined
(Huebner et al. 2011). Not only did the number of mechanistic interpretation, it is important to annotate
models increase, also the size of the models has the model. The information in such annotated models
increased. The latter effect is due to the larger size of is ideally suited for storing in relational databases.
the systems being studied but more importantly due to Strong searching capabilities, easy access and dis-
a more complete representation of the biological semination via standard formats and protocols, linking
system in the models. to other databases, and possibilities of incorporation of
The aim of many SB studies is to relate systems automated workflows are just a few of the added
behavior to characteristics of the components of the advantages of storing models in databases.
system. This is a subtle but important difference from
the classic theoretical biology approaches, where core Minimal Requirements for Model Databases
models were constructed to describe biological behav- A model repository or database must fulfill three min-
ior using mathematical equations that were as simple imal criteria to make it useful. Firstly, the models’
as possible, not necessarily reflecting biological descriptions must be correct, i.e., they must be identi-
mechanisms. Specifically in so-called bottom-up cal to the model description in the publication where
SB models, a strong mechanistic link is maintained the model was first presented (see ▶ CellML Model
between model components and biological entities. Curation). Secondly, the models must be available for
Such models can be used to critically test our download from the repository in a standard model
biochemical knowledge of a given system (e.g., description format, such as ▶ Systems Biology
Teusink et al. 2000). Markup Language (SBML) or ▶ CellML. Thirdly,
The increase in number and size of kinetic models the models must be annotated (model annotation).
and the more realistic representation of biological A minimal annotation should link the model to the
components in these models have led to the develop- reference where it was published and it should be
ment of a number of model database initiatives. In this clear how the model variables and parameters link to
section, descriptions of several of these initiatives are the species and constants in that publication.
given and the advantages of storing models in such A nice-to-have functionality for kinetic model
databases and the consequences for model reuse, anno- repositories is a simulation tool, such that the models
tation, and dissemination are discussed. in the repository can be directly inspected and simu-
lated, for instance in a web browser. The first initia-
tives that stored kinetic models of biological systems
Characteristics started out as model repositories with a simulation
engine (see ▶ JWS Online and the ▶ Virtual Cell
Why Model Databases (VCell) Modeling and Analysis Platform).
The increased size of kinetic models in SB studies A large number of model databases have been ini-
necessitates the storage of the models in a publically tiated in the last decade. Some of these initiatives focus
accessible form. Whereas it is easy to code a two- on a specific subset of models, such as models for
variable model from a publication, the effort and signal transduction pathways (▶ DOQCS: Database
chances of making errors becomes much larger with of Quantitative Cellular Signaling), for the cell cycle
increased model sizes. Clearly, if a model were avail- (▶ Cell Cycle Database), for neuronal models
able in a repository from which it can be downloaded (modelDB), while other databases are more general
and used without recoding the model, this would save repositories (▶ BioModels Database: a repository of
a lot of work and would also eliminate coding errors mathematical models of biological processes, ▶ JWS
(if the model in the database were properly curated). Online, ▶ CellML, ▶ Virtual Cell, ▶ WebCell).
Databases for Kinetic Models 539 D
Model Annotation What Information Should Be Stored in Model
Whereas the focus in theoretical biology studies used Databases?
to be on the mathematical aspects of a model, i.e., on For each model entry in the database, its description
the analytical and numerical analyses, in systems biol- file in a standard format and an annotation file would
ogy studies the direct link to experimental data is much be the minimal information to be stored (in SBML, the
stronger and it therefore becomes more important to two can be combined in a single xml file). In this
relate model components to biological entities, i.e., to section, the focus is on models described as determin-
annotate models. istic ordinary differential equations, which is the class
In addition to a minimal reference annotation to in which the majority of the kinetic models in SB D
link models in the database to the scientific publica- studies are described. For a more general treatment,
tion in which the model is described, a further anno- see the section on kinetic models (▶ Kinetic Modeling
tation of model components to biological entities and Simulation).
using controlled vocabularies (e.g., SBO terms) A typical model description would include the fol-
greatly enhances the application strength of mathe- lowing: (1) description of a set of reactions, (2) rate
matical models. The annotation of models makes it equations for the reactions, (3) parameter values, and
much easier to search for specific components in (4) initial conditions. In addition, models can contain
models, or to compare or even link models for events, functions, and algebraic equations, but the above
overlapping systems. four components are generic. The reaction description
The Minimal Information Required in the Annota- and the rate equations can be combined to give a set of
tion of Models (▶ MIRIAM) is an example of guide- differential equations, and some models are defined in
lines for annotating models, and it links model differential equations only, without specifying the reac-
variables to ontology terms (e.g., to a systems biology tions and their rate equations. The standard model
ontology, SBO term) and thereby to unique identi- description formats allow for these different ways to
fiers. Thus, where modelers can use names as G6P, describe kinetic models. Whether a model is described
Glu6P, Glc6p, or X2 to denote a variable such as in terms of reactions or as ODEs does not influence the
glucose 6-phosphate, if such variable names are numerical integration (since these are performed as
linked to an identifier such as a ChEBI number they differential equations), but the model description can
can all be related to glucose 6-phosphate. This anno- affect the functionality of the model.
tation relates model constituents unambiguously to
a known entity, where the references information is The Importance of Specifying Reactions
given in a {“data type,” “identifier,” “qualifier”} trip- If a model is described in ODEs without explicit ref-
let. The “data type” is given as a Unique Resource erence to the reactions in a system, it is not always
Identifier and can be an Uniform Resource Locator possible to dissect the contributions of the individual
(url) or a Uniform Resource Name (urn), for instance reactions to an ODE.
for a model variable Ca2+ which is a reactant in The famous Lotka-Volterra model for predator-
a reaction the data type could be a url: “http://www. prey interaction might serve as an example to illustrate
ebi.ac.uk/chebi/” with identifier: “CHEBI:29108” the point. This model is usually presented in the form
and qualifier: “is.” MIRIAM extends beyond the of ODEs:
annotation of model components; it also requires
a reference correspondence, standard model format, dðx½tÞ=dt ¼ x½t ða b y½tÞ
and the possibility of instantiation of a model
simulation. dðy½tÞ=dt ¼ y½t ðc x½ t dÞ
Biomodels (▶ BioModels Database: a repository of
mathematical models of biological processes) was one with x[t] and y[t] the densities of prey and predator,
of the first model database initiatives to strongly pro- respectively, and a, b, c, and d positive parameters.
mote model annotation and has played an important From these ODEs, it is not immediately clear that the
role in development in many model description, anno- system consists of three reactions: (1) a birth rate of
tation, and simulation standards. prey (a*x[t]), (2) a death rate of predator (d*y[t]), and
D 540 Databases for Kinetic Models
(3) a rate for prey consumption by predator leading to to represent biological pathways and is widely
a decrease of prey ( b y½t x½t) and an increase of supported by database initiatives.
predator (c y½t x½t). From the ODEs, it cannot be Thus, although the traditional way of defining
automatically derived that (3) is a single reaction models in terms of ODEs without explicit reference
with a different stoichiometry for prey consumption to reactions has no consequence for the time integra-
and predator growth. If the ODEs would be given in tion of the model, it does affect the functionality of the
terms of reactions this uncertainty would not exist, for model both in terms of comparing to other resources
instance: and in terms of analyses that can be made with the
model. For instance, automated drawing of reaction
dðx½tÞ=dt ¼ v1 v2 network graphs or ▶ metabolic control analysis
(MCA) can only be performed for models defined in
dðy½tÞ=dt ¼ e v2 v3 terms of reaction steps.
activity, this can still work fine. However, for multiple Linking Models to Data Sets
substrate/product reactions, the rate equation becomes One can roughly distinguish two types of data sets that
dependent on the kinetic mechanism used, and for are connected to mathematical models: data for model
ordered binding mechanisms the definition of the inhi- construction and data for model validation. In principle
bition constants is related to the mechanism. For such these should be independent data sets, and they can be
reactions, it is important to publish the kinetic rate very different. For instance in a typical bottom-up
equation together with the Ki values. modeling approach, it is possible to have kinetic data
Irrespective of the mechanistic interpretation of for each of the individual enzymes that is used for
kinetic parameters, their values are usually dependent parameterization of the enzyme kinetic rate equation
on the assay conditions under which they were deter- as part of the model construction. In such a study, data
mined. The kinetic parameter values should be deter- for the complete system could be used for model val-
mined under conditions that closely reflect the idation. Of course, this is just a scenario and other
conditions that are simulated in the model. Where types of data sets are possible. In many studies it is
a model simulates the intracellular cytosol, it is hard not possible to separate the model components func-
to imitate those conditions ex vivo. Often kinetic tionally from the system, and then one cannot charac-
parameters are determined in vitro, and some guide- terize the individual components in isolation.
lines for mimicking cytosolic conditions have been Because the bottom-up approach nicely separates
formulated (van Eunen et al. 2010). the model construction data sets from the model vali-
The contents of databases are dependent on the way dation data sets, I will expand a little further on how
the data are represented in the scientific literature. For such data sets can be linked to model storage in data-
the representation of enzyme kinetic data, standards bases. The first question to ask is whether experimental
have been formulated by the ▶ STRENDA commis- data should be stored in a model database. On the one
sion (Standards for Reporting Enzymology Data, hand, one can argue that the data are not part of the
http://www.beilstein-institut.de/en/projects/strenda/). model description, and are not necessary for model
A number of databases exist where enzyme kinetic simulation or analysis and could therefore be kept
data are collected. This might seem in conflict with the separate from the model. On the other hand, the data
above advice to use published kinetic parameters with for model construction are closely linked to the model
caution, but these database initiatives give information and can enhance the model functionality significantly.
on the following: (1) the experimental conditions Firstly, if all data for model construction are made
under which the parameters were determined, (2) the available, the model construction becomes
associated rate equation, and (3) the experimental data a completely transparent process and could be
used for the parameter determination (e.g., SABIO- reproduced by other scientists, which is an important
RK). Some kinetic parameters such as Keq and kcat aspect of scientific research. Secondly, it enables
are relatively constant (e.g., not sensitive to a particular researchers to make changes to the model, for instance
kinetic mechanism assumed in its estimation), when using the same experimental data sets but applying
they are measured under the correct physiological con- different rate equations. A last advantage to explicit
ditions. Other kinetic parameters, such as Vmax values linking of the data sets to the model is that it makes it
can vary largely as they are not only dependent on the possible to completely separate the construction and
assay conditions but also on expression level of the validation data sets.
enzyme, i.e., on the growth conditions of the organism. The above given advantages should be sufficient
The relevance of this long section on enzyme motivation to link experimental data sets to the exper-
kinetic parameters for model databases is that data- imental data that was used for the model construction,
bases that store the kinetic parameters will facilitate but it does not provide with a strong argument to store
the reuse of the parameter values. There is a significant the data in the same database as where the models are
risk involved in reusing parameter values, if the user stored. For instance it would be possible to store all the
does not check carefully how the parameter values experimental data in a specialized database for enzyme
were determined. A close link of the parameter values kinetics, such as SABIO-RK (sabio.villa-bosch.de/) or
and rate equations to experimental data seems Brenda (http://www.brenda-enzymes.info/), and then
necessary. link the model to the external database. Whereas such
Databases for Kinetic Models 543 D
a link to external databases is easily made, it is some- (www.unicellsys.eu/), EviMalaR (www.evimalar.org/),
times more convenient to also store the experimental the Virtual Liver (www.virtual-liver.de/), and many
data in the model database. For instance, several of the more research projects.
model databases (e.g., JWS Online and Biomodels) are
part of data management systems of large research Web-Services and Workflows
projects, and then it can be advantageous to keep the One of the biggest advantages of using a database for
data in a secure database and have direct access to the storing data is the possibility to structure the data
data without external links. For instance, JWS Online according to its expected use. One can store the data
is used in active model development where for each such that expected queries run most efficiently. It is D
reaction there is a data set available that is directly a relatively simple step to not only structure the data
available via the network schema. These data sets are storage but also the way in which queries are made and
sometimes updated and linked to versioned models. output is given. Most databases allow access to their data
Only the final model is released to the public and via so-called web-services (see ▶ Web Service), which
then the data is made available. are structured queries. There are different types of web-
Clearly, different solutions can be used to store services, which are treated in more detail in another
models, kinetic parameters, rate equations, and the section. For the model databases, such web-services can
associated experimental data. It is important that be a simple search query to select all models for a certain
a link between the model and the experimental data is organism, or pathway, but it can also involve running
made explicit and that it can be followed, how the link simulations of models that have been selected before.
is made is not so important. In some of the above- Such workflows that run a set of web-services in succes-
mentioned systems biology projects, in addition to sion can be defined in software tools specifically
experimental data there is a lot of additional informa- designed for this task, such as Taverna, or in more generic
tion stored and often a much greater functionality is programs such as Mathematica. Workflows are very
provided in a complete data management structure. powerful tools that automate tedious and repetitious jobs.
An important aspect of web-service construction is
Data and Model Management Structures the formulation of the specific format in which the
The multidisciplinary character of systems biology query must be made and the answer will be given.
projects and the scale of many of these projects neces- Recent initiatives have focused on the formulation of
sitate the collaboration between sometimes-large num- standards for model simulation descriptions. MIASE
bers of research groups. Although each research (Minimal Information About a Simulation Experi-
project could in principle come up with its own unique ment) proposes a minimal set of information needed
solution for data and model management, it would to reproduce simulation experiments, and SED-ML
make more sense to develop more generic tools that (Simulation Experiment Description Markup Lan-
can be used by many research groups. An example of guage) encodes this information.
such a data management system that is being devel-
oped centrally is the data management group for the The Silicon Cell Initiative
▶ SysMO projects (www.sysmo.net/). The SEEK Model databases greatly enhance the accessibility of
(▶ Data and Model Management Platform, SEEK) published kinetic models. The storage of curated
(Wolstencroft et al. 2011) is the centrally developed models is important to prevent losing the models,
software package that gives a wide functionality to which are often custom coded, and to make the models
each of the individual research groups, one of which publicly available in standard formats. The reproduc-
is the storage and curation of mathematical models, but ibility of (model simulation) results is an important
it also makes it possible to make explicit links between aspect of the scientific process and this is ensured in
models and experimental data. the curation process of the model databases.
The SEEK has been a large success and is now also Model accessibility is also important for future
implemented in several other SB projects. SEEK can model reuse. Whether a specific model will be reused
be downloaded, installed, and adapted to anyone’s is largely dependent on how it was constructed. If
needs, and this has helped strongly in the package a model was made to address a specific research
being incorporated in, e.g., the UniCellSys project question, the model structure and kinetic parameters
D 544 Data-Driven Science
Cross-References
necessarily starting from a specific hypothesis to be also have biological significance, and what they
tested and without necessarily possessing the same teach us about the biology of organisms needs to
expertise as the original data producers (Kell and Oli- be investigated through further experimentation.
ver 2004). At the same time, it has become clear that Yet, computational analysis here offers a shortcut
the sheer scale and diversity of data to be analyzed toward discovery by pointing to patterns that have
requires the creation of sophisticated tools for data at least the potential to enhance existing under-
mining, which in turn needs to be informed by relevant standings of biological phenomena.
expertise in the theory and practice of all the relevant • Discovery through spotting gaps
domains within biology (Blake and Bult 2005, Buetow Data mining can lead to discovery by pointing
2005). researchers toward areas of investigation that were
not previously charted. A good example is the dis-
Three Data-Intensive Research Methods covery of “ultra-conserved regions” in vertebrate
The following three cases provide good examples of DNA and their regulatory role in development
the advantages and limitations of data-intensive (Blake and Bult 2005); or, more generally, the dis-
research methods: covery of correlations between the presence of spe-
• Discovery through triangulation of existing cific alleles in an individual’s genotype and the
evidence occurrence of a phenotypic trait, which might
The opportunity to access data through a web of open the way to the investigation and discovery of
interlinked repositories and databases is enabling mechanisms responsible for the development of
scientists to retrieve more and more data of possible that trait (this is the approach underlying genome-
relevance to their research interests. It is now pos- wide association studies). In these ways, data-
sible to gather and integrate data obtained on a wide intensive research helps to identify gaps in the
variety of organisms by laboratories across the existing knowledge about a given entity, thus open-
globe, no matter the specific expertises and interests ing up new areas for investigation.
guiding the production of data at each location. This
is particularly true in the case of research on Beyond Induction
▶ model organisms, where researchers can retrieve As exemplified by the above cases, at the core of data-
large portions of the data available on the same set intensive research is an emphasis on the epistemic
of phenomena through community databases. This value of data beyond the experimental context in
unrivaled level of data sharing is fueling the discov- which they are originally produced. Supporters of
ery of new regulatory roles of specific genes or data-intensive research stress that the way in which
pathways. The triangulation of existing evidence data can be used as evidence varies depending on the
thus furthers and transforms existing understand- scientific context in which they are considered. This
ings of phenomena. This research strategy is hardly flexibility as “raw materials” of science is what makes
new, yet the use of digital technologies makes it the dissemination and integration of data across bio-
tremendously more efficient. It is crucial that the logical domains into an effective route toward dis-
data in question are in a digital format, which makes covery (Leonelli 2009). This does not necessarily
it possible to disseminate them widely and retrieve mean that the accumulation and sharing of data con-
them instantly. stitutes the best starting point for inquiry, yet critics of
• Discovery through data mining data-intensive approaches have stressed the risk
Online databases can also be searched for emerging of “induction beckoning again” (Allen 2001) and
patterns or correlations that could not have been emphasized that proponents of data-driven science
predicted otherwise. A striking instance of this tend to portray data as the primary source of scientific
method is the idea of “random walks” through knowledge and to stress the value of inductive
data, where software is used to mine datasets to procedures over and above other research methods.
spot statistically significant patterns (e.g., gene This impression is heightened by the frequent
expression). It is not obvious that these patterns juxtaposition of data-intensive methods to
Data-Intensive Research 547 D
“hypothesis-driven,” deductive research, which sug- ▶ ontologies (Renear and Palmer 2009). Indeed, it
gests that data mining can lead to the formulation of has been claimed that data-intensive science cannot
testable claims without recourse to preconceived advance without substantial investment in a reliable
hypotheses (Evans and Rzhesky 2010). These inter- cyberinfrastructure which can be easily updated by
pretations of data-intensive research as purely induc- users, particularly when this approach is put to the
tive and independent from existing theoretical service of systems biology (Philippi 2006).
expectations are very problematic and untenable in 2. Successful examples of data-intensive science illus-
the face of actual scientific practice. Reusing data for trate the need for these methods to be embedded in
the purposes of discovery involves a complex ensem- a wider spectrum of scientific practices, ranging D
ble of skills and methodologies, which go well from theoretical analysis to ▶ experiments and
beyond an inductive approach. Extracting biologi- field observations. Data-intensive research thrives
cally meaningful inferences from high-throughput when exposed to iterative feedback with in vivo
genomic data involves, for instance, reliance on the- research (O’Malley et al. 2009).
ories about gene expression and regulation, specific
models of the biological processes being regulated, The Heuristic Value of Data-Intensive Science
common standards for the formatting and visualiza- The relation between data-intensive science and other
tion of genomic data, and familiarity with the instru- forms of research is difficult to describe in simple and
ments and organisms from which data were originally general terms, yet this complexity is precisely what
obtained. This methodological complexity is what makes data-intensive methods interesting as a new
makes it difficult to pinpoint the epistemic character- approach to scientific inquiry. Recent advances in
istics of data-driven research as a unique, emerging bioinformatics and successful applications of data-
mode of inquiry. At the same time, methodological driven methods have shown that the analysis of data
complexity aligns this form of research with the goals in silico cannot be fully automated, nor should it be
and methods favored in system biology: the pursuit of disjoint from experimental research in vivo. How-
complex, interdisciplinary interactions; and the use of ever, computational tools have the power to substan-
a variety of methods to achieve an integrated under- tially transform how research is performed and the
standing of living organisms (Philippi 2006). ways in which ▶ experiments are set up, carried out,
and verified. Data-intensive science has great heuris-
The Limits of Automation tic value, since findings emerging from the analysis of
Another characteristic common to the three examples large datasets can be used to challenge and re-direct
of data-driven methods given above is reliance on the means and targets of experimental research. This
automated data analysis: “smart” software is assigned does not mean that data-driven methods should
a prominent role in facilitating the extraction of pat- always be conceived as the starting point for experi-
terns from data, either through statistical analysis or mental inquiry. They constitute resources that can
through search mechanisms in databases. While auto- potentially complement all stages of research and
mated techniques for data analysis and hypothesis gen- can thus be fruitfully applied to any project, even if
eration are proliferating and becoming increasingly in each case their function is likely to be different
sophisticated, there are two good reasons to believe depending on the specific goals, context, and
that biological research cannot and should not be fully resources available.
automated:
1. Research within the field of bioinformatics, and
particularly database curation, has shown that data Cross-References
mining processes are only reliable when resources
and expertise are invested in selecting high-quality ▶ Automated Reasoning
data for insertion in databases; and in building data- ▶ Bio-Ontologies
bases that make use of efficient search engines, ▶ Community Database
visualization systems for data, and interoperable ▶ Data Mining
D 548 Data-Intensive Science
▶ DNA Sequencing
▶ Experiment dbSNP
▶ Ontology Lookup Service for Controlled
Vocabularies and Data Annotation Jingky Lozano-K€uhne
Department of Public Health, University of Oxford,
Oxford, UK
References
Data-Intensive Science
DDBJ Genome Resources
▶ Data-Intensive Research
Akos Dobay1 and Maria Pamela Dobay2
1
Institute of Evolutionary Biology and Environmental
Studies (IEU), University of Zurich, Zurich,
Date Hub Switzerland
2
Department of Physics, Ludwig-Maximilians
▶ Hub University, Munich, Germany
Definition
DBID
The DNA Data Bank of Japan (DDBJ; http://www.
▶ Database Identifier ddbj.nig.ac.jp) is a collection of nucleotide
DDBJ Genome Resources 549 D
sequence data. Although DDBJ collects data DDBJ Genome Resources, Table 1 List of the databases
worldwide, most of the direct submissions are available at DDBJ
from Japanese researchers. DDBJ continuously Service
exchanges the collected data with the European name Description
Bioinformatics Institute (▶ EBI Genome Resources) INSD- The INSD-core data contain all the traditional
core sequences of complete genomes, but exclude
and the National Center for Biotechnology Informa- whole-genome shotgun (WGA) sequences, mass
tion Database (NCBI; http://www.ncbi.nlm.nih.gov). sequence for genome annotation (MGA), and third
The three databases are part of the International party annotation (TPA) (Kaminuma et al. 2010;
Nucleotide Sequence Database (INSD; http://www. Kaminuma et al. 2011) D
insdc.org) consortium, whose function is to WGS The whole-genome shotgun contains large sets of
overlapping and finished sequences without
ensure the integrity of the shared information. annotation from ongoing genome projects
Apart from the genome resources, DDBJ also has (Kaminuma et al. 2010; Kaminuma et al. 2011)
resources for protein sequences and protein MGA The mass sequence for genome annotation is
structures. comprised of sequences that are produced in a large
quantity for the purpose of genome annotation
(Kaminuma et al. 2010)
TPA Third party annotation data are assembled using
Characteristics primary entries of publicized nucleotide sequence
data collections, with additional features
DDBJ was an initiative of Japanese molecular biolo- determined by experimental or inferential methods
from a pool of experts (Cochrane et al. 2010)
gists and biophysicists (Tateno and Gojobori 1997)
DTA DDBJ trace archive is a permanent repository of
that began its activities in 1986, in collaboration DNA sequence chromatograms
with the European Molecular Biology Laboratory DRA DDBJ sequence read archive is a repository for
(EMBL; http://www.embl.de) and the genetic sequencing data from next-generation sequencing
sequence database (Genbank; http://www.ncbi.nlm. technologies. A web-based metadata creation tool
nih.gov/genbank) of the National Institutes of called MetaDefine has been released in March 2010
to facilitate the submission process
Health (NIH; http://www.nih.gov). The maintenance
DAD The DDBJ amino acid database contains amino acid
and the development of DDBJ are organized by sequences extracted from the nucleotide flat files
the Center for Information Biology and DNA present in the DDBJ periodical release and TPA dataset
Data Bank of Japan (CIB-DDBJ; http://www.cib. GTPS The Gene trek in prokaryotic space is a re-annotated
nig.ac.jp/) of the National Institute of Genetics database that uses motif scans
(NIG; http://www.nig.ac.jp/english/index.html). GIB The genome information broker is a comprehensive
data repository of complete microbial genomes
Ninety-nine percent of the data coming from Japanese
GIB-V The genome information broker for viruses is a
researchers and available from INSD are submitted repository for complete virus genomes. For other
through DDBJ (Kaminuma et al. 2010, 2011). virus genome resources, refer to the ▶ NCBI viral
Researchers from China, Korea, and Taiwan genomes resources
are also mostly submitting their data through CIBEX The Center for Information Biology Gene
Expression database (http://cibex.nig.ac.jp) is a
DDBJ. When the submission does not include
public database for mircoarray data (▶ Microarray
a large number of sequences, as the case is from data, parallel and distributed preprocessing)
whole-genome shotgun (WGS) data or mass sequence DOR The DDBJ omics archive stores quantitative data
data for genome annotation (MGA) (Sugawara from both microarrays (▶ Microarray data, parallel
et al. 2009), DDBJ offers a user interface and distributed preprocessing) and new high-
throughput sequencing platforms. DOR also
called SAKURA. Up to now, DDBJ has released integrates CIBEX and exports the data to
several databases. As the case is in the ▶ NCBI ArrayExpress database (▶ EBI genome resources;
BioProject genome resource, researchers have the Kodama et al. 2010)
option to submit their projects prior to full comple- GTOP The genomes TO protein structures and function
tion. DDBJ also provides genomic information database consists of data analyses of proteins by
application of various computational tools to the
as well as resources for protein sequences and protein amino acid sequences of genome projects
structures. Table 1 summarizes the available data- sequenced to its entirety (Kawabata et al. 2002;
bases in DDBJ. Fukuchi et al. 2009)
D 550 De Novo Computational Discovery of Motifs
Other Resources
The DDBJ also includes patent data transferred from De Novo Computational Discovery of
the Japan Patent Office (JPO; http://www.jpo.go.jp), Motifs
the Korean Intellectual Property Office (KIPO; http://
www.kipo.go.kr), as well as from the United States Jianhua Ruan
Patent and Trademark Office (USPTO; http://www. Department of Computer Science, University of Texas
uspto.gov) and the European Patent Office (EPO; at San Antonio, San Antonio, TX, USA
http://www.epo.org).
Definition
Cross-References
A class of computational methods for finding tran-
▶ EBI Genome Resources scriptional binding motifs within a set of promoter
▶ Genome Annotation sequences. This is different from the scenario where
▶ Microarray Data, Parallel and Distributed one needs to search promoter sequences for the binding
Preprocessing sites of a known transcription factor binding motif
▶ NCBI Bioproject Genome Resources (e.g., a consensus or a PSWM).
▶ NCBI Viral Genomes Resources
References
Death Rate
Cochrane G, Bates K, Apweiler R, Tateno Y, Mashima J,
Kosuge T, Mizrachi IK, Schafer S, Fetchko M (2010)
▶ Life Span, Turnover, Residence Time
Evidence standards in experimental and inferential
INSDC third party annotation data. OMICS J Int Biol 10 ▶ Lymphocyte Population Kinetics
(2):105–113
Fukuchi S, Homma K, Sakamoto S, Sugawara H, Tateno Y,
Gojobori T, Nishikawa K (2009) The GTOP database in
2009: updated content and novel features to expand and
deepen insights into protein structures and functions. Nucleic
Acids Res 37(suppl 1):D333–D337 Decentralized Version Control
Kaminuma E, Mashima J, Kodama Y, Gojobori T, Ogasawara O,
Okubo K, Takagi T, Nakamura Y (2010) DDBJ launches a
new archive database with analytical tools for next- ▶ Distributed Version Control System (DVCS)
generation sequence data. Nucleic Acids Res 38:D33–D38
Kaminuma E, Kosuge T, Kodama Y, Aono H, Mashima J,
Gojobori T, Sugawara H, Ogasawara O, Takagi T, Okubo K,
Nakamura Y (2011) DDBJ progress report. Nucleic Acids Res
39:D22–D27
Kawabata T, Fukuchi S, Homma K, Ota M, Araki J, Ito T,
Ichiyoshi N, Nishikawa K (2002) GTOP: a database of
Decision Rule
protein structures predicted from genome sequences. Nucleic
Acids Res 30:294–298 ▶ Prediction Rule
Kodama Y, Kaminuma E, Saruhashi S, Ikeo K, SugawaraH TY,
Nakamura Y (2010) Biological databases at DNA data bank
of Japan in the era of next-generation sequencing technolo-
gies. Adv Exp Med Biol 680:125–135
Sugawara H, Ikeo K, Fukuchi S, Gojobori T, Tateno Y (2009)
DDBJ dealing with mass data produced by the second
generation sequencer. Nucleic Acids Res 37:D16–D18
Decision Theory
Tateno Y, Gojobori T (1997) DNA data bank of Japan in the age
of information biology. Nucleic Acids Res 25:14–17 ▶ Bayesian Decision Analysis
Decision Tree 551 D
Figure 1 illustrates decision tree structures based on
Decision Tree two simple examples from everyday life. To empha-
size the tree analogy, the decision trees depicted in
Daniel Berrar1 and Werner Dubitzky2 Fig. 1 have their root node at the bottom of the diagram
1
Interdisciplinary Graduate School of Science and and the leaf nodes at the top. In practice, decision trees
Engineering, Tokyo Institute of Technology, are usually visualized with the root node at the top and
Midori-ku, Yokohama, Japan the leaf nodes at the bottom.
2
Biomedical Sciences Research Institute, University of Consider Fig. 1. In the Name Title example, there
Ulster, Coleraine, UK are three leaf nodes labeled Mr, Mrs, and Miss, and two D
non-leaf nodes labeled with the attributes Gender (root
node) and Marital Status. The branches are labeled
Synonyms with the corresponding attribute values: male and
female (for attribute Gender) and married and not
Classification tree; Regression tree married (for attribute Marital Status). In the Jogging
example, there are seven leaf nodes (labeled Yes and
No, respectively), three non-leaf nodes (labeled with
Definition the attributes Outlook, Humidity, and Temperature),
and ten branches labeled with the corresponding attri-
A decision tree refers to both a concrete decision bute values. Figure 1 also illustrates the corresponding
model used to support decision making and decision rules (▶ Prediction Rule).
a method to construct such models automatically A decision tree whose leaf nodes are labeled with
from data. As a model, a decision tree refers to discrete class labels is referred to as classification tree
a concrete information or knowledge structure to sup- (▶ Model Testing, Machine Learning). A decision tree
port decision making, such as classification (▶ Model that uses continuous values or value ranges is referred
Testing, Machine Learning) and regression to as regression tree (▶ Regression Analysis) (Breiman
(▶ Regression Analysis) tasks, processes or analyses. et al. 1984). A decision-tree structure represents
As a method, a decision tree comprises various tech- a ▶ directed acyclic graph which satisfies the follow-
niques to construct decision tree models in a highly ing properties:
automated fashion using specific algorithms and mea- 1. There is exactly one node, called the root, into
sures from the field of statistics, machine learning which no edges (branches) enter.
(▶ Model Validation, Machine Learning) and artifi- 2. Each node other than the root has exactly one enter-
cial intelligence. ing edge (branch).
3. There is a unique path from the root to each non-
root node.
Characteristics Each path from a decision tree’s root node to a leaf
node can be interpreted as a decision rule (▶ Decision
Decision-Tree Structure Rule, ▶ Machine Learning) which has a condition and
A decision tree is composed of nodes and branches that conclusion part. This may be expressed using the
connect the nodes (Quinlan 1993; Hastie et al. 2001; IF-THEN notation or the symbol for logical implica-
Duda et al. 2001). Two basic node types are distin- tion as follows:
guished: leaf nodes and non-leaf nodes (a special non-
leaf node is the root node). Each non-leaf node is IF condition THEN conclusion
labeled with an attribute or a question. The branches
condition ) conclusion
emanating from a non-leaf node correspond to the
possible values of the attribute or the answers to the
question. The leaf nodes of a decision tree are labeled If the input information meets all the conditions
with a class or category. described in the condition part, then the conclusion
D 552 Decision Tree
Name Title: Decision tree model used to Yes Jogging: Decision tree model
No
determine name title used to decide whether
or not to go jogging
< 20 C ≥ 20 C
Mr Mrs Miss
Temperature No
No
m
married not married Yes Yes
mediu
low < 25 C
male high
Marital Status ≥ 25 C
Decision Tree, Fig. 1 Simple decision trees facilitating classi- jogging (right diagram). Circles depict leaf nodes, boxes depict
fication tasks in everyday life: determining the English (name) non-leaf nodes, and arrows represent branches. The rule sets
title or honorific (left diagram), and deciding whether or not to go below the decision trees describe the corresponding decision rules
stated in the conclusion part is asserted. Thus, the 2. Derive the decision tree (or equivalent set of deci-
decision structure of a decision tree can be formulated sion rules) by means of an inductive process that
as a set of decision rules (this is illustrated in Fig. 1). generalizes from the available facts (data).
The decision rules derived from a decision tree may be In not so well-understood domains (like biological
associated with quantities expressing confidence and knowledge discovery), the machine learning
support as illustrated in Fig. 2. approach has the added advantage that by automati-
cally generating a decision tree from available data it
Decision-Tree Construction may be possible to reveal relationships in the data
Once a decision tree is constructed, it can be used to aid that may not be obvious to the investigator or
decisions of a decision maker, humans or machines. decision maker. A decision-tree learning algorithm
There are two basic approaches for creating decision determines, for instance, which attribute should be
trees. Firstly, decision trees may be generated manu- placed at the root node given the input data,
ally by knowledge engineers working with human thus providing an insight as to what attribute has
experts or textbooks. This approach is effective in the strongest influence in the partitioning of the
well-understood domains but involves a considerable underlying instances. The knowledge-discovery
human effort and time. Secondly, decision trees may aspect as well as the symbolic encoding of the knowl-
be derived automatically from available examples edge they represent make decision trees a useful
(data) by suitable machine learning (▶ Induction) method for exploring biological data. Decision
algorithms. The two basic steps involved in the trees have been widely used for exploratory
machine learning approach are: data analysis, classification, and regression tasks in
1. Obtain facts (data) about the decision making prob- biology (Zhang et al. 2001; Kingsford and Salzberg
lem or studied phenomenon. 2008).
Decision Tree 553 D
Training Set Partition of Learning Set Based on Attribute X
(Learning Instances)
Decision Tree, Fig. 2 Illustration of decision-tree learning based on a training set with ten instances and two attributes. Top:
Training set and partition determined for root node. Bottom: resulting final decision tree (left) and decision rules (right)
Decision-tree learning generalizes from observa- and a column represents either an attribute and its
tions by a process of induction. This process takes as values or the class labels.
input a set of specific observations or instances and Using the training instances as input, a decision-tree
generates general rules that cover them. To facilitate learning algorithm generates a decision-tree model by
automated decision-tree learning, the learning obser- recursively partitioning the instances via the following
vations or instances (referred to as training set main steps:
[▶ Model Training, Machine Learning]) are usually 1. Initialize. Provide the full set of observations (training
expressed as a table where a row represents an instance set) as input to the decision-tree learning algorithm.
D 554 Decision Tree
2. Analyze attributes: Determine the attribute that pro- applied to the remaining attributes (in this case only
duces the most uniform grouping or partitioning of attribute Y) at the next level. This leads to the deci-
instances based on the class label of the instances. sion-tree model and decision rule (▶ Prediction Rule)
3. Create node: If the predefined uniformity threshold depicted in Fig. 2. The model consists of three non-leaf
is reached, create a leaf node and label it with the nodes (including the root node) and four leaf nodes.
corresponding class label. Otherwise, create a non- Whereas three leaf nodes are unambiguously associated
leaf node that tests the attribute, assign to it the with the class label A and B, respectively, Leaf Node #2
instances of the corresponding partition, then go to cannot be assigned to a unique class label.
Step 2. The class label frequencies represented by the leaf
The eventual result of the decision-tree learning nodes are used to express likelihood estimates for the
procedure outlined above is a decision-tree model in classification of unseen instances. Unseen instances
which all or the majority of instances at a leaf node are those that comply with the structure of the learning
show the same class label. instances but have not been used in the decision learn-
A challenge in decision-tree learning is to maximize ing process. To illustrate this idea, consider the deci-
uniformity or purity of the instances assigned to the sion tree model in Fig. 2. An unseen instance that is
leaf nodes of a decision-tree model while ensuring that characterized by the attribute values X ¼ 1 and Y ¼ 0
the model generalizes well to unseen instances, i.e., would be labeled (classified) with the class label A.
instances that do not feature in the training set. The Because three out of five instances in the corresponding
latter is also referred to as generalization ability, or Leaf Node #4 carry the class label A, the conditional
bias-variance trade-off. probability for this classification would be given as 3/5.
Measures of uniformity, purity, or homogeneity The conditional probability associated with the predicted
commonly employed by classification tree learning outcome of decision tree and similar models is also
algorithms include the information theory measures referred to as confidence. Another measure that is fre-
of entropy (▶ Entropy), the Gini index, and the quently used in the context of decision trees is called
Kolmogorov–Smirnov distance. Regression-tree support. The support of a decision rule (corresponding to
learning algorithms make use of variance minimiza- a path from root to a leaf node in a decision tree model) is
tion techniques for the same purpose. determined as the proportion of instances in the learning
The following example illustrates the concept of set that satisfy the rule. For example, the decision rule R3
decision tree learning for a binary classification task corresponding to Leaf Node #4 in the decision tree
(▶ Model Testing, Machine Learning), i.e., a task model in Fig. 2 has a support of 3/10 because 3 of 10
involving exactly two class labels. The training set items (#1, #2, and #5) satisfy the rule. The aim in
(▶ Model Training, Machine Learning) in this example decision tree learning is to construct a decision tree
comprises ten instances, each described by two numeric model with a high confidence and support.
attribute values and one class label. The attributes are
called X and Y and assume values from the set {0,1}, Strengths and Weaknesses of Decision Trees
and the class labels are drawn from the set {A,B}. Six Strengths
instances in the training set carry the class label A, and • Decision-tree models capture knowledge in an
four the label B. Figure 2 depicts the training set as well easy-to-interpret knowledge structure, either as
as a partition (composed of two groups of instances) hierarchical trees or sets of decision rules.
obtained from applying the information-theoretic • Decision-tree learning offers a powerful approach
entropy uniformity measure. According to this measure, to automatically discover decision-tree models
higher uniformity corresponds to more information, from large data sets. Decision-tree learning can be
which is corresponds to lower entropy. used to reveal important relationships in data.
Because the ▶ information gain for the partition • The computational costs associated with decision-
derived from the values of attribute X is higher than tree learning are low to moderate.
that for attribute Y, attribute X is chosen to split the data • Decision trees can process both discrete and con-
at the root node. This is illustrated by the decision tree tinuous variables and have an intrinsic ability to
depicted in Fig. 2. The same learning process is handle missing values.
Declarative Language 555 D
Weaknesses
• Decision-tree learning is highly sensitive to Declarative Language
changes in the training set. Small changes in the
learning set may lead to considerably different Amanda Clare1 and Martin Swain2
1
decision-tree models. This is also referred to as the Department of Computer Science, Aberystwyth
stability problem in machine learning. University, Aberystwyth, UK
2
• For a given data set or decision problem, the Institute of Biological, Environmental, and Rural
orthogonal decision regions generated by axis- Sciences, Aberystwyth University, Aberystwyth,
parallel decision-tree learning methods may not Ceredigion, UK D
sufficiently approximate the true underlying deci-
sion regions.
Synonyms
Decision-Tree Algorithms and Tools
There is a large variety of decision-tree algorithms and Declarative programming
tools; open source tools widely used in computational
biology include Weka (Weka, Machine Learning
Tool) and R (▶ R, Programming Language).
Definition
Deduction f _ j
f j j
C. Maria Keet
KRDB Research Centre, Free University of In addition, there are two rules for the quantified for-
Bozen-Bolzano, Bolzano, Italy mulas. First, if a model satisfies a universally quanti-
fied formula (8), then it also satisfies the formula where
the quantified variable has been substituted with some
Definition term (and the prescription is to use all the terms which
appear in the tableaux),
Deduction is a way to ascertain if some theory T, which
consists of one or more axioms expressed in a suitable 8x:f
logic language, entails a conclusion, which is an axiom ffX=tg
a that is not explicitly asserted in T; this is written as 8x:f
T j¼ a. That is, a can be derived from the premises
using a set of deduction rules. and, second, for an existentially quantified formula, if
a model satisfies it, then it also satisfies the formula
Usage where the quantified variable has been substituted with
Deduction is used widely in ▶ knowledge representa- a new Skolem constant,
tion and the semantic web, including checking consis-
tency of ▶ bio-Ontologies, using automated reasoners 9x:f
(▶ Automated Reasoning). ffX=ag
Characteristics Example
Let us take some arbitrary theory T that contains two
There are various ways how to ascertain T j¼ a, be it axioms stating that relation R is reflexive (8x.R(x,x),
manually or automatically. One can construct a step- a thing relates to itself) and asymmetric (8x,y. R(x,y)
by-step proof (▶ Proof, Logic) “forward” from the ! :R(y,x); if a thing a relates to b by relation R, then b
premises by applying the deduction rules or prove it does not relate back to a). We then can deduce, among
indirectly such that T [ {:a} must lead to others, that T [ {:8x,y.R(x,y)} is satisfiable. We do
a contradiction. The former approach is called natural this by demonstrating that the negation of the axiom is
deduction, whereas the latter is based on techniques unsatisfiable.
such as resolution, matrix connection methods, and To enter the tableau, we first rewrite the asymmetry
sequent deduction (which includes tableaux). into a disjunction using equivalences, that is, 8x,y.
Concerning deduction rules for tableaux and first R(x,y) ! :R(y,x) is equivalent to 8x,y. :R(x,y) ∨ :R
order predicate logic formulae, we have, as with the (y,x), and add a negation to {:8x,y.R(x,y)}, which
Deductive Reasoning 557 D
Deduction, Table 1 Tableau example
Number Tableau Explanation
1 "x.R(x,x) Reflexivity axiom in the original theory T
2 "x,y. ¬R(x,y) Ú ¬R(y,x) Asymmetry axiom in the original theory T
3 "x,y.R(x,y) The negated axiom added to theory T
4 Substitute x for term a in 1,2,3
5 R(a,a)
6 "y. ¬R(a,y) Ú ¬R(y,a)
7 "y.R(a,y) D
8 Substitute y for term a in 2 and 3
9 R(a,a)
10 ¬R(a,a) Ú ¬R(a,a)
11 R(a,a)
12 Split the disjunction of 10
13 ¬R(a,a) ¬R(a,a) Which each generate a clash with 9 and 11, hence, :8x,y.R(x,y) is entailed by T
Cross-References References
References
Synonyms
Cross-References
Deltaretrovirus
▶ HTLV, Cellular Transcription
Definition
References
Retroviridae is a family of retroviruses whose replica-
tion involves one reverse transcription step. Older pub- Index of Viruses – Retroviridae (2006) In: B€uchen-Osmond
C (ed) ICTVdB – The Universal Virus Database. Columbia
lications often refer to three retrovirus subfamilies
University, New York (Version 4)
Oncovirinae, Lentivirinae, and Spumavirina. Nowa- Van Dooren S, Salemi M, Vandamme AM (2001) Dating
days, the classification is obsolete and retrovirus is the origin of the African human T-cell lymphotropic
grouped into two subfamilies Orthoretrovirinae and virus type-i (HTLV-I) subtypes. Mol Biol Evol 18(4):661–
671
Spumaretrovirinae. The genera Betaretrovirus, Wolfe ND, Heneine W, Carr JK, Garcia AD, Shanmugam V,
Gammaretrovirus, Alpharetrovirus, Deltaretrovirus, Tamoufe U, Torimiro JN, Prosser AT, Lebreton M, Mpoudi-
and Lentivirus belong to the subfamily of Ngole E, McCutchan FE, Birx DL, Folks TM, Burke DS,
Orthoretrovirinae (Index of Viruses – Retroviridae Switzer WM (2005) Emergence of unique primate
T-lymphotropic viruses among central African bushmeat
2006). Deltaretroviridae are complex retroviruses,
hunters. Proc Natl Acad Sci USA 102(22):7994–7999
whose genomes contains besides LTR-gag-pol-env-
LTR a number of accessory genes. Deltaretrovirus
infects vertebrates. Bovine leukemia virus (BLV),
Human T-lymphotropic virus 1, 2, 3, and 4 (HTLV-1,
-2, -3, and 4) and Simian T-lymphotropic virus 1, 2, Deltaretrovirus
and 3 (STLV-1, -3, -4) were isolated from cow, human,
and various nonhuman primates. HTLV-1 and its sub- ▶ Deltaretroviridae
types are thought to originate from various indepen-
dent interspecies transmissions from simians to
humans starting around 50,000 years ago. The most
recent HTLV-1 subtype f probably emerged some
3,000 years ago (Van Dooren et al. 2001). Dense Overlapping Regulon Motif
HTLV-1 and HTLV-2 are human-pathogenic
viruses that are transmitted by sexual contact, ▶ Dense Overlapping Regulons
D 560 Dense Overlapping Regulons
Characteristics
combination of the biological and technical variation, to a statistician once the data are collected. In practice,
and it may be difficult to detect the systematic differ- experimental design and statistical modeling are
ences between conditions. tightly interconnected. The understanding of the down-
Blocking improves upon these two aspects when stream statistical analysis helps us determine the opti-
confounding factors are known in advance, and is mal design and maximize our ability to detect
performed by imposing restrictions on randomization. quantitative changes in the response for a given
The two sources of variation (biological and technical) amount of replication and cost.
lead to two types of blocking. In the context of sample Statistical inference requires a probability model
selection, blocking is sometimes referred to as that describes three aspects of the experiment. First,
matching. In the diabetes example, a block is the model describes the sources of systematic varia-
a combination of known confounding factors tion in the response (such as conditions, subjects, and
(e.g., age, gender, ethnicity). The design enforces an blocks) in the experimental design. Second, the
equal number of subjects with diabetes and controls in model describes the scope of interpretation that we
a block, and randomly selects subjects with these char- associate with the biological replicates, i.e., whether
acteristics from the population. When multiple mea- we restrict our conclusions to the subjects in the
surements are possible on a same subject, e.g., two study or generalize the conclusions to the entire
treatments can be applied to two skin patches, the underlying populations. In systems biology, many
subject forms a block and receives both treatments, experiments aim at generating initial hypotheses for
and the pairs of the patches are matched. a subsequent follow-up, and the number of biological
For data acquisition, blocking is sometimes referred replicates is too small to adequately represent the
to as multiplexing. If a particular experimental step is underlying populations. In these cases, it is useful to
noisy but can accommodate multiple samples, it can be restrict the scope of the conclusions to the selected
viewed as a block. Blocking enforces the constraint that samples only and treat them as fixed units. On the
the step processed an equal number of samples from other hand, experiments at the validation stage aim at
each condition. An example of block is a two-color generalizing the conclusions to the underlying
cDNA mocroarray, and a variety of strategies for allo- populations, and in this case, the biological replicates
cating samples to arrays have been proposed (Dobbin are best viewed as random selections from the
and Simon 2002). In an RNA-seq experiment, blocking populations and are represented in a mixed or
can utilize the capacity to label the samples with sam- multilevel model.
ple-specific sequences (“bar codes”) that allow multiple Finally, the probability model describes the nature
samples to be sequenced in a same lane of a flow-cell, of the nonsystematic variation in the response, based
making within-lane differences more consistent (Auer on the information from a pilot study or from the
and Doerge 2010). Blocking strategies for mass spec- literature. For example, in gene expression
trometry-based proteomic experiments have also been microarrays, the variation of log-response can be
discussed (Oberg and Vitek 2009). A randomization of assumed normally distributed and leads to models
samples within blocks is always required to account for such as analysis of variance (ANOVA) (▶ Analysis
the unknown confounding effects. of Variance (ANOVA) Tables). In the RNA-seq exam-
Blocking enforces a strict balance of the ple, the response is the count of reads and can be
confounding factors between conditions and prevents described using a Poisson or a negative binomial
bias. In addition, if the downstream statistical analysis distribution. Frequentist modeling treats the unknown
distinguishes the variation of the response between the parameters of these distributions as fixed, while
blocks from the within-block variation between the Bayesian models specify the prior distributions of
conditions, it increases the sensitivity of tests. all the parameters, and Empirical Bayes models
estimate the parameters of the prior distributions
Step III: Statistical Modeling from the data.
Describe the Anticipated Properties of Data in
a Statistical Model Describe the Model-Based Testing
Unfortunately experimentalists often consider statisti- A consequence of each probability model is the proce-
cal modeling as a separate task, which can be deferred dure for testing the scientific hypothesis of interest.
Designing Experiments for Sound Statistical Inference 565 D
For example, in analysis of variance, testing the null changes in the response between conditions requires
hypothesis of the same expected gene expression in more replication. Experiments with larger variation
two conditions leads to the student’s t-test. Count also require more replication. An increase in the num-
response leads to the Fisher’s exact test or tests based ber of biological replicates is always more efficient
on Poisson regression. In experiments with multiple than an increase in the number of technical replicates,
responses, such as the expression of genes, a separate and experiments without biological replication cannot
hypothesis is tested for each gene and the testing generalize their conclusions to the underlying
requires controlling a multivariate error rate, such as populations. Experiments with more responses, and
the false discovery rate. also experiments with a smaller proportion of expected D
changes in multiple responses, have more opportunity
Derive Model-Based Requirements of Sample Size for false discoveries and therefore require more
The combination of the protocol of replication, ran- replication.
domization, and blocking and of the probability model The implementation of the steps of the experimental
allows us to calculate the sample size (▶ Power and designs described in this entry depends substantially
Sample Size) (Lenth 2001; Wittes 2002), i.e., the desir- on the characteristics of the biological system and on
able number of biological and technical replicates, the technology at hand. The rapid technological
with two goals. First, sample size calculations are advances introduce constant changes into how
used to evaluate the expected operational characteris- the experiments are performed and into the properties
tics of the experiment. The number of replicates should of the resulting datasets. At the same time, they moti-
be large enough to enable the detection of biologically vate new developments in statistical methodology.
significant changes in the response, but not too large in As the result, experimental planning becomes increas-
order to optimize the cost. Second, sample size calcu- ingly complex and cannot always be achieved in a fully
lations allow us to compare various strategies of sam- automated fashion. A close collaboration between
ple selection and resource allocation, e.g., various experimentalists and statisticians at all steps of the
strategies of allocation samples to blocks. The optimal experiment, starting from the earliest stages of exper-
strategy will maximize our ability to detect a change in iment planning, is highly recommended.
the response for a fixed number of biological and
technical replicates.
In frequentist modeling, each test is characterized Cross-References
by five properties: (1) the probability of type I error
(i.e., the probability of rejecting the null hypothesis ▶ Analysis of Variance
when it is true), (2) the probability of type II error ▶ Design of Experiments
(i.e., the probability of not rejecting the null hypothesis ▶ Experimental Design, Variability
when the alternative is true), (3) the extent of known ▶ False Discovery Rate (FDR)
sources of variation, (4) the biologically significant ▶ Fisher’s Test
change in the response, and (5) the number of biolog- ▶ Frequentist Inference
ical and technical replicates in each condition. ▶ Hypothesis Testing
A modification is introduced for experiments with ▶ Hypothesis Testing, Bayesian vs Frequentist
multiple responses, such as RNA-seq. The probability ▶ Hypothesis Testing, Parametric vs Nonparametric
of type I error in (1) is replaced with the false discovery ▶ Mixed and Multi-Level Models
rate, and an additional quantity (6), the expected pro- ▶ Multiple Hypothesis Testing
portion of the true null hypotheses in all tests, is spec- ▶ Poisson Regression
ified. The quantities (l)–(6) are interconnected, and ▶ Power and Sample Size
a prior specification of all of them but one allows us ▶ Quantitative Experiment Design
to solve for the remainder. In particular, the specifica- ▶ RNA-Seq
tion of (l)–(4) and (6) allows us to solve for the sample ▶ Sample Variability, Inter-groups
size. ▶ Sample Variability, Intra-groups
Several general conclusions can be made from the ▶ Statistical Methods in Systems Biology
calculations of sample size. Detection of smaller ▶ Student’s t-Test
D 566 Deuterated Glucose (2H-Glucose)
References
Developmental Control p
Networks, Fig. 1 Flexible
exponential-linear stochastic 1− p
stem cell network. One stem j1
cell (S) divides to produce two
cells (J1 and J2) that each J1
S J2 T
stochastically activates either
the stem cell itself or
a terminal cell (Werner 2011b)
j2 1− q
Stochastic Developmental Networks stem cell results in a terminal tumor that does not
A stochastic developmental control network contains develop further because it consists only of cells of
links to nodes that have an associated probability p and terminal type T. This shows that stochastic cancer
only jump to that node with probability p. Consider an Stem Cell Networks (▶ Cancer Networks) can in
example of a stochastic, flexible developmental net- some cases go into spontaneous remission. While
work whose characteristics and topology are deter- this network can also exhibit exponential growth
mined by the probabilities of its links: even in a stochastically linear probability distribu-
The network in Fig. 1 is very flexible. Depending tion, because of the two backward loops, there is
on the probability distribution, the network can a diminishing probability that it remains exponential.
range between being exponential, linear, or terminal, Thus, whether this network results in linear or expo-
as well as every mixture in between. Furthermore, nential proliferation depends on the probability
the network can be deterministic, stochastic, or distribution.
mixture of both. This flexibility is partly the result For this stochastic network, if the probabilities
of separating out the probability distributions for the p ¼ 1q then the higher the probability of p the more
behaviors of the two daughter cells. It shows that the network approximates a deterministic linear devel-
the architecture of the network imposes constraints opmental network. Since, in this case the distribution is
on what kinds of developmental dynamics are in antisymmetric, the cell population partition of cell
principle possible. The probability distribution pre- types consists of an equal number of exponential
supposes a network architecture of possible develop- stem cells and terminal cells, with the majority of
mental paths. cells being linear stem cells. This corresponds to the
The cell S is only a stem cell stochastically and not observed distribution in epidermal basal stem cells. If,
intrinsically when p < 1 and q < 1. When p ¼ q ¼ 1 the on the other hand, we have a symmetric distribution
network is deterministic exponential. When p ¼ 1, where p ¼ q then the higher the probability of p
q ¼ 0 or p ¼ 0, q ¼ 1 the network is deterministic the more the network approximates a deterministic
linear, that is, a deterministic 1st order geometric stem exponential network. Thus, the type of cancer network
cell network. If p ¼ q ¼ 0 the network is terminal. we have depends on the probability distributions over
When p ¼ 1 and q < 1, or when q ¼ 1 and p < 1, the connecting stochastic links.
then the network is mixed deterministic linear with
stochastic exponential tendencies. The Network Evolution of Multicellular Organisms
If probabilities p ¼ q then as p and q approach 1 the The evolution of multicellular organisms is directly
network approaches the behavior of a deterministic linked to the increasing complexity of their
exponential network. However, if p and q are differ- developmental control networks. These networks are
ent then this network can simulate a linear stochastic an autonomous layer on top of normal gene networks
network as well when, for example, p approaches (Werner 2011a, b).
1 and q approaches 0, or vice versa. As the probabil- Developmental control networks provide a power-
ities p and q decrease, the more frequently the cancer ful framework for explaining diverse developmental
Developmental Networks 569 D
phenomena beyond stem cells and cancer. These across many species, and developmental theory (unlike
include bilateral symmetry and the evolution of the the evolutionary viewpoint) is mostly interested in
internal skeleton in bilateral symmetric animals commonalities across phyla (e.g., in regular constant
(Werner 2012a, b). developmental mechanisms such as apoptosis) rather
than in differences. Developmental modules exist at
many levels: genetic (GRNs), cellular (morphogenetic
Cross-References fields), and tissues (germ layers: ectoderm, etc.), and
therefore may have some overlap.
▶ Cancer Networks Although quasi-independence defines modules in D
▶ Cene general, developmental modules are not necessarily
▶ Cenome the same modules as the ones identified by physiology
▶ Stem Cell Networks or morphology (Winther 2001). For instance, the
mesoderm – a developmental module – gives rise to
the heart (a physiological module), but is also involved
References in the production of the vertebrate eye (another phys-
iological module).
Werner E (2011a) On programs and genomes, arXiv:1110.5265v1 Modularity seems tied to evolvability (Wagner and
[q-bio.OT]. http://arxiv.org/abs/1110.5265
Altenberg 1996) because it entails that no variation in
Werner E (2011b) Cancer networks: a general theoretical and
computational framework for understanding cancer, a subpart is likely to change the functioning of the
arXiv:1110.5865v1 [q-bio.MN]. http://arxiv.org/abs/1110. whole; therefore, mosaic evolution can be possible.
5865v1 Developmental modularity fulfills the same evolution-
Werner E (2012a) The origin, evolution and development
ary requirements. It raises the question of its evolu-
of bilateral symmetry in multicellular Organisms.
arXiv:12073289v1 [q-bioTO]. http://arxiv.org/abs/1207.3289 tionary origins: is it given with the first elementary
Werner E (2012b) How to grow an organism inside-out: evolu- eukaryote and generally the most basic of cell mecha-
tion of an internal skeleton from an external skeleton in nisms? Or has it been selected for some advantages, or
bilateral organisms. arXiv:12073624v1 [q-bioTO]. http://
evolved as a by-product of selection for some devel-
arxiv.org/abs/1207.3624
opmental mechanisms? No consensus is yet attained.
Cross-References
Developmental Module
▶ Explanation, Developmental
Philippe Huneman
Institut d’Histoire et de Philosophie (IHPST), des
Sciences et des Techniques, Université Paris 1 References
Panthéon-Sorbonne, Paris, France
Oliveri P, Tu Q, Davidson E (2008) Global regulatory logic for
specification of an embryonic cell lineage. PNAS
105(16):5955–5962
Definition Wagner G, Altenberg L (1996) Complex adaptations and the
evolution of evolvability. Evolution 50(3):967–976
A developmental module is a set of cells, or genes, Winther R (2001) Varieties of modules: kinds, levels, origins,
and behaviors. J Exp Zool 291:116–129
which is more intrinsically connected than connected
to its surroundings, which is constant across some
clades, and which plays a specific causal role in devel-
opment. For example endoderm is a developmental
module, but also the Gene Regulatory Network of the Developmental Networks
skeletogenic micromere cell lineage in sea urchins
(Oliveri et al. 2008). Developmental modules are ▶ Cene
important units of analysis because they are constant ▶ Developmental Control Networks
D 570 Developmental Systems Theory
D
Synonyms Differential Equations with Deviating
Arguments
DAH
▶ Dynamical Systems Theory, Delay Differential
Equations
Definition
ðtþ1Þ ðtÞ
Green JB (2008) Sophistications of cell sorting. Nat Cell Biol xij ¼ xij :
10(4):375–377
Krieg M, Arboleda-Estudillo Y, Puech PH, K€afer J, Graner F, ðtÞ ðtþ1Þ
Of xi and xi , the one that has higher fitness with
M€uller DJ, Heisenberg CP (2008) Tensile forces govern
germ-layer organization in zebrafish. Nat Cell Biol respect to the objective function f(x) is passed on to the
10(4):429–436 next generation.
D 572 Differential Expression Analysis
The iteration process is terminated when some stop distinct cell types. Three major categories of potency
criterion is satisfied. are totipotency, pluripotency, and multipotency. Toti-
potent cells are capable of differentiating into every
cell type of a particular organism in addition to the
Cross-References extraembryonic tissues. An example of totipotent
cells is those produced upon the fusion of a sperm
▶ Identification of Gene Regulatory Networks, and an egg up to the stage of a morula. Pluripotency
Neural Networks describes the potential of a stem cell to differentiate
into cells comprising any of the three germ layers:
ectoderm (gut, lung, etc.), mesoderm (blood, bone,
References muscle, etc.), and endoderm (skin, nervous system,
etc.). Examples of pluripotent stem cells include
Xu R, Venayagamoorthy GK, Wunsch DC II (2007) Modeling of embryonic stem cells and induced pluripotent stem
gene regulatory networks with hybrid differential evolution
(iPS) cells. Multipotent cells are those that can differ-
and particle swarm optimization. Neural Networks
20(8):917–927 entiate into multiple cell lineages, but only to
a restricted family of closely related cell types. Hema-
topoietic stem cells are an example of multipotent
stem cells as they can give rise to all blood cells but
Differential Expression Analysis not other tissue types such as neurons, muscle, or
epithelium.
▶ Gene Expression Biomarkers, Ranking
▶ Relative Expression Analysis
Cross-References
▶ Genomic Imprinting
Diffuse Division
Digital Metrics
D8 ðp; qÞ ¼ maxðjx sj; jy tjÞ (3) D
Virginio Cantoni, Riccardo Gatti and Luca Lombardi
Department of Computer Engineering and Systems
Science, University of Pavia, Pavia, Italy These definitions can be easily extended to the 3D
space.
Definition
D4 ðp; qÞ ¼ jx sj þ jy tj (1)
Synonyms
Digital Metrics, Table 2 Euclidean distance: pixels classifica-
tion in a 5 5 neighborhood Avidian
2√2 √5 2 √5 2√2
√5 √2 1 √2 √5
2 1 0 1 2
√5 √2 1 √2 √5 Definition
2√2 √5 2 √5 2√2
A digital organism is a self-replicating computer
program, usually within the Tierra or Avida evolution
Digital Metrics, Table 3 Chessboard distance: pixels classifi- platforms.
cation in a 5 5 neighborhood
2 2 2 2 2
2 1 1 1 2
2 1 0 1 2 Cross-References
2 1 1 1 2
2 2 2 2 2 ▶ Artificial Evolution
D 574 Dimer
Directory
Definition
▶ Workspace
A dimer is a molecule consisting of two subunits called
monomers. In biochemistry and molecular biology,
dimers of proteins or nucleic acids are often observed.
If the two subunits constituting a dimer are the same Disassembly of the Pre-initiation
monomers, the dimer is called a homodimer. If the two Complex
subunits constituting a dimer are different monomers,
the dimer is called a heterodimer. ▶ PIC Disassembly
Synonyms
Synonyms
Acyclic digraph; Directed acyclic network
Conformational epitope
Definition
Definition
A directed graph, or digraph, is a graph with directions
assigned to its edges. A digraph is usually denoted by Epitopes whose residues are distantly placed in the
D ¼ (V, A) where V¼fv1 ; ; vn g is a finite set of sequence brought together by physicochemical folding
nodes and A is a set of ordered pairs of nodes called constitute discontinuous epitopes. The epitope struc-
arcs. In a digraph, a cycle C¼fc1 ; ; ck g of D is ture is defined by protein folding process when the
a closed and non-repetitive sequence of nodes in V residues forming a discontinuous epitope are juxta-
such that ðcj ; cjþ1 Þ 2 A; j ¼ 1; ; k 1, c1 ¼ ck , posed, enabling the antibody to recognize its three-
and ci 6¼ cj ; i; j ¼ 1; ; k 1. A directed acyclic dimensional structure.
graph is a digraph without any cycles.
References
Discrete Model
Zhang XS (2000) Neural networks in optimization. Kluwer,
Dordrecht ▶ Logical Model
Disease Databases 575 D
References
Disease Classification or Discrimination
Baek S, Tsai C-A, Chen JJ (2009) Development of biomarker
classifiers from high-dimensional data. Brief Bioinform
James J. Chen
10:537–546
U.S. Food and Drug Administration, National Center Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH,
for Toxicological Research (HFT-20), Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP,
Jefferson, AR, USA Poggio T, Gerald W, Loda M, Lander ES, Golub TR
(2001) Multiclass cancer diagnosis using tumor gene expres-
sion signatures. Proc Natl Acad Sci 98:15149–15154
D
Synonyms
Jingky Lozano-K€uhne
Definition Department of Public Health, University of Oxford,
Oxford, UK
Disease classification is to systematically group dis-
eases into classes in a hierarchical structure according
to characteristics of disease: etiology, pathology, phys- Definition
iology, prognosis, and combinations of these. Every
disease is grouped into one and only one class. Disease Disease databases are resources containing informa-
discrimination is used to identify a set of features that tion on diseases, syndromes, and other medical
classy diseases into classes. Disease identification is to conditions. They may provide specific information on
examine and classify diseases into specific classes. the signs and symptoms, risk factors, treatment
Disease taxonomy is the practice and science of clas- regimen, and/or results of studies done on known
sification of diseases. Disease taxonomy is diseases and medical conditions. Diseases databases
a completing list of all disease types in the classifica- are available both in printed and online publications.
tion system. Nowadays, online resources have already supplanted
International Statistical Classification of Diseases many printed publications. Selected online disease
and Related Health Problems is a system of codes for databases used in systems biology researches are
classifying diseases and health problems published by presented below.
the World Health Organization. The Medical Dictio-
nary for Regulatory Activities is a clinically validated
international medical terminology used by the biophar- Characteristics
maceutical industry and is used as the adverse event
classification dictionary endorsed by the ICH. Classification of Disease Databases
Molecular classification of diseases based on geno- A disease database can be categorized as general
mic and proteomic profiling is used within the field of or specialized database depending on its scope.
systems biology. Molecular classification uses statisti- General disease databases contain a wider scope of
cal and machine learning methods to identify molecu- diseases and medical conditions, while specialized
lar markers of a specific disease or to develop databases are limited to certain types of diseases
prediction models to classify disease types (Baek such as cancer, tropical diseases, or genetic diseases.
et al. 2009). Classification models select a set of Many online disease databases such as the
molecular features to discriminate between different “MedlinePlus,” “Diseases Database,” and the
types of disease or between disease and normal groups “Online Mendelian Inheritance in Man (OMIM)”
(Ramaswamy et al. 2001). The selected molecular are publicly accessible for free. Some databases are
features, individually or as a set, may be further devel- maintained by universities and government institu-
oped to be probable biomarkers or valid biomarkers for tions, while others are maintained by private organi-
disease classification or disease discrimination. zations or interest groups.
D 576 Disease Databases
Disease Ontologies
Disease Mechanism Networks Disease ontologies are biomedical ontologies provid-
ing coverage for the domain of diseases, disorders,
▶ Ontology Analysis of Biological Networks illness, etc. The degree to which these words are
▶ Functional/Signature Network Module for Target synonymous is subject to debate (Ceusters and
Pathway/Gene Discovery Smith 2010), but disease ontologies generally cover
conditions understood as or suspected of being
a deviation from a healthy status and, less frequently,
diagnostic criteria for these conditions. Some
Disease Ontology disease ontologies also cover the manifestations of
these conditions, that is, the signs and symptoms
Olivier Bodenreider associated with them. However, the relations between
Lister Hill National Center for Biomedical conditions and their manifestations are usually not
Communications, US National Library of Medicine, recorded in ontologies, as such relations are not def-
Bethesda, MD, USA initional for most diseases. In fact, except for the
so-called pathognomonic manifestations, there is
a probabilistic, not systematic relation between
Definition a manifestation and a condition. Phenotypes are the
observable characteristics of organisms resulting
Background from the genetic makeup of a particular organism.
Disease ontologies can be traced back to the seven- There is partial overlap between phenotypes and dis-
teenth century, when health authorities in London used ease manifestations, and phenotypes may be covered
a standard list of about 200 causes of death to compile by disease ontologies.
accurate health statistics known as the Bills of Mortal- The principal use of disease ontologies is to
ity (Bodenreider 2008). This list was later integrated support the annotation of diseases in biomedical
into the International Classification of Diseases, the datasets, including the curation of knowledge bases,
11th revision of which is currently under active devel- the clinical documentation of electronic health
opment. Among the several hundredths of biomedical records and the indexing of the biomedical literature.
ontologies currently available, a few dozen provide Disease ontologies are also used for aggregation
coverage of diseases. A selection of these ontologies purposes (e.g., for grouping myocardial infarction
is presented in this brief review. and mitral stenosis under cardiovascular diseases),
as well as for clinical decision support (e.g., drugs
Biomedical Ontologies contra-indicated with asthma should not be pre-
Biomedical ontologies represent the properties of scribed to patients diagnosed with specific forms of
biomedical entities and their relations to other bio- asthma, such as seasonal asthma and occupational
medical entities. As such, biomedical ontologies are asthma).
artifacts used to represent and share knowledge about
the biomedical domain (Bodenreider and Stevens
2006). More specifically, biomedical ontologies tend Characteristics
to focus on definitional knowledge (i.e., what is
always true of biomedical entities), as opposed to Existing Disease Ontologies and their
assertional knowledge, usually found in knowledge Characteristics
bases. Ontologies also differ from terminologies, In this section, we review 17 ontologies, which roughly
whose focus is purely naming, and from thesauri, in qualify as disease ontologies according to the defini-
which knowledge is usually organized for a specific tion above. The list of ontologies is shown in Table 1,
purpose (e.g., information retrieval). Despite these along with a URL from which more information can be
differences, the term “ontology” is often used loosely, obtained. A list of salient characteristics for disease
as an umbrella name for these various kinds of ontologies is presented in Table 2. Finally, for each
artifacts. ontology, we provide a brief description and a list of
Disease Ontology 579 D
Disease Ontology, Table 1 List of disease ontologies
DO Disease Ontology – http://diseaseontology.sourceforge.net/
DSM Diagnostic and Statistical Manual of Mental Disorders – http://www.psych.org/
HPO Human Phenotype Ontology – http://www.human-phenotype-ontology.org/
ICD International Classification of Diseases – http://www.who.int/classifications/icd/en/
ICPC International Classification of Primary Care – http://www.globalfamilydoctor.com/wicc/
IDO Infectious Disease Ontology – http://www.infectiousdiseaseontology.org/
LOINC Logical Observation Identifiers Names and Codes – http://loinc.org
MEDCIN MEDCIN – http://www.medicomp.com/ D
MedDRA Medical Dictionary for Regulatory Activities – http://www.meddramsso.com/
MeSH Medical Subject Headings – http://www.nlm.nih.gov/mesh/
MPATH Mouse Pathology Ontology – http://www.pathbase.net/
MPO Mammalian Phenotype Ontology – http://www.informatics.jax.org/searches/MP_form.shtml
NCI Thes. NCI Thesaurus – http://ncit.nci.nih.gov/
NDF-RT National Drug File-Reference Terminology – http://evs.nci.nih.gov/ftp1/NDF-RT/
OMIM Online Mendelian Inheritance in Man – http://www.ncbi.nlm.nih.gov/omim/
PATO Phenotypic Quality Ontology – http://obofoundry.org/wiki/index.php/PATO:Main_Page
SNOMED CT SNOMED CT – http://www.ihtsdo.org/
characteristics summarized in Table 3, with additional annotation of the genetic diseases listed in OMIM.
notes in Table 4. Developed by a consortium including Charité Hos-
• Disease Ontology (DO): Controlled terminology pital (Berlin) and the University of Cambridge
originally created for annotation purposes as part (UK).
of the NuGene project at Northwestern University. • International Classification of Diseases (ICD):
Still under development. Classification from the World Health Organization
• Diagnostic and Statistical Manual of Mental Dis- (WHO) family of health classifications, with many
orders (DSM): Standard classification of mental local adaptations. ICD9-CM, developed by the Cen-
disorders in the United States, developed by the ter for Medicare & Medicaid Services (CMS) for
American Psychiatry Association and used by use in the US, includes clinical modifications.
a wide range of mental health professionals across Broad coverage of diseases and health problems.
clinical settings. • International Classification of Primary Care
• Human Phenotype Ontology (HPO): Controlled (ICPC): Classification of reasons for encounter,
vocabulary for the phenotypic features encountered diagnoses or problems, and process of care. Devel-
in human hereditary and other diseases, used for the oped by the World Organization of Family
D 580 Disease Ontology
Disease Ontology, Table 3 Some characteristics of 17 disease ontologies (see Table 2 for the definitions of the characteristics)
Component Specialized Human OBO Clinical Definitions Translations Publicly available X-ref
DO No No Yes Yes No Textual No Yes Native
DSM No Yes Yes No Yes None* No No UMLS
HPO No Yes Yes Yes No* Textual No Yes Native
ICD No No Yes No Yes None Yes No UMLS
ICPC No Yes* Yes No Yes None Yes Yes Native to ICD,
UMLS
IDO No Yes Yes Yes No Textual No Yes None
LOINC No Yes Yes No Yes Logical* Yes Yes UMLS
MEDCIN Yes No Yes No Yes None No No UMLS
MedDRA No Yes* Yes No Yes None Yes No UMLS
MeSH Yes No Yes* No No Textual Yes Yes UMLS
MPATH No No No Yes No Textual No Yes None
MPO No No No Yes No Textual No Yes None
NCI Thes. Yes Yes* Yes No Yes* Logical, No Yes Native, UMLS
textual
NDF-RT Yes No Yes No Yes Logical* No Yes UMLS
OMIM No Yes Yes No Yes Textual* No Yes UMLS
PATO No No Yes* Yes No Logical, No Yes None
textual
SNOMED CT Yes No Yes No Yes Logical Yes Yes* UMLS
∗
Refer to additional notes in Table 4
Doctors (Wonca). Coverage of diseases and • Logical Observation Identifiers Names and Codes
health problems at the level of detail required for (LOINC): Set of names and codes for laboratory
primary care. and other clinical observations (elements of clinical
• Infectious Disease Ontology (IDO): Set of ontologies phenotypes). Developed at the Regenstrief Institute.
for specific infectious diseases, including malaria, Coverage restricted to clinical observations.
influenza, and tuberculosis, sharing a core ontology. • MEDCIN: Developed by Medicomp Systems,
Covers entities relevant to both biomedical and clini- MEDCIN is a vocabulary for clinical documenta-
cal aspects of most infectious diseases. Developed tion and a knowledge base for clinical decision
by the Infectious Disease Ontology Consortium. support. It provides coverage for elements
Disease Ontology 581 D
including symptoms, medical history, physical Disease Ontology, Table 5 Ontology repositories providing
examination, tests, and diagnoses. coverage of disease entities
• MedDRA: Created by a pharmaceutical industry trade UMLS Unified Medical Language System – https://uts.nlm.
group, the Medical Dictionary for Regulatory Activ- nih.gov
ities (MedDRA) is a medical terminology used to BioPortal NCBO BioPortal – http://bioportal.bioontology.org/
classify adverse event information associated with
the use of medications, vaccines, and medical devices,
especially for reporting to regulatory agencies. Terminology Standard Development Organization
• Medical Subject Headings (MeSH): Controlled D
(IHTSDO) for use in electronic health records and
vocabulary developed by the US National Library adopted by seventeen countries to date. Broad cov-
of Medicine for the indexing and retrieval of the erage including diseases.
biomedical literature, especially in the MEDLINE
bibliographic database. Broad coverage including Ontology Repositories
diseases. Many of the disease ontologies listed above are present
• Pathbase pathology ontology (MPATH): Ontology in ontology repositories (Table 5), which offer a con-
of mutant and transgenic mouse pathology pheno- venient way of integrating disease resources annotated
types used for the annotation of Pathbase, to different ontologies. The Unified Medical Language
a repository of histopathology images. Developed System (UMLS) is a terminology integration system
by the Pathbase European Consortium. developed by the US National Library of Medicine.
• Mammalian Phenotype Ontology (MPO): Con- The UMLS establishes a correspondence among terms
trolled vocabulary for the annotation of mammalian from different terminologies for a given biomedical
phenotypes, currently used for the annotation of entity. It integrates a number of the terminologies
phenotypic data in mouse and rat databases. Devel- presented above, as well as many other biomedical
oped at the Jackson Laboratory. Coverage restricted terminologies. Developed by the National Center for
to phenotypes. Biomedical Ontology (NCBO), The BioPortal is
• NCI Thesaurus (NCIt): Controlled vocabulary another such repository, which offers mapping
developed by the National Cancer Institute to sup- among terms from different ontologies. The BioPortal
port the integration of information related to cancer provides systematic coverage of the ontologies from
research. Broad coverage including diseases. the Open Biomedical Ontologies (OBO) family, as
• National Drug File-Reference Terminology well as many other ontologies. The NCBO also
(NDF-RT): Reference terminology for medications, indexes resources, such as clinical trials and gene
providing information including pharmacologic expression databases, in reference to ontology entities
class, therapeutic intent, mechanism of action, and from the BioPortal.
physiologic effect. Produced by the US Department
of Veterans Affairs, Veterans Health Administra-
Acknowledgments This research was supported in part by the
tion (VHA). Coverage of diseases through their Intramural Research Program of the National Institutes of Health
relations to drugs (therapeutic intent). (NIH), National Library of Medicine (NLM). This manuscript is
• Online Mendelian Inheritance in Man (OMIM): loosely based on a study of desiderata for disease ontologies,
coauthored with Dr. Anita Burgun and presented to the First
Knowledge base on human genetic diseases devel- International Conference on Biomedical Ontology in 2009.
oped at Johns Hopkins University and available
through the NCBI Entrez system. Coverage
restricted to genetic diseases. References
• Phenotypic Quality Ontology (PATO): Ontology of
phenotypic qualities, intended for use in a number Bodenreider O (2008) Biomedical ontologies in action: role in
of applications, primarily defining composite phe- knowledge management, data integration and decision sup-
notypes and phenotype annotation. Coverage port. Methods Inf Med 47(suppl 1):67–79
Bodenreider O, Stevens R (2006) Bio-ontologies: current trends
restricted to phenotypes.
and future directions. Brief Bioinform 7(3):256–274
• SNOMED CT: The largest clinical terminology, Ceusters W, Smith B (2010) Foundations for a realist ontology of
maintained by the International Health mental disease. J Biomed Semantics 1(1):10
D 582 Disease Progression
Anyela Camargo1 and Jan T. Kim2 dkout = dt ¼ fdp; elim ðkout ; tÞ: (3)
1
Institute of Biological, Environmental and Rural
Sciences, Aberystwyth University, Aberystwyth, Models of disease progression may provide insight into
Ceredigion, UK the biology behind the evolution of a target disease, and
2
School of Computing Sciences, University of East help identify the best treatment that either control or
Anglia, Norwich, Norfolk, UK stop disease progression. The selection of the best
course of action is a function of the effect of a given
treatment on disease status over time. It is therefore
Synonyms paramount that disease progression models include bio-
logical aspects (i.e., genetic, transcription, and cross
Disease system analysis; Disease progression talks) that are involved in the evolution of the disease.
Definition References
Disease progression describes the change of disease Chan PLS, Holford NHG (2001) Drug treatment effects on disease
status over time as function of disease process and progression. Annu Rev Pharmacol Toxicol 41(1):625–659
Dayneka NL, Garg V, Jusko WJ (1993) Comparison of
treatment effects. The status of a subject, such as
four basic models of indirect pharmacodynamic responses.
a patient, may be represented by a numerical J Pharmacokinet Pharmacodynamics 21:457–478.
quantity S (Chan and Holford 2001). In practice, ↗ doi:10.1007/BF01061691
biomarkers are frequently used as a proxy to monitor
disease status. In a healthy subject, mechanisms of
homeostasis ensure that the status S is relatively con- Disease Risk Assessment
stant and remains within a normal range. The change in
disease status shows minimal variation over time t, Sanjeev Kumar1 and Shipra Agrawal2
which mathematically is expressed as dS/dt 0 1
BioCOS Life Sciences Private Limited, Bangalore,
A disease process is characterized by a change of S, Karnataka, India
taking it outside of the normal range. As disease pro- 2
BioCOS Life Sciences Pvt. Limited, Institute of
gresses, the status S continues to move further away Bioinformatics and Applied Biotechnology,
from the normal range, or in terms of change over time, Bangalore, Karnataka, India
dS/dt 6¼ 0.
The change of status S over time can be described
by mathematical expressions that model biological Definition
processes. As an example from Dayneka et al.
(1993), change can be explained in terms of Disease risk assessment could be defined as the sys-
a constant rate of synthesis kin and a first-order process tematic evaluation and identification of ▶ risk factors
of decay or elimination kout: responsible for a disease, estimation of risk levels
and finding possible ways to counter the onset and
dS=dt ¼ kin kout S: (1) progression of a disease within the population.
Disease Risk Assessment 583 D
Disease Risk Assessment, A
Fig. 1 Factors involved in
disease risk assessments: The
Evidence-based
information from evidence- Data
based data (demographic, SNP
population-based data, and markers
family history) is processed to Demographic Population Family
get the SNP markers. The SNP -based history
marker data is used to predict Disease
Disease Risk
the disease risk. An integrated prediction
Assessment
approach from proteomics and tools D
genomics provide clues to
identify predictive biomarkers
(expression), which also help Integrated Predictive
in disease risk assessment approach using biomarkers
B
Proteomics
and Genomics
The following facts are very important and considered investigations, are conducted on a group of people
for disease risk assessment (Fig. 1): to gather evidences on a probable cause for
• Estimation of risk factors involves the evaluation of a disease.
variables and factors which can be indicative of the
likelihood development of a particular disease. For
example, the risk factors such as age, smoking, Disease Risk Assessment is a Challenge for
elevated levels of cholesterol and LDL, low levels Complex Diseases
of HDL, family history of premature coronary heart
disease, etc. are indications of incurring cardiovas- Complex diseases result from the combined effects of
cular disease. multiple genetic and environmental factors and each
• As the current focus has shifted toward preventive such factor alone has only marginal contribution to the
intervention than the disease cure, the risk assess- disease. Type 2 diabetes, coronary heart disease, and
ment involves genotyping of tissues to identify myocardial infarctions are examples of such diseases
“SNP markers” associated with a genetic disease. where genetic profiling has led to limited predictive
The SNP/genetic markers could be used to estimate values. Complete knowledge of causal mechanisms,
the disease susceptibility in a new born/unborn which involves identification of all the possible
child. This becomes useful in administrating the combinations of the causal factors (gene–gene interac-
preventive treatment to the child. tions, gene–environmental interactions), is required.
• Integrated approaches in proteomics and genomics Monogenic diseases like Huntington disease, PKU,
can create a rich resource of “predictive and hereditary cancers where identification of causal
biomarkers” that is, signature molecules, which mechanisms and risk prediction in these diseases are
are biochemically measurable. Such biomarkers comparatively straightforward are caused by DNA
can help in secondary disease prediction (i.e., variations in a single gene. (Teramoto et al. 2008;
whether a diabetic person has a susceptibility of Wei et al. 2009).
developing cardiovascular disease).
• Evidence-based data (▶ Evidence-based Medicine)
such as family history, demographic, and clinical References
data can be used to develop disease prediction tools.
Such predictive models having variables (factors Teramoto T, Ohashi Y, Nakaya N, Yokoyama S, Mizuno K,
Nakamura H, MEGA Study Group (2008) Practical risk
which may influence the disease) gathered from
prediction tools for coronary heart disease in mild to moder-
cohort-based studies can be used in these studies. ate hypercholesterolemia in Japan: originated from the
Cohort-based studies which are analytical MEGA study data. Circ J 72(10):1569–75
D 584 Disease System Analysis
Wei Z, Wang K, Qu HQ, Zhang H, Bradfield J, Kim C, expression and regulation and also establishes phylo-
Frackleton E, Hou C, Glessner JT, Chiavacci R, Stanley C, genetic relationships between different organisms.
Monos D, Grant SF, Polychronakos C and
Hakonarson H (2009) From disease association to risk An upcoming extension of genomic studies is
assessment: an optimistic view from genome-wide associa- transcriptomics which involves identification of
tion studies on type 1 diabetes. PLoS Genetics 5(10): actively expressing genes at any given point of time.
e1000678 Microarray technology is generally used to quantify
the expression levels of various mRNAs and RNA-
Seq provides details about the nucleotide sequence of
transcripts being expressed. Proteomics deals with
Disease System Analysis the comprehensive analysis of the entire complement
of proteins present in a given system. Mass spectrom-
▶ Disease Progression Modeling etry is the main tool used for proteomic studies and it
has revolutionized this field due to its extremely high
sensitivity and diverse qualitative and quantitative
applications. Metabolomics involves the study of the
Disease System, Malaria metabolites (metabolic intermediates, hormones, sig-
naling molecules, etc.) present in a given system at
Pragyan Acharya, Manish Grover and Utpal Tatu a particular time and provides an instantaneous
Department of Biochemistry, Indian Institute of glimpse into the physiology of any system. Bioinfor-
Science, Bangalore, Karnataka, India matics makes use of computational and statistical
approaches to develop algorithms and databases
which help to integrate and analyze the information
Synonyms obtained from genomic, proteomic, and metabolomic
studies.
Bioinformatics; Genomics; Metabolomics; Parasitol-
ogy; Proteomics; Transcriptomics
Characteristics
Definition Malaria
Malaria is a debilitating disease affecting almost
Malaria continues to be an enormous threat to the 200–300 million people worldwide annually. It is
society, and the emergence of drug-resistant parasite caused by the protozoan parasite belonging to the
strains has severely challenged the methods of disease genus Plasmodium, which is carried by the female
prevention and control. In order to combat this situa- Anopheles mosquito and transmitted by mosquito
tion, better understanding of the disease is required to bites in humans. Malaria inflicts mostly children and
identify new drug targets and vaccine candidates. is centered around the tropical and subtropical regions
Hence, we need to expand our research horizon and of the world, being most widespread in Africa, some
move beyond the regular reductionist approach of parts of South America, and Asian countries including
studying a single gene/protein or a biochemical path- India. In humans, malaria is caused by infection of the
way. With the help of high-throughput technologies red blood cell with any of the five species of the
and interdisciplinary tools, a holistic approach toward parasite – Plasmodium falciparum, Plasmodium
the understanding of the disease has been developed vivax, Plasmodium malariae, Plasmodium ovale, and
(Fig. 1). These initiatives commonly referred to as Plasmodium knowlesi. Of these, maximum mortality
“systems biology” are further classified into geno- and morbidity is caused by P. falciparum followed by
mics, proteomics, and metabolomics based on the P. vivax. All types of malaria are associated with
component of the biological system they deal with. febrile episodes with periodic paroxysms and chills in
Genomics is concerned with the study of genomes of addition to nausea, headache, and general weakness.
organisms and majorly involves DNA sequencing. It Population groups such as children, pregnant women,
provides insights into the mechanism of gene HIV/AIDS-infected individuals and travelers to
Disease System, Malaria 585 D
Ring RB
te
zoi C
M ero
Tro
p ho
Lab Cultures +T,+P,+M zo
it
P e
+T,+ +T
Clinical Isolates +T,+P ,+
-P P,+
-T, M
s
+T
yte
,-P
D
t
Metabolomics
toc
on
hiz
pa
Schi
P
Sc
,+
He
+T,+
+T
-P
zo
+T,-P
-T,
ies
Pro
P,+M
nt
ud
teo
St
mic
l
ica
Systems
n
s
cli
Biology of
+T,+
+T,
Mero
-T,-P
Spo
Saliv
+T,-P
Malaria
-P
P,+M
rozo
a
zoite
ry G
Bi
ites
oin
ics
land
for
-T m tom
,-P at
scr
ip ,-P
s
ics -T
n
es e
+T Tra P
yt al
,-P ,+
oc m
+T
et Fe
O -T,-P -T,-P
am nd
oc
G ea
ys +T,-P
+T,-P
al
t
M
acro
and M
Ook
inet Micro tocytes
e Game Human
Mid
gut
Mosquito
Disease System, Malaria, Fig. 1 Life cycle of P. falciparum the particular stage in the parasite’s life cycle (Adapted and
and the systems biology approaches used for understanding the modified from Das A et al., Systems Biology of Malaria: An
disease. “T,” “P,” and “M,” respectively, represent the availabil- Indian Perspective. Biobytes Vol. 5, 2009)
ity of transcriptomic, proteomic, and metabolomic evidence for
malaria endemic areas are at a higher risk for malaria further compounded by the emergence of drug-
due to lower immunity (reviewed by Gracia 2010). resistant strains of the parasite. Both P. falciparum
Malaria is a disease that has been a major research and P. vivax have become resistant to common anti-
focus for several years. Yet, vaccines against malaria malarials and recently new P. falciparum strains
are elusive. The best bet in the development of resistant to artemisinin, one of the most effective
malaria vaccines has been the RTS,S vaccine antimalarial available, have also emerged (reviewed
(Mosquirix) which has been recently demonstrated by Gracia 2010). Traditionally, most studies on
by Joe Cohen, a GlaxoSmithKline research scientist, malaria have focused on the biochemical, cell biolog-
to provide about 50% protection in infants (Olotu ical, and genetics of single gene or protein. In recent
2011). Still, the most popular and effective methods times however, the focus in malaria research has
of controlling malaria are control of its mosquito shifted to the analyses of gene and protein expression
vector and the use of bed nets. However, malaria is at the global level, revealing several interesting fea-
not completely preventable and continues to be an tures of this parasite that may be useful in the discov-
enormous threat to the society. This problem is ery of novel antimalarial drug targets.
D 586 Disease System, Malaria
Disease System, Parkinson’s Disease, Fig. 1 Nigrostriatal pathway in normal and PD brains
higher amounts of oxygen (2) accumulates lipids and the precise relationship, synergy, and temporal order
iron that promote oxidative damage (3) has lower among these pathways are not clear (Fig. 2).
antioxidant defenses. Among the brain cells, neurons
in general and dopaminergic neurons in particular are Systems Biology Applications in
highly susceptible to oxidative damage because they Neurodegeneration
(1) generate oxidative stress during dopamine metab- It could be surmised that designing experiments to
olism (2) have lower levels of the antioxidant ▶ gluta- analyze the comprehensive dynamics of all the events
thione (GSH) and (3) increased iron content. in PD could be difficult. However, systems biology
Mitochondrial damage in SN neurons is caused by based in silico predictive technology can address this
oxidative stress mediated selective inhibition of issue by recapitulating an accurate and simultaneous
mitochondrial complex I (CI) (Bharath et al. 2002; view of individual and interlinked pathways at the
Abou-Sleiman et al. 2006). molecular level. Such a virtual platform should include
Inhibition of protein processing is linked to aggrega- all the relevant proteins and their genes and transcripts
tion of cellular proteins in the SN dopaminergic neurons with their relationship quantitatively represented. The
as intraneuronal protein deposits called “Lewy bodies platform should integrate these species in intra and
(LBs)” (▶ Lewy Bodies, Role of Alpha-syn). LBs rep- intercellular pathways providing a comprehensive
resent a pathological hallmark of PD with a-synuclein view at the cellular and tissue level. The platform
(a-syn) protein as the major component (Shults 2006). should be certified against predefined in vitro and
Neuroinflammatory pathways also contribute to the in vivo studies reported in scientific literature with
degenerative process in PD (Glass et al. 2010). flexibility to include new data. Such a platform could
Although PD involves a complex network of events, assay all intermediate and endpoint biomarkers and
Disease System, Parkinson’s Disease 591 D
Disease System, Parkinson’s Disease, Fig. 2 Interplay among different disease pathways in PD
manipulate different triggers, inhibitors, and activa- stress, proteasomal dysfunction, protein aggrega-
tors. This also includes percentage, knockout, dose– tion, neurotrophin/growth factor signaling, etc.
response, overexpression, and mutational analysis. 3. Amalgamation of signals from all cell types
Different customized studies can be defined and exe- involved, including dopaminergic neurons,
cuted through an interactive graphical user interface microglia, and astrocytes and relevant pathways
mode or a high-throughput approach. covering all disease stages.
Accordingly, a typical in silico PD platform should 4. Incorporation of different triggering factors includ-
include an exhaustive list of ▶ molecular markers ing environmental toxins (rotenone, 1-methyl-4-
representing important pathways in normal physiology phenyl-1,2,3,6-tetrahydropyridine or MPTP
and neurodegeneration as follows: (▶ Parkinson’s Disease, MPTP) and genetic factors
1. Markers that represent physiological aging to dis- (familial mutations) which induce
tinguish aging and neurodegeneration. Aging state neurodegeneration in the aging neurons.
is depicted by markers of oxidative stress, mito- We recently generated such a platform to carry out
chondrial activity, and quantitative changes in insu- simulations linking disease pathways in PD with
lin growth factor, melatonin, homocysteine, etc. experimental validation (Vali and Bharath 2009). The
2. Pathways representing PD state including neurode- methods in model construction are as follows and are
generative and neuroinflammatory events and applicable to similar biological phenomena:
related endpoint biomarkers of mitochondrial dys- A bottom–up approach was adopted to build the plat-
function, oxidative stress, endoplasmic reticulum form. Initially, the various phenomena related to
D 592 Disease System, Parkinson’s Disease
neuronal dynamics were built as base modules and mitochondrial damage with therapeutic potential in
progressively integrated. These modules included: PD. In a third study, modeling envisaged that the
(a) Comprehensive mitochondrial bioenergetics A53T mutant of a-syn can accumulate and disrupt
(b) Generation of reactive oxygen and nitrogen spe- mitochondrial function in dopaminergic neurons. In
cies (ROS/RNS) including dopamine metabolism the presence of proteasome inhibition, mitochondrial
and iron turnover is further reduced, resulting in decreased ATP
(c) GSH and Ca2+dynamics synthesis and in turn decreasing GSH synthesis.
(d) Proteasome inhibition and protein aggregation
Modeling involved a detailed study of the individual
pathways and components. Interactions among these Cross-References
components and their regulation were arranged to
obtain a basic interaction map. The static map was ▶ Aging
made dynamic by incorporating (1) physiological con- ▶ Dopamine
centrations of individual molecules and enzymes/ ▶ Genetic Factor in Parkinson’s Disease
proteins (2) catalytic rates and reaction mechanisms in ▶ Glutathione
flux equations. Most of these data were obtained from ▶ Lewy Bodies, Role of Alpha-syn
published scientific literature involving experimental ▶ Mitochondrial Dysfunction, Parkinson’s Disease
methods. After individual modules were built and vali- ▶ Molecular Markers, In Silico Parkinson’s Disease
dated against published data, the cross-talk among these Platform
phenomena was integrated such that the output from ▶ Oxidative Stress, Protein Damage
a module either created the input to another module or ▶ Parkinson’s Disease, MPTP
provided regulatory control. These interactions cross- ▶ Proteasome Inhibition, Parkinson’s Disease
linked and integrated the base modules thus generating
a complete integrated system. The performance of such
a complex network would be different from isolated References
experimental data obtained from these phenomena.
Abou-Sleiman PM, Muqit MM, Wood NW (2006) Expanding
The modeling of kinetic phenomena such as time- insights of mitochondrial dysfunction in Parkinson’s disease.
dependent potential differences in trans-mitochondrial Nat Rev Neurosci 7:207–219
membrane potential, proton motive force, ion fluxes, Betarbet R, Sherer TB, Di Monte DA, Greenamyre JT
and other metabolic reactions were performed utilizing (2002) Mechanistic approaches to Parkinson’s disease path-
ogenesis. Brain Pathol 12:499–510
modified “Ordinary Differential Equations” and “Mass Bharath S, Hsu M, Kaur D, Rajagopalan S, Andersen J (2002)
Action Kinetics” and the integration of the pathways Glutathione, iron and Parkinson’s disease. Biochem
were solved by the Radau and Euler methods. Such Pharmacol 64:1037–1048
modeling experiments reconcile numerous formerly Burke RE (1998) Parkinson’s disease. In: Koliatsos VE, Ratan
RR (eds) Cell death and disease of the nervous system.
unrelated features of PD in a chronological manner Humana, Totowa, pp 459–475
thus elucidating disease progression based on the com- Chaudhuri KR, Healy DG, Schapira AH (2006) Non-motor
bined molecular actions of different mechanisms. symptoms of Parkinson’s disease: diagnosis and manage-
Using this platform, we carried out few studies ment. Lancet Neurol 5:235–245
Diaz NL, Waters CH (2009) Current strategies in the treatment
exploring the mechanistic and therapeutic aspects of of Parkinson’s disease and a personalized approach to man-
PD (Vali and Bharath 2009). Firstly, we integrated agement. Expert Rev Neurother 9:1781–1789
GSH metabolism and mitochondrial dysfunction asso- Glass CK, Saijo K, Winner B, Marchetto MC, Gage FH
ciated with PD. This study inferred that the mitochon- (2010) Mechanisms underlying inflammation in
neurodegeneration. Cell 140:918–934
drial damage affected the cellular GSH synthesis Hoehn MM, Yahr MD (1967) Parkinsonism: onset, progression
thereby enhancing the oxidative damage and exacer- and mortality. Neurology 17:427–442
bating neurodegeneration. Secondly, we have also Shults CW (2006) Lewy bodies. Proc Natl Acad Sci USA
traced the neuroprotective function of curcumin 103:1661–1668
Vali S, Bharath MMS (2009) Dynamic virtual prototype of
(a polyphenol from turmeric) and its bioconjugates. Parkinson’s disease at the cellular and molecular abstraction
We found that curcumin and its conjugates induced level: focus on neurodegeneration physiology. Biobytes
GSH production, protected against oxidative stress and 5:40–46
Disease-oriented Causal Networks 593 D
variable affects the certainty of the state of another
Disease Taxonomy variable hence these causal networks are graphical
representations of causal relationships between the
▶ Disease Classification or Discrimination variables of the graph. For example, obesity is one of
the factors for developing insulin resistance, which in
turn is the cause for type 2 diabetes.
Disease-oriented Causal Networks
ACGTGGCCGTCAA
Causal SNP
ACGTGGGCGTCAA
Regulatory gene
Protein interaction
information Causal genes
Causal probabilistic
network
Disease-oriented Causal
Networks, Fig. 1 Data
integration and network
modeling to create a causal Gene-expression
probabilistic disease network information
D 594 Diseasome
extended process, which requires that many genes are efficacy. For example, whether a population is evolvable
correctly activated or repressed, that plenty of proteins or not is not independent of contextual factors like
are properly synthesized and interact in the right way migratory abilities and landscape topography. Hence,
with each other. According to the ▶ complexity of the the intrinsic character of the biological disposition
differentiation process, numerous factors could disturb “evolvability” is called into question.
this process and prevent the manifestation. It is hardly
imaginable that one could (even in principle) prepare Single-Track Versus Multi-Track Dispositions
a complete list of all possible interfering factors. Courage, it seems, is a disposition that will be
manifested in different situations by different behaviors.
Intrinsicality It is a multi-track disposition, that is, one disposition
A second instructive debate concerns the with multiple possible manifestations. However, the
▶ intrinsicality of dispositions. Roughly speaking, SCA-tradition has often assumed that dispositions are
a property is intrinsic if a system possesses the property individuated in terms of one set of enabling conditions
independently of what is going on in its context. Shape and one manifestation (single-track dispositions). The
is an intrinsic property, whereas being smaller than drawback is a proliferation of dispositions, for example,
everybody else in the room is an extrinsic (relational) different courage-dispositions – one for each kind of
property. courageous behavior, for example, courage in the face
The rationale for attributing a disposition to of death and courage in the face of financial stress.
a particular system seems to imply that dispositions In biology, there are many possible candidates for
are intrinsic. The rationale is as follows: The phenom- multi-track dispositions: the manifestation of
enon of sugar dissolving in water is, strictly speaking, evolvability for a population can result in different
a property of a combined system – sugar plus water. If changes of gene frequency of a population; the
we describe the phenomenon in terms of a disposition pluripotency of stem cells can become manifest in
being manifest rather than in terms of a property of muscle cells, bone cells, etc. But on closer inspection
a compound system, we usually introduce a distinction it becomes apparent that the characterization of these
between a system (e.g., sugar), which is endowed with dispositions as “multi-track” depends on a fine grained
a disposition, and external, for example, contextual analysis of the manifestation states. If we raise the
conditions. If we ascribe solubility to sugar, then we graininess of the analysis, just one and not multiple
focus on those conditions for obtaining of the phenom- possible manifestation states can be identified. For
ena that are due to sugar only. The disposition (solu- example, the evolvability of a population will be man-
bility) comprises exactly those conditions of the ifest if its gene frequency has changed independent of
phenomenon that the system (sugar) possesses inde- the kind of gene whose frequency has changed and
pendently of what is going on in the context. Thus, independent of the exact dimension of the change.
even though the manifestation of dispositions (e.g., the
dissolving in the case of solubility) depends on extrin- Reduction
sic factors, it is usually held that the disposition itself A further frequently disputed question concerns the
(e.g., the solubility of salt) is intrinsic. But issue of ▶ reduction. Dispositions, such as fragility,
intrinsicality may not be a necessary feature of dispo- are necessary conditions for the obtaining of the man-
sitions (cf. McKitrick 2003; Choi and Fara 2012). The ifestation (provided the simple conditional analysis or
challenge is particularly clear in the case of some something akin is correct). This is often analyzed as:
biological systems: The importance of the context Fragility is causally efficacious (▶ causality) in bring-
undermines the claim that all dispositional properties ing about the manifestation. An ensuing question that
are intrinsic. As Alan Love (2003) has pointed out for has been widely discussed is whether a disposition can
the example of the ▶ evolvability of populations be considered causally efficacious on its own or
(▶ adaptation), in many cases external factors are not whether it is causally efficacious in virtue of an under-
only the enabling conditions for biological dispositions. lying causal basis, such as molecular structure.
Rather, they determine jointly with intrinsic factors the It is important to distinguish two issues in this
very nature of the disposition as well as its causal debate. First, fragility and other every-day dispositions
Distance Transform 597 D
are macroscopic properties. We tend to assume that Mellor DH (2000) The semantics and ontology of dispositions.
macroscopic properties of systems can be reduced to Mind 109:757–780
Mumford S (1998) Dispositions. Oxford University Press,
their molecular structure. A glass, for instance, is frag- Oxford
ile in virtue of its molecular structure. This, however, is Prior E (1985) Dispositions. Aberdeen University Press,
true for dispositional and categorical properties alike. Aberdeen
The glass has its shape (a categorical property) in
virtue of its molecular structure (and/or arrangement)
as well. So this is not a special issue for dispositions.
A second, different, issue is whether there can be Distance Field D
bare dispositions or whether every dispositional prop-
erty needs to be reduced to categorical properties, such ▶ Distance Transform
as the microstructural configuration. The question is ▶ Distance Transform and Travel Depth
whether there might be irreducible dispositional prop-
erties that cannot be identified with a set of categorical
(e.g., microstructural) properties. The physical prop-
erty “charge” or other fundamental dispositions might Distance Map
be candidates for bare dispositions because there
are no microstructural properties that they might be ▶ Distance Transform
identified with. ▶ Distance Transform and Travel Depth
Cross-References
Distance Transform
▶ Adaptation
▶ Causality Virginio Cantoni1,2, Riccardo Gatti1 and
▶ Complex System Luca Lombardi1
▶ Complexity 1
Department of Computer Engineering and Systems
▶ Enabling Conditions Science, University of Pavia, Pavia, Italy
▶ Evolvability 2
Computational Biology, KTH Royal Institute of
▶ Explanation in Biology Technology, Stockholm, Sweden
▶ Intrinsicality
▶ Manifestation
▶ Reduction Synonyms
▶ Simple Conditional Analysis (SCA)
Distance field; Distance map
References
Definition
Armstrong DM, Martin CB, Place UT (1996) Dispositions –
a debate. Routledge, London The Distance Transform (DT) is an operator usually
Bird A (2007) Nature’s metaphysics: laws and properties. applied into the domain of 2D binary image (where
Oxford University Press, Oxford
Carnap R (1936) Testability and meaning. Phil Sci 3:420–471
each point is classified as foreground or background)
Choi S, Fara M (2012) Dispositions. In: Zalta EN (ed) The but it can be extended to 3D domains too. The result of
Stanford Encyclopedia of Philosophy (Spring 2012 Edition), the transform is a new image whose foreground pixels
http://plato.stanford.edu/archives/spr2012/entries/dispositions/ are labeled with a value that represents the minimum
Love A (2003) Evolvability, dispositions, and intrinsicality. Phil
Sci 70:1015–1027
distance from the background.
McKitrick J (2003) A case for extrinsic dispositions. Aust J Phil There are many different types of DT which differ
81:155–174 mainly on the type of metric used to evaluate the
D 598 Distance Transform and Travel Depth
Distance Transform,
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Fig. 1 Distance transform
examples with city block and 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0
chessboard metrics
0 0 0 1 2 2 1 1 0 0 0 0 0 1 2 1 1 1 0 0
0 0 1 2 3 3 2 1 0 0 0 0 1 1 2 2 2 1 0 0
0 1 1 2 3 3 3 2 1 0 0 1 1 1 2 3 2 1 1 0
0 0 0 1 2 2 2 2 1 0 0 0 0 1 2 2 2 1 1 0
0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
distance. Most commonly used ▶ digital metrics are ▶ convex hull (or a related surface), is called travel
“chessboard” metric, “Euclidean” metric, or “city depth. For the evaluation of this feature, an effective
block” metric (Fig. 1). tool is given by the ▶ distance transform.
The simplest method to obtain a DT is to apply
a sequence of erosion operations from mathematical mor-
phology with a proper structuring element defined through Characteristics
the chosen metric. This sequence of operations must be
performed until all foreground pixels are covered. In biology, the travel depth parameter has been intro-
The DT is applied in skeletonization, shape descrip- duced by Coleman and Sharp (Coleman and Sharp 2006)
tion, and symmetry evaluation processes. (Coleman and Sharp 2010). The travel depth is the value
of the distance associated to each point of a pocket sur-
face from the ▶ convex hull (a 2D example is shown in
Cross-References Fig. 1). The shape and the properties of the molecular
surface determine what interactions are possible with
▶ Distance Transform and Travel Depth ligands or other macromolecules. In particular, the active
sites are generally described as shallow and deep spots on
the molecular volume. Experimentations have shown
that active sites usually correspond with areas on the
Distance Transform and Travel Depth surface having a high travel depth value and thus they
coincide with the bottom of protein’s pockets.
Virginio Cantoni1,2, Riccardo Gatti1 and Luca
Lombardi1 How to Obtain Travel Depth for a Given Protein 3D
1
Department of Computer Engineering and Systems Structure
Science, University of Pavia, Pavia, Italy The first step is to define the reference protein’s vol-
2
Computational Biology, KTH Royal Institute of ume inside a cubic grid of voxels (Cantoni et al. 2010).
Technology, Stockholm, Sweden This reference volume could be, for example, the van
der Walls volume, the solvent excluded surface (SES),
or the solvent accessible surface (SAS). The second
Synonyms step is to build the convex hull of the molecule’s
volume that will define the boundaries in the 3D
Distance field; Distance map space in which to apply the distance transform algo-
rithm. The convex hull of a molecule is the smallest
convex polyhedron that contains the molecule voxels.
Definition In R3 the convex hull is constituted by a set of facets,
usually triangles and a set of ridges (boundary ele-
For the pockets analysis in proteomics, the minimum ments) that are edges. Each triangle that belongs to
distance of a point from a reference surface, that is the the convex hull must then be inside the 3D grid and
Distributed Data Access 599 D
• BCH represents the surface of the convex hull.
• minn2ne(dn + wn) represents the minimum value
among the distances de in the near neighbors
belonging to D already defined, incremented by
the displacement wj between the locations (e, n):
C
that is, if e and n have a common face wn ¼ 1; if
A
e and n have a common edge wn ¼ √2; if e and
D
E n have a common vertex wn ¼ √3. At each iteration,
new voxels, inside R, are reached by the propaga- D
B tion process and the value they take is determined
F
by the neighbor distance (from the convex hull) and
the voxels distance from the neighbor involved; this
in order to simulate an isotropic propagation pro-
cess and the proper distance evaluation.
Distance Transform and Travel Depth, Fig. 1 A 2D • E ¼ Ø corresponds to the regime condition: no other
example. A, B, C, and D are points with high travel depth changes are given and the connected component of
value and possible active sites candidates R, adjacent to the border BCH, is completely
covered.
also all the voxels belonging to the volume of the The travel depth represents the distance of each
convex hull must be labeled inside the grid. The region voxel of A from BCH.
on which the distance transform is applied is called
concavity volume and is obtained by:
Cross-References
R ¼ CH \ RV (1)
▶ Convex Hull
where R is the concavity volume, CH the convex hull, ▶ Distance Transform
and RV the reference molecular volume (e.g., the SES).
Within the region R the following propagation is References
applied:
( ) Cantoni V, Gatti R, Lombardi L (2010) Segmentation of SES for
1 if i 2 BCH protein structure analysis. In: Proceedings of bioinformatics
Di ¼ 2010, Valencia, INSTICC Press: Setubal, PT, pp 83–89
0 otherwise
Coleman RG, Sharp KA (2006) Travel depth, a new shape
A ¼ BCH ; descriptor for macromolecules: application to ligand bind-
ing. J Molecular Biology 362:441–458
N ¼ ðA K Þ \ R; Coleman RG, Sharp KA (2010) Protein pockets: inventory,
E ¼ N A; shape and comparison. J Chem Inf Model 50:589–603
while E 6¼ do
8e 2 E : de ¼ minn2ne ðdn þ wn Þ;
A ¼ N; Distributed Data Access
N ¼ ðA K Þ \ R;
E ¼ N A; Eugenio Cesario
done ICAR-CNR (Istituto di Calcolo e Reti ad Alte
Prestazioni), Rende, CS, Italy
Where:
• A represents the increasing set of voxels contained
in R; E corresponds to the recruited set of near Synonyms
neighbors of A contained in R (i.e., the voxels
reached by the last propagation step). Data collection (integration) from distributed sources
D 600 Distributed Data Access
100000
Sequences
10000
Annotations
1000
100
1982
1984
1986
1988
1990
1992
1994
1996
1998
2000
2002
2004
2006
Year
power, storage, software, and data managed and shared Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L (2001) The
by different organizations. This technology has shown distributed annotation system. BMC Bioinformatics 2:7
Eurogrid (2001) http://www.eurogrid.org/
to be very reliable in solving large-scale bioinformat- Fayyad U, Haussler D, Stolorz P (2006) Mining scientific data.
ics-related problems and improving the efficiency and Commun ACM 39(11):51–57
effectiveness of the computation on biological data Foster I, Kescbrselman C, Nick J, Tuecke S (2003) The physi-
(Cesario and Talia 2010). In the last years, various ology of the grid. In: Berman F, Fox G, Hey A (eds) Grid
computing: making the global infrastructure a reality. Wiley,
international scientific projects have been developed New York, pp 217–249
(and are currently under development) on this field. GenBank (2011) http://www.ncbi.nlm.nih.gov/genbank/
The most important are Euro BioGrid (Eurogrid 2001), Grossman RL, Kamath C, Kegelmeyer P, Kumar V, Namburu
Asia Pacific BioGrid (APBiogrid 2001), UK BioGrid RR (2001) Data mining for scientific and engineering appli-
cations. Kluwer Academic, Dordrecht/Boston
(UKBiogrid 2001), and North Carolina BioGrid Haider S, Ballester B, Smedley D, Zhang J, Rice P,
(NCBiogrid 2001) showing that the Grid is a reliable Kasprzyk A (2009) BioMart Central Portal – unified access
and useful infrastructure for the management and anal- to biological data. Nucleic Acids Res 37:23–27
ysis of distributed biological data. Jenkinson AM, Albrecht M, Birney E, Blankenburg H, Down T,
Finn RD, Hermjakob H, Hubbard TJP, Jimenez RC, Jones P,
Kahari A, Kulesha E, Macı́as JR, Reeves GA, Prlic A (2008)
Integrating biological data – the distributed annotation
Cross-References system. BMC Bioinformatics 9:S3
Meyer F (2006) Genome sequencing vs. Moore’s law: cyber
challenges for the next decade. Trends and tools in bioinfor-
▶ General-Purpose Computation, Graphics Processing matics and computational biology. CTWatch Quart 2(3)
Units http://www.ctwatch.org/quarterly/articles/2006/08/genome-
▶ Grid Computing, Parallelization Techniques sequencing-vs-moores-law/
NCBiogrid (2001) http://www.ncbiogrid.org/
Smedley D, Swertz MA, Wolstencroft K, Proctor G,
Zouberakis M, Bard J, Hancock JM, Schofield P (2008)
References Solutions for data integration in functional genomics:
a critical assessment and case study. Brief Bioinform
APBiogrid (2001) http://compaq.apbionet.org/grid/ 9(6):532–544
Baralis E, Fiori A (2008) Exploring heterogeneous biological Smedley D, Haider S, Ballester B, Holland R, London D,
data sources. In: 19th international workshop on database and Thorisson G, Kasprzyk A (2009) BioMart – biological
expert systems applications, Turin, Italy, pp 647–651 queries made easy. BMC Genomics 10:22
BioInfoBank Library (2010). http://lib.bioinfo.pl/courses/view/160 Tao C (2006) Toward making online biological data machine
Cesario E, Talia D (2010) Using grids for exploiting the abun- understandable. In: Proceedings of the 5th international
dance of data in science. Scalable Comput Pract Exp semantic web conference (ISWC 2006) doctoral consortium,
11(3):251–262 Athens, GA, November 2006, pp 992–993
Distributed Data Management 603 D
Topaloglou T, Davidson SB, Jagadish HV, Markowitz VM, Distributed Data
Steeg EW, Tyers M (2004) Biological data management: Information organized in a semantically related way,
research, practice and opportunities. In: Proceedings of the
thirtieth international conference on very large data bases, also divided in pieces of information with rules to be
Toronto, pp 1233–1236 able in reconstructing them. Pieces of information may
UKBiogrid (2001) http://www.mygrid.org.uk/ represent data duplication (minimizing data loss risks)
or partition (saving space).
Pietro Hiram Guzzi1, Giuseppe Tradigo2 and Availability of data in digital way with an increasing
Pierangelo Veltri2 data precision and quality is increasing the need of
1
Department of Experimental Medicine and Clinic, both support and spaces for data storing as well as
University Magna Gratia of Catanzaro, Catanzaro, procedures and structures for data exchanging. In
Italy such a scenario, the term Distributed Data Manage-
2
Department of Medical and Surgical Sciences, ment refers to a set of methodologies, architectures,
University Magna Græcia of Catanzaro, and tools enabling the efficient management of data
Catanzaro, Italy stored in geographically distributed databases aiming
both to reduce access time and to allow efficient
knowledge extraction. In research contexts, distributed
Synonyms data management refers to a set of technologies
enabling the realization of a distributed laboratory in
Data Management: Data processing; Data storing and which different research unit collaborate performing
querying different experiments on the same project.
Distribution may help in saving spaces and improve
data availability and replication. It is the case of bio-
Definition logical data management where data produced are
huge and information extraction often is a hard task.
Database For example, data produced by many experimental
A database is a set of information semantically orga- platforms used in the biological field have been largely
nized and correlated such that it can be used to guide used in many studies to identify molecules that may be
processes or opportunely combined to form knowledge. related to human diseases (Cannataro et al. 2010).
Databases can be reconducted to database management Computational intensive applications for data manip-
systems that include all procedures necessary to orga- ulation require distributed processing, as in biological
nize, store, index, and query data and to keep data applications, where, thanks to always more accurate
always consistent, that is, always valid. techniques to manipulate or to simulate information
obtained from molecules are available (Cannataro
Distributed Processing et al. 2010), a typical study involves large number of
The distributed processing means solving an samples and huge amount of data. Data distribution
input problem by means of subprocesses evaluated on allows retrieving information in similar way as in
different processing unit. Processing unit may centralized data management structure, allowing scal-
be located on single computational machine or can be ability in terms of data and users, where parallel data
dislocated on different machine wired and located in manipulation from different users allows to improve
the same building or geographically distributed. knowledge of the database.
Main requirements of distributed data storing with
Data Management several nodes each with part of database and part of
Organizing information in data containers for storing, local data are as follows:
saving, and maintaining them allowing querying and • The introduction of a commonly shared data model
information retrieval. able to capture both raw data of the experiment and
D 604 Distributed Query Optimization
Distributed Querying
Mecurial http://mercurial.selenic.com/
Synonyms
Disturbance
Decentralized Version Control; Distributed revision
control; Revision control; Source control; Version ▶ Perturbation
control
Definition
Diversity
Version control is the management of changes to
documents, programs, and other information stored ▶ Diversity (D) Gene
D 606 Diversity (D) Gene
Cross-References
Definition
▶ Chain Type
The diversity (D) gene, or “diversity” is ▶ Configuration Type
a ▶ leafconcept of the “▶ GeneType” concept of iden- ▶ Constant (C) Gene
tification (generated from the ▶ IDENTIFICATION ▶ Conventional Gene
Axiom) of ▶ IMGT-ONTOLOGY, the global refer- ▶ Gene Type
ence in ▶ immunogenetics and ▶ immunoinformatics ▶ IMGT ® Information System
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, ▶ IMGT-ONTOLOGY
2005, 2008; Duroux et al. 2008), built by IMGT ®, the ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
international ImMunoGeneTics information system ® ▶ IMGT-ONTOLOGY, Leafconcept
(http://www.imgt.org) (▶ IMGT ® Information Sys- ▶ Immunogenetics
tem). “Diversity” identifies a gene that rearranges ▶ Immunoglobulin Synthesis
at the DNA level and codes the diversity region of ▶ Immunoinformatics
the variable domain of an immunoglobulin (IG) ▶ Joining (J) Gene
or antibody or of a T cell receptor (TR) chain ▶ Recombination Signal (RS)
(▶ Chain Type). ▶ Variable (V) Domain
It is one of the four leafconcepts that are ▶ Variable (V) Gene
characteristics of the IG and TR loci (Lefranc
and Lefranc 2001a, b), the other three being “vari-
able” (V), “joining” (J), and “constant” (C) (▶ Vari-
References
able (V) Gene, ▶ Joining (J) Gene, ▶ Constant (C)
Gene). Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
The diversity (D) genes are observed in three loci: Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
IG heavy (IGH), TR beta (TRB), and TR delta (TRD), ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
where they participate to V-D-J rearrangements ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
(Lefranc and Lefranc 2001a, b) (▶ Immunoglobulin Lefranc M-P, Lefranc G (2001a) The immunoglobulin
Synthesis). An IG or TR diversity (D) gene has three FactsBook. Academic Press, London, pp 1–458
possible and exclusive configurations (▶ Configura- Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
Academic Press, London, pp 1–398
tion Type): a germline configuration (before DNA
Lefranc M-P, Giudicelli V, Ginestoux C, Bose N, Folch G,
rearrangement), a partially rearranged configuration Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
(after D-J DNA rearrangement or, less frequently, Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
V-D or D-D rearrangements), and a rearranged config- Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
uration (after a complete V-D-J DNA rearrangement,
and immunoinformatics. In Silico Biol 4:17–29
that eventually may involve several D). In the germline Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
configuration, a diversity (D) gene possesses in 50 Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
DNA Methylation 607 D
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60 DNA Damage Checkpoint
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
a system and an ontology that bridge biological and
computational spheres in bioinformatics. Brief Bioinform ▶ Cell Cycle Checkpoints
9:263–275
DNA Labeling
Dividing Cells Depletion
▶ Modeling, Cell Division and Proliferation
▶ Quantifying Lymphocyte Division, Methods
Yan Zhang
▶ DNA Microarrays
Key Laboratory of Systems Biology, Shanghai
Institutes for Biological Sciences, Chinese Academy
of Sciences, Shanghai, China
DNA Content
Characteristics
Definition
The DNA in many different types of organisms can
DNA is modified by numerous chemico-physical
undergo DNA methylation, though it does not always
agents causing a variety of lesions on the DNA
necessarily serve the same function. In plants, for exam-
molecule.
ple, scientists believe that methylation occurs to deacti-
vate genes that could otherwise cause harmful
Cross-References mutations. In fungi, DNA methylation is used to mod-
erate and control the expression of certain genes based
▶ Cell Cycle Checkpoints on the particular conditions affecting the fungus.
D 608 DNA Methylation
CH3
T T C G A T T A C G A T T C G A T T A C G A
5’ 3’ 5’ 3’
3’ 5’ 3’ 5’
A A G C T A A T G C T A A G C T A A T G C T
CH3
Methylation Methylation
CH3 CH3
T T C G A T T A C G A T T C G A T T A C G A
5’ 3’ 5’ 3’
3’ 5’ 3’ 5’
A A G C T A A T G C T A A G C T A A T G C T
CH3 CH3
Not recognized by
DNA methyltransferase
Methylation in mammals similarly moderates and characteristics necessary for multicellular life from
inhibits the expression of certain genes; additionally, it a single immutable sequence of DNA. DNA methyla-
is involved in the production of chromatin, a protein- tion also plays a crucial role in the development of
DNA complex that makes up the structure of nearly all types of cancer (Jaenisch and Bird 2003).
chromosomes. DNA methylation at the 5 position of cytosine has
DNA methylation stably alters the gene expression the specific effect of reducing gene expression and has
pattern in cells. DNA methylation is typically removed been found in every vertebrate examined. In adult
during zygote formation and reestablished through somatic tissues, DNA methylation typically occurs in
successive cell divisions during development. How- a CpG dinucleotide context; non-CpG methylation is
ever, the latest research shows that hydroxylation of prevalent in embryonic stem cells (Lister et al. 2009).
methyl group occurs rather than complete removal of
methyl groups in zygote (Iqbal et al. 2011). Some
methylation modifications that regulate gene expres-
References
sion are inheritable and are referred to as epigenetic
regulation (Fig. 1). Iqbal K, Jin SG, Pfeifer GP, Szabó PE (2011) Reprogramming of
In addition, DNA methylation suppresses the expres- the paternal genome upon fertilization involves genome-
sion of viral genes and other deleterious elements that wide oxidation of 5-methylcytosine. Proc Natl Acad Sci
USA 108(9):3642–3647
have been incorporated into the genome of the host over
Jaenisch R, Bird A (2003) Epigenetic regulation of gene expres-
time. DNA methylation also forms the basis of chroma- sion: how the genome integrates intrinsic and environmental
tin structure, which enables cells to form the myriad signals. Nat Genet 33(suppl (3s)):245–254
DNA Microarrays 609 D
Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti- Definition
Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L,
Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH,
Thomson JA, Ren B, Ecker JR (2009) Human DNA This term refers to hybridization-based analytic plat-
methylomes at base resolution show widespread epigenomic forms composed of hundreds to millions of DNA
differences. Nature 462(7271):315–322 probes with defined sequences arrayed on a solid sur-
face to measure, in parallel, nucleic acids present in
a complex sample (Wheelan et al. 2008). The sample is
first labeled with a fluorescent dye and then hybridized
DNA Microarrays to the DNA microarray. After hybridization and wash- D
ing, the fluorescence signal intensities from each probe
J€
urg B€ahler and Samuel Marguerat are recorded using a laser scanner, providing
Department of Genetics, Evolution & Environment a semiquantitative measure for the amount of nucleic
and UCL Cancer Institute, University College London, acid molecules whose sequence is complementary to
London, UK any given probe. One or two samples can be hybridized
on a single microarray. When two samples are ana-
lyzed together, they are labeled with different dyes and
Synonyms the relative intensities of the two dyes are analyzed for
each probe (two-color array; Fig. 1). When only one
Arrays; cDNA microarrays; DNA chip; Microarrays; sample is analyzed, the absolute signals of the probes
Oligo microarrays are used instead. Microarray probes are either PCR
dye A
Culture A Culture B ratio
dye B
RNA
extraction
Microarray
scanning
sm3445
Microarray
cDNA synthesis
and labelling
+
Microarray
hybridisation
+
dye A dye B
Synonyms
DNA Polymerases
DNA synthesis
Zoi Lygerou
School of Medicine, Laboratory of General Biology,
University of Patras, Patras, Greece Definition
DNA Replication,
Fig. 1 The eukaryotic DNA Fork movement
replication fork
MCM
3’
PCNA GINS 5’
Pol d
Cdc45
Lagging strand
5’
3’ RP-A
Okazaki fragment
Pol e
PCNA
d
an
str
ing
5’ ad
Le
3’
Dynamic Origin-Bound Complexes Safeguard The Cell Cycle Control System Orchestrates
Once per Cell Cycle Replication Transitions
Every part of the genome must be replicated once and DNA replication must be accurately controlled in
only once per cell cycle. This is ensured by ordered space and time and must be coordinated with cell
transitions in protein complexes bound at origins of cycle progression. How are the ordered transitions of
replication (Blow and Dutta 2005, Fig. 2). When mito- origin-bound complexes brought about at the correct
sis is completed and cells move into a new G1 phase, point in time during the cell cycle? Cyclin dependent
all putative origins become licensed (▶ DNA Replica- kinases (CDKs) (▶ Cyclins and Cyclin-dependent
tion Licensing) for a round of replication by the assem- Kinases), the master regulators of the cell cycle,
bly of ▶ prereplicative complexes onto origins, cross-talk with origin-bound complexes to ensure
consisting of the six subunit origin recognition com- once per cell cycle replication (Blow and Dutta 2005,
plex (ORC), the loading factors Cdc6/Cdc18 and Cdt1, Fig. 2). When mitosis is completed, CDK activity
and the six subunit MCM complex, which will later act levels are low. Only then can pre-replicative com-
as the replicative helicase. As cells move into S-phase, plexes assemble at origins of replication (window of
specific origins are activated by modifications and opportunity). Increase in CDK activity levels at the
recruitment of additional factors (Cdc45, the GINS G1/S transition signals conversion of pre-replicative
complex, Dpb11/TopBP1, Sld2/RecQL4, Sld3/ complexes to pre-initiation complexes and entry into
Treslin, Mcm10, and others) which turn the pre- S-phase. CDK activity levels over the G1/S transition
replicative complex into a pre-initiation complex, threshold however inhibit further assembly of pre-
lead to activation of the helicase, recruitment of poly- replicative complexes, ensuring that licensing will
merases, and origin firing. Complexes which remain at only take place again after mitosis has been completed
origins after firing (or passive replication from a fork and CDK activity levels have dropped (▶ Cell Cycle
coming in from a nearby origin) are at the post- Transitions, Mitotic Exit). The CDK cycle therefore
replicative state, and cannot support replication initia- restricts licensing and replication to different windows
tion again until mitosis has been completed and a new of the cell cycle, guarding against re-replication. To
round of licensing has taken place. It is therefore the ensure that mitosis only occurs after DNA replication
composition of complexes along the DNA which dic- has been completed, replication complexes signal to
tate when and where replication will initiate. the cell cycle control system to inhibit CDK activity
DNA Replication 613 D
DNA Replication, Replication
Fig. 2 DNA replication
licensing
Replisome
MCM
Low CDK High CDK
Pre-replicative Complex Post -replicative Complex
from reaching the levels required for mitotic entry single-cell level and statistically analyze the properties
(▶ G2/M Checkpoint) until every part of the genome of the process at the population level. Spiesser et al.
has been replicated. Defects in the cross-talk between (2009) modeled replication in Saccharomyces deter-
CDKs and origin-bound complexes can lead to over- ministically using data for location and firing times of
replication of the genome or a catastrophic entry into a fraction of replication origins. De Moura et al. 2010
mitosis with unreplicated DNA. developed a stochastic model of DNA replication
which was employed for quantitative analysis of the
Mathematical Models of DNA Replication dynamics of replication of chromosome VI of S.
As eukaryotic DNA replication is characterized by cerevisiae, while Yang et al. 2010 presented an ana-
a high degree of uncertainty, both in the location and lytical model of DNA replication with which replica-
in the time of activation of origins along the genome, tion timing across the S. cerevisiae genome was
mathematical models have been employed to capture analyzed. A model to analyze replication fork failure
how DNA replication may be progressing in each cell in metazoan cells was developed by Blow and Ge
in a population and to allow accurate interpretation of 2009. The model assumes stochastic origin activation
experimental data (reviewed in Hyrien and Goldar in a cluster of 5–100 potential origins on a circular
2010). One of the first models to be developed was 250 kb DNA molecule, modeling replication in a series
a stochastic model for DNA replication based on the of discrete time steps.
KJMA model of phase transition kinetics, originally
used to analyze single-molecule data of DNA replica- Uncertainty and Robustness in DNA Replication
tion in cell-free extracts of Xenopus laevis embryos Model predictions and single cell experiments indicate
and further exploited for analyses of DNA replication that random selection and activation of origins of rep-
dynamics (Herrick et al. 2002; Hyrien and Goldar lication results in an exponential distribution of dis-
2010). A stochastic hybrid model of eukaryotic DNA tances between active origins. Such a distribution
replication incorporating exact locations and firing would produce infrequent large inter-origin gaps,
propensities of origins along a complete genome was which would need a long time to be replicated. This
proposed by Lygeros et al. 2008. The model was complication of random origin selection has been
instantiated using experimental data for Schizosac- named the random completion or random gap prob-
charomyces pombe and Monte Carlo simulations lem. Several hypotheses have been proposed to resolve
were used to reproduce full genome replication at the this paradox (reviewed in Legouras et al. 2006;
D 614 DNA Replication
Hyrien and Goldar 2010), including redistribution of the process of DNA replication is organized in time
a limiting factor, defined spacing of active origins, or and space and how it cross-talks with other cellular
origin redundancy. It has also been suggested that processes, such as transcription and the inheritance of
S-phase may in fact last longer than previously epigenetic states.
assumed, occupying much of what is currently thought
of as the G2 phase of the cell cycle (Lygeros et al. 2008).
Given the uncertainty inherent in stochastic origin Cross-References
selection, why have eukaryotes not opted for
a deterministic mode of origin selection? It is likely ▶ Cell Cycle
that stochastic origin selection offers robustness because ▶ Cyclins and Cyclin-dependent Kinases
of redundancy (Legouras et al. 2006): there are many ▶ DNA Polymerases
more origins ready to fire than those actually required to ▶ DNA Replication Licensing
complete S-phase which can be used if need arises. One ▶ Prereplicative Complex
example of this is DNA damage: when cells experience ▶ Replication Fork
DNA damage and active forks arrest, the presence of ▶ Replication Origin
dormant origins becomes essential for the completion of ▶ Replisome
DNA replication (Blow and Ge 2009). Differences in
transcriptional programs in different cell types offers
another example: transcription cross-talks with origin
selection and origin usage changes under different met-
References
abolic conditions or differentiation. The excess in puta- Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P
tive origins may therefore ensure timely completion of (2007) Molecular biology of the cell, 5th edn. Garland
replication under various, often adverse, conditions. Science, New York
Blow JJ, Dutta A (2005) Preventing re-replication of chromo-
somal DNA. Nat Rev Mol Cell Biol 6:476–486
Higher-Order Organization of DNA Replication Blow JJ, Ge XQ (2009) A model for DNA replication showing
Within the Cell Nucleus how dormant origins safeguard against replication fork fail-
We have thus far considered DNA replication as ure. EMBO Rep 10:406–412
a linear process along the DNA. DNA is however de Moura APS, Retkute R, Hawkins M, Nieduszynski CA
(2010) Mathematical modelling of whole chromosome rep-
tightly packaged within the cell nucleus and DNA lication. Nucleic Acids Res 38:5623–5633
replication is topologically organized, providing an DePamphilis ML (2006) DNA replication and human disease.
additional level of regulation. Replication origins CSHL Press, Cold Spring Harbor
appear to fire in clusters (of 6–12 active origins within Gilbert DM (2004) In search of the holy replicator. Nat Rev Mol
Cell Biol 5:848–855
an approximately 1 Mb region) which are co-regulated Herrick J, Jun S, Bechhoefer J, Bensimon A (2002) Kinetic
and are visible within the cell nucleus as replication model of DNA replication in eukaryotic organisms. J Mol
foci (or factories). Such a topological organization Biol 320:741–750
may offer a number of advantages: sequestration of Hyrien O, Goldar A (2010) Mathematical modelling of eukary-
otic DNA replication. Chromosome Res 18:147–161
replication proteins within factories may help increase Legouras I, Xouri G, Dimopoulos S, Lygeros J, Lygerou Z
their local concentration facilitating the kinetics of (2006) DNA replication in the fission yeast: robustness in
DNA replication; co-replication of origins within the face of uncertainty. Yeast 23:951–962
a similar chromatin context may facilitate the inheri- Lygeros J, Koutroumpas K, Dimopoulos S, Legouras I, Kouretas P,
Heichinger C, Nurse P, Lygerou Z (2008) Stochastic hybrid
tance of epigenetic modifications at a given locus; modeling of DNA replication across a complete genome. Proc
local organization in clusters provides the possibility Natl Acad Sci USA 105:12295–300
for differential regulation within a cluster (e.g., local Spiesser TW, Klipp E, Barberis M (2009) A model for the
firing of dormant origins in the vicinity of DNA dam- spatiotemporal organization of DNA replication in Saccha-
romyces cerevisiae. Mol Genet Genomics 282:25–35
age) and outside the cluster (e.g., global inhibition of Yang SC, Rhind N, Bechhoefer J (2010) Modeling genome-wide
replication when DNA damage is present elsewhere in replication kinetics reveals a mechanism for regulation of
the genome). Future work will hopefully elucidate how replication timing. Mol Syst Biol 6:404
Doob–Gillespie Algorithm 615 D
References
DNA Replication Licensing
Maxam AM, Gilbert W (1977) A new method for sequencing
DNA. Biochemistry 74(2):560–564
Zoi Lygerou
School of Medicine, Laboratory of General Biology,
University of Patras, Patras, Greece
DNA Synthesis
Definition D
▶ DNA Replication
DNA replication licensing is the process of
marking origins of replication as competent for
a new round of replication. It takes place in late
mitosis and G1 and consists of the loading of Document Classification/Categorization
the hexameric MCM complex, which will later
act as the replicative helicase, onto origins of replica- ▶ Text Classification
tion. It requires the origin recognition complex
(a six subunit complex which binds to origin
DNA) and the MCM loading factors Cdc6/Cdc18
and Cdt1. Document Retrieval
▶ Information Retrieval
Cross-References
▶ DNA Replication
DOE
▶ Design of Experiments
DNA Sequencing
Synonyms
Domain Type
Deoxyribonucleic acid sequencing
▶ IMGT-ONTOLOGY, DomainType
Definition
Pharmaceutical discovery
▶ Database of Quantitative Cellular Signaling Drug discovery is “the process of identifying chemical
(DOQCS) entities that have the potential to become therapeutic
agents” (Decker and Sausville 2007).
There are two different approaches to drug discov-
ery: empirical and rational. Empirical drug discovery
involves finding a compound that produces a desired
DOR therapeutic effect in vitro. Initially, there is no
understanding of the candidate drug’s mechanism of
▶ Dense Overlapping Regulons action. In rational drug discovery, on the other hand,
the target is known from the beginning; scientists then
attempt to find or design compounds which would
interact with the target of interest (Decker and
Sausville 2007).
Dosage Compensation
▶ Biological Activity
▶ Drug Target
Doubling Time ▶ Natural Product Resources
References
Handorf T, Ebenh€
oh O, Heinrich R (2005) Expanding metabolic
networks: scopes of compounds, robustness, and evolution.
J Mol Evol 61:498–512
Schwartz JM, Nacher JC (2009) Local and global modes
Drug Scope, Metabolic of drug action in biochemical networks. BMC Chem
Biol 9:4
Jean-Marc Schwartz
Manchester Institute of Biotechnology, Faculty of Life
Sciences, University of Manchester, Manchester, UK
Drug Target
Definition
Riza Theresa Batista-Navarro
The metabolic drug scope is an extension of the National Centre for Text Mining, Manchester
concept of scope in metabolic pathways. The scope Interdisciplinary Biocentre, Manchester, UK
of a metabolic compound is the set of all compounds
that can be generated in principle by transformations of
the seed compound, irrespective of kinetic and ther- Synonyms
modynamic laws that determine the rate at which these
transformations might actually take place (Handorf Molecular drug target; Target
et al. 2005). When a set of several seed compounds is
considered, the resulting scope is the set of all
compounds that can be generated by transformations Definition
and combinations of the seeds.
The scope of metabolic compounds is generated A drug target is a macromolecule that undergoes
through an expansion process. The principle used is a specific interaction (e.g., binding, inhibition)
that for any reaction to take place, all necessary sub- with a drug. A target is linked to a specific disease;
strates must be present. Starting from the seed com- the interaction between a drug and a target is
pounds, products generated by metabolic reactions expected to elicit an effect on the course of the
using the seeds are iteratively added, until no further disease.
reaction is possible. There are four main types of drug targets: proteins,
By extension, the metabolic drug scope is the scope polysaccharides, lipids, and nucleic acids. Among
generated by the enzymatic targets of a drug. The them, proteins are considered the best source of drug
metabolic drug scope is constructed by the expansion targets as most drugs have been shown to interact with
of a set of seeds containing the substrates and products them. Proteins can be further divided into seven fam-
of all metabolic reactions targeted by the drug. Essen- ilies: G-protein-coupled receptors, ion channels, pro-
tially, the metabolic drug scope represents the largest tein kinases, zinc metalloproteases, serine proteases,
Dynamic Bayesian Networks 619 D
nuclear hormone receptors, and phosphodiesterases
(Giegel et al. 2007). Drug Targets
Some targets belong to or are associated with path-
ogenic organisms (e.g., bacteria, viruses, and fungi), ▶ Epigenetics, Drug Discovery
which cause infectious diseases (Gies and Landry ▶ Host–Pathogen Systems, Target Discovery
2008). ▶ Pharmacogenomics
Cross-References D
Drugs
▶ Natural Product Resources
▶ Xenobiotics
References
Giegel DA, Lewis AJ, Worland P (2007) Diversity versus focus Duration Analysis
in choosing targets and therapeutic areas. In: Taylor JB,
Triggle DJ (eds) Comprehensive medicinal chemistry II.
Elsevier, Oxford, pp 753–770 ▶ Survival Analysis
Gies J, Landry Y (2008) Molecular drug targets. In: Wermuth ▶ Survival Analysis, Fundamental Statistical
CG (ed) The practice of medicinal chemistry, 3rd edn. Techniques
Elsevier Science and Technology, Amsterdam, pp 85–105
Definition Synonyms
Target of a drug that is not the primary target, which Dynamic conditional independent graphs
may give rise to undesirable pathophysiology leading
to adverse events.
Definition
maximize v4
subject to v1 v2 a1 v4 ¼ 0 (3)
Dynamic Metabolic Flux Analysis, Fig. 1 A generic linear v 2 v 3 a2 v 4 ¼ 0
pathway with feedback inhibition by the product (X2)
Notably, no explicit accounts of metabolite concen-
trations are involved in FBA, and the result exclusively
distribution with methods of MFA or FBA and then to addresses fluxes.
infer the kinetic parameters based on concentration As an alternative, one can construct an ODE
measurements and functional descriptions of individ- model using traditional enzyme kinetics. For instance,
ual reactions. Such a merger of two otherwise distinct assuming that all reactions follow the classic
modeling techniques has the following attraction: not Michaelis–Menten type kinetics and that the inhibition
only is the resulting model more accurate at the level by X2 is competitive, the corresponding ODE model is
of fluxes, but one also gains information on metabo- formulated as:
lite concentrations. This type of merger is greatly
facilitated by using canonical models within the X0 ¼ constant
framework of Biochemical Systems Theory (BST) V1 X0 V2 X1
X_ 1 ¼ v1 v2 ¼
(Savageau 1976; Voit 2000) because mechanistic X0 þ K1 ð1 þ X2 =K4 Þ X1 þ K2 (4)
assumptions regarding the enzymatic processes can V2 X 1 V 3 X2
X_ 2 ¼ v2 v3 ¼
be minimized. X1 þ K2 X2 þ K3
Let us illustrate the idea with a linear pathway
(Fig. 1). Suppose a fixed amount of substrate X0 is Solving Eq. 4 requires knowledge of the initial
converted into product X2 in two steps, with X2 metabolite concentrations and of all kinetic parameters
inhibiting the synthesis of the intermediate X1 through (Vi and Ki). In practice, these are often determined
feedback. using purified enzymes and thus prone to uncertainties
In FBA, the assumption of a metabolic quasi-steady due to the fact that in vivo systems are quite different
state leads to the following flux balance equations: from in vitro experiments. A novel means of parameter
estimation, thanks to the advent of metabolic time-
v1 v2 ¼ 0 series profiles, is to substitute the differentials on the
(1)
v2 v3 ¼ 0 left-hand side of Eq. 4 with estimated slopes at discrete
time points, transform the coupled system of differen-
Equation 1 is underdetermined since the number of tial equations into several sets of decoupled algebraic
fluxes exceeds the number of metabolites (equations) equations, and identify the optimized values of param-
where the steady-state constraints are imposed (i.e., eters via regression (Voit and Almeida 2004). For
X1 and X2). Therefore, infinitely many solutions exist. some metabolic systems, we can even derive the
In this case, a particular solution may be obtained if dynamic flux profiles from the slope data in the first
one of the three fluxes can be directly measured, as it place and then estimate the parameters on a flux basis
might be the case for substrate uptake (v1) or product (Goel et al. 2008).
formation (v3). If none of the three fluxes is measur- In cases where the time-series metabolic profiles are
able, one might be able to make reasonable assump- not accessible, the steady-state flux distributions, as
tions regarding the biomass composition of the system, determined by MFA or FBA, can be valuable for
such as: parameter estimation. The idea is that instead of
using the estimated fluxes at multiple time points
v4
a1 X1 þ a2 X2 ! Biomass; (2) within one experiment, one may slightly perturb the
system many times and record or predict the steady-
where a1 and a2 represent stoichiometric coefficients state responses. With flux and concentration data from
for the two species. Linear optimization can be used to multiple perturbation experiments, the task is again
Dynamic Metabolic Flux Analysis 623 D
reduced to fitting parameters individually for each flux. ln n2
To illustrate the point, it is convenient to model the
linear pathway in Fig. 1 with power-law functions, as
proposed in BST:
1 f2,1
2
X0 ¼ constant 3
X_ 1 ¼ v1 v2 ¼ g1 X 01;0 X 21;2 g2 X 12;1 :
f f f
(5)
X_ 2 ¼ v2 v3 ¼ g2 X12;1 g3 X23;2
f f
ln g2 D
ln X1
In Eq. 5, each flux has two or three unknown param-
eters (rate constants gi and kinetic orders fi,j). Thus, we Dynamic Metabolic Flux Analysis, Fig. 2 Estimation of f2,1
need flux and concentration data at three or more and g2 using data from three observed steady states. The colored
different steady states for parameter estimation. This circles refer to the log-transformed values (ln X1, ln v2) taken
argument is valid, because a flux, say v2, reduces to either from the nominal steady state (1) or from new steady states
where, for instance, the input substrate X0 is increased (2) or the
a linear equation, when we take logarithms of the activity of enzyme catalyzing v3 is decreased (3)
power-law representation:
ln v2 ¼ ln g2 þ f2;1 ln X1 : (6)
We can see that the kinetic order f2,1 is the slope in it was shown that even without a comprehensive set of
the plot of ln v2 versus ln X1, while the logarithm of concentration data, the conversion can still be accom-
the rate constant ln g2 is the y-intercept (Fig. 2). plished through a combination of model reduction and
If experimentally feasible, it is important that enough optimization methods (Lee and Voit 2010).
data are obtained to allow a statistical regression
analysis.
Many experimental strategies are available for arti- Cross-References
ficially driving the pathway of interest to different
metabolic steady states. For instance, one may slightly ▶ 13C Metabolic Flux Analysis
perturb the system, which is initially at a nominal ▶ Flux Balance Analysis
steady state, by changing the amount of substrate fed ▶ Metabolic Flux Analysis
into the pathway. Another approach is to genetically ▶ Metabolic Networks, Structure and Dynamics
modify the activity of pathway enzymes, for instance, ▶ Metabolic Pathway Analysis
through a gene knockdown experiment. In the latter ▶ Ordinary Differential Equation (ODE)
approach, one must also measure the relative enzyme ▶ Optimization Algorithms for Metabolites
activities compared to those in the control to adjust for Production
the bias introduced to the rate constants. ▶ Parameter Estimation, Metabolic Network
Overall, the transformation of steady-state, flux- Modeling
based models into dynamic, kinetics-based models ▶ Pathway Modeling, Metabolic
requires large amounts of concentration measure-
ments, which despite the substantial improvements
that have recently been made in the field of References
metabolomics, remains a challenge. The issue is espe-
Aiba S, Matsuoka M (1979) Identification of metabolic model:
cially significant in plant biology: not only do plants citrate production from glucose by Candida lipolytica.
synthesize a far more diverse array of metabolites than Biotechnol Bioeng 21:1373–1386
do animals and microorganisms, but we also lack Goel G, Chou I-C, Voit EO (2008) System estimation from
metabolic time-series data. Bioinformatics 24:2505–2511
the ability to identify the majority of signals from
Lee Y, Voit EO (2010) Mathematical modeling of
metabolic profile data (Saito and Matsuda 2010). monolignol biosynthesis in Populus xylem. Math Biosci
Nevertheless, even in this complicated case of plants, 228:78–89
D 624 Dynamic Metabolic Networks, k-Cone
Palsson BØ (2006) Systems biology: properties of reconstructed under external perturbations or internal gene dele-
networks. Cambridge University Press, Cambridge tions. In the assumption that linear perturbations
Saito K, Matsuda F (2010) Metabolomics for functional geno-
mics, systems biology, and biotechnology. Annu Rev Plant occur around a metabolic steady state, one can expect
Biol 61:463–489 that even though the perturbation induces changes
Savageau MA (1976) Biochemical systems analysis: a study of in the metabolic concentrations of the network,
function and design in molecular biology. Addison-Wesley, the system recovers its initial metabolic profile. The
Reading
Schaub J, Mauch K, Reuss M (2008) Metabolic flux analysis in dynamical description of this relaxation process is
Escherichia coli by integrating isotopic dynamic and described by:
isotopic stationary 13C labeling data. Biotechnol Bioeng
99:1170–1185 dm 0
Stephanopoulos G (1999) Metabolic fluxes and metabolic engi- ¼J m (1)
neering. Metab Eng 1:1–11 dt
Voit EO (2000) Computational analysis of biochemical systems: 0
a practical guide for biochemists and molecular biologists. where J is a diagonal matrix whose entries are the
Cambridge University Press, Cambridge eigenvalues of the Jacobian matrix while m is a vector
Voit EO, Almeida J (2004) Decoupling dynamical systems for
pathway identification from metabolic profiles. Bioinformat-
that specifies the modes of the system, i.e., the set of
ics 20:1670–1681 metabolites whose concentrations dynamically corre-
Voit EO, Alvarez-Vasquez F, Sims KJ (2004) Analysis of late at each timescale. Timescale distribution is defined
0
dynamic labeling data. Math Biosci 191:83–99 by the negative inverse of J eigenvalues.
Wiechert W (2001) 13C metabolic flux analysis. Metab Eng
Even though this theoretical framework can be
3:195–206
straightforwardly applied to genome-scale metabolic
reconstructions, a major limitation in dynamic model-
ing is the current lack of kinetic information. With
the purpose to overcome this limitation, the k-cone
Dynamic Metabolic Networks, k-Cone has been suggested as an appealing scheme for
exploring dynamic behavior of genome-scale recon-
Isaac F. López-Moyado and structions. This formalism refers to the feasible
Osbaldo Resendis-Antonio space of kinetic parameters that ensures a steady-
Center for Genomic Sciences-UNAM, Universidad state behavior in metabolic networks. From
Nacional Autónoma de México, Cuernavaca, a mathematical point of view, this space is defined
Morelos, Mexico through the next equation:
S diagðCÞ k~ ¼ 0 (2)
Synonyms
where S is the stoichiometric matrix and diag(C)
Feasible parameter space; k-cone space; Modal is a diagonal matrix whose entries are determined
analysis by a function of metabolic concentrations at steady
state (C ¼ Pxi jSij j , where SRij refers to the reactant
R
Dynamic Metabolic Networks, k-Cone, Fig. 1 General over- parameters ensuring a steady state in the metabolic systems
view. (a) The metabolic state of a system can be obtained from a Jacobian library can be calculated. (d) Finally, the modal
metabolome data. (b) From this information and assuming that library is recovered from the diagonalization of the Jacobian
law mass action govern all metabolic reactions in the network, library. The Jacobian and the modal libraries contain informa-
the feasible space of kinetic parameters, the k-cone, can be tion about the timescales and the metabolic pools formed during
calculated. (c) Once defined the feasible set of kinetic the relaxation process, respectively
In general, the temporal behavior of a metabolite In this contextual scheme, the Jacobian matrix con-
concentration is calculated as a balance among the tains information about the topology of the metabolic
metabolic flux of those reactions that contribute to network and the thermodynamics of the reactions
increase and decrease the production and use of (Jamshidi and Palsson 2008).
metabolites. This principle is conserved at genome As we mentioned earlier, the objective of using
scale and the temporal behavior of a metabolic net- the theory of modal analysis is to recover dynamical
work integrated by n reactions and m metabolites is information of the system, in particular (1) how the
given by: metabolism rearranges its constituents while it relaxes
toward a steady state and (2) which are the timescales
d~
x during this process. Even though these questions can
~
¼SV (3)
dt be explored from Eq. 5, modal analysis is a proper
formalism to explore these issues in a more direct
where S denotes the ▶ stoichiometric matrix which is way. Thus, by applying a diagonalization on Eq. 5 we
formed from the stoichiometric coefficients of the obtain:
reactions that comprise a metabolic reconstruction. 0
This matrix is organized in such a way that J ¼ M J M1 (7)
every column corresponds to a reaction and every
row corresponds to a metabolic compound (as exem- Notably, J 0 defined in Eq. 7 is a diagonal matrix
plified by Fig. 2 panel D). In addition, V ~ refers to with the eigenvalues of J ordered descendingly and the
a vector that contains the fluxes of the reactions term M is a matrix whose columns are integrated by the
included in the reconstruction. Here d~ x eigenvectors of J. By substituting Eq. 7 into Eq. 4 we
dt is a vector
with time derivatives of the metabolite concentrations obtain:
and it is equal to zero when the system has reached
a steady state. x0
d~
x 0 ¼ M J 0 M1 ~
¼ J ~ x0 (8)
Here we limit our description to analyze the dt
dynamical metabolic profile when small perturbations
are applied to the metabolic reconstruction. Thus, in d~x0 0
k6 k1 J=S·G
∂Vi
k5 k2 G=
∂xj
k4
B
D
C J = M · J ′· M −1
k3
m = M −1 · x ′
dm
= J ′· m
A V2 + V5 −V1 − V6 dt
B = V1 + V4 −V2 − V3
V3 + V6 −V4 − V5 Supposing steady-state
C
and law of mass action
c A = k 2B + k 5C − (k 1 + k 6)A dx
=S·V=Q·k
B = k 1A + k 4C − (k 2 + k 3)B dt
Q = S · diag (x0)
C = k 6A + k 3B − (k 4 + k 5)C
Q · k = 0 ⇒ null (Q) ⇒ k-cone
V1 k1A k1
k1 k2 k3 k4 k5 k6 V2 k2B k2
−1 −1 V3 k3B k3
1 0 0 1 A V= = k=
S= 1 −1 −1 1 0 0 B V4 k4C k4
0 0 1 −1 −1 1 V5 k5C k5
C
V6 k6A k6
−A B 0 0 C −A
d Q = S · diag (xst) = A −B −B C 0 0
0 0 B −C −C A
k1 0 0
0 k2 0
∂Vi 0 k3 0 −k1 −k6 k2 k5
Gij = = J=S·G= k1 −k2 −k3 k3
∂xj 0 0 k4
0 0 k5 k5 k4 −k4 −k5
k6 0 0
Dynamic Metabolic Networks, k-Cone, Fig. 2 Example of composed of three metabolites which interconvert to each other
the application of the theory of modal analysis and the k-cone with reaction constants ki . (c) Box showing the balance equations
formalism to a hypothetical metabolic network. (a) The box in terms of the fluxes Vi or the metabolite concentrations and
shown in this figure recapitulates the mathematical equations kinetic constants. (d) Box exemplifying the matrices mentioned
described in this essay. (b) A hypothetical metabolic network in the text constructed according to the network from (b)
in systems biology for surveying the mechanism by scales. For instance, in the example depicted in Fig 2,
which cells respond under external perturbations. it can be seen that the knowledge of the kinetic
However, given that a variety of kinetic information constants is needed to obtain the Jacobian matrix J
is unknown in most of the biological systems, this (panel D). With the purpose of overcoming this issue,
formalism has been applied in few cases at genome some databases are emerging (Rojas et al. 2007;
D 628 Dynamic Metabolic Networks, k-Cone
Scheer et al. 2011) that store the numerical values of space let us identify and explore the potential response
some parameters required for analyzing certain biolog- of the metabolic phenotype for a microorganism and
ical circuits at well-defined physiological conditions. allows us to survey how the metabolism coordinately
Although, these databases represent important contri- acts to reach a steady state after one perturbation
butions to enrich the model and experimentally assess occurs. Remarkable, by selecting an ensemble of
its outcomes predictions, there is evidence that the kinetic parameters belonging to the k-cone, one can
numerical values of the kinetic parameters can vary build a library of feasible dynamic behavior and
depending on the physiological context. In fact, there explore the average or most frequent behavior.
is evidence that measurement in vitro and in vivo can Furthermore, this dynamic library can contribute to
differ by some orders of magnitude (Famili et al. classify those parameters that have a high from those
2005). In this contextual scheme, the elaboration of with a low numerical variability for recovering the
alternative procedures that allows us to estimate the steady state.
range of each parameter participating in the reactions In order to apply this framework at the genome
conforming a metabolic network would be of great scale, an important issue is to have a well-defined
value. metabolic profile at a steady state. As described in
In general terms, the temporal evolution of the Fig. 1, the variety of technologies used in metabolome
metabolite concentrations can be founded by solving high-throughput data can fulfill this latter requirement.
Eq. 3, but this requires a complete knowledge of the As we have seen throughout this essay, the combined
~ and the initial conditions, a fact that is
values of S, V, effect of theory of modal analysis and the k-cone
not always fulfilled. However, if we suppose that all formalism supply with a pipeline to explore and survey
the reactions obey law of mass action, Eq. 3 can be the dynamical behavior of genome-scale metabolic
written as: reconstructions. This method is strongly dependent
on quantitative metabolic profile of the organism in
study and independent of a complete or partial knowl-
d~
x
¼ S diagðCÞ k~ (12) edge of the kinetics underlying the metabolic network
dt
(Resendis-Antonio 2009).
The main idea of dynamic modularity comes from the John R. King
need to systematically explain the influence of a simple School of Mathematical Sciences, University of
genetic change or environment perturbation on the Nottingham, Nottingham, UK
behavior of an organism, which is also the ultimate
goal of studying biological networks. However,
research on dynamics of molecular networks remains Synonyms
very challenging even though a huge number of high-
throughput data are available nowadays because the Matched asymptotic expansions; Multiscale and
state space of networks grows exponentially with homogenization methods; Regular and singular pertur-
the number of network components (Alexander bation methods
et al. 2009). So, methods to reduce the complexity of
the analysis are of great interest. Thus, dynamic mod-
ularity appears, which can be used to explore molecu- Definition
lar network dynamics to some extent by two steps.
First, the network is decomposed into smaller building Asymptotic methods comprise a class of systematic
blocks that can be analyzed more easily, among which mathematical procedures that allow the dominant
D 630 Dynamical Systems Theory, Asymptotics and Singular Perturbations
processes operating in a given model to be identified, for for prescribed constants S0 and E0. An essential
the model to be reliably simplified by neglecting those first step in the application of asymptotic techniques
effects that are identified as negligible and for the error is to non-dimensionalize the model, thereby reducing
arising from the simplification to be rather precisely the number of parameters present, identifying the
characterized. The techniques are widely applied in relevant parameter groupings and, most importantly
the study of differential-equation models but are equally in the current context, characterizing the relative
applicable to discrete systems, differential-delay equa- importance of these dimensionless groups and hence
tions, and stochastic models, for example. Typical of the processes that each embodies. The choice of
applications in systems biology include in reducing the non-dimensionalization is somewhat arbitrary, but
complexity of network models (possibly expressing it is often convenient to scale out the initial data, i.e.,
them in modular form) and in integrating across dispa- in the case of Eqs. 1 and 2 to set, having noted that
rate spatial and/or temporal scales in multiscale formu- E ¼ E0 C,
lations. The techniques permit the extraction of
parameter dependencies from, and more extensive ana- ^
S ¼ S0 S; ^
C ¼ E0 C; t ¼ ^t=k1 S0 : (3)
lytical treatment of, the governing models, as well as
complementing their numerical study. This yields
d S^
¼ r 1 C^ S^ þ k1 C^ ;
Characteristics dt (4)
d C^
¼ 1 C^ S^ ðk1 þ k2 Þ C; ^
The numerous general texts available on asymptotic d^t
methods include Murray (1984); Hinch (1991);
Holmes (1995); and Kevorkian and Cole (1996). An with
ordinary differential equation (ODE) model that exem-
plifies many of the key aspects of singular-perturbation S^ ¼ 1; C^ ¼ 0 at ^t ¼ 0: (5)
methods as applied in systems biology is one that is
widely adopted as a deterministic description of the Here the number of parameters has been reduced
enzymatic reactions from five in Eqs. 1 and 2 to three dimensionless
groupings
k1 k2
S þ E Ð C! P þ E
k1 r ¼ E0 =S0 ; k1 ¼ k1 =k1 S0 ; k2 ¼ k2 =k1 S0 ; (6)
in which a substrate S reacts with an enzyme E to form the first of which is a concentration ratio, while the
a complex C that decomposes irreversibly into the other two are ratios of possible timescales. The extent
enzyme and a product P. Using the same notation for to which the number of parameters can be reduced
the concentrations of each of these species then gives depends on the problem; a direct application of the
the ▶ Ordinary Differential Equation (ODE) Buckingham p theorem gives a lower bound on how
many fewer there are in the dimensionless formulation
(and the actual value, two, in the above example),
dS
¼ k1 ES þ k1 C; while the number of variables to be scaled, e.g., three
dt
in the case of Eq. 3, is typically an upper bound.
dE
¼ k1 ES þ ðk1 þ k2 ÞC; (1) If reasonable (at least order of magnitude) estimates
dt
dC are available for each of the parameters, the next step is
¼ k1 ES ðk1 þ k2 ÞC; to determine which of the dimensionless groupings is
dt
small (or large) and hence available for exploitation
in an asymptotic analysis (subtleties can arise in iden-
typically subject to the initial conditions tifying the combination of dimensionless parameters
that can be most effectively exploited in this way – see
S ¼ S0 ; E ¼ E0 ; C ¼ 0 at t ¼ 0 (2) Segel and Slemrod (1989), for example – and
Dynamical Systems Theory, Asymptotics and Singular Perturbations 631 D
distinguished limits, described below, have a role to The reduced problems obtained in the above fash-
play in this regard). In the case of regular perturbation ion are more amenable to analytical solution (as in
problems, a uniformly valid approximation is obtained Eq. 8) than is the full problem, are numerically better
simply by discarding all terms multiplied by the small conditioned (their stiffness having been removed), and
parameter. Singular perturbation problems are more contain fewer parameters (e.g., Eqs. 7 and 9 each
delicate, since different approximations hold on differ- involve only the single grouping k1 þ k2 ), so can
ent scales, thereby capturing the ▶ slow-fast dynam- more readily be fitted to experimental data. It is note-
ics: this can be illustrated by considering the limit of worthy in particular that for such purposes it may not
Eqs. 4 and 5 in which r is small. Then for ^t ¼ O (1) (the be necessary to have any quantitative information D
shortest relevant timescale, corresponding to the inner about the small parameter, beyond that it is indeed
region or boundary layer) the appropriate approxima- small. In the above, only leading order approximations
tion to Eq. 4 is, at leading order in r, have been given – more accurate expressions can be
obtained by expanding the solutions in each region in
dS^0 d C^0 powers of the small parameter and balancing the terms
¼ 0; ¼ 1 C^0 S^0 ðk1 þ k2 ÞC^0 ; (7)
d^t d^t that involve the same power (in more complicated
examples, terms depending on the logarithm of the
so that S^0 ¼ 1 and small parameter may also need to be introduced in
order to match successfully etc.).
The above assumes the problem to contain a single
C^0 ¼ 1 eð1þk1 þk2 Þ^t =ð1 þ k1 þ k2 Þ: (8) small (or large) parameter. In many applications (and
almost invariably in the case of complex systems, such
The outer region sets ^t ¼ t=rk2 (to obtain as are common in the modeling of ▶ gene regulatory
a balance in the first of Eq. 4), S^ ¼ S and C^ ¼ C and networks and of ▶ metabolic and signaling networks)
a different approximation then holds as r ! 0, namely, multiple such parameters are present; typically one of
these would be identified as that in which the expan-
d S0
k2 ¼ ð1 C0 ÞS0 þ k1 C0 ; sions are to be performed and the relative sizes of the
dt others are then characterized by expressing them in
0 ¼ ð1 C0 ÞS0 ðk1 þ k2 ÞC0 ; terms of powers of this small parameter; there can,
however, be ambiguity in how this is accomplished.
whereby the second equation in Eq. 4 is replaced by its Distinguished limits, in which such characterizations
quasi-steady approximation (▶ Flux Balance Analysis, are identified in part on mathematical grounds as giv-
for example) leading to a Michaelis-Menten expres- ing the fullest set of relevant balances within the equa-
sion (a special case of the ▶ Hill Equation): tions, can then be of particular value, as they can also
be (because of their broad validity) when limited infor-
mation is available about the sizes of some of the
d S0 S0 0 ¼ S0
¼ ; C : (9) parameters.
dt k1 þ k2 þ S0 k1 þ k2 þ S0 The discussion above pertains to circumstances in
which the method of matched asymptotic expansions
The initial data on Eq. 8 are provided by the applies. Another broad set of techniques (multiple
matching condition scales, having two-timing as a special case) was origi-
nally developed largely in the analysis of nonlinear
oscillations (whereby rapid oscillations are subject to
lim S0 ðtÞ ¼ lim S^0 ð^tÞ ¼ 1
t!0 ^t!þ1 a slow rate of decay, for example: since the oscillations
persist, the fast and slow scales must be captured for all
(more generally, however, the matching conditions times, rather than the former being relevant only in the
need not correspond to the original initial conditions). boundary layer); here secularity conditions, instead of
The approximations valid on each of these timescales matching, play a central role. This class of techniques is
can be combined into a uniformly valid or composite of particular importance in integrating between scales
approximation. (homogenization – see Mei and Vernescu (2010), for
D 632 Dynamical Systems Theory, Bifurcation Analysis
a b
(x) (x )
λ1 λ2 λ λ
bistability region
c d
(x ) (x )
x2 x1
Dynamical Systems Theory, Bifurcation Analysis, norm or scalar measure of the vector state x. (b) A supercritical
Fig. 1 Schematic bifurcation diagrams depicting pitchfork bifurcation (upper plot) and a transcritical bifurcation
codimension-one local bifurcations. (a) An s-shaped fold featur- (lower plot). (c) A supercritical Hopf bifurcation (upper plot)
ing two saddle-node bifurcations (at parameter values l1 and l 2) and a subcritical Hopf bifurcation (lower plot). Here, a curve
and a consequent parameter interval between these two values in composed of solid circles represents a path of stable limit cycle
which bistability is observed. In this and subsequent panels, solid oscillations, whereas a curve of open circles represents a path of
lines represent paths of stable steady states and dashed lines unstable limit cycles. (d) A representation of a supercritical Hopf
unstable steady states. Also ||x|| represents a characteristic bifurcation in state and parameter space
numerical techniques for analyzing bifurcations in systems theory, stability analysis in general can be
practical examples, principally using the theory of found in Strogatz (1994). For more on numerical tech-
center manifolds and normal forms. That theory can niques for performing parameter continuation and
be seen as a counterpart to more traditional ▶ dynam- bifurcation analysis, see Krauskopf et al. (2007).
ical systems theory, asymptotics, and singular pertur- Rather than reiterate this theory, this entry shall try
bations. The qualitative geometry or topology of to give a qualitative flavor to how bifurcation theory
bifurcations is also stressed by Shilnikov et al. can underlie several key phenomena in biological
(2001). Many examples in biological systems, espe- systems. Specifically treated are threshold behavior,
cially in ▶ reaction-diffusion-advection equations are oscillation, bi- and multi-stability, synchronization
found in Murray (2007), and applications to cell biol- and emergence of collective behavior, before some
ogy in Fall et al. (2002). A more elementary introduc- final remarks on bifurcation theory applied directly to
tion to bifurcation theory and nonlinear dynamical experiments.
Dynamical Systems Theory, Bifurcation Analysis 635 D
Threshold Examples:
It is common in biological systems for a threshold • The ▶ Goodwin oscillator is a simplified model of
concentration of some chemical signal to be there in circadian rhythms based on a negative ▶ feedback
order for certain behavior to be triggered. Thresholds loop.
can often be understood in terms of the phenomenon of • The ▶ repressilator and oscillating network is
excitability, which in itself is a property of a dynamical a simple ▶ network motif constructed by ▶ model-
system with two timescales, where a transient pushes ing and simulation: synthetic models and methods.
the system beyond the region where a large excitation It models a ▶ gene regulatory network composed of
occurs. In systems that reach steady state though, the three genes which mutually repress each other in D
variation of a parameter can cause a global bifurcation sequence according to a ring structure.
that makes the large excitation occur. • Spatiotemporal regulation of extracellular signal-
regulated kinase has been shown to result in rapid
Examples: and sustained nuclear-cytoplasmic oscillations
• Control of calcium oscillations, see for example (Shankaran et al. 2009).
(Sneed and Keener 2008)
• ▶ Cell cycle model analysis, bifurcation theory, Bistability and Multi-stability
which is a canonical example of the practical appli- Bistability refers to the existence of two distinct
cability of bifurcation theory in explaining how attractors to which the system may evolve given
biological decisions occur as emergent bifurcation different initial states or perturbations. A typical bifur-
events upon integrating the various environmental cation scenario in which bistability occurs is via a pair of
and internal parameters ▶ saddle-node bifurcations that are connected in an
S-shaped fold, see Fig. 1a. Such folded structures are
Oscillation often seen as part of the the unfolding of a codimension-
Periodic oscillations are important in biology, two cusp bifurcation point (see Fig. 2a, b) which is one
appearing in diverse areas such as ▶ Circadian of the elementary catastrophes of singularity theory.
rhythms, metabolic networks and their evolution, Bistability also often accompanies subcritical
heart beats, cell signaling, etc. Oscillations can be bifurcations, where an extra fold causes the unstable
described by simple harmonic motion, governed by bifurcating branch to turn around, become stable, and
second-order linear ODEs, but such descriptions suffer coexist with the primary branch, again see Fig. 1.
from several serious limitations. First, they are not Multi-stability refers the situation when there are
robust, as a small amount of damping destroys the more than two competing attractors, each with distinct
oscillation. Second, ▶ oscillation amplitude and basins of attraction, which can occur via a sequence of
phase depends on the initial condition. Finally, the bifurcations, or directly via a global bifurcation, such as
frequency is typically not adaptable. In contrast, stable that caused by a Shilnikov-type homoclinic bifurcation.
limit cycles of nonlinear models are robust, because
following a small ▶ perturbation away from the cycle, Examples:
the system will return to the cycle by itself. Also, if the • The Toggle switch (Gardner et al. 2000) is
dynamics changes a little, a limit cycle will still exist, a synthetic biology construct that matches what is
close to the original one. believed to be a common ▶ network motif in sys-
The canonical way to generate oscillation from tems biology. It is composed of two genes that
a system that is otherwise at rest is via a ▶ Hopf mutually repress each other, which causes
bifurcation. These oscillatory instabilities can occur bistability between steady states in which either
in two ways (see Fig. 1c), either as a ▶ supercritical one gene or the other is expressed at high levels.
bifurcation in which a stable limit cycle is created at • Multi-stability is also seen in unusual cell-division
small amplitude, or as a ▶ subcritical bifurcation, phenotypes in ▶ cell cycle model analysis, bifurca-
often accompanied by bistability which would tion theory and in recent systems biology models of
cause the jump to a fully formed large-amplitude tumors as competing attracting states in cancer
attractor. dynamics (Huang et al. 2009).
D 636 Dynamical Systems Theory, Bifurcation Analysis
a b
l2 nx
solution
1 stable state states
codim 2 l2
cusp point
2 stable bifurcation
states, 1 unstable set
l1 l1
c d
16 600
period decreasing
14
500
12 TB
10 400
8 SNP SN
300
α1
μ2
DH δ = 0.1
6 period increasing
HB 200 δ = 0.01
4 HC δ = 0.001 oscillations
2 100
no oscillations
0 CP SNIPER 0
2 3 4 5 6 7 8 9
−20 −10 0 10 20 0.1 1
μ2 p (μM)
Dynamical Systems Theory, Bifurcation Analysis, saddle-node combine), DH a degenarate Hopf bifurcation
Fig. 2 Examples of two-parameter bifurcation diagrams indi- (a transition between super and subcritical cases), SNIPER rep-
cating parameter regions in which qualitatively distinct behavior resents a Saddle Node of Infinite PERiod bifurcation point
is observed. (a) A cusp bifurcation point linking two curves of (a kind of global bifurcation), SN a saddle node of equilibria,
saddle-node bifurcations. (b) A representation of the cusp in SNP a saddle node of periodic orbits, HB a Hopf bifurcation, and
3D showing a norm of the solution state on the vertical axis. HC a homoclinic bifurcation. (d) Two-parameter bifurcation
(c) Codimension-two bifurcations in a simple model for plateau diagram of an open-cell model for calcium oscillations (Sneed
bursting in excitable systems that arises in the unfolding of and Keener 2008). The lines (for three different values of
a certain degenerate codimension-three bifurcation point; see a parameter d) represent Hopf bifurcation curves in the param-
(Golubitsky et al. 2001). Here CP represents a cusp bifurcation eter plane and separate regions in which oscillations do and do
point, TB a Takens-Bogdanov point (where Hopf and not exist
Example: ▶ Attractor
• Collective oscillations in proliferating bacterial ▶ Bifurcation
population (Danino et al. 2010). ▶ Bifurcation, Supercritical and Subcritical
Dynamical Systems Theory, Delay Differential Equations 637 D
▶ Cell Cycle Model Analysis, Bifurcation Theory Huang S, Ernberg I, Kauffman S (2009) Cancer attractors: a
▶ Circadian Rhythm systems view of tumors from a gene network dynamics
and developmental perspective. Semin Cell Dev Biol
▶ Continuous Model 20(7):869–876
▶ Differential-Difference Equations Krauskopf B, Osinga HM, Galn-Vioque J (2007)
▶ Dynamical Systems Theory, Asymptotics and Numerical continuation methods for dynamical systems:
Singular Perturbations path following and boundary value problems. Springer,
New York
▶ Dynamical Systems Theory, Delay Differential Kuznetsov YA (2004) Elements of applied bifurcation theory,
Equations 3rd edn. Springer, New York
▶ Feedback Regulation Murray JD (2007) Mathematical biology. Springer, New York, D
▶ Gene Regulatory Networks third (in two parts) edition
Shankaran H, Ippolito DL, Chrisler WB, Resat H, Bollinger N,
▶ Goodwin Oscillator Opresko LK, Wiley HS (2009) Rapid and sustained nuclear-
▶ Hopf Bifurcation cytoplasmic erk oscillations induced by epidermal growth
▶ Law of Mass Action factor. Mol Syst Biol 5:332
▶ Limit Cycle Shilnikov LP, Shilnikov AL, Turaev DV, Chua LO
(2001) Methods of qualitative theory in nonlinear dynamics.
▶ Metabolic Networks, Evolution World Scientific, Singapore
▶ Network Motif Sieber J, Gonzalez-Buelga A, Neild SA, Wagg DJ, Krauskopf
▶ Partial Differntial Equations, Numerical Methods B (2008) Experimental continuation of periodic orbits
and Simulations through a fold. Phys Rev Lett 100:244101
Sneed J, Keener J (2008) Mathematical physiology. Springer,
▶ Optimization and Parameter Estimation, Genetic New York, second, in two parts edition
Algorithms Strogatz SH (1994) Nonlinear dynamics and chaos: with appli-
▶ Ordinary Differential Equation (ODE) cations to physics, biology, chemistry, and engineering.
▶ Oscillation Amplitude Westview, Boulder
▶ Partial Differential Equation (PDE), Models
▶ Periodic Oscillation
▶ Perturbation
▶ Reaction-Diffusion-Advection Equation
▶ Repressilator and Oscillating Network Dynamical Systems Theory,
▶ Saddle-Node Bifurcation Delay Differential Equations
▶ Stability
▶ Stability, States and Regions Patrick Nelson
▶ Steady State CCMB 2017 Palmer Commons,
University of Michigan, Ann Arbor, MI, USA
References Synonyms
Danino T, Mondragn-Palomino O, Tsimring L, Hasty J (2010)
A synchronized quorum of genetic clocks. Nature
Differential equations with deviating arguments;
463(7279):326–330 Differential-difference equations; Functional differen-
Fall CP, Marland ES, Wagner JM, Tyson JJ (2002) Computa- tial equations; Time delay
tional cell biology. Springer, New York, In memory of Joel
Keizer
Gardner TS, Cantor CR, Collins JJ (2000) Construction of
a genetic toggle switch in Escherichia coli. Nature
403(6767):339–342 Definition
Golubitsky M, Josic K, Kaper TJ (2001) An unfolding theory
approach to bursting in fast-slow systems. Global Analysis Delay differential equations (DDE) are equations
of Dynamical Systems. Festschrift dedicated to Floris
Takens for his 60th birthday, pp 277–308
whose solution depends on not just a single initial
Hoyle R (2006) Pattern formation: an introduction to methods. condition at time, t ¼ t0, but also on the past history
Cambridge University Press, Cambridge of the system. DDEs can be classified as retarded or
D 638 Dynamical Systems Theory, Delay Differential Equations
Dynamical Systems Theory, Delay Differential Equations, Table 1 Summary of Research on DDEs
Topic Problem formulation Approach Applications
Nonlinear y˙(t) ¼ f(y(t), y(t t), u(t)) Perturbation method with Lambert HIV, chatter
function
P
_ þ Ni¼1 Ai yðt ti Þ þ ByðtÞ ¼ uðtÞ Superposition or modified Lambert
Multiple delays yðtÞ HIV, multiple regenerative effect
function in chatter
Time-varying y˙(t) + A(t)y(t t) + B(t)y(t) ¼ u(t) Floquet theory, Wronskian matrix Milling chatter
with Lambert function
neutral and continuous or discrete. In general, DDEs has been led by Bellman, Cooke, Hale, Kuang,
a discrete delay differential equation can be written as and Stepan (Bellman and Cooke 1963; Stepan 1989;
Hale 1977; Corless et al. 1996). Time delays can be
dn y used to represent self-oscillating systems, economic
¼ f ðt; a0 ðtÞyðtÞ; a1 ðtÞy0 ðtÞ;... aðtÞn1 yn1 ðtÞ;
dtn (1) futures, and in engineering, pure delays are often
b0 ðtÞyðt tÞ. ..bn1 ðtÞyn1 ðt tÞÞ used to ideally represent the effects of transmission,
transportation, and inertial phenomena. They are also
where t represents a point of time in the past history quite popular in biology, where they can be used to
of the equation. Note, when bi ¼ 0 for all i that the model gestation, maturation, transcription, and
system is just an ordinary differential equation. How- numerous cell-cycle phenomena. Delay differential
ever, when bi ¼ 0 for i > 0 then the system is defined as equations constitute basic mathematical models for
a retarded differential equation. If bi ¼
6 0 for any i > 0 the such real phenomena. However, there is a principal
system is defined to be of neutral type. Equation 1 is an difficulty in studying DDEs hidden in their special
example of a DDE with only one time delay, t ¼ t but transcendental character which leads to an infinite
there can be multiple time delays, t1, t2,. . ., tn, in the spectrum of frequencies. In other words, all DDEs
system. Most DDEs in physics, engineering, and biol- will result in an infinite number of eigenvalues.
ogy are of the discrete type with one time delay. Another Hence, they are often solved using numerical methods,
representation for a DDE is defined as a continuous asymptotic solutions, approximations (e.g., Padé)
delay differential equation. and graphical approaches or through the study of
bifurcation of their characteristic equations to
Z 1
dy determine their stability.
¼ f t; yðtÞ; yðt tÞgðtÞdt (2)
dt 0 Characteristic Equation (zeros of the transcenden-
tal equation) The characteristic equation for a discrete
Note that when one considers the distribution time delay equation at steady state, with a single delay
function g(t) to be a gamma distribution (more (( Eq. 1), with bi ¼ 0 for i > 1) can be written as
precisely an Erlang distribution), the system can
reduce to the discrete delay type. P1 ðlÞ þ P2 ðlÞelt ¼ 0 (3)
X
N X
M Analytical Solutions
lt
ai l þ e
i
bi l ¼ 0
i
Finding analytical solutions for DDEs is difficult and
i¼1 i¼1 most of the theory to date focuses on computation
methods for solutions or methods for determining sta-
as the characteristic equation of the system about ys. bility via computation or analytics. However, a group
Then there exists a t∗ > 0 for which ys undergoes at the University of Michigan (Asl and Ulsoy 2003; Yi
a nondegenerate change of stability if and only if the et al. 2007) recently developed an analytic approach,
equation based on the matrix Lambert function, for the complete
1. S(m) ¼ 0 (defined by substituting l ¼ m + in into solution of a system of linear constant coefficient
the characteristic equation, separating the real and DDEs. This method can be applied to study eigenvalue
D 640 Dynamical Systems Theory, Delay Differential Equations
and using the property of the exponential Comparing Eqs. 17 and 18 we note that
Definition
▶ Cell Cycle of Early Frog Embryos The European Bioinformatics Institute, an outstation
of the European Molecular Biology Laboratory
(EMBL; http://www.embl.org), was created in 1992
to respond to the requirements for organizing data
EBI Genome Resources from sequencing efforts, notably those originating
from the genome projects (Brooksbank et al. 2003).
Akos Dobay1 and Maria Pamela Dobay2 Located on the Welcome Trust Genome Campus in
1
Institute of Evolutionary Biology and Environmental Hinxton near Cambridge (UK), the EBI web portal
Studies (IEU), University of Zurich, Zurich, constitutes the European node of the INSD consortium
Switzerland (Brooksbank et al. 2003). Other members are the DDBJ
2
Department of Physics, Ludwig-Maximilians and the NCBI. Information on new entries and updates
University, Munich, Germany are exchanged between these databases on a daily basis.
The EBI provides access to more than 60 databases
containing high-quality annotated information and raw
Synonyms
data that range from nucleotide sequences, whole-
genome projects, to protein sequences, protein struc-
EMBL genome database
tures, and protein interactions. The data can be browsed
and analyzed using various tools hosted on the site, and
Definition that are also available for download. The EBI constitutes
the primary repository in Europe for sequence data.
The European Bioinformatics Institute (EBI; http://
www.ebi.ac.uk) is the main European repository of DNA and RNA Sequence Databases Available
nucleotide sequence data. EBI exchanges the collected from EBI
data with the DNA Data Bank of Japan (DDBJ; http:// The main nucleotide sequence database is the European
www.ddbj.nig.ac.jp) and the National Center for Bio- Nucleotide Archive (ENA). ENA is built from several
technology Information Database (NCBI; http://www. databases: the EMBL Nucleotide Sequence Database
ncbi.nlm.nih.gov) on a daily basis. The three databases (Kulikova et al. 2007) or EMBL-Bank created in early
are part of the International Nucleotide Sequence 1980; the Sequence Read Archive (SRA), a repository
Database (INSD; http://www.insdc.org) consortium, set up in 2009 for raw data obtained from next-
whose function is to ensure the integrity of the shared generation sequencing platforms; and the European
information. Apart from the genome resources, EBI Trace Archive (ETA), composed of raw data from elec-
also has resources for protein sequences and structures, trophoresis-based sequencing machines. The EBI also
as well as access to sequences from patents. provides access to whole genome databases across the
phylum Chordata through the project Ensembl (http:// Microarray Experiment (MIAME) recommendations
www.ensembl.org). Ensembl was launched in 1999 of the international Microarray Gene Expression Data
as a joint collaboration between EBI and the (MGED; http://www.mged.org) Society. The EBI also
Welcome Trust Sanger Institute (Flicek et al. 2011). provides access to patent data resources. These resources
Ensembl Genomes (http://www.ensemblgenomes.org) include the biology-related abstracts of patent applica-
is an extension of the Ensembl project to non-vertebrate tions, chemical compounds available from the Chemical
species such as bacteria, protists, fungi, plants, and Entities of Biological Interest (ChEBI; http://www.ebi.
metazoa. ac.uk/chebi), protein sequences of the European Patent
Office (EPO; http://www.epo.org), the Japan Patent
Protein Sequence Databases and Functional Office (JPO; http://www.jpo.go.jp), the Korean Intellec-
Information Available from EBI tual Property Office (KIPO; http://www.kipo.go.kr)
The Universal Protein Database (UniProt; http://www. and the United States Patent and Trademark Office
uniprot.org), a consortium created in 2002, is a joint (USPTO; http://www.uspto.gov), as well as patent
effort between the EBI, the Swiss Institute of Bioin- equivalents.
formatics (SIB; http://www.isb-sib.ch), and the Protein
Information Resource (PIR; http://pir.georgetown.edu,
The UniProt Consortium 2008). UniProt is comprised
Cross-References
of six different services:
1. The UniProt Knowledgebase (UniProtKB) consists
▶ DDBJ Genome Resources
of the two sections: the UniProtKB/Swiss-Prot for
▶ Metagenomics
curated protein information and the UniProtKB/
▶ NCBI BioProject Genome Resources
TrEMBL for automated annotation.
2. The UniProt Reference Clusters (UniProt/UniRef)
databases, which contain clustered sets of sequences
from the UniProtKB. References
3. The UniProt Archive, a nonredundant protein data-
Brooksbank C, Camon E, Harris MA, Magrane M,
base. Sequence retrieval is done through a unique
Martin MJ, Mulder N, O’Donovan C, Parkinson H,
protein identifier (UPI). Tuli MA, Apweiler R, Birney E, Brazma A, Henrick K,
4. The UniProt Metagenomic and Environmental Lopez R, Stoesser G, Stoehr P, Cameron G (2003) The
Sequences (UniProt/UniMES) database, which is European Bioinformatics Institute’s data resources. Nucl
Acids Res 31:43–50
a repository specifically developed for
Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y,
▶ Metagenomics. Clapham P, Coates G, Fairley S, Fitzgerald S, Gordon L,
5. The UniProtKB-GOA, which provides high-quality Hendrix M, Hourlier T, Johnson N, K€ah€ari A, Keefe D,
gene ontology (GO; http://www.geneontology.org) Keenan S, Kinsella R, Kokocinski F, Kulesha E, Larsson P,
Longden I, McLaren W, Overduin B, Pritchard B, Riat HS,
annotations to proteins in UniProtKB/Swiss-Prot
Rios D, Ritchie GRS, Ruffier M, Schuster M, Sobral D,
and UniProtKB/TrEMBL. Spudich G, Tang AY, Trevanion S, Vandrovcova J,
6. The UniProtKB Sequence/Annotation Version Vilella AJ, White S, Wilder SP, Zadissa A, Zamora J,
Archive (UniSave), which is a repository of Aken BL, Birney E, Cunningham F, Dunham I, Durbin R,
Fernández-Suarez XM, Herrero J, Hubbard TJP, Parker A,
UniProtKB/Swiss-Prot and UniProtKB/TrEMBL Proctor G, Vogel J, Searle SMJ (2011) Ensembl 2011. Nucl
entry versions. InterPro, an integrated documenta- Acids Res 39:D800–D806
tion resource used for the classification and auto- Kulikova T, Akhtar R, Aldebert P, Althorpe N, Andersson M,
matic annotation of proteins and genomes, can be Baldwin A, Bates K, Bhattacharyya S, Bower L, Browne P,
Castro M, Cochrane G, Duggan K, Eberhardt R, Faruque N,
used to add in-depth annotation, including GO terms,
Hoad G, Kanz K, Lee C, Leinonen R, Lin Q, Lombard V,
to protein signatures in UniSave. Lopez R, Lorenc D, McWilliam H, Mukherjee G,
Nardone F, Pastor MPG, Plaister S, Sobhany S, Stoehr P,
Other Databases Available from EBI Vaughan R, Wu D, Zhu W, Apweiler R (2007) EMBL
nucleotide sequence database in 2006. Nucl Acids Res 35:
The ArrayExpress Archive is a database of microarray
D16–D20
data for gene expression and functional genomics in The UniProt Consortium (2008) The universal protein resource
accordance with the Minimum Information About a (UniProt). Nucl Acids Res 36:D190–D195
Ecological Modeling 645 E
content in DNA. Defining organisms in relation to the
ECM embedding context, i.e., their environment including
history, appears as easier than in relation to their constit-
▶ Extracellular Matrix uents. No reconstruction of life from nonliving building
blocks is yet possible. These features also set the stage
for viewing living systems as open with respect to their
Ecological Explanation environment. Since the times of Uexk€ull (Buchanan
2009), the question has been posed but not settled to
▶ Explanation, Functional what extent relationships between organisms and their
environment are appropriately described by mechanisms
or whether the role of life constitutes in turn another E
Ecological Modeling modeling relationship. Hence, proper definitions of
organism, environment, and modeling may turn out to
Michael Hauhs be recursive.
Ecological Modeling, University of Bayreuth, Environmental models (and environmental model-
Bayreuth, Germany ing) can be regarded as a field closely related to eco-
logical modeling. An environment can only be defined
relative to a system or interface of which it serves as
Definition environment. In environmental models, the key role of
organisms is relaxed for characterizing the potential
Ecology studies organisms in their relationship to their behavior of the environment. In this case, abiotic sys-
living and nonliving environment. This relationship can tems serve as a reference for the environment.
either be conceptualized as a function of some internal
structure or as characteristic behavior at the respective
external interface. Models can be used to formalize these Characteristics
options. There are several scientific versions of the
notion of models as exemplified by the usages in math- Ecological Modeling Classically
ematics, physics, or engineering. Nothing has to be Environmental modeling was envisioned by John von
a model and anything can be a model (Mahr 2009), Neumann as a key application area when computing
even an organism, e.g., ▶ model organism. This defini- machinery was introduced. He argued that computing
tion deals with models of organisms or ecosystems. The would be critical in areas such as meteorology where
distinction between conceptual and computational results are relevant for society; they solve numerically
models has been useful. In the following context of a so far unsolvable problem and hence are capable of
▶ artificial life (AL), the discussion will be restricted to demonstrating the power of computing to a wide audi-
models which can be expressed or implemented ence (McGuffie and Henderson-Sellers 2005).
on a computer. Hence, in ecological modeling, the The subsequent success of computer models in opera-
empirical phenomena derived from scientific study as tional weather forecasting (prediction) has aptly demon-
in ecosystem science (on the one hand) or in land-use strated his vision. Computer models transformed weather
traditions as in agriculture, forestry, or nature conserva- prediction from an art into scientifically based technology.
tion (on the other hand) are linked with the theoretical In this area, ▶ process-based models could be shown to
results and engineering potential of computer science. outperform empirical models (McGuffie and Henderson-
In other words, the definition above does not restrict Sellers 2005). The success of environmental models has
ecological modeling neither to a subfield of ecology been used as a blueprint for other modeling approaches,
nor to a subfield of computer science. especially those in the field of ecological modeling or
Organism is the central notion of biology and also climate models (Edwards 2010).
ecology. Characterizing and expressing its key features It is often assumed that computer models can
require biological language and notions. One such key become equally important when studying the phenom-
feature is that all life on earth is embedded into one ena of life. However, this puts the above definition of
single evolution history as indicated by the shared ecological modeling into a dilemma: On the one hand,
E 646 Ecological Modeling
many ecologists regard ecosystems and living systems an interface, e.g., between an ecosystem and its abiotic
almost by definition as ▶ complex systems. Also, the environment. Computers have been routinely used as
atmosphere might be perceived as complex, but unlike (reductionistic) generators of virtual (living) behavior.
large scale geo-systems, the complexity of life has no However, it has not been widely recognized that com-
formalization and appears as such on all scales. Fea- puter science also offers theoretical tools studying the
tures of living systems and especially behavior such as “design of behavior.” Theoretical computer science
reproduction or evolution are often described as the could accordingly be summarized by a corresponding
result of ▶ emergence or interaction (Oyama et al. slogan “Computers, as they could be.”
2000), a behavior towards the environment. Further,
symptoms of unsettled theoretical and conceptual Ecological Modeling for Artificial Life
issues in biology can be found under the notion of An area where such a link seems natural, but has also
holism; even the notion and meaning of causality is not been fully established so far, is Artificial Life
still contested (Laland et al. 2012). research. (▶ Artificial Life) has constituted itself with
On the other hand, each computer program runs on the slogan “Life as it could be” (Langton 1989), in the
a mathematical machine, and any ecological models sense of, could be artificially generated. Models in AL
implement implicitly a formal approach to ecology, and have been focused at the reductionist approach in
any ▶ artificial life model implements a formal approach which living behavior is sought to be generated artifi-
to biology. This implied theory can be seen as a hypothet- cially from abstract building blocks. The commentary
ical core to a theory about life and can be interpreted as of the results as “something is missing” by Brooks
reductionism. The goal of reductionist ecological and (2001) still largely holds. AL seems to share with
much of AL modeling is to follow the successful exam- ecology the strange relationship between methodolog-
ples set by environmental modeling above. This, however, ical rigor and practical relevance. Using this analogy
implies that ecological and AL models attempt serving would imply focusing in a parallel manner on attempts
holism and reductionism at the same time. “formalizing holism,” i.e., studying life as virtual
Models in ecology appear as either practically (rather than artificial) behavior with methods from
(empirically) relevant or as methodological rigorous computer science. So far, however, there has been little
but rarely both at the same time (Peters 1991). This contact in the research agendas of AL and ecological
leaves the field of ecological modeling with two oppo- modeling as indicated by the few respective entries in
site points of orientation: mainly empirical with AL or ecological modeling conferences.
a pragmatic focus on usefulness or mainly theoreti- Rosen proposed to formalize modeling relations
cal/speculative while remaining untestable. between organisms and environment in categorical lan-
A promising modeling approach to the ability of guage (Rosen 1991). In the language of (▶ Mathematical
organisms of behaving strategically and deciding due Structure, Category) theory, there are two modeling par-
to an actual environment is ▶ agent based modeling. adigms available which encompass the above dilemma:
This modeling technique is a new rapidly expanding Firstly, a functional modeling paradigm which has
field not only of ecological modeling (Grimm and been adopted from physics and which provides
Railsbeck 2005) but also in other fields such as sociol- a constructive interpretation to the slogan: “Life as it
ogy or economy. It is, however, still characterized could be artificially and symbolically constructed on
by an unsettled relationship to theory (O’Sullivan a computer.” It derives the environmental relationship
and Haklay 2000) and a strained relationship with of life as a function of state. Secondly, there is an inter-
empirical studies (Clifford 2007). active modeling paradigm which characterizes life at
Within the life sciences, ecological modeling has the interface with its environment as emergent behavior.
been perceived as an applied science. The theoretical It can be adopted from computer science and provides
background of the models being applied stems from a classificatory interpretation to the slogan: “Life as
physics. In such a reductionist perspective, the rela- it could behave virtually on a computer.” In the related
tionship of an organism or ecosystem with an environ- field of computational intelligence, the latter approach
ment is modeled as a function of state (e.g., implied by has become famous by the so-called Turing test.
a natural law). In a holistic perspective, the relation- A correspondent behavioral classification of life has
ship can be modeled as emergent behavior observed at been proposed (Cronin et al. 2006).
Edge Betweenness Centrality 647 E
Since the behavior-based definition of intelligence Mahr B (2009) Die informatik und die logik der modelle.
by A. Turing, theoretical computer science has come Informatik Spektrum 32(3):228–249
McGuffie K, Henderson-Sellers A (2005) A climate modelling
up with a unifying approach of formalizing behavior of primer. Wiley, London
systems (Rutten 2000). Categorically, this approach O’Sullivan D, Haklay M (2000) Agent-based models and
can be abstracted as (▶ Coalgebra). The two modeling individualism: is the world agent-based? Environ Plann
approaches above are categorical duals of each other. A 32:1409–1425
Oyama S, Griffiths PE, Gray RD (2000) Cycles of contingency –
However, only one of them has dominated the fields developmental systems and evolution. MIT Press,
of AL and ecological modeling so far, i.e., the former Cambridge
physical one. The second approach exists as a theoret- Peters RH (1991) A critique for ecology. University Press,
ical possibility: In this perspective, the relationships of Cambridge
living system to their context and how they behave
Rosen R (1991) Life itself – a comprehensive inquire into the E
nature, origin, and fabrication of life. Columbia University
towards their environment become the defining fea- Press, New York
tures of life. The corresponding states observable at Rutten JJMM (2000) Universal coalgebra: a theory of systems.
interfaces are the mere (epistemic) signatures of this Theor Comp Sci 249(1):3–80
behavior rather than their material causes.
Edge Betweenness
Cross-References ▶ Edge Betweenness Centrality
▶ Agent-Based Modeling
▶ Artificial Life Edge Betweenness Centrality
▶ Coalgebra
▶ Complex System Long Jason Lu1 and Minlu Zhang2
▶ Emergence 1
Division of Biomedical Informatics, Cincinnati
▶ Mathematical Structure, Category Children’s Hospital Research Foundation, Cincinnati,
▶ Model Organism OH, USA
▶ Process-based Model 2
Department of Computer Science, University of
Cincinnati, Cincinnati, OH, USA
References
Synonyms
Brooks R (2001) The relationship between matter and life.
Nature 409:409–411 Edge betweenness
Buchanan B (2009) Onto-ethologies: the animal environments
of Uexkull, Heidegger, Merleau-Ponty, and Deleuze. State
University of New York Press, New York
Clifford NJ (2007) Models in geography revisited. Geoforum Definition
39:675–686
Cronin L, Krasnoger N, Davis BD, Alexander C, Robertson N,
The edge betweenness centrality is defined as the num-
Steinke JHG, Schroeder SLM, Cooper G, Gardner PM,
Kholbystov AN, Siepmann P, Whitaker BJ, Marsh D (2006) ber of the shortest paths that go through an edge in
The imitation game – a computational chemical approach to a graph or network (Girvan and Newman 2002). Each
recognizing life. Nat Biotechnol 24(10):4 edge in the network can be associated with an
Edwards PN (2010) A vast machine: computer models, climate
edge betweenness centrality value. An edge with
data, and the politics of global warming. MIT Press,
Cambridge a high edge betweenness centrality score represents
Grimm V, Railsbeck SF (2005) Individual-based modeling and a bridge-like connector between two parts of a net-
ecology. Princeton University Press, Princeton work, and the removal of which may affect the com-
Laland KN, Sterelny K, Odling-Smee J, Hoppitt W, Uller T (2012)
munication between many pairs of nodes through the
Cause and effect in biology revisited: is Mayr’s proximate-
ultimate dichotomy still useful? Science 334:1512–1516 shortest paths between them. Figure 1 illustrates an
Langton C (1989) Artificial life. Addison-Wesley, Reading example of eight nodes in a network, and the red
E 648 Edge Ontology
Definition
References
Efficiency is a property of experimental design. It is
Girvan M, Newman ME (2002) Community structure in social
and biological networks. Proc Natl Acad Sci USA inversly proportional to the variance of the data-
99(12):7821–7826 derived estimates of the unknowns. Smaller variation
leads to higher efficiency.
Synonyms EFM
Definition
Eigen Decomposition
Inspired by ▶ gene ontology, edge ontology provides
a hierarchical vocabulary of terms for describing ▶ Principal Component Analysis (PCA)
Electronic Health Record Systems 649 E
by a given linear transformation, the prefix “eigen”
Eigenvalue is used, as in eigenfunction, eigenmode, eigenface,
eigenstate, and eigenfrequency. Eigenvalues and eigen-
Tianshou Zhou vectors have many applications in both pure and applied
School of Mathematics and Computational Sciences, mathematics. They are used in matrix factorization, in
Sun Yet-Sen University, Guangzhou, quantum mechanics, and in many other areas.
Guangdong, China
interoperability between systems cannot be guaranteed Grimson J, Grimson W, Berry D, Stephens G, Felton E, Kalra D
(Iakovidis et al. 2007). In that respect, the combination of et al (1998) A CORBA-based integration of distributed elec-
tronic healthcare records using the synapses approach. IEEE
standardized terminologies (e.g., ▶ ontologies) together Trans Inf Technol Biomed 2(3):124–138
with domain concept definitions (i.e., ▶ archetypes, as Grimson J (2001) Delivering the electronic healthcare record for
described above) will raise the level of interoperability. the 21st century. Int J Med Inform 64:111–127
This leads to semantic interoperability (▶ Semantic Garde S, Hovenga E, Buck J, Knaup P (2007) Expressing clinical
data sets with openehr archetypes: a solid basis for ubiquitous
Web, Interoperability), which ensures that the informa- computing. Int J Med Inform 76(3):334–341
tion being exchanged is computer-processable. For that Hull R, Zhou G (1996) A framework for supporting data inte-
reason, international efforts (Eichelberg et al. 2005), like gration using the materialized and virtual approaches. ACM
CEN’s 13606 and HL7’s RIMv3, aim to standardize how SIGMOD Record 25(2):481–492
Iakovidis I, Dogac A, Purcarea O, Comyn G, Laleci GB
domain concepts (i.e., ▶ Archetypes, ▶ Templates) (2007) Interoperability of eHealth systems – selection of
should be defined in order to achieve this level of recent EU’s research programme developments. In: Proceed-
interoperability. ings of the international conference eHealth: combining
health telematics, telemedicine, biomedical engineering and
bioinformatics to the edge, Berlin
Secondary Uses of Electronic Health Records IOM (2001) Commitee on quality of healthcare in america
In addition to improving health care delivery, electronic institute of medicine. Crossing the quality chasm: A new
health records also hold the promise of accelerating the health system for the 21st century. National Academic Press
outcomes and efficiency of biomedical research. The Kohn LT, Corrigan JM, Donaldson MS (eds) (2000) To err is
human: building a safer health system. National Academic
aggregation of vast amounts of patient-level information Press, Washington, DC
can, for example, be the basis upon which to test models/ Kohl P, Noble D (2009) Systems biology and the virtual physi-
theories about disease mechanisms and treatment out- ological human. Mol Syst Biol 5:292
comes, as is done in systems biology and the virtual Noy NF (2004) Semantic integration: a survey of ontology-based
approaches. ACM SIGMOD Record 33(4):65–70, Special
physiological human (Kohl and Noble 2009), or decrease section on semantic integration
patient recruitment cycle times for clinical trials. Price Water House Coopers (2007) The economics of IT and
The realization of this secondary use requires that hospital performance. Price-Water House Coopers’ Health
electronic health records are in wide spread use. In Research Institute, Boston
Sheth AP, Larson JA (1990) Federated database systems for
addition, the level of ▶ interoperability between these managing distributed, heterogeneous, and autonomous data-
systems needs to be sufficiently high as to facilitate the bases. ACM Comput Surv 22(3):183–236, Special issue on
integration of related sources, in order to efficiently heterogeneous databases
exploit this wealth of information. Weed LL (1968) Medical records that guide and teach. New Eng
J Med 278(11): 593–599 and 278(12):652–657. Published
also in MD Comput 10(2):100–114, March–April 1993
References
Elongation Complex
Elementary Mode
▶ Transcription Elongation Complex
Sang Yup Lee E
Department of Chemical and Biomolecular
Engineering and Department of Bio and Brain
Engineering, Korea Advanced Institute of Science and Elongation Factor
Technology (KAIST), Daejeon, Korea
▶ Transcription Elongation Factor
Synonyms
▶ Translation Elongation
Definition
Elementary modes are a minimal set of pathways and EMBL Genome Database
all possible steady-state flux distributions that the net-
work inherently can achieve (Schuster et al. 2002). ▶ EBI Genome Resources
This set of vectors can be calculated from the
stoichiometric matrix of a biochemical network
using convex analysis, and there exists a unique
set of elementary modes for a given network. Embryological Explanation
1. Each elementary mode is a minimal set of pathways
that consists of the minimum number of reactions ▶ Explanation, Developmental
necessary to exist as a functional unit. This property
is called “non-decomposability” or “genetic inde-
pendence.” If any reaction in an elementary mode is
eliminated, the resulting elementary mode cannot Embryonic Induction
operate as a functional unit.
2. The elementary modes are the set of all routes Philippe Huneman
through a given network corresponding with Institut d’Histoire et de Philosophie (IHPST),
property. des Sciences et des Techniques, Université Paris 1
Panthéon-Sorbonne, Paris, France
References Definition
Schuster S, Hilgetag C, Woods JH, Fell DA (2002) Reaction
routes in biochemical reaction systems: algebraic properties,
Before the rise of genetics, Hans Spemann in 1901
validated calculation procedure and example from nucleotide already emphasized the role of inducers at the tissular
metabolism. J Math Biol 45(2):153–181 scale, a role which can be played by many chemical
E 654 Emergence
substances, as Joseph Needham recognized later (Order Jacobson A, Sater A (1988) Features of embryonic induction.
and life), leading to the “paradox of an apparently Development 104:341–359
Kuroda H, Wessely O, De Robertis E (2004) Neural induction in
nonspecific stimulus eliciting a specific developmental Xenopus: requirement for Ectodermal and Endomesodermal
response” (Jacobson and Sater 1988, p. 341). Inducers signals via Chordin, Noggin, b-Catenin, and Cerberus.
trigger the development of sets of cells, morphogenetic PlosBio 2(5):623–634
fields, likely to develop into specific cell fates. The
neural crest, for example, is a transitory structure which
among other things gives rise to the nervous system.
Morphogenetic fields can until some stage be induced
into other fates by proper induction as it has been shown Emergence
by Spemann and Mangold’s sea urchin experiments.
This proves the plasticity proper to early development. Ulrich Krohs
The experimental investigation of the nature of Department of Philosophy, University of M€unster,
inducers as well as the relation between morphogenetic M€unster, Germany
fields and organizers has been a main endeavor
of embryology in the pre-molecular biology era. Cur-
rent developmental theory investigates the genetic Definition
mechanisms of induction in terms of transcription
factors, etc., and the developmental genes, such as Emergence is the appearance of novel properties,
homeogenes, underpinning morphogenetic fields structures, patterns, and processes in a system, which
(De Robertis et al. 1991). are absent from the isolated components of the system.
Inducers cannot induce anything except in Concepts of emergence vary in demanding the novel
a “competent” tissue, that is, a tissue which is already features also being unpredictable from knowledge of
prepared to deal with them and respond to them. This has the components and their possible interactions (weak
been again shown about the induction of the central emergence), or being irreducible (▶ Reduction) to
nervous system, which actually involves more requisites properties of the components (strong emergence).
than the sole induction by a specific center in the meso- Just novelty, without unpredictability and irreducibil-
derm – as one usually thought – but also signals ity, is sometimes called nominal emergence.
expressed in the ectoderm already from the blastula
stage, without which neural induction is inefficient Characteristics
(Kuroda et al. 2004). Therefore, developmental pro-
cesses require both induction and “competence” Taxonomy and History
(as a capacity to respond to specific signals) (Dawid The concept of emergence captures the view that
2004), which is a modern version of the complementarity the whole is more than the sum of its parts. When
between epigenesis (inducers) and preformism (compe- G.H. Lewes introduced the term in 1875, it was linked
tence) as two poles in developmental explanations. to the view of irreducibility of the systems level to the
level of components. This metaphysical view of emer-
gence was most influentially put forward by Broad
(1923) and, in a less strict version, by Alexander
Cross-References
(1920) (see McLaughlin 1992). Today, the concept
comes differentiated into several more specific ver-
▶ Explanation, Developmental
sions (see Bedau 2003), not all of which require irre-
ducibility. The concept of emergence has thus lost its
strong metaphysical implications and is used today
References even within strictly physicalist frameworks.
is perhaps determined by, the behavior of the compo- with theory reduction. Nothing with regard to the
nents. Despite this determination, the relation is not emergent dynamics or level needs to remain
a causal one. It is constitutional instead: The compo- unexplained, and no other than low-level entities are
nents in their relation to each other constitute the assumed to make up the system.
system. Their behavior is at the same time part of the
behavior of the system. To test this intuition, we may
adopt a view of causality as transfer of energy or other Cross-References
conserved parameters. We see immediately that the
components cannot transfer energy to the system of ▶ Causality
which they are proper parts, since their energy is ▶ Complexity
already included in the energy of the system. So the ▶ Mechanism
relation is not causal. Causal processes are diachronic, ▶ Reduction
constitution is synchronic. “Upward causality” is syn- ▶ Supervenience
chronic and thus constitution rather than diachronic
causality. Analogously, a causal relation of “down-
ward causation” does not exist, though obviously References
a part behaves differently whether it is isolated or
integrated in a system. However, the system does not Alexander S (1920) Space, time, and deitiy. Macmillan, London
Bedau MA (2003) Downward causation and autonomy in weak
transfer energy to its parts; instead, parts exchange
emergence. Princ Rev Int Epistemol 6:5–50
energy with other parts. The system merely constrains Broad CD (1923) Scientific thought. Humanities Press,
what the parts can do – and even this is not the precise New York
description of what is going on. In fact, the interacting Hooker C (2011) Conceptualizing reduction, emergence and
self-organization in complex dynamical systems. In:
parts constrain each other – which is most conveniently
Hooker C (ed) Philosophy of complex systems. Elsevier,
described as systems behavior. ‘Consisting of parts’ Amsterdam, pp 195–222
and ‘being constituted by’ are synchronic relations, McLaughlin B (1992) The rise and fall of British emergentism.
while the influence between the components is In: Beckermann A, Flohr H, Kim J (eds) Emergence or
reduction? Essays in the prospects of nonnaturalist physical-
diachronic. Thus, only the latter is a causal relation.
ism. de Gruyter, Berlin, pp 49–93
Emergence is not a diachronic, causal, but a synchronic, Stephan A (1997) Armchair arguments against emergentism.
constitutional relation. This does not rule out that Erkenntnis 46:305–314
emergent properties may have causal effects, e.g., on
other high-level systems (Stephan 1997).
APC / APC /
Srw1 Slp1
S-phase cyclin is regulated by a transcription factor while its inhibitor Rum1 is high which allows licensing
complex (MBF), which also assists the activation of the of replication origins in G1. The increasing level of
licensing factor, Cdc18, in fission yeast. The degradation Cdc2/Cig2 complex downregulates Rum1 after
of Cdc18 is promoted by CDK-dependent phosphoryla- reaching a threshold and S phase is initiated. However,
tion. Both Cdc2/Cdc13 and Cdc2/Cig2 control the activ- the high Cig2 level cannot be maintained because
ity of their regulators through feedback loops (Fig. 1). Cdc2/Cig2 turns off its own transcription factor
Depletion of Cdc10 (an MBF subunit) blocks the cell in (MBF). This effect results in a decrease of Cig2 level
G1 phase, while cdc18D enters into mitosis without in G2 phase and makes possible the re-accumulation of
DNA replication resulting in a “cut” phenotype Rum1 in the absence of Cdc2/Cdc13. The rise of
(Nishitani and Nurse 1995; Sveiczer et al. 2004; Morgan Rum1/Cig2 ratio resets the cells back to G1 where
2006). relicensing of origins can take place. The boundary
between G2 and G1 phases are experimentally not
Endoreduplication defined yet, but multiple rounds of DNA replication
The first example for endoreduplication in fission yeast are well separated (Hayles et al. 1994; Novak and
came during studying Cdc2 kinase, the catalytic subunit Tyson 1997; Kiang et al. 2009). Cdc13D cells undergo
of both S- and M-phase CDKs (Broek et al. 1991). periodic relicensing shown by cdc13Dcdc10ts and
Cdc2ts mutants blocked in post-replicative G2 phase of cdc13Drum1D double mutants. Cdc13D cells are
the cycle were nitrogen starved and heat-shocked (49 C) arrested in G1 after depletion of the MBF component,
for a short time in order to increase protein degradation. because the synthesis of both the licensing factor and
When cells were released from the cdc2ts block, they Cig2 is blocked. In contrast, Rum1 depletion stops the
were re-replicating their DNA rather than entering into endoreduplication in G2. The Cdc13DCig2D double
mitosis. This suggested that cells “forgot” their cell mutant cannot re-replicate either because Cig2 is
cycle stage as a consequence of these treatments. Later required to initiate S phase (Hayles et al. 1994;
experiments have shown that the “memory” molecule Novak and Tyson 1997; Kiang et al. 2009).
lost is the Cdc2/Cdc13 complex responsible for mitotic Overexpression of the CDK inhibitor (Rum1) leads
initiation and progression. When the Cdc13 expression to a phenotype similar to cdc13D (Moreno and Nurse
was turned off by germinating spores deleted for cdc13 1994). After upregulation of Rum1 level, the first G1
(cdc13D), cells underwent through multiple round of phase is much longer than in wild-type, because the
DNA replications (Hayles et al. 1994). stoichiometric inhibitor has to be eliminated before the
In the absence of Cdc2/Cdc13, cells cannot enter cell can enter into S phase. Rum1 inhibits both Cdc2/
into mitosis but the activity of the S-phase CDK (Cdc2/ Cig2 and Cdc2/Cdc13 complexes, but it is a weaker
Cig2) oscillates and drives G1-S-G2-G1 endoredu- inhibitor for the S-phase CDK than for the Cdc2/
plication cycles. The Cig2 cyclin level is low in G1, Cdc13. Therefore, the high Rum1 level creates
Endoreplication 659 E
a Endoreduplication b Re-replication
cdc25ts block & release with cdc13D cdc25ts block & release with cdc18OP
1.2
1.4
Cdc2 / Cdc13, Cdc13T, Rum1
0.8 1.0
0.8
0.6
0.6
0.4
0.4
0.2
0.2 E
0.0 4 0.0 100
1.4 M G1 S G2 G1 S G2 G1 S M G1 S
2.0
1.2 80
3
Cdc2 / Cig2, Cig2T
Cdc18
Cdc18
0.8
2
1.0
0.6 40
0.4 1 0.5 20
0.2
0.0 0 0.0 0
0 100 200 300 400 0 100 200 300 400
time (min) time (min)
Endoreplication, Fig. 2 Time course results of two different alternating without mitosis resulting in the proper doubling of
types of over-replication. The relative activity/level of the main the DNA content. (b) Re-replication. Fission yeast cells are
regulatory components are plotted on the “y” axis. The particular blocked and released from cdc25ts then Cdc18 licensing factor
cell cycle phases are indicated on the figure. (a) Endoredu- was overexpressed. The cells enter S phase and have
plication. Fission yeast cells are blocked and released from a continuous DNA replication without any relicensing period
cdc25ts then Cdc13 was deleted from the G1-S-G2 phases are
a complete block of mitosis while Cdc2/Cig2 can be and over again. This continuous DNA replication does
activated periodically and triggers S phases (Moreno not require any Cdc10 function (cdc18OPcdc10ts double
and Nurse 1994; Novak and Tyson 1997). mutant re-replicates) suggesting that cells are not pass-
ing through Start during re-replication. Even more
Re-replication Cdc18-driven over-replication does not depend on any
Overexpression of the Cdc18 licensing factor results in further protein synthesis (Cdc18 overexpression
another type of over-replication called re-replication. followed by cycloheximide addition does not block re-
Cdc18 plays an important role in the initiation of DNA replication) (Nishitani and Nurse 1995).
replication and its protein level is peaking during
S phase. Overproduction of Cdc18 protein is able to Controlling Over-Replication in Higher Eukaryotes
spoil the control of origin firing. Upregulation of Higher eukaryotes have some extra pathways which
Cdc18 level by thiamine repressible nmt1 promoter have to work coordinately to block more than one
blocks mitosis and cells show enlarged nucleus DNA replication per cycle. A protein called geminin
(Fig. 2b). Cdc18-induced over-replication is clearly dif- and the Cul4-Ddb1Cdt2 pathway can suppress over-
ferent than endoreduplication. The FACS profiles are replication by inhibiting the licensing factors. Emi1
much more smeared referring to incomplete rounds of directly can regulate APC activity negatively. How-
DNA replications or multiple replication initiations at ever, the essential element of the system is again CDK,
origins rather than G1-S-G2-G1 type oscillation. The and its level clearly defines that the cell could go to
large amount of Cdc18 keeps the cell in the S phase and M phase or should be blocked in an over-replication
promotes the re-initiation of the replication origins over state (Arias and Walter 2007; Porter 2008).
E 660 Enhancer
Definition Definition
An enhancer is a short segment of DNA that can be An ensemble is a collection of learning models
bound by proteins (transcription activators) to enhance (▶ Model Validation, Machine Learning), whose
Ensemble 661 E
Ensemble, Fig. 1 Schematic
representation of ensembles Combination
Dataset
outputs are combined to form the output of the attribute (in the case of prediction tasks), by injecting E
ensemble. Ensembles can be used for several randomness into the base learning algorithms, or
machine learning tasks, such as predicting class labels by a combination of several of these techniques.
(▶ Classification), predicting numeric outcomes Many different ensemble learning techniques have
(▶ Regression), or ▶ clustering. Therefore, the indi- been proposed. The most well-known include
vidual models can be combined in several ways. ▶ bagging (Breiman 1996) and boosting (Freund and
Shapire 1997), which both manipulate the training
examples.
Characteristics Ensemble classifiers have been applied in several
bioinformatics domains, such as protein folding (Shen
An ensemble learning algorithm constructs a set of base and Chou 2006), cancer classification (Tan and Gilbert
learners and produces an output by combining the out- 2003), and gene function prediction (Guan et al. 2008).
puts of these base learners, by Fig. 1. While ensembles
have been proposed for unsupervised learning Cross-References
(▶ Learning, Unsupervised) tasks, such as ▶ clustering,
they are especially popular for supervised learning ▶ Bagging
(▶ Learning, Supervised) problems, where individual ▶ Classification
predictions can be combined by taking a (weighted or ▶ Clustering
unweighted) vote for ▶ classification tasks, and by ▶ Learning, Supervised
taking the average in the case of ▶ regression. ▶ Learning, Unsupervised
Ensemble techniques are popular prediction methods ▶ Model Testing, Machine Learning
because of their excellent generalization performance. ▶ Regression
A necessary and sufficient condition for an ensemble of
classifiers to be more accurate than any of its individual
References
members is that the classifiers are accurate and diverse
(Hansen and Salamon 1990). An accurate classifier does Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
better than random guessing on new examples. Two Dietterich TG (1997) Machine-learning research. AI Magazine
classifiers are diverse if they make different errors on 18(4):97–136
Freund T, Shapire RE (1997) A decision-theoretic generalization
new data points. The underlying principle is that these
of online learning and an application to boosting. J Comput
uncorrelated errors get corrected by the voting or aver- Syst Sci 55(1):119–139
aging procedure. Dietterich (1997) gives a detailed Guan Y, Myers CL, Hess DC, Barutcuoglu Z, Caudy AA,
explanation why ensembles can improve the predictive Troyanskaya OG (2008) Predicting gene function in a hier-
archical context with an ensemble of classifiers. Genome
performance of their base classifiers.
Biol 9(Suppl 1):S3
Ensembles can be constructed either by using Hansen L, Salamon P (1990) Neural network ensembles. IEEE
a different learning algorithm in each iteration, or by Trans Patt Anal Mach Intell 12:993–1001
using the same algorithm, but training it in different Shen HB, Chou KC (2006) Ensemble classifier for protein fold
pattern recognition. Bioinformatics 22(14):1717–1722
ways. The latter can be accomplished by introducing
Tan AC, Gilbert D (2003) Ensemble machine learning on gene
some diversity, e.g., by manipulating the training set, expression data for cancer classification. Appl Bioinform
by manipulating the descriptive attributes or the target 2(3):S75–S83
E 662 Entity Disambiguation
Cross-References
Entity Disambiguation
▶ Named Entity Recognition
▶ Term Disambiguation, Text Mining ▶ Text Mining
▶ Text Mining, Tools
▶ Word Sense Disambiguation
Entity Identification
Entrez
▶ Entity Mention Normalization
Jingky Lozano-K€uhne
Department of Public Health, University of Oxford,
Oxford, UK
X
HðX; Y Þ ¼ pðx; yÞlog pðx; yÞ
ðx;yÞ2XY
Entropy Reduction
Conditional Entropy: For joint probabilistic vari-
▶ Information
ables X and Y, one defines the conditional entropy of
Y given a particular outcome x by:
X
H ðYjX ¼ xÞ ¼ pðyjxÞlog pðyjxÞ: (2) Entry into Mitosis
y2Y
Cross-References
▶ Fluorescence Microscopy
Environmental Genome
▶ Metagenome
Epifluorescence Microscopy
Xiaohua Wu
Department of Pediatrics, Herman B Wells Center for
Environmental Genomics Pediatric Research, Indiana University School of
Medicine, Indianapolis, IN, USA
▶ Metagenomics
Definition
Environmental Proteomics
Cross-References
▶ Metaproteomics
▶ Spectroscopy and Spectromicroscopy
Epigenetics,
Fig. 1 Diagrammatic
representation of packaging of Chromosome*
DNA in the nucleus. Histone H1
Chromatin as shown
represents the “Bead on Nucleus Histone H2A
Cell
a string” structure. *The
chromosome shown is the Histone H2B
structure chromosomes
assume during cell division Chromatin Histone H3
and are referred to as
metaphase chromosomes. The Histone H4
tail of histone H4 is marked, as Nucleosome
shown, and similar extensions
called histone tails are present
in all the histone monomers Histone Octamer + DNA
and these are most frequently
modified by acetylation/
methylation/phosphorylation
as described in the text
Histone tail
DNA
Histone Octamer
known as CpG islands are observed in constitutively Epigenetic Marks on Chromatin Proteins
expressing genes. The methylated cytosine inhibits The ▶ histone proteins in the chromatin structure are
the trans-acting factors from binding to the regulatory also important targets of epigenetic modifications.
regions producing a closed chromatin structure. Alter- There are organisms like the yeast (Saccharomyces
natively, an inhibitory protein like methylated cytosine cerevisiae) and the fruitfly (Drosophila melanogaster)
binding protein 2 (▶ MeCP2) can also bind to methyl- where DNA methylation has not been detected so far.
ated DNA to produce closed chromatin structure and However histone modifications are known in almost all
hence, regulate the expression of gene (Bogdanović and eukaryotes. Histone modifications are carried out after
Veenstra 2009). Therefore, undermethylation is translation of the protein; hence they are post-transla-
associated with active transcription of the gene. The tional modifications (Latchman 2005). Based on the
palindromic nature of CG dinucleotide ensures the main- three-dimensional structure of histones, the core and
tenance of methylation after DNA replication, thus, the N-terminal tails are distinguished from each other
satisfying the “memory” component of the epigenetic (Fig. 1). Post-translational modifications are known
mechanism. As the replication fork moves, the newly to occur both on the core and the N-terminal tails.
synthesized DNA strand, carrying hemi-methylated Most often the lysine and arginine residues in the
CpG dinucleotides, is methylated by DNMT (DNA histones undergo covalent modifications like
Epigenetics 667 E
P Ac Ub
Me-CH3
Histone H2B: NH3-PEP V K S A P V P K K G SK K A I N K...VK Y T SS K
5 12 15 120 Ub-Ubiquitin, a 76
amino acid protein.
Me Me Me Me
Me Me
P Me P E
P
Me Me
Ac Ac Ac Ac Ac Ac
Ac
P Me Ac Ac Ac Me
Epigenetics, Fig. 3 Potential sites of post-translational modi- Peterson 2003). The amino acids most frequently modified are
fications of histone tails of nucleosomes. The alphabets after - S-serine, K-lysine, E-glutamic acid, R-arginine, and T-tyrosine.
NH3 are the single letter symbols for amino acids. The chemical The numbers below the amino acids indicate their position in the
nature of the modifications is shown. Ac acetylation, Me meth- amino acid sequence of the respective histone
ylation, P phosphorylation, Ub ubiquitination (Jaskelioff and
proteins like ▶ ubiquitin or SUMO (small ubiquitin- to describe such states (Bernstein et al. 2006). Thus the
related modifier) on an internal lysine residue in the signature for expression state of genes in differentiated
histone is associated with repression of gene expression. cells is the outcome of a combination of epigenetic
The epigenetic modification can be reversed to modifications.
activate a silenced gene. Actively transcribing regions The observation of different modifications of histones
are devoid of DNA methylation. The conversion of within a given region of the genome has given rise to the
a repressed gene with methylated cytosine residues into concept of “Histone code” akin to the genetic code. The
an active unmethylated state has to be done with DNA concept first put forward by Jenuwein and Allis (2001)
synthesis either during replication or repair and MeCP2- suggests that the pattern of histone modifications in the
like proteins are phosophorylated leading to their genome generates a highly complex pattern that can be
disassociation from DNA and opening up of chromatin correlated with transcriptional status of the gene. The
structure to facilitate entry of transcription machinery. post-translational modifications of the histones confers
The relationship between histone modifications and a pattern of flagging the basic unit of chromatin in the
active gene expression is direct as in the case of acetyla- nucleus to signal either the recruitment of transcription
tion. Acetylation of specific lysine residues K9, 14, 18, activating complexes like the ▶ trithorax complexes or
23, 27 and K5, 8, 12, 16 in H3 and H4, respectively, leads silencing complexes like the ▶ polycomb complexes.
to loss of positive charge on lysine residues and weakens The memory component of epigenetics has
the association between histones and DNA, resulting in a complex mechanism. It is observed that the epigenetic
open chromatin state allowing the entry of transcription marks are always faithfully transmitted through mitosis,
activators. However, violation of this correlation is and sometimes, through meiosis like in cases of
observed in many cases, apparently repressive epigenetic ▶ genomic imprinting and ▶ X-chromosome inactiva-
marking on active genes which are explained by indirect tion. The mechanisms of transcriptional memory are yet
effects. For example, ubiquitination is a repressive mark, to be completely understood.
but ubiquitination of H2B acts as an activating mark as it
is accompanied with methylation on lysine residues at Cross-References
position 4 and 79. The phosphorylation of serine residues
in histone H3 leads to activation of gene expression, and ▶ Bisulfite Conversion
also enhances the ability of histone acetyl transferases ▶ Chromatin
(HATs) to acetylate lysine 14. On the other hand, ▶ Epigenetics, Drug Discovery
dephosphorylation and deacetylation at lysine 10 and ▶ Euchromatin
14 of histone H4 are signatures for repression, but they ▶ Genome
are known to promote methylation at position 9 of H4 ▶ Genomic Imprinting
and hence some of the potentially active genes bear these ▶ Heterochromatin
modifications. The term “bivalent mark” has been used ▶ Histones
Epigenetics, Drug Discovery 669 E
▶ MeCP2 Definition
▶ Methylation-Sensitive Restriction Endonucleases
▶ Nucleosomes The response to treatment varies widely in patients
▶ Polycomb Complexes with reference to beneficial effects as well as adverse
▶ Post-Replication Modification reactions. It is established that there is a genetic basis
▶ Post-translational Modifications for this variability, and ▶ pharmacogenomics is the
▶ Trithorax Complexes study of these underlying genetic variations at
▶ Ubiquitin and SUMO the whole genome level. The outcome of drug treat-
▶ X Chromosome Inactivation ment is a consequence of multiple steps culminating in
the interaction between the therapeutically active mol-
ecule and the target. The genes coding for drug trans- E
References porters and metabolizers and those coding for targets
of drug action are important players in the final out-
Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, come of treatment for diseases. Therefore, the mecha-
Fry B, Meissner A, Wernig M, Plath K, Jaenisch R, Wagschal
nisms that influence expression of the genes involved
A, Feil R, Schreiber SL, Lander ES (2006) A bivalent chroma-
tin structure marks key developmental genes in embryonic stem in ▶ pharmacokinetics and pharmocodynamics are
cells. Cell 125(2):315–326 important in drug discovery and disease treatment.
Bogdanović O, Veenstra GJC (2009) DNA methylation and In addition to genetic variations, epigenetic marks
methyl-CpG binding proteins: developmental requirements
and function. Chromosoma 118(5):549–565
on these classes of genes will impact drug metabolism.
Frommer M, Mcdonald LE, Millar DS, Collis CM, Watt F, Grigg In the systems biology approach to drug discovery and
GW, Molloy PL, Paul CL (1992) A genomic sequencing optimization of existing drugs, it is important to study
protocol that yields a positive display of 5-methylcytosine not only the genetic variations but also epigenetic
residues in individual DNA strands. Proc Nat Acad Sci USA
variations between populations and individuals.
89:1827–1831
Henikoff S, Furuyama T, Ahmad K (2004) Histone variants,
nucleosome assembly and epigenetic inheritance. Tren
Genet 20(7):320–326 Characteristics
Jaskelioff M, Peterson CL (2003) Chromatin and transcription:
histones continue to make their marks. Nat Cell Biol 5:395–399
Jenuwein T, Allis CD (2001) Translating the histone code. The discovery of novel drug targets and drugs to target
Science 293:1074–1080 them, as well as, discovering novel drugs for known
Latchman DS (2005) Gene regulation: a eukaryotic perspective. targets are the aspects of drug discovery relevant for
Taylor & Francis, London
Lee H (2004) Msx1 cooperates with histone H1b for inhibition of
consideration in this section. Significance of epige-
transcription and myogenesis. Science 304:1675–1678 netic effects in the context of drug discovery could be
Ptashne M, Gann A (2002) Genes and signals. Cold Spring in the following contexts:
Harbor Laboratory Press, Cold Spring Harbor 1. Epigenetic changes in the genome under disease
conditions as novel targets for treatment
2. Epigenetic variation in genes involved in drug metab-
olism in turn resulting in altered response to drugs
3. Drug molecules bringing about alteration in
Epigenetics, Drug Discovery epigenetic profile of tissues and, thus, affecting gene
expression
Vani Brahmachari and Shruti Jain
Dr. B. R. Ambedkar Center for Biomedical Research, Epigenetic Changes in Disease Conditions
University of Delhi, Delhi, India The tools, presently available, have made it possible to
analyze the epigenome. Epigenome is the epigenetic
marks both on DNA and the histones (▶ Epigenetics)
Synonyms at the whole genome level. Epigenome analysis has been
carried out in many disease contexts specially cancers
Drug metabolism; Drug targets, Epigenetics; and significant differences in DNA methylation profile
Epigenome; Gene regulation; Pharmacogenomics between normal and cancer cells from the same
E 670 Epigenetics, Drug Discovery
individual are reported in literature (Lo and Sukumar a therapeutically active drug molecule into a therapeuti-
2008). Such data can be obtained by different cally inactive form or a therapeutically inactive drug
approaches: (1) using methylation-sensitive restriction (referred to as ▶ prodrug) into a therapeutically active
endonucleases (▶ Epigenetics) or (2) ▶ chromatin form. The enzymes involved in drug metabolism can be
immunoprecipitation (ChIP) with antibodies directed divided into two categories based on the reaction they
against 5-methyl cytosine (▶ Epigenetics). These catalyze: Phase I comprising of oxidizing, reducing, and
methods applied on the total genome of the cells under hydrolyzing enzymes; Phase II comprising of methylat-
analysis will yield a pool of methylated DNA sequences ing, sulphonating, and acetylating enzymes. Two most
which are sequenced to derive information on the DNA studied examples of drug-metabolizing enzymes are
methylation pattern of the whole genome. It is observed cytochrome P450 monooxygenase system and alcohol
that the genome of cancer cells have higher methylation dehydrogenase, ADH which are oxidizing enzymes and
(hypermethylation) than normal cells. Since DNA meth- cytosolic N-acetyltransferase (NAT2) which has acety-
ylation leads to repression, there will be inappropriate lating activity. Various isoforms of cytochrome P450
gene silencing in cancer cells compared to normal monooxygenase system are encoded by genes CYP1A1,
cells. Pharmacological intervention by incorporating CYP2A6, CYP2C9, and CYP2E1.
▶ 5-azaCytosine, the non-methylatable analogue of
cytosine has been shown to reverse gene silencing in Where and When Are They Active?
cancer cells (Gilbert et al. 2004). Incorporation of this The principal organ of action of these genes is the liver
analogue also inhibits the activity of DNA methyl trans- as it is the first organ to be perfused by drugs absorbed
ferases (DNMT). The molecules which are used to alter in the gastrointestinal tract and hence, it has very high
the epigenome are designated as epigenetic drugs and concentrations of the drug-metabolizing enzymes.
are currently in use against hematological cancers like Other sites include the kidney, gut, lung, and the skin.
myelodysplastic syndrome (Gilbert et al. 2004). How-
ever, one has to consider the lack of selectivity of such Genetic Variations in the Drug Metabolism Genes
analogues at the cell and the gene level which would The genes coding for these enzymes vary between indi-
have undesirable effects. viduals both at genetic and epigenetic level resulting
The pattern of histone modifications on the whole in variable drug response between individuals. The
genome of cancer cells has also been compared with ▶ genetic polymorphisms in drug-metabolizing genes
that of normal cells using ChIP approach, which has have given rise to distinct subgroups in the population
shown that there is lower acetylation of histones in that differ in drug response. Polymorphisms in ▶ micro-
cancer cells. Acetylated histones that are present in satellite repeats and ▶ single nucleotide polymorphisms
actively transcribed genes are deacetylated by the (SNPs) in genes coding for drug transporters, metaboliz-
action of enzymes called histone decetylases (HDAC, ing enzymes, or drug targets lead to their functional
▶ Epigenetics). Deacetylation will lead to transcrip- variability. Thus, within a given population, individuals
tion repression, thus, altering gene expression profile can be characterized as hypermetabolizers, poor
of cancer cells compared to normal cells. Therefore the metabolizers, and non-metabolizers. Genetic polymor-
effort is to use inhibitors of histone deacetylation as phisms in drug-metabolizing enzymes may have more
therapeutic agents. Phenylbutyrate and buphenyl are subtle, yet clinically important consequences for
inhibitors of histone deacetylases. There are two interindividual variability in drug response.
HDAC inhibitors approved for clinical application in For example, SNPs in the gene NAT2 can distinguish
cutaneous T-cell lymphoma (CTCL; Grant et al. 2010; populations into two distinct subgroups: “slow
Mercurio et al. 2010). acetylators” and “fast acetylators.” NAT2 codes for the
enzyme N-acetyltransferase type 2 that conjugates
Epigenetic Influence on Drug Metabolism Genes substrates like ▶ isoniazid with an acetyl group, which
What Is Drug Metabolism? is subsequently eliminated from circulation. The slow
Drugs are metabolized by complex biochemical path- acetylators, therefore, have decreased ability to metabo-
ways dedicated for metabolism of ▶ xenobiotics. Phar- lize this class of drugs by NAT2 (Roden and George
macological compounds are metabolized by the same 2002). Similarly CYP1A gene polymorphisms can group
enzyme system, leading to the conversion of either individuals in to distinct subgroups. Further, the
Epigenetics, Drug Discovery 671 E
combination of genetic polymorphism in genes coding neuroprotective mechanisms in several neurodegenerative
for different drug-metabolizing enzymes like NAT2 diseases, in addition to its role as antiepileptic and anti-
and isoforms of CYP1A determines the drug response malignancy drug (Monti et al. 2009).
profile of an individual. For example, increased CYP1A, Genetic and epigenetic variation in genes between
coupled with slow acetylation results in reduced antican- individuals is an important issue to be considered in drug
cer activity of amonafide (Ratain et al. 1996). discovery. In addition, the nontarget effect of drugs not
only on protein activity but also in terms of their epige-
Epigenetic Influence on Drug Metabolism Genes netic effects being recognized recently, demands particu-
The role of epigenetic modifications in drug- lar attention and cannot be ignored. Interestingly similar to
metabolizing enzymes is well established in certain drug efficacy differences, adverse effects can be also
cases, for example, drug transporters like ▶ ATP-binding negligible or absent in the background of the genetic and E
cassette B1 transporter (ABCB1) and solute carrier epigenetic variation profile of different populations.
(SLC22A8) and CYP1A. The hypermethylation of CpG
islands of CYP1A1 involved in the metabolism of car-
cinogens and polycyclic aromatic hydrocarbons in cell Cross-References
lines like MCF-7 (human breast carcinoma) and HeLa
(human cervical adenocarcinoma) results in its repres- ▶ 5-azaCytosine
sion (Hirota et al. 2008). The acetylation of histone H3 is ▶ Apoptosis
another chromatin remodeling modification which is ▶ ATP-Binding Cassette B1 Transporter
well characterized in transcriptional activation of ▶ Chromatin Immunoprecipitation
ABCB1 gene (Hirota et al. 2008). ▶ Epigenetics
▶ Genetic Polymorphisms
Drugs Modulating Epigenetic Profile ▶ Isoniazid
Apart from the drugs recently being developed as epige- ▶ Microsatellite Repeats
netic drugs mentioned above, the case of volproic acid as ▶ Pharmacogenomics
an epigenetic drug is unique and creates a new paradigm ▶ Pharmacokinetics and Pharmacodynamics
to be considered in a system biology approach to drug ▶ Phase I Enzymes
discovery. Drugs currently in clinical use for certain ▶ Phase II Enzymes
diseases lead to epigenetic modulation of genes in addi- ▶ Prodrug
tion to their effect on their target; they can have wider ▶ Single Nucleotide Polymorphisms
effect on the transcriptional profile of the target cells. For ▶ Xenobiotics
example, if a drug induces hypermethylation and thus,
inactivation of the gene involved in its metabolism, it
can lead to the accumulation of the drug and therefore
References
toxicity. On the other hand, if the drug leads to
hypomethylation it can result in higher expression of Gilbert J, Gore SD, Herman JG, Carducci MA (2004) The clin-
the gene. Therefore drugs directed at other targets can ical application of targeting cancer through histone acetyla-
cause epimutations (alteration in epigenetic marking) tion and hypomethylation. Clin Cancer Res 10:4589–4596
Grant C, Rahman F, Piekarz R, Peer C, Frye R, Robey RW,
leading to altered gene expression, without changing Gardner ER, Figg WD, Bates SE (2010) Romidepsin: a new
the DNA sequence. One of the drugs known to have therapy for cutaneous T-cell lymphoma and a potential
such an effect is valproic acid (VPA, 2-propylpentanoic therapy for solid tumors. Expert Rev Anticancer Ther
acid). It is widely used as an antiepileptic drug and for the 10:997–1008
Hirota T, Takane H, Higuchi S, Ieiri I (2008) Epigenetic regula-
therapy of bipolar disorders. Its action was initially found
tion of genes encoding drug-metabolizing enzymes and
to be primarily related to neurotransmission and modu- transporters; DNA methylation and other mechanisms. Curr
lation of intracellular pathways. More recently, it is Drug Metab 9:34–38
recognized as an antineoplastic agent as well, acting on Lo P-K, Sukumar S (2008) Epigenomics and breast cancer.
Pharmacogenomics 9(12):1879–1902
cell growth, differentiation, and ▶ apoptosis. It has
Mercurio C, Minucci S, Pelicci PG (2010) Histone deacetylases
been demonstrated that valproic acid directly inhibits and epigenetic therapies of hematological malignancies.
histone deacetylases. This property is important for Pharmacol Res 62:18–34
E 672 Epigenome
Synonyms
Antigenic determinant
Epigenome
Equilibrium
Cross-References
▶ Life Span, Turnover, Residence Time
▶ Chain Type ▶ Lymphocyte Population Kinetics
▶ Complementarity Determining Region (CDR- ▶ Steady State
IMGT)
▶ Groove (G) Domain
▶ IMGT Collier de Perles
▶ IMGT Unique Numbering Equilibrium Probability Distribution
▶ IMGT-ONTOLOGY
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom Ruiqi Wang
▶ IMGT-ONTOLOGY, Leafconcept Institute of Systems Biology, Shanghai University,
▶ IMGT-ONTOLOGY, SpecificityType Shanghai, China
▶ IMGT ® Information System
▶ Immunogenetics
▶ Immunoinformatics Synonyms
▶ Paratope
▶ Systems Immunology, Data Modeling and Scripting Steady-state probability distribution
in R
▶ Variable (V) Domain
Definition
eRF
Definition
▶ Release Factor, Translation
Type I error, also known as false positive, concerns the
mistaken acceptance of an incorrect assumption. Type II
error, also known as false negative, designates the
Ergodicity erroneous rejection of a true account.
ESCEC
Definition
▶ Experimental Standard Conditions of Enzyme
A system is said to be ergodic if the time average of the Characterizations
variable describing it, is equal to its ensembleaverage.
Suppose that we have measured a signal xSt m at
different points in time t ¼ 1,. . ., T, where Sm denotes
the underlying system.
SmThe
time average of the signal Euchromatin
PT Sm
is then denoted by xt t ¼ T t¼1 xt . Now assume
1
that instead of measuring the signal from one single Vani Brahmachari and Shruti Jain
system Sm at different time points, we have a whole Dr. B. R. Ambedkar Center for Biomedical Research,
ensemble {Sm}, m ¼ 1,. .., Mof identical systems, and University of Delhi, Delhi, India
we measure the signal xSt m at a fixed time point t
for every system Sm, from which we then calculate the
PM S m
ensemble
Sm average
xSt m m ¼ M1 m¼1 xt . Then, if Synonyms
xt t ¼ xSt m m , the system is said to be ergodic
(Gardiner 2002). Open chromatin
Eukaryotic Translation Initiation Factor Interactions 675 E
Definition Characteristics
The actively transcribing regions of the genome which Eukaryotic Translation Initiation Process
is in an open chromatin state is called euchromatin. In eukaryotic cells, eIF2 preferentially binds GDP. The
The nomenclature “euchromatin,” in contrast to guanine nucleotide exchange factor eIF2B replaces
▶ heterochromatin, was initially given based on the ligh- this GDP to GTP, but only on the unphosphorylated
ter staining of the chromatin in these regions with dyes form of eIF2 (Fig. 1a: v1), so as to comprise the ternary
that bind to DNA, when the nuclei are observed under complex (i.e., TC, comprising eIF2, GTP, and initiator
the microscope. Euchromatin corresponds to gene-rich methionyl-tRNA, Met-tRNAi). This ternary complex
regions and active transcription. It is also enriched in is essential for translation initiation (Fig. 1a: v2). The
epigenetic marks associated with active transcription. interaction between eIF1A and the 40S ribosomal E
subunit promotes the dissociation of a ribosome and
ensures that the 40S ribosomal subunit is open to
Cross-References receive other eIFs. Subsequently, eIF1, eIF3, eIF5,
and ternary complex are recruited to the 40S ribosomal
▶ Epigenetics subunit thereby generating the 43S preinitiation
▶ Heterochromatin complex, in which the initiator tRNA Met-tRNAi is
held close to the ribosome’s P site (Sonenberg and
Hinnebusch 2009). This may happen via binding of
the individual factors to the 40S ribosomal subunit in
Eukaryotic Translation Initiation Factor separate steps. On the other hand, these factors may
Interactions interact through cooperative binding. There is evi-
dence to suggest that these factors load onto the 40S
Tao You1, George M. Coghill2 and ribosomal subunit in the form of a multifactor complex
Alistair J.P. Brown3 (Fig. 1a: v3, v4, Asano et al. 2000). Also various partial
1
Computational Biology, Innovative Medicines, complexes have been isolated. Therefore, the multifac-
AstraZeneca, Macclesfield, Cheshire, UK tor complex may form via different routes, as shown in
2
Institute for Complex System and Mathematical Fig. 1b (You et al. 2010).
Biology, University of Aberdeen, Aberdeen, UK eIF4E interacts with the 50 cap structure of an
3
Institute of Medical Sciences, University of mRNA (beginning of an mRNA), and poly-A binding
Aberdeen, Aberdeen, UK protein (PABP) binds the poly-A structure at the 30 end
(mRNAs end), as depicted in Fig. 1c. eIF4G circular-
izes the mRNA by binding both eIF4E and PABP.
Synonyms eIF4G also recruits the mRNA helicase eIF4A and its
activating partner eIF4B, which removes mRNA sec-
Eukaryotic translation initiation pathway ondary structures that might impede the movement of
the ribosome during scanning (below). This entire
complex is known as eIF4F (Fig. 1c).
Definition The 43S preinitiation complex loads onto the 50 end
of an mRNA via multiple interactions between eIF3/
Eukaryotic translation initiation is the process that eIF5 and eIF4E/eIF4B complexes (Fig. 1a: v5). Subse-
allows a ribosome to assemble at the start codon on quently, the complex moves down the mRNA in a 30
a messenger RNA. A set of proteins known as eukary- direction, seeking the 50 proximal start codon, in
otic translation initiation factors (eIFs) orchestrates a process known as scanning (Fig. 1a: v6). eIF5 cata-
this process along with the two ribosomal subunits lyzes the hydrolysis of GTP bound to eIF2, which is in
and initiator tRNA. Some of the events in this process equilibrium with GDP plus phosphate (Fig. 1a: v7).
happen in a specific temporal order. The cooperativity Base pairing between the anticodon loop of the initia-
among protein interactions gives rise to some defined tor tRNA Met-tRNAi and the start codon leads to
preinitiation complexes. a change in the conformation of the 43S preinitiation
E 676 Eukaryotic Translation Initiation Factor Interactions
(A)n
a 4E
7G
m 4G
4A
4B
AUG
V2
(A)n
V3
7G
m 4G
V2 3 V4 3 V5 3 4E 4A
2 2 1 5 1 5
4B
1 5
40S 2 1A 40S 2 1A
2
2B AUG
V1
2 5 3 1
V6
c
V9 3 V8 3 V7 3
5 1 5 1 5 PABP
40S 1A 40S 2 1A 40S 2 1A 40S 2 1A 4A
+Pi
m7G AUG (A)n m7G AUG (A)n m7G AUG (A)n m7G AUG (A)n
(A)n
V10 4G 4G 7G
m 4G 4E
4A 4E 4A
1A 4B 4B
AUG
5B
V12 4B
+Pi
5B V11 5B Pi V13
40S 1A 40S 1A 40S 40S 4E-BP
m7G AUG (A)n m7G AUG (A)n m7G AUG (A)n m7G AUG (A)n
m7G AUG (A)n TORC1
60S
3
1 5
5 3
5 2 1 5
2 5
2 2 5
3
3
3 3 5
2
5 5
3 1
3 V21 3 3 3 1
1 1 5 5 1 5
5 1
3
1 5 1 5 3
2 1 5
2
Eukaryotic Translation Initiation Factor Interactions, Fig. 1 Schematic diagram of eukaryotic translation initiation
complex. This triggers release of eIF1 in a mechanism of the translation initiation process, a ribosome is
involving eIF1A and eIF5 (Fig. 1a: v8, Lorsch and poised at the start codon ready for translation elonga-
Dever 2010). This leads to release of the phosphate, tion (Sonenberg and Hinnebusch 2009).
GDP, eIF2, and eIF5, and commits the 40S ribosomal
subunit in an immobilized closed conformation over Multifactor Complex
the start codon (Fig. 1a: v9, Sonenberg and As described above, assembly of the 43S preinitiation
Hinnebusch 2009). complex relies on a multifactor complex. The cooper-
Next, eIF5B∙GTP is recruited to the 40S ribosomal ative binding between the individual factors that com-
subunit (Fig. 1a: v10). The 60S ribosomal subunit joins prise multifactor complex means a partial complex has
the complex after the hydrolysis of this eIF5B-bound higher affinity with the binding partners compared
GTP (Fig. 1a: v11) and the release of the phosphate, with the individual components of the partial complex.
eIF1A and eIF5B (Fig. 1a: v12, v13). Upon completion This stimulation favors the formation of a complete
Eukaryotic Translation Initiation Factor Interactions 677 E
multifactor complex, and hence ensures the proper on protein synthesis rate (Dimelow and Wilkinson
assembly of a functioning 43S preinitiation complex. 2009; You et al. 2010). eIF2 binds initiator tRNA
The multifactor complex contains equimolecular Met-tRNAi as well as the GTP whose hydrolysis is
amounts of the individual factors. However, these central to start codon recognition. Therefore, reducing
eIFs are not equimolar in vivo (von der Haar and eIF2 activity would directly affect the levels of
McCarthy 2002). The optimal operation of the trans- the multifactor complex and consequently the 43S
lation initiation apparatus relies on the complex inter- preinitiation complex. Therefore, sufficient levels of
actions among the eIFs, and overexpression of eIF2 expression are important for efficient translation.
a specific eIF does not necessarily lead to higher pro- On the other hand, eIF1 has a critical role in start codon
tein synthesis activities. For instance, overexpression recognition. Evolution might have led to the mainte-
of eIF5 sequesters eIF2∙GDP in a nonproductive nance of high eIF1 activities in vivo to ensure that E
eIF5∙eIF2∙GDP complex, thereby reducing protein mRNA translation is not impeded at the initiation
synthesis (Singh et al. 2006; You et al. 2010). step by this factor.
Characteristics
Evidence-based Medicine
The Translational GTPase Superfamily
Ravi Iyengar The translational GTPase superfamily consists of protein
Department of Pharmacology and Systems factors that share highly conserved motifs in the G-
Therapeutics, Mount Sinai School of Medicine, domain as well as in adjacent regions. Translational
New York, NY, USA GTPases include protein factors that stimulate GTPase
hydrolysis activity on the large subunit of the ribosome
and play essential roles in the initiation, elongation, and
Definition termination of protein synthesis. Some factors within the
translational GTPase family do not appear to be involved
Utilizing existing clinical trials, studies, and in translation but are engaged in the maintenance or
meta-analysis, as well as biological understanding to biogenesis of ribosomes instead. The members of this
determine treatment choice and effectiveness. family are largely divided into subfamilies according to
their similarities with EF-Tu/eEF1a, EF-G/eEF2, and
IF2/eIF5B (Table 1, see ▶ Translation Initiation and
Cross-References ▶ Translation Elongation). Typically, the genome of
individual species encodes many members of EF-Tu/
▶ Systems Pharmacology eEF1a and EF-G/eEF2 families whereas IF2/eIF5B is
Evolution of Elongation Factor 1 Alpha 679 E
Evolution of Elongation Factor 1 Alpha, Table 1 The trans- interaction between the codons of mRNA and antico-
lational GTPase family of eubacteria and eukaryotes dons of tRNA, EF-Tu/eEF1a hydrolyzes GTP, releases
Ancestor Factor name the aa-tRNA, and then exits the ribosome. These pro-
type Eubacteria Eukaryotes Function cesses lead to the relocation of the aminoacyl-CCA
IF2 IF2 eIF5B Initiation stem of the aa-tRNA toward the peptidyl transferase
EF-Tu EF-Tu eEF1a and/ Elongation center of the ribosome and result in elongation of the
or EFL
peptide chain. Subsequently, GTP-bound EF-G/eEF2
CysNa n.i.∗ Adenosine-50 -
enters the ribosome and translocates the peptidyl-
phosphosulfate synthesis
n.i. eRF3b Termination
tRNA along with mRNA toward the P site, leaving
n.i. HBS1b mRNA surveillance the A site ready for the next round of reactions cata-
lyzed by EF-Tu/eEF1a (Agirrezabala and Frank 2009). E
n.i. Ski7b mRNA surveillance
n.i. eIF2gammac Initiation The molecular functions of EF-Tu/eEF1a and EF-G/
SelBa EEFSEC Incorporation of eEF2 are the essential characteristics of protein syn-
selenocysteine thesis, and these two proteins are almost (see below)
EF-G EF-G EF2 Elongation fully conserved among all living organisms.
RF3a n.i. Termination
LepAa n.i. Unclear EFL
Teta n.i. Tetracycline resistance Despite the critical role of eEF1a in protein syn-
TypeAa n.i. Unclear thesis, there are certain eukaryotic species, such as
n.i. SNU114d mRNA splicing
green algae and chromalveolates, that seem to lack
∗
n.i. indicates “not identified” eEF1a orthologs and instead encode an elongation
For derived factors, references are as follows:
a
Margus et al. 2007, factor-like protein (EFL). Although the amino acid
b
Atkinson et al. 2008, sequence of EFL is clearly divergent from that of
c
Keeling et al. 1998, eEF1a, the sequences are similar enough to suggest
d
Fabrizio et al. 1997 that EFL and eEF1a are functionally equivalent. In
most cases, the distribution of EFL and eEF1a
among species is mutually exclusive. Only one
encoded by a single gene. Presumably, in relation with species has so far been identified to possess both
numbers of genes, EF-Tu/eEF1a and EF-G/eEF2 have eEF1a and EFL, and no species has been found
many more derivatives than IF2/eIF5B. Interestingly, which possess neither of them. This mutual exclu-
EF-Tu/eEF1a has more derivatives in eukaryotes, and siveness makes EFL different from the other EF1a
EF-G/eEF2 has more derivatives in bacteria (Fig. 1). family members, eRF3 and HBS1.
Also interestingly, with the exception of SelB The distribution of EFL is not vertical but is
(▶ Translational Control by Cis RNA Elements, unevenly scattered in a wide range of phylogenic
Bacteria; ▶ Selenocysteine-specific EF-Tu/eEF1a) trees, i.e., close relatives of organisms with EFL have
specific EF-Tu/eEF1a (Keeling et al. 1998), derivatives eEF1a instead of EFL in multiple branches. This situ-
of these two factors are unique to the different domains ation suggests distribution of EFL by lateral gene
of life (Table 1 and Fig. 1). transfer. Direct evidence for lateral gene transfer of
EFL has been reported in the phylogenetic analyses of
EF-Tu/eEF1a: The Key GTPase Elongation Factor EFL in Rhizaria and algae (Kamikawa et al. 2008).
eEF1a is the eukaryotic ortholog of the bacterial elon- Despite this, it is not clear if lateral transfer has
gation factor EF-Tu. The fundamental function of occurred in other phylogenetic groups containing spe-
eEF1a is equivalent to that of EF-Tu. In polypeptide cies with EFL. The gene-loss scenario is entirely pos-
elongation cycles, eEF1a and eEF2, another GTPase sible (Cocquyt et al. 2009). Namely, the common
that is the ortholog of bacterial EF-G, sequentially bind eukaryotic ancestor was assumed to have both EFL
the ribosome. EF-Tu/eEF1a forms a complex with an and eEF1a, of which EFL was lost in most lineages.
aminoacyl-tRNA (aa-tRNA) in a GTP-binding state Further analysis is required to elucidate the evolution-
and delivers the aa-tRNA to the ribosomal A site. In ary processes that underlie the distributions of EFL and
the ribosomal A site, upon cognate base-pairing eEF1a within and among different taxonomic groups.
E 680 Evolution of Elongation Factor 1 Alpha
Evolution of Elongation
Bacteria
Factor 1 Alpha,
Fig. 1 Evolutionary EF-Tu Elongation
relationship of EF1a and EF2
family proteins found in the
three domains of life on Earth. EF-G Elongation
LUCA, the last universal
RF3 Termination
common ancestor (Diversified)
LUCA
EF-Tu
EF-G Archaea
alF2γ Initiation
Elongation
aEF1A multipotent Termination
Surveillance
aEF2 Elongation
Eukaryotes
elF2γ Initiation
eEF1A Elongation
eRF3 Termination
HBS1 Surveillance (Diversified)
eEF2 Elongation
The EF1a Family Members – eRF3 and HBS1 – and complex in appearance, and this complex is thought
Mimicry of the EF-Tu/eEF1a-aa-tRNA Complex to function in mRNA surveillance or mRNA quality
Two family members of eEF1a in eukaryotes have control (Doma and Parker 2006). However, the precise
similar modes of action as eEF1a. Instead of forming molecular mechanism underlying this putative func-
complexes with RNA factors like tRNA, these proteins tion is not well understood.
form complexes with protein factors that mimic the The tRNA mimicking factors, eRF1 and Dom34,
structure and function of tRNA. One of these proteins are conserved in archaea as well as in eukaryotes.
is the class II eukaryotic release factor (eRF3) that However, their interacting GTPases, eRF3 and HBS1
forms a complex with class I eukaryotic release factor respectively, have not been identified in archaeal
(eRF1), which mimics tRNA in functional as well as genomes despite extensive research (Atkinson et al.
structural aspects to catalyze the decoding of stop 2008). A study has recently shown that canonical
codons in a way similar to tRNAs. Another family archaeal EF1a can form functional tRNA mimicking
member, HBS1, forms a complex with Dom34, complexes with any of the archaeal eRF1 or Dom34
which is partly homologous to eRF1. The HBS1– factors (Saito et al. 2010). Archaeal EF1a (aEF1a)
Dom34 complex resembles the eEF1a-aa-tRNA may thus function as a versatile carrier protein for
Evolution of Elongation Factor 1 Alpha 681 E
Interaction Evolutional event Interaction
tRNA tRNA
EF1α
aRF1 eRF1
Cenancestor eRF3
aEF1α EF1α
aPelota (aEF1α-like) Pelota E
HBS1
Archaea Eukaryotes
Evolution of Elongation Factor 1 Alpha, Fig 2 The special- termination, and mRNA surveillance. In archaea (toward the
ized eukaryotic EF1a and the related proteins eRF3 and HBS1 left), aEF1a remains a versatile translational GTPase. In eukary-
and the multipotent archaeal EF1a. Black arrows indicate the otes (toward the right), the cenancestoral EF1a differentiated
evolutionary events and gray arrows indicate interactions. into multiple proteins with specific functions and interacting
Archaeal and eukaryotic EF1a and eukaryotic eRF3 and HBS1 partners
presumably share a cenancestor that functioned in elongation,
Kisselev L, Ehrenberg M, Frolova L (2003) Termination of and load Met-tRNAiMet onto the P-site (▶ Translation
translation: interplay of mRNA, rRNAs and release factors. Initiation). The direct partner of Met-tRNAiMet is eIF2
EMBO J 22:175–182
Margus T, Remm M, Tenson T (2007) Phylogenetic distribution made of a, b, and g subunits. The GTP-binding
of translational GTPases in bacteria. BMC Genomics 8:15 subunit, a/eIF2g, is a close structural homologue of
Saito K, Kobayashi K, Wada M, Kikuno I, Takusagawa A, EF-Tu/eEF1A (Schmitt et al. 2002) (Fig. 1b) (▶ Evo-
Mochizuki M, Uchiumi T, Ishitani R, Nureki O, Ito K lution of Elongation Factor 1 Alpha). Similar to the
(2010) Omnipotent role of archaeal elongation factor 1
alpha (EF1{alpha}) in translational elongation and termina- latter, GTP-bound eIF2 forms a ternary complex (TC)
tion, and quality control of protein synthesis. Proc Natl Acad with Met-tRNAiMet off the ribosome and delivers the
Sci USA 107:19242–19247 aa-tRNA to it. The Ser-51 of eIF2a is the conserved
site of phosphorylation by various stress-activated
kinases (▶ Translation Initiation, ▶ Translational
Control of GCN4). Interestingly, e/aIF2a is a fusion
Evolution of Eukaryotic Translation between the N-terminal OB-fold domain, containing
Initiation Factors Ser-51, and the C-terminal domain (CTD) similar to
eEF1Ba, the catalytic subunit of eEF1B homologous
Katsura Asano to EF-Ts (▶ Translation Elongation) (Ito et al. 2004).
Molecular, Cellular, and Developmental Biology The aIF2a-CTD interacts with the domain II of aIF2g,
Program, Division of Biology, Kansas State just as eEF1Ba binds the domain II of eEF1A during
University, Manhattan, KS, USA the guanine nucleotide exchange reaction, although
their orientations relative to the domain II of the
GTP-binding protein are different (Yatime et al.
Definition 2007). The evolutionary relationship between the
OB-fold domains of e/aIF1A and e/aIF2a is yet
The current repertoire of eukaryotic translation initia- unclear.
tion factors (eIFs) evolved on the basis of the set of eIF1 and eIF2b-CTD have a similar eIF1/2/5-fold,
initiation factors found in archaea (aIFs) and the addi- suggesting their common origin (Conte et al. 2006).
tion of new proteins belonging to different protein Because a/eIF1’s function in start site fidelity appears
superfamilies including those specifically found in to be fundamental to the ribosome function, it is tempt-
eukaryotes. In this entry, the evolutionary relationship ing to hypothesize that this was the precursor of the
and origin of these eIFs are described. proteins of this family. In agreement with this, some
bacteria encode an eIF1 relative, termed YciH (named
in E. coli), which can replace eIF1 function in mam-
Characteristics malian reconstitution assays (Lomakin et al. 2006).
Based on this result, Lomakin et al. proposed that
Of ten eIFs involved in eukaryotic translation initia- eIF1/YciH was the major fidelity factor in the common
tion, two of them, eIF1A and eIF5B are universally ancestor and replaced with the newer protein, IF3, in
conserved (Fig. 1a) (for initiation factors, see bacterial lineages. However, since only a few bacterial
▶ Translation Initiation and its Table 1). eIF5B is species encode YciH (Kyrpides and Woese 1998), it
a member of translational GTPase superfamily and its cannot be ruled out that this was transferred from
bacterial counterpart is IF2 (▶ Evolution of Elongation Archaea by horizontal gene transfer (“?” in Fig. 1).
Factor 1 Alpha). eIF1A is an OB-fold protein homol-
ogous to bacterial IF1; OB-fold proteins include many Evolution of HEAT-Domain-Containing Factors
ribosomal proteins and are generally characterized as and Regulators
nucleic acid-binding proteins (Theobald et al. 2003). The mRNA-binding factor, eIF4G, in most metazoan
species carry three HEAT domains, MIF4G/HEAT1,
Evolution of eIF1 and eIF2 Conserved in Archaea MA3/HEAT2, and W2/HEAT3 as eIF4A- and eIF4E
eIF1, eIF1A, and eIF2 directly interact with the 40S kinase-binding sites (Fig. 2a). eIF4G identified in other
subunit, the eukaryotic ribosomal small subunit (SSU), species, such as plants and fungi, lack the W2 domain
prevent the SSU reassociation with the large subunit, or MA3 and W2 domains together (see Fig. 2b for
Evolution of Eukaryotic Translation Initiation Factors 683 E
Evolution of Eukaryotic Translation Initiation Factors, conserved universally (a), functionally conserved with archaeal
Fig. 1 Evolutionary relationship between translation initiation factors (b), with HEAT domains (c), and with their precursors
factors found in the universal ancestor (LUCA, last universal identified in archaea (d) are indicated by bracket or thin arrow.
common ancestor), archaea and eukaryotes. Each vertical col- Proteins, which appeared during eukaryotic evolution, are
umn with a distinct color represents the group of proteins in listed between the columns representing archaea and eukaryotes.
the hypothetical genome of LUCA (left column), or the proto- RRM superfamily is found in all three domains of life, so this
typical genome of archaea (middle) or eukaryotes (right). Pro- is listed to the left of the archaeal column. eIF2a, eIF2Be,
teins belonging to the same protein superfamily are described in and eIF5 are proposed to be generated by fusion between
the same color as defined in the blue box at bottom left. Proteins two distinct domains, and their origins are listed separately by
connected by solid horizontal arrows are homologous proteins. the color code and two arrows that merge together as defined in
Dotted horizontal arrows connect proteins under the same super- the blue box (except eIF2a-CTDEF1B derived from EF-Ts/
family but without direct homology. Eukaryotic factors eEF1Ba)
yeast eIF4G). Marintchev and Wagner identified of the nuclear cap-binding protein 80 (CBP80)
a significant homology between each of the three dis- (Marintchev and Wagner 2005). CBP80 is the larger
tinct HEAT domains of the metaozan eIF4G and those subunit of nuclear cap-binding complex (CBC). This is
E 684 Evolution of Eukaryotic Translation Initiation Factors
elF2
e elF5 GAP W2
AA-1 AA-2
elF2
f 5MP/BZW MA3 W2
AA-1 AA-2
Evolution of Eukaryotic Translation Initiation Factors, listed across the top. Bracket indicates an approximate area of
Fig. 2 HEAT domain-containing translation initiation factors interaction with indicted partners. Green boxes indicate HEAT
and regulators. Primary structures of human eIF4G1, yeast (S. domains. The C-terminal HEAT domain is the W2 domain with
cerevisiae) eIF4G2, human p97/NAT1/DAP5, human 5MP1/ short thick lines representing the location of AA-boxes 1 and 2
BZW2, and yeast eIF2Be and eIF5 are drawn to scale with filled (AA-1, AA-2, respectively)
boxes indicating segments known to interact with their partners
a highly conserved protein and this protein from all understood as having lost the W2/HEAT3 domain, or
eukaryotes has the three HEAT domain structure, in subsequently the MA3/HEAT3 domain in yeast, by
contrast to eIF4G. The smaller subunit CBP20 of CBC, intron or stop codon mutations (Fig. 3g–i).
carrying a common and ancient RNA recognition eIF5 and eIF2Be, the GTPase activating protein
motif (RRM), directly binds the mRNA cap. The cyto- (GAP) and the catalytic subunit of the guanine nucle-
plasmic cap-binding protein, eIF4E, has a distinct otide exchange factor (GEF) for eIF2, carry the W2
unique fold, suggesting that the cytoplasmic complex domain at their C-termini as eIF2-binding sites
(eIF4F) is a newer invention in terms of the cap- (Fig. 2d, e) (▶ Translation Initiation). Translational
binding function. Together, these findings led regulators NAT1/p97/DAP5 and eIF5-mimic protein
Marintchev and Wagner to propose that the metazoan (5MP)/BZW also carry HEAT domains, including W2,
eIF4G was the first HEAT-domain-containing transla- as found in metazoan eIF4G (Fig. 2c, f; reviewed in
tion factor, which evolved by gene duplication of pri- Singh et al. 2011). Because the W2 domains in these
mordial cap-binding protein (Fig. 3a–b). The extant proteins are distinct from the HEAT3 of CBP80,
form of eIF4G was assumed to have evolved when it lacking the last HEAT repeat found in the latter,
acquired a long N-terminal unstructured extension these W2 domains are likely to be derived from the
containing the binding sites for eIF4E and PABP archaic eIF4G with the three HEAT domains.
(Fig. 3c; see Fig. 2a–b for eIF4E and PABP binding Because NAT1 family is limited to Metazoa and is
sites). The diversity of eIF4G structure in eukaryotes is more similar to eIF4G proteins therein than those from
Evolution of Eukaryotic Translation Initiation Factors 685 E
(a) Eukaryotic H1 H2 H3 Primordial cap binding protein
ancestor
5MP
(d) elF2- H1 H2 H3 H2 H3 H1 H2 H3
binding E
proteins (g) Plants
H1 H2
Amoebozoa
H1 H2 H3 H2 H3 H1 H2 H3
H1 H2 H3 H1 H2 H3
H2 H3
(f) Metazoa
except H1 H2 H3
Nematodes SNT AT H3 NTD Zn H3
NAT1
Evolution of Eukaryotic Translation Initiation Factors, versions by gene duplication. (c) The cytoplasmic version
Fig. 3 The model of the evolution of HEAT domain-containing obtained N-terminal extension to bind eIF4E and became
translation initiation factors and regulators. The proposed eIF4G. (d) eIF5-mimic protein (5MP) evolved by truncation of
course of evolution of HEAT domain-containing proteins, the second eIF4G copy and became eIF2-binding translational
starting with the primordial cap-binding protein with three regulator. (e) eIF5 and eIF2Be acquired the W2 domains from
HEAT domains (three boxes representing HEAT1, HEAT2, 5MP in the last eukaryotic common ancestor (LECA). (f) NAT1
and HEAT3), is depicted in each hypothetical step of eukaryotic evolved from eIF4G more recently in metazoa (f). Because
evolution. (a) The early eukaryotic ancestor without a nucleus eIF4G has transferred the function carried by W2 to 5MP, this
contained a single cap-binding protein. (b) Formation of the domain of eIF4G tends to be lost during the course of eukaryotic
nucleus allowed the evolution of nuclear and cytoplasmic evolution (g–i)
outside (Marintchev and Wagner 2005), this protein The emergence of eIF5 and eIF2Be is understood as
evolved in the ancestor of Metazoa (Fig. 3f). On the the gene fusion of a duplicated copy of aIF2b (eIF1/2/5
other hand, nearly universal distribution of 5MP with fold) and the ancestor of the unique protein eIF2Bg
MA3 and W2 domains in eukaryotes including the (see below), respectively, with the C-terminal portion
primitive species Giardia lamblia (Marintchev and of 5MP (Figs. 1c and 3d–e).
Wagner 2005) argues for its ancient origin and sug-
gests that this was the evolutionary precursor for the Evolution of eIF4A and eIF2B
W2 domains of eIF5 and eIF2Be (Fig. 3d–e). Impor- Proteins with the DExD/H-box are ATP-dependent
tantly, this protein lacks the MIF4G domain essential RNA helicases, which have diversified to carry out
for eIF4A binding. Instead, this has the W2 domain, a variety of biological processes in eukaryotes (Tanner
which serves as the eIF2-binding site for all the extant and Linder 2001). These include transcription, ribo-
W2 domains except the eIF4G-W2 domain (reviewed some biogenesis, RNA processing and export, and
in Singh et al. 2011). Thus, the original function of translation initiation and termination. eIF4A is the
5MP might have been to bind eIF2 and regulate its major DExD/H-box RNA helicase involved in transla-
guanine nucleotide binding, similar to the GDP disso- tion initiation. Although eIF4A requires the associa-
ciation inhibition activity found for yeast eIF5-W2 and tion with eIF4G for its activity, some of the DExD/H
the adjacent linker peptide (Jennings and Pavitt 2010). helicases, such as yeast Ded1p and human DHX29 are
E 686 Evolution of Eukaryotic Translation Initiation Factors
involved in translation initiation, but do not require Unstructured Segments of eIFs Defining Functions
or bind eIF4G for their activities. Interestingly, archaea Unique to Eukaryotic Initiation
contain a prototypical member of this family, Similar to other eukaryotic proteins, more than half, on
MjDEAD (named in M. jannaschii). The original func- average, of the entire length of each eIF polypeptide is
tion of MjDEAD in archaeal biology is unknown. unfolded, yet carrying important functions. Examples
About half of the sequenced archaeal genomes include the N-terminal segment of eIF4G containing
encode a protein termed aIF2B more similar to PABP- and eIF4E-binding sites and the N-terminal
eIF2Ba/b/d than distinct archaeal enzymes with similar segment of eIF2b carrying conserved lysine-rich
folds (RBPI, ribose-1,5-bisphoshate isomerase, and segments (K-boxes) mediating eIF2 binding to eIF5
MTNA, methylthioribose-1-phosphate isomerase). and eIF2Be (▶ Eukaryotic Translation Initiation Fac-
aIF2B copurifies with aIF2a and a sugar-phosphate tor Interactions). eIF1A, the universally conserved
nucleotidyltransferase-like protein (SP-NUCT) from factor, also carries eukaryote-specific N-terminal and
T. kodakaraensis (Dev et al. 2009). Interestingly, bioin- C-terminal tails, which play important roles in regulat-
formatics analysis indicates that eIF2Bg/e has motifs ing ribosome conformation in response to AUG selec-
found in SP-NUCT. Because the SP-NUCT does not tion (Saini et al. 2010). Thus, the additional proteins or
carry the W2 domain essential for GEF catalysis, protein segments found in eukaryotic initiation sys-
this archaeal protein would rather be homologous to tems confer mRNA recruitment by unique terminal
non-catalytic subunit eIF2Bg. These results suggest modifications (m7G-cap and polyA tail), the accuracy
that aIF2B and the SP-NUCT originally evolved as in start codon selection and (via phospho-eIF2-
eIF2-binding proteins, which might regulate eIF2 in regulated eIF2:eIF2B interaction) the translational
response to sugar or nucleotide binding. The extant regulation unique to this domain of life.
eIF2B is hypothesized to have arisen by gene triplication
and duplication, respectively, of aIF2B and SP-NUCT,
heteropentamer formation by their gene products and Cross-References
a subsequent gene fusion between a SP-NUCT copy and
5MP, generating the catalytic subunit eIF2Be (Fig. 1d). ▶ Eukaryotic Translation Initiation Factor Interactions
▶ Evolution of Elongation Factor 1 Alpha
Evolution of eIF3 and RRM-containing Factors ▶ Translation Elongation
Human eIF3 has 13 distinct subunits (a-m), of which ▶ Translation Initiation
six of them contain a PCI domain whereas two of them ▶ Translational Control of GCN4
contain a MPN domain (Zhou et al. 2008). Six PCI
proteins and two MPN proteins are also found in
the 26S proteasome lid and the COP9 signalosome
deNEDDylase complexes (yellow box in Fig. 1). References
Cryo-EM analysis of the 26S proteasome lid suggests
Conte MR, Kelly G, Babon J, Sanfelice D, Youell J, Smerdon SJ,
that the PCI and MPN proteins form a ring structure. Proud CG (2006) Structure of the eukaryotic initiation factor
Thus, it appears that these motifs were used to build 5 reveals a fold common to several translation factors. Bio-
a large scaffold. Besides these subunits, eIF3 contains chemistry 45:4550–4558
Dev K, Santangelo TJ, Rothenburg S, Neculai D, Dey M, Sicheri F,
a WD40 subunit (eIF3i) (WD40 family is uniquely
Dever TE, Reeve JN, Hinnebusch AG (2009) Archaeal
eukaryotic and forms a b-propeller structure), aIF2B interats with eukaryotic translation initiation factors
two RRM subunits (eIF3b and g) and two other sub- eIF2alpha and eIF2Balpha: implications for aIF2B function
units (eIF3d and j) unique to eIF3. As expected for and eIF2B regulation. J Mol Biol 392:701–722
Ito T, Marintchev A, Wagner G (2004) Solution structure of
a reaction involving multiple mRNA binding steps,
human initiation factor eIF2alpha reveals homology to the
additional factors eIF4B and eIF4H contain an RRM. elongation factor eEF1B. Structure 12:1693–1704
A structural homology between CBP20 (see above), Jennings MD, Pavitt GD (2010) eIF5 has GDI activity necessary
the cap-binding subunit of CBC, and eIF4H has been for translational control by eIF2 phosphorylation. Nature
465:378–381
suggested (Marintchev and Wagner 2005), but its sig-
Kyrpides NC, Woese CR (1998) Universally conserved
nificance in the mechanism of translation initiation is translation initiation factors. Proc Natl Acad Sci USA
unknown. 95:224–228
Evolution of Metabolism, Amino Acid Biosynthesis Pathways 687 E
Lomakin IB, Shirokikh NE, Yusupov MM, Helln CUT, Pestova and use free energy to perform a wide diversity of
TV (2006) The fidelity of translation initiation: reciprocal functions. This process couples exergonic reactions
activities of eIF1, IF3 and YciH. EMBO J 25:196–210
Marintchev A, Wagner G (2005) eIF4G and CBP80 share of nutrient oxidation to the endergonic processes
a common origin and similar domain organization: implica- required to maintain the vital state of the cell, such as
tions for the structure and function of eIF4G. Biochemistry mechanical work, active transport, and biosynthesis
44:12265–12272 of molecules. Therefore, the origin and evolution of
Saini AK, Nanda JS, Lorsch JR, Hinnebusch AG (2010) Regula-
tory elements in eIF1A control the fidelity of start codon enzyme-driven metabolism are based on the idea of
selection by modulating tRNAiMet binding to the ribosome. gene duplication, followed by divergence. In this
Genes Dev 24:97–110 regard, two main hypotheses have been proposed to
Schmitt E, Blanquet S, Mechulam Y (2002) The large subunit of explain how the metabolic pathways actually origi-
initiation factor aIF2 is a close structural homologue of
nated: The ▶ retrograde hypothesis suggests that, in E
elongation factors. EMBO J 21:1821–1832
Singh CR, Watanabe R, Zhou D, Jennings MD, Fukao A, Lee the case where a substrate tends to be depleted, gene
B-J, Ikeda Y, Chiorini JA, Fujiwara T, Pavitt GD et al duplication can provide an enzyme capable of supply-
(2011) Mechanisms of translational regulation by a human ing the exhausted substrate, giving rise to homologous
eIF5-mimic protein. Nucl Acids Res 39:8314–8328
Tanner NK, Linder P (2001) DExD/H Box RNA helicases: from enzymes catalyzing consecutive reactions. In other
generic motors to specific dissociation functions. Mol Cell words, if a compound A was essential for the survival
8:251–262 of primordial cells, when A became depleted from the
Theobald DL, Mitton-Fry RM, Wuttke DS (2003) Nucleic acid primitive soup, this should have imposed a selective
recognition by OB-fold proteins. Annu Rev Biophys Biomol
Struct 32:115–133 pressure allowing the survival and reproduction of
Yatime L, Mechulam Y, Blanquet S, Schmitt E (2007) Structure those cells that were become able to perform the trans-
of an archaeal heterotrimeric initiation factor 2 reveals formation of a chemically related compound “B” into
a nucleotide state between the GTP and the GDP states. “A” catalyzed by enzyme “a” that would have led to
Proc Natl Acad Sci USA 104:18445–18450
Zhou M, Sandercock AM, Fraser CS, Ridlova G, Stephens E, a simple, one-step pathway. On the other hand, the
Schenauer MR, Yokoi-Fong T, Barsky D, Leary JA, ▶ patchwork hypothesis postulates that duplication of
Hershey JWB et al (2008) Mass spectrometry reveals genes encoding primitive and promiscuous enzymes
modularity and a complete subunit interaction map of the (capable of catalyzing various reactions) allows each
eukaryotic translation factor eIF3. Proc Natl Acad Sci USA
105:18139–18144 descendant enzyme to specialize in one of the ancestral
reactions. These hypotheses could explain the actual
anatomy of the metabolical pathways. Recently,
another mechanism associated to the metabolical
growth suggests the existence of ▶ alternative routes
Evolution of Metabolism, Amino Acid or alternologs, defined as branches or enzymes that,
Biosynthesis Pathways proceeding via different metabolites, converge in
a common end product. These alternative branches con-
Georgina Hernández-Montes1, Dagoberto Armenta- tribute to genetic buffering similar to gene duplication. It
Medina2 and Ernesto Pérez-Rueda2 has been proposed that the origin of alternative branches
1
Departamento de Medicina Molecular y Bioprocesos, is closely related to different environmental metabolite
Universidad Nacional Autónoma de México, sources and life styles among species (Conant and
Cuernavaca, Morelos, México Wagner 2003; Hernandez-Montes et al. 2008; Horowitz
2
Departamento de Ingenierı́a Celular y Biocatálisis, 1945).
Instituto de Biotecnologı́a, Universidad Nacional
Autónoma de México, Cuernavaca, Morelos, México
Characteristics
and Eucarya and the complexity of the living metabolic networks exhibit a high retention of dupli-
world (Caetano-Anolles et al. 2007). At present the cates within functional modules, and a preferential
large-scale information derived from genomics and biochemical coupling of reactions. This retention of
proteomics studies has allowed the development of duplicates may result from the biochemical rules
databases devoted to understand the metabolic pro- governing substrate-enzyme-product relationships.
cesses, such as KEGG and MetaCyc. The information The retention of duplicates between chemically dis-
contained in these databases can be analyzed from similar reactions is, however, also greater than
a network perspective (Diaz-Mejia et al. 2007) giving expected by chance as suggested Diaz-Mejia et al.
a more integrative view of cellular functioning and for (2007). Finally, a significant retention of duplicates
the global understanding of genomes properties and as groups, instead of single pairs, has been also
dynamics. In this regard, cell metabolism can be con- reported. In brief, the duplication events were evalu-
sidered as one of the most ancient biological networks, ated among modules using a hierarchical clustering
where the nodes represent substrates and/or enzymes algorithm. A paired measure of evolutionary distance
and the edges represent the relationships among them (ED) for all-against-all metabolic pathways was then
(Fig. 1). From this perspective, the study of metabolic calculated. In brief, the ED is a measure of the reten-
networks has focused on describing topological prop- tion of duplicates between pathways within and
erties such as the existence of functional modules, between modules, and is calculated as follows:
giving special relevance to the clustering and motif (ED) ¼ A0 /(A0 + AB), where A0 is the number of
formation, and showing the existence of similar attri- enzymes of the smaller pathway (pA) without homologs
butes to the small-world and scale-free networks in the second pathway (pB). AB is the number of
(Barabasi and Oltvai 2004). A small-world network is enzymes of pA with homologs in pB. At one extreme,
a type of graph in which most nodes are not neighbors when all the enzymes of pA have homologs in pB, the
of one another, but most nodes can be reached from evolutionary distance converges on 0. In contrast,
every other by a small number of steps, and where the when the two pathways share no homologs, the value
time required for a spreading of perturbation is close to of evolutionary distance converges on 1 (Diaz-Mejia
the theoretically possible minimum for a graph with et al. 2007). This analysis shows that metabolic path-
the same number of nodes and edges. This property ways of the same module tend to have a lower ED,
allows metabolism to react faster to perturbations. In suggesting a greater retention of duplicates within
addition, the scale freeness property (few nodes called modules than among them. Based on these data, the
“hubs,” having many more connections than the most capability of metabolic networks to grow modularly by
of others nodes in the network) is related with the gene duplication may be a consequence of two factors:
robustness of a metabolic networks, where the fraction the closeness between reactions, and the kind of sub-
of nodes with low connectivity in a small-world net- strate(s) participating within each module (Fig. 1b).
work is much higher than the fraction of hubs, in
consequence the probability of deleting an important Retention of Duplicates as Groups and Single
node is very low. Another relevant feature of metabolic Entities
networks is its modularity, where each module is It was observed that a high frequency of duplicates
a discrete entity of elementary components (enzymes were retained as groups (those defined here as pairs
and/or substrates) that perform a certain task, separable of consecutive reactions), instead of single entities.
from the functions of other modules. The elements of Based on this approach, it has been determined that
each module are similar to each other and may have enzymes catalyzing chemical similar reactions (CSRs,
been subjected to the same evolutionary process, such or those reactions that share a common enzymatic E.C.
as the amino acid biosynthesis pathways and lipid and number (preferentially the first two digits). Therefore,
nucleotide metabolisms (Fig. 1a). despite having similar reaction chemistry, enzymes
could exhibit different substrate specificities) in the
Metabolic Evolution by Recruitment of Modules fatty-acid metabolism have been originated by gene
Based on a network-based approach, an increased duplication (Diaz-Mejia et al. 2007). Thus, an ances-
retention of duplicates for enzymes catalyzing consec- tral pathway catalyzed both fatty-acid degradation and
utive reactions has been identified. As a consequence, biosynthesis could have originated in a first step.
Evolution of Metabolism, Amino Acid Biosynthesis Pathways 689 E
a
b ED
1.00
0.67
0.33
0.00
Evolution of Metabolism, Amino Acid Biosynthesis Path- Modules comprising related branches are indicated by color as
ways, Fig. 1 The metabolism can be analyzed as a network, in (a). A value of ED closer to zero (the darker squares) implies
where enzymes are nodes and substrates are edges, allowing the a greater retention of duplicates between the two given path-
evaluation of the influence of network modularity on the reten- ways. From this tree the amino acid pathways (c) exhibit a large
tion of duplicates: (a) A hierarchical clustering showing the fraction of duplicated events. Posterior analyses showed that the
modules in metabolic networks (in colors) (b) Metabolic path- amino acid biosynthetic network exhibits diverse alternative
ways (branches in the trees) within and across modules were pathways, defined as alternative or alternolog, whereas ancient
compared using a measure of evolutionary distance (ED). pathways are more connected among them
Evolution of Metabolism, Amino Acid Biosynthesis Pathways 691 E
The direction of this ancestral pathway would be of increasing genome size, protein structural complexity,
dependent on the acyl carriers and fatty acids available, and selective pressures in changing environments. In the
evidencing the existence of two different pathways. evolution of amino acid biosynthesis, for instance, diverse
Diaz-Mejia et al. and Light et al. (Diaz-Mejia et al. studies have suggested that they could be among the
2007; Light et al. 2005) using similar approaches earliest metabolic compounds and that their growth has
evaluated the generality of this observation, using an been affected by gene duplication. Based on this analysis,
all-against-all comparison of the enzymes catalyzing a core of 11 pathways that probably occurred in the last
consecutive CSRs. They identified that around 15% of common ancestor was identified using a combination of
enzymes have at least one homolog in a metabolic path- genomic tools with a network approach (Hernandez-
way. Of these, two thirds are retained as isolated dupli- Montes et al. 2008). Figure 1c. One of the most important
cates and a third is retained as groups, for instance, the results from this analysis is that diverse enzymes synthe- E
amino acid metabolic pathways. Alternatively, an esti- sizing 16 out of the 20 standard amino acids were identi-
mation of retention of duplicates in metabolism using an fied in the three cellular domains. It is important to
enzyme-centric network approach showed significant highlight that the full biosynthetic routes for these 16
results highlight the influence of both distances apart in amino acids, such as tryptophan, arginine, and proline,
the network and chemical similarity of reactions on the among others amino acid pathways appeared early in
retention of duplicates. Specifically, an increased reten- evolution. Therefore, when a network approach is used
tion of duplicates between consecutive reactions was to identify functional relationships among all pathways
observed and showed that this bias can be partially (Fig. 1c), the universal branches are connected to each
attributed to the preferential biochemical coupling of other, allowing the possibility that they feedback and
reactions. In addition, these analyses show a significant they complete a minimal set of enzyme-driven reaction
retention of duplicates acting on both CSRs and chemical for the biosynthesis of amino acids. In addition, path-
different reactions (CDRs, defined by the mismatch in ways, as lysine, methionine, and valine, among others,
the first two digits of the sequence enzymatic number. are partially distributed across the three domains of life,
Therefore, despite having different reaction chemistry, with branches identified in the LCA. This data suggest
enzymes could have similar substrate specificities), that partial branches may be intimately connected to the
supporting the idea that gene duplication is important main branches identified as the core, previously
in generating innovations as well as metabolic variants. described. In addition, in the same analysis, the authors
A synergy between closeness in the network and chem- identified the contribution of paralogs (duplicated genes
ical similarity between reactions explains the high reten- within an organism to occupy two different positions in
tion of duplicates between consecutive CSRs, suggesting the same genome, then copies evolve new functions,
that duplicates are significantly retained as groups that even if these are related to the original one), analogs
can be extended to several series of reactions. (genes that perform the same function but their
sequences do not have a common evolutionary history
New Insights on the Evolution of Metabolic and therefore come from independent origins) and
Pathways alternologs (see Definition section) in the evolution of
The modern perspective of metabolic processes has biosynthesis of amino acids. Eleven out of the 20 amino
shown that evolutionary studies must include not only acid biosynthetic routes revealed an important contribu-
phylogenetic relationships among enzymes, but also the tion of paralogs to the generation of diversity, 8 out of the
influence of some topological properties of metabolic 20 amino acids routes show contribution of analog
networks. In evolutionary terms, one can assume that the enzymes, while alternologs routes participate in 9.
universal occurrence of some pathways and branches in
modern species would suggest that they existed in the Concluding Remarks
last common ancestor (LCA). The evolution of these It is well accepted that biological networks exhibit sim-
pathways and the emergence of diverse homologues ilar topological properties, such as their scale-free behav-
reflect an increased metabolic diversity as a consequence ior and hierarchical modularity. Based on recent studies,
E 692 Evolution of Regulatory Circuits
Characteristics
Synonyms
Pseudocode :
with high “fitness” are treated as good solutions, and In transcription regulatory network construction, the
will be “reproduced” from last step and “recombined” fitness function can be the difference between the
with other solutions by swapping parts of the chromo- observed gene expression data and the calculated
some. Furthermore, the chromosomes are “mutated” expression data. Another possible fitness function is
with small changes made to the elements of the the Akaike’s information criterion (AIC) (▶ Akaike’s
chromosomes. Some new chromosomes are generated Information Criterion (AIC)) which is a measure of the
through recombination and mutation operations. Only goodness of fitting an estimated statistical model with
those chromosomes with high fitness will enter next the data and can be used as the fitness function in
step. evolutionary algorithm–based transcription regulatory
Evolutionary algorithms have been successfully network construction (Shin and Iba 2003).
applied to various fields, such as pattern recognition,
biomedical engineering, and transcription regulatory The General Steps of EA
network construction. Especially, evolutionary algo- The general steps of evolutionary algorithm are listed
rithms can be used to calculate the ligand-receptor below:
binding affinities (▶ Binding Affinity) (Morris et al. 1. Initialize a population of chromosomes.
1998). The transcription regulations can be represented 2. Apply recombination and mutation to generate new
as a network, where nodes represent proteins or genes chromosomes.
and edges represent direct regulatory interactions (e.g., 3. Evaluate each chromosome according to the fitness
the binding of a transcription factor to the promoter of function.
its target genes). The transcription regulatory 4. Select chromosomes to reproduce and replace the
network (TRN) describes the TF-gene interactions as old chromosomes.
a graph or network, and gene expression is regarded as 5. Repeat steps 2–4, until the specified numbers of
a function of regulatory inputs specified by interactions generations are completed or special condition is
between proteins and DNA (Blais and Dynlacht satisfied.
2005). Transcription regulation has influence on most
physiochemical activities in a cell (Thomas 1998; An Example
Albert and Barabási 2002). Therefore, it is important Here, an example is used to explain the workflow to
to construct transcription regulatory network. Com- infer the transcription regulatory network by
pared with other methods for constructing transcrip- employing evolutionary algorithm. Suppose there is
tion regulatory networks, the evolutionary algorithms a microarray dataset with n genes measured at m time
are able to infer smaller regulatory networks with good points. The expression of genes at time point t + 1
sensitivity and precision. However, much more com- should be the regulatory result of the expression of
putation power is required to infer a large regulatory regulators at time point t. The regulatory relationship
network from the experimental data. of every gene with i ¼ 1, 2, , n and t ¼ 1, 2, , m
can be expressed as:
The initial population of chromosomes that represent where aji is the regulatory strength of gene j on gene i, xjt
the regulatory relationships between genes can be is the expression data of gene j at time point t, and xit+1 is
produced randomly or according to user-defined the expression data of gene i at time point t + 1.
restriction. Sequentially, these chromosomes can be Since it is not known which genes regulate gene i, all
interpolated by recombination and mutation. Then the genes are possible candidate regulators for gene i.
fitness function (▶ Fitness Function) is employed to The regulatory strength aji can tell which genes regulate
evaluate these recombined and mutated chromosomes gene i, and those genes with aji 6¼ 0 are possible
and only those chromosomes with high fitness are regulators. Since there are usually only a few measure-
selected to reproduce the next generation population. ments (i.e., m) but with many variables (i.e., n) in the
One important issue in EA is the design of the fitness model, it is not easy to solve the problem. An initial
function which is the criteria for evolution. population of N solutions can be randomly generated.
Evolvability 695 E
The difference between the true expression data and the the quality of the solution. In order to identify better
calculated data is regarded as a fitness function which and better solutions, EAs apply some operators
can be used to evaluate the chromosomes. For each inspired by evolution of living organisms (mutation,
chromosome k, its fitness is defined as follows: recombination, and selection) to the population of
individuals.
Fn ¼ min xit xk (2) There are different types of EAs, such as Genetic
it
1kN
Algorithms, Evolutionary Programming and Evolution
Strategies, which differ mainly by the encoding of the
individuals and the operators used during the
evolution.
Cross-References E
References
Evolvability
Departure of the odds ratio from 1 represents evi- i.e., Pðjn00 Eðn00 Þj jt Eðn00 ÞjÞ. The criteria
dence of association between the two variables. When can provide different results due to discreteness and
marginal totals are fixed, tables with larger n00 have potential skewness.
larger odds ratio providing evidence in favor of the
alternative hypothesis. The p-value of the test is given Conclusions
by the probability Pðn00 t Þ, where t∗ is the Fisher’s test of independence for 2 2 tables is
observed value of n00 (Agresti 2002). applicable for small sample sizes. The test uses exact
distributions rather than large-sample approximations.
Fisher’s Tea Tasting Experiment Note that in small samples it is not always possible
Fisher’s colleague claimed that when testing the tea, to achieve an exact significance level because the
she could distinguish the order in which milk and tea hypergeometric distribution is discrete and the p-value
were added to the cup. To test her claim, Fisher devised can assume only a few values. Consequently, the Fish-
a small tasting experiment in which four cups had milk er’s test is conservative as the true type I error would be
added first and another four cups had tea poured first. less than the pre-specified one.
The cups were presented in random order for tasting.
A colleague was asked to predict which four cups had
milk added first. The table below shows the results Cross-References
Truth/Guess Milk Tea Total
▶ Fisher’s Test
Milk 3 1 4
Tea 1 3 4 References
Total: 4 4 8
Agresti A (2002) Categorical data analysis. Wiley, Hoboken
Truly distinguishing the order of pouring as
opposed to simply guessing corresponds to testing
whether odds ratio is greater than one. Under the Exon
null hypothesis, the observed table has a probability
pð3Þ ¼ 0:229. The p-value is given by the probabil- Luiz O. F. Penalva
ity Pðn00 3Þ ¼ Pðn00 ¼ 3Þ þ Pðn00 ¼ 4Þ ¼ 0:243. Department of Cellular and Structural Biology,
Hence, there is no evidence to confirm that predic- Greehey Children’s Cancer Research Institute,
tions were more accurate than a simple guess. University of Texas Health Science Center,
The sample size, however, is rather small, so San Antonio, TX, USA
the lack of association cannot be deduced with
certainty. Fisher in reality was convinced of his
Definition
colleague’s ability.
Exons are short (50–200 bp in length) RNA sequences
Testing Two-sided Alternatives
that are found in pre-mRNAs. Exons contain the
To test for one-sided alternative, we can order the
sequences found in a mature mRNA. When a gene is
tables by their odds ratios or by their first cell fre-
transcribed, the precursor mRNA is composed of both
quency n00. The resulting p-value will be exactly the
introns and exons. After the removal of introns by the
same, irrespective of the choice of the ordering criteria.
spliceosome, the precursor mRNA matures into
In practice, the two-sided tests are a lot more
mRNA which is then exported into the cytoplasm.
common. However, when testing a two-sided alter-
This mature mRNA can now be translated into protein.
native, different criteria may result in different
p-values. To test for a two-sided alternative, one can
calculate the p-value by summing over Pðn00 Þ ¼ t Cross-References
for all t with pðtÞ pðt Þ. Another criterion sums p(t)
for tables that are farther from the null, ▶ Post-transcriptional Regulatory Networks
Experiment 699 E
Expectation Experiment
Expectation-Maximization Algorithm
Definition
Kejia Xu
Institute of Systems Biology, Shanghai University, A procedure, method, or setting of devices and tools E
Shanghai, China designed to manipulate objects and study their features
and working.
Definition
Characteristics
An expectation-maximization (EM) algorithm is
a method for finding maximum likelihood estimates of
For the most part of the twentieth century, philosophy
parameters in statistical models where the available data
of science focused on scientific theories and concep-
set is incomplete. The EM algorithm consists of two steps,
tual foundations of science, with a special emphasis on
firstly guessing a probability distribution over comple-
physics. Only in the last 30 years or so, experiments
tions of missing data given the current model (known as
and biology have become themes for philosophical and
the E-step) and secondly reestimating the model parame-
historical analysis. In the following, I present some key
ters using these completions (known as the M-step).
general issues and questions for philosophy of experi-
Given a likelihood function L(y; X, Z), where y is
ment as well as a number of methodological, episte-
the parameter vector, X is the observed data, and
mological, and ethical challenges that arise specifically
Z represents missing values, the maximum likelihood
from experimenting with living, evolving things:
estimate (MLE) is determined by the marginal likelihood
What are the distinctive features and epistemic roles
of the observed data L(y; x), which is often intractable.
of experiments? What do experiments tell us about
Therefore, the EM algorithm maximizes the expectation
nature? When is an experimental result valid and infor-
of the log-likelihood function, based on the observed
mative? What ethical and sociopolitical concerns may
samples and the current estimate of y. The two steps of
arise from biological experimentation?
the algorithm are:
E-step: At the (t)th step of the iteration, where y(t) is
Analytic Descriptions of Biological Experiments
available, compute the expected value of the
and Their Epistemic Roles in Science
log-likelihood function.
One of the key tasks for philosophy of biological
experimentation is the identification of the distinctive
QðyjyðtÞÞ ¼ E½log LðyjX; ZÞjX; y ; (1)
features of biological experiments, their aims, and the
nature of experimental reasoning. Most of this work
M-step: Compute the next (t + 1)th estimate of y by
has been done in response to the kind of philosophy of
maximizing QðyjyðtÞÞ, that is,
science that dominated the early twentieth century.
Until the late twentieth century, philosophy of science
@QðyjyðtÞÞ
yðt þ 1Þ : ¼ 0; (2) was almost exclusively concerned with physics, and
@y
for the most part with physical theories, their structure,
and the dynamics of theory change. Experimental
References practice was often regarded as part of the “context of
Dempster AP, Laird NM, Rubin DB. Maximum likelihood from
discovery” and thus not amenable to philosophical
incomplete data via the EM algorithm. J Roy Stat Soc B. analysis. Also, for a good portion of the twentieth
1977;39(1):1–38. JSTOR 2984875. MR0501537. century, philosophers generally recognized only one
E 700 Experiment
epistemic role for experimental evidence: It serves as of the system itself. Following François Jacob,
test to support or reject theories or to decide between Rheinberger emphasizes that experimental systems are
competing theories. Arguably, it was the biological “machines to create the future.” Because they have
sciences that helped challenge the received view the power to produce unexpected and surprising out-
about the nature of scientific experimentation and the comes, experimental systems influence the direction of
roles of experiments in science. Historians’ and phi- research. Rheinberger’s concept “experimental system”
losophers’ increased attention to biology in the 1980s thus redirects the focus of the analysis from the role of
and 1990s highlighted the diversity of scientific prac- experiments as tests for well-developed theories to the
tices and thus the limitation of physics-based philoso- role of experimental systems as generators of new
phy. They showed that experiments play a number of knowledge. The point of Rheinberger’s conception of
epistemic roles in science. Three conceptions have experimental systems is that it explicitly acknowledges
been particularly fertile for the study of experiments: the material, technical, and institutional setting of exper-
“discovering mechanisms,” “experimental systems,” iments. The experimental system with all its technical,
and “exploratory experimentation.” instrumental, institutional, social, and epistemic aspects
The notion that biological research can be described is the site of knowledge generation. This notion has
as “discovering mechanisms” has been very influential made it easy for historians and sociologists to connect
for recent thought about experimentation. This to Rheinberger’s work, while philosophers have been
approach has been developed to characterize the nature slower to take it up.
of experimental reasoning in biology, thereby chal- The third new conception that has become fruitful
lenging the view that reasoning practices are outside for philosophy of experiment more generally is
the scope of philosophical analysis. The basic idea “exploratory experimentation” (see also “▶ Explor-
originates in the study of molecular biology. Molecular atory Experimentation”). Episodes of experimentation
biology does not have the kind of grand theory based that can be characterized as “exploratory” have been
around a set of laws or a set of mathematical models identified in the history of physics as well as biology.
that is familiar from the physical sciences. Rather, The term highlights a mode of experimentation where
general knowledge in the field takes the form of explicit theoretical principles, guidelines, or concrete
descriptions of the interaction of entities and activities theoretical expectations are absent. In other words,
in a biological system (such as mechanisms of DNA exploratory experimentation is the opposite of experi-
replication or protein synthesis). Several philosophers mentation for the purpose of testing well-developed
of biology have argued that these mechanisms – theories. To say it with Richard Burian, exploratory
specific collections of entities and their distinctive experimentation is “data driven,” not “hypothesis
activities – are the fundamental unit of scientific dis- driven” (Burian 2007). Large-scale sequencing tech-
covery and scientific explanation (Machamer et al. nologies are a striking example of exploratory
2000). Experiments serve to identify the components research: These technologies generate hundreds of
and interactions of the mechanisms. This conception results, which can be considered robust even though
has proven valuable not only in molecular biology the theoretical interpretation remains unclear.
but in a wide range of special sciences, such as In recent studies of biological experimentation, the
neuroscience. once-common view that the epistemic role of experi-
A very fruitful analytic term to describe biological ments is to be a “test” of theories has thus come under
experiments is Hans-J€ org Rheinberger’s notion “exper- attack from various angles. Of course, biological
imental system,” a term that is taken directly from experiments may function as tests, but they may also
experimental biologists’ parlance. An experimental sys- fulfill a number of other functions. They may be
tem is the working unit of both scientists and epistemol- motors for the reorientation of research activity, aids
ogists. The basic idea is that investigators never perform for discovering mechanisms, and devices for creating
isolated experiments but series of interlocking experi- robust results where no theory is involved.
mental trials, and that the major driving force for mod-
ifications and changes in the experimental settings Reality and Reliability
and for reorientations of questions and goals is not General philosophy of scientific experimentation has
a preconceived theory or hypothesis but the behavior concentrated on two issues: the reality of the objects of
Experiment 701 E
our theories and experiments and the reliability of standardized experimental objects (see “▶ Model
experimental results. The long-standing debate in phi- Organism”). They are selected because they are very
losophy of science about realism and antirealism con- simple, readily available, hardy, with short gestation
cerns the interpretations of the relation to reality of periods, and a large number of offspring, such as the
scientific theories and the entities that these theories nematode C. elegans or the fruit fly. These objects
are about, such as electrons and quarks. The philosoph- are studied as stand-ins for others. Features that have
ical question is the following: Do these things really been uncovered in detail in one model organism are
exist in the world, or are they merely theoretical con- taken to be “exemplars” for others that cannot be
structs (useful fictions, conceptual tools) that scientists easily studied.
use to explain observations and experimental results? Philosophical discussion has focused on a number
Like physics and chemistry, biological research also of issues such as the sense in which model organisms E
concerns processes, phenomena, and objects that can- are “models” (Ankeny 2001). Are they exemplars or
not be directly observed, such as genes, neurons, and abstractions, typical, ideal, homologous, or analogous
proteins. What reasons do we have for believing models? And is it really appropriate to transfer the
that the objects and phenomena of biological thought insights gained about model organisms to organisms
really exist? in their natural habitat, especially given that in some
In his influential book Representing and Interven- cases, the laboratory environment has changed the
ing, Ian Hacking makes this debate a topic for philos- model organisms currently being used to an extent
ophy of experiment. Hacking argues that the dispute that they no longer resemble organisms in the wild?
between realists and antirealists can be brought to Also, to what extent is it legitimate to transfer these
a close if we take into account that experimenters insights to other, infinitely more complex organisms,
often use experimental objects to manipulate others – especially humans, given that model organisms are
particle accelerators, for instance, utilize subatomic usually chosen for practical purposes, such as easy
particles to interfere with other particles. In these tractability and general availability? Philosophers
cases, Hacking claims, one may assume that the parti- have addressed these questions through examining
cles that are used as tools are real. Biologists base their modeling practices and the usages of the information
interpretations and explanations of biological things gained from experimenting with model organisms
and phenomena on the data and images produced by (Weber 2005, Chap. 6).
instruments in often highly complex experimental Another central task for philosophy of experimen-
settings. But like physicists, they routinely interfere tation is the assessment of the empirical information
with the research objects and assess their results by generated in experiments. Experiments are complex
assessing the effects of their interventions. Like phys- affairs involving intricate settings and recalcitrant
icists, they thus have good reason to assume that the objects, and it is a struggle to get experiments to
subvisible things and phenomena they investigate work. When is an experiment(al result) reliable and
really exist. informative? Recent philosophers of experiment have
But even if we follow Hacking and adopt a realist put together sets of strategies that may be used to
stance vis-à-vis experimental objects in biology, other assess the validity of experimental results, including
questions arise. Experimental settings in the laboratory experimental checks and calibration; elimination of
are artificial surroundings very different from natural plausible sources of error and alternative explanations
habitats, and in this sense, experiments create effects of the result; using an apparatus based on a well-
that do not exist in nature. This raises issues about corroborated theory; and statistical validation of the
the scope of the knowledge generated by laboratory evidence (Franklin 1986).
experiments. What, if anything, can such experiments Several philosophers of biology have argued that
tell us about objects and phenomena outside the the concept of robustness plays a major part in the
laboratory? validation of experimental results. If multiple different
This question is especially pressing in the context of experiments are robust, that is, if they agree in their
biological experimentation because today much exper- outcomes, we may assume that the experimental result
imental work is carried out on a small number of model is reliable. In its most general characterization, the
organisms. Model organisms are selected and concept of robustness straddles epistemology and
E 702 Experiment
ontology. Robustness is regarded as the decisive crite- problem at hand and clarifying the meaning and moral
rion in separating what is real from what is unreal and significance of key concepts such as harm and
what is reliable from what is unreliable (Wimsatt coercion, but without appeal to high-level ethical
1981). In recent years, philosophers and historians of theory. Philosophers of experiment may contribute to
biology have probed a number of scientific episodes to these discussions through the analysis of potential
establish whether robustness really is the main crite- conflicts between methodological standards of valid
rion for the assessment of experimental outcomes, and experimentation and ethical concerns about responsi-
what other criteria have been used, and should be used, ble experimentation.
to evaluate experimental results. This discussion is Two fields of biomedical research in particular have
still ongoing. been subject of heated ethical as well as political
debates: stem cell research and the human genome
Sociopolitical and Ethical Dimensions of Biological project. Stem cell research aims to identify the mech-
Experimentation anisms that govern cell differentiation. The ultimate
Scientific experimentation raises important ethical and goal is to turn stem cells into specific cell types that can
sociopolitical issues related to the use of experimen- be used for treating debilitating and life-threatening
tally generated knowledge and the responsibility of diseases and injuries. While this research promises to
investigators. These are vast topics, and the discus- have extremely important therapeutic implications, it
sions surrounding them have grown extremely com- is also strongly contested because stem cells are
plex and controversial, involving experts and obtained from human embryos. The very practice of
arguments from ethics, politics, religion, and law. experimenting raises intricate ethical questions. Is it
Here I can only touch upon some of the questions and ethical to destroy human embryos for the purposes of
issues that need to be addressed. clinical research? Here, the discussions focus on the
Most sensitive ethical questions arise of course in beginning of human life, the embryo’s right to life and
experiments with living beings, especially in biomed- the relative values of embryonic life and the life of
ical contexts. What ethical codes are and should be a mature human being. At present, each of these points
applied in biological and biomedical experimentation? is still under discussion.
What are the limits of ethically responsible experimen- The human genome project (HGP) – the mapping
tation? On the most general level, experimenters are and sequencing of the human genome – has also
confronted with the question: When is it acceptable to generated much discussion among philosophers of sci-
expose some to risks of harm for the benefit of others? ence, bioethicists, political philosophy, and philosophy
This question has been hotly debated ever since the of law, but here the discussion has focused more on the
first experiments were performed on humans and ani- application and use of the knowledge generated by the
mals (cf., LaFollette and Shanks 1997). The general HGP. As the HGP assists scientists in the identification
ethical principles that should guide human experimen- of genes and their functions, genetic testing made
tation – the principles of beneficence, respect, and possible by the human genome project has raised
justice – are stipulated in official documents and policy a number of concerns and worries, for instance,
statements, as are the regulations for the use of animals about access to genetic information by insurers and
in research facilities. But the interpretation of these employers, and the possibility of genetic discrimina-
guidelines and their application to concrete cases are tion. Prenatal genetic testing raises concerns about
often extremely difficult. For a long time, it was com- reproductive rights and eugenics. The rapid develop-
mon to assume that the appeal to ethical theory such as ment of new technologies poses additional challenges
utilitarianism, deontology, or virtue ethics should to the discussions.
guide the practice of bioethics. But philosophical Finally, the commercialization of biomedical
discussions often proved unhelpful in concrete cases. research raises challenging questions for philosophers
In recent philosophical debates about the relation of biology. Arguably, in recent decades, the influence
between (applied) bioethics and philosophical theory, on biomedical research of business and industry has
many bioethicists and several philosophers have led to changes in the ethical and methodological norms
become skeptical of applying ethical theory to practi- of biomedical practice (Krimsky 2003). It is an impor-
cal problems. Instead, they advocate starting with the tant task for philosophy of experiment to assess and
Experimental Design 703 E
critique the impact on the integrity of biomedical
experimentation of the privatization and commercial- Experimental Design
ization of biomedicine.
Philosophy of experiment is still a vibrant branch of Clemens Kreutz1 and Jens Timmer2,3,4
1
philosophy of science, and each of the aspects Institute for Physics, Freiburger Center for Data
discussed in this entry invites further study. As the Analysis and Modeling (FDM), University of
recent development of the field shows, philosophy of Freiburg, Freiburg, Germany
2
experiment has considerably broadened its scope, Institute for Physics, University of Freiburg, Freiburg,
seeking to make contact with the biological sciences Germany
3
as well as history and sociology of science and thus to BIOSS Centre for Biological Signalling Studies and
increase its social relevance. Freiburg Institute for Advanced Studies (FRIAS), E
Freiburg, Germany
4
Department of Clinical and Experimental Medicine,
Cross-References Link€oping University, Link€oping, Sweden
▶ Exploratory Experimentation
▶ Model Organism Definition
▶ Active Learning
Experiment Design ▶ Experimental Design, Variability
▶ Input
▶ Model-based Experimental Design, Global Sensitiv- ▶ Observable
ity Analysis ▶ Optimal Experimental Design
E 704 Experimental Design, Variability
Synonyms
Cross-References
ESCEC
▶ Mixed and Multi-Level Models
▶ Relative Expression Analysis
Definition
Cross-References Definition
References
Characteristics
Hicks MG, Kettner C (2010) Experimental standard conditions
of enzyme characterizations. Logos, Berlin
Explanations in biology, unlike those in physics, rarely
proceed by filling out the parameters in mathematical
equations (laws) that describe the general characteris-
tics of the relevant form of organization. With the
Experimental Unit notable exception of population genetics and theoreti-
cal ecology, biologists typically employ mechanistic,
Olga Vitek functional, and historical strategies of explanation,
Department of Statistics, Department of Computer rather than mathematical ones. To some scientists,
Science, Purdue University, West Lafayette, IN, USA this is a sign of the immaturity of biology, perhaps
resulting from the difficulty to take into account the
extraordinary high number of parameters and variables
Definition relevant to the phenomena that interest biologists.
Structuralist biologists such as Brian Goodwin (e.g.,
Experimental unit is the basic unit of experimental 1994) have pointed to the dominance of gene-centered
design and is the subject, sample, or object that carries and historical approaches as the main source of imma-
the condition or the treatment. turity. They hope that in the future, more research,
more data, more computational power, better mathe-
matics, and above all, more insight into the proper way
Cross-References of explaining in the natural sciences and more willing-
ness to pursue the structuralist strategy will enable
▶ Designing Experiments for Sound Statistical what is thought to be a more scientific approach.
Inference Many others reject the view that the minor role of
general laws in biological explanation is a sign of its
immaturity. They argue that the objects of study in
biology have characteristics that justify approaches dif-
Expert Elicitation ferent from those suitable in physics. One such view
points to the process by which life gets its shape: evolu-
▶ Prior Elicitation tion by natural selection. For example, the evolutionary
biologist Ernst Mayr (1904–2005), one of the founding
fathers of the modern synthesis, argues that it makes
Expert System sense to ask why questions in biology but not in physics
and chemistry (e.g., Mayr 1997, Chap. 6). The reason is
▶ Knowledge-based System that organisms owe their characteristics to their history,
Explanation in Biology 707 E
whereas history is, in general, not important to the char- explain why a certain mechanism has the characteris-
acteristics of physical and chemical systems. So whereas tics it has.
physics and chemistry must limit themselves to how A better justification starts with the observation that
questions, biologists should address why questions in organisms are highly organized (see also the essay on
addition to how questions. Within biology, how ques- ▶ Organization): their capacities and characteristics
tions and why questions are, according to Mayr, the critically depend not only on the characteristics of
subject of two “largely separate fields”: functional biol- their parts but also on the spatial arrangement of
ogy and evolutionary biology, corresponding with two those parts and on the order and timing of their activ-
modes of explanation: functional explanation and ities. Explanations in physics and chemistry are typi-
evolutionary explanation. Functional explanations cally aggregative: they derive the behavior of a system
answer how questions by describing the operation of (such as the behavior of a volume of gas as described E
mechanisms. Evolutionary explanations answer why by the Boyle-Charles’ law) by aggregating the behav-
questions by revealing the evolutionary history of those ior of the parts of that system (the molecules of which
mechanisms. Both kinds of explanation are legitimate the gas is composed as described by Newton’s laws of
and needed to understand the living world, but evolu- motion). As philosopher William Wimsatt (e.g., 2007,
tionary explanations provide the distinctive biological Chap. 12) has argued, very few system properties are
perspective. aggregative under all possible decompositions of
Whereas Mayr seeks the justification of the biolo- a system into parts. The list is pretty much exhausted
gist’s concern with why questions merely in the impor- by the quantities that appear in conservation laws:
tance of history for understanding the characteristics of mass, energy, momentum, and net charge. All other
organisms, many philosophers (e.g., Dennett 1995) system properties are more or less organized in the
refer to the specific character of that history as being sense that they break down under at least one of the
a selection history. Unlike physical objects, organisms four conditions for aggregativity identified by Wimsatt
have the character they have because ancestral variants (invariance under rearrangement and substitution, size
with those characteristics were favored over variants scaling, decomposition and reaggregation, and linear-
with other characteristics. For that reason in biology, ity). By carefully choosing a decomposition, by limit-
but not in physics and chemistry, the answer to the ing explanations to certain forms of organization
question how a certain characteristic is produced must and by introducing idealizations and approximations
involve an answer to the question why (e.g., for what (in the description of the system and in the derivation
effects) that characteristic was favored in the selection of the system’s behavior), the power of aggregative
process. explanation can be expanded immensely. However, if
In my view, attempts to justify types of explanation a system is very highly organized, there comes a point
in biology that differ from explanations in physics and where aggregative explanation is no longer possible
chemistry by appeal to the importance of history for and where the system’s organization must be taken into
understanding organisms or by appeal to the special account.
character of that history are unsatisfactory. Such In biology, this is done by viewing organisms as
attempts take the importance of history for granted, mechanisms (▶ Mechanism) for being alive (Wouters
whereas they should explain it. Moreover, they fail 2005). For the purpose of this essay, a mechanism for
to address the many differences between functional a certain behavior can be defined as a complex system
biology and physics and chemistry (see Fox Keller that produces that behavior by the organized interac-
2002 to get a feeling for the weirdness of mechanistic tion of its parts (cp. Glennan 1996; Machamer et al.
explanation in biology in the eyes of a theoretical 2000). Because the behavior of a mechanism results
physicist). The most notable differences are a prefer- from its organization, a mechanism can be seen as
ence for mechanistic models that visualize interactions a solution to the problem of how to organize the parts
over abstract mathematical system descriptions and their interaction in such way that this behavior is
(i.e., equations that do not bear an obvious relation to generated. Note, that this problem need not be experi-
the physical properties of the parts of the system), enced by the mechanism or some other system. The
appeal to role functions to explain how mechanisms difficulty (i.e., the amount of organization needed) to
work and the appeal to advantages and requirements to produce or maintain that behavior in the circumstances
E 708 Explanation in Biology
in which it occurs suffices to talk of problems. As explanations) but also why that mechanism’s organi-
George Cuvier (1769–1832), the founding father of zation solves the problem, whereas other forms of
functional zoology, already noted, the very existence organization (often called “designs”) do not. This
of organisms means that they have solved the problem opens up the possibility and the need to explain why
of how to stay alive (cf. Reiss 2009). Organisms exist a mechanism has the characteristics it has on the basis
far from thermodynamic equilibrium and can exist of the requirements it has to satisfy: the constraints
only by actively maintaining their organization imposed on it by the problems it must solve, the other
(Prigogine and Stengers 1984). To understand how characteristics of that mechanism, and the conditions
this form of existence is possible, a combination of under which it works. This is what functional explana-
explanations of four kinds must be employed, as many tions (▶ Explanation, Functional) do.
biologists have recognized (e.g., Tinbergen 1963): Furthermore, because, by definition, one does not
mechanistic explanation, functional explanation, devel- get a mechanism by throwing its parts together, the
opmental explanation, and evolutionary explanation. question of how the organization that solves the prob-
Mechanistic explanations explain how the behavior lems arises in the course of time needs consideration
of a mechanism arises from the properties of the parts, too. As became clear in the wake of Charles Darwin’s
their interaction, and the way in which this interaction theory of evolution, in the case of living mechanisms,
is organized. Biology’s functional perspective glues the answer requires a two-staged explanation. Devel-
explanations of different mechanisms together by opmental explanations (▶ Explanation, Developmen-
viewing those mechanisms in terms of their role in tal) explain how the organization arises in the course
maintaining the state of being alive (i.e., in terms of of individual development. Evolutionary explanations
their biological role). The different organ systems have (▶ Explanation, Evolutionary) explain how the machin-
specific roles in bringing about the living state. Each ery to develop the required organization arose in the
organ system consists of organs with specific roles in course of evolutionary history.
bringing about the properties of the organ system that So, the view of organisms as solutions to the prob-
enable that organ system to perform its role in the lem of how to maintain the living state provides
maintenance of the living state. Each organ in turn is a unifying perspective in biology that explains and
divided into subsystems, each with a specific role in justifies the importance of the four kinds of explanation
bringing about the properties relevant to that organ’s employed in biology, the relation between those expla-
biological role. And so on, until a level is reached in nations, and the main differences between biology on
which the relevant system properties arise out of unor- the one hand and physics and chemistry on the other.
ganized components. Attributions of biological roles
(often called “function ascriptions”) (see ▶ Function,
Biological) situate the different mechanisms in this Cross-References
encompassing hierarchical organization, making it
possible to see different mechanistic explanations as ▶ Explanation, Developmental
part of the larger project to understand how organisms ▶ Explanation, Evolutionary
are able to stay alive. Attributions of biological roles ▶ Explanation, Functional
are, hence, the key to explanation in biology. ▶ Function, Biological
The very fact that the behavior of mechanisms ▶ Mechanism
(including organisms) results from the organization ▶ Organization
of their parts (in addition to their composition) means
that in order to understand a mechanism, it does not
suffice to explain how it works. The mechanism’s References
solution to a certain problem need not be the only
possible solution to that problem. On the other hand, Dennett DC (1995) Darwin’s dangerous idea. Simon & Schuster,
by definition, not any form of organization will solve New York
Fox Keller E (2002) Making sense of life. Harvard University
any problem. So in order to understand a mechanism,
Press, Cambridge
one must not only explain how its organization results Glennan SS (1996) Mechanisms and the nature of causation.
in a certain behavior (as is done in mechanistic Erkenntnis 44(1):49–71
Explanation, Developmental 709 E
Goodwin BC (1994) How the leopard changed its spots. Phoenix narrowly, it means the process leading from the zygote
Books, London to the adult form. The larger meaning covers the whole
Machamer PK, Darden L, Craver CF (2000) Thinking about
mechanisms. Philos Sci 67:1–25 life of the organism, which includes episodes such as
Mayr E (1997) This is biology. Harvard University Press, reproduction and senescence (West Eberhardt 2003;
Cambridge Gilbert 2003). Developmental explanations pinpoint
Prigogine I, Stengers I (1984) Order out of chaos. Heinemann, events in the lifetime accounting for an individual
London
Reiss JO (2009) Not by design: retiring Darwin’s watchmaker. carrying a trait, whereas evolutionary explanations
University of California Press, Berkeley (▶ Explanation, Evolutionary) often stand at the pop-
Tinbergen N (1963) On aims and methods of ethology. ulation level.
Z Tierpsychol 20:410–433
Wimsatt WC (2007) Re-engineering philosophy for limited
What Development Explains E
beings. Harvard University Press, Cambridge, MA
Wouters AG (2005) The functional perspective of organismal The general explanandum of developmental explana-
biology. In: Reydon TAC, Hemerik L (eds) Current themes tions is twofold: to explain why and how a zygote
in theoretical biology. Springer, Dordrecht, pp 33–69 develops into an adult of its species, with its species-
typical traits; and to understand why and how systematic
deviations from the normal type occur, which usually
concerns “teratology.” Often, a back and forth move-
Explanation, Developmental ment between teratology and normal development char-
acterizes developmental explanation, because knowing
Philippe Huneman which alterations (e.g., silencing genes; injecting pro-
Institut d’Histoire et de Philosophie (IHPST), teins, etc.) produce a specific abnormal development
des Sciences et des Techniques Université Paris 1 allows understanding what is necessary to develop the
Panthéon-Sorbonne, Paris, France corresponding normal trait; therefore, many experimen-
tal developmental theorists proceed by such controlled
perturbations of development, at any hierarchical level
Synonyms (genes, cell, tissues).
Developmental explanations target ▶ model organ-
Embryological explanation isms such as nematodes, sea urchins, frogs, chicken;
through those models they aim at understanding what
is common across many species and clades. Develop-
Definition ment has many varieties: some species hatch, some
have larval stages, some develop from a unicellular
A developmental explanation explains how a trait zygote as a sort of bottleneck between generations –
came into being through a process starting at the sin- whereas others do not. Lowest level mechanisms
gle-celled stage of the existence of the organism. Since involving genes and gene transcription are likely to
developmentally explaining a trait is an explanation of be very general across multicellular organisms; but
its development, it requires explaining some detail of some higher level traits typical to a clade (i.e., eyes,
the developmental processes which were involved; hearts, etc.) require targeting an appropriate model
“developmental explanation” secondarily means the organism.
explanation of a particular feature, moment, or process In principle, any trait in an individual organism can
of development itself, for example, cell division, apo- receive a developmental explanation. However, the
ptosis, and specific pattern formations. focus is often put on species(or clades)-typical traits:
heart or central nervous system in vertebrates, tetrapod’s
limb, butterflies’ wings striped, etc. Some theoreticians
Characteristics view the scope of developmental explanation as in prin-
ciple limited to species-typical traits and the genealogy
Development has narrow and large meanings: very of a species type, then of a family type, etc., as in early
narrowly, it is the embryogenesis, that is, the process descriptive embryology. This conflicts with the core
leading from conception to hatching or birth; less Darwinian idea that variation is proper to species.
E 710 Explanation, Developmental
Explanation, Developmental, Fig. 1 (a) The French flag Articulating Explanatory Regimes
model pattern. (b) The causal underpinning of the flag pattern “Developmental explanation” in general means
(After Kerszberg and Wolpert 2007) explaining a trait by unraveling a developmental process.
Explanation, Evolutionary 713 E
When and how are developmental explanations needed? ▶ Plasticity
According to Ernst Mayr (1961), developmental expla- ▶ Preformation and Epigenesis
nations pertain to proximate causes of phenomena, and
evolutionary explanations (▶ Explanation, Evolution-
References
ary) to their ultimate causes. However, given that natural
selection requires variation and that some variation relies Amundson R (2005) The changing role of embryo in evolutionary
upon possibilities given by developmental processes theory. Cambridge University Press, Cambridge
(and not mere mutations and recombinations), develop- Carroll S (2005) Endless forms most beautiful: the new science
mental explanations may be implied by evolutionary of evo devo and the making of the animal kingdom. Norton,
New-York
questions. In general “developmental ▶ constraints” Davidson EH (1986) Gene activity in early development.
condition the generation of variation, through which Academic, Orlando
E
selection shapes traits and organisms. Therefore, evo- Davidson E, McClay D, Hood L (2003) Regulatory gene net-
lutionary and developmental explanations seem com- works and the properties of the developmental process.
PNAS 100(4):1475–1480
plementary, the former specifying which variations Forgacs G, Newman S (2005) Biological physics of the developing
are available and the second explaining which actual embryo. Cambridge University Press, Cambridge
variants will emerge on the basis of these variations. Gilbert S, Opitz G, Raff R (1996) Resynthesizing evolutionary
However, it may be that sometimes a developmental and developmental biology. Development and evolution
173:357–372
explanation is sufficient because it unravels a physico- Gilbert S (2003) Developmental biology. Sinauer, Sunderland
chemical structure involved in a large set of phyla, and Kerszberg M, Wolpert L (2007) Specifying positional information
therefore explains the pervasiveness of some traits in the embryo: looking beyond morphogens. Cell 130:205–210
without a need to appeal to selective pressures. The Mayr E (1961) Cause and effect in biology. Science
134(3489):1501–1506
self-organizing gene networks investigated by Mazzarello P (1999) A unifying concept: the history of cell
Kauffmann’s Origins of Order (1993) provide devel- theory. Nat Cell Biol 1:E13–E15
opmental explanations of this type. An example would Moss L (2003) What genes can’t do. MIT Press, Cambridge, MA
be the modularity of cell metabolic networks: even if it M€uller G, Newman S (2005) The innovation triad: an evo-devo
agenda. J Exp Zool 304B(6):487–503
seems that modularity confers a selective advantage Solé R, Valverde S (2008) Spontaneous emergence of modular-
(through ▶ robustness), so that natural selection seems ity in cellular networks. J R Soc Interface 5:129–133
responsible for the fact that all cell types in many clades von Baer KE (1828) Entwickelungsgeschichte der Thiere:
exhibit modular networks, the mere topology of net- Beobachtung und Reflexion, Borntr€ager, K€ onigsberg
West Eberhard MJ (2003) Developmental plasticity and evolu-
works will indeed promote mostly modular networks tion. Oxford University Press, Oxford
without the need for natural selection to prune a set of Wolff CF (1764) Theorie von der generation. Birnstiel, Berlin
randomly varying networks (Solé and Valverde 2008). Wolpert L (1994) Positional information and pattern formation
In cases like this, the developmental explanation seems in development. Dev Genet 15:485–490
enough to explain the generality of a trait, without
hypothesizing a fixation through natural selection. The
final status of Evo-Devo seems is tied to a decision about Explanation, Evolutionary
whether such cases are exceptions, or not.
Philippe Huneman
Institut d’Histoire et de Philosophie (IHPST), des
Cross-References Sciences et des Techniques, Université Paris 1
Panthéon-Sorbonne, Paris, France
▶ Canalization
▶ Emergence
▶ Epigenetics Definition
▶ Explanation, Evolutionary
▶ Gene Regulatory Networks Evolutionary theory is the general framework for mod-
▶ Model Organism ern biology, in the sense that all living phenomena have
E 714 Explanation, Evolutionary
an evolutionary history which somehow accounts for process of differential reproduction of individuals carry-
them being the way they are. Ernst Mayr usefully distin- ing traits which give them different chances of having
guished two sets of inquiries in biology: the “functional their traits represented in the next generation. As soon as
biology,” looking for “proximate causes” of a trait in an a population of individuals has traits which are varying
organism, that is, causes pertaining to the lifetime of the and heritable, and if those traits causally influence their
individual, and the “evolutionary biology,” looking for chances of reproduction, then we have natural selection.
“ultimate causes,” namely, causes at the level of the This third property means that those traits have, or
history of the species to which belongs the individual. contribute to, ▶ fitness. Finally, drift is the statistical
The former includes physiology, molecular biology, effect due to the fact that generation after generation
developmental biology, etc., whereas the latter includes a stochastic sampling of the population takes place.
paleontology, population genetics, behavioral ecology, The smaller the population, the higher the expected
systematics, etc. Evolutionary explanation is the set of amount of drift; whereas in an infinite population, selec-
explanatory styles to be met in this field (Ridley 2001). tion would be the sole causal factor, hence the fittest traits
or alleles would in general go to fixation.
Thus, evolutionary explanation is always
Characteristics a population level explanation. All factors are differen-
tial, so the whole process needs a population of varying
Explananda and Explanantia individual; it contrasts with an explanation which would
Evolutionary theory, as it was formulated by Darwin, consider an individual and explain its traits by investi-
holds two key ideas: all living forms share a common gating how they developed. To this extent, population
history and (as the naturalists hypothesized) a common level and individual level explanations are rather com-
origin; natural selection is a process responsible for many plementary than competing: “why does individual
of the features of this Tree, and of the species and organ- A have trait X?” can be answered by considering its
isms it includes. In the usual theory, called the “Modern history, whereas population level explanations answer
Synthesis,” namely, a synthesis of Darwinism and Men- questions such as: “why does each individual in the
delism on the basis of population genetics in the 1930s, population have trait X?,” or “why do many of them
evolution is related to the dynamics of gene frequencies. (or: a minority) have trait X?” (e.g., why are some people
Let us first specify what evolutionary explanations left-handed?)
are intended to explain. First, they should explain
evolution; then, they should explain the diversity of Two Evolutionary Questions
living beings; the adaptedness of organisms to their Evolutionary explanations often build models, analytic
environments – on the basis of the fact that they display or simulated, which on plausible assumptions describe
features which suit them to their environment, for the evolutionary dynamics at stake or the way it may
example, the webbed feet of ducks are suited to swim- have produced the explanandum; model-building and
ming. So they explain the traits themselves, which comparing with data are the two stages of the expla-
are adaptations. Given that the Modern Synthesis nation. ▶ Fitness and population size influence evolu-
has defined evolution as the “change in allelic tionary dynamics (as well as migration and mutation
frequencies,” a first explanandum is the change in the rates). Yet ▶ fitness is a probabilistic term, and has
relative frequency of genes. The existence of a trait in causes: all the interactions within which an organism
this context is explained when the fixation of a gene can engage, and into which the trait of interest is
underlying it is explained. involved, determine in the end the fitness value of the
The set of explanantia mostly used to explain those trait or of the organism. This entails a difference
explananda are natural selection, genetic drift, and “con- between two kinds of questions:
straints.” The last of these is loosely defined. It includes (a) How does selection proceed? With fitness as
“phylogenetics inertia,” that is, the fact that a same trait a variable, one can build a model describing the
in a population is indeed a character state of an ancestor dynamics of a population of alleles with specific
and did not happen through natural selection even if it is genetic structure (epistasis, pleiotropy, dominance,
adaptive. It also means “developmental constraints” (see etc.) and fitness values. This is what population
▶ Explanation, Developmental). Natural selection is the genetics does.
Explanation, Evolutionary 715 E
(b) Why is/was there selection? One can investigate the
causes of the fitness values, which implies consider-
ing the physiology of individuals and the ecology of
the population. Behavioral ecology, asking about the
▶ function of various traits – that is, the effect which E
led them to spread into the population and allows for
its maintenance – exemplifies such research pro-
gram, as well as paleontology.
the living state would diminish if the trait to be explained the rate of oxygen delivery functionally depend on
(the presence of a circulatory system) was replaced active transport (follows from Fick’s law of diffusion,
by a specific alternative (no active transport). The a well-known law of physical chemistry), (b) the rate
explaining traits are functionally dependent on the trait of oxygen consumption functionally depends on the
to be explained in the sense that the ability to maintain rate of oxygen delivery (follows from the obvious
the living state of an organism with the explaining traits invariance that the rate of consumption cannot be higher
would diminish if the trait to be explained was replaced than the rate of delivery), and (c) the rate of activity of the
by the alternative, whereas replacing the trait to be organism functionally depends on the rate of oxygen
explained would not make much difference if the organ- consumption (follows from a well-established invariant
ism lacked the explaining traits. For example, having relation (very roughly: the more active the organism, the
a certain size and activity is functionally dependent on more oxygen it consumes) that results from the way in
the presence of active transport because removal of which those organisms generate the energy needed for
active transport would diminish the life chances of their activities).
organisms with that size and activity, whereas the Several types of evidence support or undermine
replacement would have little effect on that organism’s claims about the existence of a relation of functional
capacity to maintain itself if it were small enough and not dependence. Comparisons provide an important clue. It
very active. is necessary that all organisms with the dependent traits
Functional dependence is a synchronic relation at have the trait to be explained or a functional equivalent,
the level of the individual (the size and activity of an but the support provided by comparisons for the conclu-
organism are dependent on that organism having sion that a certain trait is dependent on others is rather
a system of active transport). The relation is not weak. Experimental manipulation provides better evi-
of a causal nature (see ▶ Causality), but rather dence. By producing organisms that are in relevant
a ▶ constraint on what can be alive: our universe is aspects similar to the real organism, except that the trait
such that living systems of a certain size and activity to be explained is replaced by an alternative, one gets
cannot exist without mechanisms for active transport. good evidence for the advantageousness of that trait. In
There may, of course, exist causal relations between order to provide evidence for or against the thesis that
the trait to be explained and its dependents (perhaps the the need is due to the presence of the dependent traits,
transport system was maintained in the lineage because the dependent traits should be modified too. Techniques
variants with a less well-developed transport system of genetic manipulation have greatly increased the possi-
were less active and for that reason eliminated bilities to provide this kind of evidence. The explanation
by natural selection or perhaps the activity of the of a functional dependency on the basis of invariances is
developing organism influences the development of strong evidence for the existence of that dependency,
a transport system in the course of the ontogeny), but provided that both the invariances and the assumptions
the functional dependence exists independent of the on which the explanation rests are well established.
history of the organism and hence independent of Finally, the development of simulation models in systems
these causal relations (it would exist also if the same biology offers a new and powerful way to provide evi-
organism arose out of different ontogenetic and dence for or against functional claims because they allow
evolutionary processes). for precise modifications of large numbers of relevant and
Functional explanations not merely identify the traits potentially relevant variables in silico.
that are functionally dependent on the trait to be We can now see that functional explanations do not
explained, but they also account for the dependency assume ▶ teleology. Functional explanations are
itself. This is done by relating the dependency to invari- concerned with what is needed or advantageous for
ant relations that result from the way the organism is staying alive, but they do not tell us how the
wired and from more general invariant relations that required traits are brought about and, in fact, make no
scientists call “laws.” Such an explanation often intro- assumptions at all about how or why living systems and
duces a structure of functional dependencies intermedi- their traits come into being. For that very reason, they do
ate between the trait to be explained and the dependent not assume that traits are brought about because of their
traits. For example, the structure of functional dependen- advantages, in order to fit the requirements or because
cies in Krogh’s explanation is as follows: (a) the size and something or someone had a certain goal.
Explanation, Reductive 719 E
Functional explanations are, nevertheless, crucial evidence about the variants that occur in the
to understand life because they show us how the population, the occurrence of selection, the heritability
characteristics of an organism fit into the requirements of the trait, the structure of the population, and
for being alive. Living systems exist far from phylogenetic polarity, in addition to evidence about the
thermodynamic equilibrium and can exist only because advantageousness of the trait) (see ▶ Explanation, Evo-
they are able to maintain themselves actively (i.e., by lutionary). It is also unfortunate that this way of
using energy). Although there are many ways to stay presenting functional explanations has led some critics
alive, not all combinations of matter will do. The ability to reject functional explanations as mere speculation,
of a system to maintain the living state is critically depen- thus ignoring the valid insights provided by functional
dent on its organization, that is, on the composition, explanations together with the unsubstantiated evolu-
character, and arrangement of its parts and the order and tionary conclusions drawn from them. E
timing of its activities. Causal explanations can tell us
how a certain form of organization came into being and
how that form of organization brings about the ability to Cross-References
stay alive, but not why certain forms of organization are
able to stay alive and others not nor why certain forms of ▶ Causality
organization are better in staying alive than others. This is ▶ Constraint
the kind of understanding provided by functional expla- ▶ Explanation in Biology
nations, and this is why a combination of functional and ▶ Explanation, Evolutionary
causal explanations is needed to understand life (see ▶ Function, Biological
▶ Explanation in Biology). ▶ Teleology
The failure to understand the noncausal nature of
functional explanations and the failure to understand
how noncausal explanations can contribute to scientific References
understanding have instigated or strengthened many
misconceptions. If it is, erroneously, assumed that all Krogh A (1941) The comparative physiology of respiratory
explanations purport to explain how the phenomenon mechanisms. University of Pennsylvania Press, Philadelphia
Wouters AG (1995) Viability explanation. Biol Philos
to be explained was brought about, one might easily,
10(4):435–457
but erroneously, take functional explanations as resting Wouters AG (2007) Design explanation: determining the
on the teleological assumption that traits of organisms constraints on what can be alive. Erkenntnis 67(1):65–80
are brought about because they perform a certain func-
tion. In response to this misconception, one might reject
functional explanations, erroneously, as illegitimately
teleological, as both structuralist and reductionist Explanation, Reductive
biologists tend to do, or look for a place for teleology
within the Darwinian theory of evolution, as many nat- Marie I. Kaiser
uralistic philosophers tend to do. The idea that explana- Department of Philosophy, University of Cologne,
tions should explain how the trait to be explained is Cologne, Germany
brought about might also lie behind the tendency to
present the conclusion of functional explanations in evo-
lutionary terms. Several books advertise a functional Synonyms
approach as an evolutionary one, and an increasing num-
ber of articles present functional explanations as showing Explanatory reduction
that a certain trait “evolved as an adaptation for” or “was
selected for” some advantage, need, or dependent trait.
This is unfortunate not only because of the teleological Definition
odor of this kind of conclusion but also (and more
importantly) because it suggests much more than the Reductive explanations (or explanatory ▶ reduction)
evidence allows (claims about selection require are a special kind of scientific ▶ explanation. It has
E 720 Explanatory Reduction
been argued that three central features distinguish reduc- Kaiser MI (2011) Limits of reductionism in the life sciences.
tive explanations in biology from non-reductive ones Hist Philos Life Sci 33:389–396
Sarkar S (1998) Genetics and reductionism. Cambridge Univer-
(Kaiser 2011; for other analyses see Sarkar 1998; sity Press, Cambridge
H€uttemann and Love 2011). The first and third are nec-
essary conditions for a biological explanation to be
reductive, whereas the second is only a feature that is
typical for most (but not for all) reductive explanations in
biology. Explanatory Reduction
In reductive explanations, the behavior of
a biological system S is explained ▶ Explanation, Reductive
1. by referring only to factors that are located on
a lower level of organization than S,
2. by focusing on factors that are internal to S (i.e.,
that are genuine parts of S), Exploratory Experimentation
3. by describing parts of S only as being parts in
isolation (i.e., by appealing only to those relational Richard M. Burian
properties of and interactions between parts that can Department of Philosophy, Virginia Tech, Blacksburg,
be studied in other contexts than in situ). VA, USA
For instance, the contraction of a muscle
fiber is explained by appealing to certain molecules
(e.g., actin, myosin, calcium ions), cell organelles Definition
(e.g., sarcoplasmic reticulum), and by describing the
interactions between them. This explanation possesses Experiments count as exploratory when the con-
a reductive character because all factors mentioned in the cepts or categories in terms of which results should
explanation are internal to the muscle fiber (condition 2). be understood are not obvious, the experimental
External factors like the incoming neuronal signal are, if methods and instruments for answering the ques-
mentioned at all, simplified as being mere input- or tions are uncertain, or it is necessary first to estab-
background conditions. Furthermore, all explanatorily lish relevant factual correlations in order to
relevant factors (i.e., actin and myosin molecules, cal- characterize the phenomena of a domain and the
cium ions, sarcoplasmic reticulum) are located on regularities that require (perhaps causal) explana-
a lower level of organization than the muscle fiber itself tion. Elliot (2007, 322 ff.) offers a useful taxonomy
(condition 1). Finally, only those interactions between of exploratory experimentation. Rather than testing
the muscle fiber’s parts are represented in the explana- hypotheses; it varies parameters or circumstances to
tion that can be investigated by taking the parts out of the see what will happen; it utilizes background knowl-
original system (i.e., the muscle fiber) and studying their edge (including theories and experimental lore) to
properties in other contexts than in situ (condition 3). establish novel correlations, follow anomalies, seek
improvements in instrumentation and experimental
protocols, and the like; and it employs a variety of
Cross-References systematic strategies to govern appropriate variation
of parameters and appropriate orientation to the
▶ Explanation in Biology primary questions in the background. Thus, some
▶ Reduction of the investigations well suited for exploratory
experimentation
• Aim at developing new experimental technologies
for use in data-intensive research
References • Seek to track causal interactions that cross hierar-
chical levels
H€uttemann A, Love AC (2011) Aspects of reductive explanation
in biological science: intrinsicality, fundamentality, and tem- • Deal with cases in which the state space of relevant
porality. Br J Philos Sci 62:519–549 variables is not well delimited
Exploratory Experimentation 721 E
• Aim at developing new concepts for “fixing phe- Of course, studies of experimental methods and
nomena” calling for explanation by establishing practices always recognized that experiments may
relevant regularities or classifying relevant entities serve other purposes – e.g., to help design or calibrate
(e.g., regulatory RNAs, compounds that act as instruments, standardize reagents, refine key measure-
endocrine disruptors, and protein networks) that ments, establish correlations between changes in
have not yet been well characterized experimentally controlled parameters and dependent
• Seek to resolving anomalies, especially those aris- variables (▶ Experiment). Ian Hacking (1983), helped
ing across disciplinary boundaries in the handling spark a philosophical reexamination of experimental
of complex systems practice in its own right and reinforced recognition
Exploratory experimentation is often a key compo- within philosophy of science that experiment “has
nent in investigations of the importance of potentially a life of its own” and is not simply a handmaiden of E
relevant variables, parameters, and sources of error. theory. Philosophers supporting the “new experimen-
It also facilitates the discovery of unrecognized pat- talism” generally agree that experimental protocols
terns and regularities (▶ Pattern Mining) and the can be freed from pernicious dependence on the
design of instrumentation for investigating open ques- hypotheses or theories under investigation (Arabatzis
tions and for analyzing complex phenomena, particu- 2008; Hacking 1992; Weber 2004) and that experi-
larly when new or revised concepts and classifications ments serving exploratory and inductive purposes
are needed to help explain novel phenomena. It may serve different purposes and should meet different
help to establish system boundaries and resolve anom- evaluative criteria than those designed for hypothesis
alies that have turned up in previous investigations testing.
(Elliott 2007).
Conditions that Exploratory Experiments Should
Meet
Characteristics Several major steps are required for success in explor-
atory experiments. These include:
Philosophical Background • Establishing conditions in which experimental
Until about 1980, mainstream philosophers of science findings are stable (Hacking 1992) and/or system
focused mainly on hypotheses, theories, and their jus- behavior robust to various perturbations (Ferrell
tification rather than discovery and hypothesis genera- 2009)
tion, holding that the latter can neither be analyzed • Establishing which downstream effects follow on
philosophically nor made subject to philosophical small changes in the location, timing, or order of
norms. They paid far more attention to the events
hypothetico-deductive method as opposed to inductive • Iteratively adjusting experimental conditions
methods for inference from data to hypotheses: if jus- and technologies to reduce noise in experimental
tification depends on deduction from true or justified findings
premises, inductive inferences (which are deductively • Rethinking the identities of the “players” (boundary
invalid!) are suspect and, arguably, even illegitimate conditions, entities, mechanisms, parameters,
(▶ Deduction; ▶ Induction). This centrality of theory processes, and unknowns) relevant to the findings
and of hypothetico-deductive methodology is nicely • Iteratively deploying revised or new simulations
exemplified in the philosophy of Sir Karl Popper, and computational and theoretical models of the
who held that science is composed of conjectures experimental systems and situations
from which testable hypotheses are deduced with the These steps are intensively experimental. Satisfactory
aid of further hypotheses (e.g., about boundary condi- results are dependent on
tions, experimental setups, and the behavior of exper- • Careful tuning of experimental techniques and skills
imental instruments): • Integration of appropriate calculational tools and
theoretical commitments
The theoretician puts certain definite questions to the
experimenter, and the latter, by his experiments tries to
• Reworking of the experimental conditions, proto-
elicit a decisive answer to these questions and to no cols, instrumentation, and analytical categories in
others. . . (Popper 1968) terms of which the experiments are described
E 722 Exploratory Experimentation
• Iterative reexamination of the scope of tentative find- to cope with the high dimensionality of the interrelation-
ings and the applicability of the techniques and con- ships involved. Such work often requires innovative
cepts to additional research problems and systems statistical and computational tools as well as significant
• Interactive stabilization of the protocols, descrip- adjustment of terminologies and conceptual apparatus
tive terminology, and experimental findings, with from different disciplines and organisms. A common
iterative procedures for refining the experimental issue in analyzing pathway and networks is, therefore,
apparatus, the calculational tools, and of the ana- managing and reducing the semantic difficulties arising
lyses of patterns in the phenomena and causal path- from differing disciplinary commitments.
ways at issue (Kelder et al. 2010) An important type of investigation drawing on
exploratory experimentation is the interactive revision of
classifications and analyses to discover empirically mean-
Exploratory Experimentation in Systems Biology ingful patterns that can facilitate the production of coher-
Because systems biology is concerned with establishing ent and reliably reproducible results, transportable from
the strength and scope of interactions across many one problem or database to another. Thus the recognition
scales in complex systems that often have partially of novel patterns exhibited in data and/or turned up while
specified architectures and boundaries, exploratory developing novel explanatory concepts is a major purpose
experimentation typically serves different purposes of exploratory experimentation. Pattern recognition of this
and employs different modalities than conventional sort encounters many obstacles in the interdisciplinary
hypothesis testing (e.g., high-throughput technologies approaches to complex problems now prevalent in sys-
and data-intensive research with high levels of noise) tems biology and beyond. An extreme example illustrates
(Franklin 2005; O’Malley 2007). Recent debates about the difficulty: consider recent debates about the (partly
the structure and value of “data-driven research” and anthropogenic) causes and effects of climate change in
“discovery science” are therefore relevant (▶ Data- relation to ecological change and the spread of various
Intensive Research; ▶ Data Mining; ▶ De Novo Com- pathological organisms. The resolution of the issues about
putational Discovery of Motifs; ▶ Functional/Signature which patterns are salient and involve anthropogenic
Network Module for Target Pathway/Gene Discovery; causes must reconcile perspectives from biogeography,
▶ Rule Discovery). Furthermore, much research in sys- biological systematics, ecology, economics, epidemiol-
tems biology is comparative and classificatory, e.g., ogy, genomics, meteorology, molecular biology, paleocli-
comparing the distinctive functions of genes or proteins matology, pathology, and many other disciplines. It is
in different systems, different developmental stages of hardly a surprise that the integration and interpretation of
a developing system, or the networks and pathways of data to attempt a systematic approach to some of the issue
different systems. Such comparisons often require use of involved is, by itself, an enormously difficult problem.
data from multiple databases and disciplines. Ensuring Problems of this sort, though not always as dramatic, arise
robustness of findings and comparability of data from often enough when exploratory experimentation is
diverse sources is often problematic, fraught with the employed in systems biology.
difficulties of developing or meeting the constraints of
curated databases and/or standardized “ontologies” such Conclusion
as the Gene Ontology (▶ Bio-Ontologies; ▶ Disease The most important points of this entry are the distinc-
Ontology; ▶ Gene Ontology; ▶ Ontology Analysis of tiveness of exploratory experimentation and the inade-
Biological Networks; ▶ Ontology Structure). quacy of the characterizations it has received in standard
Again, the relationship between high-throughput accounts of experimental methods. Philosophers and
exploratory experiments and well-controlled hypothesis- methodologists are only beginning to develop models
testing experiments is crucial to evaluation of systems of exploratory experimentation. There has been little
biological experiments. Recently, protocols have been appreciation of the relatively indirect role of hypothesis
developed that combine the two with means of calibrating testing in exploratory experimentation. For example,
findings (e.g., Pfeifer et al. 2010); the resulting mutual standard reviews of experiment in systems biology
reinforcement is potentially very powerful. Often (as (e.g., Kreutz and Timmer 2009) retain strong rhetoric
illustrated by Pfeifer et al.), data from several sources about the need to specify exact hypotheses even when
need to be reconfigured to make the data comparable and they concentrate mainly on issues described here as
Exposure 723 E
pertinent to exploratory experiments. Unlike hypothesis ▶ Gene Ontology
testing, exploratory experiments do not begin with a null ▶ Induction
model and are not able to specify in detail the expected ▶ Ontology Analysis of Biological Networks
outcome of key experiments. They must focus, instead, ▶ Ontology Structure
on reducing noise in the data, on how best to perturb ▶ Pattern Mining
systems to uncover relevant parameters or the triggers of ▶ Rule Discovery
threshold transitions, and on ensuring robustness of
findings. Exploratory experiments do, however, have
a strong basis in (theoretical) background knowledge of References
relevant mechanisms and comparative knowledge of
related systems. One analytical approach of great interest Arabatzis T (2008) Experiment. In: Psillos S, Curd M (eds) The E
Routledge companion to the philosophy of science.
compares the ways in which exploratory experiments
Routledge, London/New York, pp 159–170
utilize complex data sets and revision of concepts and Elliott KC (2007) Varieties of exploratory experimentation in
classifications with the methods utilized by natural his- nanotoxicology. Hist Philos Life Sci 29(3):313–336
tory in dealing with complex data, revision of concepts, Ferrell JE Jr (2009) What is systems biology? J Biol 8:2
Franklin LR (2005) Exploratory experiments. Philos Sci
and revision of classifications (Strasser 2010). The point
72:888–899
is partly that the highly experimental work discussed Hacking I (1983) Representing and intervening: introductory
above seeks to understanding of complex interactions topics in the philosophy of natural science. Cambridge Uni-
in natural conditions among “natural assemblages” of versity Press, Cambridge
Hacking I (1992) The self-vindication of the laboratory sciences.
molecules, genomes, organisms, and a welter of
In: Pickering A (ed) Science as practice and culture. Univer-
interacting mechanisms, “players” and processes – and sity of Chicago Press, Chicago, pp 29–64
this is quite like what is done in natural history except Kelder T, Conklin BR, Evelo CT, Pico AR (2010) Finding the right
that it is in carried out in highly experimental contexts. questions: exploratory pathway analysis to enhance biological
discovery in large datasets. PLoS Biol 8(8):e1000472
Natural historical research is often characterized as
Kreutz C, Timmer J (2009) Systems biology: experimental
“descriptive.” Fair enough. But it should be recognized design. FEBS J 276:923–942
that descriptive findings are equally essential to inten- O’Malley MA (2007) Exploratory experimentation and scien-
sively experimental laboratory research required to tific practice: metagenomics and the proteorhodopsin case.
Hist Philos Life Sci 29(3):337–360
characterize the systems and the processes of interest in
Pfeifer AC, Kaschek D, Bachmann J, Klingm€ uller U, Timmer J
systems biology. This is why broadly inductive uses (2010) Model-based extension of high-throughput to high-
of exploratory experimentation and natural-historical content data. BMC Syst Biol 4:106
methods employing high-throughput technologies are Popper KR (1968) The logic of scientific discovery. Harper and
Row, New York
so central in systems biology. And it is why iterative
Strasser BJ (2010) Laboratories, museums, and the comparative
interaction between the methods of exploratory experi- perspective: Alan A. Boyden’s serological taxonomy,
mentation and those of hypothesis testing is crucial to the 1925–1962. Hist Stud Nat Sci 40(2):149–182
ongoing development of systems biology (Kelder et al. Weber M (2004) Philosophy of experimental biology.
Cambridge University Press, Cambridge and New York
2010).
Cross-References Exposure
Synonyms X
Nd
!
Synonyms
Orientations histogram
Definition
Characteristics
Cross-References
The aim, for docking applications, is the search for
▶ Extended Gaussian Image for Pocket-Ligand subregions that are complementary between different
Matching molecules. When we have a large molecule (receptor)
and a small molecule (▶ Ligand), docking takes place
in a protein cavity. In this connection the first subprob-
References lem to be solved, in protein-ligand interfaces, is to
develop the representations and the data structures
Hermann M (1910) Geometrie der Zahlen, Leipzig and Berlin:
suitable to support the computational methods that
R. G. Teubner, MR0249269
Horn BKP (1984) Extended Gaussian images. Proc IEEE allow a quantitative evaluation of the protein-ligand
72:1671–1686 matching on the basis mainly of their 3D structure and
E 726 Extended Gaussian Image for Pocket-Ligand Matching
morphology. The EGI is a possible representation for A given 3D molecule, modeled through its SES in
the ligand molecule, with the correspondent data a triangular mesh, is described by the set of triangles:
structure based on a first-order statistic of the surface
orientations. After the segmentation of the protein SES T ¼ fT 1 ; :::; T m g; T l R3 (1)
(Cantoni et al. 2010), the interface regions, which
potentially can be active sites, are represented by an where each Tl consists of a set of three vertices (2):
EGI (Fig. 2).
Tl ¼ PA;l ; PB;l ; PC;l (2)
specificity and functionality. Hyaluronan is non-sulfated reticular epithelial cells of the thymus and the
and is composed of an unmodified disaccharide repeat surface of brain and spinal cord. Basement membranes
that is not covalently linked to any protein (Toole 2004). usually contain type IV collagen, heparan sulfate pro-
The basement membrane is a permeable, thin ECM, teoglycan, chondroitin sulfate proteoglycan, entactin,
evident as a three-layered basal lamina when using hyaluronic acid, collagen VII, and laminin. Each of the
electron microscopy, placed between epithelial cells components of the basal lamina is synthesized by the
or endothelial cells and adjacent connective tissue. cells that rest upon it and are modified by adjacent
It surrounds smooth and striated muscle cells, fat stromal cells beneath. Basement membrane is both
cells, Schwann’s cells, cells of adrenal medulla, and structural and instructional, providing key positional
E 730 Extrinsic DNA Damage Checkpoints
Definition
FP FP
FDR ¼ E ¼E if
FP þ TP R
The false discovery rate (FDR) is a statistical approach
used in multiple hypothesis testing to correct for R > 0; 0 otherwise: (1)
False Discovery Rate (FDR), Table 1 Number of errors com- False Positive Rate, Table 1 Sample confusion matrix
mitted when testing m null hypothesis H0 versus H1
Predicted
Accept H0 Reject H0 Total A Non-A
H0 true TN FP m0 Actual A TP FN
H1 true FN TP m m0 Non-A FP TN
Total mR R m
The FDR is the proportion of the rejected null false positive rateðaÞ ¼ freject H0 jH0 trueg
hypotheses which are erroneously rejected. In practice,
the level of acceptable FDR is fixed by the investigator, In machine learning (▶ Model Validation, Machine
depending on the desired stringency. Learning), the false positive rate is closely related to the
An example of the practical use of the FDR is in notion of specificity, one of statistical measures widely
genomic association studies, for which a large number used to assess the performance of prediction models.
of statistical tests are performed simultaneously. In Let TP be true positives (samples correctly classi-
such studies, the objective is to identify genomic fac- fied as class A), FN be false negatives (samples incor-
tors worthy of further analysis. If the multiplicity of the rectly classified as not belonging to class A), FP be
data is not taken into account, the probability that false positives (samples incorrectly classified as
a false identification (type I error) is committed can class A), and TN be true negatives (samples correctly
increase sharply when the number of tested genes classified as not belonging to class A). The relationship
becomes large. The FDR controlling approach is between these prediction outcomes can then be sum-
widely used to avoid such errors. marized using a confusion matrix (Kohavi and Provost
1998) as illustrated Table 1.
The false positive rate is the proportion of samples
References not belonging to class A that were incorrectly classified
as class A, i.e.,
Benjamini Y, Hochberg Y (1995) Controlling the false discovery
rate: a practical and powerful approach to multiple testing.
J Royal Stat Soc Ser B 57(1):449–518 false positive rate ðaÞ ¼ FP=ðFP þ TNÞ
¼ 1 specificity
Synonyms
Cross-References
Type I error rate
▶ Model Validation, Machine Learning
Definition
References
In statistical analysis, the false positive rate of a test is
defined as the probability of rejecting the null hypoth- Kohavi R, Provost F (1998) Glossary of terms. Mach Learn
esis H0 when it is true, which can be denoted as: 30:271–274
FBA Analysis, Plant-Pathogen Interactions 733 F
Nevertheless, this type of modeling lacks the
FBA Analysis, Plant-Pathogen capacity to represent the underlying biological
Interactions processes that take place during an infection, for
instance, those that modify infected cell behavior at
Andrés Mauricio Pinzón Velasco1, Silvia Restrepo2 the molecular level. Recently, due to the availability
and Andrés Fernando González Barrios3 of high-throughput biological data as well as the
1
Department of Systems Engineering, Universidad development of sophisticated computational tools,
Jorge Tadeo Lozano, Bogotá, Colombia there is a growing interest in the modeling of these
2
Department of Biological Sciences, Universidad de underlying molecular processes.
los Andes, Bogotá, Colombia
3
Department of Chemical Engineering, Universidad de Computational Modeling of Metabolism
los Andes, Bogotá, Colombia Although the way plants resist a particular pathogen
F
attack vary according to plant species and specific path-
ogen characteristics, in general terms, plants face
Synonyms a pathogen attack by shifting their defense mechanisms.
These mechanisms consists of a complex and highly
Flux balance analysis; Pathosystem interconnected set of networks, in which host defense
genes interact with each other as well as with pathogen
proteins present in the cell. Metabolic and regulatory
Definition networks are of particular interest when studying this
kind of biological processes. It is at this level that one
Flux Balance Analysis (FBA) linear programming expects that pathogen manipulation lead to phenotypic
based optimization approach that departs from an objec- and behavioral differences among plants under attack. In
tive cell or system function restricted by mass and this context, a particular approach for the in silico repre-
energy conservation, which has been widely used for sentation and analysis of the set of biochemical reactions
the analysis of biochemical fluxes in metabolic net- that take place in a given organism, known as metabolic
works. This approach allows researchers to determine reconstructions (MRs), is of particular interest.
changes in metabolic phenotypes given a set of Typically MRs are based on genomics information
biochemical constraints. In the case of plant-pathogen and therefore known as GEMRs or Genome Scale
interactions, or any host-pathogen model, this method- Metabolic Reconstructions (Thiele and Palsson
ology is particularly promising since it offers the oppor- 2010). This type of reconstructions are based on the
tunity to analyze the structure, dynamics, and complex gene content of a complete genome, from where the set
behavior of metabolic networks during a particular of genes that code for enzymes is selected and the
infective process. reactions they belong to are gathered.
Another approach for metabolic reconstruction is
known as TMRs or Targeted Metabolic Reconstruc-
Characteristics tion (Pinzón et al. 2010), which are not based on
genome content but in transcriptomics information.
Mathematical modeling of plant-pathogen interactions Due to its main characteristic TMRs do not cover the
has been carried out since the 1950s (Pinzón et al. complete set of potential genes present in an organism
2009). Typically, plant pathologists have approached genome, but they do provide a way to analyze active
the quantitative and computational modeling of this metabolic networks, as the set of enzymes expressed at
type of interactions focusing on processes at high a given time point and under particular conditions.
levels of resolution, such as the description of temporal Overall, the main aim of metabolic reconstructions
dynamics of crops diseases and spatial patterns of is the identification of suitable targets for metabolic
dispersion. From its very beginning these models engineering, the improvement of production yields and
belonged to the family of logistic equations and its nutritional value of crops, as well as the understanding
application to a broad spectrum of plant diseases has of certain processes such as resistance or defense from
been a constant since then (Pinzón et al. 2009). pathogens, for instance, through the study of time
F 734 FBA Analysis, Plant-Pathogen Interactions
FBA Analysis, Plant-Pathogen Interactions, Fig. 1 FBA is growth is used as OF, and it is represented in the form of
a constraint-based optimization method that facilitates the com- biomass. However, under particular conditions for some meta-
putational prediction of systemic phenotypes in the form of bolic reconstructions in plants, other OFs can be described. For
fluxes of reactions. For instance, given a set of available nutri- example, in S. tuberosum and other tuber-like plants it is feasible
ents for an organism, FBA allows for the prediction of the set of to use starch production and storage as OF instead of a typical
fluxes of metabolic reactions that optimize the growth for that biomass set of reactions, given that starch storage could be seen
organism. FBA requires the conversion of the metabolic system as an indicator of growth. Part B shows a standard form of a FBA
(A) into a stoichiometric matrix “S” (C). In this matrix rows problem where flux through the OF is maximized subject to
represent metabolites and columns reactions. In R1, metabolite some constraints such as reactions stoichiometric and uptake.
A is consumed (1) and metabolite B is produced (1), metabo- In this way reaction fluxes (Vj) are between lower (lb) and upper
lite C does not participate in the reaction (0). The FBA model (ub) bounds. Since FBA assumes a steady state for all reactions,
usually optimizes for a particular characteristic in the organism, metabolite concentrations are fixed (S.v ¼ 0)
which is commonly called the Objective Function (OF). Usually
evolution of a host network under pathogen attack. Using FBA it is possible to determine how
One of the most important characteristics of MRs is microorganisms utilize their metabolism, and predict
that they can be interrogated by means of computa- their changes under environmental perturbations. For
tional modeling and therefore derive hypothesis based instance, it is possible to assess the effect that a single
on model predictions. gene deletion can have over the flux of all reactions in
a metabolic network (Fig. 2). These fluxes of reactions
Flux Balance Analysis (FBA) in the Context of correspond to intracellular biochemical networks that,
Plant-Pathogen Interactions due to the lack of kinetics information, are assumed to
There are several methodologies than can be used to operate under pseudo steady-state conditions, which
interrogate a metabolic reconstruction (Oberhardt et al. seems to be in agreement with experimental data. For
2009). A common constraint-based method, known as instance, this approach showed an 85% consistency of
Flux Balance Analysis or FBA (Becker et al. 2007), gene essentiality for the genome scale reconstruction
has been widely used for this purpose (Fig. 1). of Escherichia coli and 70% consistency for gene
FBA Analysis, Plant-Pathogen Interactions 735 F
FBA Analysis, Plant-Pathogen Interactions, Fig. 2 (a) met- SAGE or protein-protein interaction data, it is possible to silence
abolic profile before virtual knockout. (b) Metabolic profile after some reactions and let FBA to predict flux change under these
virtual knockout of reactions between B and C and between new conditions. In the case of plant-pathogen interactions, it is
F and E. OF: Objective function. In part (a) of the figure it is possible to use data from known host R-proteins and pathogen
clear how a different flux of reactions is present for OF optimi- effectors interaction, as well as genes with differential expres-
zation. Based on biological information, such as microarray, sion obtained by essays of plants challenged with the pathogen
essentiality for the reconstruction of Pseudomonas between host R-proteins and pathogen effectors is
aeruginosa (Reviewed in Pinzón et al. 2009). effective and then the pathogen is able to suppress
This characteristic of FBA is of particular interest in host recognition and the plant defense response
the study of plant-pathogen interactions, but before derived from it, leading to a disease known as Late
trying to clarify how this approach can be used in this blight of potato.
context, some background on the pathogenicity and This pattern of interaction is also typical for many
host defense response is necessary. other pathosystems such as Arabidopsis thaliana and
Plant-pathogen interactions can be divided in two Pseudomonas syringae or A. thaliana and Hyaloper-
main mechanisms. During the first one, known also as onospora parasitica, among many others.
basal defense, the plant recognizes general features in Although for many of these pathosystems some
pathogens that are universally associated with microbes. molecular details are known, this is not the reality for
These features are known as MAMPs (Microbe- most of them. In the case of P. infestans and
Associated Molecular Patterns) which are recognized S. tuberosum, some phenotypical responses to plant
by membrane receptors in plants. The recognition of attack are recognized to date, such as a decrease
these MAMPs triggers the first line of defense, which in plant’s photosynthetic capacity several hours after
for most plants it is enough to stop pathogen attack. infection, but the precise biochemical mechanisms that
However, some plant pathogens are specific to a host lead to that phenotype are not known. Although typical
and have developed special strategies for the manipula- molecular approaches (proteomics, transcriptomics,
tion of host defenses. This is the case for the pathogen etc.) have shown to be effective in revealing
Phytophthora infestans in Solanum tuberosum. During how some of these mechanisms work, the methodolo-
the pathogenicity process P. infestans releases an arse- gies are not tractable when trying to understand
nal of proteins known as effectors, which are injected a complex network of biochemical interactions. There-
into the host plant through specialized secretion sys- fore, FBA over a metabolic reconstruction can be an
tems, suppressing the first line of defense. S. tuberosum ideal alternative for the comprehension of this type of
in turn, have evolved to recognize these effectors using networks.
Resistance proteins (R-proteins), which directly or indi- In general, what a FBA analysis describes is the set
rectly interact with them. Not always this interaction of fluxes of reactions that take place in a metabolic
F 736 Feasibility
References minimize 0
n
x2
Becker SA, Feist AM, Mo ML, Hannum G, Palsson BØ, subject to fi ðxÞ 0; i ¼ 1; . . . ; m:
Herrgard MJ (2007) Quantitative prediction of
cellular metabolism with constraint-based models: the
COBRA toolbox. Nat Protoc 2(3):727–738. doi:10.1038/ If the constraints cannot be satisfied, the optimal
nprot.2007.99 value is 1 by definition. On the other hand, if an x
Hoppe A, Hoffmann S, Gerasch A, Gille C, Holzhutter H-G
exists that satisfies the constraints, the optimal value
(2011) FASIMU: flexible software for flux-balance compu-
tation series in large metabolic networks. BMC Bioinformat- will be 0.
ics 12(1):28. doi:10.1186/1471-2105-12-28 The formulation of a feasibility problem as
Oberhardt MA, Palsson BØ, Papin JA (2009) Applications of optimization problem is particularly attractive for
genome-scale metabolic reconstructions. Mol Syst Biol
constraint classes where a solver exists which
5:320. doi:10.1038/msb.2009.77
Pinzón A, Barreto E, Bernal A, Achenie L, Barrios is guaranteed to find the optimal value, e.g.,
AFG, Isea R et al (2009) Computational models in ▶ semidefinite programs. In this case, the optimi-
plant-pathogen interactions: the case of Phytophthora zation algorithm can directly be used to decide the
infestans. Theor Biol Med Model 6:24. doi:10.1186/
feasibility problem.
1742-4682-6-24
Rocha I, Maia P, Evangelista P, Vilaca P, Soares S, Pinto JP,
Nielsen J et al (2010) OptFlux: an open-source software
platform for in silico metabolic engineering. BMC Syst Cross-References
Biol 4(1):45. doi:10.1186/1752-0509-4-45
Thiele I, Palsson BØ (2010) A protocol for generating a high-
quality genome-scale metabolic reconstruction. Nat Protoc ▶ Model Falsification, Semidefinite Programming
5(1):93–121. doi:10.1038/nprot.2009.203 ▶ Semidefinite Program
Feed Forward Loop 737 F
References terms of prediction accuracy, but are generally com-
putationally more intensive. In summary, filter and
Boyd S, Vandenberghe L (2004) Convex optimization. wrapper methods can complement each other.
Cambridge University Press, Cambridge
References
Feasible Parameter Space
Dash M, Liu H (1997) Feature selection for classification, intel-
ligent data analysis. An Int’l J 1(3):131–156
▶ Dynamic Metabolic Networks, k-Cone Kim YS, Street WN, Menczer F (2003) Feature selection in data
mining. In: Wang J (ed) Data mining: opportunities and
challenges. Idea Group, Hershey, pp 80–105
Feature Reduction
F
▶ Feature Selection Feed Forward Loop
Guangxu Jin
Feature Selection Systems Medicine and Bioengineering,
Bioengineering and Bioinformatics Program, The
Chenglei Sun Methodist Hospital Research Institute, Weill Medical
Institute of Systems Biology, Shanghai University, College, Cornell University, Houston, TX, USA
Shanghai, Shanghai, China
Synonyms
Synonyms
Feed forward loop motif, FFL
Attribute selection; Feature reduction; Variable selec-
tion; Variable subset selection
Definition
References
Feedback Regulation
Mangan S, Alon U (2003) Structure and function of the feed-
forward loop network motif. Proc Natl Acad Sci USA Yufei Huang
100:11980–11985 Picower Institute for Learning and Memory,
Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs
in the transcriptional regulation network of Escherichia coli. Massachusetts Institute of Technology, Cambridge,
Nat Genet 31:64–68 MA, USA
Greehey Children’s Cancer Research Institute,
University of Texas at Texas Health Science Center at
San Antonio, San Antonio, TX, USA
Feed Forward Loop Motif, FFL Department of Epidemiology and Biostatistics,
University of Texas at Texas Health Science Center at
▶ Feed Forward Loop San Antonio, San Antonio, TX, USA
Definition
Feedback Loops
Feedback regulation describes the particular type of
▶ MicroRNA Regulation, Feedback Loop gene regulation, where the current status of gene
Fibroblasts 739 F
expression will influence the future status of the same Characteristics
set of genes. In a cell, feedback regulation is an essen-
tial mechanism to ensure the stability of the cellular Fibroblasts reside in the stroma, are derived from the
system under changing environmental conditions. mesoderm, and are members of the cell type known as
mesenchymal. Dormant fibroblasts become fibrocytes
that can be easily identified along the margins of many
Cross-References types of connective tissue.
Fibroblasts are important to the structural integrity
▶ Gene Regulation of tissues and organisms, in part because they generate
type I collagen, the most abundant protein in animals.
Collagen I is a triple-helical ECM protein that provides
References mechanical strength. Fibroblasts are also a major
F
source of fibronectin, which serves as a scaffold for
Campbell N, Mitchell L et al (2003) Biology: concepts and cell adhesion, and generate many minor components of
connections. Benjamin Cummings, New York
the ▶ extracellular matrix. Fibronectin production by
Reik W (2007) Stability and flexibility of epigenetic gene regula-
tion in mammalian development. Nature 447(7143):425–432 fibroblasts is highly responsive to stress and is alterna-
tively spliced in response to specific signals resulting
in a highly variable protein with a multiplicity of
functions.
Feed-Forward Loops Fibroblasts are described by their common
characteristics that include spindle cell morphology,
▶ MicroRNA Regulation, Feed-Forward Loops vimentin cytoskeleton, and (relatively) proliferative
quiescence; yet fibroblasts are highly heterogeneous.
The distribution of fibroblasts varies widely between
organs, from single cells tucked between epithelium
FFL and endothelium to densely packed. Another feature of
fibroblasts is that they are readily cultured because
▶ Canonical Network Motifs they attach to tissue-culture plastic and proliferate in
response to serum used in most culture media. Because
of this common trait, many experimental studies have
used fibroblasts as “garden-variety” cells. Yet genome-
Fibroblasts wide patterns of gene expression in cultured fetal and
adult human fibroblasts derived from skin at different
Mary Helen Barcellos-Hoff anatomical sites revealed that fibroblasts from each
Department of Radiation Oncology and Cell Biology, site displayed distinct and characteristic transcriptional
New York University School of Medicine, New York, patterns (Chang et al. 2002).
NY, USA A broad classification discriminate fibroblasts
derived from visceral organs and cutaneous origin. It
was suggested that the site specificity of the molecular
Synonyms
signals and extracellular proteins expressed by fibro-
blasts provide “home addresses” that are monitored by
Fibrocytes; Interstitial cells; Mesenchymal cells
epithelial cells to restrict their migration, survival, and
proliferation (Chang et al. 2002). Fibroblasts isolated
Definition from different anatomical sites and from the same ana-
tomical site but of different diseases display topographic
Fibroblasts are cells that reside in connective tissues differentiation and positional memory. Although this
and produce extracellular matrix (ECM). Quiescent suggests fibroblasts at different locations in the body
fibroblasts, known as fibrocytes, are abundant in the could be considered distinct differentiated cell types,
interstitial tissues of all organs. there has been little success if classification schemes
F 740 Fibrocytes
References
Fisher’s Linear Discriminant Fisher RA. On the interpretation of w2 from contingency tables,
and the calculation of P. J Roy Stat Soc. 1922;85(1):87–94.
F
Fitness
Fisher’s Test
Philippe Huneman
Kejia Xu Institut d’Histoire et de Philosophie (IHPST), des
Institute of Systems Biology, Shanghai University, Sciences et des Techniques Université Paris 1
Shanghai, China Panthéon-Sorbonne, Paris, France
Definition Definition
Fisher’s exact test is a statistical test used to exam- The word comes from the phrase “survival of the
ine the significance of the association between two fittest,” which Darwin borrowed from Spencer in the
categorical variables. The test is useful for categor- last edition of the Origin of species due to his own
ical data that result from classifying objects in two hand. Fitness has two components, survival and repro-
different ways, and it is used to examine the sig- duction. If an organism is very well adjusted to its
nificance of the association between the two kinds milieu, but does not reproduce, it has no evolutionary
of classification. impact; hence, reproduction is often taken as crucial,
For an example application of the 2 2 and survival considered as a proxy for reproduction
test, let X be a journal, either a mathematics (the longer X survives, the higher the chances it has
magazine or a science magazine, and let Y be the offspring). However, even if often correct, those
number of articles on the topics of mathematics and approximations prove to be controversial, for example,
biology. when some organisms grow rather than reproduce. In
some contexts, one has to model the two components
Mathematics Science of fitness separately, for instance, when the issue is to
magazine magazine Total
understand how individual resources are partitioned
Mathematics a b a+b
between reproduction and survival.
Biology c d c+d
More formally, fitness can be defined as the probabil-
Total a+c b+d a+b+c+d
ity distribution of the representation of a gene, a trait or
Fisher calculates the conditional probability of get- an individual at the next generation; yet often equations
ting the actual matrix given the particular row and can consider only the expectancy (fitness meaning
column sums: expected offspring number of an individual). Because it
concerns expected rather than actual offspring, fitness
is often metaphysically considered as a propensity
aþb cþd aþbþcþd
P¼ (sometimes called “expected fitness”) rather than as
a c aþc
a categorical property (then called “realized fitness”).
ða þ bÞ!ðc þ dÞ!ða þ cÞ!ðb þ dÞ! Sometimes, several generations have to be taken
¼ ; (1)
a!b!c!d!ða þ b þ c þ dÞ! into account to understand the evolutionary dynamics
F 742 Fitness Function
(e.g., when explaining the constancy of sex ratio, which computed a priori by estimating the performances of
involves considering the effect on grandchildren) (Sober various trait types (race speed, rate of metabolism,
2002). The case of altruism compels biologists to con- visual acuity, etc.), provided that one has an idea
sider selection at the level of genes. According to about the relative importance of all factors for survival
▶ Hamilton’s rule, the ▶ relatedness between actor and reproduction.
and beneficiary may account for the selection of
the altruistic act because the degree of relatedness
mitigates the cost: if the act increases the number Cross-References
of altruistic alleles at the next generation (as com-
pared to the selfish alleles), be they directly alleles ▶ Explanation, Evolutionary
of the offspring of the actor, or alleles of the
offspring of the beneficiaries of its altruistic acts,
then altruism evolves. One can therefore reason by References
including within fitness all those alleles due to the
altruist activities of the focal individual. Inclusive Bouchard F, Rosenberg A (2008) Fitness. In: Stanford on-line
encyclopedia of philosophy
fitness is therefore the number of genes directly
Grafen A (2009) Formalizing Darwinism and inclusive fitness
passed on to the next generation by a focal indi- theory. Phil Trans R Soc B 364(1):3135–3141
vidual, plus the ones that are passed on by its kin. Sober E (2002) The two faces of fitness. In: Krimbas R (ed)
What is therefore increased by selection is rather Thinking about evolution: historical, philosophical, and
political perspectives. Cambridge University Press,
inclusive than individual fitness, even though
Cambridge
calculating inclusive fitness may be difficult in
practice (Grafen 2009).
Even if mathematically speaking the construal of
fitness is clear and how to construe it in a given prob- Fitness Function
lem is often straightforward, the issue of the bearer of
fitness is difficult. Originally, only organisms had fit- Ke-Qin Liu
ness, which was often computed as the number of Institute of Systems Biology, Shanghai University,
offspring; with Modern Synthesis and the formulation Shanghai, China
of evolution in terms of gene frequencies in the context
of population genetics, fitness is also ascribed to genes
and genotypes. These values are interdependent of Synonyms
each other because the fitness of a gene can be seen
as the contribution it makes to the fitness of the organ- Objective function
ism, but only in the case of asexual organisms the
number of offspring equals the number of copies of
a given gene. Moreover, fitness is often seen as lifetime Definition
fitness, that is, computed along the whole life of the
organism; yet in behavioral ecology, one mainly In evolutionary algorithm, fitness function is actually
considers individually each act and ascribes fitness to an objective function that is used to determine which
it (e.g., costs and benefits of various strategies are solution within a population is better when solving
measured in terms of fitness). a particular problem (Holland 1975; Nelson et al. 2009).
Often, absolute fitness cannot be measured but rel- In evolution algorithm, the solution space consists
ative fitness can, and only the latter is evolutionarily of a population of chromosomes where each chro-
important (individuals with identical fitnesses do not mosome is one solution that can be evaluated by
undergo natural selection). Sometimes, one measures the fitness function. Finally, the chromosomes can
a posteriori the fitness of types of organisms by be ranked by calculating the fitness function value
counting the number of offspring. Otherwise, one can (Holland 1975). A proper fitness function is very
consider fitness as strictly correlated to the way indi- important for the speed of the search and the
viduals face environmental demands, and then it can be solution of optimization.
Flow Cytometry 743 F
References This mapping FðtÞ ¼ QðtÞetS gives rise to a time-
dependent change of coordinates (y ¼ Q1 ðtÞx), under
Holland JH. Adaptation in natural and artificial systems: which the original system becomes a linear system
an introductory analysis with applications to biology,
control and artificial intelligence. (2nd edition. 1992,
with real constant coefficients dy dt ¼ Sy. Since QðtÞ is
MIT) Ann Arbor: University of Michigan Press; 1975. continuous and periodic it must be bounded. Thus, the
Nelson AL, Barlow GJ, Doitsidis L. Fitness functions in evolu- stability of the zero solution for yðtÞ and xðtÞ is deter-
tionary robotics: a survey and analysis. Robot Auton Syst. mined by the eigenvalues of S. The representation
2009;57(4):345–370.
FðtÞ ¼ PðtÞetB is called a Floquet normal form for
the fundamental matrix FðtÞ.
The eigenvalues of the matrix eTB are called the
Fitting of Continuous and Deterministic Floquet multipliers of the system (some called charac-
Models teristic multipliers). They are also the eigenvalues of the
F
(linear) Poincaré maps xðtÞ ! xðt þ T Þ. A Floquet
▶ Grid Computing, Parameter Estimation for Ordinary exponent (sometimes called a characteristic exponent)
Differential Equations is a complex m such that emT is a Floquet multiplier of the
system. Notice that Floquet exponents are not unique
since eðmþð2pik=TÞÞT ¼ emT (where k is an integer), but
Floquet multipliers are unique. The real parts of the
Flemming Body Floquet exponents are called Lyapunov exponents. The
zero solution is asymptotically stable if all Lyapunov
▶ Midbody exponents are negative, Lyapunov stable if the Lyapunov
exponents are nonpositive, and unstable otherwise.
Floquet Multiplier
Flow Cytometry
Tianshou Zhou
School of Mathematics and Computational Sciences, Xiaojun Liu
Sun Yet-Sen University, Guangzhou, Guangdong, China Internal Medicine, The Second Hospital of Hebei
Medical University, Shijiazhuang, Hebei, China
Definition
Definition
Consider a linear differential equation of the form:
Flow cytometry is a technology that simultaneously
x_ ¼ AðtÞx measures and then analyzes multiple physical charac-
teristics of single particles, usually cells, as they flow in
where AðtÞ is a continuous periodic function with a fluid stream through a beam of light. The properties
period T. If FðtÞ is a fundamental matrix solution to measured include a particle’s relative size, relative gran-
this system, then for all t 2 R, ularity or internal complexity, and relative fluorescence
intensity. These characteristics are determined using an
Fðt þ T Þ ¼ FðtÞF1 ð0ÞFðTÞ optical-to-electronic coupling system that records how
the cell or particle scatters incident laser light and emits
In addition, for each matrix B (possibly complex) fluorescence.
such that eTB ¼ F1 ð0ÞFðTÞ, there is a periodic
(period T) matrix function t7!PðtÞ such that
FðtÞ ¼ PðtÞetB for all t 2 R. Also, there is a real matrix Cross-References
S and a real periodic (period-2T) matrix function
t7!QðtÞ such that FðtÞ ¼ QðtÞetS for all t 2 R. ▶ Cell Sorting
F 744 Flow Cytometry, Flow Microfluorimetry
Cross-References
Flow Cytometry, Flow Microfluorimetry
▶ Fluorescence Microscopy
▶ Cell Cycle Analysis, Flow Cytometry ▶ Spectroscopy and Spectromicroscopy
Synonyms
Definition
Epifluorescence microscope; Fluorescence
Fluorescence occurs when a fluorescent compound microscope
absorbs light energy over a range of wavelengths
that is characteristic for that compound. It is
a transition of energy produced by the electron Definition
when it decays from higher energy level raised
by the absorption light to its ground state. In Fluorescence microscopy utilizes fluorescence as
Fluorescence-activated cell sorting fluorescence is a means of detecting objects through a microscope.
often used to show the particles’ characters
according to the specific antibody conjugated by
the fluorescent compound. Characteristics
Cross-References Cross-References
Steven D. Rhodes
References School of Medicine, Indiana University, Indianapolis,
IN, USA
Herman B (1998) Fluorescence microscopy. Microscopy hand-
books. Bios Scientific, Oxford. ISBN 9780387915517
Definition
Huang K, Murphy RF (2004) From quantitative microscopy to
automated image understanding. J Biomed Opt
9(5):893–912, ISSN 1083-3668 Fluorescence-activated cell sorting, or FACS for short,
MedlinePlus. Medical encyclopedia, August 2011. URL is a specialized type of cell sorting that utilizes flow
http://www.nlm.nih.gov/medlineplus/ency/encyclopedia_
cytometry. Fluorescently tagged cells are isolated into
A.htm
Molecular expressions. Fluorescence microscopy, August 2011. charged droplets which are separated based on their
URL http://micro.magnet.fsu.edu/primer/techniques/fluores- deflection between two electrodes.
cence/fluorhome.ht%ml
Rost FWD (1991) Quantitative fluorescence microscopy. Cam-
bridge University Press, Cambridge/New York. ISBN
Cross-References
9780521394222
Spring KR (2003) Fluorescence microscopy. In: Driggers RG ▶ Single Cell Assay, Mesenchymal Stem Cells
(ed) Encyclopedia of optical engineering, Dekker encyclo-
pedias series. Marcel Dekker, New York
Wolf DE (2007) Fundamentals of fluorescence and fluorescence
microscopy. In Greenfield Sluder, DE Wolf (eds) Digital
microscopy. Methods in cell biology, 3rd edn, vol 81. Aca- Fluorescent Dye
demic, pp 63–91
Wouters FS (2006) The physics and biology of fluorescence
▶ Fluorochrome
microscopy in the life sciences. Contemp Phys 47(5):
239–255, ISSN 0010-7514
Fluorescent Marker
Fluorescence Spectroscopy
▶ Fluorescent Markers
Xiaohua Wu
Department of Pediatrics, Herman B Wells Center for
Pediatric Research, Indiana University School of
Medicine, Indianapolis, IN, USA Fluorescent Markers
Cross-References References
Fluorophore
Synonyms
Yi Zeng
Department of Pediatrics, University of Arizona, Metabolic flux balancing
Tucson, AZ, USA
Definition
Definition
Flux Balance Analysis (FBA) is a method in ▶ meta-
Fluorophore is the structural domain or specific region bolic pathway modeling to quantify a metabolic state
of a molecule that is capable of exhibiting fluores- of a cell. It is a constraint-based approach, which uses
cence. Fluorochromes that are conjugated to a larger linear programming, subject to constrains imposed by
macromolecule through absorption or covalent bonds the stoichiometry of the metabolic network, thermody-
are termed “fluorophores.” namics, and the measured rates (rm). The fluxes of
F 750 Flux Balance Analysis
the metabolic network are computed under these a Network c Stoichiometric Matrix
constraints by optimizing an objective function under Ax r1 r2 r3 r4 r5 r6 r7 r8 r9
▶ steady state operation of the metabolic network r1 A 1 −1 0 0 0 0 0 0 0
(Edwards et al. 2002). FBA is used when limited A B 0 1 −1 0 −1 0 0 0 0
r2 C 0 0 1 −1 0 0 0 0 0
information about the accumulation rates, typically of r3 r4 D 0 0 0 0 1 −1 0 −1 0
an extracellular metabolite (i.e., measured rates, rm), B C Cx E 0 0 0 0 0 1 −1 0 0
is known and the stoichiometric matrix with coeffi- r5 F 0 0 0 0 0 0 0 1 −1
cients of all the unmeasured reactions (denoted by D
r6 r8
Su) is noninvertible to provide a unique solution. d Constraints
E F
Objective function = Maximize (r7)
r7 r9 0 ≤ r1 < 1
Characteristics Ex Fx 0.2 ≤ r4 ≤ 1
0.3 ≤ r9 ≤ 1
b Mass Balance equations r1−9 ≥ 0
The first step in FBA is reconstruction of the metabolic
S.r = 0
network. Once all the internal and transport reactions dA = r − r = 0
are identified, dynamic mass balance equations for all dt 1 2 e Solution
dB = r − r − r = 0
the metabolites are formulated. These mass balances dt 2 3 5 r1 1
dC = r − r = 0 r2 1
can also be represented in a matrix form representing 3 4 r3 0.2
dt
the stoichiometric matrix (S), and the flux vector (r). dD = r − r − r = 0 r4 0.2
5 6 8 r = r5 = 0.8
FBA analyzes the metabolic network assuming dt
dE = r − r = 0 r6 0.5
a steady-state condition, therefore the dynamic mass 6 7 r7 0.5
dt
dF = r − r = 0 r8 0.3
balance equations are set to zero, as given below: 8 9 r9 0.3
dt
Cross-References
Flux Control Coefficient
▶ FBA Analysis, Plant-Pathogen Interactions
▶ Network Modeling of Biochemical Transport Emma Saavedra and Rafael Moreno-Sánchez
Phenomena Department of Biochemistry, National Institute for
▶ Pathway Modeling, Metabolic Cardiology “Ignacio Chávez”, Mexico City, Mexico
▶ Optimization Algorithms for Metabolites
Production
▶ Steady State Definition
pharmacological, or other information that had not or Hill function, or one of their generalizations; how-
been used for the design of the model. If the model ever, other options include power-law functions,
has successfully gone through these design and testing lin-log representations, or more complex mathematical
phases, one may cautiously deem it appropriate for the descriptions (Voit and Chou 2009). The computational
purposes that initially led to the model design. In aspects of estimating parameters for kinetic rate func-
particular, one may use the model for explanations, tions are rather simple, as these functions are explicit
predictions, and applications, the generation of new and usually contain only a few parameters. For
hypotheses, assistance in the design of new biological instance, in the case of MM rate laws, the parameters
experiments, possibly the development of treatment can be estimated with methods of linear regression.
strategies for diseases, and the manipulation and opti- The data characterizing the kinetic properties of an
mization of the model toward specific goals, such as enzyme or transporter are usually measured on isolated
the production of organics in metabolic engineering. enzymes in vitro and account for optimal temperature
The phasing of the modeling process may give the and pH ranges, cofactors, modulators, and secondary
impression that modeling is straightforward. However, substrates. Once sufficient information on individual
in most cases it is an iterative process requiring the rate processes has been assembled, the modeler attempts
repeated return to earlier phases (Voit et al. 2008). to merge this information into an integrative mathemat-
ical model of the entire system, which typically consists
The Challenge of Parameter Estimation of ordinary differential equations.
The most challenging among the phases of model devel- Though this forward approach is theoretically
opment is usually the estimation of parameter values, straightforward, it has several disadvantages. First, it
especially when the investigated system is moderately requires that a considerable amount of local kinetic
large. Addressing this challenge successfully is crucial, information is at hand. Second, even if this information
because it is clear that the parameter values qualitatively is available in the literature, it had often been obtained
and quantitatively characterize the metabolic model and from different organisms or with experiments under
distinguish it from others. Furthermore, most model various conditions. As a consequence, the ultimately
analyses can only be executed when all parameter emerging integrated model is often not internally
values are specified, and most predictions and many consistent or does not match biological observations,
explanations made with the model are of a numerical thus mandating numerous rounds of refinements,
nature that requires parameterization. restructuring, and reparameterization. These iterations
The development of parameter estimation methods is are very labor intensive and time consuming and greatly
driven by the availability and characteristics of experi- benefit from a combination of biological and computa-
mental data. Correspondingly, estimation methods can tional expertise which, however, is still rare.
be very distinct, thereby reflecting the variety of exper-
imental data types (Voit 2004; Goel et al. 2006). The Inverse Estimation
currently available methods can be classified as: Instead of first representing one process at a time and
1. Forward (bottom-up) approaches and subsequently merging all representations into a system
2. Inverse (top-down) methods using steady-state or of differential equations, it is also possible to infer
time series data quantitative information directly from data that char-
acterize the entire system. These inverse approaches
Forward Estimation fall into two categories that correspond either to
Before molecular high-throughput data were available, ▶ steady-state data or to time series data.
essentially all metabolic models were developed Steady-state data characterize a metabolic system
according to the first strategy, and most models are under a condition where all concentrations have
still generated in this fashion today. Specifically, a reached constant levels and all influxes and effluxes
model is designed based on local kinetic information are in balance. The most common approaches for
that is used to formulate functions for individual bio- parameter estimation from such data use stoichiomet-
chemical processes, such as enzyme-catalyzed reactions ric analysis (▶ Conservation Analysis) and ▶ flux bal-
and transport steps. The default choice for these func- ance analysis (FBA) (e.g., Palsson 2006). In both
tions or rate laws is typically a Michaelis–Menten (MM) methods, the stoichiometry is assumed to be known,
Forward and Inverse Parameter Estimation for Metabolic Models 755 F
some influxes and effluxes are measured as they enter sometimes even in vivo, and that insights gained from
or leave the system under steady-state conditions, and their analysis are therefore as close to reality as is
all other fluxes are inferred with computational currently feasible.
methods. In addition, FBA accounts for physico- The concept of these methods is intuitively simple:
chemical constraints on the fluxes in the system and Use the chosen symbolic model and an optimization
furthermore assumes that the cell attempts to optimize algorithm to determine values for all model parameters
an overall objective, such as maximal growth. This such that the dynamics of the model matches the
objective and all constraints are reformulated as observation data as closely as possible. The attraction
a constrained linear optimization task, which permits of this approach is that time series data, if obtained
an effective determination of a unique, optimal flux from in vivo perturbation or stimulus–response exper-
distribution that satisfies all constraints and maximizes iments, directly or indirectly account for all processes
the selected objective. These stoichiometrically based associated with the reaction of the system to the
F
methods solely characterize a steady-state flux distri- perturbation.
bution and do not provide information on concentra-
tions, regulation, or parameters associated with Algorithms for Inverse Estimation
dynamic features of the system. Many optimization algorithms have been proposed for
To a limited degree, parameter values can also be inverse estimation tasks in biology. Most of them
obtained from data close to a steady state. In this case, attempt to minimize the difference between the exper-
the data are obtained from experiments that measure imental data and a model response that is obtained per
the responses of a biological system to small perturba- computer simulation. However, in spite of substantial
tions (Sorribas and Cascante 1994). Such experiments efforts, none of these algorithms is truly satisfactory in
may use biochemical inhibitors, artificial ligands, realistic situations of moderately large systems or
genetic mutations, or a variety of other methods. where time series data are sparse, incomplete, or
The second category of inverse estimation corrupted by noise.
approaches is very different from these methods at, or The most prominent algorithms for inverse tasks may
close to, a steady state. Namely, they are based on time be divided into three groups. The first group consists
series data that characterize the full dynamic response of gradient-based, steepest-descent, or hill-climbing
of a system to some stimulus, such as an environmental methods. Best known among these are the Gauss–New-
stress or the sudden availability of food. These types of ton and Levenberg–Marquardt algorithms, which are
data are becoming more prevalent, thanks to recent included in all major software packages of the field,
advancements in molecular high-throughput tech- such as Mathematica and Matlab. The second group
niques at the genomic, proteomic, and metabolomic consists of stochastic search algorithms (▶ Evolution
levels. At least in principle, these data have the capa- Programming). These include evolutionary computa-
bility of characterizing global responses in a dynamic tion (EC) (▶ Stochastic Simulation Algorithm), ▶ sim-
manner, which subsequently permits the estimation of ulated annealing (SA), adaptive stochastic methods,
parameter values and even the identification of the ▶ clustering methods, and other meta-heuristics, such
topology and regulation of a system in a “top-down” as ant colony optimization (ACO) and ▶ particle swarm
manner. optimization (PSO). Some of these algorithms are bio-
The experimental tools that can generate dynamic logically inspired and have been applied to parameter
data of metabolites include nuclear magnetic resonance estimation tasks with the goal of finding global solu-
(NMR), mass spectrometry (MS) coupled with high- tions, especially in the context of identifying the struc-
performance liquid chromatography (HPLC), and flow tures of gene regulatory networks (Moles et al. 2003).
cytometry. The experiments can be performed under The best-known evolutionary method among these is
many different conditions, such as different initial set- the genetic algorithm (GA), which is now widely avail-
tings, various gene knock-outs or knock-downs, or dif- able in many variants that have proven useful and prac-
ferent types of enzyme inhibition, and shed light on the tical in a variety of biological applications. The third
system from different angles. Clear advantages of such group of methods accounts for the fact that observed
“global” data include that their information content is data are affected by random events. Thus, these
rich, that they can be collected from the same organism, approaches consider parameter estimation as a branch
F 756 Forward and Inverse Parameter Estimation for Metabolic Models
of statistics and use statistical or machine-learning tech- biological phenomena are usually also very robust, so
niques as solution strategies. These techniques include that modest alterations in parameter values often do not
maximum likelihood estimation, Markov chain Monte change their behavior. One contributor to this robustness
Carlo methods, and Kalman filtering. is redundancy, which allows the compensation of fail-
One significant issue with time series analysis is ures in one component with changes in other compo-
that the differential equations of the model have to be nents. This compensation directly translates into
solved thousands of times. Approximation methods different sets of parameter values, which all perform
have been developed to avoid this numerical solution with similar effectiveness and create biological sloppi-
step. While they save an enormous amount of comput- ness that may or may not be related to the sloppiness in
ing time, they face their own challenges, which are not optimized parameter sets. Thus, precise solutions,
always easily overcome (Chou and Voit 2009; Voit consisting of unique sets of parameter values, may not
and Almeida 2004; Ramsay et al. 2007). even be the ultimate criterion for a quality fit, and instead
of aiming for a model with the smallest residual error,
Quality of Fit one might set as the goal to identify all sets of different
Comparisons between parameter estimation algorithms parameterizations that are consistent with the data,
are usually based on the goodness of their fits to exper- within some acceptable error. Some of these may be
imental data, which is typically assessed as the sum of discarded if they correspond to models with high sensi-
squared errors (SSE) between model results and the tivities, a low degree of stability, or other undesirable
observed data (Voit 2011). However, the best fit should properties. The remaining set of parameter values may
not be the only criterion, whether the estimation occurs turn out to be clustered tightly, consist of several distinct
in the forward or inverse direction. In many actual cases, clusters, or be scattered throughout a large domain within
the “best” model, as judged by the smallest SSE, actu- the search space. In each case, the characteristics of this
ally tends to run into ▶ overfitting problems, which set, combined with comparative analyses of the candi-
means that the model contains too many parameters. date models, may identify one model as more likely than
As a consequence, its extrapolation and prediction its alternatives or suggest hypotheses and critical labora-
power with respect to untested conditions is low. tory experiments that may ultimately yield deeper
Related to this issue is the observation that the same insights into the system under investigation.
dataset can often be fitted by distinctly different param-
eter sets or even different model structures with similar
accuracy. This issue of unidentifiability and sloppiness Cross-References
has received much attention in recent times (Gutenkunst
et al. 2007; Srinath and Gunawan 2010). ▶ Linear Regression
The insistence on a good fit is not necessarily the ▶ Metabolic Networks, Reconstruction
best strategy due to the nature of biological phenomena. ▶ Optimization and Parameter Estimation, Genetic
First, biological data are often affected significantly by Algorithms
inter-individual variability, as well as uncertainties ▶ Ordinary Differential Equation (ODE)
due to slight variations in experimental conditions. As ▶ Pathway Modeling, Metabolic
a consequence, biological observations are not as exactly
replicable as experiments in physics. For instance, the
KM value is often seen as a genuine property of an
enzyme, and in vitro characterizations are taken as true. References
However, the same enzyme might slightly vary between
Chou I-C, Voit EO (2009) Recent developments in parameter
species and strains, and even a high degree of sequence
estimation and structure identification of biochemical and
homology may not prevent differences in the affinity genomic systems. Math Biosci 219:57–83
between enzyme and substrate, which directly affects Goel G, Chou I-C, Voit EO (2006) Biological systems modeling
the KM value. In other areas of science, the intrinsic and analysis: a biomolecular technique of the twenty-first
century. J Biomol Tech 17:252–269
variability among items can often be characterized with
Gutenkunst RN, Casey FP, Waterfall JJ, Myers CR, Sethn AJP
large numbers of measurements, but this strategy is not (2007) Extracting falsifiable predictions from sloppy models.
always feasible in biology. While variability is natural, Ann N Y Acad Sci 1115:203–211
Founders Effect 757 F
Moles CG, Mendes P, Banga JR (2003) Parameter estimation in
biochemical pathways: a comparison of global optimization Foundational Model of Anatomy
methods. Genome Res 13:2467–2474
Palsson BØ (2006) Systems biology: properties of reconstructed
networks. Cambridge University Press, Cambridge, UK Mark A. Musen
Ramsay JO, Hooker G, Cao J, Campbell D (2007) Parameter Stanford Center for Biomedical Informatics Research,
estimation for differential equations: a generalized smooth- Stanford University, Stanford, CA, USA
ing approach. J Roy Stat Soc B 69:741–796
Sorribas A, Cascante M (1994) Structure identifiability
in metabolic pathways: parameter estimation in models
based on the power-law formalism. Biochem J 298:
Definition
303–311
Srinath S, Gunawan E (2010) Parameter identifiability of The most thorough, thoughtful, and comprehensive
power-law biochemical system models. J Biotechnol ▶ ontology of human anatomy in existence, initiated by
149(3):132–140
Voit EO (2004) The dawn of a new era of metabolic systems
Prof. Cornelius Rosse at the University of Washington in F
analysis. Drug Discov Today BioSilico 2:182 the 1990s.
Voit EO (2011) What if the fit is unfit? In: Dehmer M, Emmert-
Streib F, Salvador A (eds) Applied statistics for biological
networks. Wiley, New York Cross-References
Voit EO, Almeida JS (2004) Decoupling dynamical systems for
pathway identification from metabolic profiles. Bioinformat- ▶ Protégé Ontology Editor
ics 20(11):1670–1681
Voit EO, Chou I-C (2009) Parameter estimation in canon-
ical biological systems models. Int J Syst Synth Biol 1:
1–19
Voit EO, Qi Z, Miller GW (2008) Modeling complex biological
systems, one step at a time. Pharmacopsychiatry 41(suppl 1):
S78–S84 Founders Effect
Sabina Leonelli
ESRC Centre for Genomics in Society, University of
Exeter, Exeter, Devon, UK
Forward-scattered Light (FSC)
Frequency
In SI units, the unit of frequency is the hertz (Hz), a database. Frequent itemsets play an essential role in
named after the German physicist Heinrich Hertz. 1 Hz many ▶ Data Mining tasks and are related to interesting
means that an event repeats once per second. patterns in data, such as ▶ Association Rules. Some
A previous name for this unit was cycles per second. concepts are necessary in order to understand this
A traditional unit of measure used with rotating definition:
mechanical devices is revolutions per minute, abbre- 1. Transaction: Let X ¼ {x1, x2, . . ., xm} be a set of m
viated RPM. 60 RPM equals 1 Hz. The period, usually elements called items and let T ¼ {t1, t2, . . ., tn} be
denoted by T, is the length of time taken by one cycle, a set of n subsets of items called transactions. Each
and is the reciprocal of the frequency f: transaction in T identifies a subset of items.
2. Frequent itemset: Given a set of items X ¼ {x1, x2,. . .,
1
T¼ xm} and a set of transactions T ¼ {t1, t2, . . ., tn},
f
a subset of X, S, is called a frequent itemset if S
The SI unit for period is the second. occurs in a percentage of all transactions in T that
Calculating the frequency of a repeating event is exceeds a threshold, named support.
accomplished by counting the number of times that 3. Support: The support of an itemset Y, support (Y),
event occurs within a specific time period, then dividing is defined as the number of transactions in T which
the count by the length of the time period. For example, contain the itemset Y.
if 71 events occur within 15 s the frequency is: What exactly constitutes an item or a transaction
depends on the application and on the type of infor-
71
f ¼ 4:7ðHzÞ mation to be extracted. In a ▶ Gene Association
15 s
Analysis context, the meaning of transaction is usu-
If the number of counts is not very large, it is ally associated with “Over-expression,” that is, only
more accurate to measure the time interval for those “Over-expressed” genes will be understood to
a predetermined number of occurrences, rather than be included in the transaction. Equivalently, the
the number of occurrences within a specified time. The term frequent itemset is related to frequent subset
latter method introduces a random error into the count of of genes.
between 0 and 1, so on average half a count. This is Frequent Pattern Mining (FPM) techniques provide
called gating error and causes an average error in the methods to extract automatically all the frequent
calculated frequency of Df ¼ 1/(2 Tm), or a fractional itemsets from a dataset, being an extremely costly
error of Df / f ¼ 1/(2 f Tm), where Tm is the timing task (Alves et al. 2010). In ▶ Microarray data analysis,
interval and f is the measured frequency. This error the specific ▶ Gene Expression dataset structure
decreases with frequency, so it is a problem at low (thousands of genes against only hundreds of exper-
frequencies where the number of counts N is small. imental conditions) increases the frequent itemsets
mining process complexity. Due to this fact, devel-
oping efficient FPM techniques to be applied to
▶ Genomic studies has been an important challenge
Frequent Pattern Mining during the last years.
Next, in order to provide a better understanding of
Jesús Aguilar-Ruiz1, Domingo Rodrı́guez -Baena1 and Frequent Pattern Mining methods, a simple APRIORI
Ronnie Alves2 (see the review paper of Han et al. 2007 for further
1
School of Engineering, Pablo de Olavide University, information) example is illustrated in Fig. 1.
Escuela Politécnica Superior, Seville, Spain Let M be a discretized matrix, where 1 and 0 mean
2
Instituto de Informática, Universidade Federal do Rio “Over-expressed” and “Under-expressed,” respec-
Grande do Sul, Porto Alegre, Brazil tively. Table T represents the transactions and their
items. The APRIORI algorithm generates, iteratively,
Definition candidate itemsets. In every iteration, the support of
every candidate itemset is calculated, eliminating
Frequent Pattern Mining is a ▶ Data Mining subject those itemsets with a support value under a threshold
with the objective of extracting frequent itemsets from (set to 2/4 in this example). Based on the idea that an
Frequentist Approach 761 F
Iteration 1
g1 g2 g3 g4 g5 Itemset Sup.
Itemset Sup.
Tid Items Itemset {g1,g2} 0.25
c1 1 0 1 1 0 {g1} 0.5
c1 g1,g3,g4 {g1} {g1,g3} 0.5
c2 0 1 1 0 1 {g2} 0.75
c2 g2,g3,g4 {g2} {g1,g5} 0.25
c3 1 1 1 0 1 c3 {g3} 0.75
g1,g2,g3,g5 {g3} {g2,g3} 0.5
c4 0 1 0 0 1 {g4} 0.25
c4 g2,g5 {g5} {g2,g5} 0.75
{g5} 0.75
M T {g3,g5} 0.5
Iteration 3 Iteration 2
Frequent Pattern Mining, Fig. 1 Example of the Apriori algorithm with support set to 2/4, that is, every itemset, to be considered
as a valid candidate, has to appear in at least two of the four transactions
FuGE Overview
The core resource in the FuGE project is an object model,
comprising 10 packages (Fig. 1), each of which pertains
FR-IMGT to a particular type of data or metadata that is generically
required for digitally representing any type of life sci-
▶ Framework Region (FR-IMGT) ences experiment (Jones et al. 2007). An example is the
FuGE 763 F
Audit Contacts, auditing and security settings for all objects.
Protocol package (Fig. 2), which contains a set of UML method. For example, extensions built for proteomics
classes and diagrams showing the relationships between capture the essential parameters of protein separation
objects. The key components in the Protocol package are techniques (Gibson et al. 2010) as defined by the
the Protocol class – a description of what should be done, associated minimum reporting guidelines documents –
for example, a standard operating procedure – and the MIAPE (Taylor et al. 2007). In this way, the format
ProtocolApplication class – a description of what was developer does not have to recreate all the models for
done, that is, with any runtime parameter values differing the basic core; FuGE provides all the elements for
from the defaults defined in the Protocol. Similarly, the handling protocols, data files, samples, and ontologies.
Data package contains structures for capturing data New data models can be developed quickly, and the
values in regular multidimensional matrices or as exter- associated project software can be used to create
nal files. ProtocolApplication is used to map between the a relational database or format an editing software auto-
input and output data (or biological samples) used at each matically (Belhajjame et al. 2008; Swertz et al. 2010).
stage of an experimental or data analysis pipeline, and Systems biology investigations typically use a range
thus can track the full audit trail for how the final results of experimental approaches to understand cellular
were generated through each stage in the process. processes. In the absence of using an extensible model,
developing a new data format or exchange standard can
FuGE for Data Format Development be a challenging process, and resulting models for differ-
One of the primary uses of FuGE is to develop new data ent domains tend to represent similar concepts (such as
exchange formats or standards (Jones et al. 2009). protocols and parameters) using different terminologies
A user interacts with the FuGE model with a UML and different levels of detail, thus making data integra-
editing tool or an XSD editing tool if working in the tion more difficult. Furthermore, some models do
FuGE-light mode. Extensions are built to capture the not allow for audit trails to capture all the processes
data types specific to the experimental or analysis that have been applied to the sample or data file.
F
764
Parameter
1 Parameter 0..* ParameterValue
+isInputParam : Boolean [0..1]
0..1 0..*
0..* 0..1
DefaultValue Value
0..1 1
ContactRole Parameters ParameterValues
Measurement
0..*
Provider
0..1 0..1 1
Parameterizable ParameterizableApplication
Protocol
Protocol 0..* ProtocolApplication
1
0..* 0..* +activityDate : DateTime [0..1]
Software Equipment 1 0..1 0..* 0..1
InputData
0..* OutputMaterials
MeasuredMaterial 0..*
1 Data
0..*
0..*
Material
FuGE, Fig. 2 A component of the Protocol package, showing how parameter values are provided by a ProtocolApplication
FuGE
Function 765 F
Adoption of the FuGE methodology for model develop- management of high-throughput systems biology
ment should encourage developers to capture this level investigations. FuGE captures the necessary object
detail where appropriate, which will help with unambig- abstractions that can be implemented across three layers
uous interpretation of the results by the diverse commu- of data management: (1) data persistence, (2) in silico
nity of potential consumers of the data. and bench protocol management, and (3) data exchange.
Functional explanations address questions about why Millikan RG (1989) In defense of proper functions. Philos Sci
a biological role is performed the way it is (e.g., “why do 56:288–302
Neander K (1991) The teleological notion of ’function’. Australa
many pathways that generate ATP start by activating J Philos 69:454–468
their substrate?”) by pointing to the advantages of Wouters AG (2005) The function debate in philosophy. Acta
performing the role in that way rather than in some Biotheor 53:123–151
conceivable alternative way (Wouters 2007). This kind Wouters AG (2007) Design explanation: determining the con-
straints on what can be alive. Erkenntnis 67:65–80
of explanation is often called “functional explanation” Wright L (1976) Teleological explanations: an Etiological
(especially by biologists) because it is concerned with analysis of goals and functions. University of California
the advantages of certain forms of organization rather Press, Berkeley
than with the question of how those forms are brought
about. It is discussed in the entry on functional explana-
tion (▶ Explanation, Functional).
Biological roles also play an important role in certain Function, Distributed
explanations in evolutionary biology (the study of
the history and dynamics of lineages of organisms), Jonathan F. Davies
especially in adaptation explanations. Adaptation expla- Fondazione Bruno Kessler, Trento, Italy
nations (also called “selection explanations”) are evolu-
tionary explanations that explain certain characteristics
of a population as the result of past interaction in that Synonyms
population between variants that differed in their fitness.
If a certain trait evolved because the fitness of past Delocalized; Function
variants having that trait was higher than that of their
competitors lacking that trait because the presence of that
trait improved the performance of a certain role function, Definition
one might say that the trait evolved as an adaptation for
performing that role function. For this reason, adaptation A distributed function is a feature of mechanistic
explanations are sometimes called “functional explana- explanations of some of the properties of complex
tions” (especially by philosophers) (Brandon 1990). This systems (▶ Complex System). It is a component
kind of explanation is discussed in the entry on evolu- causal-role ▶ function that cannot be localized onto
tionary explanation (▶ Explanation, Evolutionary). a readily identified component structure. A function
may be distributed across multiple nonlinearly
interacting elements or processes, across the whole sys-
tem or even transcend systemic boundaries.
Cross-References
▶ Explanation in Biology
▶ Explanation, Evolutionary
Characteristics
▶ Explanation, Functional
Central strategies in the mechanistic explanation of the
properties and behaviors of biological systems are
decomposition and functional localization. Decomposi-
References tion depends on the assumption that the behavior of
a system is a product of a set of subordinate functions,
Bigelow J, Pargetter R (1987) Functions. J Philos 84:181–196
and that the interactions between the functional ele-
Brandon RN (1990) Adaptation and environment. Princeton
University Press, Princeton ments are minimal and can be handled additively
Canfield JV (1964) Teleological explanation in biology. Br (Bechtel and Richardson 2010). Functional localization
J Philos Sci 14:285–295 is achieved when a structural decomposition of the
Cummins R (1975) Functional analysis. J Philos 72:741–765
system – in simple cases achieved through observation,
Garson J (2008) Function and Teleology. In: Sarkar S,
Plutynski A (eds) A companion to the philosophy of biology. and in more complex cases by the identification, of
Blackwell, Cambridge, pp 525–549 regions of maximal causal interaction, bounded with
Function, Distributed 769 F
regions of low interactivity (see ▶ Top-down Decom- tested, for example, through the use of targeted ▶ per-
position of Biological Networks) – yields identifiable turbation experiments. Successful explanations are
components that are held to be responsible for these achieved when functional decompositions map onto
functions (▶ Functional Modules and Complexes). structural decompositions. Problems arise, however,
The full explanation is achieved by the localization of when this mapping is not achieved or when the pertur-
a complete set of bottom-level component functions bation of components yields counterintuitive results.
(activities) in particular structures (entities) possessing Some features of biological systems do not appear to
the requisite characteristics or capacities that enable be analyzable into functionally discrete components and
them to be the bearer of these functions (Machamer the functionally relevant properties of many compo-
et al. 2000). Although it is not necessary to assume that nents are context sensitive to some degree. Bechtel and
a single component (in the sense of an individual, spa- Richardson (2010) identify systems in which the struc-
tially localized entity or structure identified as tural components seem to perform tasks that would not
F
a component of the system) is responsible for a specific appear in an intuitively plausible functional decomposi-
activity, this assumption is often made, if only as a first tion of the system. This makes the localization of com-
approximation. Even in cases where components interact ponent functions onto identified discrete structures
(a feature of what Herbert Simon (1996) called near- impossible and has given rise to the suggestion that the
decomposable systems), the primary explanatory burden emergent behavior of the system is inexplicable in
rests on the intrinsic (context-independent) properties of mechanistic terms (▶ Emergence).
the components. Interactions are usually modeled in Kauffman’s (1993) explanation of the stability of
a linear, additive fashion and organizational factors are ▶ gene regulatory networks on the basis of their con-
relegated to a secondary role, as constraints. nectivity is an extreme example of an explanation of
A well-known example of this explanatory strategy a systemic feature that does not make use of functional
in molecular biology is Jacob and Monod’s (1961) localization. His network model consists of simple
operon model of ▶ gene regulation. They posited regu- nodes, each node being in one of two states (on, off).
latory genetic elements as a solution to the problem of The interactions between the nodes are simple activa-
accounting for changes in gene expression. The details tion or repression, so making transitions between suc-
of this model include the functionally defined ▶ repres- cessive states of the network Boolean operations
sor. This consists of a protein that exhibits certain struc- (▶ Boolean Networks). Kauffman argues that networks
tural features allowing it to bind to the operator and so of this sort exhibit behavior that encounters stable cycles
prevent the expression of the structural genes. The shape (▶ Stability) even in the face of perturbations and that
and molecular constitution of the regulatory protein a gene network characterized by a similar architecture
ensures that it is able to perform the bottom-level activ- will be robust. The explanatory properties for the
ities (in this case, geometrical/mechanical and chemical ▶ robustness of the network are distributed across
bonding) that constitute an instance of gene regulation. the entire system and the intuitively satisfying parsing
Other component functions (structural genes, ▶ Pro- of the systemic behavior into functionally discrete
moter, operator, etc.) are performed by identified struc- components is not possible. Any explanatory properties
tures each constituted in such a way as to make their of individual nodes are dependent on their relations to
individual contribution to the behavior of the system other nodes rather than on their intrinsic properties; for
intelligible. The adequacy of the explanation is a result example, a ▶ hub is a hub in virtue of its high number of
of the gross systemic behavior being the sum of the connections and if it is functionally significant, it is so
linear sequence of sub-tasks (functions), which in turn because of these connections.
are the sum of their component sub-tasks until we reach A less radical example of a process of the distribution
the lowest level of entities and activities of concern to of a functional component of a biological system is the
the field (Machamer et al. 2000). notion of a eukaryotic gene. Genes are made up of
Standard strategies for the development of this kind non-contiguous exons (▶ Exon) interspersed with non-
of mechanistic explanation often involve the intuitive coding introns (▶ Intron), which, to complicate matters
parsing of the systemic behavior into component func- further, are utilized in differing ways in different devel-
tions giving rise to a plausible ▶ mechanism sketch or opmental processes. The localization of the functional
schema (Craver 2007). Competing schemas are then unit (“gene”) onto a discrete molecular structure
F 770 Function, Distributed
(i.e., a contiguous section of DNA) does not appear to be ascription of functional roles to individual modules,
possible. This, and further complications with regard to combinations of modules, and other network features
the physiological roles of the gene products, has led (e.g., connectivity, small world architecture; see
Moss (2003) to argue that the functional unit (called ▶ Small-World Property and ▶ Functional Modules
“gene-P” to signify its connection to phenotype) cannot and Complexes) can then be attempted by coordinated
be mapped onto any recognizable molecular structure perturbation experiments, conducted in vitro or in silico
(called “gene-D” to signify status as a developmental (or even in vivo) on the basis of molecular data and
resource). The phenotypic (functional) role of any par- theoretical work.
ticular molecular gene (gene-D) is not fixed by its intrin- The assumption underlying these methodological sug-
sic structural properties, but is rooted in features gestions is that functionality in biological systems is not
distributed throughout a dynamic network of regulatory always readily localizable onto discrete structural com-
and developmental resources. The extent of this distri- ponents. An individual function (which is defined in
bution will depend on the particular process and the relation to the concerns of the investigator) may be
context in which it is taking place. located onto a single molecular component, a discrete
It is argued that biological systems are characterized structure, a network module or complex, a chemical path-
by a spectrum from highly localized to radically way, a dynamic process, an architectural feature of the
distributed functional elements and that neither the whole system or may even interfere in the interaction
“reductionistic” (▶ Reduction) localization of bottom- between the system and its environment. One of the
level component functions in discrete structures nor the novel features of systems biology is thought to be that it
“holistic” (▶ Holism) representation of system dynamics facilitates the empirical investigation into the extent of
constitute adequate explanatory strategies (Krohs and distribution of any particular function in cases in which
Callebaut 2007). Instead a plurality of explanatory our intuitions and mental capacities let us down. Even
approaches, which depend on the nature of the subject in cases in which the function bearer is a dynamic process
matter and the interests of the investigator, should be (so involving multiple structural components over
pursued (Mitchell 2003). In systems biology, this means a period of time), it may be possible to construct
dealing with hierarchically organized dynamic systems a simulation that accounts for its role in the overall system
and networks (see ▶ Hierarchy and ▶ Organization), and ascertain to what extent it is spatiotemporally distrib-
the features of which are explicable in terms of combi- uted. It remains to be seen to what extent the computa-
nations of network and system dynamics and the prop- tional and theoretical tools available to systems biologists
erties of components structures, sub-networks, and will overcome the obstacles to achieving such ambitious
pathways (Kitano 2002). Addressing such ▶ complexity, aims. However, the developments in the capacity of
non-linearity, and heterogeneity is only possible with systems biologists to account for the behaviors of com-
huge quantities of data across the full range of omic plex biological systems are already throwing new light on
levels and the capacity to analyze these data in parallel. the roles of decomposition and functional localization
Traditional techniques for localizing functions through, and encouraging philosophers of science to think again
for example, single gene knockout experiments cannot about the limits of mechanistic explanatory strategies.
deal with robust systems or organizational and dynamic
properties. One suggestion is to attempt simulations in Cross-References
silico that integrate network and dynamical systems
models with molecular data concerning known compo- ▶ Boolean Networks
nents (Kitano 2002). Another suggestion, which relates ▶ Complex System
directly to the problems of explaining systems ▶ Complexity
containing distributed functional components, is to ▶ Emergence
focus on the “unbiased” structural decomposition of the ▶ Exon
system. Making use of quantitative methods for delin- ▶ Function
eating network modules, through the identification of ▶ Functional Modules and Complexes
local maxima of interaction, avoids the need for the ▶ Gene Regulation
intuitive decomposition of the system into functional ▶ Gene Regulatory Networks
components (Krohs and Callebaut 2007). The secondary ▶ Holism
Functional 771 F
▶ Hub the ▶ IDENTIFICATION Axiom) of ▶ IMGT-
▶ Intron ONTOLOGY, the global reference in ▶ immunogenet-
▶ Mechanism ics and ▶ immunoinformatics (Giudicelli and Lefranc
▶ Organization 1999; Lefranc et al. 2004, 2005, 2008; Duroux
▶ Perturbation et al. 2008), built by IMGT®, the international ImMu-
▶ Promoter noGeneTics information system® (http://www.imgt.
▶ Reduction org) (▶ IMGT® Information System). “Functional”
▶ Repressor identifies, whatever the molecule type (▶ MoleculeType),
▶ Robustness the functionality of ▶ Molecule_EntityType leafconcepts
▶ Small-World Property in undefined or germline configuration (▶ Configuration
▶ Stability Type), whose coding region has an open reading frame
▶ Top-down Decomposition of Biological Networks without stop codon and for which there is no described
F
defect in the splicing sites, recombination signals
(▶ Recombination Signal (RS)) and/or regulatory
References elements.
“Functional” is one of the three leafconcepts (the
Bechtel W, Richardson RC (2010) Discovering complexity: other two being “▶ ORF” and “▶ Pseudogene”) that
decomposition and localization as strategies in scientific
identify the functionality of Molecule_EntityType
research. MIT Press, Cambridge
Craver CF (2007) Explaining the brain: mechanisms and the leafconcepts in undefined configuration (▶ Configura-
mosaic unity of neuroscience. Clarendon Press, Oxford tion Type) conventional genes (▶ Conventional Gene)
Jacob F, Monod J (1961) Genetic regulatory mechanisms in the and immunoglobulin (IG) and T cell receptor (TR) con-
synthesis of proteins. J Mol Biol 3:318–356
stant (C) genes (▶ Constant (C) Gene), or in germline
Kauffman S (1993) The origins of order: self organization and
selection in evolution. Oxford University Press, Oxford configuration (▶ Configuration Type) (IG and TR vari-
Kitano H (2002) Looking beyond the details: a rise in system- able (V), diversity (D), and joining (J) (▶ Variable (V)
oriented approaches in genetics and molecular biology. Curr Gene, ▶ Diversity (D) Gene, ▶ Joining (J) Gene) genes
Genet 41:1–10
before DNA rearrangements) (Giudicelli and Lefranc
Krohs U, Callebaut W (2007) Data without models merging with
models without data. In: Boogerd F et al (eds) Systems biology: 1999; Lefranc et al. 2004; Duroux et al. 2008).
philosophical foundations. Elsevier, Amsterdam, pp 181–213
Machamer P, Darden L, Craver CF (2000) Thinking about
mechanisms. Phil Sci 67:1–25
Mitchell SD (2003) Biological complexity and integrative
Cross-References
pluralism. Cambridge University Press, Cambridge
Moss L (2003) What genes can’t do. MIT Press, Cambridge ▶ Configuration Type
Simon H (1996) The sciences of the artificial. MIT Press, ▶ Constant (C) Gene
Cambridge
▶ Conventional Gene
▶ Diversity (D) Gene
▶ FunctionalityType
▶ IMGT-ONTOLOGY
Functional ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
▶ IMGT-ONTOLOGY, Leafconcept
Marie-Paule Lefranc ▶ IMGT ® Information System
Laboratoire d’ImmunoGénétique Moléculaire, Institut ▶ Immunogenetics
de Génétique Humaine UPR 1142, Université ▶ Immunoinformatics
Montpellier 2, Montpellier, France ▶ Joining (J) Gene
▶ Molecule Entity Type
▶ MoleculeType
Definition ▶ Open Reading Frame (ORF)
▶ Pseudogene
“Functional” is a ▶ leafconcept of the “▶ Functiona- ▶ Recombination Signal (RS)
lityType” concept of identification (generated from ▶ Variable (V) Gene
F 772 Functional Differential Equations
References References
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc Al-Shahrour F, et al. FatiGO: a web tool for finding significant
M-P, Giudicelli V (2008) IMGT-Kaleidoscope, the Formal associations of gene ontology terms with groups of genes.
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583 Bioinformatics. 2004;20:578–80.
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- Reimand J, Kull M, Peterson H, Hansen J, Vilo J. g:Profiler—a
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 web-based toolset for functional profiling of gene lists from
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, large-scale experiments. Nucleic Acids Res. 2007;35(Web
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, Server Issue):W193–200.
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi MM, Duprat E, Kaas Q, Pommié C, Chaume D,
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Functional Genomics Experiment Model
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60 ▶ FuGE
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
system and an ontology that bridge biological and computa-
tional spheres in bioinformatics. Brief Bioinform 9:263–275
Jiguang Wang
Beijing Institute of Genomics, Chinese Academy of
Sciences, Beijing, Beijing, China Functional Model
▶ Process-based Model
Definition
References Characteristics
Spirin V, Mirny AL. Protein complexes and functional modules Identification of Signature and Functional
in molecular networks. Proc Natl Acad Sci USA.
Network Modules
2003;100(21):12123–8
Modular networks are analyzed to identify signa-
ture/function network modules from larger molecu-
lar networks. It involves topological analysis of the
networks to find the maximally scoring regions in
the network, followed by mapping the results to the
Functional/Signature Network Module true function categories like those present in the
for Target Pathway/Gene Discovery protein function annotation databases, e.g., the
Gene Ontology (GO) and pathway databases. Net-
Shipra Agrawal1 and M. R. Satyanarayana Rao2 work topology is the layout pattern of interconnec-
1
BioCOS Life Sciences Pvt. Limited, Institute of tions of the various elements like nodes and edges.
Bioinformatics and Applied Biotechnology, The functional annotation of signature modules is
Bangalore, Karnataka, India based on their gene/protein enrichment analysis
2
Chromatin Biology Laboratory, Molecular Biology from GO and pathway database. The gene or protein
and Genetics Unit, Jawaharlal Nehru Center for enrichment analysis of networks is carried out by
Advanced Research, Bangalore, Karnataka, India mapping network genes to known pathways and
gene ontology terms to determine which pathways/
GO terms are overrepresented in a given set of
Synonyms genes.
Functional modules are significantly enriched
Co-expression network; Expression signature; Func- in known target genes and can be used as fingerprints
tional GO and pathway-based network; Genomics to identify genes relevant to some specific biological
network functions or diseases or to discover new potential
drug targets (Wong et al. 2008). The system level expres-
sion network is processed to identify upregulated/
Definition downregulated signature and functional expression
network modules. For example, identification of
High-throughput molecular biology data is often downregulated synaptic vesicle module for brain disor-
analyzed and represented today through molecular ders that includes Alzheimer’s disease, bipolar disorder,
networks. The network of interacting proteins/genes Schizophrenia, and glioblastoma (Suthram et al. 2010)
is an undirected network, where proteins/genes (refer Expression Network Module Box for details).
F 774 Functional/Signature Network Module for Target Pathway/Gene Discovery
2. Bottleneck Nodes
Expression Network Modules Bottleneck nodes are those, which have a higher
Expression modules consist of clusters of genes betweenness (i.e., “bottleneck-ness”) and lower
whose expression profiles share a local similarity degree of connectivity with neighboring
or coordinate expression. The local similarity/ genes/proteins. Betweenness is one of the most
co-expression across genes is measured using important topological properties of a network. It
statistical correlation parameters, i.e., Pearson’s measures the number of shortest paths (the shortest
Correlation Coefficient. The genes with very distance between two nodes) between nodes going
high co-expression levels constitute a cluster/ through a certain node. Therefore, nodes with the
module, which usually indicates that they have highest betweenness control most of the informa-
a common biological function or share tion flow in the network, representing the critical
a common physiological condition. The condi- points of the network (Fig. 1). They act as important
tion-specific co-expression information provides links between modules in protein interaction net-
clues to the dynamic features of these network works, just as bridges connecting two important
modules. For example, dilated cardiomyopathy, hubs/modules. Bottlenecks control the major infor-
which is a leading cause of heart failure, has been mation flow in a network and correspond to the
well studied using expression network modules. dynamic components of the interaction network.
These networks have facilitated identification of They are observed to be significantly less well
putative biomarkers or therapeutic targets for co-expressed with their neighbors. They have been
heart failure and the underlying molecular found to be present in the yeast interactome in
mechanism of dilated cardiomyopathy (Lin abundance (Yu et al. 2007).
et al. 2010) 3. Network Motifs
Network motifs are patterns of a larger and more
complex network. They are overrepresented node
The modular structure of complex biological connectivity patterns and recur frequently in
networks also facilitates identification of biologi- a given network than expected at random. They
cally relevant gene hubs, bottleneck nodes, network may represent autoregulation or feed-forward
motifs, and biomarker nodes, which are described loops in a regulatory network and indicate func-
below. tional and evolutionary constraints in a network
1. Gene Hubs (Lee et al. 2002). The transcription networks of
In a molecular network, nodes (gene/protein) well-studied microorganisms are made up of
which are highly connected to other nodes are a small set of network motifs. These motifs are
called gene hubs/network hubs. The gene hubs also found in protein modification network and
have a high degree of connectivity with their interactions between neuronal cells and signaling
neighboring genes. The latest method for scoring networks. The specific ways (i.e., autoregulation or
a functional hub is based on the role the nodes feed-forward loops) in which the network motifs
play in providing connectivity among genes or are wired together describes the dynamics of each
proteins of interest relative to their role in the individual motif. These patterns exhibit a robust
global network. They are centrally important for dynamical stability across a complex biological
the cellular functions and tend to be essential and network (Fig. 1).
conserved in a network module. In the context of 4. Biomarker Nodes
disease pathways, hubs may represent potential Biomarkers are biomolecular signatures, which are
drug targets (Wuchty 2004; Levy and Siegal detectable and measurable and can be used to study
2008). Scientists working on common diseases the diagnosis/prognosis of disease or therapy.
have reported several important drug targets by Under normal conditions, they may be present at
identifying important functional hubs across basal levels in the cell. However, if the amount of
larger molecular networks derived from high- these molecules changes, they may indicate
throughput expression as well as protein complex response to therapies, complications or diseases.
data. Biomarker nodes can be identified by studying
Functional/Signature Network Module for Target Pathway/Gene Discovery 775 F
Functional Module
Functional/Signature Network Module for Target Path- If the network is constructed from co- expression (expression
way/Gene Discovery, Fig. 1 A complex biological network similarity data), its statistical analysis leads to identification of
and functionally important network elements. Topological anal- expression module. The expression modules or network module
ysis of a system level biological network leads to the identifica- can be functionally annotated to have a biological function,
tion of network modules, network motifs, and bottleneck nodes. which is finally termed as a functional module
gene network models having disease-associated The statistically significant functional gene/protein net-
gene expression profiles and biofluid proteomes works can be further analyzed and annotated to map
(Schiess et al. 2009). A gene, which is coordinately biologically important pathways and gene hubs as tar-
and consistently connected to critical disease gets. This section is focused on identification of pertur-
causal candidate genes and pathways are termed bation targets, target pathways/genes, using network
“biomarker nodes.” A highly connected biomarker biology approach.
node sometimes targets multiple related diseases; 1. Perturbation Targets from Network Biology
e.g., AZGP1 gene is a biomarker node for cardiac The network biology approach can be used to com-
hypertrophy, idiopathic cardiomyopathy, and idio- pare gene expression profiles of drug-treated or
pathic thrombocytopenic purpura, (Dudley and diseased-cell population with those of the normal
Butte 2009). cells. This helps in identifying the target genes and
pathways that are directly affected by the perturba-
Mapping Pathway/Gene Target From Networks tion experiments, e.g., genes that are deregulated in
Protein–protein interaction and transcriptional regula- some disease pathways or genes whose products
tory networks can be used to identify putative can be used as potential targets of some drug
biomarkers, target genes, target pathways, etc. The tar- (Mani et al. 2008, Karlebach and Shamir 2008).
get pathways and genes are more reliably detected from Topologically important networks are processed
expression profiling–based perturbation experiments. for gene ontology and pathway annotations to
F 776 Functional/Signature Network Module for Target Pathway/Gene Discovery
Target genes
define the effect of perturbation experiment and This involves constructing networks and studying
subsequently to identify the targets of molecular the kinetic interactions of a gene/protein with other
perturbation from global expression profile data. genes/proteins in a network (Fig. 2) (Dezso et al.
2. Target Pathways 2009; Gilchrist et al. 2006). The discovery of target
Biologically significant pathways can be identified pathways and genes is interrelated. A researcher can
by performing functional annotation and pathway always narrow down the list of target genes from the
enrichment analysis of molecular networks. The list of target pathways discovered from modular
high-scoring and conserved pathways can be used networks.
in disease-related studies, i.e., for understanding the
pathogenesis of disease, identifying important
deregulated metabolic and signaling networks and References
subsequently discovering potential drug targets for
the disease (Fig. 2) (Zheng and Christina 2009; Dezso Z, Nikolsky Y, Nikolskaya T, Miller J, Cherba D,
Zhou and Wong 2009). Further analysis of biologi- Webb C, Bugrim A (2009) Identifying disease-specific
genes based on their topological significance in protein
cally significant pathways and their cross talk across
networks. BMC Syst Biol 3:36
networks leads to identification of the most critical Dudley JT, Butte AJ (2009) Identification of discriminating
pathways in a complex disease mechanism that could biomarkers for human disease using integrative network
also serve as potential key target pathways. For biology. Pac Symp Biocomput 27–38
Erten S, Li X, Bebek G, Li J, Koyut€ urk M (2009) Phylogenetic
example, human Toll-like receptor signaling net-
analysis of modularity in protein interaction networks. BMC
work has been used to establish the important control Bioinformatics 10:333
points through pathway cross talk studies. The anal- Gilchrist M, Thorsson V, Li B, Rust AG, Korb M, Roach JC,
ysis identified potential candidates for inhibitory Kennedy K, Hai T, Bolouri H, Aderem A (2006) Systems
biology approaches identify ATF3 as a negative regulator of
mediation of TLR signaling with respect to their
toll-like receptor 4. Nature 441(7090):173–178
specificity and potency (Li et al. 2008). Karlebach G, Shamir R (2008) Modelling and analysis of gene
3. Target Genes regulatory networks. Nature Reviews Molecular Cell
A system level network biology approach can be used Biology 9:770–780
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z,
to identify novel target genes of interest which are
Gerber GK, Hannett NM, Harbison CT, Thompson CM,
found to have significant involvement in some dis- Simon I, Zeitlinger J, Jennings EG, Murray HL,
ease, biological process, or an abnormality. Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL,
FunctionalityType 777 F
Fraenkel E, Gifford DK, Young RA (2002) Transcriptional Definition
regulatory networks in Saccharomyces cerevisiae. Science
298(5594):799–804
Levy SF, Siegal ML (2008) Network hubs buffer environmental FunctionalityType is a concept of identification
variation in Saccharomyces cerevisiae. PLoS Biol 6(11):e264 (generated from the ▶ IDENTIFICATION Axiom)
Lin CC, Hsiang JT, Wu CY, Oyang YJ, Juan HF, Huang HC of ▶ IMGT-ONTOLOGY, the global reference in
(2010) Dynamic functional modules in co-expressed protein ▶ immunogenetics and ▶ immunoinformatics (Duroux
interaction networks of dilated cardiomyopathy. BMC Syst
Biol 4:138 et al. 2008), built by IMGT®, the international ImMuno-
Mani KM, Lefebvre C, Wang K, Lim WK, Basso K, Dalla- GeneTics information system® (http://www.imgt.org)
Favera R, Califano A (2008) A systems biology approach (▶ IMGT® Information System), that allows to
to prediction of oncogenes and molecular perturbation tar- identify, whatever the molecule type (▶ MoleculeType)
gets in B-cell lymphomas. Mol Syst Biol 4:169
Schiess R, Wollscheid B, Aebersold R (2009) Targeted proteo- (gDNA, cDNA, mRNA, or protein), the type of
mic strategy for clinical biomarker discovery. Mol Oncol functionality of a ▶ Molecule_EntityType leafconcept
3(1):33–44
F
(Giudicelli and Lefranc 1999; Lefranc et al. 2004;
Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ Duroux et al. 2008).
(2010) Network-based elucidation of human disease similar-
ities reveals common functional modules enriched for plu- The “FunctionalityType” concept comprises five
ripotent drug targets. PLoS Comput Biol 6(2):e1000662 leafconcepts, divided into two categories, according
Wong DJ, Liu H, Ridky TW, Cassarino D, Segal E, Chang HY to the configuration type (▶ Configuration Type) of
(2008) Module map of stem cell genes guides creation of the Molecule_EntityType leafconcept.
epithelial cancer stem cells. Cell Stem Cell 2(4):333–344
Wuchty S (2004) Evolution and topology in the yeast protein Three leafconcepts, ▶ functional, ORF (open read-
interaction network. Genome Res 14(7):1310–1314 ing frame), and ▶ pseudogene, identify the functional-
Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M (2007) The ity of Molecule_EntityType leafconcepts in undefined
importance of bottlenecks in protein networks: correlation configuration (conventional genes (▶ Conventional
with gene essentiality and expression dynamics. PLoS
Comput Biol 3(4):e59 Gene) and immunoglobulin (IG) and T cell receptor
Zheng L, Christina C (2009) Systems biology for identifying (TR) constant (C) genes (▶ Constant (C) Gene) or
liver toxicity pathways. BMC Proc 3(Suppl 2):S2 in germline configuration (IG and TR variable (V),
Zhou X, Wong ST (2009) Computational systems diversity (D) and joining (J) (▶ Variable (V) Gene,
bioinformatics and bioimaging for pathway analysis and
drug screening. Proc IEEE Inst Electr Electron Eng ▶ Diversity (D) Gene, ▶ Joining (J) Gene) genes
96(8):1310–1331 before DNA rearrangements).
Two leafconcepts, ▶ productive and unproductive,
identify the functionality of Molecule_EntityType
leafconcepts in rearranged or partially-rearranged
configuration (IG and TR entities after DNA
Functionality Type rearrangements, and by extension fusion entities resulting
from translocations, and hybrid entities obtained by
▶ FunctionalityType biotechnology molecular engineering).
Cross-References
FunctionalityType
▶ Configuration Type
Marie-Paule Lefranc ▶ Constant (C) Gene
Laboratoire d’ImmunoGénétique Moléculaire, Institut ▶ Conventional Gene
de Génétique Humaine UPR 1142, Université ▶ Diversity (D) Gene
Montpellier 2, Montpellier, France ▶ Functional
▶ IMGT-ONTOLOGY
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
Synonyms ▶ IMGT-ONTOLOGY, Leafconcept
▶ IMGT-ONTOLOGY, Unproductive
Functionality type ▶ IMGT ® Information System
F 778 Functional-Structural Plant Modeling
Synonyms Characteristics
Virtual plant modeling According to the FSPM paradigm, a plant responds to its
environment by adaptive modification of processes
constituting its physiology (e.g., ▶ photosynthesis) and
Definition plant architecture (e.g., bud break or dormancy, growth,
development, morphogenesis), thereby explicitly
Functional-structural plant modeling (FSPM) refers to addressing the feedbacks between structure and func-
a paradigm for the description of a plant by creating tion. In addition, such feedbacks can be implemented
a (usually object-oriented) computer model of its struc- and verified at different levels, e.g., locally at the organ
ture and selected physiological and physical processes, scale and globally at the plant or canopy scale.
at different hierarchical levels: organ, plant individual, A typical reaction of the plant to a change in envi-
canopy (a stand of plants), and in which the processes ronment or to an intervention (cutting, bending, etc.)
are modulated by the local environment. Structure by management or in the course of an experiment
comprises the explicit topology (connection between will thus be the relatively rapid readjustment of
Functional-Structural Plant Modeling 779 F
physiological functions and a somewhat slower reac- History
tion in terms of structural adaptation by growth of new The concept of Functional-Structural Plant Modeling
structures or active shedding of existing structures arose in the 1990s from the desire to link existing
(Vos et al. 2010). ▶ process-based models of crops with spatially
Usually, the dynamics in an FSPM is simulated explicit representations of plants (usually represented
using rules (for growth, extension, and branching) in a rule-based manner as ▶ Lindenmayer systems) in
always starting from a growing tip (meristem) which order to consider, in a mathematical model, interactions
produces ▶ phytomers (consisting of a leaf, a node, an between plant structure and functioning (Siev€anen et al.
axillary bud, and an internode). The axillary bud can 1997). Early representatives of FSPMs were restricted to
itself produce a phytomer (with a meristem on top) to one aspect of plant functioning and showed how the
form a new shoot; this is referred to as branching. For selected function was affected by morphology (e.g.,
each of the organs of a phytomer, specific dynamics for light interception, assimilate allocation, xylem sap
F
extension or growth in biomass apply, which can be flow). In these models, information was frequently uni-
described using nonlinear functions of time. If directional (from structure to function). Furthermore,
such a dynamics follows a logistic function, then it is due to the lack of conventions among modelers, these
characterized by three parameters: time of onset models were very specific and difficult to reuse or to
of process, time of maximum rate, and maximum transfer to other plant systems.
dimension. Thus, using a rule-based notation (e.g., During a second wave, FSPMs were designed to be
a ▶ Lindenmayer system) to describe these processes, more modular, exhibiting submodels contributed by
two types of rules are usually sufficient: several authors and considering more than one aspect
(a) For the formation of a new ▶ phytomer from of functionality at the same time. Examples for this type
a (terminal or axillary) meristem: of model are LIGNUM (Perttunen et al. 2001) or
Greenlab (Guo et al. 2006). New features included
conditions
M ! IN ½ L½ MM in these models were, e.g., bidirectional information
flow between structure and function (Hemmerling
where M symbolizes the meristem and I, N, L the et al. 2008); processes taking place at more than one
internode, node, and leaf, respectively; ! designates hierarchical level, e.g., phytohormone biosynthesis and
a replacement of the left-hand side symbol transport (Buck-Sorlin et al. 2005); or the fluxes of
(meristem M) by the string of symbols on the right phloem and xylem within the tree (Lacointe and
hand-side, i.e., the organs constituting a phytomer, Minchin 2008). Recent simulation studies addressed
plus another meristem at the end of the string, thereby also aspects of agricultural pest management, the impact
ensuring that the rule can be applied over and over of biomechanics on tree architectural development, and
again (as long as certain conditions are fulfilled); the more refined models of light distribution in canopies,
square opening [ and closing ] brackets represent including the effects of light quality on growth regula-
the beginning and end of a branch. tion (Vos et al. 2010). FSPMs have been created for crop
(b) For the dynamics of growth or extension: plants such as the cereals maize, wheat, rice, and barley,
but also for trees (Vos et al. 2010). In most of these
conditions FSPMs, the structure was modeled dynamically, from
OðlÞ ) Oðl þ dlÞ an initial meristem, but there exist also static FSPMs in
which the structure is fixed (Vos et al. 2010).
where O is one of {I, N, L,. . .} (can also be a fruit or
flower), l is its dimension (e.g., length), and ) denotes Elements for Conception and Construction of a
a rule in which the dimension of an already Functional-Structural Plant Model
existing organ (resulting from the application of rule The establishment of an FSPM starts with the
(a)) is updated by adding a fraction dl to the dimension observation of plant morphology, i.e., topology
l (specified on the right-hand side of the rule). (arrangement of ▶ phytomers in shoots, with lateral
Expressed as a rate, dl/dt would be the derivative shoots up to a maximum branching order) and
of the function employed to describe the growth or geometry, including the biometry and morphology of
extension dynamics of the organ in question. the different plant organs at different positions within
F 780 Functional-Structural Plant Modeling
the plant. As with every model, it is wise to determine et al. 2008); in addition, some FSPMs have been
the boundaries, the scale, and the elements of devised using a standard programming language
the system to be modeled: Usually, the scope of (e.g., C, C++, Java, Simula). Platforms used for
an FSPM is not more than a dozen plant individuals; FSPM include L-Studio, GroIMP, and OpenAlea.
the scales are that of the (sub)organ, shoot, individual
and canopy level, and the elements considered are
the organs constituting the phytomer, plus collective Cross-References
entities (the “plant” as the sum of all shoots, the shoot
as the sum of its phytomers, etc.). Sometimes, other ▶ Biological System Model
elements characteristic of a crop production system are ▶ Calibration
simulated, such as a greenhouse with lamps or a patch ▶ Complex System
of soil. Key topological parameters for an FSPM are ▶ Data Integration
maximum branching order, maximum rank of ▶ Ecological Modeling
a phytomer within a shoot of a given branching order, ▶ Feedback Regulation
or rank of lowest flower. Using rank and order as ▶ Function, Biological
parameters for M in equation (a), those measured max- ▶ Functional
imum values are used to restrict the production of ▶ Graphical Model
phytomers by the meristems (see phytomer for ▶ Hierarchical Structure
a definition of rank and order). For each considered ▶ Interdisciplinarity
plant element, a number of (physical or) physiological ▶ Markov Chain
processes (growth, extension, respiration, ▶ photosyn- ▶ Markov Process
thesis, etc.) are defined. This will yield an object- ▶ Model Validation
oriented model prototype. ▶ Modularity
In order to parameterize the dimensions and ▶ Monte Carlo Simulation
orientations of organs (e.g., growth and extension models ▶ Ordinary Differential Equation (ODE)
(equation (b)), a database with lengths, diameters, areas, ▶ Paradigm
and angles (phyllotaxis, divergence) of representative ▶ Phenomenological vs Physiological Modeling
organs needs to be established. Statistical analysis ▶ Plant Systems Biology
(regression) ideally returns a relationship between ▶ Rule
an organ dimension and its topology (rank, order). The ▶ Rule-Based Methods
same relation can be established between a physiological ▶ Spatiotemporal Pattern Formation
function (e.g., leaf photosynthesis) and topology. ▶ Time-Course
Methods used to establish a biometric database are man- ▶ Topology and Toponomics
ual measurement using a ruler and caliper, digitizing
(using a stylus or laser scanning), and image analysis.
After calibration of the FSPM, it is tested using a number References
of scenarios and then validated in the usual way by
reparameterization with a calibration data set (measured Buck-Sorlin G, Kniemeyer O, Kurth W (2005) Barley morphol-
ogy, genetics and hormonal regulation of internode elonga-
at organ scale) and comparison with output variables, tion modelled by a relational growth grammar. New Phytol
usually measured at canopy or plant scale. 166:859–867
Guo Y, Ma Y, Zhan Z, Li B, Dingkuhn M, Luquet D, de Reffye P
FSPM Algorithms, Languages, and Software (2006) Parameter optimization and field validation of the
functional-structural model Greenlab for maize. Ann Bot
Languages and formalisms employed for FSPM are
97:217–230
usually following the rule-based or object-oriented Hemmerling R, Kniemeyer O, Lanwert D, Kurth W, Buck-Sorlin G
paradigm, often in combination with the procedural (2008) The rule-based language XL and the modelling envi-
paradigm. The most widespread rule-based formalisms ronment GroIMP illustrated with simulated tree competition.
Funct Plant Biol 35:739–750
are ▶ Lindenmayer systems and ▶ relational growth
Lacointe A, Minchin PEH (2008) Modelling phloem and xylem
grammars. Dedicated languages for FSPM are transport within a complex architecture. Funct Plant Biol
L + C (Prusinkiewicz et al. 2007) and XL (Hemmerling 35:772–780
Fuzzy Logic 781 F
Perttunen J, Nikinmaa E, Lechowicz MJ, Siev€anen R, Messier C A consequence of the properties of a functor
(2001) Application of the functional-structural tree model F : C ! D is that it can be applied to all parts of
LIGNUM to sugar maple saplings (Acer saccharum Marsh)
growing in forest gaps. Ann Bot 88:471–481 a diagram in category C, yielding a valid diagram in
Prusinkiewicz P, Karwowski R, Lane B (2007) The L + C plant category D. Furthermore, the resulting diagram will
modeling language. In: Vos J, Marcelis LFM, de Visser PHB, commute if the original commutes.
Struik PC, Evers JB (eds) Functional-structural plant model- Many mathematical constructions that relate enti-
ling in crop production. Springer, Dordrecht, pp 27–42
Siev€anen R, M€akel€a A, Nikinmaa E, Korpilahti E (1997) Special ties of different nature can be understood as functors
issue on functional-structural tree models, preface. Silva between suitable categories. Typical examples are
Fennica 31:237–238 (tensor) products or the mapping from Lie groups to
Vos J, Evers JB, Buck-Sorlin G, Andrieu B, Chelle M, de Visser Lie algebras. Functors also play a crucial role in uni-
PHB (2010) Functional–structural plant modelling: a new
versatile tool in crop science. J Exp Bot 61:2101–2115 versal algebra and coalgebra, where they specify
expression languages for the construction and decon-
F
struction of structured elements, respectively (Jacobs
and Rutten 1997).
Functor
References
Baltasar Trancón y Widemann
Ecological Modelling, University of Bayreuth, Adámek J, Herrlich H, Strecker GE (2004) Abstract and concrete
categories. Wiley, New York. http://katmat.math.uni-
Bayreuth, Germany
bremen.de/acc/acc.htm, online edition
Jacobs B, Rutten J (1997) A tutorial on (co)algebras and (co)
induction. EATCS Bull 62:222–259
Synonyms Lawvere W, Schanuel S (1997) Conceptual mathematics: a first
introduction to categories. Cambridge University Press,
Cambridge/New York
Category homomorphism; Category map
Definition
Fuzzy Logic
A functor is a mapping that respects the structure of
a category (Lawvere and Schanuel 1997; Adámek et Eyke H€ullermeier, Thomas Fober and
al. 2004). A functor F from category C to category D Marco Mernberger
consists of the following: Philipps-Universit€at Marburg, Marburg, Germany
• A mapping that associates with every object X 2 C
an object FðXÞ 2 D
• A mapping that associates with every arrow Synonyms
f 2 HomC ðX; YÞ an arrow Fðf Þ 2
HomD ðFðXÞ; FðYÞÞ Fuzzy set theory
Such that
• Identities are mapped to identities
Definition
FðidX Þ ¼ idFðXÞ (1)
The term “fuzzy logic” is used with different meanings
• Compositions are mapped to compositions in the literature. In a narrow sense, it refers to a branch
of mathematical logic, where it is studied as a special
Fðg f Þ ¼ FðgÞ Fðf Þ (2) type of multivalued logic, that is, a logic with more
than two truth degrees (Hajek 1998). In a wider (and
where the left composition is in C and the right com- more common) sense, fuzzy logic is used as an
position is in D. umbrella term for a collection of methods, tools, and
F 782 Fuzzy Logic
techniques for constructing intelligent systems that, by of real objects belonging to that concept) in the real
virtue of the very idea of partiality of truth, are capable word. For example, what is a “short DNA molecule”?
of handling, processing and exploiting uncertain, Biologists have a vague though sufficiently clear idea
imprecise, and incomplete information. These of this concept, without using a precise definition in
methods build on the key concept of a fuzzy set, as terms of an exact upper bound on the number of base
introduced by the founder of fuzzy logic, Lotfi A. pairs (bp) or the length in mm. Given such bounds,
Zadeh, in his seminal paper (Zadeh 1965). Fuzzy sets a proposition like “DNA molecule XYZ is small”
formalize the idea of graded class membership, would be either true or false; likewise, the truth
according to which an element can partially belong degree of a natural language proposition like “Einstein
to a set. In conjunction with generalized logical was born around noon” could be determined in
(set-theoretical) operators and derived notions like a unique way, given a precise definition of the
a fuzzy relation, the concept of a fuzzy set can be meaning of “around noon” in terms of a time interval
developed into a generalized set theory, which in turn (e.g., 12 o’clock 15 min) and the possibility to
provides the basis for generalizing theories in different pinpoint the exact time of a birth.
branches of (pure and applied) mathematics as well as Needless to say, this way of adapting human think-
fuzzy set-based approaches to intelligent systems ing and language to conventional logic and set theory
design, encompassing methods for information would be neither desirable nor useful. Instead, fuzzy
processing, decision making, optimization, and data logic offers an approximation in the other direction:
analysis. logic and set theory are generalized, so as so enable
a more faithful mathematical modeling of human con-
ception. The core idea in this regard is the notion of
Characteristics a fuzzy set, which allows for partial membership and
soft class boundaries (Pedrycz and Gomide 2007).
The notion of truth is commonly considered as
a bivalent concept: Logically, a proposition is either Fuzzy Sets
true or false, but nothing in-between. This conception, A fuzzy subset A of a reference set is identified by
which pervades modern science and thinking, has a so-called membership function, often denoted mA ð Þ,
a long-standing tradition in Western philosophy, and which is a generalization of the characteristic function
manifests itself in standard mathematical theories, A ð Þ of an ordinary set A . For each element
notably logic and set theory. And admittedly, formal x 2 , this function specifies the degree of member-
systems based on bivalent logic (including theories of ship of x in the fuzzy set; it can be interpreted as the
uncertainty based on such systems, like probability truth degree of the proposition that x 2 A. Usually,
theory) have proved extremely useful in the scientific membership degrees mA ðxÞ are taken from the unit
terrain, where they paved the way for the amazing interval [0,1], i.e., a membership function is an
success of the exact and engineering sciences in the X ! ½0; 1 mapping. In principle, however, more gen-
last century. eral membership scales (such as ordinal scales or com-
In many other less exact fields of science, however, plete lattices) can be used. We denote by ðÞ the set
ranging from the biological and life sciences over legal of all fuzzy subsets of .
practice to the modeling of cognitive processes and Fuzzy sets are often used for discretizing numerical
human intelligence, the bivalence of truth can be called attributes in a “soft” manner, taking advantage of their
into question. In fact, it was already noticed by ability to model “non-sharp” boundaries between clas-
Bertrand Russell in 1923 that “All traditional logic ses. Thus, they serve as an interface between the orig-
habitually assumes that precise symbols are being inal numerical scale and a symbolic scale comprised of
employed. It is therefore not applicable to this terres- the (natural language) terms associated with the fuzzy
trial life, but only to an imagined celestial existence” sets. For example, in ▶ gene expression analysis, one
(Russell 1923). Roughly speaking, this is because of typically distinguishes between normally expressed,
the vagueness and ambivalence of the concepts dealt under-expressed, and over-expressed genes. This clas-
with in these fields: for the intension of these concepts, sification is made on the basis of the expression level of
there is rarely a precise extension (in the sense of a set the gene (a normalized numerical value), by using
Fuzzy Logic 783 F
Fuzzy Logic, Fig. 1 Fuzzy under normal over
partition of the gene
expression level with
a “smooth” transition between
membership
under-expression, normal
expression, and
over-expression; membership
functions are shown as red
lines
–2 0 2
expression level
F
corresponding thresholds. For example, a gene is often In a quite similar way, the Cartesian product of
called over-expressed if its expression level is at least fuzzy sets A 2 ðÞ and B 2 ðÞ can be defined:
twofold increased. Needless to say, a precise threshold mAB ðx; yÞ ¼ mA ðxÞ mB ðyÞ for all ðx; yÞ 2 .
of that kind is arbitrary to some extent and implies • The logical disjunction can be generalized
an unnatural sudden jump from completely over- analogously, namely by means of a t-conorm
expressed to not at all over-expressed. Figure 1 . If is a t-norm, then defined by
shows a fuzzy partition of the expression level with a b ¼ 1 ð1 aÞ ð1 bÞ is a t-conorm.
a “smooth” (and arguably less arbitrary) transition Well-known examples of t-conorms include the
between under-, normal, and over-expression, formal- maximum ða; bÞ 7! maxða; bÞ, the algebraic sum
ized in terms of fuzzy sets with trapezoidal member- ða; bÞ 7! a þ b ab, and the Łukasiewicz
ship functions. For instance, according to this t-conorm ða; bÞ 7! minða þ b; 1Þ. A t-conorm can
formalization, a gene with an expression level of at be used for defining the union of fuzzy sets:
least 3 is definitely considered over-expressed, below 1 mA[B ðxÞ ¼ mA ðxÞ mB ðxÞ for all x 2 .
it is definitely not over-expressed, but in-between, it is • A generalized implication ⇝ is a
considered over-expressed to a certain degree. ½0; 1 ½0; 1 ! ½0; 1 mapping which is monotone
decreasing in the first and monotone increasing in
Generalized Logical Connectives the second argument, and which satisfies the bound-
To operate with fuzzy sets in a formal way, fuzzy set ary conditions a ⇝ 1 = 1, 0 ⇝ b = 1, 1 ⇝ b = b.
theory offers generalized set-theoretical resp. logical (Apart from that, additional properties are
connectives (like in the classical case, there is a close sometimes required.) Implication operators of that
correspondence between set theory and logic). Espe- kind, such as the Łukasiewicz implication ða; bÞ 7!
cially, important in this regard is a class of operators minð1 a þ b; 1Þ, are especially important in con-
called triangular norms or t-norms for short (Klement nection with the modeling of fuzzy rules.
et al. 2002).
• A t-norm is a ½0; 1 ½0; 1 ! ½0; 1 mapping The Extension Principle
which is associative, commutative, monotone Apart from basic logical connectives, fuzzy logic
increasing (in both arguments) and which satisfies offers a number of tools for generalizing and
the boundary conditions a 0 = 0 and a 1 = a for “fuzzifying” existing theories and methods. One of
all 0 a 1. Well-known examples of t-norms these tools is the so-called extension principle, which
include the minimum ða; bÞ 7! minða; bÞ, the prod- allows for extending a mapping f : n ! to a fuzzy
uct ða; bÞ 7! ab, and the Łukasiewicz t-norm mapping F : ðÞn ! ðÞ in a generic way.
ða; bÞ 7! maxða þ b 1; 0Þ. F accepts fuzzy subsets of as input and, correspond-
A t-norm naturally qualifies as a generalized logical ingly, produces fuzzy subsets of as output. More
conjunction. Moreover, it can be used to define the specifically, for fuzzy subsets A1 ; . . . ; An as input, the
intersection of fuzzy subsets A; B 2 ðÞ as fol- output B ¼ FðA1 ; . . . ; An Þ is a fuzzy subset of with
lows: mA\B ðxÞ ¼ mA ðxÞ mB ðxÞ for all x 2 . membership function
F 784 Fuzzy Logic
mB ðyÞ ¼ sup minfmA1 ðx1 Þ; . . . ; mAn ðxn Þg
y¼f ðx1 ;...; xn Þ
mB ðyÞ ¼ sup min mA ðxÞ; min ðmAðiÞ ðxÞ⇝mBðiÞ ðyÞÞ ;
x21 ...m i¼1;...;n
The supremum operator in this expression is where mAðiÞ ðxÞ ¼ mAðiÞ ðx1 Þ mAðiÞ ðx2 Þ ... mAðiÞ ðxm Þ:
1 2 m
playing the role of a generalized existential quantifier. for a t-norm , and ⇝ is a fuzzy implication.
Thus, mB ðyÞ can be interpreted as the truth degree
of the following proposition: 9ðx1 ; . . . ; xn Þ 2 n : Applications
ð8i 2 f1; . . . ; ng : ðxi 2 Ai ÞÞ ^ ðy ¼ f ðxÞÞ. Or, stated Fuzzy methods have not only been developed
differently, y belongs to the fuzzy output F(A) insofar for ▶ knowledge representation and information
as there exist x1 ; . . . ; xn that belong, respectively, to processing, but also in many other branches of applied
A1 ; . . . ; An and are mapped to y. mathematics, including optimization, decision making,
statistics, and data analysis. Moreover, a plethora of
Fuzzy Inference concrete applications have been developed in different
Fuzzy sets can be used to formalize vague and fields, ranging from fuzzy control systems over flexible
imprecise knowledge. For example, a basic propo- querying in databases to medical expert systems.
sition of the form “X is A”, where X is a variable In bioinformatics and systems biology, fuzzy
with domain and A a fuzzy subset of , can be methods have been used with the aim to capture the
understood as a flexible ▶ constraint on the value of intrinsic fuzziness of terms, concepts, and relation-
X: those values x 2 with mA ðxÞ ¼ 0 are excluded, ships in the biological sciences, and advantages of
while all other values x are considered as possible to fuzzy sets and fuzzy logic for analyzing biological
some degree, namely to the degree mA ðxÞ; in partic- data and modeling biological systems are becoming
ular, those x with mA ðxÞ ¼ 1 are declared as fully more and more recognized. As a concrete example, we
plausible. mention the use of fuzzy ▶ clustering as a data analysis
In principle, fuzzy ▶ inference can then be real- tool for inferring homogeneous groups of related bio-
ized by combining and propagating constraints of logical entities (like genes) in a data-driven way; oblig-
that type, using suitable logical operators as well as ing to biological reality, these groups may have soft
projection and extension operators for fuzzy rela- boundaries and are allowed to overlap (Dembélé and
tions. Fuzzy rule-based ▶ inference can be seen as Kastner 2003; M€oller-Leveta et al. 2005). Apart from
an important special case. Here, the idea is to this, fuzzy methods have been applied to many other
express knowledge about the (functional) depen- problems in this field; for an overview of recent
dency between attributes in the form of a set of advances, see (Xu et al. 2008; Jin and Wang 2009).
IF–THEN rules:
ðiÞ ðiÞ
Ri : IFðX1 is A1 Þ AND ðX1 is A2 Þ AND . . . AND Cross-References
ðXm is AðiÞ
m Þ THEN ðY is B Þ ðiÞ
▶ Clustering
▶ Constraint
As an example, consider a rule like “If gene X is ▶ Gene Expression
over-expressed and gene Y is under-expressed, ▶ Inference
then gene Z is over-expressed”. A rule of that ▶ Knowledge Representation
kind can be seen as a soft ▶ constraint that partly
excludes some value combinations
ðx1 ; x2 ; . . . ; xm ; yÞ 2 1 2 . . . m . Given
a fuzzy specification of the input attributes Xj in References
terms of fuzzy subsets Aj ð j ¼ 1; . . . ;mÞ, the output
Dembélé D, Kastner P (2003) Fuzzy C-means method for clus-
produced by the fuzzy rule system consisting of n
tering microarray data. Bioinformatics 19(8):973–980
rules Ri ði ¼ 1; . . . ; nÞ is a fuzzy subset B of such Hajek P (1998) Metamathematics of fuzzy logic. Kluwer,
that Dordrecht
Fuzzy Set Theory 785 F
Jin Y, Wang L (eds) (2009) Fuzzy systems in bioinformatics and Xu D, Keller JM, Popescu M (2008) Applications of fuzzy logic
computational biology. Springer, Berlin in bioinformatics. World Scientific, Singapore
Klement EP, Mesiar R, Pap E (2002) Triangular norms. Kluwer, Zadeh LA (1965) Fuzzy sets. Inform Control 8(3):338–353
Dordrecht
M€oller-Leveta CS, Klawonn F, Choc KH, Yina H,
Wolkenhauer O (2005) Clustering of unevenly sampled
gene expression time-series data. Fuzzy Set Syst
152(1):49–66
Pedrycz W, Gomide F (2007) Fuzzy systems engineering:
Fuzzy Set Theory
toward human-centric computing. Wiley, New York
Russell B (1923) Vagueness. Aust J Psych Phil 1:84–92 ▶ Fuzzy Logic
F
G
G Type Domain
G1 Checkpoint
G2/M Transition
Definition
▶ Cell Cycle Transitions, G2/M
Gene and allele nomenclature for immunoglobulins
(IG) or antibodies and T cell receptors (TR) has been
set up by IMGT®, the international ImMunoGeneTics
GCN Control information system® (http://www.imgt.org) (▶ IMGT®
Information System) (Lefranc and Lefranc 2001a, b).
▶ Translational Control of GCN4 The gene and allele nomenclature is based on
steps: (1) the establishment of sets of high-quality of data are protein-protein interaction networks
samples; (2) the use of a large-scale gene expression (interactome) and gene regulatory networks. The use
platform; (3) the identification of a gene expression of these networks allows the use of graph theory to
biomarker through mathematical and computational explore other quantitative parameters of the data.
strategies; and (4) validation of the discriminatory The dissection of a gene expression profiling into
power of the gene expression biomarker in an indepen- pathways and functional modules (through a system biol-
dent set of samples. ogy approach) will serve to improve the identification of
Initially, the discovery process for gene expression new gene expression biomarkers as well as to increase
biomarkers involves the purification of RNA for the opportunities for therapeutic intervention in case of
a given sample and the use of one of the above plat- diseases. Even if a drug target is not directly differen-
forms for gene expression profiling. The process is tially expressed in a disease state, the corresponding
only effective if a large number of samples are used drug may still be useful if the pathway that the target
to account for biological variability. ▶ Data mining belongs is seen differentially expressed in that disease.
strategies are then used to identify putative gene
expression biomarkers. These biomarkers, either
a single product or a signature, need to be further Cross-References
validated in a larger and independent panel of samples.
In the last few years, the field of gene expression ▶ Biomarker Discovery, Knowledge Base
biomarkers has been revolutionized by the development ▶ Biomarkers
of next-generation sequencing. These new sequencing ▶ Biomarkers, Clinical Relevance
technologies have allowed an exhaustive analysis of the ▶ Biomarkers, Protein Expression
transcriptome, covering with high sensitivity all RNA ▶ Data Mining
molecules in a sample. Moreover, they have allowed the ▶ DNA Microarrays
identification of a large collection of transcripts variants ▶ Gene Expression
generated through processes like alternative splicing and ▶ Gene Ontology
alternative polyadenylation (Caballero et al. 2001).
The last decade has also witnessed the emergence
of non-coding RNAs, especially ▶ microRNAs, as References
critical regulatory molecules in many normal and path-
ological conditions. Not surprisingly, micro-RNAs Blomme EAG, Warder SE (2008) Gene expression-based bio-
markers of drug safety. In: Wang F (ed) Biomarker methods
have been characterized as gene expression bio-
in drug discovery and development. Humana Press, Totowa
markers in many biological conditions, especially can- Caballero OL, de Souza SJ, Brentani RR, Simpson AJG
cer (Jeffrey 2008). (2001) Alternative spliced transcripts as cancer markers.
Dis Markers 17:67–75
Jeffrey SS (2008) Cancer biomarker profilling with microRNAs.
Gene Expression Biomarkers and Systems Biology
Nat Biotechnol 26:400–401
Although these large-scale gene expression profiling
technologies have generated datasets that are per se
very informative, their integrated use has proven to be
the most effective way to extract more valuable infor- Gene Expression Biomarkers, Ranking
mation. In that aspect, platforms based on systems
biology approaches have been very useful in serving Ronnie Alves
as a scaffold for integration of gene expression data. Instituto de Informática, Universidade Federal do Rio
A critical issue when integrating gene expression Grande do Sul, Porto Alegre, Brazil
data into a system biology approach is the ▶ ontology
of genes and gene products. To be effective, this inte-
gration has to obey certain classification rules that will Synonyms
allow the identification of pathways and functional
modules as gene expression biomarkers. The most Differential expression analysis; Gene ranking; Order
used networks to serve as a scaffold for the integration statistics; Top-K lists
Gene Expression Biomarkers, Ranking 793 G
Definition situation is gene expression measurements of one cell
line before and after chemical treatment. Furthermore,
Gene expression biomarkers by ranking refer to the the availability of replicates enables to rank genes
task of selection and ordering of the most significant according to their associated t-statistic for each gene:
genes from transcriptomic studies, where, usually,
p
a control versus target experimental setting is properly t ¼ m=ðst= nÞ (1)
designed for the evaluation of differentially expressed
genes (DEG) associated to a particular tissue or cell in where m is the difference of means across replicates; st,
the related study. Selection is evaluated through the within groups standard deviation; and n, the
a differentiation score (Statistics), and then, genes are number of genes considered for testing. F-scores are
ranked in ascending order having high-scored genes the straightforward generalization of t-scores in the
(Gene Markers) in the top of the gene list. multiconditional case (Ewens and Grant 2004). Prob-
lems arise when genes with small intensity differences
present almost no changes between groups. This might G
Characteristics yield high t-scores and thus, these genes will pop up in
the top of the list. To overcome this situation one could
Ranking Gene Expression Differences artificially enlarge these variances by employing
Differential expression analysis is the traditional strat- different penalizing factors in the t-statistic test. In
egy for ranking genes (Biomarkers, Gene Markers) (Lonnstedt and Speed 2002) a parametric empirical
from gene expression data. The task of finding DEG Bayes approach being equivalent to a penalized
falls into the following steps (Steinhoff and Vingron t-statistic is introduced:
2006): p
• Ranking: genes are ranked according to their evi- t¼m f þ std2 n (2)
dence of differential expression.
• Assigning significance: a statistical significance is where f is the penalty value which is estimated from the
being assigned to each gene. mean and standard deviation of the variance across
• Cut-off value: to arrive at a limited number of DEG samples. The approach entitled “significance analysis
a cut-off value for the statistical significance needs of microarrays,” or just SAM (Tusher et al. 2001), is
to be determined. one of the most used penalized t-statistic. Another
The simplest experimental setting is the comparison similar strategy based on percentiles is proposed in
of two experimental groups CA and CB and asking for (Efron et al. 2001), although in this case it suggested
their differences in each gene. Basically, one can use the application of an additive penalizing factor in the
the empirical intensity values of each series CA and CB denominator of the t-statistic that it is the 90th percen-
and introduce an ordered list of ranked differences tile of the standard deviation across samples. If the
between them. Typically, in this “simple” approach penalization factor is zero the method is solely based
a fixed cut-off is chosen, usually this is a fold change on an ordinary t-statistic test.
of two. Thus, all genes showing a fold change of more A number of linear methods have also been pro-
than two are considered to be differential. It has been posed for ranking gene expression. In the ANOVA
shown in several studies that such approach is able to (Regression, Statistics) model it is assumed a linear
provide a decent DEG list with high reproducibility, model of specific effects (like dye, slide, treatment,
but it has also been reported having low statistical gene effects and their associated interactions) for log
power and high false discovery rate (FDR). intensities of all genes. A moderated t-statistic is
Availability of repetitions provides for a richer spec- suggested in (Smyth 2004) which is proportional to
trum of applicable statistical procedures. Either one com- the t-statistic with sample variance offset. It is also
pares two groups or multiple groups. In the two-group possible to design a robust linear model for each single
comparison, one considers either a paired or unpaired gene, estimating contrasts of all pairwise comparisons
situation. Comparing a healthy group with a diseased one of tested groups. Rather than applying t-statistics
is an example for an unpaired experiment because the approaches for ranking gene expression differences,
samples are independent. An example for a paired one could take advantages of nonparametric tests,
G 794 Gene Expression Biomarkers, Ranking
usually based on a Wilcoxon rank sum test or permu- Gene Expression Biomarkers, Ranking, Table 1 Possible
tation t-test. Rank-based strategies use rank scale outcomes from m hypothesis tests based on a significance
threshold t 2 (0, 1] to their associated P-values
information instead of the numerical ones for differen-
tiation expression analysis and it can be applied with- Not significant Significant
(P-value > t) (p-value t) Total
out any assumptions regarding data distribution.
Null true U V m0
Alternative T S m1
Significance of the Ranked Gene Lists true
Once gene rankings are found, following statistical W R m
tests, the next step is checking its significance. Usually,
researchers use a P-value cut-off of 0.05 and genes
presenting a lower P-value are those showing signifi- As demonstrated in (Benjamini and Hochberg 1995),
cance. On the other hand, in order to obtain such the FDR offers a less strict multiple test criterion than
p-value multiples tests are conducted, and, in the end, the FWER, being more appropriated for differential
it can increase the false discovery rate. To overcome expression analysis.
this situation one alternative is finding a criterion to
limit the number of testing procedures. Thus, one can Consensus Gene-Ranking
either remove from the test genes which are not Several statistical tests have been proposed in the
expected to be relevant or ignoring those genes literature making the selection of one unique ranking
presenting quite low expression variation across all method a hard task. Indeed, there is no consensus with
experimental conditions. Therefore, when using mul- respect to one universal (unique) rank test (r). One
tiple tests one must evaluate the error rate associated suggested alternative is to evaluate empirically the
when assessing the statistical significance. Given ranked list while combining several tests before electing
a type I error rate (false positive or false discovery the final gene list, rather than pushing optimizations into
rate) controlling for multiple testing means correcting one particular test trying to find an expected gene list.
P-values such that the given error rate can be The common criterion is evaluating the level of
guaranteed for all tests. Methods can be divided into consensus among the top-100 ranked genes by
those that control the family wise error rate (FWER) or intersecting the top-k lists:
the false discovery rate (FDR). The probability of at
least one type I error within the significant genes is X
p
called FWER. The FDR is the expected proportion of sðr; r 0 ; kÞ ¼ Iðrj k ^ r 0j kÞ (5)
type I errors within the rejected hypotheses. Table 1 j¼1
Gene Modulation
Cross-References
▶ Gene Regulation
▶ Biomarkers, Ranking
▶ Gene Expression Biomarkers
▶ Gene Expression Biomarkers, Ranking
▶ Gene Regulation Gene Network
▶ Modular Organization of Gene Regulatory
Networks ▶ Biological Disease Mechanism Networks
▶ Regression Analysis ▶ Gene Regulatory Networks
G 796 Gene Normalization with GNAT
Gene Regulation G
References
Jia Meng1 and Yufei Huang1,2,3
Fundel K, Zimmer R (2006) Gene and protein nomenclature in 1
Picower Institute for Learning and Memory,
public databases. BMC Bioinformatics 7:372. doi:10.1186/
1471-2105-7-372. http://dx.doi.org/10.1186/1471-2105-7-372
Massachusetts Institute of Technology, Cambridge,
Hakenberg J, Plake C, Leaman R, Schroeder M, Gonzalez G (2008) MA, USA
2
Inter-species normalization of gene mentions with GNAT. Greehey Children’s Cancer Research Institute,
Bioinformatics 24(16):i126–i132. doi:10.1093/bioinformatics/ University of Texas at Texas Health Science Center at
btn299. http://dx.doi.org/10.1093/bioinformatics/btn299
San Antonio, San Antonio, TX, USA
3
Department of Epidemiology and Biostatistics,
University of Texas at Texas Health Science Center at
San Antonio, San Antonio, TX, USA
Gene Ontology
Synonyms
Jiguang Wang
Beijing Institute of Genomics, Chinese Academy of
Gene modulation; Regulation of gene expression
Sciences, Beijing, China
Definition
Definition
Gene regulation (Watson and Roberts 1965) concerns
Gene Ontology (GO) (Ashburner et al. 2000) provides the control of the synthesis of functional gene products or
a hierarchical vocabulary of GO terms for describing expression (mainly mRNAs or proteins) in cells. Proper
functions and characteristics of gene or gene products gene regulation defines the distinct phenotype of
in different databases and different species. There are a biological system and ensures its stability, and
three structured, species-independent ontologies, i.e., misregulation is usually associated with disease. It is
biological processes, cellular components, and molec- also essential in promoting the versatility and adaptabil-
ular functions. GO project is a collaborative effect to ity of a living organism under various environmental
maintain, make cross-links for, and develop tools to conditions.
use the three ontologies.
Characteristics
References
Stages of Gene Expression Regulation
Ashburner M, et al. Gene ontology: tool for the unification of
biology. The Gene Ontology Consortium. Nat Genet. To control the synthesis of gene product (mRNA or
2000;25:25–29. proteins), gene regulation may assume different modes
G 798 Gene Regulation
at including chromatin domain, transcription, post- gene regulations, gene regulatory network is a systems
transcription, and translation. Summarized in Table 1 biology approach toward understanding overall molec-
is a list of different modes of gene regulations. ular mechanisms that control the cell response. Com-
putational reconstruction of gene regulatory network is
High-Throughput Technologies for Gene one of the current research focuses of computational
Regulation system biology.
As gene regulation occurs at multiple stages of gene
expression, different high-throughput technologies Modeling of Gene Regulatory Network
based on microarray, deep sequencing, or Liquid chro- Modeling is the key step in uncovering gene regulatory
matography-mass spectrometry (LC-MS) have been network. Various models have been proposed and we
developed for investigating different gene regulations, discuss some most popular models in the following.
which monitor the expression profiles of thousands of
genes simultaneously. A summary of these technologies Boolean Network
is shown in Table 1. Note that, although the final product A Boolean network (Shmulevich and Dougherty 2009)
of gene regulation is protein, most of the technologies models the association instead of direct regulation
are array/sequencing based and they mainly measure between sets of genes, where transition of the associ-
mRNA activities. This reality is due to the low sensitiv- ation states are modeled by Boolean functions. In
ity of existing proteomics technologies including LC- a Boolean network–based gene regulatory network,
MS and protein array in measuring protein expression. both the regulatory states and regulatory effects are
digitalized, which facilitate the related computation
Systems Biology and Gene Regulatory Network and analysis. Particularly, a Boolean node represents
Systems biology seeks to model all the individual units the binary state (on/activation or off/repression) of
of a biological system under a proper mathematical a gene, and an edge indicates the regulatory effect/
framework, such that both the overall structure and association between the two nodes, which is modeled
construction units of the system can be captured simul- by Boolean functions. When modeling the regulatory
taneously. Since response of cells to changing endog- dynamics, states of all Boolean variables can update
enous or exogenous conditions is governed by intricate with time. A simple example of a Boolean network is
Gene Regulation 799 G
the reaction kinetics, or the dynamics of regulatory
effects. Suppose that an ODEs GRN consists of G
nodes representing the expression of G genes, and let
yg ðtÞ; g 2 ½1; 2; :::; G represent the expression of the
a b g-th gene at time t. Then, the temporal regulatory
effects on a target gene by the regulators can be
modeled by the ODE:
dyg
¼ f g ðy1 ; y2 ; :::; yG Þ (1)
dt
2 3 2 32 3
y1 ðtÞ a1;1 a1;l x1 ðtÞ
6 .. 7 6 .. .. .. 76 .. 7
Gene Regulation, Table 3 Boolean network state update 4 . 5¼4 . . . 54 . 5 (2)
Time 0 1 2 3 4 5 6 7 8 ... yG ðtÞ aG;1 aG;L xL ðtÞ
a 1 0 0 1 0 1 0 1 0 ...
b 0 1 0 1 0 1 0 1 0 ... where ag;l is the regulatory coefficient of the g-th gene
c 0 0 1 0 1 0 1 0 1 ...
by the l-th transcription factor. Note that (2) can also be
written in a matrix form:
Discussion ▶ Epigenetics
Various models have been proposed for modeling sys- ▶ Epigenetics, Drug Discovery
tems-level gene regulations. However, because of the
complexity of gene regulation and the availability of
data, no existing models can cover all aspects of gene References
regulations. Most existing models only apply to
a limited subset of the mRNAs or proteins of interest, Chicone C (2006) Ordinary differential equations with applica-
and they model mostly indirect regulations/association tions. Springer, New York
Child D (2006) The essentials of factor analysis. Continuum,
between genes products.
London
Limited by the current techniques, most current Heckerman D (2008) A tutorial on learning with Bayesian net-
models utilize only mRNA expression levels for works. Innov Bayesian Netw 156:33–82
Gene Regulatory Networks 801 G
Huang Y, Tienda-Luna I et al (2009) Reverse engineering gene
regulatory networks. Signal Proc Mag IEEE 26(1):76–97
Shmulevich I, Dougherty E (2009) Probabilistic boolean
networks: the modeling and control of gene regulatory
networks. Siam, New York Gene6
Watson J, Roberts K (1965) Molecular biology of the gene.
Wa Benjamin, New York
Gene1
e1 Gene5
G
G
Gene Regulatory Networks Gene3
Yong Wang
Academy of Mathematics and Systems Science,
Chinese Academy of Sciences, Beijing, China
Gene Regulatory Networks, Fig. 1 A toy example for
the gene regulatory network with six genes. Here, nodes
Synonyms denote genes, and edges denote their regulatory relationships.
Specifically, red arrows represent activation, and blue arcs
represent repression
Gene network; Genetic network
transcription factor/DNA interactions. As described in important first step toward biological function discovery
the section on differential equation models, an influence of small molecules and drug design.
model may be advantageous when trying to predict the We can rewrite Eq. 1 in a compact form for all time
global response of the cell to stimuli. The limitation of points of one dataset by matrix notation:
this approach is that the model can be difficult to interpret
in terms of the physical structure of the cell, and therefore dX
¼ JX þ PC (2)
difficult to integrate or extend with further research. dt
Moreover, the implicit description of hidden regulatory
factors may lead to prediction errors. where X ¼ (x(t1),. . .,x(tm)) and dX/dt ¼ (dx(t1)/dt,. . .,
dx(tm)/dt) are n n matrices with the first derivative
Linear Differential Equations for Gene Regulatory of mRNA concentration dxi(tj)/dt ¼ [xi(tj + 1)xi(tj)]/
Network [tj+1 tj] for i ¼ 1,. . .,n; j ¼ 1,. . .,m. Although the
In general, a genetic network can be expressed by a set forward difference approximation here is utilized for
of nonlinear differential equations. Almost all of numerical computation of dx/dt, backward or other
the existing approaches for gene regulatory network difference approximation methods can be applied
inference use linear or additive models, primarily similarly. Suppose that there are s external perturba-
due to the complex structures of biological systems and tion compounds, then C ¼ (c(t1),. . .,c(tm)) is an s m
the scarcity of data (Wang et al. 2006a, b, 2007). Fur- matrix representing the s perturbations. The unknowns
thermore, linear equations can capture the main features to be calculated are connectivity matrix J and P.
of the network near the steady state, and can provide
a good starting point for further modeling and analysis.
Assume that there are N microarray datasets X1, Cross-References
X2, . . ., XN with m1, m2, . . ., mN time points, respec-
tively, for one organism. These time-course datasets may ▶ Combinatorial Transcription Regulatory Network
be measured under various environments or stimuli by ▶ Continuous Model
different labs. Let us first consider one time-course ▶ Linear Model
dataset with m time points. A linear differential ▶ MicroRNA-embedding Regulation Networks,
equation can be used to represent the rate of synthesis Logical Modeling
of a transcript as a function of the concentrations of other ▶ Ordinary Differential Equation (ODE)
transcripts in a cell and the external perturbations: ▶ Regulation
▶ Reverse Engineering
dxðtÞ
¼ JxðtÞ þ PcðtÞ; t ¼ t1 ; t2 ; :::; tm (1)
dt
References
where x(t) ¼ (x1(t), . . ., xn(t)) T2Rn, xi(t) is the expres-
sion level (mRNA concentrations) of gene i at time Brazma A, Jonassen I, Vilo J, Ukkonen E (1998) Predicting gene
point t. J ¼ (Jij)n n is an n n connectivity matrix regulatory elements in silico on a genomic scale. Genome
with elements Jij representing the effect of gene j on gene Research 8(11):1202
Chen L, Wang RS, Zhang XS (2009) Biomolecular networks:
i with a positive, zero, or negative sign, indicating acti- methods and applications in systems biology. Wiley,
vation, no interaction, and repression, respectively. P ¼ Hoboken
(Pij)n s is an n s matrix representing the effect of the De Hoon MJL, Imoto S, Kobayashi K, Ogasawara N, Miyano S
s perturbations or s small molecules on x, and c(t)2Rs (2003) Inferring gene regulatory network from time-ordered
gene expression data of bacillus subtilis using differential
represents the external perturbations with s compounds equations. Pac. Symp. Biocomput 17–28
at time t. (In principle, the external perturbation can be of Demeter J, Beauheim C, Gollub J, Hernandez-Boussard T, Jin H,
virtually any type, for example, an external environmen- Maier D (2007) The Stanford Microarray Database: imple-
tal factor, a small molecule, an enzyme, a microRNA, mentation of new analysis tools and open source release of
software. Nucl. Acids Res. 35(suppl_1):D766–770
or a post-translationally modified protein.) A non-zero Dewey TG, Galas DJ (2001) Dynamic models of gene expres-
element Pij of P implies that the i-th gene is a direct target sion and classification. Functional & Integrative Genomics
of the j-th perturbation or compound. Identifying P is an 1(4):269–278
Gene Set and Protein Set Expression Analysis 805 G
D’Haeseleer P, Liang, S, Somogyi R (2000). Genetic network Definition
inference: from co-expression clustering to reverse engineer-
ing. Bioinformatics 16(8):707-726
Friedman N (2004) Inferring cellular networks using probabilis- Gene set expression analysis (GSEA) often referred
tic graphical models. Science 303(5659):799–805 to as gene set enrichment analysis is based upon
Gardner TS, Faith JJ (2005) Reverse-engineering transcription determining whether predefined sets of genes differ
control networks. Phys Life Rev 2(1):65–88 in their expression patterns in some way. This is
Gardner TS, di Bernardo D, Lorenz D, Collins JJ (2003) Infer-
ring genetic networks and identifying compound mode of opposed to conventional ▶ relative expression analysis
action via expression profiling. Science 301(5629):102–105 which examines differences on a gene-by-gene by
Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, basis. Similarly, these methods can be applied to pro-
Danford TW (2004) Transcriptional regulatory code of a tein expression data from mass spectrometry (MS)
eukaryotic genome. Nature 431:99–104
Holter NS, Maritan A, Cieplak M, Fedoroff NV, Banavar JR proteomic analysis.
(2001) Dynamic modeling of gene expression data. Proc Natl
Acad Sci US A 98(4):1693
Husmeier D (2003) Sensitivity and specificity of inferring genetic Characteristics
regulatory interactions from microarray experiments with G
dynamic Bayesian networks. Bioinformatics 19(17):2271–2282
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber Utilizing information about predefined sets, typically
GK (2002) Transcriptional Regulatory Networks in Saccha- taken from biochemical pathway databases such as
romyces cerevisiae. Science 298(5594):799–804 KEGG, Panther, or Metacyc or from the Gene Ontology
Nachman I, Regev A, Friedman N (2004) Inferring quantitative
models of regulatory networks from expression data. Bioin- (GO), increases the ability to infer biological meaning
formatics 20(90001) from gene expression differences and can increase the
Tegner J, Yeung, MK, Hasty J, Collins JJ (2003) Reverse engi- power to detect differences by combining data across
neering gene networks: Integrating genetic perturbations with related genes. Originally, analyses were based on data
dynamical modeling. Proc Natl Acad Sci USA 100(10):5944
Teixeira MC, Monteiro P, Jain P, Tenreiro S, Fernandes AR, from gene expression microarrays; more recently other
Mira NP (2006) The YEASTRACT database: a tool for the technologies such next-gen sequencing are being used to
analysis of transcription regulatory associations in Saccharo- measure gene expression. A number of comparisons and
myces cerevisiae. Nucl. Acids Res. 34(suppl_1):D446–451 reviews of GSEA analyses have been published (Nam
Wang Y, Joshi T, Zhang X-S, Xu D, Chen L (2006) Inferring
gene regulatory networks from multiple microarray datasets. and Kim 2008; Emmert-Streib and Glazko 2011). Addi-
Bioinformatics 22(19):2413–2420 tionally, the methodology is being applied to protein set
Wang Y, Joshi T, Xu D, Zhang, X, Chen L (2006) Supervised expression analysis (PSEA) through the use of MS pro-
inference of gene regulatory networks by linear program- teomics data.
ming. Lecture notes in computer science 4115:551
Wang RS, Wang Y, Zhang XS, Chen L (2007) Inferring tran- The simplest approach to GSEA analysis was based
scriptional regulatory networks from high-throughput data. on chi-square or ▶ Fisher’s tests. A criteria was chosen
Bioinformatics 23:8 for differential expression (e.g., ER > 2, P-value <.05)
Yeung MKS, Tegner J, Collins JJ (2002) Reverse engineering and the proportion of differentially expressed (or over
gene networks using singular value decomposition and
robust regression. Proc Natl Acad Sci USA 99:6163–6168 or under expressed) genes was compared across gene
sets as can be seen in Table 1. ▶ Fisher’s test could
then be used to calculate a p-value for comparing the
proportion of differentially expressed genes in
Gene Set and Protein Set Expression a particular gene set to the proportion in the remaining
Analysis sets based on the hypergeometric distribution in (1).
Roger Higdon
P NP
Seattle Children’s Research Institute, Seattle,
X DX
WA, USA P value ¼ PðXÞ ¼ (1)
N
D
Synonyms
This approach has two major flaws: The first, Fish-
Gene set enrichment analysis er’s Exact test assumes expressions from genes within
G 806 Gene Set Enrichment Analysis
Gene Set and Protein Set Expression Analysis, randomization is not useful with very small sample
Table 1 Using Fisher’s Exact test for pathway enrichment sizes. Some argue that using self-contained methods
Differentially Not differentially with sample randomization is the only approach
expressed expressed to ensure statistically valid results (Goeman and
Genes in pathway X P-X Buhlmann 2007). Others contend it is better to use
Genes not in pathway D-X N + X-D-P multiple approaches in order to get more information
out the analyses since competitive and self-contained
methods test different hypotheses (Tian et al. 2005).
Although methods are applicable to proteomics
gene sets are uncorrelated with each other. This is data, PSEA has unique difficulties of its own.
unlikely since gene sets are chosen precisely because Typically, sample sizes are quite small, so the use
the genes are related in some biological manner. of sample permutation or randomization tests is
Hence, there is usually positive correlation among problematic. Also, unlike microarray data which
genes and therefore Fisher’s Exact test is often highly assays an entire genome, MS proteomics usually
anticonservative. In addition, Fisher’s Exact test assay only a fraction of the proteome. This results
dichotomizes a continuous variable (Expression ratio often in sparse coverage of proteins sets such as
or t-statistic) and therefore is less powerful than biochemical pathways.
methods that take advantage of the continuous data.
One of the most widely used approaches for gene
set analysis is the gene set enrichment analysis (GSEA) Cross-References
approach (Subramanian et al. 2005). In this approach,
genes are ordered by a differential expression statistic ▶ False Discovery Rate (FDR)
(typically a ▶ Student’s t-Test), then for each gene set ▶ Fisher’s Test
the distributions of expression statistics are compared ▶ Relative Expression Analysis
between those in the set and those not in the set using ▶ Student’s t-Test
the Kolmogorov-Smirnov test statistic. To calculate
p-values and ▶ false discovery rates (FDR), a permu-
tation test of samples is used in order to account for References
effects of correlation within sets. The Kolomogorov-
Smirnov test has been criticized for lacking power; Efron B, Tibshirani R (2007) On testing the significance of sets
of genes. Ann Appl Stat 1:107–129
therefore, others have proposed gene set analysis
Emmert-Streib F, Glazko GV (2011) Pathway analysis of
approaches that utilize other statistical tests, such as expression data: deciphering functional building blocks of
the GSA approach of Efron and Tibshirani (2007). complex diseases. PLoS Comput Biol 7(5):e1002053
More generally, GSEA approaches have been Goeman JJ, Buhlmann P (2007) Analyzing gene expression data
in terms of gene sets: methodological issues. Bioinformatics
broken down into self-contained and competitive
23:980–987
approaches. Self-contained approaches compare indi- Nam D, Kim S (2008) Gene-set approach for expression pattern
vidual gene sets across conditions without regard analysis. Brief Bioinform 9:189–197
for other gene sets. On the other hand, competitive Subramanian A, Tamayo P, Mootha VK et al (2005) Gene set
enrichment analysis: a knowledge-based approach for
methods measure relative differences between gene
interpreting genome-wide expression profiles. Proc Natl
sets. Examples of competitive tests are enrichment Acad Sci USA 102:15545–15550
tests such as Fisher’s exact test described above, e.g., Tian L, Greenberg SA, Kong SW et al (2005) Discovering sta-
find sets with more or fewer differentially expressed tistically significant pathways in expression profiling studies.
Proc Natl Acad Sci 102:13544–13549
genes. A number of methods are a mixture of these two
approaches. Although, there are a few methods based
on parametric tests, most GSEA approaches are based
on permutation or randomization tests of samples or
genes. Randomizing samples preserves correlation Gene Set Enrichment Analysis
structures and therefore leads to p-values that are less
biased than gene randomization. However, sample ▶ Gene Set and Protein Set Expression Analysis
Gene Sets for Pathways 807 G
products, i.e., messenger RNA (mRNA) levels, that
Gene Sets for Pathways provide a surrogate measurement for pathway activa-
tion. The fundamental assumption for any gene set is
Michael F. Ochs that changes in the measured biomolecules will be
Department of Oncology, Johns Hopkins University, coordinated.
Baltimore, MD, USA The most common gene sets presently comprise
mRNA levels expected to undergo coordinated change
and measurable on a gene “expression” microarray.
Definition The inherent assumption therefore is that these genes
are coregulated by a series of transcriptional activators.
A gene set is a grouping of genes that function in A gene set could also be created to monitor activation
a coordinated manner to provide some biological behav- of a signaling pathway through phosphoprotein levels,
ior. Many biological processes are controlled by path- as most signaling proteins change activity based on
ways, where proteins encoded by genes create phosphorylation of specific amino acid residues. This G
a directional though potentially circular flow, such as would effectively be a protein post-translational mod-
creation of a series of metabolites (metabolic pathway) ification set. A further “gene” set could be created from
or an ordered series of post-translational modifications metabolomic measurements, where the amounts of
(signaling pathway). Gene sets that serve as surrogate metabolites along a metabolic pathway could provide
indicators for activity in biological pathways must mea- a surrogate measurement for enzyme activity. This
sure the appropriate targets, which differ for each type of type of gene set could provide insight into metabolic
pathway. syndromes and diseases, including diabetes.
For a gene set to serve as a surrogate for measure-
ment of activity on a biological pathway, changes in
Characteristics the levels measured by the set must be linked to
changes in the pathway. Unfortunately, the term “path-
Biological phenotypes arise from the coordinated actions way” is used quite loosely biologically, primarily for
of numerous biomolecules and structures in an organism. historical reasons. Initially, the term pathway was used
The coordination, such as during organismal develop- quite literally, indicating nerve conduction, a viral
ment, is controlled by master regulators, often cell invasion course, or auditory path. Later, it referred to
signaling networks responding to environmental and the metabolic pathways that created biomolecules in
endocrine clues coupled to transcriptional and transla- cells (Carson and Frischer 1966), effectors for hor-
tional controllers. Often the change that drives a pheno- monal control (Samuels 1964), and gene interactions
type, such as the development of a specific tissue type, is and epistasis (Avery and Wasserman 1992). For gene
the activation of a set of genes, thus a coordinated change set enrichment analysis, we typically take a pathway to
in the transcript levels of many genes simultaneously. indicate a linked series of biomolecules: (1) a series of
Alternatively, the change could be coordinated post- proteins that are enzymes and the metabolic products
translational modifications of a set of proteins modifying produced by them (a metabolic pathway), (2) a series
their activity or localization, the increase in flux through of proteins that act together to transduce a signal when
a metabolic pathway leading to a buildup or excretion of post-translationally modified (a signaling pathway), or
a metabolite, or the breakdown of cellular structures as (3) a series of transcriptional regulators produced in
occurs during apoptosis. series (a transcriptional regulatory network). For each
Gene sets provide an approach for an analysis to of these, different gene sets are needed to serve as
gain insight into these drivers of phenotype through surrogate markers of activity.
identification of small changes in many biomolecular A gene set for a metabolic pathway could take three
levels or states rather than through identification of forms, depending on the biomolecule measured. The
a change in a single gene, protein, or metabolite. For most common measurement remains the mRNA levels
example, all the transcriptional products produced by from a microarray. In this case, a pathway, such as
the activation of the RAS-RAF-MEK-ERK signaling provided by the Kyoto Encyclopedia of Genes and
pathway could be used as a gene set of transcriptional Genomes (KEGG), would be converted to just the
G 808 Gene Sets for Pathways
enzymes and then to the genes encoding the enzymes. which is the incorrect set if the goal is a surrogate of
The gene set would comprise all the genes whose pathway activity.
protein products (i.e., enzymes) are necessary to take A gene set for a transcriptional regulatory network
the metabolic precursor chemical to its final form. The (TRN) also relies on the relationship between
assumption is that if the cell needs to produce more of a transcription factor (TF) and its target genes. A TRN
this final chemical form, it will coordinately upregulate is essentially a cascade of TFs linked through the gene of
all the genes to produce all enzymes in the pathway. a TF being a transcriptional target of an upstream TF.
This is an indirect measurement of the levels of the Then the initial activation of a single TF can lead to
proteins, where the protein levels are assumed to track activation of additional TFs, which then may lead to
the mRNA levels. It is worth noting that this is not true activation of further TFs, etc. The relationship of genes
in eurkaryotic organisms in general, although it may be sets for this series and for a signaling pathway is obvious.
valid in this limited case of metabolic enzymes. Alter- In both cases, the gene targets of a TF are the key to
natively, the gene set could be proteomic measure- constructing an appropriate gene set. In the case of
ments of the enzymes themselves, indicating the a TRN, there is an additional complication that arises
concentration of the catalysts that drive the chemical due to the timing of the measurements. If time resolution
changes. Finally, the gene set could be the set of is high, then the data can potentially separate the first
metabolites produced, measured by metabolomic tech- round (i.e., primary) of transcription from later (i.e.,
nologies such as mass spectrometry. Here, the direct secondary) transcription, and a true regulatory network
metabolic flux through the pathway would be mea- of relationships could be used, with each TF’s signature
sured. Note that this progression is from less direct to isolated. However, if a time resolution is low or, in the
more direct measurements of the pathway activity, but extreme case, only a single measurement is made, then
also from more mature to less mature technological the appropriate gene set may be all the genes regulated
platforms. by any member in the TRN. Effectively, the superset of
A gene set for a signaling pathway could take two all TF gene sets in the TRN would be the appropriate
forms. The first would be the direct measurement of the gene set to serve as a surrogate for TRN activity.
amount of phosphoproteins and unmodified proteins A gene set is typically tested for significance using
along the pathway. As with the metabolites in a rank sum test upon a statistic generated for each gene,
a metabolic pathway, this would effectively be protein, or metabolite during preprocessing. This initial
a measurement of the flux through the pathway, here individual biomolecular (e.g., gene) statistic relies on
providing the strength of the signal. The second is more comparing two or more phenotypic groups, such as
complex, involving mRNA levels. The complexity arises through a t-test or Wilcoxon rank sum test. The members
from the fact that signaling is driven not by protein levels of the set are compared to the non-set members also
but by post-translational modification of proteins and the measured in the experiment, and significance is deter-
duration of these modifications, the specific modifica- mined by permutation tests on the set labels or by the
tions made, and the number of proteins with these mod- assumption of a distribution on all sets. The final signif-
ifications. As such, the mRNA levels for genes encoding icance for the gene set must be adjusted for multiple
the proteins become an inadequate surrogate for signal- testing, using either family-wise error rate or false dis-
ing activity. However, most, though not all, signaling covery rate corrections. Family-wise error rates are more
pathways lead to changes in the activity of transcription strict, but if the goal of a study is hypothesis-generation
factors (TFs). These TFs have transcriptional gene tar- for more focused testing, then false discovery rate
gets, and the mRNA levels of these targets are directly methods provide a greater chance of finding a novel
modified by the transcription factors, either increased hypothesis.
(activated) or decreased (repressed). The coordinated The complexity of defining gene sets argues that
changes of the transcription factors downstream of sets must be chosen with an understanding of biology
a signaling pathway then provide a surrogate measure and with a determined goal in mind. The set must
of pathway activity. It is critical to note that the gene sets conform to known biological behavior, so that the set
available in databases ignore this complication, however, is an appropriate surrogate for the pathway of interest.
so that the gene set for a KEGG signaling pathway is For example, using the genes encoding the proteins in
typically just the genes encoding the signaling proteins, a signaling pathway is an incorrect set if the goal is
Gene Silencing 809 G
a measure of pathway activity. Limiting the sets tested
to those that are of interest is also useful for minimiz- Gene Silencing
ing the effect of multiple testing, since testing all
existing sets can lead to thousands of tests, reproducing Yan Zhang
the multiple testing problem of individual gene Key Laboratory of Systems Biology, Shanghai
measurements. Institutes for Biological Sciences, Chinese Academy
Future gene sets are likely to involve integration of of Sciences, Shanghai, China
different molecular species. For instance, studies have
already integrated mRNA levels, protein levels, and
metabolite levels into a coherent biological picture. Definition
Gene sets that comprise different types of measure-
ments and molecules will likely replace the present Gene silencing refers to a mechanism by which cells
gene sets as such measurements become more ubiqui- shut down large sections of chromosomal DNA. It is
tous. It will be critical when constructing such sets generally used to describe the “switching off” of a gene G
to carefully reproduce the biological relationships by a mechanism other than genetic modification. That
between molecular species, in order to construct mean- is, a gene which would be expressed (turned on) under
ingful sets. normal circumstances is switched off by machinery in
the cell. Gene silencing is done by incorporating the
DNA to be silenced into a form of DNA called
Cross-References heterochromatin that is already silent.
Gene Silencing,
Fig. 1 RNAi-mediated gene RNA Pol III prom shRNA insert T T T T T shRNA expression cassette (psiRNA)
silencing in mammals using
short hairpin RNA genes
Cytoplasm
UU
Dicer
UU
P
siRNA
UU P
UU
P
FMRP
VIG
Tudor-SN
UU
P
elF2C2
RISC
References
Gene Type
Bass BL (2000) Double-stranded RNA as a template for gene
silencing. Cell 101(3):235–238 ▶ IMGT-ONTOLOGY, GeneType
Gene-centered Information Resource, GoGene 811 G
model organisms (▶ Model Organism) to concepts of
Gene-centered Information Resource, the Gene Ontology (GO) and to diseases based on
GoGene millions of citations in PubMed. With these associa-
tions, GoGene supports interpretation of results from
Conrad Plake high-throughput experiments or simply searching the
Biotechnology Center (BIOTEC), Technische literature for discussed genes. A sequence query lets
Universit€at Dresden, Dresden, Germany users retrieve genes with similar gene or protein
sequences and explore their GO and disease annota-
tion, for example, to find functional hints for yet
Definition uncharacterized genes.
1 2 3
5′ 3′
5′ 3′
5′ 3′
Parkinson’s disease 5′ 3′
5′ 3′
5′ 3′
5′ 3′
5′ 3′
Gene-centered Information Resource, GoGene, Fig. 1 GoGene accepts three types store, are added such as descriptive summaries, literature references, and ontological
of queries: keywords, sequences, and gene identifiers. Queries are sent either to PubMed, concepts describing gene functions, and their role in biological processes and diseases
UniProt, or Entrez Gene (1). If the destination for a query was not specified by the user, (3). The list of genes is then displayed in GoGene (4). Concepts from the GO and MeSH
GoGene determines automatically which database gives the best result. After a list of for all genes are displayed as a tree to the left
genes has been retrieved (2), more details about each gene, taken from a local annotation
Gene-centered Information Resource, GoGene
General Transcription Factors 813 G
query in Entrez Gene results in two rat genes (Pth and Cross-References
Tnfsf11). In the literature database PubMed, the query
“rats osteoporosis bone resorption” returns 857 cita- ▶ Alibaba
tions. Reading their abstracts to identify the relevant ▶ Entity Mention Normalization
genes is cumbersome. GoGene alleviates this task by ▶ Gene Normalization with GNAT
automatically searching these abstracts for mentioned ▶ Gene Ontology
genes and displaying the resulting gene list to the user. ▶ MEDLINE and PubMed
In the tree, the biological process, bone resorption, the ▶ Model Organism
organism, rat, and the disease, osteoporosis, are listed ▶ Mutual Information
as top categories. Selecting each as mandatory aug- ▶ Retrieving and Extracting Entity Relations from
ments the query and eventually leaves five rat genes EBIMed
(Pth, Tnfsf11, Ctsk, Tnfrsf11b, and Csf1) for which ▶ Search Engines with Faceted Search
literature references state their role in the specified
disease and biological process. G
References
Methods
Prerequisite for annotating genes with concepts from Baumgartner WA, Cohen KB, Fox LM, Acquaah-Mensah G,
Hunter L (2007) Manual curation is not sufficient for anno-
biomedical ontologies by automated literature analysis
tation of genomic databases. Bioinformatics 23(13):i41–i48
is the correct identification of genes and concepts in Manning CD, Schuetze H (1999) Foundations of Statistical
text. For gene identification, GoGene employs GNAT, Natural Language Processing, 1st edn. The MIT Press, Cam-
a tool for gene mention recognition and normalization bridge, MA, USA. ISBN 0262133601
Plake C, Royer L, Winnenburg R, Hakenberg J, Schroeder M
to a gene database (▶ Named Entity Recognition,
(2009) GoGene: gene annotation in the fast lane. Nucleic
▶ Entity Mention Normalization). Having defined the Acids Res 37(Web server issue):W300–W304. doi:10.1093/
gene literature in PubMed using GNAT, GoGene then nar/gkp429. http://dx.doi.org/10.1093/nar/gkp429
associates these publications with biomedical concepts
provided by the GoPubMed web server. This also
includes all MeSH concepts assigned by the US Gene-External Type of Promoter
National Library of Medicine for indexing. For each
concept found, the corresponding citation is also asso- ▶ Type 3 Promoters
ciated to all its ascendants according to the GO or
MeSH disease hierarchy. For instance, if a PubMed
citation is associated with pancreatic neoplasms,
then it is also associated with digestive system neo- General Transcription Factors
plasms, neoplasms by site, neoplasms, and diseases.
A relationship between a gene and a GO or disease Tetsuro Kokubo
concept is established based on co-occurrences within Department of Supramolecular Biology, Graduate
PubMed citations. The confidence in such a relation- School of Nanobioscience, Yokohama City
ship is computed as the log ratio of observed University, Yokohama, Kanagawa, Japan
co-occurrence probability to the co-occurrence proba-
bility expected under independence, or pointwise Synonyms
▶ mutual information (Manning and Schuetze 1999).
Basal transcription factors
Related Tools
Tools and web servers performing tasks similar Definition
to GoGene are ▶ Alibaba, EBIMed (▶ Retrieving
and Extracting Entity Relations from EBIMed), In eukaryotes, RNA synthetic activities are performed
and iHOP. by three distinct types of RNA polymerases
G 814 Generalization
(RNAPI, II, and III) that demonstrate different sensitiv- linear predictors are replaced by smooth nonlinear
ities to a-amanitin, a toxic substance from the mush- functions.
room (Amanita phalloides). These RNAPs alone cannot
bind to the core promoter, a DNA region surrounding
the transcription initiation site of a given gene. Thus, Characteristics
other proteins are essential for transcription initiation at
the specific site of the core promoter. The minimal set Generalized additive models (GAMs) were devel-
of such proteins was purified by chromatography for oped by Hastie and Tibshirani (1990) and presented
each RNAP and designated as GTFs (general transcrip- in a similar manner to ▶ generalized linear models
tion factors) (Hampsey 1998; Orphanides et al. 1996; (GLMs) where a function of the mean (the link
Thomas and Chiang 2006). The main functions of GTFs function) is modeled as a linear combination
are to recognize core promoter structures, recruit RNAP of smooth functions of explanatory or predictor
to the core promoter, and regulate transcription in variables.
response to activators and repressors.
Cross-References
gðmÞ ¼ b þ f1 ðx1 Þ þ ::: þ fm ðxm Þ (1)
▶ Transcription in Eukaryote
GO Level
is measured by the BLAST bit 4
score. The relationship is
model using generalized 3
additive model where the
curve is estimated by lowess 2
regression
1
0 G
3 4 5 6
BLAST bit score
Cross-References Characteristics
for the user in CUDA are blocks, grids, and threads, to A program can be organized into the following
ease the decomposition of the problem domain. parts: (a) set up input data on the CPU; (b) transfer
A CUDA program is organized into a host program, the data to the GPU by cudaMemcpy(dev_a, &a,
consisting of one or more sequential threads running size, cudaMemcpyHostToDevice), a function that
on a host CPU, and one or more parallel kernels suit- allows to transfer input data from CPU to GPU by
able for execution on a parallel computing GPU. In using as last parameter the cudaMemcpyHost-
Fig. 1 some basic features of parallel programming ToDevice keyword; (c) run the kernel on the GPU
with CUDA are shown where it is possible to note by the __global__ modifier indicating that the pro-
the affinity with C language. cedure is a kernel entry point, but the execution on
Genetic Algorithms 819 G
the GPU is done by the extended function call i.e.,
add < <<4, 2>> > (dev_a, dev_b, dev_c) Genetic Algorithms
launches the kernel add() in parallel across 4 blocks
of 2 threads each; (d) finally, transfer the result Feng-Sheng Wang and Li-Hsunan Chen
back to the CPU by the cudaMemcpy(dev_a, Department of Chemical Engineering, National Chung
&a, size, cudaMemcpyDeviceToHost) defining Cheng University, Chiayi, Taiwan
the direction from GPU to CPU by using the
cudaMemcpyDeviceToHost keyword.
Synonyms
Cross-References
Evolutionary algorithm
▶ Multicore Computing
G
Definition
References
Genetic algorithms (GA) are adaptive heuristic
Dematté L, Prandi D (2010) GPU computing for systems biol- search algorithms based on the conjecture of natural
ogy. Brief Bioinform 3:323–333
Nickolls J, Dally WJ (2010) The GPU computing era. IEEE
selection and genetics (Zhilinskas and Žilinskas
2:56–69 2008). The basic concept of genetic algorithms is
designed to simulate processes in a natural system
necessary for evolution, specifically those that fol-
low the survival of the fittest inspired by Darwin’s
principle. As such they represent an intelligent
General-Purpose GPU exploitation of a random search within a defined
search space to solve a problem. Algorithm is
▶ GPU Computing started with a population of strings (represented
by chromosomes or the genotype of the genome),
which encodes candidate solutions (called individ-
uals or phenotypes) to an optimization problem, and
GENESIS: General Neural Simulation evolves toward better solutions. The typical steps
System are:
1. Choose an initial population of candidate solutions.
▶ Database of Quantitative Cellular Signaling 2. Calculate the fitness, how well the solution is, of
(DOQCS) each individual.
3. Perform crossover from the population.
The operation is to randomly choose some pair of
the individuals as parents and exchange so parts
Genes-to-Systems Breast Cancer from the parents to generate new individuals.
Database 4. Mutation is to randomly change some individuals to
create other new individuals.
▶ Data Integration, Breast Cancer Database 5. Evaluate the fitness of the offspring.
6. Select the survive individuals.
7. Proceed from 3 if the termination criteria have not
been reached.
Genetic Algorithm Example. Genetic algorithms have been applied in
the fields of bioinformatics, computational biology,
▶ Evolutionary Algorithm, Transcription Regulatory and systems biology (Larranaga et al. 2006; Handl
Network Construction et al. 2007).
G 820 Genetic Association
ATTGGCCTTAACCCCCGATTATCAGGAT
single nucleotide polymorphism
ATTGGCCTTAACCTCCGATTATCAGGAT
ATTGGCCTTAACCCGATCCGATTATCAGGAT
insertion−deletion variant
ATTGGCCTTAACCC −−− CCGATTATCAGGAT
ATTGGCCTTAACCCCCGATTATCAGGAT
inversion variant
ATTGGCCTTCGGGGGTTATTATCAGGAT
ATTGGCCTTAACACACACACAGGAT
microsatellite
ATTGGCCTTAACACACA −−−− GGAT
Genetic Marker, Fig. 1 Examples of the main DNA variants whereas microsatellite markers have many possible states
or genetic markers (modified from (Frazer et al. 2009)). Each depending on the number of simple repeats. Copy number var-
DNA sequence segment represents one possible state (allele) iants (CNVs) involve DNA segments of much longer lengths
Single nucleotide polymorphism (SNP), insertion–deletion var- (represented by dots in the graph), and it is common to have only
iant, and inversion variant typically have only two alleles, two alleles
four ordered genotypes (e.g., A|A, A|G, G|A, G|G). method for microsatellite markers is to first identify
When the parental origin of the pair of chromosomes the location of the repeat sequence by its flanking
is ignored, there are at most three genotypes for a SNP sequence, then amplify the amount of repeats-
(e.g., A/A, A/G, G/G). containing DNA segment by polymerase chain reac-
Genotyping is an experimental determination of tion (PCR), and finally the repeat length is deter-
the genotype of a genetic marker. The genotyping mined by a gel electrophoresis. Microsatellite
technology evolves constantly. Many are based markers played an important role in the construc-
on hybridization: single-stranded probe set binds the tion of genetic map of human genome during
single-stranded DNA segment containing a specified 1980s–1990s. The roughly 30,000 known microsat-
SNP, and the binding strength differs between the two ellite markers in human genome cover on average
alleles. The chip technology allows such hybridiza- 100 kb per marker.
tion-based detection to be carried out for hundreds of
thousands of probes, making it possible to genotype Copy Number Variants
a large fraction of SNPs in human genome in one run. Copy number variants are also variable number of
SNP has been a popular choice of genetic marker repeats markers with a much longer repeating
since mid-1990s. The total number of SNPs in human sequence (1 kb or larger) (Feuk et al. 2006). The
genome that differentiate unrelated individuals is in normal number of copies of any DNA segment in
the range of several millions (Frazer et al. 2009). The a diploid genome is 2, and a deletion (duplication)
exact number of SNPs depends on a particular ethnic leads to a reduced (increased) copy number of 1 (3).
population (e.g., Caucasian, Asian, African), and Several mechanisms for the generation of copy number
depends on whether SNPs with very low minor- variants have been proposed, such as nonhomologous
allele-frequency are counted as polymorphism. An recombination.
evenly distributed set of 10 million SNPs in human More than a thousand CNVs have been observed in
genome corresponds to one SNP per 300 bases on human genome. The exact number of CNVs should
average. again depend on a particular ethnic population, and
depend on whether rare or newly created de novo
Microsatellite Markers CNVs are counted. Even though the number of CNVs
Microsatellite markers are one type of variable number might be low, more than 90% of base difference
of tandem repeat markers of which the repeat sequence between two individual’s genomes are due to CNVs
is simple (1–6 bases) (Fig.1). The common genotyping and other structural variants (Frazer et al. 2009).
Genetic Marker 823 G
Genetic Marker, Table 1 An illustration of a genetic association test. A total of 1,000 normal and 1,000 disease affected samples
are genotyped, and the sample counts for three genotypes (A/A, A/G, G/G) are listed in the table. The row–column correlation in the
2-by-3 genotype table can be tested by Pearson’s chi-square test: w2 ¼ 72.516 leading to p-value ¼ 2.22 1016. The Cochran–
Armitage trend test statistic of the genotype table is CAT ¼ 66.366, corresponding to p-value ¼ 3.33 1016. For the 2-by-2 allele
count table, w2 ¼ 69.829 leads to p-value ¼ 1.11 1016. Two more quantities can be obtained from the 2-by-2 allele count table:
The minor allele frequency difference between the diseased and the normal group is 0.095, and odds-ratio is 2.13
Genotype count Allele count
Phenotype A/A A/G G/G total A G total
Normal n0,AA ¼ 20 n0,AG ¼ 170 n0,GG ¼ 810 n0 ¼1,000 n0,A ¼ 210 n0,G ¼ 1,790 2n0 ¼ 2,000
2% 17% 81% 100% 10.5% 89.5% 100%
Diseased n1,AA ¼ 40 n1,AG ¼ 320 n1,GG ¼ 640 n1 ¼ 1,000 n1,A ¼ 400 n1,G ¼ 1,600 2n0 ¼ 2,000
4% 32% 64% 100% 20% 80% 100%
Association Between Phenotypes and Genetic near the same chromosome location. The term “inter- G
Markers action” used in genetics, statistics, epidemiology, and
The main application of genetic markers is as biochemistry has conflicting meanings, and has been
a surrogate of genes for studying genotype–phenotype a source of confusion. We may loosely distinguish two
relationship based on statistical correlation. One such types of meanings of interaction.
study is disease gene mapping, where the phenotype is Bateson’s interaction (or epistasis) (Cordell 2002;
the disease status. Investigation of genotype–pheno- Phillips 2008), after William Bateson (1861–1926),
type relationship can be carried out on family data refers to the situation when the nature of phenotype–
(linkage analysis (Ott 1999)), on a randomly selected genotype relationship for gene 1 is modified by
dataset in a population (association analysis (Balding a change of allele in gene 2. This is often discussed
2006; Li 2008)), or on an exhaustive collection of when gene 1 is the major effect gene, and gene 2 is
people in an isolated population. The latter contains a modifier gene.
elements from both linkage and association analyses. Fisher’s interaction (or statistical interaction), after
For a genetic marker to be surrogate of a gene, an Ronald Aylmer Fisher (1890–1962), refers to any devi-
allele of the marker has to be in linkage disequilibrium ation from the linear phenotype–genotype relation:
(LD) with that of a gene in proximity. LD is simply y ¼ c1 g1 + c2 g2 where y is a quantitative phenotype,
a nonrandom statistical association between alleles at g1 and g2 are numerical coding of two genes (e.g.,
two loci (Slatkin 2008). The presence or absence of LD number of minor alleles), and c1, c2 are regression
between neighboring markers partitions the human coefficients. If another model with a nonlinear term,
genome into discrete LD blocks, and all markers in e.g., y ¼ c1 g1 + c2 g2 + c12 g1 g2 fits the data better
a LD block can potentially be surrogates of genes than the linear relationship (either in the sense of
within the same block. model selection or in the sense of significant c12 coef-
Genetic association carried out on the whole genome ficient), a statistical interaction between genes 1 and 2
level with a large number of genetic markers is called is claimed.
a genome-wide association study (GWAS) (▶ Genome- The term “interaction” in biochemistry, biological
wide Association Study). GWAS has generated a long pathway, and system biology has a more intuitive
list of genetic markers associated with various human meaning of a physical contact by chemical bonds, to
diseases. Statistical analysis plays an important role in be in the same pathway, or being linked in a gene
marker-based genetic association studies (Balding 2006; network. The purpose to study Fisher’s or Bateson’s
Li 2008). Table 1 shows how a typical association test interaction in genetic marker data is to discover the
can be carried out for a single SNP. true physical interaction among gene products.
The concept of gene–gene interaction can be
Gene–Gene and Gene–Environment Interaction extended by gene–environment interaction (Thomas
Genetic markers can be used in studying gene–gene 2010). The Bateson’s interaction is particularly easy
and gene–environment interaction, based on the to be translated in this context: where certain environ-
assumption that these can be surrogates of genes at or ment factor plays the role of modifier gene.
G 824 Genetic Material
Cross-References Definition
▶ Genome-wide Association Study The stable coexistence of two or more variations in the
genotype in a population at a high frequency is called
genetic polymorphism. Based on these variations
References
populations can be divided into subgroups and
Balding DJ (2006) A tutorial on statistical methods for popula- variability is seen in the gene pool. These polymor-
tion association studies. Nat Rev Genet 7:781–791 phisms lead to multiple alleles for a particular locus,
Cordell HJ (2002) Epistasis: what it means, what it doesn’t and in disease-associated genes, selective advantage
mean, and statistical methods to detect it in humans. Hum makes one allele protective and the another susceptible
Mol Genet 11:2463–2468
Feuk L, Carson AR, Schere SW (2006) Structural variation in the to a disease. In enzyme coding genes, genetic poly-
human genome. Nat Rev Genet 7:85–97 morphism, may affect the biological activity or affinity
Frazer KA, Murray SS, Schork NJ, Topol EJ (2009) Human of the enzyme to the substrate. The fact that they
genetic variation and its contribution to complex traits. Nat coexist stably in the population indicates that most
Rev Genet 10:241–251
Li W (2008) Three lectures on case-control genetic association often they do not have drastic effects on the phenotype.
analysis. Brief Bioinfom 9:1–13
Ott J (1999) Analysis of human genetic linkage, 3rd edn. The
Johns Hopkins University Press, Baltimore Cross-References
Phillips PC (2008) Epistasis - the essential role of gene interac-
tions in the structure and evolution of genetic systems. Nat
Rev Genet 9:855–867 ▶ Epigenetics, Drug Discovery
Slatkin M (2008) Linkage disequilibrium – understanding the
evolutionary past and mapping the medical future. Nat Rev
Genet 9:477–485
Strachan T, Read A (2010) Human molecular genetics, 4th edn.
Garland Science, New York, Part 4 Genetic Redundancy
Thomas D (2010) Gene-environment-wide association studies:
emerging approaches. Nat Rev Genet 11:259–272 Diana Ascencio and Alexander DeLuna
Laboratorio Nacional de Genómica para la
Biodiversidad, CINVESTAV-IPN, Irapuato,
Genetic Material Guanajuato, Mexico
▶ Genome
Synonyms
Gene redundancy
Genetic Network
modification of the proteins. Usually, the regulation is of gene expression is initiated with the activation and
achieved by altering the rates of various steps men- binding of ▶ transcription factors (TF) to the promoter
tioned above. The various mechanisms that elicit such regions. TFs also help in activating and recruiting RNA
kind of regulations are briefed below (see Fig. 1) polymerases (enzymes involved in RNA synthesis) to
(Perdew et al. 2006; White and Sharrocks 2010). the transcriptional initiation site. A single TF can act as
an activator or repressor for two different genes or mul-
Protein-DNA Interactions tiple genes at a time. There are certain enhancer and
Protein-DNA interactions are the most fundamental silencer regions on the DNA, which when bound by the
interactions in regulation of the gene expression respective transcription factors drastically increase
where various regulatory proteins bind to the DNA and decrease the rate of transcription, respectively.
(Fig. 1a) to accomplish transcription. A gene contains Moreover, binding of certain activator or repressor mol-
a core promoter region and contains sequence called as ecules to the promoter regions increases or decreases the
TATA box where the proteins complexes including dif- rate of transcription, respectively. Similarly in prokary-
ferent transcription factors (TFIID,-B,-E,-F,-H), TBPs otes, certain operons are catabolite-regulated operons
(TATA-binding proteins), and TAFs (TBP-associated (e.g., lac-operon) where catabolite acts as an activator
factors) can bind to facilitate transcription. The process for energy metabolism whereas for attenuated operons
Genetic Regulation Mechanisms 829 G
(e.g., trp-operon) the proteins act as a repressor for its modifications in TFs and proteins are ubiquitination,
own synthesis (Orphanides and Reinberg 2002; Levine phosphorylation, acetylation on lysine residues,
and Tjian 2003; Kornberg 1999). methylation on arginine and lysine residues, glycosyl-
ation, fatty acylation, disulfide bond formation, and
Protein-Protein Interactions proteolysis (Orphanides and Reinberg 2002).
Various protein molecules and protein complexes are
known to interact with the TFs and RNA polymerases Multiple Binding Sites and Cooperativity
to modulate the regulation of the gene expression Certain promoter regions possess multiple binding
(Fig. 1b). The rate of gene expression can be regulated sites for the TFs which can modulate the genetic
by modulating either the probability of the TF binding or responses (Fig. 1c). It is observed that higher promoter
the strength (affinity) of the TF binding to the promoter occupancy and higher strength of TF binding leads to
region. It is found that the increase in TF binding strength increased rate of transcription and vice versa. The avail-
is achieved by the formation of dimeric or multimeric ability of the multiple binding sites provides a room for
protein-DNA complex. Multimeric protein-DNA inter- altering the rates of transcription based upon the combi- G
actions have been shown to yield steep sensitive natorial effect of the upcoming signals. The bound
responses through the effect of stoichiometry as com- subunit has the cooperative effect on the binding of the
pared to a binding of single TF. Several proteins act as next subunit by increasing or decreasing its affinity
specificity factors, inducers, activators, repressors, core- toward the binding region exhibiting positive or negative
pressors, coactivators, and mediator complexes to regu- cooperativity, respectively. The stoichiometry of inter-
late gene expression. Specificity factors are the proteins action is drastically affected and the equilibrium is so
that alter (increase or decrease) the specificity of the shifted that a steep rise in the rate of expression is
RNA polymerases to the promoter regions. Inducers obtained with increasing number of binding sites. Occu-
are the molecules that interact with the activators and pancy of multiple binding sites has shown to yield
repressors to induce the transcription through positive ultrasensitive and subsensitive responses by the binding
and negative controls, respectively. Activators enhance of activators and repressors, respectively. Such responses
the rate of transcription by increasing the affinity of the also facilitate initial delays and are operational only by
RNA polymerase to the respective promoter regions. certain activation or repression thresholds (Chin et al.
Repressors intervene the association of the RNA poly- 1999; Sacketta and Saroff 1996).
merases and TF binding to the promoter regions and
decrease the rate of transcription. The mediator protein Auto-Regulation and Feedback Mechanisms
complex is an important interface that compounds the Auto-regulation is a phenomenon in which the rate of
activity of RNA polymerases with the activator protein gene expression is directly or indirectly regulated by
complex to facilitate the transcriptional regulation. its own gene product with certain feedback mecha-
Moreover, there are various interactions of coactivators nisms (Fig. 1d). The process by which a gene product
and corepressors with activators and repressors leading upregulates its own production by increasing the rate of
to positive and negative control mechanisms (Perdew its gene expression as a result of positive feedback is
et al. 2006; Orphanides and Reinberg 2002; Levine and called as positive auto-regulation (PAR). PAR exhibits
Tjian 2003; Kornberg 1999). slower response time due to the delay in activation and is
sensitive to the inherent noise leading to cell-to-cell
Protein Modifications and Stability variability in the gene product concentrations. It helps
The cell has to respond to changing environmental in signal amplification and pattern generation. The pro-
stimuli and correspondingly regulates the net protein cess by which the gene product downregulates its own
abundance at a given instance. This is achieved by production by the decreasing rate of its gene expression
controlling the rates of synthesis and degradation of as a result of negative feedback is called as negative auto-
the proteins and altering the stability of the mRNAs regulation (NAR). NAR fastens the response time and
and proteins as the requirement. Various cofactors is resistant to the inherent noise leading to reduced
assign the functional groups to the amino acid residues cell-to-cell variability of gene product concentrations.
and help in catalyzing the enzymatic reactions Various combinations of positive and negative feedback
for protein stabilization and degradation. Several loops have shown to elicit memory, bistability,
G 830 Genetic Variation
oscillations, and robustness in the gene expression pro- which leads to acetylation and methylation of N-terminal
files. Moreover, such feedback mechanisms allow the residues of histone tails. On the contrary, the transcrip-
cells to filter out transient input signals and aids in tional repressors recruit HDACs (Histone deactylases)
appropriate decision making by responding to the multi- that leads to deacetylation of the histone tails and subse-
ple regulatory pathways (Alon 2007). quent repression of the transcription (Orphanides and
Reinberg 2002; Kornberg 1999).
Nucleocytoplasmic Transport The above discussed mechanisms work in coordina-
In eukaryotes, a potential way of regulating gene tion with different levels to regulate gene expression.
expression is by controlling nucleocytoplasmic trans- Cells have engineered gene regulatory motifs involving
port (Fig.1e) through the nuclear pore complexes and these mechanisms in combination with feed forward and
the selective exchange of RNA (export) and protein feedback loops to elicit a system level regulation. More-
molecules (import) in the two compartments. over, the global regulatory pathways from signaling and
The nuclear and cytoplasmic environment is spatially metabolic networks also coordinate to govern the gene
separated by the nuclear envelope, which facilitates the expression profiles. With advances in research in this
separation of transcriptional and translational machin- area, newer interactions and mechanisms are being
ery. The transport of the RNA and proteins are medi- explored that elicit precise regulation.
ated by the specialized transport motifs called as
nuclear export signals (NES) and nuclear localization
Cross-References
signals (NLS), respectively. The shuttling of the
heterodimeric complex (to and fro) from nucleus to
▶ Transcription
cytoplasm serves as a quality control mechanism by
▶ Transcription Factor
selective transport of the only mature RNAs to the
translational machinery. This transport can also be
regulated by the covalent modifications of the NES References
and NLS molecules. Furthermore, the event of RNA
export is coupled with the transcription and pre-mRNA Alon U (2007) Network motifs: theory and experimental
approaches. Nat Rev Genet 8:450–461
processing by the capping and splicing mechanisms.
Chin JW, Kohler JJ, Schneider TL, Schepartz A (1999) Gene
Many proteins shuttle continuously between cyto- regulation: protein escorts to the transcription ball. Curr Biol
plasm and nuclease based on their requirement and 9:R929–R932
localization at the respective sites (Orphanides and Dimaano C, Ullman KS (2004) Nucleocytoplasmic transport:
integrating mRNA production and turnover with export
Reinberg 2002; Dimaano and Ullman 2004).
through the nuclear pore. Mol Cell Biol 24(8):3069–3076
Kornberg RD (1999) Eukaryotic transcriptional control, multi-
Chromatin Remodeling and Histone Modification layered transcription mechanisms, millenium issue. TCB
In eukaryotes, DNA is not easily accessible to the tran- 9(12):(0962–8924)
Levine M, Tjian R (2003) Transcription regulation and animal
scription interactions as it is supercoiled around the core of
diversity. Nature 424:147–151
octameric histone proteins in the nucleosomes leading to Orphanides G, Reinberg D (2002) A unified theory of gene
a highly compact structure called as chromatin. Various expression. Cell 108:439–451
coregulatory proteins are recruited along with the tran- Perdew GH, Vanden Heuvel JP, Peters JM (2006) Regulation
of gene expression-molecular mechanisms. Human Press,
scriptional factor to facilitate chromatin decompaction
Totowa, Book
and conformational changes that provide access to the Sacketta DL, Saroff HA (1996) The multiple origins of
promoter regions for recruitment of RNAPII and general cooperativity in binding to multi-site lattices. FEBS Lett
transcriptional machinery. This process is coupled with 397:1–6
White RJ, Sharrocks AD (2010) Coordinated control of the gene
the transcription elongation. The chromatin structure
expression machinery. Trends Genet 26(5):214–220
can be temporarily modified by the phosphorylases
from signaling cascades and permanently modified by
methylation of DNA, termed as gene silencing. For the
activaton of the genes, activator proteins recruit Genetic Variation
HATs (Histone acetyltransferse) and HMTs (Histone
methyltransferase) to the promoter regions of the genes, ▶ Genetic Marker
Genome Project 831 G
process consists of starting with raw DNA sequences
Genetic Variations and giving a biological meaning to its content (Stein
2001). The first step in annotating a raw sequence will
▶ Genetic Polymorphisms require the mapping of structural elements in the genome
by comparing the latter against a library of already
known sequences. This especially includes genetic
markers, genetic polymorphisms, protein-coding regions
Genome (▶ Protein Structure Comparison, High-Performance
Computing, ▶ Proteome, ▶ Proteome Analysis Pipeline,
Vani Brahmachari and Shruti Jain ▶ Proteomics), as well as other functional elements such
Dr. B. R. Ambedkar Center for Biomedical Research, as non-translated RNAs (▶ RNA-seq), repetitive ele-
University of Delhi, Delhi, India ments, duplicated genes, and regulatory regions
(▶ Gene Regulation, ▶ Regulation, ▶ Regulation and
Autoregulation, ▶ Regulation Function). The structural G
Synonyms elements are aligned using a specialized search tool,
such as BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi)
Genetic material developed at the National Center for Biotechnology
Information Database (Altschul et al. 1990), or
GENSCAN (http://genes.mit.edu/GENSCAN.html) by
Definition Burge and Karlin (1997). The second step in annotation
involves the association of biological processes to the
The genome is defined as the entire genetic makeup of an sequences identified from the first step. These processes
organism. It can be either DNA or RNA (as is in the case include metabolic functions, regulation (▶ Gene
of some viruses). It includes the nuclear as well as Regulation), interactions (▶ Binding Affinity), and gene
organelle DNA. The complexity and the size of the expression (▶ Gene Expression). In 1999, a consortium
genome vary between organisms. The average human created the gene ontology (http://www.geneontology.
genome size is 3.2 109 base pairs and corresponds to org) to standardize the vocabulary used for describing
one copy of each of the 23 chromosomes (haploid set) as genes, gene products, and genomes.
opposed to 4.6 106 base pairs for Escherichia coli.
References
Cross-References
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ
(1990) Basic local alignment search tool. J Mol Biol
▶ Epigenetics
215:403–410
Burge C, Karlin S (1997) Prediction of complete gene structures
in human genomic DNA. J Mol Biol 268:78–94
Stein L (2001) Genome annotation: from sequence to biology.
Nat Rev Genet 2:493–503
Genome Annotation
Akos Dobay
Institute of Evolutionary Biology and Environmental Genome Browsers
Studies (IEU), University of Zurich, Zurich,
Switzerland ▶ Genomic Resources
Definition
Genome Project
Genome annotation is the process of attaching higher-
level information to primary sequences. The whole ▶ NCBI BioProject Genome Resources
G 832 Genome-Scale Metabolic Modeling
Maps Gene
Genetic
Clone/contig sequence
Physical
Polymorphism
Control region
Variation and mutation
Motif
Pathway
Structural features
G
Expression
Centromere
Other
Repeat
Schemas
Database management
system (DBMS)
Query type
Genomic Databases,
Sequence similarity query Pattern search query
Fig. 3 Query types
that are recoverable from genomic databases. For relational schemas, whose structure is anyways inde-
example, Genomic segments include all the nucleotide pendent from the biological nature of data (see Fig. 2).
subsequences that are meaningful from a biological For example, the Genomics Unified Schema (GUS)
point of view, such as genes, clone/clontig sequences, (GUS 2006) is a relational schema suitable for a large
polymorphisms, control regions, motifs, and structural set of biological information, including genomic data,
features of chromosomes. genic expression data, and protein data, while the
Pathway Tools Schema (Karp 2000) is an object
Database Schemas schema used in Pathway/Genome databases.
Most genomic databases are relational, even if relevant
examples exist based on the object-oriented or the Query Types
object-relational model (see e.g., (WormBase 2006; In Fig. 3, a taxonomy of query types supported by most
Twigger et al. 2002)). Four different types of database genomic databases is illustrated.
schema that are designed “specifically” to manage By simple querying, it is possible, for example, to
biological data can be distinguished from other recover data satisfying some standard search parameters
unspecific database schemas, which are mostly generic such as gene names, functional categories, and others.
Genomic Databases 837 G
Genomic Databases, Database search methods
Fig. 4 Search methods
Text based Grafical interaction based Sequence based Query language based
Analysis queries are more complex and, somehow, more Query result format
typical of the biological domain. They consist in retriev-
ing data based on similarities (similarity queries) and HTML
G
patterns (pattern search queries). The former ones take
XML
as input a DNA or a protein (sub)sequence and return SBML
those sequences found in the database that are the most
BioPAX
similar to the input sequence. The latter ones take as
input a pattern p and a DNA sequence s and return Other XML formats
those subsequences of s which turn out to be most FASTA
strongly related to the input pattern p.
Flat file
EMBL file
Search Methods
A further classification criterion is related to methods GenBank file
used to query available databases, as illustrated in Other flat file formats
Fig. 4 where four main classes of query methods
are distinguished: text-based methods, graphical interac- Comma/tab-separated files
tion–based methods, sequence-based methods, and TSV
query language–based methods. The most common
CSV
methods are the text-based ones, further grouped in two Attribute-value file
categories: free text and forms. In both cases, the query
can be formulated specifying some keywords. With the Genomic Databases, Fig. 5 Result formats
free text methods, the user can specify sets of words, also
combining them by logical operators. With forms, recently, in order to facilitate the spreading of infor-
searching starts by specifying the values to look for that mation in heterogeneous contexts, the XML format is
are associated to attributes of the database. sometimes supported.
Result Formats
In genomic databases, several formats are adopted to Cross-References
represent query results (see the taxonomy in Fig. 5).
Web interfaces usually provide answers encoded in ▶ Metabolic Networks, Databases
HTML, but other formats are often available as well.
For example, flat files are semistructured text files
where each information class is reported on one or References
more consecutive lines, identified by a code used to
characterize the annotated attributes. Often, special De Francesco E, Di Santo G, Palopoli L, Rombo SE
(2009) A Summary of genomic databases: overview and
formats have been explicitly conceived for biological
discussion. In: Sidhu AS, Dillon TS (eds) Biomedical data
data. An example is the FASTA format (Mount 2004), and applications, studies in computational intelligence.
commonly used to represent sequence data. More Springer, Berlin, pp 37–54
G 838 Genomic Datasets
Galperin MY (2007) The molecular biology database collection: epigenetic marking is transmitted through meiosis.
2007 update. Nucl Acids Res 35:D3–D4 Only a small subset of genes in the human genome
GUS (2006) The Gen. Unified Schema docum. http://www.
gusdb.org/documentation.php are subjected to genomic imprinting. If a gene is said to
Karp PD (2000) An ontology for biological function based on be maternally imprinted it means that the copy of the
molecular interaction. Bioinformatics 16:269–285 gene coming from the maternal side is repressed, as per
Mount DW (2004) Bioinformatics: sequence and genome the convention in the field (Ideraabdullah et al. 2008).
analysis, 2nd edn. Cold Spring Harbor Laboratory Press,
Cold Spring Harbor
Twigger S, Lu J, Shimoyama M et al (2002) Rat genome
database (RGD): mapping disease onto the genome. Nucl Cross-References
Acids Res 30(1):125–128
WormBase (2006) Db on biology and genome of C. Elegans
http://www.wormbase.org/ ▶ Epigenetics
References
Genomic Resources
Genomic Imprinting
Ricardo Cruz-Herrera del Rosario
Vani Brahmachari and Shruti Jain Genome Institute of Singapore, Singapore, Singapore
Dr. B. R. Ambedkar Center for Biomedical Research,
University of Delhi, Delhi, India
Synonyms
Vogel J, Searle SM (2011) Ensembl 2011. Nucleic Acids Res ¼ fwk ðX yk ÞpðX yk ; tÞwk ðXÞpðX; tÞ:
@t k¼1
39(Database issue):D800–D806
Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I (2004) (1)
VISTA: computational tools for comparative genomics.
Nucleic Acids Res 32(Web Server issue):W273–279
Although the analytical solution of the master equa-
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler
AM, Haussler D (2002) The human genome browser at tion is rarely available, the density function can be
UCSC. Genome Res 12:996–1006 constructed numerically using the stochastic simulation
G 840 Glazier–Graner–Hogeweg Model
X
M
a0 ðXÞ ¼ wj ðXÞ;
j¼1
Global Linearization
where Dt 0 and m ¼ 1; M.
The reaction probability density function provides ▶ Quasilinearization
the basis for the SSA.
According to the joint density function (Eq. 2), the next
reaction and the time of its occurrence can be generated
through the direct method. Draw two random numbers r1 Global Maximum
and r2 from a uniform distribution in the unit interval
[0, 1]. The time to the next reaction Dt and the index of ▶ Global Optimum
the next reaction m, given X, can be taken as follows:
1 1
Dt ¼ lnð Þ; (3) Global Minimum
a0 ðXÞ r1
▶ Global Optimum
m ¼ the smallest integer satisfying
X
m
wj0 ðXÞ > r2 a0 ðXÞ: (4) Global Network Alignment
j0
Global network alignment problem is used to find the Jongen HT, Meer K, Triesch E (2004) Optimization theory.
Kluwer, Dordrecht
best overall alignment between the input networks. The
mapping for it should cover all of the input nodes. Each
node in an input network is either matched to one or more
nodes in the other network(s) or explicitly marked as
a gap node which has no match in another network Global Sensitivity Analysis
(Singh et al. 2008; Zaslavskiy et al. 2009).
Yunfei Chu and Juergen Hahn
Artie McFerrin Department of Chemical Engineering,
Cross-References Texas A&M University, College Station, TX, USA
References
Global State, Boolean Model
Saltelli A, Ratto M, Andres T, Campolongo F (2008) Global
sensitivity analysis: the primer. Wiley, New York
Xi Chen, Wai-Ki Ching and Nam-Kiu Tsing
Advanced Modeling and Applied Computing
Laboratory, Department of Mathematics, University of
Hong Kong, Hong Kong, China
Global Stability
Definition
Definition
In a Boolean model, a global state is a gene state vector
Global stability means that the attracting basin of consisting of all the gene expression states. A Boolean
trajectories of a dynamical system is either the state model actually consists of a set of n nodes (where each
space or a certain region in the state space, which is the node corresponds to a gene):
defining region of the state variables of the system.
In other words, global stability means that any trajec- V ¼ fv1 ; v2 ; . . . ; vn g
tories finally tend to the attractor of the system, regard-
less of initial conditions. Most of biological systems, and a list of Boolean functions (which represent the
e.g., gene regulatory systems, are needed to be globally regulatory rules for nodes). Define vi(t) to be the state
stable. (0 or 1) of the node vi at time t. Here we let:
To help understand global stability, here we give an
example. Consider the famous Lorenz system: vðtÞ ¼ ðv1 ðtÞ; v2 ðtÞ; . . . ; vn ðtÞÞT
GPGPU
The kinetic equations describing the Goodwin
oscillator are ▶ GPU Computing
dX v0
¼ k1 X
dt 1 þ ðX=KÞp
dY GPU
¼ v1 X k2 Y (1)
dt
dZ Luca Lombardi and Piercarlo Dondi
¼ v2 Y k3 Z Department of Computer Engineering and Systems
dt
Science, University of Pavia, Pavia, Italy
Here X, Y, Z are concentrations of mRNA, protein,
and protein product, respectively; v0, v1, and v2 deter-
mine the rates of transcription, translation, and cataly- Synonyms
sis; k1, k2, and k3 are rate constants for degradation of
each component; 1/K is the binding constant of protein Visual processing unit (VPU)
product to transcription factor; and p is a measure of
the cooperativity of the repressor.
▶ Hopf bifurcation analysis in Goodwin’s model Definition
shows that to obtain biochemical oscillation, the
cooperativity of the negative feedback must be very A Graphic Processing Unit (GPU) is a specialized
high, say p > 8, and the degradation rate constants of processor dedicated to the creation of the images visu-
the three components have to be nearly equal. alized on the screen. It implements a set of the most
Bliss, Painter, and Marr (1982) fixed these problems frequently used graphics primitive operations, so as the
by a slight modification of Goodwin’s model: CPU is not involved in the time-expensive graphical
computation. GPUs, originally designed for personal
dX v0 computers, are currently used in many other kinds of
¼ k1 X
dt 1þX devices, such as mobile phones, games consoles,
dY tablets, or embedded systems. In a PC a GPU can be
¼ v1 X k2 Y (2)
dt included as a standalone graphic card (usually in the
dZ Z mid-high-level solutions) or embedded in the mother-
¼ v2 Y k3 :
dt 1 þ Z=K board (in low-level models). Finally, at the end of
2010, CPUs with integrated GPU were released:
Now the feedback step is no longer cooperative, and a solution particularly efficient to reduce communica-
the uptake of protein product has a form of Michaelis- tion times among the processors and to limit battery
Menten function. consumption in mobile devices (as notebooks or
netbooks).
References
Cross-References
Bliss R, Painter P, Marr A (1982) Role of feedback inhibition in
stabilizing the classical operon. J Theor Biol 97:177–193 ▶ High-Performance Computing, Structural Biology
GPU Computing 845 G
Device Architecture (CUDA) of NVIDIA, that greatly
GPU Computing facilitate the programming of applications targeted at
GPUs. The use of GPUs for general-purpose applica-
Fransisco Vázquez, José Antonio Martı́nez and tions has exceptionally increased in the last few years
Ester M. Garzón due to the evolution of both GPU programming
Department of Computer Architecture and Electronics, resources and the semiconductors technology. In this
University of Almerı́a, Almerı́a, Spain way, GPUs have emerged as new computing platforms
that offer massive parallelism and provide incompara-
ble performance-to-cost ratio for scientific computa-
Synonyms tions (Kaeli and Leeser 2008). Currently, NVIDIA is
leading the GPU computing and its interface CUDA is
General-Purpose GPU; GPGPU the key of thousands of applications which are accel-
erated by the NVIDIA GPUs; a relevant percentage of
these applications are referred to the systems biology. G
Definition CUDA is based on the programming model SIMT
(Single Instruction, Multiple Threads), that is, every
The use of GPUs (Graphics Processing Units) to accel- instruction in the program is executed by hundreds of
erate the computation of scientific and engineering threads with different data. The mapping of the threads
applications. on the GPU cores is automatically carried out by
CUDA (Kirk and Hwu 2010).
An approach to facilitate the GPU programming is
Characteristics based on the use of basic routines or libraries which
(1) compute the most used operations in the applica-
At first the goal of the Graphics Processing Units tions and (2) are optimally accelerated by GPU. In this
(GPUs) was to accelerate the specific computation line, NVIDIA supplies a wide set of routines related to
related to the visualization in the computer; in this several kinds of applications such as CuBLAS,
way the CPU could be entirely dedicated to the com- CuFFT, and so on.
putation of general applications. So GPU hardware It should be noted that the computation related to
resources were specialized in graphic computation every application can be classified in two kinds:
and included several specific Arithmetic Units and • Computationally intensive, if it includes large
a particular memory hierarchy connected to the central sequences of arithmetic-logic operations over
process unit (CPU). Later, the role of GPUs has been large data sets. So, these operations can be com-
more relevant and the graphic software to exploit the puted by a large set of threads which are mapped on
GPUs was based on graphics-specific programming the high number of cores into the GPU and all of
languages like OpenGL and Cg with a substantial them execute the same sequence of instructions.
cost of programming. Then, the SIMT programming model is suitable
In the last decade the evolution of GPU technology for the computation and it can be accelerated by
has been based on the following: the massive parallelism of the GPU (see Fig. 1).
1. The amount of hardware resources has been signif- • Control-flow intensive, if the process is dominated
icantly increased. by control-flow operations, that is, decision points,
2. The GPU was only focused on graphic computa- in order to drive different instructions over different
tion, so it became an underutilized resource in the data. This computation is not suitable for the SIMT
computer. programming model and it is not accelerated by the
3. A wide range of applications in Science and Engi- GPU. This kind of computation can achieve better
neering share the characteristics of the graphic performance on the multicore CPU.
processes. Consequently, the GPU computing cannot substi-
Thus, the main recent advances in GPU technology tute the “CPU computing” because both can improve
have focused on the development of Application Pro- the performance of different kinds of computations
gramming Interfaces (APIs), such as Compute Unified included in the applications. So, currently, the
G 846 GPU Computing
Processes 2 and 4
They are computationally intensive, that is, these
processes compute the same operations over
large data sets. So, they are suited for the GPU
CPU GPU programming model (Single Instruction Mutliple
Threads). Their run-time can be strongly
reduced by means of GPU computing, thought
they need some CPU - GPU communication
processes.
GPU Computing, Fig. 1 GPU computing can strongly accelerate the computationally intensive procedures in the program
high-performance computation (HPC) is based on het- preserve the specimen from radiation damages. But,
erogeneous computation (GPU-multicore computing) as a consequence, the computed images are blurred,
and the effort of programming is relevant in this con- that is, they exhibit poor signal-to-noise ratio (SNR).
text, because the programmers have to identify both Then, in ET it is necessary to apply either a simple
types of computations and develop the program com- 3D-reconstruction method which produces blurred
bining two parallel interfaces. However, if one kind of images plus a sophisticated denoising method, or
computation dominates the application, then, only one sophisticated 3D-reconstructions which are capable
parallel interface can be considered. generating images with high resolution. Both alterna-
tives need long run-times to compute the reconstruc-
GPU Computing for Systems Biology tion because both are computationally intensive. So,
Electron Tomography (ET) is taken as an illustrative HPC is paramount in this context to cope with those
example related to the systems biology in order to computational needs.
analyze the role played by GPU computing in this The GPU computing has emerged as a new HPC
field. ET has emerged as the leading technique for the technique that offers massive parallelism. It is based on
structural analysis of unique complex biological spec- the GPUs platforms which are very suitable for their
imens. It combines electron microscopy with the integration in laboratories of structural biology due to
power of 3D imaging. ET has made it possible to their incomparable performance-to-cost ratio and easy
directly visualize the molecular architecture of organ- maintenance and use.
elles, cells, and complex viruses. Furthermore, ET The specific methods in ET are computationally
allows the identification of the macromolecular assem- intensive and can be strongly accelerated on GPU
blies in their native cellular environment and the study platforms. However, it is necessary to reprogram
of their distribution in 3D as well as their interactions. them or even to propose a new approach to define
ET has been crucial for recent breakthroughs in life the ET algorithms, in order to better exploit the paral-
sciences (see Lucic et al. (2005); Frank (2006) for lelism of the GPU platforms. In this line, several ET
reviews). approaches have already been proposed (Castaño-Diez
Computer-automated data collection has been et al. 2008; Vazquez et al. 2010; Xu et al. 2010).
essential for the advent of ET as a structural technique
in cellular biology. It allows to automate specimen Tomographic Reconstruction
tilting, area tracking, focusing, and recording of Tomographic reconstruction can be modeled as a least
images under low electron-dose conditions in order to square problem that can be solved by means of
GPU Computing 847 G
algorithms based on matrix operations (Herman 1980). Afterward, that matrix is used to reconstruct all the slices
Large sparse matrices are involved in these algorithms. by Sparse Matrix-Vector products (SpMV). So, the WBP
However, matrix data structures have not been tradi- method is computationally intensive and the acceleration
tionally included in the implementations due to their of SpMV is the key to achieve a high performance.
large memory requirements. As a consequence, the This matrix formulation of WBP allows to use CUDA
algorithms usually recompute of matrix coefficients routines previously developed, which efficiently accel-
when needed. Nonetheless, modern computers have erate the operation SpMV on the GPU. The routine based
large memory units available, so it is now possible to on the format called ELLPACK-R achieves highest per-
improve the performance of reconstruction algorithms formance on GPUs (Vazquez et al. 2011) and facilitates
by storing large matrices in core. the development based on CUDA of WBP. Additionally,
Assuming the single tilt axis geometry and using the particular operation SpMV (B ∗ ps) involved in the
voxels to represent the volume to be reconstructed, matrix implementation of WBP has some specific char-
the 3D reconstruction problem can be decomposed into acteristics that can be exploited to accelerate these oper-
a set of independent 2D reconstruction subproblems ations with GPUs. Using the general implementation of G
corresponding to the slices perpendicular to the tilt SpMV on GPU based on ELLPACK-R as a starting
axis. Each of the 2D slices of the volume can then be point, the three geometry-related symmetry properties
computed from the corresponding set of 1D projections allow a significant reduction of the memory require-
(usually known as sinogram). The reconstructed 3D ments to store the matrix B. The corresponding formats
volume is obtained by simply stacking the 2D slices. to compress the matrix related to these symmetries are
The standard method to solve this problem is referred as sym1, sym2, and sym3.
Weighted Back Projection (WBP) (Frank 2006). Briefly, The matrix WBP approach has been implemented
the method uniformly distributes the object mass present with CUDA and evaluated on a NVIDIA GeForce
at the projection images over computed backprojection GTX 295 GPU. In order to evaluate the use of matrix
rays. The intersection of the backprojection rays from the WBP and GPU computing, the speedup factors against
different images reinforces the density at the points the standard recomputation-based WBP on the CPU
where the mass is in the original structure. Therefore, were computed. As it is shown in the evaluation results
the mass of the object is reconstructed. Formally, the at Fig. 2, matrix WBP on the GPU yield excellent net
backprojection can be defined by means of the matrix speedups up to 165x, especially for huge datasets as
backprojection operator B as: commonly used in the tomography field.
Speed-up
100
80 69,50
60
40
20
0
128 192 256 384 512 768 1024
Datasets
GPU Computing,
Fig. 3 From left to right, the
original HIV-1 reconstruction
and the results with the noise
reduced at 10, 25, 50, 100,
150, and 200 iterations are
shown. Only a representative
slice of the 3D reconstruction
is presented
GPU Computing, Table 1 Run-times(s) of Beltrami Filter on a core of CPU based on 2 Quad Core Intel Xeon 2,26 Ghz and GPU
code on one Tesla C1060 card for 3D images of dimension V V V
V 128 192 256 384 512 640 768
Sequential Beltrami 8,2 28,2 98,7 335,5 800,5 1552,1 2729,3
GPU Beltrami 0,5 1,0 2,1 5,7 14,6 22,9 38,7
Speedup 16,4 28,2 47,0 58,9 54,8 67,8 70,5
vector, that is ∇I ¼ (Ix, Iy, Iz), Ix ¼ ∂I/∂x being the arithmetic operations over every voxel in the 3D
derivative of I with respect to x (similar applies for image. So, GPU computing is capable of accelerating
y and z); g denotes the determinant of the first funda- its computation due to a highly computationally inten-
mental form of the surface, which is g ¼ 1 + |∇I|2; and sive nature of the algorithm. This characteristic is
div is the divergence operator. frequently shared by most of the advanced filtering
Figure 3 is intended to illustrate the performance of methods. The results in Table 1 show that the acceler-
this method in terms of noise reduction and feature ation of Beltrami filter based on GPU computing is
preservation over a representative ET dataset that was very relevant, specially for larger images.
taken from the Electron Microscopy Data Bank (http://
emdatabank.org). Conclusions
To sum it up, the Beltrami Method is translated into Currently, the GPU computing is paramount to
an iterative method where every step includes the same cope with the high computational needs in the
Granular Computing 849 G
systems biology field. It is capable of accelerating
the processes computationally intensive. However Granular Computing
its performance decreases for control-flow inten-
sive processes. There is a wide range of applica- C. Maria Keet
tions in systems biology which can be accelerated KRDB Research Centre, Free University of Bozen-
by GPU computing, but their software must be Bolzano, Bolzano, Italy
reprogrammed or even the algorithms need
rethinking in order to adapt them to the GPU
platforms. Definition
a granular perspective on a particular domain of inter- Granularity, Table 2 Distinguishing characteristics at the
branching points of the top-level taxonomy of types of granular-
est, provided the granulation is carried out in
ity of Fig. 1
a consistent manner using a particular criterion for
Branching
granulation and type of granularity (mechanism of
point Distinguishing feature
granulation, see below). It can be proven that to be
sG – nG Scale (quantitative) – non-scale (qualitative)
able to have an instance of a granularity framework, it sgG – saG Grain size (scale on entity) – aggregation (scale of
requires at least two adjacent levels of granularity in entity)
a granular perspective; an informal example of sgrG – sgpG Resolution – size of the entity
a granular perspective is shown in Table 1. The gran- saoG – samG Overlay aggregated – entities aggregated
ular perspectives within one granularity framework according to scale
can be linked to each other, provided the contents in naG – nrG – Semantic aggregation – one type of relation
nfG between entities in different levels – different
the levels of different perspectives overlap; e.g., Insu-
type of relation between entities in levels and
lin is located at the Molecule-level as a subtype of relations among entities in level
Peptide and as a subtype of Hormone in a functional nacG – nasG Parent-child not taxonomic and relative
perspective. independence of contents of higher/lower level –
Granulation is the act of devising the granular parent-child with taxonomic inheritance
levels and perspectives and assigning contents to
those levels, which can be done manually or computa-
tionally. For data and information granulation, the categorizes the mechanism of granulation, of which
automated approach is generally called ▶ granular the eight principle types are described in Fig. 1 and
computing (Yao 2010), whereas at the knowledge Table 2 (see Keet 2008, Chap. 2 for motivation
layer with its entities at the intensional level (classes, and explanation). The main distinction is between
concepts, relationships, etc.), one uses processes such quantitative and qualitative granularity. Concerning
as ▶ modularization, ▶ abstraction, and expansion that qualitative granularity, nrG’s granulation relations
are constrained by the criterion for granulation and include part-whole relations that are either a type of
type of granularity that together determine uniqueness true parthood (▶ Mereology) or motivated by linguis-
of a granular perspective. tics (▶ Meronymy), such as structural part-of (e.g.,
Table 1), contained in, and being a member of some-
Types of Granularity thing. Examples for nfG are the MAPK cascade and
Good granular perspectives adhere to underlying prin- the Second messenger system with its components, and
ciples regarding how the levels are identified. This can clustering in ER models. nacG’s aggregate generally is
be structured in a taxonomy of types of granularity that termed with a meaningful collective noun and such
G 852 Granularity
Granularity, Table 3 Typical classification levels in ecosystems using as criteria for granulation similar spatial scales and
subdivisions in biotic and abiotic environments, resulting in seven granular perspectives (depicted as columns) that each contains
eight granular levels or less (rows). Each named level contains instances, such as Palearctic and Afrotropic in the Ecozone-level
Biotic Abiotic
Ecosystem Biogeography Zoogeography Phytogeography Physiography Geology Pedology
Ecozone Biome Floral kingdom
Ecoprovince Zoogeographic province Floral province Geoprovince
Ecoregion Bioregion Floral region Physio-region Georegion Pedoregion
Ecodistrict
Ecosection
Ecosite
Ecotope Biotope Zootope Phytotope Physiotope Geotope Pedotope
Ecoelement Bioelement Geoelement
that the instances of the aggregate are different from Granularity, Table 4 Two abbreviated granular perspectives
instances of its members and a change in its members (table columns), one for plant anatomy with parthood and one
time granularity, which are linked at three levels in the hierarchy
does not change the meaning of the whole. sgrG’s (denoted with “,”) so that cross-granular conditions can be
grain size with respect to resolution can also be applied asserted and, e.g., information retrieved from the information
in GIS; e.g., the city of Paris is represented on carto- system
graphic maps as polygon or as point. With sgpG, one Plant sample Time granularity
focuses on the physical size of the entities themselves; Tissue , Day
within one level one can distinguish instances of, say, ↕ ↕
<5 and 1 mm, but instances <1 mm are indistin- Cell , Hour
guishable from each other (but are distinguishable in ↕ ↕
a finer-grained level), such as sieves with different Molecule , Millisecond
pore sizes or two objects touching each other, like the
wallpaper and the wall where, when zoomed in, we
also observe the glue that connects the wallpaper to the
wall. samG has an associated mathematical function that, say, biological sample analysis of a cell culture –
for measured aggregation (1960s in a minute and so be it the structural components or the processes it is
forth). An example of granulation motivated by scale is involved in – yields information at a time granularity
included in Table 3. of hours (Table 4).
The interaction of granularity with notions such
Challenges as ▶ emergence, ▶ holism, and ▶ reduction is to be
Currently, there is still a gap between the ▶ knowledge explored further. Granularity can provide a methodo-
representation of granularity with its (onto)logical logical modeling framework to enable structured
foundations and implementations (▶ Granular Com- examination of claims of emergence both from
puting), being how to relate structured thinking and a formal ontological modelling and computational
structured problems solving, respectively (Yao 2005), angle by making the complex at least less complex,
in particular with respect to dealing with attributes in and aids understanding which levels are essential for
granular computing and its ontological counterpart explanation of some property of observed behavior.
of criterion of granulation. In addition, conditional
granularity across granular perspectives is to be
addressed in applications, such as linking time granu- Cross-References
larity (Euzenat and Montanari 2005) together with
qualitative granularity that already can be modeled ▶ Abstraction
unambiguously with the theory of granularity (Keet ▶ Emergence
2008; Vogt 2010). Examples for its need are abound: ▶ Granular Computing
for instance, in a multiscale analysis, one has to know ▶ Holism
Graph Algorithms in Network Analysis 853 G
▶ Mereology A directed graph is a tuple (V,E) where V is a set of
▶ Meronymy vertices and E {(x,y)x,y2V} is a set of arcs. There-
▶ Modularity fore, (x,y) and (y,x) are different arcs, while for an
▶ Modularization undirected graph, {x,y} and {y,x} are two representa-
▶ Organism State, Lymphocyte tions of the same edge.
▶ Reduction An edge {v,w} (or arc (v,w)) is said to be incident
▶ Top-Down Decomposition of Biological Networks with vertices v and w. v and w themselves are said to be
adjacent. The degree of a vertex is the number of edges
(arcs) incident with it.
References A labeled graph is a tuple (V,E,l) where (V,E) is
a graph (either directed or undirected) and l:V[E ! S
Bittner T, Smith B (2003) A theory of granular partitions. In: is a labeling function assigning to all vertices and
Duckham M, Goodchild MF, Worboys MF (eds) Founda-
edges (or arcs) labels from some alphabet of labels S.
tions of geographic information science. Taylor & Francis,
London, pp 117–151 A hypergraph is a graph whose edges (arcs) can link G
Euzenat J, Montanari A (2005) Time granularity. In: Fisher M, more than two vertices. A multigraph is a graph where
Gabbay D, Vila L (eds) Handbook of temporal reasoning in there may be more than one edge (arc) between a given
artificial intelligence. Elsevier, Amsterdam, pp 59–118
pair of vertices.
Keet CM (2008) A Formal theory of granularity. Ph.D. Thesis,
KRDB Research Centre, Faculty of Computer Science, Free Graph theory is the study of the properties of graphs
University of Bozen-Bolzano, Italy (Diestel 2002). Algorithmic graph theory is the study
Salthe SN (1985) Evolving hierarchical systems – their structure of algorithms to compute properties of graphs (Gross
and representation. Columbia University Press, New York
and Yellen 2004). ▶ Graph mining is the study of
Salthe SN (2001) Summary of the Principles of Hierarchy
Theory. http://www.nbi.dk/natphil/salthe/hierarchy_th.html. performing data mining on and machine learning
Accessed 10 October 2005. from data represented with graphs.
Vogt L (2010) Spatio-structural granularity of biological mate-
rial entities. BMC Bioinformatics 11:289
Yao YY (2005) Perspectives of granular computing. IEEE Conf
Granul Comput (GrC2005) 1:85–90 References
Yao JT (ed) (2010) Novel developments in granular computing:
applications for advanced human reasoning and soft compu- Diestel R (2010) Graph theory. Springer, Heidelberg
tation. IGI Global, Hershey Gross JL, Yellen J (2004) Handbook of graph theory. CRC Press,
Boca Raton
Graph
Definition
Characteristics
A graph is an abstract representation of a set of objects,
some of which are connected by links. The objects are Graph Theory
represented with vertices (also called nodes) and the We first define necessary terminologies in graph
links with edges or arcs. Several types of graphs are theory. For more details, see (Bondy and Murty 1976;
used depending on what is suitable for a particular West 2001).
application. A graph is a popular mathematical structure for
An undirected graph is a tuple (V,E) where V is a set modeling entities and the relationships between
of vertices and E {{x,y}x,y2V} is a set of edges. them. A graph G ¼ (V, E) consists of a set of
G 854 Graph Algorithms in Network Analysis
Graph Algorithms in a 1 b 1
Network Analysis,
Fig. 1 Example of (a) an
undirected graph, and (b)
a directed graph
2 3 4 2 3 4
a 1 b 1 c 1
2 3 2 3 2 3
Graph Algorithms in
Network Analysis,
Fig. 2 Example of (a)
a DAG, (b) a directed graph
with a cycle, and (c) the
directed cycle in (b) 4 4
vertices (or nodes) V and a set of edges E, where an edge A path in a graph is a sequence of vertices such that
e 2 E is defined by a pair of vertices v, w 2 V and denoted each vertex in the sequence is connected by an edge to
by e ¼ (u, v). We say that an edge e ¼ (u, v) is incident to the next vertex in the sequence. The length of a path
vertices u and v, and that u and v are adjacent to each is the number of edges in the path. A cycle is a path where
other. The vertices u and v are called the endvertices of the starting vertex is the same as the ending vertex.
e. The degree of a vertex v 2 V in a graph G is the A path or cycle is called Hamiltonian if it visits all the
number of edges incident to v. For example, vertex 3 of vertices of the graph exactly once. A graph is acyclic if it
the graph shown in Fig. 1a has degree 5. contains no cycles. A directed acyclic graph is called a
An undirected graph is a graph in which the edges DAG. For example, the directed graph in Fig. 2a is a
have no orientation. A directed graph (or digraph) is DAG, while the directed graph in Fig. 2b is not a DAG,
a graph in which the edges have an orientation, i.e., an since it contains the directed cycle shown in Fig. 2c.
edge e ¼ (u, v) is defined by an ordered pair of two A tree T is a connected simple graph with no cycles.
vertices v, w 2 V. For example, Fig. 1a shows an A vertex of degree 1 in a tree is called a leaf. A non-leaf
undirected graph, and Fig. 1b shows a directed graph. vertex is called an internal vertex. A rooted tree has
A weighted graph is a graph in which the edges have a special vertex called the root, and is often treated as
weights, such as integers or real numbers. a DAG with the edge directions originating from the
A loop is an edge e whose endvertices are the same root. A k-ary tree is a rooted tree in which every
vertex, i.e., e ¼ (v, v). An edge e ¼ (u, v) is called internal vertex has at most k children. In particular,
a multiple edge, if there is another edge e 0 ¼ (u, v) with a 2-ary tree is called a binary tree. For example, Fig. 3b
the same endvertices u and v. A multigraph is a graph shows a tree, and Fig. 3c shows a rooted binary tree.
with multiple edges. For example, the undirected graph A bipartite graph G ¼ (V, W, E) consists of two
in Fig. 1a has multiple edges between vertex 1 and disjoint vertex sets V and W (i.e., no two vertices in V
vertex 3, and the directed graph in Fig. 1b has a loop at are adjacent and no two vertices in W are adjacent), and
vertex 4. A simple graph has no multiple edges or an edge set E, where every edge e ¼ (v, w) has an
loops. In most cases, the graph refers to a simple undi- endvertex v 2 V and the other endvertex w 2 W.
rected graph. Figure. 3a shows an example of a bipartite graph.
Graph Algorithms in Network Analysis 855 G
Graph Algorithms in 2
Network Analysis,
b
a1 2 7
Fig. 3 Example of a (a)
bipartite graph, (b) a tree, and 12 6 1 3 c 1
(c) a rooted binary tree 2 3
5 8 4 5 6 7
4
3 4 5 9 10 11 8 9
a b c d
1 2 1 2 1 2 1 2
Graph Algorithms in G
Network Analysis,
Fig. 4 Example of (a)
a clique, (b) a subgraph of (a),
(c) a spanning subgraph of (a),
and (d) a spanning tree of (a) 4 3 3 4 3 4 3
a 1
b
5 8 2 3
2 3
1
7
Graph Algorithms in
Network Analysis,
Fig. 5 Example of (a) 4 6 9 4 5
a disconnected graph, and (b)
a graph with a cut vertex 10
A regular graph is a graph in which every vertex connected, if every pair of vertices is reachable; other-
has the same degree. For each n, the complete graph, wise, it is disconnected. A directed graph is weakly
denoted by Kn, is a simple graph with n vertices in connected, if it contains an undirected path for every
which every vertex is adjacent to all the other vertices. pair of vertices. It is strongly connected, if it contains
An (induced) subgraph G0 ¼ (V0 , E0 ) of a graph a directed path for every pair of vertices. A cut vertex is
G ¼ (V, E) is a graph where V0 is a subset of V, E0 is a vertex whose removal disconnects the remaining
a subset of E, and all endvertices of e 2 E0 are in V0 . subgraph. A bridge (or cut edge) is an edge whose
A subgraph G0 is a spanning subgraph of G if V ¼ V0 . removal disconnects a graph. If a graph is still
A spanning tree is a spanning subgraph that is a tree. connected after removing any k1 vertices, then the
For example, Fig. 4a shows complete graph K4, and graph is called k-connected. For example, the graph in
Fig. 4b shows a subgraph of K4. Fig. 4c shows a Fig. 5a is disconnected, and the graph in Fig. 5b is
spanning subgraph of K4, and Fig. 4d shows a spanning connected, with vertex 1 as a cut vertex.
tree of K4. The distance d(u, v) between two vertices u and v in
A pair of vertices v and w in a graph G is reachable, a graph G is the length of the shortest path between
if there is a path between v and w in G. A graph G is them. The eccentricity of a vertex v in a graph G is the
G 856 Graph Algorithms in Network Analysis
Graph Algorithms in a 1 b 1
Network Analysis,
Fig. 6 Example of (a)
Breadth-first search (BFS) and
(b) Depth-first search (DFS) 2 3 4 2 5 7
5 6 7 8 3 4 6 8
maximum distance from v to any other vertex. A center Both the DFS and BFS algorithms can be implemented
is a vertex with the minimum eccentricity. The diam- to run in O(|V| + |E|) time, i.e., linear in the size of the
eter of a graph G is the maximum eccentricity over all graph G. For details, see (Cormen et al. 2001; Goodrich
vertices in that graph. For example, the graph in Fig. 5b and Tamassia 2001). For example, Fig. 6 depicts exam-
has a diameter 2, and vertex 1 is the center of the graph. ples of BFS and DFS, where the number of a vertex
The distance between vertex 2 and vertex 5 is 2. represents the ordering performed by BFS and DFS.
4 5 6 4 5 6
5 10 5
a 2
4
4
b 2 4
2 15 3 2
3
Graph Algorithms in
Network Analysis, 7 20 1 6
1 6
Fig. 8 Example of a shortest 6 G
path: (a) a weighted graph, and
1 12 12
(b) the shortest path between 5
3 5 3 5
vertices 1 and 6
available choice at each stage of the algorithm. More shows a shortest path between vertex 1 and vertex 6 of
specifically, at each step of the algorithm, they add one the graph in Fig. 8a.
safe edge at a time (i.e., an edge that does not create There are two problems: the single-source shortest
a cycle), preserving the minimum weight spanning tree path problem and the all-pair shortest path problem.
property.
Kruskal’s algorithm first sorts the edges in Single-Source Shortest Path The aim of the single-
a nondecreasing order by their weights. It then finds source shortest path problem is to find shortest paths
a safe edge to add to the growing forest (i.e., a set of from a chosen source vertex v to all the other vertices in
disconnected trees), by finding an edge e ¼ (u, v) with the graph. For unweighted graphs, one can use breadth-
the minimum weight among all the edges that connect first search algorithm to compute a shortest path from v
any two trees in the forest. to all the other vertices. For weighted graphs, there are
Prim’s algorithm maintains a single tree T ¼ (VT, two well-known algorithms: Dijkstra’s algorithm and
ET) at each stage of the algorithm. The tree initially the Bellman-Ford algorithm.
only contains an arbitrary vertex v and grows until it Both algorithms produce a shortest path tree,
spans all the vertices in the graph. At each step, it finds starting from the source vertex v to all the other reach-
a safe edge with the minimum weight among all edges able vertices in the graph. Each vertex u has a value on
e ¼ (u, v), where u 2 V VT and v 2 VT (i.e., an edge shortest path estimate (initially assigned with a very
connecting a vertex in T and a vertex in G T). large value). Each step of the algorithms tries to
The running time of Prim’s algorithm and Kruskal’s decrease the estimate value, using the relaxation tech-
algorithm depends on the data structures that are used. For nique. More specifically, relaxing an edge e ¼ (u, w)
example, Prim’s algorithm can be implemented to run in tests whether one can improve the shortest path from v
O(|E|log|V|) time, using a binary heap data structure. to w by rerouting the path through u.
Kruskal’s algorithm can be implemented to run in O(|E| Dijkstra’s algorithm solves the single-source
log|V|) time, using a disjoint-set data structure. For faster shortest path problem on a weighted directed graph
implementations using more complex data structures, see G ¼ (V, E), with nonnegative edge weights. It also
(Cormen et al. 2001; Goodrich and Tamassia 2001). uses a greedy approach, and maintains a set S of ver-
tices whose final shortest-path weights from v have
Shortest Path already been determined. The algorithm repeatedly
The aim of the shortest path problem is to find a path chooses a vertex u 2 V S with the minimum
between two vertices in a weighted graph, such that the shortest-path estimate, adds u to S, and relaxes all the
sum of edge weight is minimized. For example, Fig. 8b edges leaving u. Dijkstra’s algorithm can be
G 858 Graph Algorithms in Network Analysis
1 1
cE ðuÞ ¼ ¼ (2) k=3
eðuÞ maxfdðu; vÞ : v 2 Vg
k=2
shows a blockmodel of the graph in Fig. 11a, defined Vertices u and v are automorphically equivalent if
according to structural equivalence. Block a consists v ¼ a(u) for some automorphism a. For example,
of vertex 1, block b consists of vertex 2, block e Fig. 11c shows a blockmodel of the graph in Fig. 11a
consists of vertex 5, and block f consists of the vertex as defined by automorphic equivalence. Block a
set {6, 7}. consists of vertex 1, block b consists of the vertex
There are two relaxations of structural equivalence: set {2, 3}, block c consists of the vertex set {4, 5},
automorphic equivalence and regular equivalence. and block d consists of the vertex set {6, 7}.
Informally, automorphically equivalent vertices have Computing the automorphisms of a given graph is
the same position in a network in a more abstract sense a computationally hard problem known as isomor-
than structurally equivalent vertices: They are not phism complete. However, for special classes of graphs
connected to the exactly same vertices, but to vertices such as trees and planar graphs, an automorphism can
that play analogous roles in the network. be computed efficiently in linear time. For details, see
To make this idea more formal, we need the concept (Brandes and Erlebach 2005).
of an automorphism. A permutation a of V is an auto-
morphism of the graph G ¼ (V, E) if, for each edge Network-level Analysis
(u, x) in G, (a(u), a(x)) is an edge in G, and vice versa. Graph Isomorphism An isomorphism between two
graphs is a mapping between the vertex sets of the two
graphs, which preserves adjacency. An isomorphism
1
a b a c of graphs G and H is a bijection f: V (G) ! V (H)
a between the vertex sets of G and H, such that any two
vertices u and v of G are adjacent in G if and only if f
(u) and f (v) are adjacent in H.
2 3 For example, Fig. 12a and b are isomorphic graphs,
b c b
since we can define a mapping as follows: {(4, a),
(1, b), (5, e), (2, c), (6, f), (7, g), (3, d), (8, h), (9, i),
(10, j)}. However, Fig. 12a and c are not isomorphic
4 5
d e c graphs, since there is no mapping between the two
vertex sets that preserves adjacency.
Testing whether two graphs are isomorphic is a com-
6 7 d putationally hard problem. The time complexity of the
f problem is not known for general graphs. However,
Graph Algorithms in Network Analysis, Fig. 11 Examples
the problem can be solved efficiently in linear time, for
of blockmodels of a graph in (a), based on (b) structural equiv- special classes of graphs such as trees and planar graphs.
alence, and (c) automorphic equivalence For details, see (Brandes and Erlebach 2005).
a
5 c 16 17 18
15
b
1 2 a b 11 12
6 e
Graph Algorithms in
Network Analysis, 7 j
Fig. 12 Example of
isomorphic graphs: the graphs 4 3 d c 14 13
in (a) and (b) are isomorphic
graphs, (c) nonisomorphic
8 9 10 h i f g 19 20
graph
Graph Alignment, Protein Interaction Networks 861 G
References Similarly to sequence alignment, it provides means
for functional and phylogenetic comparison of the
Bondy JA, Murty USR, Bondy JA, Murty USR (1976) Graph proteins.
theory with applications. North Holland, New York
Brandes U, Erlebach T (2005) Network analysis: methodologi-
cal foundations. Springer, Berlin
Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduc- Characteristics
tion to algorithms, 2nd edn. MIT Press/McGraw-Hill,
Cambridge, MA/New York
Graph Alignment Structure
Goodrich MT, Tamassia R (2001) Algorithm design. Wiley,
Chichester A graph alignment (▶ Network Alignment) compares
Junker BH, Schreiber F (2008) Analysis of biological networks. protein–protein interaction networks (Protein–Protein
Wiley, Hoboken Interaction Network) (PIN) and groups their proteins
Wasserman S, Faust K (1994) Social network analysis: methods
into equivalence (analogy) classes. In the case of
and applications. Cambridge University Press, New York
West D (2001) Introduction to graph theory. Prentice Hall, a pair-wise comparison, the equivalence is conveniently
Upper Saddle River represented by a pairing of the analogous proteins thus G
creating a map between the nodes (proteins) of the two
networks. The alignment of the links (protein–protein
interaction) results from the aligned nodes.
Graph Alignment Figure 1 represents a pair-wise graph alignment
between two PINs representing a part of the ribosomal
▶ Scoring Function, Graph Alignment complex in two distinct species. Proteins are represented
as the nodes of the network and the protein–protein
interactions as the links. The alignment is shown by
dashed lines, which interconnect the proteins in the
Graph Alignment, Protein Interaction same equivalence class. The alignment of nodes forces
Networks the alignment of links. Several modalities exist for the
aligned links. The links may be either present in both
Michal Kolář species resulting from matching protein–protein interac-
Institute of Molecular Genetics, Academy of Sciences tions or they may be absent in one or both species.
of the Czech Republic, Prague, Czech Republic In practice, the alignment of the two networks is
represented as in Fig. 2. The proteins belonging to the
same equivalence class are overlaid. The line type indi-
Synonyms cates presence or absence of the protein–protein interac-
tions in individual PINs and thus the modality of the link.
Alignment, protein interaction networks; Network An alignment of multiple networks is represented simi-
alignment, protein interaction networks larly, for example, see Fig. 3. There, the nodes represent
proteins of the same equivalence class (e.g., MRPL19,
RPLK, and RLX1). Some equivalence classes may be
Definition absent in one or more of the networks (e.g., MRPL4 and
HP1409 do not have an analogous protein in fish).
Graph alignment of protein–protein interaction A pair-wise graph alignment of networks G1(V1, E1)
networks (▶ Protein-Protein Interaction Networks) is and G2(V2, E2) is formally defined as a mapping A from
a method of comparison of protein interaction data the vertex (node) set V1 to the vertex set V2; A: i 2 V1 !
between two or more species and is one of the methods i0 2 V2. An edge (link) (i, j) 2 E1 is told to be aligned to
of ▶ comparative analysis of molecular networks. The an edge (i0 , j0 ) 2 E2 if and only if A(i) ¼ i0 and A(j) ¼ j0 .
method represents the interaction data in a form The type of the mapping A distinguishes whether
of a graph (▶ Graph, ▶ Protein-Protein Interaction the alignment is considered global or local. The global
Networks, ▶ Interactome) and constructs a mapping graph alignment (▶ Global Network Alignment) is
between the nodes of the graph (proteins) and the defined by a (total) injective mapping A. Thus, each
corresponding links (protein–protein interactions). node in the smaller network is aligned to some node in
G 862 Graph Alignment, Protein Interaction Networks
RPLD
RPLP RPLK
YML6
MRPL16 MRPL19
HP1409 UVRA
RPLO
MRPL4 MRP7
MRPL10
bacterium yeast
Graph Alignment, Protein Interaction Networks, Fig. 1 A sparse with only several interactions described in the databases.
small part of protein–protein interaction networks The yeast network on the other side forms an almost complete
(Protein–Protein Interaction Network) of a bacterium clique. The graph alignment, which is denoted by dashed lines,
(Helicobacter pylori) and yeast (Saccharomyces cerevisiae) may be used to predict physical interactions among the bacterial
representing protein components of the ribosome. Each node genes. The figure is based on Sharan et al. (2005) and updated
represents a protein and is labeled by its symbol. The links using STRING db version 8.3 (http://string-db.org)
stand for protein–protein interactions. The bacterial network is
the larger network. No two nodes from the smaller protein–protein interaction networks. Nodes (proteins)
network may be aligned to the same node in the larger in the same equivalence class are considered function-
network. The local graph alignment (▶ Local Network ally analogous.
Alignment) is defined by a partial injective mapping A.
Thus, only elements of a subset of the node set V1 are Score of a Graph Alignment
aligned to some nodes in V2. Still, no two nodes from The alignment is scored by interaction similarity
one network may be aligned to the same node in the and protein similarity. Each equivalence class of
other network. In some definitions of the graph align- aligned proteins (or a pair of aligned proteins in case
ment, the injective property of the mapping is lifted, of a pair-wise graph alignment) contributes to a node
allowing for alignment of two or more nodes of one score (▶ Node Score, Graph Alignment) Sn, which
network to the same node in the other network. This rewards similarity of the aligned proteins (e.g., level
option allows for the representation of, for example, of their homology, say a BLAST bit score) and penal-
protein duplications. izes similarity among proteins not respected by the
A multiple graph alignment (▶ Multiple Network graph alignment and its equivalence classes. Aligned
Alignment) of N graphs Gi(Vi,Ei), i ¼ 1, . . ., N is an protein pairs contribute a positive link score (▶ Link
equivalence relation A over the nodes V ¼ V1 [ . . . [ Score, Graph Alignment) Sl if the interaction between
VN. An equivalence relation is transitive and partitions the proteins is conserved in all or some networks. Thus,
V into a set of disjoint equivalence classes. A local the interaction between equivalence classes (MRPL16,
alignment is a relation over a subset of the nodes in V; RPLP, MRPL16) and (YML6, RPLD, MRPL4) in
a global alignment is a relation over all nodes in V. Fig. 3 would contribute a positive link score (Link
Figure 3 shows an example of an alignment of three Score, Graph Alignment), as the protein–protein
Graph Alignment, Protein Interaction Networks 863 G
both yeast
bacterium bacterium
YML6 yeast fish
YML6
RPLD
RPLD
MRPL4
applications, see references Sharan and Ideker (2006), complexity of the data leads also to its weakness: the
Chen et al. (2009), Stumpf and Wiuf (2009), Cannataro computational cost of the algorithms for finding of
et al. (2010), and Pržulj (2011). the optimal graph alignment is large.
Graph Clustering
▶ Modules in Networks, Algorithms and Methods First, in the transactional graph mining setting, databases G
▶ Network Clustering of separate, independent graphs are considered.
For example, in a molecule database, molecules are
commonly represented using one vertex for every atom
Graph Mining and one edge for every bond between two atoms. Large,
publicly available databases of chemical compounds
Jan Ramon include the NCI dataset (http://cactus.nci.nih.gov/) and
Declarative Languages and Artificial the ZINC dataset (http://zinc.docking.org/).
Intelligence Group, Katholieke Universiteit Leuven, Second, in the single (large) network setting, all data
Leuven, Belgium is represented in one large, connected network. Exam-
ples of such networks include the Internet, social net-
works, citation networks, concept networks, computer
Synonyms networks, chemical interaction networks, gene regula-
tory networks, socioeconomic networks, and encyclope-
Learning from graph structured data; Network analysis dias. Sample datasets are publicly available at, among
others, http://snap.stanford.edu/data/. In a chemical
interaction network, molecules are represented by verti-
Definition ces connected by chemical reactions. The level of detail
and the exact representation may be different among
Graph mining is the study of how to perform ▶ data datasets. For example, chemical reactions may be
mining and machine learning (▶ Identification of Gene represented as separate nodes in the network with arcs
Regulatory Networks, Machine Learning) on data from/to the participating compounds, or they may be
represented with graphs. One can distinguish between, implicit, in which case compounds which are involved
on the one hand, transactional graph mining, where in the same chemical reaction are just connected
a database of separate, independent graphs is considered with an undirected edge. Next to networks of chemical
(such as databases of molecules and databases of compounds, it is also common to consider higher-level
images), and, on the other hand, large network analysis, networks such as protein interaction networks and gene
where a single large network is considered (such as regulatory networks. For example, in gene regulatory
chemical interaction networks and concept networks). networks nodes represent genes and arcs between
nodes indicate that one gene codes for a transcription
factor regulating the other gene. In comparison to the
Characteristics transactional setting, an important challenge in the single
network setting is that one’s beliefs on all data may be
Graph-Structured Data dependent on one another. Most traditional machine
In many applications, it is natural to represent data learning techniques assume that examples are drawn
with ▶ graphs. One can distinguish two main settings. identically and independently (i.i.d.) (Fig. 1).
G 866 Graph Mining
Other abstractions are sometimes preferred to graphs maximum common subgraph operators. Kernels on
in order to represent similar data, such as relational data- molecular graphs such as presented in (De Grave and
bases and logic. The domains focusing on data mining Costa 2010) can be used with any kernel-based learn-
using these representations are called relational data min- ing (▶ Learning, Kernel-based) method such as sup-
ing and inductive logic programming, respectively. port vector machines and Gaussian processes. Metrics
Representing data with graphs has several advantages. and maximum common subgraph operators can be used
First, the representation language is simple and therefore, in instance-based learning approaches, or as features for
allows for the fast development of algorithms. Second, a wide range of classification algorithms (Schietgat et al.
the representation language is expressive and adequate 2010).
for the majority of applications. Finally, there is a vast
literature on efficient graph algorithms. A potential dis- Methods for Analyzing Large Networks
advantage, especially in order to use algorithms Analyzing Overall Network Regularity
implemented only for simpler graph representation, is An important starting point for many methods
that it may be necessary to transform the data into for analyzing large networks is the observation that
a simpler (but equally expressive) graph format in large real-world networks, independently of the domain,
a preprocessing step. satisfy a number of statistical regularities. For example,
many networks satisfy the small world model, which
Transactional Graph Mining Methods informally corresponds to the fact that the number of
Graph mining methods cover the whole range of highly connected nodes is much smaller than the number
methods from data mining and machine learning. We of low degree nodes. Also, many networks can be clus-
only list here a few examples of methods which tered in modules of nodes which are much better
received significant attention in the literature. connected to each other than to nodes in other modules.
As a consequence, much inspiration has come from ran-
Graph Pattern Mining dom graph theory (Bollobás 2001; Durrett 2007) and
Graph pattern mining methods perform ▶ pattern mining spectral graph theory (Chung 1997), which study the
on graph-structured data, i.e., they list all patterns which statistical properties of such graphs. Alon (2007) dis-
satisfy some interestingness criterium such as being cusses motifs in biological networks and the surprising
frequent. A frequent pattern is a pattern which is deviation of frequencies of certain motifs from what one
a subgraph of at least a certain fraction of the transaction would expect if the given network were completely
graphs in the database. Well-known graph mining systems random.
are gSpan (Yan and Han 2002) and Gaston (Nijssen and
Kok 2004). Predicting Node Properties
A popular strategy for the application of these systems Often however, in addition to network-level regulari-
and related ones to quantitative structure–property rela- ties, also a more detailed node-by-node analysis of
tionship (QSPR) modeling (i.e., the modeling of the rela- a network is necessary. Several approaches aim at
tionship between the structure of molecules and their modeling properties of nodes in a network. First, in
chemical properties) is to first generate frequent molecular the field of statistical relational learning (Getoor and
fragments, then to generate one boolean feature per pat- Taskar 2007), probabilistic models are being studied
tern (with value 1 for molecules having the pattern as which allow to reason about beliefs of the properties of
substructure and with value 0 for other molecules), and individual nodes and their connections in a Bayesian
then to apply some suitable ▶ classification algorithm network manner. Second, semi-supervised learning
(such as a support vector machine (▶ Biomedical (Zhu and Goldberg 2009) aims at learning predictive
Decision Support Systems)) on these features. models exploiting not only the information about the
training examples but also the information about the
Comparing Graphs unlabeled examples. This is especially useful in net-
In order to compare small graphs, such as molecular works where nodes and their connections are known,
graphs, one can use graph kernels, graph metrics, and but not the value of some target attribute.
Graphical Gaussian Model 867 G
Cross-References
Graphical Gaussian Model
▶ Biomedical Decision Support Systems
▶ Classification Zhong-Yuan Zhang
▶ Data Mining School of Statistics, Central University of Finance and
▶ Learning, Kernel-Based Economics, Beijing, China
▶ Learning, Relational
▶ Learning, Supervised
▶ Learning, Unsupervised Synonyms
▶ Pattern Mining
Concentration graph model; Covariance selection
model
References
G
Alon U (2007) An introduction to systems biology. Chapman & Definition
Hall/CRC, Boca Raton
Bollobás B (2001) Random graphs. Cambridge University Press,
Cambridge Graphical Gaussian model (CGM) (Crzegorxczyk et
Chung F (1997) Spectral graph theory. AMS, Providence al. 2008; Hache et al. 2009; Werhli et al. 2006) is
De Grave K, Costa F (2010) Molecular graph augmentation an undirected graph whose nodes are genes and
with rings and functional groups. J Chem Inf Model
two genes are linked by an edge if there is an
50(9):1660–1668
Durrett R (2007) Random graph dynamics. Cambridge University interaction between them. The interactions are mea-
Press, Cambridge sured by the partial correlation coefficients condi-
Getoor L, Taskar B (2007) An introduction to statistical tioned on all the other genes. After the correlation
relational learning. MIT Press, Cambridge, MA
coefficients are determined, a statistical significance
Nijssen S, Kok J (2004) A quickstart in frequent structure mining
can make a difference. In: Proceedings of the 10th ACM test is employed, and the two genes whose score is
SIGKDD international conference, pp 647–652 significantly large are considered to have an inter-
Schietgat L, Costa F, Ramon J, De Raedt L (2010) Effective feature action. Otherwise they are considered to be condi-
construction by maximum common subgraph sampling.
Machine Learning 83(2):137–161
tional independence and there is no edge between
Yan X, Han J (2002) gSpan: graph-based substructure pattern them. Under the assumption that the data are dis-
mining. In: Proceedings of the 2002 international conference tributed according to a multivariate Gaussian distri-
on data mining (ICDM’02), pp 721–724 bution, the partial correlation coefficient of gene
Zhu X, Goldberg AB (2009) Introduction to semi-supervised
x and gene y is given as:
learning. Synth Lect Artif Intell Machine Learn 3:1–130
C1 xy
scoreðx; yÞ ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
C1xx Cyy
1
Graph Partitioning
1
▶ Modules, Identification Methods and Biological where C1
xy is the element of C , inverse of the covari-
Function ance matrix C of the data.
▶ Network Clustering
Cross-References
Graph Study of Metabolic Networks
▶ Identification of Gene Regulatory Networks,
▶ Topology of Metabolic Reaction Networks Machine Learning
G 868 Graphical Model
References References
Crzegorxczyk M, Husmeier D, Werhli AV (2008) Reverse engi- Koller D, Friedman N (2009) Probabilistic graphical models.
neering gene regulatory networks with various machine MIT Press, Massachusetts. ISBN 0-262-01319-3
learning methods. In: Emmert-Streib F, Dehmer M (eds)
Analysis of microarray data: a network-based approach.
Wiley GmbH, Weinheim, KGaA
Hache H, Lehrach H, Herwig R (2009) Reverse engineering of
gene regulatory networks: a comparative study. EURASIP
J Bioinform Syst Biol 2009:617281
Graphical Representation of Biological
Werhli AV, Grzegorczyk M, Husmeier D (2006) Comparative Processes
evaluation of reverse engineering gene regulatory networks
with relevance networks, graphical Gaussian models and ▶ SBGN
Bayesian networks. Bioinformatics 22(20):2523–2531
Definition
C D
Global Optimization
Definition The nonlinearity and the constrained nature of the
system dynamics make usually these problems non-
The parameter estimation for ordinary differential equa- convex. This means that the NLP-DAE solution must
tions with grid computing refers to the exploitation of be searched with a global optimization (GO) method,
a combination of computer resources, usually character- since it is very likely that a local method would identify
ized by loose coupling, heterogeneity, and geographical a solution of local nature. GO strategies can be classi-
dispersion, to carry out the calculations required to fied in two broad classes, deterministic and stochastic:
identify a particular set of values for the parameters generally speaking, deterministic approaches can
included in continuous and deterministic models. assure a higher degree of assurance that the global
optimum will be reached, but no algorithm can guar-
antee the identification of the global optimum with
Characteristics certainty in finite time and, moreover, the computa-
tional cost increases very quickly with the problem
Systems biology kinetic models contain parameters that size; stochastic methods have weak theoretical guar-
usually describe physical and chemical properties of antees of convergence to global optimality, but can
GridMPI 871 G
locate the vicinity of it in modest computation time, after the previous swap. If this is the case, the script flags
and, moreover, are easy to implement and do not a specific field in the database that enables the exchange
require a transformation of the original problem when each process has completed the current step.
(Moles et al. 2003).
Evolution strategies (ES), a sub-class of nature-
inspired stochastic optimization methods belonging to Cross-References
the class of evolutionary algorithms (Fogel 2006), are
a good candidate to cope with NLP-DAE problems ▶ Convex Programming
exploiting grid platform. In fact, they have shown good ▶ Dynamical Systems Theory, Asymptotics and
performance when applied to parameter estimation in Singular Perturbations
biochemical pathways (Moles et al. 2003), and it is ▶ Evolutionary Algorithms
possible to implement them in a data parallel manner, ▶ Global Optimum
which is the best solution to obtain very good perfor- ▶ Grid Computing, Parallelization Techniques
mance with grid computing. More precisely, it is possi- ▶ High-Throughput Computing, Asynchronous G
ble to run different instances of an ES algorithm Communication
simultaneously, swapping periodically the best results ▶ Mathematical programming, Constraint
among the processes. Importantly, this approach speeds ▶ Mathematics, Nonlinear Programming
up the convergence to the optimal solution since a wider ▶ Ordinary Differential Equation (ODE)
search over the solutions space is performed. ▶ Parallel Computing, Data Parallelism
Distributed Implementation
To manage the distribution of each run of the ES References
algorithm on different computational resources and to
enable the asynchronous communication among these Fogel DB (2006) Evolutionary computation: toward a new phi-
losophy of machine intelligence. IEEE Press, Piscataway
runs, a software environment is required. A possible
Gunawardena J (2010) Models in systems biology: the parameter
solution, described in Mosca et al. (2008), involves problem and the meanings of robustness. In: Lodhi HM,
a relational database that stores the results achieved Muggleton SH (eds) Elements of computational systems
during each evolution process. More precisely, each biology. Wiley, Hoboken, pp 21–47
Moles CG, Mendes P, Banga JR (2003) Parameter estimation in
run is performed on a computational resource, which
biochemical pathways: a comparison of global optimization
can be completely independent of the others. The only methods. Genome Res 13(11):2467–2474
requirement is the availability of a communication Mosca E, Merelli I, Alfieri R, Milanesi L (2008) A distributed
network to establish a connection to the database. approach for parameter estimation in systems biology
models. Il Nuovo Cimento 32:165–168
Each run starts by contacting the main database to
Nelson DL, Cox MM (2005) Lehninger principles of biochem-
inform the system of its presence, then it randomly istry. WH Freeman, New York
initializes the population and runs the optimization.
After the accomplishment of each iteration, each pro-
cess stores its solutions along with the corresponding
cost function (that evaluates the quality of the solu- GridMPI
tion). If the other processes are ready to swap their
results, the algorithm downloads from the database Giuseppe Agapito
a subset of the solutions (e.g., sorted by the fitness Department of Experimental Medicine and Clinic,
value) from another randomly selected process and University Magna Graecia of Catanzaro,
adds them to its current set of results. Catanzaro, Italy
The coordination of the swap among the processes is
delegated to a script which runs in close association
with the database. This script queries the database to Definition
check if all processes are ready to exchange their indi-
viduals, which means that they have performed GridMPI is an open source project with the aim
a minimum number of iterations from the beginning or to provide an efficient and easy use of the MPI
G 872 Groove (G) Domain
(Message Passing Interface) standard in the Grid envi- to the major histocompatibility (MH) superfamily
ronments. GridMPI unlike from MPI comes with (MhSF) (▶ MH Superfamily (MhSF)). The G domain
a collection of functions and services specific for comprises the G-DOMAIN of the MH and the G-
the Grid environments that allow the user to over- LIKE-DOMAIN of the MhSF proteins other than MH
come in an easy way problems typical of the Grid, (or related proteins of the immune system (RPI)-
related with the development of Grid applications. MH1Like).
Typical examples regarding such problems often A G domain (G-DOMAIN or G-LIKE-DOMAIN)
include a connection between different resources is usually encoded by one exon of a gene. Two
and organizations in order to improve the compu- G domains participate to the characteristic groove
tation costs associated with scientific applications structure that, in the MH, binds a peptide. Each
tipically performed by limiting the latency time, G domain is very conserved and comprises four anti-
and managing and managing security problems parallel beta strands (floor of the groove) and a helix
automatically and in a transparent way. (wall of the floor). In the MH class I (MH1) or in the
RPI-MH1Like, the two G domains belong to the same
chain (MH1-Alpha, or RPI-MH1Like-Alpha), whereas
in the MH class II (MH2), the two domains belong
Cross-References
to different chains, MH2-Alpha and MH2-Beta
(Lefranc et al. 2005). The G-domain description per
▶ Grid Computing, Parallelization Techniques
receptor type and chain type, based on the ▶ IMGT-
ONTOLOGY concepts, is shown in Table 1. IMGT ®
labels are in capital letters.
Analysis of G-domain amino acid sequences can be
Groove (G) Domain performed by tools of IMGT®, the international
ImMunoGeneTics information system ® (http://www.
Marie-Paule Lefranc imgt.org) (▶ IMGT ® Information System).
Laboratoire d’ImmunoGénétique Moléculaire, IMGT/DomainGapAlign tool (Ehrenmann et al.
Institut de Génétique Humaine UPR 1142, Université 2010) aligns the user amino acid sequences with the
Montpellier 2, Montpellier, France closest G domains of the IMGT domain reference
directory, creates gaps according to the ▶ IMGT
unique numbering for G domain (Lefranc et al.
Synonyms 2005), delimits the strands, loops and helix, highlights
differences with the closest reference(s), and generates
G domain; G type domain; Groove domain the IMGT Colliers de Perles (▶ IMGT Collier de
Perles).
For MH proteins with known three-dimensional (3D)
Definition structures, IMGT Colliers de Perles are used for compar-
ison of pMH contact analysis (Kaas et al. 2008). Contact
The Groove (G) domain is a type of structural unit analysis between V-ALPHA and V-BETA (▶ Variable
(domain) that characterizes a protein chain belonging (V) Domain of the T cell receptor TR-Alpha, and
Groove (G) Domain, Table 1 G-domain description per receptor type and chain type
G domain Receptor type G-domain description per chain type
G-DOMAIN MH class I (MH1) G-ALPHA1 On the same chain
G-ALPHA2
MH class II (MH2) G-ALPHA
G-BETA
G-LIKE-DOMAIN MhSF other than MH (RPI-MH1Like) G-ALPHA1-LIKE On the same chain
G-ALPHA2-LIKE
Grouping 873 G
TR-Beta chains, respectively) and G-ALPHA1 and G- Lefranc M-P, Duprat E, Kaas Q, Tranne M, Thiriot A, Lefranc G
ALPHA2 (G domains of the MH1-Alpha chain) or G- (2005) IMGT unique numbering for MHC groove
G-DOMAIN and MHC superfamily (MhcSF) G-LIKE-
ALPHA and G-BETA (G domains of the MH2-Alpha DOMAIN. Dev Comp Immunol 29:917–938
and MH2-Beta chains, respectively) are available in the
3D database (IMGT/3Dstructure-DB) of the ▶ IMGT®
information system (Ehrenmann et al. 2010).
Groove Domain
Cross-References
▶ Groove (G) Domain
▶ IMGT Collier de Perles
▶ IMGT Unique Numbering
▶ IMGT ® Information System
▶ IMGT-ONTOLOGY Ground Substance G
▶ MH Superfamily (MhSF)
▶ Variable (V) Domain ▶ Extracellular Matrix
References
Grounding
Ehrenmann F, Kaas Q, Lefranc M-P (2010) IMGT/3Dstructure-
DB and IMGT/DomainGapAlign: a database and a tool for ▶ Entity Mention Normalization
immunoglobulins or antibodies, T cell receptors, MHC, IgSF
and MhcSF. Nucleic Acids Res 38:D301–307
Kaas Q, Duprat E, Tourneur G, Lefranc M-P (2008) IMGT
standardization for molecular characterization of the T cell
receptor/peptide/MHC complexes. In: Schoenbach C, Grouping
Ranganathan S, Brusic V (eds) Immunoinformatics,
Immunomics reviews, series of springer science and business
media LLC. Springer, New York, pp 19–49 (Chap. 2) ▶ Abstraction
H
Hairpin Characteristics
Definition
codon during the translation process is located on one
of the unpaired loops in the tRNA. Two nested hairpin Half-life time (t1/2) of a drug in a tissue is the time
structures occur in RNA pseudoknots, where the loop it takes to decrease by one half from its maximum
of one structure forms part of the second stem. concentration. If the decay law is exponential with
Many ribozymes also feature hairpin structures. exponent l, i.e., obeys first-order kinetics with rate l,
dt ¼ lX, then t1=2 ¼ l . If the decay curve
i.e., dX ln 2
The self-cleaving hammerhead ribozyme contains
three stem-loops that meet in a central unpaired region looks like a sum of two exponential curves, then in
where the cleavage site lies. The hammerhead a phenomenological perspective it can be modeled as a
ribozyme’s basic secondary structure is required for solution curve to second-order kinetics or equivalently
self-cleavage activity. a chain of two first-order kinetics with exponents l and
The mRNA hairpin structure forming at the m (l > m), and then pharmacologists define a t1/2a and
ribosome binding site may control an initiation of a t1/2b (t1/2a < t1/2b) measurable from the decay curve,
translation. which are nothing but lnl2 and lnm2 , respectively.
Hairpin structures are also important in prokaryotic A semiphysiological interpretation can be found to
rho-independent transcription termination. The hairpin such kinetics, consisting of “distribution” (i.e., fast
loop forms in an mRNA strand during transcription diffusion) followed by slow elimination (renal, or by
and causes the RNA polymerase to become dissociated binding to plasma proteins). Sometimes are also
from the DNA template strand. This process is known mentioned three half-life times, t1/2a, t1/2b, and t1/2g
as rho-independent or intrinsic termination, and the when three consecutive episodes are clearly distin-
sequences involved are called terminator sequences. guishable in the decay curve.
References HAM/TSP
Svoboda P, Di Cara A (2006) Hairpin RNA: a secondary
structure of primary importance. Cell Mol Life Sci ▶ Human T-Lymphotropic Virus Type-I-associated
63(7–8):901–908 Myelopathytropical Spastic Paraparesis
HapMap 877 H
Cross-References
Hamartoma
▶ Explanation, Evolutionary
Barbara J. Davis
Section of Pathology, Tufts Cummings School
of Veterinary Medicine Biomedical Sciences, References
North Grafton, MA, USA
Frank SA (1998) The foundations of social evolution. Princeton
University Press, Princeton
Hamilton WD (1964) The genetical evolution of social behav-
Definition iour. J Theor Biol 7:1–52
Van Veelen M (2007) Hamilton’s missing link. J Theor Biol
Hartoma is a disorganized benign mass of 246(3):551–554
tissue found in the particular site at which it
develops and, therefore, considered a developmental
malformation.
Haplotype Map H
Cross-References ▶ HapMap
▶ Cancer Pathology
HapMap
Jingky Lozano-K€uhne
Hamilton’s Rule Department of Public Health, University of Oxford,
Oxford, UK
Philippe Huneman
Institut d’Histoire et de Philosophie (IHPST), des
Sciences et des Techniques, Université Paris 1 Synonyms
Panthéon-Sorbonne, Paris, France
Haplotype map; International HapMap project
Definition
Definition
Due to Hamilton (1964), the rule concerns the fix-
ation by natural selection of an altruistic behavior: The shortened term HapMap (from Haplotype Map) is
an act with cost c for its actor and benefit b for its generally used to refer to the International HapMap
beneficiary will evolve if and only if c < br, where project that aims to find genes affecting health, disease,
r is the ▶ relatedness of the beneficiary to the focal and responses to medications and environmental
individual. C and b are measured in terms of factors. The information produced by the project is
▶ fitness. Originally supposed to underpin the pro- stored in the HapMap database and made freely
cess of kin selection, it proved to be the general available to the public (Thorisson et al. 2005).
rule for the evolution of cooperation, given that
many cases where cooperation emerges among
non-kin behave according to such rule; the main
reason is that relatedness as such measures a statis-
References
tical association between individuals rather than
Thorisson GA, Smith AV, Krishnan L, Stein LD (2005) The
a degree of kinship, even if the latter yields ipso International HapMap Project Web site. Genome Res
facto an association. 15:1591–1593
H 878 Hardy-Weinberg Law
0.8
0.6
Pr(T>t)
0.4
0.2
H
ALL
AML Low Risk
0.0 AML High Risk
1.0 chemotherapy
chemotherapy + radiotherapy
0.8
0.6
Pr(T>t)
0.4
0.2
0.0
Hazard Ratios,
Fig. 2 Survival for the 0 500 1000 1500 2000 2500 3000
90 gastric cancer patients t (days)
H 880 Health Informatics
References References
Copelan EA, Biggs JC, Thompson JM, Crilley P, Szer J, Coiera E (2003) Guide to health informatics. Hodder, London
Klein JP, Kapoor N, Avalos BR, Cunningham I,
Atkinson K, Downs K, Harmon GS, Daly MB, Brodsky I,
Bulova SI, Tutschka PJ (1991) Treatment for acute myelo-
cytic leukemia with allogeneic bone marrow transplantation
following preparation with BuCy2. Blood 3(1):838–843 Helper T Cell 1 Response
Stablein DM, Koutrouvelis IA (1985) A two-sample test sensi-
tive to crossing hazards in uncensored and singly censored
data. Biometrics 41(3):643652 ▶ Th1 Response
Proteins involved in
DNA metabolism HP1 family proteins (H3K9me)
Polycomb family proteins (H3K27me)
Heterochromatin
Spreading
Heterochromatin
Histone
modifying
protein Euchromatin
Similarly, various combinations of histone modifica- which convert neighboring euchromatic nucleosome to
tions in euchromatin attract various kinds of proteins that heterochromatic state (Fig. 2). To avoid the uncontrolled
regulate transcription, resulting in wide range of tran- silencing of the gene expression by spreading of hetero-
scriptional states, from silent state to active state. chromatin, the “boundaries” between euchromatin and
One characteristic of heterochromatin is “spreading”; heterochromatin should be strictly regulated.
heterochromatin can autonomously spread into neigh- Heterochromatin and euchromatin are determined
boring euchromatic region, resulting in ▶ position effect by histone modification. Therefore, primary determi-
variegation of neighboring genes. The exact mechanism nant of the boundaries would be the competition
of spreading is not clear yet, but a simple model is widely between the heterochromatic modifying enzymes
accepted that heterochromatin protein recruits histone- and the euchromatic ones. Indeed, the competitive
modifying enzymes via protein-protein interaction, determination mechanism is observed at the boundary
Heterochromatin and Euchromatin 883 H
Heterochromatin and
Euchromatin, Histone Chromatin
modifying remodeling
Fig. 3 Boundary elements/
protein factor
barrier insulators
Heterochromatin
TFIIIC
CTCF Euchromatin
BEAF
Boundary element
(Barrier insulator) H
(Vogelmann et al. 2011). In addition, tRNA genes as Sun JQ, Hatanaka A, Oki M (2011) Boundaries of transcription-
well as the boundary proteins including TFIIIC form ally silent chromatin in Saccharomyces cerevisiae. Genes
Genet Syst 86:73–81
cluster in nucleus (Noma et al. 2006). The association Vogelmann J, Valeri A, Guillou E, Cuvier O, Nollmann M
with nuclear structure or self-clustering of the (2011) Roles of chromatin insulator proteins in higher-order
boundary elements results in the formation of chro- chromatin organization and transcription regulation. Nucleus
matin loop, and this chromatin loop defines silent 2:358–369
heterochromatic domain and active euchromatic
domain of the genome. Therefore, the boundary
elements regulate not only the boundary between Heterochromatin Protein 1
heterochromatin and euchromatin but also domain
structure of chromatin in the nucleus, which might Yota Murakami
be important for regulation of whole genome orga- Department of Chemistry, Hokkaido University,
nization (Vogelmann et al. 2011) Fig. 4. Sapporo, Japan
Synonyms
Cross-References
HP1
▶ Centromere
▶ Chromatin
▶ Euchromatin Definition
▶ Heterochromatin
▶ Heterochromatin Protein 1 Heterochromatin protein 1 (HP1) is widely
▶ Histone Post-translational Modification to conserved protein that localized at heterochromatin.
Nucleosome Structural Change HP1 is a main component of constitutive heterochroma-
▶ Nuclear Pore tin that localizes on repeated sequences including
▶ Nucleosomes centromere and telomere. HP1 contains two distinct
▶ Polycomb Complexes domains, chromo domain and chromo-shadow domain.
▶ Position Effect Variegation Chromo domain of HP1 specifically recognizes di- or tri-
▶ RNA Polymerase methylated lysine 9 of histone H3. Chromo-shadow
domain promotes dimerization of HP1, which is thought
to be responsible for the packaging of nucleosomes in
heterochromatin. In addition, many proteins interact
References with HP1 through chromo-shadow domain (Grewal and
Jia 2007). Those proteins play various roles in the
Grewal SI, Jia S (2007) Heterochromatin revisited. Nat Rev
Genet 8:35–46
regulation of heterochromatin function (Fanti and
Ishii K, Arib G, Lin C, Van Houwe G, Laemmli UK Pimpinelli 2008).
(2002) Chromatin boundaries in budding yeast: the nuclear
pore connection. Cell 109:551–562
Kimura A, Umehara T, Horikoshi M (2002) Chromosomal gradi-
ent of histone acetylation established by Sas2p and Sir2p func-
Cross-References
tions as a shield against gene silencing. Nat Genet 32:370–377
Lunyak VV (2008) Boundaries. Boundaries. . .Boundaries??? ▶ Heterochromatin and Euchromatin
Curr Opin Cell Biol 20:281–287
Noma KI, Cam HP, Maraia R, Grewal SIS (2006) A role for
TFIIIC transcription factor complex in genome organization.
Cell 125:859–872
Scott K, Merrett S, Willard H (2006) A heterochromatin barrier References
partitions the fission yeast centromere into discrete chroma-
tin domains. Curr Biol 16:119–129 Fanti L, Pimpinelli S (2008) HP1: a functionally multifaceted
Shimada A, Murakami Y (2010) Dynamic regulation of hetero- protein. Curr Opin Genet Dev 18:169–174
chromatin function via phosphorylation of HP1-family pro- Grewal SI, Jia S (2007) Heterochromatin revisited. Nat Rev
teins. Epigenetics 5:30–33 Genet 8:35–46
Heuristic Search 885 H
The common disadvantages of heuristic optimiza-
Heuristic Optimization tion algorithms are as follows:
1. Not absolute optimal solution: Heuristic algorithms
Feng-Sheng Wang and Li-Hsunan Chen cannot guarantee to find the optimal solution.
Department of Chemical Engineering, National Chung 2. Uncertainty: The time required for finding a “near-
Cheng University, Chiayi, Taiwan optimal” solution can be large in an unlucky case.
Definition Cross-References
Tatsuhiko Someya1, Nobukazu Nameki2 and Marie Lisandra Zepeda-Mendoza and Osbaldo
Gota Kawai3 Resendis-Antonio
1
University of Tsukuba, Tsukuba, Japan Center for Genomics Sciences-UNAM, Universidad
2
Gunma University, Kiryu, Japan Nacional Autónoma de México, Cuernavaca,
3
Department of Life and Environmental Sciences, Morelos, Mexico
Chiba Institute of Technology, Narashino,
Chiba, Japan
Synonyms
Definition Definition
Escherichia coli Hfq is a 102 residues protein Cluster analysis consists to classify a set of
that is first identified as a host factor of the RNA objects (observations, individuals, cases) into sub-
phage Qb (Chao and Vogel 2010). Hfq orthologous sets, called clusters, such that they have similar
proteins are conserved among approximately half of characteristics or properties. There are different
all sequenced Gram-positive and Gram-negative bacte- ways to define the similarities among objects or
ria (Brennan and Link 2007). Hfq protein forms variables through the use of different metrics.
homohexameric ring and binds to poly(A) and single- Some of them are as follows:
stranded AU-rich RNAs at two distinct binding surfaces • The single-linkage clustering, or nearest neighbor
(Brennan and Link 2007). In Gram-negative bacteria, clustering, takes into account the shortest distance
Hfq promotes base-pairing between sRNA and its target of the distances between the elements of each clus-
mRNA by inducing the RNA structural changes acting ter. This is one of the simplest methods.
as an RNA chaperone (Jousselin et al. 2009). The • The complete linkage clustering, or farthest neigh-
sRNA–mRNA interactions are known to regulate the bor clustering, takes the longest distance between
expression of several genes (Jousselin et al. 2009). the elements of each cluster.
Hierarchical Structure 887 H
• The average linkage clustering takes the mean of
the distances between the elements of each cluster. Hierarchical Structure
The merged clusters are the ones with the minimum
mean distance. Marie Lisandra Zepeda-Mendoza and Osbaldo
There are a variety of clustering algorithms; one Resendis-Antonio
of them is the agglomerative hierarchical clustering. Center for Genomics Sciences-UNAM, Universidad
This clustering method helps us to represent Nacional Autónoma de México, Cuernavaca, Morelos,
graphically the results through a dendogram. The Mexico
dendogram has a tree structure that consists of the
root and the leaves; the root is the cluster that has all
the observations, and the leaves are individual obser- Synonyms
vations. The agglomerative hierarchical clustering
starts with the individual observations and succes- Hierarchical modularity; Hierarchical organization
sively fuses the clusters that are closer together
(the most similar ones).
Definition H
Cross-References A hierarchical structure in a network context is
characterized by being a topological structure in
▶ Modules, Identification Methods and Biological which nodes are placed in different layers; and in
Function these layers there can be small and highly connected
modules. The nodes in one layer collect influences
from each other and from the nodes of the superior
References layer, while also being able to influence nodes of the
inferior layer.
Jain AK, Dubes RC (1998) Algorithms for clustering data. The transcriptional regulatory network of
Prentice Hall, Englewood Cliffs
Escherichia coli shows a clear example of this view
Jain AK, Murthy MN, Flynn PJ (1999) Data clustering: a review.
ACM Comput Rev 31(3):264–323 of hierarchical structure. The nodes are organized in
layers where every node of one layer can receive
inputs from nodes in upper layers and can be linked
to nodes in lower layers (Resendis-Antonio et al.
2005).
Hierarchical Models In metabolic networks, the high size-independent
clustering coefficient (which is evidence for modu-
▶ Mixed and Multi-Level Models larity) and the power law degree distribution (which
supports the scale-free model that would rule out
modular topology) pose an apparent contradiction
that can be solved by proposing the existence
of metabolic hierarchical modules (Ravasz et al.
Hierarchical Modularity 2002). This model presents a C(k)k1, and this
property can be used as a quantitative indicator of
▶ Hierarchical Structure hierarchy.
To better understand this model, imagine a starting
point of a small cluster of four densely connected
nodes. Next generate three replicas of the starting
module and connect them to the central node of the
Hierarchical Organization first starting cluster, obtaining a large 16-node module
made of 4 smaller modules. Then generate 3 replicas of
▶ Hierarchical Structure this 16-node module and connect the peripheral nodes
H 888 Hierarchy
▶ Hierarchy
▶ Modules, Identification Methods and Biological
Function Highly Structured System
▶ Module Network
▶ Complex System
References
High-Performance Computing,
Hierarchy Structural Biology
Shihua Zhang
Piercarlo Dondi and Luca Lombardi
National Center for Mathematics and Interdisciplinary
Department of Computer Engineering and Systems
Sciences, Academy of Mathematics and Systems
Science, University of Pavia, Pavia, Italy
Science, Chinese Academy of Sciences, Beijing, China
Synonyms
Synonyms
Parallel computing
Hierarchical structure
Definition
Definition
High Performance Computing (HPC) refers to
Hierarchy is defined as an arrangement of items or
technologies used for implementing systems able to
simply an ordered set. In networks, hierarchy describes
execute time expensive elaborations and to manage
the hierarchical relationship among nodes (Ravasz
a huge amount of data in a small amount of time.
et al. 2002).
HPC solutions are commonly exploited in different
scientific fields that require the solution of complex
mathematical models, like climatology, physics,
References medicine, or biology. One of the most recent innova-
tions, which presents a good compromise between
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL
(2002) Hierarchical organization of modularity in metabolic hardware cost and performances, is the use of the
networks. Science 297(5586):1551–1555 ▶ GPU for parallel computation. This technology
High-Performance Computing, Structural Biology 889 H
High-Performance Serial Execution
Computing, Structural
Biology, Fig. 1 Serial versus
parallel computation, the
efficiency of parallelization
depends on the relative portion Op1 Op2 Op3 Op4 Op5
of parallelizable operations
with respect to the total
(Eq. 3). Not parallelizable Parallelizable Operations Not parallelizable
Operation Operation
Parallel Execution
Op2
H
Op1 Op3 Op5
Op4
supplies promising results in simulation and modeling processors must operate independently and have
of biological systems and in real-time medical a workload equally distributed between them; the
analysis. computational time has the priority, time needed for
the communications; and the synchronization between
processors and for data transfer has to be reduced to
Characteristics the minimum (Culler et al. 1998). Figure 1 shows an
example of serial and parallel execution of a simple
Parallel Computing algorithm.
HPC (▶ Large-Scale and High-Performance Comput- Nearly all situations in which a parallel
ing) comes from the need of more and more great implementation is useful can be reduced to two cases: a
computational power to elaborate complex mathemat- general case in which different independent instructions
ical models and to evaluate new scientific theories. can be executed on different data, or a single operation
Technical and physical constraints limit the maximum that must be executed on a large amount of data (e.g.,
speed reachable by a single CPU, so all HPC solutions meteorological analysis – each unit examines in the same
involve the use of parallel computing resources, in way a small portion of a geographic area) (Hord 1998).
which multiple processing units cooperate to complete In most cases, biological and medical simulations fall
a task in a time as small as possible. Not all types of into the latter category.
problems can be parallelized (e.g., a computation in The performance of a parallel algorithm are usually
which each step depends on the outcomes of the measured by two related indexes: the speedup (Eq. 1),
previous one) and also if the parallelization is possible, that is, the ratio between the execution time with one
some conditions must be satisfied to work properly. processor and that with N processors, and the efficiency
It is crucial to redesign the program (the so-called (Eq. 2), that is, the ratio between the speedup and the
porting of the application) in a parallel way: the number of processors:
H 890 High-Performance Computing, Structural Biology
1500
1000
Geforce GTX280
750
High-Performance Computing, Structural Biology, Fig. 2 Growing rate of computational power for CPU and GPU in terms of
single precision floating point operations per second – based on data presented in Kirk and Hwu (2010)
References
Hill Equation
Culler D, Singh JP, Gupta A (1998) Parallel computer architec-
ture: a hardware/software approach. Morgan Kaufmann,
Pramod R. Somvanshi and Kareenhalli V. Venkatesh
San Francisco
Hardwick J (2009) Embedded Tesla-using CUDA and Tesla in Department of Chemical Engineering, Indian Institute
a medical device. In: GTC09, GPU technology conference of Technology Bombay, Powai, Mumbai,
2009, 30 Sept–2 Oct, San Josè Maharashtra, India
Hord RM (1998) Understanding parallel supercomputing.
Piscataway, IEEE
Kirk D, Hwu W (2010) Programming massively parallel
processors: a hands-on approach. Morgan Kaufmann, Synonyms
Burlington
Stone JE, Phillips JC, Freddolino PL, Hardy DJ, Trabuco LG,
Hill function; Hill kinetics
Schulten K (2007) Accelerating molecular modeling appli-
cations with graphics processors. J Comput Chem
28:2618–2640
Stone JE, Saam J, Hardy DJ, Vandivort KL, Hwu WW, Definition
Schulten K (2009) High performance computation and
interactive display of molecular orbitals on GPUs and
multi-core CPUs. In: 2nd workshop on general-purpose Several molecular interactions in cellular systems
processing on graphics processing units, ACM international exhibit sigmoidal response curve to variations in the
conference proceeding series, vol 383. ACM, New York, input concentrations. Such a response curve is
pp 9–18
typically represented by Hill equation as given below.
Munshi A, Gaster B, Mattson TG, Fung J, Ginsburg D (2011)
OpenCL programming guide. Addison-Wesley Profes-
sional, 1st edition, July 2011, Boston I nH
Y¼ (1)
K0:5 þ I nH
nH
Fractional Response
0.6
0.5
0.4
0.3
0.2
K0.5
0.1
0
0 1 2 3 4 5 6 7 8 9 10
Input Concentration in Arbitrary Units (AU)
3 Input Fractional
in mM response
2.5
I Y
0.62 0.1 Slope = ηH =1.89
2
1.65 0.4
1− Y
3.25 0.75
1
ln
6.6 0.9
0.5
0
−0.5 0 0.5 1 1.5 2
−0.5 ln I
−1
Y-intercept = − ηH ln K0.5 = −1.264
−1.5 K0.5 = 1.95 mM
Hill Equation, −2
Fig. 2 Graphical evaluation
of parameters of Hill equation −2.5
Hill equation (Eq. 1). The linearized form of Hill estimate the half-saturation constant (K0.5) (Covel
equation is given by: 1970). Figure 2 shows an example to demonstrate the
graphical evaluation of these parameters.
Y
ln ¼ nH ln I nH ln K0:5 (9)
1Y Hill Coefficeint and Cooperativity
Hill coefficient provides the measure of cooperativity
By plotting the LHS of the equation against ln I one that can be quantified based on the steepness of the
can obtain the Hill coefficient (nH ) as the slope of the binding curve saturation (Goldbeter and Dupont 1990).
curve and the intercept of Y-axis can be used to The measure of the steepness of the curve is captured
Histone Chaperones 895 H
by Hill coefficient, which depicts the variation in the Hill AV (1913) The combinations of haemoglobin with oxygen
response curves from hyperbolic to sigmoidal. and carbon monoxide. Biochem J 7:471–480
Klipp E, Herwig R, Kowald A, Wierling C, Lehrach H (2005)
The Hill coefficient is computed based on the fold Systems biology in practice concepts, implementation and
change in input stimuli required to take a response application. WILEY-VCH, Weinheim
from 10% activation to 90% activation. Koshland DE (1987) Switches, thresholds and ultrasensitivity.
Trends Biochem Sci 12(6):225–229
Vinod PK, Venkatesh KV (2008) Quantification of signaling
log ð81Þ networks. J Indian Inst Sci 88:1
nH ¼ (10)
Weiss JN (1997) The Hill equation revisited: uses and misuses.
log II9010 FASEB J 11:835–841
reactions in eukaryotes. Several lines of evidence change have been studied as separate research fields,
have shown that histone chaperones play critical the findings of these two fields are being integrated into
roles in transcription, replication, and DNA repair a comprehensive picture as the molecular mechanisms
(Eitoku et al. 2008). linking histone PTMs to nucleosome structural change
continue to be discovered.
In 1964, Murray reported methylation of histones in
Cross-References the cell. This was probably the first report of histone
modification. Soon after that, the Allfrey group revealed
▶ Histone Post-translational Modification to that histones are subjected to acetylation and methylation
Nucleosome Structural Change after the completion of the histone polypeptide chain,
and that there is a correlation between histone acetylation
and transcriptional activation (Allfrey et al. 1964).
References Although this discovery suggested the connection
between histone post-translational modifications
Eitoku M, Sato L, Senda T, Horikoshi M (2008) Histone (PTMs) and nuclear events, the molecular mechanism
chaperones: 30 years from isolation to elucidation of
underlying the observed correlation was at that time
the mechanism of nucleosome assembly and disassembly.
Cell Mol Life Sci 65:414–444 unknown due to lack of a sufficient knowledge of the
Wolffe AP (1998) Chromatin, structure function, 3rd edn. chromatin template and biological signaling in the
Academic, San Diego nucleus. It took more than 40 years of effort for
researchers to begin to understand the molecular mech-
anisms connecting histone PTMs and transcription acti-
Histone Covalent Modification vation; for these breakthroughs to take place, substantial
advances were required in three research fields, the
▶ Post-translational Modifications, Histone structure of the chromatin template, factors involved in
nucleosome structural change, and nuclear signaling
with histone PTMs.
In 1974, Kornberg discovered the nucleosome struc-
Histone Modification ture (Kornberg and Lorch 1999). The nucleosome core
particle was considered to be a complex of the histone
▶ Post-translational Modifications, Histone octamer, which comprises two copies of each canonical
histone (H2A, H2B, H3, and H4), and DNA. As was
easily recognized from the tertiary structure of the nucle-
osome, the nucleosome structure inhibits enzyme reac-
Histone Post-translational Modification tions with DNA such as transcription, replication, and
to Nucleosome Structural Change DNA repair due to tight interactions between DNA and
histone proteins. Therefore, the nucleosome structure
Naruhiko Adachi1 and Toshiya Senda2 must be disassembled to initiate these reactions. On the
1
Structure-guided Drug Development Project, JBIC other hand, the nucleosome structure is necessary to
Research Institute, Japan Biological Informatics stably maintain genetic information in the nucleus. To
Consortium, Tokyo, Japan resolve the functional conflicts of the nucleosome, regu-
2
Biomedicinal Information Research Centre (BIRC), lation of nucleosome assembly and disassembly is
National Institute of Advanced Industrial Science and required.
Technology (AIST), Tokyo, Japan Two types of factors involved in the
nucleosome structural change, histone chaperones
and ATP-dependent nucleosome-remodeling factors,
Characteristics have so far been identified. Nucleoplasmin, which is
categorized as a histone chaperone, was the first factor
Although histone post-translational modification found to have a nucleosome structural change activity
(PTM) and factors involved in nucleosome structural (Wolffe 1998; Eitoku et al. 2008). Although many
Histone Post-translational Modification to Nucleosome Structural Change 897 H
histone chaperones have been found since then, their identified, as summarized in Fig. 1 (Allis et al. 2006).
biological significance had remained elusive until the These findings established the notion that histone
functional identification of the histone chaperone PTMs are recognized by their corresponding recognition
CIA/Asf1 (Eitoku et al. 2008). Genetic, biological, domains, which are likely to function as adaptor domains
and biochemical studies showed that CIA/Asf1 is linking histone PTMs and chromatin factors, such as
involved in various nuclear events such as transcrip- ATP-dependent nucleosome-remodeling factors and his-
tion, replication, and DNA repair through nucleosome tone chaperones.
structural change. Indeed, ATP-dependent nucleosome-remodeling
The first-identified ATP-dependent nucleosome- factors contain histone PTM recognition domains.
remodeling factor, the SWI/SNF complex, was isolated For example, the Rsc4 subunit of RSC contains tandem
as a complex that contained the gene product of swi2/ bromodomains that physically interact with acetylated
snf2. The swi2/snf2 gene was initially identified from Lys14 of histone H3 (H3-K14) on the histone tail
mutant yeast strains that showed the SWI and SNF region, suggesting a relationship between histone acet-
phenotypes. Since these phenotypes were not observed ylation and the activity of RSC. Genetic and biological
in mutant yeast strains harboring destabilized chromatin analyses have suggested that RSC is involved in
structures, the swi2/snf2 gene was considered to be site-specific nucleosome remodeling in transcription H
functionally relevant to the chromatin structure. in response to histone PTMs including acetylated
A purified protein complex including the swi2/snf2 H3-K14. The connection between acetylation of his-
gene product showed ATP-dependent nucleosome- tone proteins and nucleosome structural change seems
remodeling activity (Turner 2001; Elgin and Workman to be mediated by the tandem bromodomains in the
2001). After this finding, several ATP-dependent nucle- Rsc4 subunit.
osome-remodeling factors such as NURF, RSC, ACF, In contrast with ATP-dependent nucleosome-
CHRAC, NuRD/NURD, and INO80 were isolated. Bio- remodeling factors, no histone chaperones have histone
chemical and biological studies have shown that ATP- PTM recognition domains in the molecule. The
dependent nucleosome-remodeling factors adjust the Horikoshi group, however, discovered physical and
position of nucleosomes and are involved in nuclear genetic interactions between histone chaperone CIA/
events such as transcription and DNA replication Asf1 and double bromodomains in the general transcrip-
(Eberharter and Becker 2004; Elgin and Workman tion initiation factor TFIID (Chimura et al. 2002),
2001). leading to the idea that histone acetylation could
The last critical advance is attributed to functional regulate the function of the histone chaperone CIA/Asf1.
and mechanistic studies of histone PTMs. The discov- A molecular model that connects histone PTMs and
ery of a histone acetyltransferase and the subsequent transcription activation through nucleosome structural
identification of several histone modification enzymes change by histone chaperone CIA/Asf1 was proposed
accelerated studies of histone PTMs and their on the basis of two crystal structures, CIA/Asf1–H3–H4
functions (Allis et al. 2006). Since some transcriptional and CIA/Asf1–double-bromodomain complexes. Struc-
coactivators contain a histone acetyltransferase domain/ ture-based biochemical, genetic and molecular biologi-
subunit, the connection between histone PTMs and cal analyses have suggested that acetylated histones
nucleosome structural change that is required for tran- recruit histone chaperone CIA/Asf1 via the double
scription activation came to be recognized. It was pro- bromodomain in the TFIID complex to a promoter
posed that histone PTMs themselves affect the structure region, resulting in histone eviction around the promoter
of chromatin through changing the electrostatic and/or site (Fig. 2) (Akai et al. 2010).
physical properties of nucleosomes. However, this The correlation between histone PTMs and transcrip-
hypothesis seemed to be insufficient to explain all nucle- tion activation (through nucleosome structural change)
osome structural changes required for transcriptional was discovered about 45 years ago. In the past 45 years,
activation. The situation was advanced by the structural structural details of chromatin, the signaling system with
and functional analysis of bromodomains. These studies histone PTMs, and the structural and functional relation-
revealed that bromodomains recognize acetylated ship of histone chaperones and ATP-dependent chroma-
lysines on histone proteins. Following this finding, tin-remodeling factors have been revealed. These
several histone PTM recognition domains have been findings have begun to be integrated such that the
H 898 Histone Post-translational Modification to Nucleosome Structural Change
Histone Post-translational Modification to Nucleosome nucleosome-remodeling factors contain a histone PTM recogni-
Structural Change, Fig. 1 Histone PTM recognition domains. tion domain in the complex. (The relationships are shown by
Histone acetylation (red), methylation (cyan), and phosphoryla- thick purple lines.) ATP-dependent nucleosome-remodeling
tion (yellow) recognition domains are shown in blue, green, and factors are shown in purple with labels. (Names of the subunit
orange, respectively. The names of the domains are labeled. containing a histone PTM recognition domain are given in
Histone chaperones interacting with these histone PTM recog- parentheses.) No recognition domains have been reported for
nition domains are shown in red. Usually ATP-dependent histone ubiquitination (green) and arginine methylation (cyan)
Histone Post-translational Modification to Nucleosome Biochemical, genetic, and structural analyses indicate that the
Structural Change, Fig. 2 A schematic representation of the double bromodomain of TFIID recruits CIA to specific promoter
hi-MOST model (linking the biological signaling from histone regions. Recruited CIA evicts histones and promotes RNA poly-
modifications to structural change of the nucleosome). merase II entry. BrD bromodomain, Ac acetylation
connection between histone PTMs and the nucleosome and DNA dissociation from the nucleosome – remain
structural change can be reasonably explained at the elusive. In addition, much is unknown about the signal-
molecular level. However, our understanding of these ing mechanism with histone PTMs. Although the histone
molecular processes is still limited. For example, the code hypothesis was proposed to explain the signaling
details of the molecular mechanism for nucleosome dis- mechanism with histone PTMs (Allis et al. 2006), the
assembly by histone chaperones – such as H2A–H2B hypothesis is too simple to explain the results of several
Histopathology 899 H
mutational analyses of the histone tail region. In order to
understand the whole picture of histone PTM signaling, Histones
a macroscopic theory that considers the whole network
of histone PTM signaling is required. A combination of Vani Brahmachari and Shruti Jain
experimental and theoretical approaches should lead to Dr. B. R. Ambedkar Center for Biomedical Research,
a comprehensive understanding of molecular processes University of Delhi, Delhi, India
between histone modification and various nuclear events
including nucleosome structural change.
Synonyms
▶ Histones
▶ Nucleosome Structure Definition
▶ Nucleosomes
▶ Post-translational Modifications Histones are highly basic proteins with molecular H
weights ranging from 11 to 23 kDa and form the
basic unit of ▶ nucleosomes. There are four core
References histones named H2A, H2B, H3, and H4; two subunits
of each interact to form the histone octamer. Histone
Akai Y, Adachi N, Hayashi Y, Eitoku M, Sano N, Natsume R, H1 is called the linker histone as it binds to the DNA
Kudo N, Tanokura M, Senda T, Horikoshi M (2010)
Structure of the histone chaperone CIA/ASF1-double
between two nucleosomes. The histone-DNA complex
bromodomain complex linking histone modifications and brings about the first level of compaction of the DNA
site-specific histone eviction. Proc Natl Acad Sci USA in the nucleus. The H1, H2A, and H2B are rich in
107:8153–8158 lysine amino acid, whereas H3 and H4 are rich in
Allfrey VG, Faulkner R, Mirsky AE (1964) Acetylation and
arginine. The amino-terminal tails of these proteins
methylation of histones and their possible role in the
regulation of RNA synthesis. Proc Natl Acad Sci USA extend beyond the nucleosome and, hence, are acces-
51:786–794 sible for covalent modification, while the octamer is
Allis CD, Jenuwein T, Reinberg D, Caparros M-L (2006) Epige- bound to DNA and participates in the interaction with
netics, 1st edn. Cold Spring Harbor Laboratory Press,
various other gene regulatory proteins.
New York
Chimura T, Kuzuhara T, Horikoshi M (2002) Identification and
characterization of CIA/ASF1 as an interactor of
bromodomains associated with TFIID. Proc Natl Acad Sci
Cross-References
USA 99:9334–9339
Eberharter A, Becker PB (2004) ATP-dependent nucleosome ▶ Epigenetics
remodeling: factors and functions. J Cell Sci 117: ▶ Nucleosomes
3707–3711
Eitoku M, Sato L, Senda T, Horikoshi M (2008) Histone
chaperones: 30 years from isolation to elucidation of the
mechanism of nucleosome assembly and disassembly. Cell Histopathology
Mol Life Sci 65:414–444
Elgin SCR, Workman JL (2001) Chromatin structure and gene
expression (Frontiers in molecular biology), 2nd edn. Oxford Jingky Lozano-K€uhne
University Press, Oxford Department of Public Health, University of Oxford,
Kornberg RD, Lorch Y (1999) Twenty-five years of the nucleo- Oxford, UK
some, fundamental particle of the eukaryote chromosome.
Cell 98:285–294
Turner BM (2001) Chromatin and gene regulation: molecular Definition
mechanisms in epigenetics, 1st edn. Wiley-Blackwell,
Hoboken
Wolffe AP (1998) Chromatin, structure function, 3rd edn. Histopathology is the study of microscopic structures
Academic, San Diego of diseased tissues (Crissman et al. 2004).
H 900 Holism
Holm’s Method
Holm-Bonferroni Method
Winston Haynes
Seattle Children’s Research Institute, Seatlle, ▶ Holm’s Method
WA, USA
Synonyms
Holobiont
Holm’s procedure; Holm-Bonferroni method; Sequen-
tially rejective Bonferroni test ▶ Microbiome
Hopf Bifurcation 903 H
Holoenzyme Recruitment Pathway Hopf Bifurcation
Homeostasis
Synonyms
▶ Life Span, Turnover, Residence Time
Poincaré-Andronov-Hopf bifurcation
▶ Lymphocyte Population Kinetics
Definition
dX
¼ F ðXÞ: (1)
Homogeneous Structures dt
@ F ðXÞ
Homologous Recombination J¼ : (2)
@ X X¼X
Myong-Hee Sung
Laboratory of Receptor Biology and Gene Expression, Suppose that all eigenvalues of J have negative real
National Cancer Institute, National Institutes of parts except one conjugate nonzero purely imaginary
Health, Bethesda, MD, USA pair b. A Hopf bifurcation arises when these two
imaginary eigenvalues cross the imaginary axis because
of a variation of the system parameters (Hale and Kocak
Definition 1991; Strogatz Steven 1994; Kuznetsov 2004). In the
critical situation when all eigenvalues of J have negative
Homologous recombination is a process in which two real parts except one conjugate nonzero purely imaginary
similar or identical DNA segments are swapped, pair, the system is said to have critical parameter value.
giving rise to a newly combined sequence of DNA.
References
Cell wall Antigen 85, LAM, MmaA4, PDIM, Tfr / Tf for iron acquisition IFN-γ and TNF-α
Fc-γ Receptor Opsonized PcaA downregultes Tfr
mycobacterium Cellular FadD33, Isocitrate lyase, LipF, and ferritin
metabolism Nitrate reductase, PanC / PanD expression
Complement Ag 85C, LAM Heat-shock protein HspX Utilizes cholesterol as carbon source IFN-γ inhibits all
receptor 3 (CR3) Iron uptake Mycobactin other source of
Magnesium uptake MgtC Activate isocitrate lyase to utilize carbon by excluding
Mannose receptor ManLAM Regulation DevRS, HspR, IdeR, MprAB, WhiB3 two carbon source via the Mtb phagosome
PhoP, RelA, SigA, SigE, SigF, SigH glyoxalate shunt Pathway from recycling
Secreted proteins ESAT-6 / CFP-10 endosomes
Scavenger Unknown
Secretion system ESX-1, ESX-5 Lpd and SucB play critical roles in the
receptors
Stress protein AhpC, KatG, SodA, SodC, synthesis of acetyl-coenzyme A, which
Toxin Phospholipase C is essential for the gloxylate shunt
Unclassified Erp, HbhA, PE / PE-PGRS pathway
Phagosome (1) PtpA dephosphorylation of VPS33B, 2) Protein kinase G inhibits RNI noxR1, noxR3, peptide methionine sulfoxide reductase catalyse
maturation maturation of the mycobacterial phagosome, (3) SapM depletes the reduction of methionine sulfoxide. AhpC, Lpd, SucB and the
PI3P mycobacterial phagosomal membranes and preventing thioredoxin-like AhpD forms an antioxidant complex with NADH
phagosome-lysosome fusion, (4) ManLAM blocks phagosomal dependent peroxidase and peroxynitrite reductase activity.
maturation. ROI Lsr2 protects mycobacteria against ROI. Mycobacterial Superoxide
Ca Signalling (1) ManLAM inhibits the increase in cytosolic Ca2+ / calmodulin dismutase and Catalase degrade ROI in the phagosome.
And PI3K concentration, by blocking the association of Ca2+ / calmodulin MHC class ll Mycobacterial 19-kDa lipoprotein inhibits activation of the MHC
pathway with Pl3K and thereby prevent the recruitment of EEA1 to antigen processing class ll expression activated by IFN-γ. Intraphagosomal production
phagosomes. (2) Mycobacterium inhibits SK1 which phosphorylates of Mtb urease is responsible for disruption of MHC class ll
Sphingosine to S1P which regulates intracellular. Ca homeostasis trafficking leading to intracellular retention.
by releasing Ca from cytoplasmic organelles. Apoptosis of Two mycobacterial proteins inhibit apoptosis of infected
p38 and ERK Pathogenic mycobacteria have evolved mechanisms to prevent a the infected macrophage: secA2 and nuoG.
MAPK signalling sustained activation of the ERK1 / 2 and p38 cascades, and this macrophage
pathway accounts, at least in part, for their survival.
Adaptive immune M. tuberculosis avoids elimination by the host immune system
IFN-γ and JAK / Cells that are infected with virulent mycobacteria show response by delaying the onset of adaptive responses until the
STAT pathway decreased levels of the IFN-γ receptor, which impairs the
bacteria have established a protected niche in which they can persist
tyrosine phosphorylation of JAK1 / 2 and reduces the
permanently
905
Host–Pathogen Interactions, Fig. 2 Experimental and computational methods to study host–pathogen interactions at different
levels of biological hierarchy
the host system in response to an infection can be (h) biofilm formation to protect themselves against
countered by the pathogen as well. Thus, it becomes antibiotics and host defense mechanisms (Brogden
a complex interplay between the two species, making it et al. 2007). All of these processes are complex sys-
often difficult to predict the winner. Appreciation of tems in themselves involving several molecules
the enormity of HPIs and how they influence the out- including toxins and extensive cross talk among
come of an infection requires the study of the individ- them, some of which can be readily obtained from
ual components as one connected system (Forst 2006). KEGG and other bio-systems’ databases.
Recent developments in “omics”-scale experimental Pathogenesis of cholera by V. cholerae occurs
and computational technologies have facilitated the through a series of spatio-temporally controlled events
study of systems biology of HPIs. Figure 2 illustrates under the control of a gene cascade termed the ToxR
different levels at which HPIs are studied. regulon that encodes the virulence factors. A systems
biology study of the temporal regulation of gene expres-
HPI in Virulence Mechanisms sion in V. cholerae using high-resolution time series
Several pathogens utilize virulence factors in them to genomic profiling (Kanjilal et al. 2010) provided insights
inhibit various host functions, to achieve colonization, not only into the temporal dynamics of the ToxR regulon
immuno-evasion, or immune-suppression, through one but also identified potential new members of the process.
or more of the following: (a) adherence to host cell
involving adhesins (e.g., ManLAM in Mycobacteria) HPI for Adherence to or Entry into Host Cells
(b) invasion through invasins (e.g., collagenase pro- Pathogens must first adhere to host cells so as to colo-
duced by Clostridium histolyticum and neuraminidase nize at appropriate sites. In its simplest form, attach-
by Vibrio cholerae and Shigella dysenteriae) that dam- ment to the host cell requires two factors: a ligand and a
age host cells to facilitate growth and spread of the host cell surface receptor. Alteration of the host’s cellu-
pathogen, (c) avoidance of phagolysosome formation, lar morphology through actin polymerization or use of
(d) autophagy, (e) preventing complement activation, specialized structures in bacteria such as pili, fimbriae,
(f) nutrient acquisition through specialized receptor or capsules is an example of mechanisms used by path-
systems (e.g., Tfr/Tf in Mycobacteria), (g) quorum ogens toward this (Pizarro-Cerdá and Cossart 2006).
sensing and communication with other bacteria, and This initial phase then leads to a complex host–pathogen
Host–Pathogen Interactions 907 H
molecular cross talk that enables them to reach their Although most microorganisms can synthesize
appropriate intracellular niches, subversion of cellular organic molecules that they need, they are critically
functions, and establishment of disease. dependent on the host for a few compounds. Due to the
Endocytosis and phagocytosis are extremely abundance of these resources in the host, microorgan-
dynamic processes and involve more than 200 proteins. isms may have lost genes required for their biosynthe-
Some pathogens trigger receptors that can activate these sis. Chlamydia, for example, is entirely dependent on
processes, while some others secrete toxins that enter the host for supply of tryptophan, an essential nutrient.
the host cell and activate these processes. Specific sets However, the host has devised a defense strategy to
of interactions not only trigger specific signaling cas- limit tryptophan through IFN-g-mediated activation of
cades but also determine factors such as tissue tropism, indoleamine 2,3 dioxygenase (IDO), to catabolize
species specificity, and genetic specificity. Specific sig- L-tryptophan to n-formylkynurenine.
naling events bring about particular cellular responses
such as stimulation of innate and adaptive immunity, HPI in Critical Host Cell Pathways
growth, proliferation, survival, and apoptosis. Pathogens often hijack one or more host intracellular
An example of a systems biology study of pathogen pathways in the host, for their survival (Bhavsar et al.
entry is reported for Chlamydia pneumonia, which 2007). Salmonella and Listeria, for example, utilize the H
illustrates the complexity of HPIs during entry (Wang host cytoskeleton to enter and move within host cells.
et al. 2010). Nine functional modules that were activated S. flexneri, Yersinia spp, and M. tuberculosis provide
upon exposure to the pathogen, consisting of 135 molec- examples of inhibition of a signaling cascade (NF-kB
ular components, involved in cell adhesion, transcription, pathway), the latter having been activated in the host in
endocytosis, and receptor systems, were incorporated response to the infections through the use of Toll-like
into a network capturing known inter-pathway cross receptors and other pattern recognition receptors.
talks. The network was observed to be significantly resil- The c-Met signaling network that is mediated by the
ient since intervention at any single point alone was hepatocyte growth factor (HGF) in normal physiology
insufficient to prevent entry. Instead manipulation of to achieve mitogenesis, motogenesis, and morphogen-
a combination of three key proteins (a chemokine recep- esis gets activated by the Helicobacter pylori virulence
tor, an integrin receptor, and a platelet-derived growth factor CagA as well. However, with the latter, it is
factor) was found to inhibit pathogen’s entry. highly correlated with gastric cancer. A systems level
study based on comparative logical modeling identi-
HPI for Nutrition fied intervention points for CagA-induced but not
The human host provides an appealing ecosystem for HGF-induced c-Met signaling, which were subse-
numerous microorganisms, with a variety of adaptations, quently validated experimentally (Franke et al. 2008).
to harness the existing nutrient resources. Bacterial path-
ogens that colonize extracellular niches often face envi- HPI for Elimination of Pathogen by Host or Immune
ronments with frequently changing physical conditions Evasion by Pathogens
and nutrients. However, intracellular pathogens replicat- Exposure to a pathogen leads to layers of defense
ing in cytoplasm or phagosomal compartments encoun- responses in the host. For each level of defense, path-
ter a more congenial environment for growth. Several ogens have designed ways to evade them (Henderson
nutrient acquisition adaptations are also known in intra- and Oyston 2003). For example, countering innate
cellular bacteria (Schaible and Kaufmann 2005). immune responses can be seen in the following cases:
Iron being an essential growth factor for both host H. pylori counteract acidic environment in the stomach
and pathogen is associated with coevolution of mutual by secreting urease which increases the pH surrounding
high affinity iron uptake and retention systems. the bacterium. Some bacteria such as Streptococcus
Mycobacteria, for example, block phagosomal matu- pneumoniae have antiphagocytic substances in their sur-
ration and are present in an early phagosome and faces to inhibit phagocytosis. Catalase and superoxide
accesses host cellular iron through the host transferrin dismutase synthesized by bacteria such as H. pylori and
receptor/transferrin (Tfr/Tf) system. The host counters Staphylococci scavenge reactive oxygen intermediates.
this by Interferon-g activation, which downregulates Examples where adaptive immunity is evaded include
Tfr and ferritin levels. cleaving and inactivation of IgA through IgA1 proteases
H 908 Host–Pathogen Interactions, Mathematical Models
by Streptococci and Neisseria spp., and over-activation Brogden KA, Minion FC, Cornick N, Stanton TB, Zhang Q (eds)
of T-cells by Staphylococci to produce toxic levels of the (2007) Virulence mechanisms of bacterial pathogens,
1st edn. ASM Press
pro-inflammatory cytokines, through the use of super- Devignot S, Sapet C, Duong V, Bergon A, Rihet P, Ong S, Lorn
antigens. There are also reports about the pathogen trig- PT, Chroeung N, Ngeav S, Tolou HJ, Buchy P, Couissinier-
gering complex interactions among the host subsystems Paris P (2006) Genome-wide expression profiling deciphers
that are detrimental to the host itself. For example, the host responses altered during dengue shock syndrome and
reveals the role of innate immunity in severe dengue. PLoS
mechanisms of systemic vascular dysfunction in Dengue ONE 5(7):e11671
Shock Syndrome were correlated with interplay of innate Forst CV (2006) Host-pathogen systems biology. Drug Discov
immunity, inflammation, and host lipid metabolism Today 11(5–6):220–227
(Devignot et al. 2006). Franke R, Muller M, Wundrack N, Gilles ED, Klamt S, Kahne T,
Naumann M (2008) Host-pathogen systems biology: logical
modelling of hepatocyte growth factor and Helicobacter pylori
Application in drug and vaccine discovery induced c-Met signal transduction. BMC Syst Biol 2(1):4
Knowledge of the molecular mechanisms involved in Hartman AL, Ling L, Nichol ST, Hibberd ML (2008) Whole-
HPIs can be utilized in disease diagnosis, treatment, or genome expression profiling reveals that inhibition of host
innate immune response pathways by Ebola virus can be
prevention, in a number of ways. For example, in reversed by a single amino acid change in the VP35 protein.
Ebola virus, substitution of a single amino acid in the J Virol 82(11):5348–5358
VP35 protein is sufficient to disrupt the viral inhibition Henderson B, Oyston PCF (eds) (2003) Bacterial evasion of host
of innate immune signaling, while maintaining the immune responses, 1st edn, Advances in molecular and cellular
microbiology (2). Cambridge University Press, Cambridge
ability to replicate to wild-type levels in cell culture Kanjilal S, Citorik R, LaRocque RC, Ramoni MF,
(Hartman et al. 2008), which has led to exploration of Calderwood SB (2010) A systems biology approach to
mutant varieties as vaccine candidates. modeling Vibrio cholerae gene expression in virulence-
Specific host mechanisms are also being explored to inducing conditions. J Bacteriol 192(17):4300–4310
Pizarro-Cerdá J, Cossart P (2006) Bacterial adhesion and entry
target drug therapy, by (a) preventing exploitation of into host cells. Cell 124(4):715–727
host proteins for pathogen’s replication or (b) enhanc- Schaible UE, Kaufmann SHE (2005) A nutritive view on the
ing hosts’ natural pathogen elimination mechanisms host-pathogen interplay. Trends Microbiol 13(8):373–380
such as production of interferons, the latter is achieved Tan SL, Ganji G, Paeper B, Proll S, Katze MG (2007) Systems
biology and the host response to viral infection. Nat
through agonists of certain toll-like receptors such as Biotechnol 25(12):1383–1389
7-thia-8 oxoguanosine (TLR7 agonist) that is used for Wang A, Johnston SC, Chou J, Dean D (2010) A systemic
HCV treatment (Tan et al. 2007). network for Chlamydia pneumoniae entry into human cells.
Although systems level studies and hence applica- J Bacteriol 192(11):2809–2815
tions in drug and vaccine discovery emanating from
them are not as yet commonplace, perhaps due to dif-
ficulties in comprehensive model building and experi-
mental design, the need for studying them as whole Host–Pathogen Interactions,
systems has become firmly established. Some examples Mathematical Models
from literature are already showing the power of these
approaches, with a very high potential to translate into Sumanta Mukherjee1 and Nagasuma Chandra2
1
clinically useful applications in the coming years. Bioinformatics Centre, Indian Institute of Science,
Bangalore, Karnataka, India
2
Department of Biochemistry, Indian Institute of
Cross-References Science, Bangalore, India
▶ Host-Pathogen Systems, Target Discovery
Definition
References Systems approaches are essential to understand the
Bhavsar AP, Guttman JA, Finlay BB (2007) Manipulation of
complex web of host–pathogen interactions (HPIs)
host-cell pathways by bacterial pathogens. Nature that determine the outcome of an infection. Mathemat-
449(7164):827–834 ical modeling helps enormously to study emergent
Host–Pathogen Interactions, Mathematical Models 909 H
properties of the system, dissect the role of individual (▶ Host–Pathogen Interactions, Fig. 2), the most well
components and their interactions, to understand studied being epidemiological level models and
systems difficult to study experimentally and to predict molecular level models.
outcomes under a range of scenarios and ultimately to
identify strategies for countering the disease.
HPIs constitute typical multicomponent interacting Characteristics
systems, making model-building a daunting task.
Given that both host and pathogen are entire systems Epidemiological Models
themselves, their interaction can be viewed as a system Models in this category provide a parametric represen-
of systems. The complexity in HPIs, as in other tation of evolution of the geographical and population-
biological systems, arises through feedback and feed wide spread of an epidemic, over a period of time,
forward controls, bistability, as well as activatory and addressing questions such as (a) which population is
inhibitory mechanisms. Evaluation of system parame- more susceptible to a given disease (b) which strain of
ters is often challenging, and it is a common practice a pathogen is more likely to spread and cause disease
to derive them from experimental observations. and (c) which are the key parameters that control the
Currently available models of HPIs span across differ- epidemic, hence leading to identifying strategies for H
ent levels of hierarchy in biological organizations disease diagnosis and control.
The SIR model and its variants are the most
commonly used models in this category, in which,
the population is split into three compartments
bI g (Anderson and May 1992), which are (S (t)), suscepti-
S I R ble to disease; (I (t)), actively infected population; and
(R (t)), population recovered or not vulnerable to
Host–Pathogen Interactions, Mathematical Models, disease (Fig. 1). The infectivity rate (b) and the
Fig. 1 Compartmental view of the SIR model recovery rate (g) are both assumed to be constants.
SIR model
1
S (Susceptible population)
S / I / R population concentration
(normalized to the scale of 1)
I (Infected population)
0.8 R (Recovered population)
0.6
0.4
0.2
Host–Pathogen Interactions, Mathematical Models, populations. The population axis has been normalized between
Fig. 2 Simulation output of SIR model. The Red curve shows 0 and 1 representing (S,I,R) concentrations as a fraction of the
the time evolution of the susceptible population over time. Blue population
and Green lines show profiles of recovered and infected
H 910 Host–Pathogen Interactions, Mathematical Models
For short-period and low-fatality infections, the Host–Pathogen Interactions, Mathematical Models,
model equations are summarized in (Eq. 1), and a Table 1 Choice of modeling strategies
typical simulation output is shown in Fig. 2. System parameter Time Modeling strategy
Continuous Continuous ODE, PDE
Equation 1 Differential equations representing Continuous Discrete Difference equation
a simple SIR model. Discrete Continuous Network model, integer
programming
dS Discrete Discrete Agent-based modeling
¼ bSI
dt
insights into the system dynamics. Boolean logic
dI functions are used for describing interactions.
¼ bSI gI When the interacting components are represented in
dt
the vector form as, x ¼ ðx1 ; x2 ; . . . ; xn Þ 2 f0; 1gn ,
the time updates defined by a set of Boolean
dR functions, time updates are represented as
¼ gI
dt xi ðt þ 1Þ ¼ fi ð
xðtÞÞ V i 2 f1 . . . ng, then the dynamics
of the network is represented by means of a transition
The extent of infectivity in the population will graph. A Boolean network with N variables has 2N
depend upon rates of infectivity, recoverability, and distinct states, each state being a set of unique
initial susceptible population. The parameter Ro combination of system variables. Formally,
(Basic Reproductive Ratio) is generally defined as ! s f0; 1gn f0; 1gn shows the synchronous
(Ro ¼ bSg ), capturing the expected number of sec- transition of the states. Boolean systems too exhibit
ondary infections arising from a single infected attractor dynamics, similar to other deterministic
individual. A number of case studies illustrate the dynamical systems. An attractor is a distinct set of
usefulness of the SIR model. Studies on the Bom- states, occurring in the time evolution of the system.
bay plague epidemic (1905–1906), the influenza If xs is an attractor point, then xs ¼ f ðxs Þ, where f is the
epidemic in an English school (1978), the Eyam Boolean time evolution rule, which implies that once
plague episode in England, all show very good attained, it remains in that state. Cyclic attractors on
correlation with predictions by their respective the other hand recur in sequence, once any of its states
SIR models (Murray 2005). is visited. A given system can have multiple attractors.
Different initial states eventually converge to one or
Modeling Interacting Networks of Biochemical the other attractors. The set of states which leads to
and Biological Species a particular attractor is collectively called the basin of
This set of mathematical models capture interactions attraction for the given attractor (Albert et al. 2008).
in and between cells at the molecular level. Different variants that are explored are the use of
These mathematical models are used to predict the synchronous evolution of states, asynchronous firing
evolution of various system parameters over time. of the state transition, and compartmentalization of
Each parameter in the model can either be continuous biological components (Kauffman 1993).
or discrete in nature. Depending on the nature of the A simplified Boolean model of phagocytosis is
parameter, different modeling strategies are employed, shown in Fig. 3, in which a resting macrophage
as illustrated in Table 1. M engulfs extracellular bacteria BE , and becomes
an infected macrophage MI , rendering extracellular
Boolean Network bacteria as intracellular bacteria BI .
Although a quantitative modeling of the system The Boolean rules are as in (Eq. 2).
dynamics provides accurate predictions, the required
data for such modeling is often not available, making Equation 2 Boolean rules representing state transi-
qualitative modeling the only feasible option. Though tion of simple phagocytosis model.
qualitative in nature, they are often sufficient to capture
the topology of the network and provide significant MI ¼ M and BE
Host–Pathogen Interactions, Mathematical Models 911 H
Host–Pathogen
Interactions, Mathematical
Models, Fig. 3 Macrophages
ingesting bacteria and Resting
becoming infected. Macrophage
Internalized bacteria
proliferate and burst out
Infected Bursting
Bacterium Macrophage Macrophage
BI ¼ ðBE and MÞ or ðBE and MI Þ ODEs in studying HPIs. The set of ordinary equations
representing the phagocytosis model (Fig. 3) are
The host immune response to a Bordetella infection, given in
studied using this approach (Thakar et al. 2007), Equation 3, where, l is the average rate of mac-
indicated that the dynamics were different for rophage production, g is the pathogen intake probabil-
the freshly infected host, reinfection of the host, ity by a resting macrophage per contact. d represents the
and fresh infection with antibody injections. A conva- death rate of a normal macrophage and d1 is the mortal- H
lescent host immune response was predicted to be ity rate of infected macrophages. gN and gI are the
much faster than in a host with antibody transfused average pathogen intake rates by normal resting
treatment, correlating well with experiments using macrophage and infected macrophages respectively. N
mice (Thakar et al. 2007). In a separate study, is the average number of pathogens thrown out into the
a 75 node host–pathogen interactome model, built system per macrophage burst, and represents the
to study tuberculosis, indicated that the propensity ingestion rate of phagocytosed pathogen.
for bacteria to persist was the highest as compared
to those for clearance or active proliferation (Raman
Equation 3 System of differential equations
et al. 2010).
representing interactions between macrophages
and bacteria.
Ordinary Differential Equations (ODES)
Ordinary differential equations serve as powerful tools dM
to build time-continuous models of any system. ¼ l gMBE dM
dt
By first principle, differential equations describe the
rate of change of value of a variable with respect to dMI
¼ gMBE d1 MI
time. The symbolic representation of a differential dt
equation is given as dBE
¼ aBE gN MBE gI MI BE þ d1 NMI
dt
dy
or y_
dx dBI
¼ gN MBE þ gI MI BE d1 NMI BI
dt
yðtþDtÞyðtÞ
In notation using limits, y ¼ dy
lim ¼ lim Dt ,
Dt0
representing the slope of the curve yðtÞ at any given Partial Differential Equations and Compartment
time t. Models (PDEs)
HPI models generally contain many parameters, Given the complexity and heterogeneity in the host
giving rise to coupled systems, as shown in the previ- system, it is often difficult to place them into a single
ous SIR model (Eq. 1). Coupled differential equations differential framework. In order to address the tempo-
are a system of ODEs, where each equation signifies ral and spatial evolution together, multi-compartment
the time evolution of one parameter. The parameters models and partial differential equation based models
are dependent on each other as specified by their are employed. In these, the entire space is subdivided
respective equations. Hence, the system of equations into well-mixed smaller compartments, each of which
needs to be solved simultaneously. A simple model has its own governing equations, while communication
from literature is described to illustrate the use of among them is allowed through interface equations.
H 912 Host–Pathogen Interactions, Mathematical Models
Host–Pathogen Systems,
Target Discovery, Interaction
Fig. 1 Types of elementary
interactions/reactions in Protein−Protein Interaction: Protein Protein
biochemical networks
Enzyme (Protein)
Metabolic Reaction: Substance Substance
Kinase (Protein)
Signal Transduction: Protein Protein
Transcr.
Poly-
Factors
merase
Gene Expression: Gene mRNA
Ribosome
Protein Expression: mRNA Protein
Co−factors
or Polymerase
Protein
Gene Activation/Inhibition: Gene
(TF)
More complex network representations include hyper- global RNA interference (RNAi), proteomics, and other
graphs (Forst et al. 2006) and rule-based descriptions platforms, are revealing important insights into pathways
(Hlavacek et al. 2006). In simple node/edge represen- and functional units that are essential for pathogen
tations, genes, RNA/DNA, proteins, and chemicals are infection, proliferation, and persistence as well as for
represented as nodes, binary interactions and reactions host defense.
are described as edges (Fig. 1). Multiple interactions Genomic screening data are analyzed in the context of
between chemicals in biochemical reactions are biochemical networks. Figure 2 describes a principle
appropriately described by a graph generalization, approach to analyze host-pathogen networks for target
a hypergraph, where an (hyper) edge can connect deconvolution. On the left side, the corresponding exper-
more than two nodes. Alternative representations iments assays are developed, the experimental data col-
require appropriate ontologies that relate biological lected and preprocessed. Data involves qualitative
concepts using a controlled dictionary (Karp 2000). sequence tags, presence/absence of proteins, as well as
Examples include network representations in KEGG quantitative values such as gene expression, Z-scores
Kanehisa et al. (2011) and Reactome (Matthews after RNAi screens, and drug concentrations. On the
et al. 2008). right side, interaction information is collected and syn-
Interactome information is readily available on thesized into a large biochemical host-pathogen network.
the Internet. The website Pathguide (http://www. The data is then analyzed in the context of biochem-
pathguide.org) provides a comprehensive list of repos- ical networks. Typical analysis protocols involve
itories and resources on protein-protein interactions, • Network biology and graph topology
metabolic pathways, signaling pathways, pathway • Network clusters, complexes, and modules
diagrams, transcription factors and gene-regulatory • Response networks
networks, protein-compound interactions, genetic • Pathway enrichment
interaction networks, as well as other interaction- Network biology was coined by Barabasi and Oltvai
based resources. 2004 and describes the graph-topological analysis of
biochemical networks (Barabasi and Oltvai 2004).
Network Analysis His research hypothesized that high connectivity of
Genome-wide studies of host-pathogen interactions proteins in protein interaction networks indicate impor-
using mRNA and microRNA (miRNA) transcriptomics, tance with respect to biological functions. Highly
Host–Pathogen Systems, Target Discovery 915 H
Genome/Literature
Experiments
information
System response
Feedback
to experiment
Network
components
Sub-network
construction Interaction in
BioSystems
Connections?
(complete, correct ...)
Host–Pathogen Systems, Target Discovery, Fig. 2 Response network analysis. Screening data is analyzed in the context of
biochemical networks
connected protein knockouts tend to be lethal for the used to determine enrichment of gene-ontology nodes
organism as tested in the case of yeast. by genes. By including quantitative data, such gene-
Network clusters, complexes, and modules are expression profiles, the Gene Set Enrichment Analysis
used to group subsets of network components (genes, method (GSEA; Subramanian et al. 2005) is capable to
proteins, compounds) into connected subgraphs. Purpose determine whether an a priori defined set of genes
of this exercise is to cluster tightly connected compounds shows statistically significant, concordant differences
which are thought to be functionally related (Aravind between two biological states (e.g., phenotypes).
2000). Predefined gene sets (aka Molecular Signature Data-
The calculation of response networks involves of the bases) were extracted, for example, from BioCarta,
superposition of local, component-based data, such as KEGG (Kanehisa et al. 2011), and Reactome path-
gene-expression values, with network information. ways. In a next step, together with biochemical net-
Response networks of a system refer to best-scored sub- work information, enriched gene sets are used as
networks in large biochemical networks responding to scaffolds to construct enriched pathways. Thus, the
specific environmental conditions measured by the network provides additional context information.
corresponding experiment (Ideker et al. 2001; Ideker
et al. 2002; Cabusora et al. 2005). Target Deconvolution
Pathway enrichment is a multistep process. It is Target deconvolution describes the retrospective iden-
based on set enrichment analysis identifying statisti- tification of targets that underlie observed phenotypic
cally significant genes that enrich a particular set. responses. In the case of host-pathogen interactions,
For example, hypergeometric distributions have been the phenotypic responses typically involve survival or
H 916 Host–Pathogen Systems, Target Discovery
y c
q q
Hough Transform,
Fig. 1 The Hough transform
duality between edge points p
and straight line parameters:
two points of a segment of the p
image space (left) and two
lines of the parameters space x m
(right)
H 918 Hough Transform, Structural Motif Retrieval
might correspond to any control point previously tool among the available heuristics that allow to esti-
established. If the object of interest is present in the mate the presence of particular structural features
image, one point in the image space will be highly within the structure of a protein.
voted; it will correspond to the reference point of the
object and will be voted once by every control point of
the object actually in the image. Characteristics
The advantages of HT are efficiency, rotation- and
translation-insensitiveness, parallelizability, noise and Systems biology benefits enormously from automatic
occlusion insensitiveness, and relative tolerance to methods aimed at sorting out meaningful information
approximate descriptions. Among its disadvantages, and overall trends from the huge amount of data about
its high computational complexity, which results in biological systems that has been, is being and will be
high time and memory requirements. Randomized collected with modern high-throughput techniques.
and deterministic HT variations help dealing with Among these methods, it is widely acknowledged
these downsides. that a substantial role in carrying out data mining
nowadays is played by protein structural comparison.
Known functional units such as structural motifs can
Cross-References be retrieved in recently discovered proteins and simi-
larities between known and new structures can be
▶ Hough Transform, Structural Motif Retrieval established. This allows inferring protein function,
explaining the role of specific sequences in biochemi-
cal pathways and biological networks, building up
References phylogenetic trees and ultimately creating new data-
base annotations, which in including the results of the
Ballard DH (1981) Generalizing the Hough transform to detect completed predictions will be helpful for the structures
arbitrary shapes. Pattern Recognit 13:111–122 that will be discovered in the future.
Duda RO, Hart PE (1972) Use of the Hough transformation
Various approaches have been developed for pro-
to detect lines and curves in pictures. Commun ACM
15:11–15 tein structure comparison, based on either distance
Hough PVC (1962) Methods and means for recognizing matrices (Taylor and Orengo 1989; Holm and Sander
complex patterns. US patent 3069654 1993), graph theory, or geometric hashing (Nussinov
and Wolfson 1991; Comin et al. 2004). The various
available algorithms are heuristics.
It is fundamental, in this context, to be acquainted
Hough Transform, Structural Motif with the notion of heuristic: An experience-based or
Retrieval intuition-guided problem-solving technique, which is
generally fast at the price of suboptimality, i.e., there is
Virginio Cantoni1,2 and Elio Mattia3 generally no theorem to fully guarantee the reliability
1
Department of Computer Engineering and of the results they provide. Heuristics are employed
Systems Science, University of Pavia, Pavia, Italy either when just no optimal approach is available at all
2
Computational Biology, KTH Royal Institute of or when, despite availability of an optimal method, the
Technology, Stockholm, Sweden heuristic approach is way faster and yields results with
3
Centre for Systems Chemistry, Rijksuniversiteit an acceptable accuracy given the inherent gain in
Groningen, Groningen, The Netherlands execution time.
The reason for the existence of multiple solutions of
the same protein structural comparison problem is that
Definition disparate inspirations lead to diverse approaches, very
often based on very different lines of reasoning.
The Hough transform can be used efficiently in the It turns out that the existence of multiple heuristics
context of protein structural comparison and motif for protein structure comparison is actually vital
retrieval, thereby constituting another fundamental and the reason for this is that it is very difficult to
Hough Transform, Structural Motif Retrieval 919 H
ρ θ θ
θ ρ
y y′
reference
point Whole circular
x x′ range of votes
with same ρ
z z′ and θ
Hough Transform, Structural Motif Retrieval, Fig. 1 The principle of applying the Hough transform to protein structure
comparison. Left: model protein; right: object protein (voting space)
H
establish quantitatively the distance from optimality of a cleaner voting space and in a better signal-to-noise
suboptimal structural comparison results. Therefore, ratio.
only when such methods yield results which are in The principle of the method is illustrated in Fig. 1.
accordance with one another, despite the fact that the On the left is the xyz space of the so-called model
nature of the calculations can be profoundly different, protein, while on the right is the x0 y0 z0 space of the
can the predictions be deemed accurate. so-called object protein. Voting is performed in the
The purpose of this entry is to illustrate a recently latter space. In order to do so, for every secondary
developed heuristic approach for assessing protein structure of the model protein (illustrated as a bold
structure similarity and for retrieving structural motifs, arrow on the left of Fig. 1), the characteristic parame-
which is inspired by a computational method imported ters rho and theta must be computed; these correspond
from the field of computer vision: the ▶ Hough to the distance of the secondary structure to a well-
transform (Mattia 2010). defined reference point, such as the geometric center of
In this method, protein similarity is determined by the protein, and the angle that the direction of the
performing a comparison between pairs of protein secondary structure forms with the segment joining
structures and calculating a comparison score. Motif the latter with the reference point. The parameters
retrieval is allowed for by letting one of the proteins in rho and theta allow to vote reference points in the
the comparison be the structural motif which is to be voting space. Knowledge of only two parameters in
searched for. The comparison score is calculated in a 3D space results in incomplete definition of the
a vote space, which generally corresponds to the coor- position of the point to vote, making voting of an entire
dinate space in which one of the proteins is described, circumference indispensable. This circumference is
by appropriately aligning couples of secondary struc- the rim of a cone centered in the object protein’s
tures of the two proteins (e.g., alpha-helices and secondary structure, as Fig. 2 illustrates. If the second-
beta-strands), one from each of the latter, and voting ary structures of the object proteins have a similar
selected reference points. Either the vote of the mostly spatial arrangement as the ones of the model protein
voted point in the voting space or other functions of the from which the rho and theta parameters are taken
votes in the voting space can be used as the final from, then, after many voting steps, i.e., when each of
comparison score. the rho and theta values from all of the secondary
The method can be easily tuned so that voting is structures of the model protein has been used to vote
performed only between secondary structures of simi- circumferences around every secondary structure of
lar type, e.g., alpha-helices with alpha-helices and the object protein, as many circumferences as the num-
beta-strands with beta-strands. Higher information ber of secondary structures in the model protein will all
content in the voting procedure always results in intersect, in the voting space, in one highly voted point,
H 920 Hough Transform, Structural Motif Retrieval
Hough Transform,
Structural Motif Retrieval,
Fig. 2 Example of a filled
voting space, in which
circumferences from different
secondary structures interfere
to create voting peaks
(two different perspectives)
which can be used as a score for the comparison. This computationally costly, to the point that it easily
will not happen whether the two proteins have different becomes too heavy to be executed in reasonable
structures, resulting in a low score. Figure 2 illustrates times for ordinary purposes. The algorithmic complex-
a sample voting space, in which successful comparison ity is in this case of O(N2), where N is the number of
results in the formation of vote peaks due to the voting votes in the vote space, which is in turn proportional to
circumferences intersecting with one another. the number of secondary structures in the model pro-
Fundamental for the implementation of the method tein and in the object protein and to the number of steps
is discretization. Both the geometric space where in which the voting circumference is divided.
voting is performed and the voting circumference are The problems associated with the high computa-
discretized, the former in “voxels” (cubic volume tional requirements of the smoothing algorithm are
elements), the latter in an integer number of steps. solved if a particular data structure is introduced in
The entity of the discretization deeply influences the implementation, i.e., the ▶ range tree. The use of
the quality of the results: A high-detail mesh produces range trees in the smoothing step allows the computa-
more reliable scores, although at the price of longer tional complexity to fall down to O(Nlog3N).
execution times, making a compromise between Balancing of the range trees guarantees that the com-
parameter settings and acceptable execution times putational complexity stick to the logarithmic order
necessary. Overwhelming memory usage is avoided above, without worst-case scenarios.
by maintaining information of only the voxels that The execution times vary strongly depending on the
contain a nonzero vote. size of the input, i.e., how many proteins to compare
The most important aspect of the implementation and how many secondary structures they contain, and
regards the way of dealing with the voting space after it the parameters settings. A typical execution time for
has been filled. Since structural similarity does not a one-to-one comparison is from a fraction of second to
mean structural identity, the circumferences in the a few seconds on a standard laptop. Noteworthy is the
voting space will not actually overlap perfectly, change in execution times that the use of range trees
although they will come to close proximity around brings about: A 33-to-33 proteins comparison took
one point. This results in the votes not forming the 8 h to complete with the standard algorithm (without
actual peaks that are wanted in order to compute range trees), while only 11 min with the optimized
a score. For this reason, smoothing of the vote space algorithm (with range trees).
by accumulation of all of the votes sufficiently near to The algorithm is embarrassingly parallel. This
every point in the vote space is needed. Performing means that, since it requires very little communication
a smoothing of the vote space by merely scrolling the between independent voting steps, it is easily split into
vote space and for every point summing every vote that parallel tasks, resulting in faster execution, which is
happens to be sufficiently close to it (thereby scanning fundamental for operation in the context of database
the vote space again for every point in it) is annotation.
HTLV, Cellular Transcription 921 H
All in all, as it has been pointed out in the introduc-
tory paragraphs that results of structural alignment from HPC
different methods which are in reciprocal agreement
help to confirm positive predictions, the Hough trans- ▶ Large-Scale and High-Performance Computing
form method, which has proven successful, is therefore
a fundamental component of the suite of protein struc-
tural comparison algorithms available to date.
The method has its own peculiar advantages, too,
HTLV
which are inherited directly by the algorithm it is based
▶ HTLV, Cellular Transcription
upon, the ▶ Hough transform. Among them, effi-
ciency, rotation- and translation-insensitiveness,
parallelizability, noise and occlusion insensitiveness,
and relative tolerance to approximate descriptions. The HTLV, Cellular Transcription
method is also highly parameterized, so that it can be
adapted to specific cases in order to obtain the best Christian Sch€onbach
results. The main downside is its suboptimality, as is Department of Bioscience and Bioinformatics, Kyushu H
the case with any heuristic. Other specific disadvan- Institute of Technology, Iizuka, Fukuoka, Japan
tages of the Hough transform, such as high time and
memory requirements, and of range trees, such as the
inherent difficulties in treating complex data struc- Synonyms
tures, have been successfully dealt with and solved.
HTLV; Human T-lymphotropic virus
Cross-References
Definition
▶ Hough Transform
▶ Range Tree Human T-lymphotropic viruses (HTLV) are members
of the ▶ Deltaretroviridae. HTLV-1, the only known
human oncogenic retrovirus is the etiological agent of
▶ adult T-cell leukemia/lymphoma (ATL) and
References ▶ Human T-Lymphotropic Virus Type-I-associated
Myelopathytropical Spastic Paraparesis (HAM/TSP).
Comin M, Guerra C, Zanotti G (2004) PROuST: a comparison
method of three-dimensional structures of proteins using HTLV-2 is associated with HAM/TSP-like illness,
indexing techniques. J Comput Biol 11–6:1061–1072 whereas the pathogenicities of HTLV-3 and HTLV-4
Holm L, Sander C (1993) Protein structure comparison by align- are unknown. HTLVs share a similar proviral genomic
ment of distance matrices. J Mol Biol 233:123–138
Mattia E (2010) Protein structure comparison and motif retrieval
organization. Long terminal repeats (LTR) flank
by the generalized Hough transform. IUSS Thesis, Pavia, the structural genes gag, pol, and env (Fig. 1). The pX
Italy region encodes the regulatory protein Tax and so-called
Nussinov R, Wolfson H (1991) Efficient detection of three- accessory proteins Rex, p21, p12, p13, and p30. The
dimensional motifs in biological macromolecules by
basic leucine zipper factor HBZ is translated from an
computer vision techniques. Proc Natl Acad Sci USA
88:10495–10499 antisense transcribed mRNA of the 30 LTR region
Taylor WR, Orengo CA (1989) Protein structure alignment. (Matsuoka and Jeang 2007; Higuchi and Fujii 2009).
J Mol Biol 208:1–22 CD4+ T-cell transformation by HTLV-1, viral per-
sistence, and immune response modulation are driven
by the highly pleiotropic oncoprotein Tax1 (Tax of
HTLV-1) in conjunction with HBZ, p12, p13, Rex1,
HP1 and p30. Env, Rex, Tax, and dendritic cells are asso-
ciated with HTLV-1 tropism for infecting CD4+
▶ Heterochromatin Protein 1 T-cells (Jones et al. 2008; Boxus and Willems 2009).
H 922 HTLV, Cellular Transcription
p30
p13
p12
Rex (p27)
p21
Tax (p40)
env
gag pol
protease
5⬘LTR gag pol env pX 3⬘LTR
HBZ
p12 p30
p12 protein usually localizes to the endoplasmic retic- p30 antagonizes the effects of Tax1 on both, transcrip-
ulum (ER) and cis-Golgi. Yet, proteolytic cleavage of tional and post-transcriptional levels. Direct interac-
p12 at a non-canonical ER retention signal produces tion of p30 with CREBBP and EP300 suppresses the
p8, which localizes to lipid rafts of the immunological transcription of HTLV-1 from the 50 LTR and of cellu-
synapse and affects T-cell receptor (TCR) signaling. In lar genes that are dependent on CREBBP/EP300 acti-
addition, interaction of p8 with linker for activation vation. Another cellular target of p30 is transcription
of T-cells (LAT) decreases LAT phosphorylation. factor SPI1 also known as PU.1. Binding of p30 to the
As a result, suppressed T-cell receptor signaling ETS domain of SPI1 prevents its binding to DNA and
downregulates the activity of NFAT transcription fac- therefore transcriptional activation of its target genes
tor family members which decreases cytokine produc- (e.g., TLR4) including itself. Downregulation of
tion and T-cell proliferation (Van Prooyen et al. 2010). TLR4 expression interferes with the innate immune
ER-located p12 promotes in synergy with Tax1 response, induction of pro-inflammatory cytokines
T-cell proliferation and viral persistence by activating (e.g., IL8, TNFA, CCL2, etc.) and production of
both IL2 and IL2R expression. p12 interaction with anti-inflammatory IL10 in TLR4-primed dendritic
CALR and CANX increases cytoplasmic Ca2+ levels cells (Bai et al. 2010).
and dephosphorylation of NFATC1, which locates to Post-transcriptionally p30 interaction with the large
the nucleus to activate IL2 transcription. In turn, the ribosomal subunit protein L18a and binding to Tax1
increase in IL2 activates Tax1 transcription via and Rex1 mRNAs were found to increase their nuclear
CREB2 and ATF2. At the same time ER-located p12 retention, thereby suppressing viral replication and
H 924 HTML
Conclusions
In the past 5 years, HTLV-1 and -2 studies have con- HTML
siderably contributed to the understanding of the
molecular mechanism of T-cell transformation and Tim Clark
viral manipulation of cellular pathways at both tran- Department of Neurology, Massachusetts General
scription and protein levels. Potential therapeutic tools Hospital and Harvard Medical School, Boston,
(e.g., PI3K inhibitors, allogeneic hematopoietic stem MA, USA
cell transplantation, or mutant Tax vaccination) and
diagnostic markers such as miRNAs have emerged
(Matsuoka and Jeang 2007; Ruggero et al. 2010), but Synonyms
efficient therapies and diagnostic markers have yet to
be developed. Hypertext markup language
HTTP 925 H
Definition Hickson I, Schwartz B (eds) (2012) HTML5: a technical speci-
fication for Web developers. WHATWG. http://developers.
whatwg.org/
HTML is the principal markup language of the Raggett D, Hors AL, Jacobs I (eds) (1999) HTML 4.01 specifica-
▶ World Wide Web. It specifies the structural tion, W3C recommendation 24 December 1999. World Wide
semantics for text in Web documents. Its elements Web Consortium. http://www.w3.org/TR/REC-html40/
indicate to Web browsers how to display content on
a Web page, including text, images, video, and
audio. It also can embed programming instructions
to the browser using JavaScript, or other scripting HTTP
languages.
A separate language, CSS (Cascading Style Tim Clark
Sheets), is used to specify the presentation of the Department of Neurology, Massachusetts General
page elements, such as typefaces and styles, layout, Hospital and Harvard Medical School, Boston,
colors, etc. MA, USA
Elements of HTML consist of tags enclosed in
angle brackets, inserted into the content, like this
Synonyms H
example in HTML5:
<!DOCTYPE html>
Hypertext transfer protocol
<html>
<head>
<title>Definition of HTML Definition
</title>
</head> HTTP is a generic stateless Internet application protocol
<body> for distributed and collaborative ▶ hypertext- and hyper-
<p>HTML is the markup media-based information systems. It provides the foun-
language of the World Wide dation for data communications on the World Wide Web.
Web.</p> It is currently the dominant Internet protocol.
</body> The HTTP protocol has a circumscribed set of allowed
</html> actions to access its target, which is a distributed resource
Tags plus CSS indicate to the browser how the specified by the ▶ URL or Uniform Resource Locator
content should be rendered. string. Resources are resolved, interpreted, and acted
HTML 4.01 is the most recent fully standardized upon according to client request, by computers running
W3C version of HTML. HTML5, with a number of Web server software (e.g., Apache httpd).
attractive new features, is currently usable in “working The actions supported by HTTP are constrained
draft” state. It is under parallel and coordinated devel- to a relatively limited set. Programmatic Web services
opment by both the W3C (http://w3.org) and the that conform to this set of allowed actions
WHATWG (http://www.whatwg.org). (GET, HEAD, PUT, POST, DELETE, TRACE,
OPTIONS, and CONNECT) are called “resource-ori-
ented” services and are said to conform to the REST
Cross-References (representational state transfer) architectural pattern –
they are said to be “RESTful.”
▶ World Wide Web Secure communications on the Web for such
purposes as financial transactions are implemented
using the ▶ HTTPS protocol, which encrypts the Inter-
net transport layer data transmitted by HTTP.
References HTTP standards development has been a global
collaborative engineering effort, coordinated by the
Hickson I (ed) (2011) HTML5: a vocabulary and associated
APIs for HTML and XHTML, W3C Working Draft World Wide Web Consortium (http://w3.org) and the
25 May 2011. World Wide Web Consortium Internet Engineering Task Force (http://www.ietf.org).
H 926 HTTP Secure
HTTP Secure
References
▶ HTTPS
Apache Software Foundation (2011) SSL / TLS strong
encryption: an introduction. Apache Software Foundation.
https://httpd.apache.org/docs/2.3/ssl/ssl_intro.html
Dierks T (2008) Rescorla E: IETF RFC 5246: the transport
layer security (TLS) protocol, version 1.2. In: IETF Requests
HTTPS for Comment, vol 5246. Internet Engineering Task Force.
https://www.ietf.org/rfc/rfc5246.txt
Tim Clark Rescorla E (2000) HTTP over TLS: RFC 2818. Internet Engi-
Department of Neurology, Massachusetts General neering Task Force. https://www.ietf.org/rfc/rfc2818.txt
Hospital and Harvard Medical School, Boston,
MA, USA
Synonyms Hub
Cross-References
Human T-Lymphotropic Virus
Type-I-associated Myelopathytropical ▶ HTLV, Cellular Transcription
Spastic Paraparesis
Cross-References Cross-References
Cross-References
Hypothesis Space
Definition
Eyke H€ullermeier, Thomas Fober and
Marco Mernberger Conventional statistical hypothesis testing validates
Philipps-Universit€at Marburg, Marburg, Germany whether a hypothesis about a quantity of interest
(a parameter) is true or false based on the likelihood
that the observed data could have been generated if the
Definition hypothesis were true.
Hypothesis Testing, Table 1 Outcomes of a statistical Given the above definitions, the basic steps of
hypothesis test a statistical hypothesis test are as follows:
Fail to reject H0 Reject H0 1. The first step is to describe the null and alternative
H0 true Correct Type I error hypotheses that relate to the research objective.
H0 false Type II error Correct This is important so that the results of the hypothe-
sis test are relevant to the research objective.
Specifically, the null hypothesis needs to be defined
data is taken from the population and a test statistic is in such a way that its rejection allows for the con-
calculated from the data (Lehmann and Romano 2005). clusion that research objective has been achieved.
Below are a number of definitions associated with For example, rejecting the null hypothesis that the
hypothesis testing. difference in means between a treatment and con-
trol is 0 allows for the conclusion that the treatment
• Simple hypothesis – A hypothesis based on a single had an effect.
parameter value (i.e., y ¼ 0) 2. The second step is to consider the statistical
• Composite hypothesis – A hypothesis based on assumptions being made about the sample in
a range of parameter values (i.e., y > 0) doing the test. Is the data continuous or discrete?
• Null hypothesis (H0) – A simple hypothesis associ- Are the data independent? Are they paired? Col-
ated with the theory one would like to disprove lected over time? Is the sample random? This will
(status quo) help determine which methods to use and whether
• Alternate hypothesis (HA) – A hypothesis (often results are valid or generalizable.
composite) associated with a theory one would 3. Determine the appropriate test statistic given the
like to prove assumptions.
• Test statistic – A function of the data with which 4. Derive the distribution of the test statistic under the
decisions about the hypothesis are made null hypothesis based on the assumptions. In most
• Decision rule – Values of the test statistic that cases, there exists a test statistic with a known
lead to rejecting or failing to reject the null distribution, such as the two-sample t-test, ANOVA
hypothesis F-tests, chi-squared tests for independence, t-tests for
• Acceptance region – The set of values for the test regression parameters, and so on.
statistic which we fail to reject the null hypothesis 5. The distribution of the test statistic and the level of
• Rejection region – The set of values for the test significance create acceptance and rejection regions.
statistic which we reject the null hypothesis 6. Compute the value of the test statistic.
• Critical value(s) – The value(s) on the border of the 7. Decide to either fail to reject the null hypothesis or
acceptance and rejection regions reject it in favor of the alternative hypothesis.
In a statistical hypothesis test, either H0 or HA will One problem with this approach is that it leads to an
be true, and either H0 will be rejected or we will fail to absolute reject or does not reject outcome and fails to
reject it. This leads to the four possible outcomes of separate uncertain conclusions from definitive ones. It
a statistical hypothesis test shown in Table 1. also leads to arbitrary thresholds of significance such
These outcomes lead to two types of errors and the as 0.05. Calculating a p-value improves upon this by
probability of those errors informs the decision rule providing a quantitative scale rather than absolute
defined by a statistical hypothesis test. Below are the decision. P-values near 0 provide definitive statistical
definitions of these errors and probabilities. proof against H0, while values near alpha imply evi-
• Type I error – Wrongly rejecting H0 dence against H0 but are uncertain. Finally p-values
• Type II error – Wrongly not rejecting H0 near 1 would give no reason to believe the null hypoth-
• Level of significance/size of the test (a) – The esis was not true. Also calculating p-values is simple
probability of a type I error since they are based on the distribution of test statistic
• Power – one minus the probability of a type II error under the null hypothesis. All this still assumes that
for a given parameter value in Ha (1-b) people properly interpret p-values rather than use them
• p-value – smallest alpha value for which H0 is as another way to threshold and that they are calculated
rejected in a valid manner (Jones and Tukey 2000).
Hypothesis Testing, Bayesian vs Frequentist 933 H
Another issue that is not addressed by this approach is testing, is about comparing the prior knowledge
practical significance versus statistical significance. If the about research hypothesis to posterior knowledge
power of a test is very high, then often statistical signif- about the hypothesis rather than accepting or
icance is achieved without any practical significance. If rejecting a very specific hypothesis based on the
the power of the test is low, then statistical significance experimental data.
may be difficult to achieve no matter how large the
observed test statistic is. Power for a specific test versus
a specific simple alternative hypothesis is generally con- Characteristics
trolled by the sample size; so it is important to choose
a large enough sample size to achieve statistical signifi- Just as with Bayesian inference, Bayesian hypothesis
cance for practically significant alternative hypotheses, testing is based on posterior distribution (Box and Tiao
but not so large that it yields statistical significance for 1973):
uninteresting results thereby wasting resources.
Other issues to consider in hypothesis testing, par- pðyjXÞ ¼ pðXjyÞpðyÞ
ticularly with respect to systems biology, are the
impact of multiple hypothesis testing and the validity A p-value-like quantity can be generated for H
of results when assumptions are violated. If violating a one-side hypothesis like y < 0 by integrating the
assumptions is a concern, then use of nonparametric posterior:
hypothesis testing should be considered. ð0
p¼ pðyjXÞ
Cross-References 1
More formal comparisons of models can be carried Hypothesis Testing, Parametric vs Nonparametric,
out using the Bayesian Decision Theory, where cost Table 1 Statistical tests and their nonparametric analogs
functions are added to the Bayesian model (Berger Parametric test Nonparametric test
1985). One sample T-test Sign-rank test
Two-sample T-test Rank-sum or Mann-Whitney
test
One-way ANOVA Kruskal-Wallis test
Cross-References
One-way randomize complete Friedman test
block ANOVA
▶ Bayesian Inference Pearson correlation Spearman rank correlation or
▶ Hypothesis Testing Kendall’s tau
References
Berger JO (1985) Statistical decision theory and Bayesian anal- modeled by standard probability distributions
ysis, 2nd edn. Springer, New York
(Gibbons and Chakraborti 2003 and Wasserman
Box GEP, Tiao GC (1973) Bayesian inference in statistical
analysis. Wiley, New York 2007). Most commonly this occurs when there is
Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc doubt about the data having a normal distribution.
90(430):791 The most commonly used nonparametric tests are
based upon substituting ranks for the original data.
Table 1 gives nonparametric analogs to many com-
monly used statistical hypothesis tests.
There are many other tests such as the Kolmogorov-
Hypothesis Testing, Parametric vs Smirnov test for comparing two distributions or the
Nonparametric Wald-Wolfowitz runs test for randomness.
An alternative to rank tests is to apply randomiza-
Roger Higdon tion (Edgington 1995), permutation, or other
Seattle Children’s Research Institute, resampling approaches such as the bootstrap (Efron
Seattle, WA, USA 1982) to conventional test statistics. These approaches
utilize the random assignment of the data to treatment
groups or explanatory variables in order to estimate
Synonyms p-values. Randomization tests are based on calculating
all possible randomizations of the data without
Distribution-free tests; Nonparametric tests; Rank tests replacement. A permutation test approximates a
randomization test by taking random permutations of
the data with replacement. Bootstrapping resamples
Definition the original data with replacement. Randomization
tests are commonly used in systems biology due
Nonparametric hypothesis testing, in contrast to to the doubt about parametric assumptions in
parametric hypothesis testing, does not rely on biological data. Some common examples are the
assumptions that data come from a particular parame- significance analysis of microarrays tests in relative
terized distribution such as a normal distribution. expression analysis (Tusher 2001) and gene set
enrichment tests.
Nonparametric tests offer a number of advan-
Characteristics tages including protection from the violation of
distributional assumptions, increased power when
Nonparametric or distribution-free tests are widely assumptions are violated, and the ability to
used for statistical hypothesis testing, particularly preserve complex correlation structures. Disadvan-
when there is doubt as to whether data can be easily tages include decreased power when distributional
Hypoxia-inducible Factor-1 935 H
assumptions can be met, requirements for large embryonic development for the formation of multiple
sample sizes, and difficulty in applying tests to tissues including the central nervous system and the
complex statistical models. vasculature. Hypoxia also occurs in diseases such
as cardiac ischemia and cancer; therefore, targeting
signaling pathways initiated by hypoxia represents
Cross-References a promising therapeutic strategy for these diseases
(Cassavaugh and Lounsbury 2011).
▶ Hypothesis Testing
▶ Relative Expression Analysis
Cross-References
Marsha A. Moses
Department of Surgery/Harvard Medical School,
Hypoxia Vascular Biology Program/Children’s Hospital
Boston, Boston, MA, USA
Marsha A. Moses
Department of Surgery/Harvard Medical School,
Vascular Biology Program/Children’s Hospital Definition
Boston, Boston, MA, USA
Hypoxia-inducible factor-1 (HIF-1) is a dimeric protein
complex comprises of two subunits, HIF-1a and
Definition HIF-1b, both of which are members of the bHLH-PAS
family of transcription factors. It regulates the transcrip-
Hypoxia refers to the reduced oxygen tension caused tion of many genes whose functions are critical for the
by low oxygen availability or insufficient oxygen survival of cells under hypoxic conditions. Under
delivery within an organism. Under hypoxic condi- normoxic conditions, HIF-1a is hydroxylated by prolyl
tions, the organism must exert a series of adaptive hydroxylase domain–containing (PHD) proteins loaded
responses to ensure cellular survival, such as changes with oxygen as cofactors. The hydroxylated HIF-1a
in metabolism, regulation of pH, increased red blood then interacts with the von Hippel-Lindau tumor
cell production and oxygen transport, and initiation of suppressor protein (VHL), a member of the E3
angiogenesis (Cassavaugh and Lounsbury 2011). ubiquitination complex, and is subjected to protein deg-
These processes are mediated by dimeric protein radation by the 26s proteasome. Under hypoxic condi-
complexes, the hypoxia-inducible factors, which tions, the activity of PHD is decreased due to low
regulate the transcription of many genes that are crit- availability of oxygen in the cells. Therefore, HIF-1a
ical for these cellular responses (Semenza 2009). Hyp- is relieved from protein degradation, translocates
oxia-induced cellular signaling is essential during to the nucleus, and forms a dimer with HIF-1b.
H 936 Hysteresis
Cross-References Hysteresis
Definition References
where x ¼ (x1, x2, , xm) and y ¼ (y1, y2, , ym) are the
Identification of Gene Regulatory m–dimensional observations (e.g., the expression level
Networks, Machine Learning of m time points or conditions) of gene x and y. The two
genes whose correlation score is larger than a predefined
Zhong-Yuan Zhang threshold are considered to have an interaction. Other-
School of Statistics, Central University of Finance and wise, they are considered to be marginal independence
Economics, Beijing, China and there is no edge between them.
Note that there may be pseudo-associations or false-
positive correlations in relevance networks. For exam-
Definition ple, the genes that are regulated by a common regulator
may have similar expression patterns and thus a high
Machine learning (ML) is a branch of artificial intelli- correlation score, but they do not necessarily have
gence (AI). ML is concerned with the design and a direct association.
development of algorithms and computer programs,
which make computers (machine) “learn” from expe- Graphical Gaussian Networks
rience by analyzing the datasets available. Graphical Gaussian model (GGM, Crzegorczyk et al.
Machine learning algorithms can be divided into 2008; Hache et al. 2009; Werhli et al. 2006) is an
four communities according to the desired outcome undirected graph whose nodes are genes and two genes
of them: (1) supervised learning, (2) semi-supervised are linked by an edge if there is an interaction between
learning, (3) unsupervised learning, and (4) reinforce- them. The interactions are measured by the partial corre-
ment learning. It has a wide variety of applications in lation coefficients conditioned on all the other genes. After
finance, text, biology, and etc. the correlation coefficients are determined, a statistical
In this entry, we particularly focus on ML approaches significance test is employed and the two genes whose
to reverse engineering of gene regulatory networks. We score is significantly large are considered to have an
describe the most widely used types of ML algorithms, interaction. Otherwise, they are considered to be condi-
including relevance network, graphical Gaussian net- tional independence and there is no edge between
work, Boolean network, Bayesian network, and network them. Under the assumption that the data are distributed
component analysis, which apply in reconstructing gene according to a multivariate Gaussian distribution, the par-
regulatory networks. tial correlation coefficient of gene x and gene y is given as:
C1 xy
Characteristics scoreðx; yÞ ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ;
Cxx C1
1
yy
Relevance Networks
1
Relevance network (Butte and Kohane 2003; where C1
xy is the element of C , inverse of the covari-
Crzegorczyk et al. 2008; Hache et al. 2009; Werhli ance matrix C of the data.
et al. 2006) is an undirected graph and allows loops.
The nodes are genes and two genes are linked by an Boolean Networks
edge if there is a regulatory interaction between them. Maybe the simplest way to model gene regulatory
The interactions are measured by some association coef- network as a directed graph is Boolean network.
ficients such as Pearson correlation, Spearman correla- A Boolean network model (Jong 2002; Kauffman
tion, or mutual information. For example, if the Pearson 1969; L€ahdesm€aki et al. 2003) is composed of
Identification of Gene Regulatory Networks, Machine Learning 939 I
X T T+1 1 1 0
Z X
0 0
1 1 0 1 1
1 0 1
1 0 0
T T+1 T T+1
X Y X Y Z
Y Z
0 0 0 0 0
0 1 1 1 1 1
1 1
1 0 0
1 1 1
Identification of Gene Regulatory Networks, Machine fz(0,1) ¼ 0; fz(1,1) ¼ 1. Right: The states trajectories of two
Learning, Fig. 1 Left: An example of Boolean network. X, Y, particular realizations of the Boolean network. The top trajectory
and Z are three genes. Their Boolean functions are given by the falls into a cyclic attractor, and the bottom one falls into a point
truth tables placed next to them. For example, the Boolean attractor
function fz of gene Z at time T + 1 is: fz(0,0) ¼ 0; fz(1,0) ¼ 0;
A I
P (A), P (B|A), P (D|A), P (C|B)
Characteristics
Identification of Gene Regulatory
Networks, Neural Networks Linear Additive Network Model
In linear additive regulatory model, the regulatory
Zhong-Yuan Zhang interactions are supposed to be linear (D’Haeseleer
School of Statistics, Central University of Finance and et al. 1999; Wessels et al. 2001). The activation
Economics, Beijing, China function in linear model is the identity function
f(x) ¼ x. In other words, given the expression state
vector u(t) where ui(t) is the expression value of gene
Definition i at time t, the model predicts the state vector u(t + Dt)
at the next time point t + Dt as follows:
Artificial neural network (ANN) model, also known as X
connectionist model or parallel distributed processing ui ðt þ DtÞ ¼ Wij uj ðtÞ þ bi ; i ¼ 1; 2; n; (1)
model, is a branch of artificial intelligence (AI). ANN j
where bi is the gene-specific parameter, and Wij0 ¼ Wij If the degradation rate di of gene ui is included, the
if i 6¼ j and Wii0 ¼ Wii 1. model can be reformulated as:
It can be also rewritten as difference quotient
ui ðt þ DtÞ ui ðtÞ 1
ui ðt þ DtÞ ui ðtÞ X 00 ¼ mi di ui ðtÞ (4)
¼ W ij uj ðtÞ þ b0 i ; Dt 1 þ eðai ri ðtÞþbi Þ
Dt j
In summary, the regulatory interactions of
W0
where Wij00 ¼ Dtij and b0i ¼ Dt
bi
. recurrent neural network model are nonlinear, that
The parameters W and bi, i ¼ 1, 2, , n can be is, the regulatory input ri(t) is transferred into
found by training the model using the experimental a transcriptional response by the nonlinear sigmoid func-
data available (Fig. 1). tion f ðxÞ ¼ 1þeðaxþbÞ
1
or any other squashing functions
such as the arctan function. To find the parameters, the
Recurrent Neural Network Model weight matrix W, a and b, several algorithms have been
Though the linear additive network model is expected to developed such as particle swarm optimization and
capture several important linear components of gene differential evolution (Xu et al. 2007) (Fig. 2).
regulation, still the model is oversimplified (Hache
et al. 2009; Weaver et al. 1999; Wessels et al. 2001). Stochastic Neural Network Model
The recurrent neural network model extends it by intro- Though noise is considered as the by-product of
ducing the sigmoid function f ðxÞ ¼ 1þeðaxþbÞ
1
; where a life activities, it often plays important roles in life activ-
and b are two gene-specific parameters, as the activation ities, for example, gene regulatory networks are inher-
function, which is in accordance with the common prac- ently noisy (Tian and Burrage 2003). Furthermore, noise
tice in systems biology. In other words, given the expres- contamination in experimental data is inevitable due to
sion state vector u(t) where ui(t) is the expression value of the limitations of technology. Hence, the stochastic
gene i at time t, the model can predict the state vector model conforms to reality better than the classical ones.
u(t + Dt) at the next time point t + Dt : For experimental data with expression levels,
stochastic neural network model introduces Poisson
1 random variables to describe the biochemical process
ui ðt þ DtÞ ¼ ui ðtÞ þ Dt mi ; (2)
1þ eðai ri ðtÞþbi Þ in the gene regulatory networks:
ui ðt þ DtÞ ui ðtÞ 1 where f(x) is the sigmoid function of the sum of u0i s
¼ mi ; (3)
Dt 1 þ eðai ri ðtÞþbi Þ received weighted inputs, P(l) is a Poisson random
P variable with mean l and
where ri ðtÞ ¼ j Wij uj ðtÞ þ bi is the regulatory input
of gene i, bi, ai, and bi are the gene-specific parameters lm l
ProbfPðlÞ ¼ mg ¼ e ; m ¼ 0; 1; :
and mi is the maximal expression value of gene i. m!
IgSF 943 I
Identification of Gene gene 1
Regulatory Networks, W1i
Neural Networks, gene 2 gene i activation function
Fig. 2 Schematic diagram of W2i
gene i in the recurrent neural
network model. The activation output
•••
function is: f ðxÞ ¼ 1þeðaxþbÞ
1
a b
lgSF domain of V type V-DOMAIN (IG, TR)
A F T G G Y
1 E T N P V Y T
B V Y Y Y Y G G
Q G W I T S S 85 G G
L 26 S 39 L 55 D 66 T S T A D
T
E A G G N A R Y
D
E K 41 W I N 80 A Y A W 118
G E
S 23 C V W E T 89 M 104 C G
F G S K E K L Q E Q
D F
C P I Q G F I L Y G
10
E K R H R A S V T
L V P G G K 75 S A S
C′
46
V S L S V
R T 1a3l_H T D T
C″ 15 P G 16 S E V
1a3I_H A B C C¢ C≤ D E F G S
S 128
IGHV-D-J
S
F S
F T G G R Y Q
Y P P L Q
T L Y T N P V P G Y E N
G V G Y Y Y Y Y N M N F G L
G Q G G W 85 S I S T L K V Y R 85 E Y N Q
D L A 26 S 39 L T 55 D S T 66 D I E 26 S 39 A S 55 V G V 66
T L
Y E R A G A G N N L I Y S V V Y
D K
118 W E A K 41 W Y I A N 118 E V K K 41 L T C G S
23 23
G S 104 C C V 89 M W T E K K 104 C C H 89 F V D K
Q G F S K Q E L K S Q F S K Y E C T
10 F 10 V
G P Y I Q L G I F N S Y L G L A N G
P
T E V K R S H A R G M I N L Q S F
S L A V 46 P S G 75 K G T L D V 46 N D 75
V V S S L I V T A L
T R D T T 1a3I_H I A Q N Y 1yjd_C
V 5 P E G 16 S H 5 Y N D 16 V
S V
G A F B C E C¢ D C≤ G A F B C E C¢ D C≤
128 S 128 K
IGHV-D-J CD28[D1]
IMGT Collier de Perles, Fig. 1 IMGT Collier de Perles for amino acids shown in squares (anchor positions), which belong to
V domain. (a) Ribbon representation of a V-DOMAIN taken as the neighboring strands (FR-IMGT). BC loops are represented
an example. A similar topology and 3D structure characterize online in red, C0 C00 loops in orange, and FG loops in purple.
a V-LIKE-DOMAIN. (b) and (c) V-DOMAIN on one layer and Hatched circles or squares correspond to missing positions
on two layers, respectively (Mus musculus VH [8.8.12]). (d) according to the IMGT unique numbering for V domain (Lefranc
V-LIKE-DOMAIN on two layers (Homo sapiens CD28 et al. 2003). Arrows indicate the direction of the beta strands and
[9.9.13]). Amino acids are shown in the one-letter abbreviation. their designations in 3D structures. The IMGT Colliers de Perles
Position at which hydrophobic amino acids (hydropathy index on two layers show, on the forefront, the GFCC0 C00 strands and, on
with positive value: I, V, L, F, C, M, A) and tryptophan (W) are the back, the ABED strands. IMGT Colliers de Perles on two
found in more than 50% of analyzed sequences are shown online layers with hydrogen bonds are available in IMGT/3Dstructure-
in blue. All proline (P) are shown online in yellow. The loops BC, DB (http://www.imgt.org): 1a3l_H, 1yjd_C
C0 C00 , and FG (corresponding to the CDR-IMGT) are limited by
IMGT Collier de Perles 947 I
brackets and separated by dots. In Fig. 1, the lengths numbering first described for the V-REGION and for
are [8.8.12] for the V-DOMAIN CDR-IMGT and the V domain (Lefranc et al. 2003). The five positions
[9.9.13] for the V-LIKE-DOMAIN. IMGT Colliers that are essential for the core structure have the
de Perles for V domain can be represented on one same number in the C domain: cysteine 23 (1st-
layer (Fig. 1b) or two layers (Fig. 1c, d) (Ruiz and CYS), tryptophan 41 (CONSERVED-TRP), conserved
Lefranc 2002; Lefranc et al. 2003; Kaas et al. 2007; hydrophobic amino acid 89, cysteine 104 (2nd-CYS)
Kaas and Lefranc 2007). If 3D structures are available, and “FG anchor” 118 (Lefranc et al. 2005a).
IMGT Colliers de Perles on two layers are provided IMGT Colliers de Perles for C domain can be
with hydrogen bonds, in IMGT/3Dstructure-DB represented on one layer (Fig. 2b) or two layers
(http://www.imgt.org) (Ehrenmann et al. 2010). (Fig. 2c, d) (Lefranc et al. 2005a; Kaas et al. 2007;
Kaas and Lefranc 2007). If 3D structures are available,
IMGT Collier de Perles for C Domain IMGT Colliers de Perles on two layers are provided
“IMGT Collier de Perles for C domain” with hydrogen bonds, in IMGT/3Dstructure-DB
(or “IMGT_Collier_de_Perles_for_C_domain”) is (http://www.imgt.org) (Ehrenmann et al. 2010).
a leafconcept of the “IMGT Collier de Perles” concept
of numerotation. It represents the C domain (▶ Con- IMGT Collier de Perles for G Domain
stant (C) Domain) that includes the C-DOMAIN of IG “IMGT Collier de Perles for G domain”
or antibodies and TR and the C-LIKE-DOMAIN of (or “IMGT_Collier_de_Perles_for_G_domain”) is I
IgSF proteins other than IG and TR. It is based on the a leafconcept of the “IMGT Collier de Perles” concept
IMGT unique numbering for C domain (Lefranc et al. of numerotation. It represents the G domain (▶ Groove
2005a) (IMGT unique numbering). (G) Domain) that includes the G-DOMAIN of major
The topology and 3D structure of a C-LIKE- histocompatibility (MH) and the G-LIKE-DOMAIN
DOMAIN are very similar to those of an IG or of the major histocompatibility superfamily (MhSF)
TR C-DOMAIN (Fig. 2a). proteins other than MH. It is based on the IMGT
A C domain consists of about 100 amino acids in unique numbering for G domain (Lefranc et al.
seven antiparallel beta strands (A, B, C, D, E, F, 2005b) (IMGT unique numbering).
and G), linked by beta turns (AB, DE, and EF), The topology and 3D structure of a G-LIKE-
a transversal strand (CD) and loops (BC and DOMAIN are very similar to those of a MH
FG), located in two layers, forming a sandwich of two G-DOMAIN (Fig. 3a). Two G domains are necessary
sheets. The sheets are closely packed against each other to form the characteristic groove of the MhSF proteins
through hydrophobic interactions giving a hydrophobic that comprise the MH (MH class I or MH1, and MH
core and joined together by a disulfide bridge between class II or MH2) and related proteins of the immune
the B-STRAND in the first sheet and the F-STRAND in system (RPI)-MH1Like (Lefranc et al. 2005b).
the second sheet (Lefranc et al. 2005a). Four strands A G domain consists of about 90 amino acids in four
form one sheet and three strands form a second sheet. antiparallel beta strands (A, B, C, and D), linked by
The C domain has a topology and 3D structure turns (AB, BC, CD), forming half of the groove floor,
similar to those of the V domain, but differs by the and one alpha helix, forming one wall of the groove.
number of antiparallel beta strands (seven instead of For each G domain, the positions that contribute to the
nine). By comparison with a V domain, the groove floor comprise positions 1–49, and comprise
C0 -STRAND, C0 C00 -LOOP, and C00 -STRAND are miss- the four strands linked by turns (Lefranc et al. 2005b).
ing in the C domain and are replaced by the character- The numbering of the G domain helix starts at position
istic transversal CD-STRAND that links the two 50 and ends at position 92 (Fig. 3).
sheets. Depending on the CD-STRAND length, the Interestingly, despite the high sequence divergence,
D-STRAND is in the first or in the second sheet only five additional positions (54A, 61A, 61B, 72A, and
(Lefranc et al. 2005a). Additional positions observed 92A) are necessary to align any G-DOMAIN and
in the C domain define the AB-TURN, DE-TURN, G-LIKE-DOMAIN (Lefranc et al. 2005b). It is worth-
and EF-TURN (Lefranc et al. 2005a). while to note that position 54A in G-ALPHA1-LIKE is
The IMGT unique numbering for C domain the only additional position needed to extend the IMGT
(Lefranc et al. 2005a) is derived from the IMGT unique numbering for G-DOMAIN to the G-LIKE-DOMAIN.
I 948 IMGT Collier de Perles
A D
D K S
A P K S T 111 112
1 A Y D D Y T S
P F I 85.1 Q S K T
T N N 36 D M 85 H S
A 84
V 26 N 39 V T S T P
W
S L K S A I
B S
I F 41 W 80 N T E V 118
E
F 23 C K L 89 L 104 C K 119
D P V I V T T S
G
P V D G 77 L Y F
F 10 S Q N
S S 45 G S E R 45.5 45.6
45.7 T S N
C 45.3 45.4
45.1 45.2
E A K N R
Q G D H
CD L G 1axt_L E R
15 T S 16 Y E 128
1.5 M
1.4 R
1.3 A D 1.3 T
1.2 D 85.4 S K 84.4 1.2 E 85.4 84.4
S T H N N 36.85 M D 84 S K L Y S 36.85 84
P V T 25 N 39 V S T T A N 26 A 39 T
W
I S A L K S L V T G Q S
S A
118 V I E F 41 W T N 80 118 S V Q Q W S Q 80
23
119 K F 104 C 23 C K 89 L L 119 D F 104 C C F 89 Y S
S P T V I T V P L R K H F S
F P Y V D L G 77 V E Y L N I I 77
S Q N P
N S S S 45 G S E R 45.5 45.6 45.7 Q 10 Q E T 45 E S L 45.6 45.7
10 45.4 45.4 45.5
45.1 K 45.1
R E N A L W G V A
Q H G D E Y S S A
L R G E V S D D T
Y 1axt_L 1e4j_A
126 15 T E S 16 H 15 V N K 16 V
96.2 95.2
15.1 15.3 127 I 15.1 L 15.3
E
15.2 15.2
G A F B E G A F B E
IMGT Collier de Perles, Fig. 2 IMGT Collier de Perles for in the V domain. Positions 45 and 77 which delimit the charac-
C domain. (a) Ribbon representation of a C-DOMAIN taken as teristic CD-STRAND of the C domain, and position 118 which
an example. A similar topology and 3D structure characterize corresponds structurally to J-PHE or J-TRP of the IG and TR
a C-LIKE-DOMAIN. (b) and (c) C-DOMAIN on one layer and J-REGION (Lefranc and Lefranc 2001a, b) are also shown in
on two layers, respectively (Mus musculus C-KAPPA). (d) squares. Hatched circles correspond to missing positions
C-LIKE-DOMAIN on two layers (Homo sapiens FCGR3B according to the IMGT unique numbering for C domain (Lefranc
[D1]). Amino acids are shown in the one-letter abbreviation. et al. 2005a). Arrows indicate the direction of the beta strands
Position at which position at which hydrophobic amino acids and their designations in 3D structures. The IMGT Colliers de
(hydropathy index with positive value: I, V, L, F, C, M, A) and Perles on two layers show, on the forefront, the GFC strands and,
tryptophan (W) are found in more than 50% of analyzed on the back, the ABE strands. IMGT Colliers de Perles on two
sequences are shown in gray. The positions 26, 39, and 104 are layers with hydrogen bonds are available in IMGT/3Dstructure-
shown in squares by homology with the corresponding positions DB (http://www.imgt.org): 1axt_L, 1e4j_A
I 950 IMGT Collier de Perles
MH class I
IMGT Collier de Perles, Fig. 3 IMGT Collier de Perles for numbering for G domain (Lefranc et al. 2005b). In IMGT
G domain. (a) Ribbon representation of two G-DOMAIN taken Colliers de Perles, position 7A is only displayed in the G-
as an example. A similar topology and 3D structure characterize ALPHA, G-ALPHA1, and G-ALPHA1-LIKE domains, and
the G-LIKE-DOMAIN. (b) G-DOMAIN of MH class I (or positions 61A and 61B in the G-BETA, G-ALPHA2, and
MH1). G-ALPHA1 [D1] and G-ALPHA2 [D2] (Homo sapiens G-ALPHA2-LIKE domains. Position 54A is only occupied in
HLA-A*0201). (c) G-DOMAIN of MH class II (or MH2). G-ALPHA1-LIKE of RPI-MH1Like proteins. Position 92A is
G-ALPHA [D1] and G-BETA [D1] (Homo sapiens only added for HLA-DMA and H2-DMA IMGT Colliers de
HLA-DRA*0101 and HLA-DRB1*0101). (d) G-LIKE- Perles. Note that the N-terminal end of a peptide in the cleft
DOMAIN of RPI-MH1Like. G-ALPHA1-LIKE [D1] and G- would be on the left-hand side. IMGT Colliers de Perles are
ALPHA2-LIKE [D2] (Mus musculus CD1D1). Amino acids available in IMGT/3Dstructure-DB (http://www.imgt.org):
are shown in the one-letter abbreviation. Hatched circles corre- 1akj_A, 1fyt_B, 1cd1_C
spond to missing positions according to the IMGT unique
The IMGT Colliers de Perles are particularly useful easily compare the amino acid sequences of the
in molecular engineering and antibody humaniza- four FR-IMGT (FR1-IMGT: positions 1–26,
tion design based on CDR grafting. Indeed they FR2-IMGT: 39–55, FR3-IMGT: 66–104, and
allow to precisely define the CDR-IMGT and to FR4-IMGT: 118–128) between the mouse
IMGT Collier de Perles 951 I
(or other species) and the closest human diseases and leukemias, pMH contact analysis, and
V-DOMAIN. Analyses performed on humanized more generally ligand–receptor interactions involving
therapeutic antibodies underline the importance of V, C, and/or G domains.
a correct delimitation of the CDR regions to be
grafted.
4. The IMGT Colliers de Perles also allow Cross-References
a comparison to the IMGT Colliers de Perles
statistical profiles for the human expressed ▶ Chain Type
IGHV, IGKV, and IGLV repertoires. These statis- ▶ Complementarity Determining Region
tical profiles are based on the definition of 11 IMGT (CDR-IMGT)
amino acid physicochemical characteristics classes ▶ Constant (C) Domain
which take into account the hydropathy, volume, ▶ Conventional Gene
and chemical characteristics of the 20 common ▶ Diversity (D) Gene
amino acids (Pommié et al. 2004) (“Amino acids” ▶ Domain Type
in IMGT Aide-mémoire, http://www.imgt.org). ▶ Framework Region (FR-IMGT)
The statistical profiles identified positions which ▶ Gene Type
are conserved for the physicochemical characteris- ▶ Groove (G) Domain
tics: 41 FR-IMGT positions for the human IGHV ▶ IMGT Unique Numbering I
and 59 FR-IMGT positions for the human IGKV ▶ IMGT-ONTOLOGY
and IGLV at >80% threshold (see Plate 3 in ▶ IMGT-ONTOLOGY, Leafconcept
Pommié et al. 2004). After assignment of the IMGT ▶ IMGT ® Information System
Collier de Perles amino acids to the IMGT amino acid ▶ Immunogenetics
physicochemical classes, comparison can be made ▶ Immunoglobulin Superfamily (IgSF)
with the statistical profiles of the human expressed ▶ Immunoglobulin Synthesis
repertoires. This comparison is useful to identify ▶ Immunoinformatics
potential immunogenic residues at given positions in ▶ Joining (J) Gene
chimeric or humanized antibodies or to evaluate ▶ MH Superfamily (MhSF)
immunogenicity of therapeutic antibodies. ▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
5. IMGT Colliers de Perles are also of interest when ▶ Variable (V) Domain
3D structures are available. In IMGT/3Dstructure- ▶ Variable (V) Gene
DB (Ehrenmann et al. 2010), “IMGT Collier de
Perles on 2 layers” are displayed with hydrogen
bonds for V and C domains. Clicking on References
a residue in “IMGT Collier de Perles on one
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
layer” gives access to the corresponding IMGT
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
Residue@Position card which provides the atom ONTOLOGY paradigm. Biochimie 90:570–583
contact types and atom contact categories for that Ehrenmann F, Kaas Q, Lefranc M-P (2010) IMGT/3Dstructure-
amino acid. IMGT Colliers de Perles display DB and IMGT/DomainGapAlign: a database and a tool for
immunoglobulins or antibodies, T cell receptors, MHC, IgSF
the IMGT pMH contact sites for 3D structures and MhcSF. Nucleic Acids Res 38:D301–D307
with peptide/MH (pMH) complexes, which can be Garapati VP, Lefranc M-P (2007) IMGT Colliers de Perles and
compared with the pMH contact sites available in IgSF domain standardization for T cell costimulatory
IMGT/3Dstructure-DB. activatory (CD28, ICOS) and inhibitory (CTLA4, PDCD1
and BTLA) receptors. Dev Comp Immunol 31:1050–1072
The IMGT Colliers de Perles for the V, C, and
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
G domains, based on the IMGT unique numbering, ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
represent therefore a major step forward for the Kaas Q, Lefranc M-P (2007) IMGT Colliers de Perles: standard-
comparative analysis of the sequences and structures ized sequence-structure representations of the IgSF and
MhcSF superfamily domains. Curr Bioinf 2:21–30
of the IgSF and MhSF domains, for the study of their
Kaas Q, Ehrenmann F, Lefranc M-P (2007) IG, TR, MHC, IgSf
evolution and for the applications in antibody and MhcSF: what do we learn from the IMGT Colliers de
engineering, IG and TR repertoires in autoimmune Perles? Brief Funct Genomic Proteomic 6:253–264
I 952 IMGT Unique Numbering
Lefranc M-P, Lefranc G (2001a) The immunoglobulin (▶ IMGT® Information System). The “IMGT unique
factsbook. Academic, London, pp 1–458 numbering” concept allows to number domain types
Lefranc M-P, Lefranc G (2001b) The T cell receptor factsbook.
Academic, London, pp 1–398 (▶ Domain Type) that are characteristic of protein
Lefranc M-P, Pommié C, Ruiz M, Giudicelli V, Foulquier E, superfamilies, whatever the species, the molecule
Truong L, Thouvenin-Contet V, Lefranc G (2003) IMGT type (▶ MoleculeType) or the chain type (▶ Chain
unique numbering for immunoglobulin and T cell receptor Type). Three leafconcepts (▶ IMGT-ONTOLOGY,
variable domains and Ig superfamily V-like domains.
Dev Comp Immunol 27:55–77 Leafconcept) have been defined, respectively, for the
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, V domain (▶ Variable (V) Domain) and C domain
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, (▶ Constant (C) Domain) of the ▶ immunoglobulin
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, superfamily (IgSF) and for the G domain (▶ Groove
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics (G) Domain) of the ▶ MH superfamily (MhSF).
and immunoinformatics, In Silico Biol 4:17–29 “IMGT unique numbering for V domain” numbers the
Lefranc M-P, Pommié C, Kaas Q, Duprat E, Bosc N, Guiraudou D, V-DOMAIN of immunoglobulins (IG) or antibodies and
Jean C, Ruiz M, Da Piédade I, Rouard M, Foulquier E, T cell receptors (TR) and the V-LIKE-DOMAIN of the
Thouvenin V, Lefranc G (2005a) IMGT unique numbering for
immunoglobulin and T cell receptor constant domains IgSF other than IG and TR. “IMGT unique numbering for
and Ig superfamily C-like domains. Dev Comp Immunol C domain” numbers the C-DOMAIN of IG and TR and
29:185–203 the C-LIKE-DOMAIN of IgSF other than IG and TR.
Lefranc M-P, Duprat E, Kaas Q, Tranne M, Thiriot A, Lefranc G “IMGT unique numbering for G domain” numbers the
(2005b) IMGT unique numbering for MHC groove
G-DOMAIN and MHC superfamily (MhcSF) G-LIKE- G-DOMAIN of major histocompatibility (MH) and
DOMAIN. Dev Comp Immunol 29:917–938 the G-LIKE-DOMAIN of the MhSF other than MH.
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a The IMGT unique numbering has been a great break-
system and an ontology that bridge biological and computa- through in immunogenetics and system biology in
tional spheres in bioinformatics. Brief Bioinform 9:263–275
Pommié C, Levadoux S, Sabatier R, Lefranc G, Lefranc M-P allowing, for the first time, to bridge the gap between
(2004) IMGT standardized criteria for statistical analysis of amino acid (and nucleotide) sequences of any V, C, or
immunoglobulin V-REGION amino acid properties. J Mol G domain and their two-dimensional (2D) and
Recognit 17:17–32 three-dimensional (3D) structures. The IMGT unique
Ruiz M, Lefranc M-P (2002) IMGT gene identification and
Colliers de Perles of human immunoglobulin with known numbering has been fundamental in the creation of the
3D structures. Immunogenetics 53:857–883 ▶ IMGT Collier de Perles and in the standardization of
the description of mutations, amino acid changes,
polymorphisms, and contact analysis in the IMGT®
databases, tools, and Web resources (http://www.imgt.
org) (▶ IMGT® Information System).
IMGT Unique Numbering
IMGT Unique Numbering, Table 1 V domain strands and loops, IMGT positions and lengths, based on the IMGT unique
numbering for V domain (V-DOMAIN and V-LIKE-DOMAIN) (Lefranc et al. 2003)
V domain strands Characteristic Residue @ V-DOMAIN
and loopsa IMGT positions Lengthsb Positionc FR-IMGT and CDR-IMGT
A-STRAND 1–15 15 (14 if gap at 10) FR1-IMGT
B-STRAND 16–26 11 1st-CYS 23
BC-LOOP 27–38 12 (or less) CDR1-IMGT
C-STRAND 39–46 8 CONSERVED-TRP 41 FR2-IMGT
C0 -STRAND 47–55 9
C0 C00 -LOOP 56–65 10 (or less) CDR2-IMGT
C00 -STRAND 66–74 9 (or 8 if gap at 73) FR3-IMGT
D-STRAND 75–84 10 (or 8 if gaps
at 81, 82)
E-STRAND 85–96 12 hydrophobic 89
F-STRAND 97–104 8 2nd-CYS 104
FG-LOOP 105–117 13 (or less, or more) CDR3-IMGT
G-STRAND 118–128 11 (or 10) V-DOMAIN J-PHE 118 FR4-IMGT
or J-TRP 118d
a ®
IMGT labels (concepts of description) are written in capital letters (Labels and relations)
b
in number of amino acids (or codons)
c ®
Residue@Position is a IMGT concept of numerotation that numbers the position of a given residue (or that of a conserved property
amino acid class), based on the IMGT unique numbering
d
In the IG and TR V-DOMAIN, the G-STRAND (or FR4-IMGT) is the C-terminal part of the J-REGION, with J-PHE or J-TRP 118
and the canonical motif F/W-G-X-G at positions 118–121
IMGT Unique Numbering, Fig. 1 “IMGT Protein display for V-LIKE-DOMAIN (Lefranc et al. 2003). Line 7: Pipes for the
V domain” header. The header comprises 7 lines. Line 1: Frame- start and end positions of strands and loops and highlighted
work regions FR-IMGT (e.g., FR1-IMGT) and complementarity positions, dots for the other positions, plus if needed, letters or
determining regions CDR-IMGT (e.g., CDR1-IMGT). Line 2: numbers for additional positions (e.g., AB or 123. . .). Optionally
start and end positions of FR-IMGT and CDR-IMGT, e.g., small circles above the line (instead of dots) may indicate the
(1–26). Line 3: name of strands (A, B. . .) and loops (AB, positions at the top of the BC, C0 C00 , and FG loops considering
BC. . .). Line 4: start and end positions of strands, e.g., (1–15). loop lengths of [12.10.13] (these positions are 32, 33 for BC, 60,
Line 5: arrows (for the strands). Line 6: positions according 61 for C0 C00 and 111, 112 for FG)
to the IMGT unique numbering for V-DOMAIN and
a crucial and original concept of IMGT-ONTOLOGY. IMGT Gaps and Additional Positions
The lengths of the BC (CDR1-IMGT), C0 C00 (CDR2- Dots in IMGT Protein displays (Scaviner et al. 1999;
IMGT), and FG (CDR3-IMGT) loops characterize the Folch et al. 2000; Lefranc and Lefranc 2001a, b)
V-DOMAIN and V-LIKE-DOMAIN. Thus, the length (Fig. 1) and hatched circles or squares in IMGT Col-
of the three loops BC, C0 C00 , and FG is shown, in liers de Perles for V domain (▶ IMGT Collier de
number of amino acids (or codons), into brackets and Perles) correspond to missing positions according to
separated by dots. For example [9.6.9] means that the the IMGT unique numbering for V domain (Lefranc
BC, C0 C00 , and FG loops (or CDR1-IMGT, CDR2- et al. 2003).
IMGT, and CDR3-IMGT for V-DOMAIN) have For BC, C0 C00 , or FG loops shorter than 12, 10, 13
a length of 9, 6, and 9 amino acids (or codons), amino acids, respectively, gaps are created at the apex
respectively. (missing positions, hatched in ▶ IMGT Collier de
IMGT Unique Numbering 955 I
Perles, or not shown in structural data representations). IMGT Unique Numbering, Table 2 C domain strands, turns
The gaps are placed at the apex of the loop with an and loops, IMGT positions and lengths, based on the IMGT
unique numbering for C domain (C-DOMAIN and C-LIKE-
equal number of amino acids (or codons) on both sides DOMAIN) (Lefranc et al. 2005a)
if the loop length is an even number, or with one more
amino acid (or codon) in the left part if it is an odd Characteristic
C domain strands, IMGT Residue @
number. For example, for FG loops shorter than 13 turns and loopsa positions Lengthsb Positionc
amino acids, gaps are created from the apex of the A-STRAND 1–15 15 (14 if
loop, in the following order: 111, 112, 110, 113, 109, gap at 10)
etc. For FG loops longer than 13 amino acids, addi- AB-TURN 15.1– 0–3
tional positions are created, between positions 111 and 15.3
B-STRAND 16–26 11 1st-CYS 23
112 at the top of the FG loop, in the following order:
BC-LOOP 27–31 10 (or less)
112.1, 111.1, 112.2, 111.2, 112.3, etc. (Lefranc et al.
34–38
2003).
C-STRAND 39–45 7 CONSERVED-
TRP 41
IMGT Unique Numbering for C Domain CD-STRAND 45.1– 0–9
C Domain Strands, Loops and Turns 45.9
The IMGT unique numbering for C domain numbers D-STRAND 77–84 8 (or 7 if
the C-DOMAIN of IG or antibodies and TR and the gap at 82) I
C-LIKE-DOMAIN of the IgSF other than IG and TR. DE-TURN 84.1– 0–14
84.7
The C domain strands, turns, and loops and their
85.1–
IMGT positions and lengths, based on the IMGT 85.7
unique numbering for C domain (C-DOMAIN and E-STRAND 85–96 12 hydrophobic 89
C-LIKE-DOMAIN) (Lefranc et al. 2005a), are shown EF-TURN 96.1– 0–2
in Table 2. 96.2
The C domain (C-DOMAIN of IG and TR and F-STRAND 97–104 8 2nd-CYS 104
C-LIKE-DOMAIN of IgSF other than IG and TR) is FG-LOOP 105–117 13 (or less,
composed by the A-STRAND of 15 (or 14 if gap at 10) or more)
G-STRAND 118–128 11 (or less)
amino acids (positions 1–15), the AB-TURN (addi- ®
a
tional positions 15.1, 15.2, and 15.3; the longest IMGT labels (concepts of description) are written in capital
letters (Labels and relations)
AB-TURN have three amino acids), the B-STRAND b
In number of amino acids (or codons)
®
of 11 amino acids (positions 16–26) with the 1st-CYS c
Residue@Position is a IMGT concept of numerotation that
at position 23, the BC-LOOP (positions 27–31, 34– numbers the position of a given residue (or that of a conserved
38), the C-STRAND of seven amino acids (positions property amino acid class), based on the IMGT unique
numbering
39–45) with the CONSERVED-TRP at position 41, the
CD-STRAND of one to nine amino acids (additional
positions 45.1–45.9), the D-STRAND of eight (or
seven if gap at 82) amino acids (positions 77–84), the
DE-TURN (additional positions 84.1–84.7 and 85.7– C Domain and V Domain Comparison
85.1, corresponding to 14 amino acids), the The A-STRAND and B-STRAND of the C domain are
E-STRAND of 12 amino acids (positions 85–96) with similar to those of the V domain (Lefranc et al. 2005a).
a conserved hydrophobic amino acid at position 89, the The longest BC-LOOP of the C domain have 10 amino
EF-TURN (additional positions 96.1 and 96.2, acids (missing positions 32 and 33), instead of
corresponding to two amino acids), the F-STRAND 12 amino acids in the V domain. The C-STRAND
of eight amino acids (positions 97–104) with the 2nd- and the D-STRAND of the C domain are shorter of
CYS at position 104, the FG-LOOP (positions 105– one position (46) and two positions (75, 76), respec-
117, these positions corresponding to a FG loop of 13 tively, compared to those of the V domain. The trans-
amino acids), and the G-STRAND of 11 (or less) versal CD-STRAND is a characteristic of the
amino acids (positions 118–128) (Lefranc et al. C domain (a V domain has instead two antiparallel
2005a) (Table 2, Fig. 2). beta strands C0 -STRAND and C00 -STRAND linked by
I 956 IMGT Unique Numbering
A AB B BC C CD D DE E EF F FG G
(1-15) (16-26) (27-38) (39-45) (77-84) (85-96) (97-104) (105-117) (118-128)
> > > > > > > >
1 10 15 16 23 26 27 38 39 41 45 77 84 85 89 96 97 104 105 117 118 128
87654321 . . . . . . . . . . . . 123 . . . . . . . . . . . . . . . . . . . . 1234567 . . . . . . 12345677654321 . . . . . . . . . 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
IMGT Unique Numbering, Fig. 2 “IMGT Protein display for (Lefranc et al. 2005a). Missing positions (32 and 33 in the BC
C domain” header. The header comprises 5 lines. Line 1: name loop and 46 to 76) in the C domain by comparison with the
of strands (A, B. . .), turns and loops (AB, BC. . .). Line 2: V domain are not shown. Line 5: Pipes for the start and end
start and end positions of strands, e.g., (1–15). Line 3: arrows positions of strands and loops and highlighted positions, dots
(for the strands). Line 4: positions according to the IMGT for the other positions, plus if needed, numbers for additional
unique numbering for C-DOMAIN and C-LIKE-DOMAIN positions (e.g., 123. . .) (Lefranc et al. 2005a)
the C0 C00 -LOOP). The E-STRAND, F-STRAND, and IMGT Unique Numbering, Table 3 G domain strands, turns
G-STRAND of the C domain are similar to those of the and helix, IMGT positions and lengths, based on the IMGT
unique numbering for G domain (G-DOMAIN and G-LIKE-
V domain (Lefranc et al. 2005a) (IMGT Scientific
DOMAIN) (Lefranc et al. 2005b)
chart rules, http://www.imgt.org).
Characteristic
Residue @
IMGT Gaps and Additional Positions Positionc and
Dots in IMGT Protein displays (Fig. 2) and hatched G domain strands, IMGT additional
circles or squares in IMGT Colliers de Perles for turns and helixa positions Lengthsb positionsd
C domain (▶ IMGT Collier de Perles) correspond to A-STRAND 1–14 14 7A, CYS-11
missing positions according to the IMGT unique num- AB-TURN 15–17 3 (or 2 or
0)
bering for C domain (Lefranc et al. 2005a).
B-STRAND 18–28 11 (or 10e)
The longest BC-LOOP of the C domain have 10
BC-TURN 29–30 2
amino acids (missing positions 32 and 33, that are
C-STRAND 31–38 8
a feature of the C domain are not shown in the IMGT CD-TURN 39–41 3 (or 1f)
Colliers de Perles and IMGT Protein displays for D-STRAND 42–49 8 49.1 to 49.5
C domain). For BC loops shorter than 10 amino HELIX 50–92 43 (or less 54A, 61A, 61B,
acids, gaps are created from the apex in the following or more) 72A, CYS-74, 92A
®
order 34, 31, 35, 30, 36, etc. The FG-LOOP of the a
IMGT labels (concepts of description) are written in capital
C domain is similar to that of the V domain. Gaps for letters (Labels and relations)
b
FG loops shorter than 13 amino acids and additional In number of amino acids (or codons)
®
c
Residue@Position is a IMGT concept of numerotation that
positions for FG loops longer than 13 amino acids are numbers the position of a given residue (or that of a conserved
created following the same rules as those of the property amino acid class), based on the IMGT unique
V domain. numbering
d
Additional positions in the C domain define the AB- For details on the characteristic Residue@Position and addi-
tional positions, see text (Lefranc et al. 2005b)
TURN, DE-TURN, and EF-TURN (Table 2). For AB- e
Or 9 in some G-BETA (Lefranc et al. 2005b)
TURN shorter than 3 amino acids, gaps are created f
Or 0 in some G-ALPHA2-LIKE (Lefranc et al. 2005b)
(hatched in IMGT Colliers de Perles, or not shown in
structural data representations) in a decreasing ordinal
manner. For DE-TURN shorter than 14 amino acids,
gaps are created in the following order: 85.7, 84.7, other than MH or related proteins of the immune sys-
85.6, 84.6, 85.5, etc. For EF-TURN shorter than 2 tem (RPI)-MH1Like. The G domain strands, turns
amino acids, gaps are created in the following order: and helix and their IMGT positions and lengths,
96.2, 96.1 (Lefranc et al. 2005a). based on the IMGT unique numbering for G domain
(G-DOMAIN and G-LIKE-DOMAIN) (Lefranc et al.
IMGT Unique Numbering for G Domain 2005b), are shown in Table 3.
G Domain Strands, Turns and Helix The G domain (G-DOMAIN of MH and G-LIKE-
The IMGT unique numbering for G domain numbers DOMAIN of RPI-MH1Like) comprises a sheet of four
the G-DOMAIN of major histocompatibility (MH) and antiparallel beta strands linked by turns and a helix that
the G-LIKE-DOMAIN of ▶ MH superfamily (MhSF) seats on the beta strands, its axis forming an angle of
IMGT Unique Numbering 957 I
A B C D helix
AB BC CD
(1-14) (18-28) (31-38) (42-49) (50-92)
> > > >
1 7 14 18 28 31 38 42 49 50 54 61 72 80 90 92
0987654321 . . . . . A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1234567 . . . A . . . . . . AB . . . . . . . . . . A . . . . . . . . . . . . . . . . .
IMGT Unique Numbering, Fig. 3 “IMGT Protein display for numbering for G-DOMAIN and G-LIKE-DOMAIN (Lefranc
G domain” header. The header comprises 5 lines. Line 1: name et al. 2005b). Line 5: Pipes for the start and end positions of
of strands (A, B, C, D), turns (AB, BC, CD) and helix. Line 2: strands and helix and highlighted positions, dots for the other
start and end positions of strands (1–14), (18–28), (31–38), (42– positions, plus if needed, letters or numbers for additional posi-
49) and helix (50–92). Line 3: arrows (for the strands) and line tions that characterize some G domains (e.g., A, AB or 123. . .)
(for the helix). Line 4: positions according to the IMGT unique (Lefranc et al. 2005b)
about 40 with the strands. Two G domains are needed The additional position 7A represents a bulge in 3D
to form the MhSF groove made of a “floor” and two structures and is present in some G-ALPHA domains,
“walls.” Each G domain contributes by its strands and for instance those of the human HLA-DQA1 and
turns to half of the groove floor and by its helix to one HLA-DOA, and mouse H2-AA and H2-DOA chains
wall of the groove (Lefranc et al. 2005b). (Lefranc et al. 2005b). Additional positions at the
The G domain is composed by the A-STRAND of N-terminus of A-STRAND or at the C-terminus of I
14 amino acids (positions 1–14), the AB-TURN of D-STRAND can be added if necessary. Thus, two addi-
three or less amino acids (positions 15–17), the tional positions (1.2 and 1.1) are added at the N-terminus
B-STRAND of 11 or less amino acids (positions 18– of the A-STRAND of the G-ALPHA domains as the
28), the BC-TURN of two amino acids (positions 29 presence of these two amino acids was demonstrated
and 30), the C-STRAND of eight amino acids (posi- by protein sequencing of the HLA-DRA; however the
tions 31–38), the CD-TURN of three or less amino proteolytic cleavage site of the leader peptide (L-
acids (positions 39–41), the D-STRAND of eight REGION) needs to be confirmed experimentally for the
amino acids (positions 42–49), and a helix of 43 (or other G-ALPHA (Lefranc et al. 2005b). In each [D1] G-
less or more) amino acids (positions 50–92) (Lefranc DOMAIN (except the G-ALPHA domain of human
et al. 2005b) (Table 3, Fig. 3). HLA-DMA and mouse H2-DMA discussed below), the
The helix is split into two parts separated by a kink, amino acid at position 1 (shown within parentheses in
positions 58 of G-ALPHA1, 61 of G-ALPHA2, 63 of IMGT Protein displays, http://www.imgt.org) is encoded
G-ALPHA, and 62 of G-BETA being the “highest” by the codon that results from the splicing between the
points on the groove floor (Lefranc et al. 2005b). first exon (EX1) that encodes the L-REGION, and the
second exon (EX2) that encodes [D1] (Lefranc et al.
IMGT Gaps and Additional Positions 2005b). Ten amino acids have been added at positions
Dots in IMGT Protein displays (Fig. 3) and hatched 1.10–1.1 of HLA-DMA and H2-DMA but it is necessary
circles or squares in IMGT Collier de Perles for to confirm experimentally if they belong, or not, to
G domain (▶ IMGT Collier de Perles) correspond to the mature protein. Four additional positions (49.1–
missing positions according to the IMGT unique num- 49.4) are also observed at the C-terminus of D-STRAND
bering for G domain (Lefranc et al. 2005b). of the G-BETA domain of HLA-DMB and H2-DMB1
The gaps are localized in the turns. The AB-TURN (Lefranc et al. 2005b).
(positions 15–17) comprises three amino acids in the Interestingly, despite the high sequence divergence,
G-ALPHA1, G-ALPHA2, and G-BETA domains but only five additional positions in the helix (54A, 61A,
these positions are unoccupied in the G-ALPHA domains 61B, 72A and 92A) are necessary to align any G domain
(as well as position 18 of B-STRAND). The BC-TURN (Lefranc et al. 2005b). Three of them (61A, 61B, 72A)
(positions 29 and 30) comprises two positions that are characterize the G-ALPHA2 and/or G-BETA domains.
occupied in all G-DOMAIN. The CD-TURN (positions Indeed, positions 61A and 72A are occupied in the G-
39–41) is occupied by three amino acids in the ALPHA2 domain, whereas positions 61A, 61B, and 72A
G-ALPHA1 domains and by one amino acid in most of are occupied in G-BETA domains (except for the mouse
the other G domains (Lefranc et al. 2005b). H2-AB chain). The position 92A is only occupied in the
I 958 IMGT Unique Numbering
HLA-DMA and H2-DMA G-ALPHA domains. It is 2003, 2005a, b), the IMGT unique numbering has been
worthwhile to note that position 54A in G-ALPHA1- fundamental:
LIKE is the only additional position needed to • In the creation of the IMGT Colliers de Perles for V,
extend the IMGT numbering for G-DOMAIN to the C and G domain (▶ IMGT Collier de Perles)
G-LIKE-DOMAIN of the RPI-MH1Like proteins • In the definition of the Residue@Position concept
(Lefranc et al. 2005b). (standardized numerotation of amino acid changes
and polymorphisms)
G Domain Characteristics Comparison • In the standardized numerotation of mutations (at
Two cysteines, CYS-11 (in strand A) and CYS-74 the codon and nucleotide level) and definition of
(in the helix), are well conserved in the G-ALPHA2 alleles
and G-BETA domains where they participate to • In the delimitation of the FR-IMGT and CDR-
a disulfide bridge that fastens the helix on the groove IMGT for antibody engineering and antibody
floor. The G-ALPHA1 and G-ALPHA domains have humanization (Lefranc 2009)
a conserved N-glycosylation site at position 86 (N-X- • In the comparison of interactions and contact anal-
S/T, where N is asparagine, X is any amino acid except ysis between domains in 3D structures (Ehrenmann
proline, S is serine, and T is threonine). An et al. 2010)
N-glycosylation site is also found at position 86 in By bridging the gap between amino acid (and nucle-
the G-ALPHA2 domain of the mouse classical MH1 otide) sequences and 3D structures, the IMGT unique
chains (H2-D1, H2-K1 and H2-L). The G-BETA numbering is essential for domain comparison in immu-
domains (except for the human HLA-DMB and nogenetics and system biology. It is used in the IMGT®
mouse H2-DMB1 chains) have a conserved N- sequence and structure databases, the IMGT online
glycosylation site at position 15 (AB-TURN) (Lefranc tools (IMGT/V-QUEST, IMGT/DomainGapAlign,
et al. 2005b) IMGT/DomainDisplay, IMGT/Collier-de-Perles, etc.)
Interestingly, the G-ALPHA domains of the and the IMGT knowledge web resources (IMGT Pro-
HLA-DMA and H2-DMA chains have specific fea- tein displays, IMGT Alignments of alleles, IMGT
tures compared to the other G-ALPHA domains and Table of alleles, IMGT Colliers de Perles, etc.) (http://
share common characteristics with the G-ALPHA2 www.imgt.org) (▶ IMGT® Information System).
and G-BETA domains: there is a conserved As a major IMGT-ONTOLOGY concept of
CYS-11_CYS-74 disulfide bridge, positions 61A and numerotation, the IMGT unique numbering is used,
61B are occupied (as in the G-BETA domains) and with the IMGT-ONTOLOGY concepts of classifica-
there is no N-glycosylation site at position 86 (Lefranc tion (▶ Gene and Allele Nomenclature) and concepts
et al. 2005b) (IMGT Protein display in IMGT Reper- of description (▶ Labels and Relations), in the
toire, http://www.imgt.org). definition of monoclonal antibodies (mAb, suffix -mab)
Practically, the IMGT unique numbering for posi- and fusion proteins for immune applications (FPIA,
tions 1–39 and 73–92 of the G-ALPHA2 domains can suffix -cept) of the World Health Organization/Interna-
be obtained very easily by subtracting 90 from the tional Nonproprietary Name (WHO/INN) programme
mature protein numbering (91–129 and 163–182). (Lefranc 2011).
Between these positions, the two gaps (at positions
40 and 41) and the two insertions (at positions 61A
and 72A) are necessary, in the IMGT unique number- Cross-References
ing, to allow meaningful sequence and structure align-
ment and comparison between the G-ALPHA1 and ▶ Chain Type
G-ALPHA2 sequences. ▶ Complementarity Determining Region (CDR-
IMGT)
IMGT Unique Numbering Applications in ▶ Configuration Type
Immunogenetics and System Biology ▶ Constant (C) Domain
By allowing to number the V and C domains of the IG, ▶ Domain Type
TR, and IgSF other than IG and TR, and the G domain ▶ Framework Region (FR-IMGT)
of the MH and MhSF other than MH (Lefranc et al. ▶ Gene and Allele Nomenclature
IMGT® Information System 959 I
▶ Groove (G) Domain domains and Ig superfamily C-like domains. Dev Comp
▶ IMGT Collier de Perles Immunol 29:185–203
Lefranc M-P, Duprat E, Kaas Q, Tranne M, Thiriot A, Lefranc G
▶ IMGT-ONTOLOGY (2005b) IMGT unique numbering for MHC groove
▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom G-DOMAIN and MHC superfamily (MhcSF) G-LIKE-
▶ IMGT-ONTOLOGY, DESCRIPTION Axiom DOMAIN. Dev Comp Immunol 29:917–938
▶ IMGT-ONTOLOGY, Leafconcept Scaviner D, Barbié V, Ruiz M, Lefranc M-P (1999) Protein
displays of the human immunoglobulin heavy, kappa and
▶ IMGT-ONTOLOGY, NUMEROTATION Axiom lambda variable and joining regions. Exp Clin Immunogenet
▶ IMGT ® Information System 16:234–240
▶ Immunogenetics
▶ Immunoglobulin Superfamily (IgSF)
▶ Immunoglobulin Synthesis
▶ Immunoinformatics
▶ Labels and Relations IMGT ® Information System
▶ MH Superfamily (MhSF)
▶ MoleculeType Marie-Paule Lefranc
▶ Variable (V) Domain Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France I
References
Se
qu
Identification of
ce
s/
mu
Gen tat
e ion
iden and al s
Nu
IMGT / CLL-DB le
tific
atio le
cle
Ana n IMGT / Allele-Align
oti
ly
junc sis of
dic
tion
/p
s
ro
IMGT / V-QUEST
tei
cse
es
qu
nc
IMGT /
ns f
tio s o
en
ue
mAb-DB
ce
nc si
eq
ju aly
s
/s
Se
An
ity
q ue
ific
INN
nc
ec
sequences
Sp
IMGT/JunctionAnalysis
es
/ ph
ylo
IMGT / GeneFrequency IMGT /
ge
Genes / reference sequences
ny
2Dstructure-DB
IMGT/PhyloGene
IMGT / GeneInfo
IMGT / DomainDisplay
es
IMGT / CloneSearch
nc
ny
ue
ge
IMGT / GeneSearch
eq
IMGT / DomainGapAlign
ylo
es
IMGT / GeneView
ph
nc
ere
s/
ne
IMGT / LocusView
ref
Ge
IMGT / Collier-de-Perles
s/
ne
Ge
loc Gen
3D structures
ali e
proteins
sa
tio
n
Genes
/ 3D str
IMGT / GENE-DB uctures
Genetic approach
Genomic approach IMGT /
3Dstructure-DB
Structural approach
s
ture
s truc 3D
Sequences 3D domains
Genes
3D structures IMGT / StructuralQuery IMGT / DomainSuperimpose
Monoclonal antibodies
IMGT ® Information System, Fig. 1 IMGT®, the international throughput version of IMGT/V-QUEST for Next Generation
ImMunoGeneTics information system ® (http://www.imgt.org). Sequencing (NGS) and deep sequencing of IG and TR V
IMGT ® comprises databases (shown as cylinders), tools (shown domains. Interactions between databases and tools in the geno-
as rectangles) and Web resources (not shown, see Table 1 for mic, genetic, and structural approaches are represented with
examples). IMGT/HighV-QUEST (not shown) is the high continuous, dotted, and broken lines
The IMGT ® sequences databases manage: 2. MH sequences of human and other vertebrates
1. IG and TR nucleotide sequences from human and (IMGT/MH-DB)
other vertebrate species (IMGT/LIGM-DB, created 3. Oligonucleotide sequences used as primers for com-
in 1989, on the Web since July 1995, the first and binatorial library constructions, scFv, phage display,
the largest IMGT ® database) or microarray technologies (IMGT/PRIMER-DB)
I 962 IMGT ® Information System
IMGT ® Information System, Table 1 IMGT ® databases, tools, and Web resources for genomic, genetic, and structural
approaches
Approaches Databases Tools Web resourcesa
Genomic IMGT/GENE-DB IMGT/GeneView IMGT Repertoire, Locus and genes section:
IMGT/LocusView – Chromosomal localizations
IMGT/CloneSearch – Locus representations
IMGT/GeneSearch – Locus description, etc.
IMGT/GeneInfo – Gene tables
IMGT/GeneFrequency – Potential germline repertoires
– Lists of genes
– Correspondence between nomenclatures
Genetic IMGT/LIGM-DB IMGT/V-QUEST IMGT Repertoire, Proteins and alleles section:
IMGT/MH-DB IMGT/HighV-QUEST – Alignments of alleles
IMGT/PRIMER-DB IMGT/JunctionAnalysis – Tables of alleles
IMGT/CLL-DB IMGT/Allele-Align – Allotypes
IMGT/mAb-DB IMGT/PhyloGene – Isotypes
IMGT/2Dstructure-DB IMGT/DomainDisplay – Protein displays, etc.
Structural IMGT/3Dstructure-DB IMGT/DomainGapAlign IMGT Repertoire, 2D and 3D structures section:
IMGT/Collier-de-Perles – IMGT Colliers de Perles (2D representations on one layer
or two layers)
IMGT/DomainSuperimpose – IMGT ® classes for amino acid characteristics
IMGT/StructuralQuery – IMGT Colliers de Perles reference profiles
– 3D representations
a
Only Web resource examples from the IMGT Repertoire sections are shown
4. Amino acid sequences of therapeutical monoclonal binding characteristics in relationship with the protein
antibodies of the WHO/International Nonproprietary functions, polymorphisms, and evolution.
Name (INN) program (IMGT/2Dstructure-DB cards) The IMGT® 3D structure database comprises IG,
The IMGT ® sequence analysis tools allow the iden- TR, MH, IgSF, MhSF, RPI, mAb, and FPIA with
tification of the variable (V), diversity (D), and joining known 3D structures (IMGT/3Dstructure-DB). The
(J) genes and of their mutations in rearranged IG and IMGT/3Dstructure-DB cards provide chain details with
TR sequences (IMGT/V-QUEST, approved by the IMGT annotations (receptor, chain and domain descrip-
European Research Initiative on Chronic Lymphocytic tion, IMGT gene and allele names, domain delimitations,
Leukaemia CLL (ERIC) for the analysis of the IGHV and ▶ IMGT Collier de Perles), contact analysis, down-
gene mutational status in CLL), the analysis of the V-J loadable renumbered IMGT/3Dstructure-DB flat files,
and V-D-J junctions that confer the antigen receptor visualization tools (Jmol and QuickPDB), and external
specificity (IMGT/JunctionAnalysis), the detection of links. IMGT Residue@Position cards provide detailed
polymorphisms (IMGT/Allele-Align), gene evolution information on the inter- and intra-domain contacts at
analyses (IMGT/Phylogene), or the display of amino each residue position, based on the ▶ IMGT unique
acid sequences from the IMGT domain directory numbering.
(IMGT/DomainDisplay). The IMGT ® structural tools allow to analyze amino
The IMGT ® genetics resources include alignments acid sequences per domain (IMGT/DomainGapAlign),
of alleles, tables of alleles, allotypes, isotypes, and to make ▶ IMGT Collier de Perles (IMGT/Collier-de-
protein displays (IMGT Repertoire, Proteins and Perles), to superimpose two domain 3D structures from
alleles section). IMGT/3Dstructure-DB (IMGT/DomainSuperimpose),
or to retrieve the IMGT/3Dstructure-DB entries
Structural Approach containing a V domain, based on specific structural
The IMGT ® structural approach refers to the study of characteristics of the intramolecular interactions: phi
the 2D and 3D structures and to the antigen- or ligand- and psi angles, distance in angstrom between amino
IMGT® Information System 963 I
acids, ▶ complementarity determining region (CDR-
FROM SEQUENCES TO
3D STRUCTURES IMGT) lengths, and so on (IMGT/StructuralQuery).
>CAMPATH-1H|1bey_H|VH-CH1|219 AA|23397.2
The IMGT ® structural resources include IMGT
MW|VH+CH1
Colliers de Perles, ▶ framework region (FR-IMGT),
QVQLQESGPGLVRPSQTLSLTCTVSGFTFTDFYMNWVRQ
PPGRGLEWIGFIRDKAKGYTT
and CDR-IMGT lengths, profiles based on the 11
EYNPSVKGRVTMLVDTSKNQFSLRLSSVTAADTAVYYCAR
EGHTAAPFDYWGQGSLVTVS IMGT amino acid physicochemical classes (IMGT
SASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVS
WNSGALTSGVHTFPAVLQS Repertoire, 2D and 3D structures section).
SGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKV
IMGT ® Users
Since July 1995, IMGT ® has been available on the
Web at the IMGT Home page http://www.imgt.org
(Montpellier, France). The knowledge provided by the
IMGT® information system is of much value to clini-
cians and biological scientists in general. IMGT® data-
bases, tools, and Web resources are extensively queried
and used by scientists from both academic and research
laboratories from pharmaceutical companies, who are
equally distributed between the United States, Europe,
and the remaining world. IMGT® is used in very diverse
domains: (1) fundamental research, (2) medical research Kaas Q, Ehrenmann F, Lefranc M-P (2007) IG, TR, MHC, IgSf
(vaccine design, repertoire analysis of the IG antibody and MhcSF: what do we learn from the IMGT Colliers de
Perles? Brief Funct Genomic Proteomic 6:253–264
sites and of the TR recognition sites in normal and Lefranc M-P (2008) WHO-IUIS Nomenclature Subcommittee
pathological situations such as autoimmune diseases, for immunoglobulins and T cell receptors report August
infectious diseases, AIDS, leukemias, lymphomas, and 2007, 13th International Congress of Immunology, Rio de
myelomas), (3) veterinary research (immune repertoires Janeiro, Brazil. Dev Comp Immunol 32:461–463
Lefranc M-P (2009) Antibody database and tools: the IMGT ®
in domestic and wild life species, animal models), experience, Ch 4. In: Zhiqiang An (ed) Therapeutic mono-
(4) genomics (study of the genome diversity and evolu- clonal antibodies: from bench to clinic. John Wiley Sons,
tion of the adaptive immune response), (5) structural Hoboken, pp 91–114
biology (evolution of the domains of the IgSF and Lefranc M-P, Lefranc G (2001a) The immunoglobulin
FactsBook. Academic Press, London, UK, pp 1–458
MhSF proteins), (6) biotechnology related to antibody Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
engineering and ▶ antibody humanization (construction Academic Press, London, UK, pp 1–398
and analysis of single chain Fragment variable Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008)
(scFv), phage displays, combinatorial libraries, chimeric, IMGT, a system and an ontology that bridge biological and
computational spheres in bioinformatics. Brief Bioinform
humanized, and human antibodies) (Fig. 2) (Lefranc 9:263–275
2009; Ehrenmann et al. 2010), (7) diagnostics (clonalities, Lefranc M-P, Giudicelli V, Ginestoux C, Jabado-Michaloud J,
detection, and follow-up of minimal residual diseases), Folch G, Bellahcene F, Wu Y, Gemrot E, Brochet X, Lane J,
and (8) therapeutical approaches (grafts, immunotherapy, Regnier L, Ehrenmann F, Lefranc G, Duroux P (2009)
IMGT ®, the international ImMunoGeneTics information
and vaccinology). system ®. Nucl Acids Res 37:D1006–D1012
Cross-References
Synonyms
References
International ImMunoGeneTics information system ®;
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P, Ontology of IMGT ®; Ontology of the IMGT ®
Giudicelli V (2008) IMGT-Kaleidoscope, the formal IMGT-
information system
ONTOLOGY paradigm. Biochimie 90:570–583
Ehrenmann F, Duroux P, Giudicelli V, Lefranc M-P (2010) -
Standardized sequence and structure analysis of antibody
using IMGT ®. In: Kontermann R, D€ ubel S (eds) Antibody Definition
engineering, Ch 2, vol 2, 2nd edn. Springer-Verlag, Berlin
Heidelberg, pp 11–31
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- IMGT-ONTOLOGY is the ontology for ▶ immunoge-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 netics and ▶ immunoinformatics. IMGT-ONTOLOGY,
IMGT-ONTOLOGY 965 I
created to manage the complexity of ▶ immunogenetics IDENTIFICATION OBTENTION
knowledge, is the first ontology and global reference in
the domain. IMGT-ONTOLOGY has been essential DESCRIPTION
IMGT-ONTOLOGY, Table 1 Axioms of the Formal IMGT-ONTOLOGY (or IMGT-Kaleidoscope), concepts generated from them,
examples of concepts, and IMGT Scientific chart rules
Formal IMGT- Concepts generated from IMGT Scientific
ONTOLOGY axioms the axioms Examples of IMGT-ONTOLOGY concepts chart rulesa
IDENTIFICATION Concepts of ChainType (*) Standardized
axiom identification ConfigurationType keywords
DomainType
FunctionalityType
FunctionType
GeneType
LocationType
Molecule_EntityType
MoleculeType
MoleculeUnit
ReceptorType (*)
SpecificityType
StructureType
TaxonRank (*)
DESCRIPTION axiom Concepts of description Core Prototypes
Molecule_EntityPrototype Standardized labels
Recombination signal (RS) (Labels and
relations)
CLASSIFICATION Concepts of Group Gene and allele
axiom classification Subgroup nomenclature
Gene
Allele
NUMEROTATION Concepts of IMGT_unique_numbering (IMGT unique numbering) Contact analysis
axiom numerotation for V, C, and G domains Paratope/epitope
IMGT_Collier_de_Perles (IMGT Collier de Perles) for Conserved positions
V, C, and G domains
LOCALIZATION axiom Concepts of localization Cell compartment Chromosomal
Chromosome localization
Locus Locus
representation
Organ part Standardized
Organ keywords
ORIENTATION axiom Concepts of orientation DNA_strand_orientation IMGT index
Genomic_orientation (*)
OBTENTION axiom Concepts of obtention Origin Standardized
Methodology keywords
(*) indicates a highconcept, i.e., a concept with different levels of granularity
a
The corresponding controlled vocabulary and rules are available in the IMGT Scientific Chart at http://www.imgt.org
obtained can be characterized (Duroux et al. 2008; any molecule, cell, tissue, organ, organism, or popula-
Lefranc et al. 2008) (Fig. 1). tion, any process and any relation, has to be identified.
The IMGT-ONTOLOGY axioms, the concepts The IDENTIFICATION axiom has generated
generated from them, examples of concepts, and the concepts of identification which provide the
IMGT Scientific chart rules are listed in Table 1. terms and rules to identify an entity, its processes,
and its relations. The ▶ TaxonRank highconcept
IDENTIFICATION Axiom (Table 1) allows to identify the type of taxon
The ▶ IDENTIFICATION axiom of the Formal IMGT- in which an object, process, or relation is found
ONTOLOGY or IMGT-Kaleidoscope postulates that (Giudicelli and Lefranc 1999). Two major concepts
IMGT-ONTOLOGY 967 I
conventional
gDNA undefined
variable
mRNA germline
MoleculeType GeneType diversity ConfigurationType
cDNA rearranged
joining
protein partially-rearranged
constant
is_defined_by
defines is_defined_by
defines
defines is_defined_by
Molecule_EntityType
functional I
FunctionalityType StructureType regular
ORF
chimeric
pseudogene
humanized
productive
unproductive
IMGT-ONTOLOGY, Fig. 2 The “Molecule_EntityType” “is_defined_by” and “defines,” “_has_”, and “_for_”.
concept. The “Molecule_EntityType” concept is defined by the Leafconcepts are general (online: in blue) or specific of the IG
“MoleculeType,” “GeneType,” and “ConfigurationType” and TR (online: in red). The “Molecule_EntityType” concept
concepts of identification and has properties identified in the has 38 leafconcepts. Only a few examples of the
“FunctionalityType” and “StructureType” concepts (IDENTIFI- “StructureType” leafconcepts are shown
CATION axiom). Arrows indicate reciprocal relations
ChainType
defines is_defined_by
ReceptorType
IMGT-ONTOLOGY, Fig. 3 The “ReceptorType” concept. The “DomainType” concepts and by concepts of classification orga-
“ReceptorType” concept, defined by the “ChainType” concept nized in a hierarchy (see CLASSIFICATION axiom). Arrows
of identification, has properties identified in the indicate reciprocal relations “is_defined_by” and “defines,”
“StructureType,” “SpecificityType,” and “FunctionType” con- “_has_”, and “_for_”. The “ChainType” and “ReceptorType”
cepts (IDENTIFICATION axiom). The “ChainType” concept is concepts have different levels of granularity (up to five) and are
itself defined by the “Molecule_EntityType” and highconcepts
S
-
S
S
CL VL
S
CH1 CH1
CL
-
S
-
-
S
S
-
S
S
S
CH1
S
- VL
-
S
S
Hinge-Region
S
CL
CH2 CH2
S S
CH2
-
S S
CH3
CH3 S S
CH3
-
S S
c d
V
I
V
V V
V J
1
2
CH1 CH1 V 3 CL
C C
DJ CH
1
CH2 CH3
Hinge
CH2 CH3
CH2 CH2 DJ
CH1
V
CL
CH3 CH3
IMGT-ONTOLOGY, Fig. 4 Identification of an IG or antibody respectively. CL, CH1, CH2, and CH3 are C domains (Constant
as a leafconcept of the “ReceptorType” concept. The four (C) domain). (a) 3D structure, (b) organization in Ig-like
representations, although different, allow to identify an IG domains, (c) organization in modules, and (d) regions coded by
as a receptor of four chains, two IG-Heavy-Chain and two the V, D, J, and C genes (GeneType). The C gene codes the
IG-Light-Chain (“ChainType”), themselves organized in constant region (CL for the IG-Light-Chain and CH1, hinge,
domains (“DomainType”). VH and VL are V domains (Variable CH2, and CH3 for the IG-Heavy-Chain). This representation,
(V) domain), coded by the V-D-J-region and V-J-region, schematized as a Y shape, is frequently used to represent an IG
leafconcept describes the entity organization with ▶ Constant (C) Gene) (▶ Gene Type). These impor-
its constitutive motifs and relations (▶ Labels and tant labels allow to describe the leafconcepts of
Relations) (Fig. 5). “Molecule_EntityPrototype” and to link sequences
and structures. Moreover, they permitted the definition
“Core” Concept and standardized description of the IG and TR alleles
Among the concepts of description, the “Core” (concepts of classification), now approved at the inter-
concept allows to describe the coding region of genes national level (Lefranc and Lefranc 2001a, b).
and contains five leafconcepts which are “REGION”
(for ▶ Conventional Gene), “V-REGION,” CLASSIFICATION Axiom
“D-REGION,” “J-REGION,” and “C-REGION” (for The ▶ CLASSIFICATION axiom of the Formal
V, D, J, and C genes, respectively) (▶ Variable (V) IMGT-ONTOLOGY or IMGT-Kaleidoscope postu-
Gene, ▶ Diversity (D) Gene, ▶ Joining (J) Gene, lates that any molecule, cell, tissue, organ, organism,
I 970 IMGT-ONTOLOGY
a V-GENE
L-V-GENE-UNIT
L-INTRON-L V-REGION V-RS
3′-V-REGION V-SPACER
V-EXON
b V-D-J-GENE
(N-D)-REGION
D-REGION
N-REGION N-REGION
L-INTRON-L V-REGION J-REGION
3′-V-REGION 5′-J-REGION
JUNCTION
V-D-J-REGION
V-D-J-EXON
IMGT-ONTOLOGY, Fig. 5 Graphical representation of two GENE. Thirty-nine labels and twelve relations are necessary
leafconcepts of “Molecule_EntityPrototype” (DESCRIPTION and sufficient for a complete description of these leafconcepts
axiom of IMGT-ONTOLOGY). (a) V-GENE. (b) V-D-J-
or population, any process and any relation, has to be (Giudicelli et al. 2005), the IMGT® gene database, in
classified. In molecular biology, the concepts of clas- the Human Genome Database (GDB), in LocusLink at
sification generated from the CLASSIFICATION the National Center for Biotechnology Information
axiom allow to classify and name the genes and their (NCBI), in Entrez Gene when this database superseded
alleles (▶ Gene and Allele Nomenclature). The genes LocusLink, in Ensembl at the European Bioinformatics
which code the IG and TR belong to highly polymor- Institute (EBI), and in the Vega Genome Browser at the
phic multigenic families. A major contribution of Wellcome Trust Sanger Institute.
IMGT-ONTOLOGY was to set the principles of their
classification (Giudicelli and Lefranc 1999) (Fig. 6) NUMEROTATION Axiom
and to propose a standardized nomenclature (Lefranc The ▶ NUMEROTATION axiom of the Formal IMGT-
and Lefranc 2001a, b) based on the concepts of Group, ONTOLOGY or IMGT-Kaleidoscope postulates that any
Subgroup, Gene, and Allele (▶ Gene and Allele molecule, cell, tissue, organ, organism or population, any
Nomenclature). process and any relation, has to be numbered. In molec-
The IMGT® gene nomenclature for IG and TR genes ular biology, two major concepts of numerotation, gener-
(Lefranc and Lefranc 2001a, b) has been approved at the ated from the NUMEROTATION axiom, comprises the
international level by the Human Genome Organisation widely acknowledged ▶ IMGT unique numbering and
(HUGO) Nomenclature Committee (HGNC), in 1999, ▶ IMGT Collier de Perles concepts.
and by the World Health Organization-International
Union of Immunological Societies (WHO-IUIS) LOCALIZATION, ORIENTATION, and OBTENTION
(Lefranc 2007, 2008). The IMGT® IG and TR gene Axioms
names are the official reference for the genome projects The ▶ LOCALIZATION axiom, the ▶ ORIENTA-
and, as such, have been entered in IMGT/GENE-DB TION axiom, and the ▶ OBTENTION axiom of the
IMGT-ONTOLOGY 971 I
a b
Group Locus IGHV Human IGH
14q32.33
in
in
d_
d_
re
ne
re
de
ne
ge
de
Subgroup IGHV1
or
ge
s_
or
is_
s_
is_
ha
ha
is_a_member_of has_member is_a_member_of has_member
Gene IGHV1-2
Allele IGHV1-2*01
IMGT-ONTOLOGY, Fig. 6 Concepts of classification for gene They are associated to a “TaxonRank” concept and more pre-
and allele nomenclature (CLASSIFICATION axiom). (a) Hier- cisely for the “Gene” and “Allele” concepts to a leafconcept of
archy of the concepts of classification and their relations. “Species” (here, Homo sapiens). The “Locus” concept is I
(b) Examples of leafconcepts for each concept of classification. a concept of localization (LOCALIZATION axiom)
Chain type
References
Cross-References Definition
Cross-References Definition
Cross-References Definition
Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Cross-References
Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France
▶ Constant (C) Domain
▶ Epitope
▶ Groove (G) Domain
Synonyms
▶ IMGT-ONTOLOGY
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
Gene type
▶ IMGT-ONTOLOGY, Leafconcept
▶ IMGT-ONTOLOGY, SpecificityType
▶ IMGT ® Information System
▶ Paratope
Definition
▶ Variable (V) Domain
“GeneType” is a concept of identification
(generated from the ▶ IDENTIFICATION axiom)
References of ▶ IMGT-ONTOLOGY, the global reference
in ▶ immunogenetics and ▶ immunoinformatics
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, 2005, 2008; Duroux et al. 2008), built by IMGT ®, the
the Formal IMGT-ONTOLOGY paradigm. Biochimie 90: international ImMunoGeneTics information system ®
570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge-
(http://www.imgt.org) (▶ IMGT ® Information Sys-
netics: the IMGT-ONTOLOGY. Bioinformatics 12: tem), that allows to identify the type of gene.
1047–1054 The “GeneType” concept comprises six
Lefranc M-P, Giudicelli V, Ginestoux C, Bose N, Folch G, leafconcepts (▶ IMGT-ONTOLOGY, Leafconcept):
Guiraudou D, Jabado-Michaloud J, Magris S,
Scaviner D, Thouvenin V, Combres K, Girod D,
• Two leafconcepts refer to “conventional”
Jeanjean S, Protat C, Yousfi Monod M, Duprat E, (▶ Conventional Gene) genes, that is, any (coding
Kaas Q, Pommié C, Chaume D, Lefranc G (2004) or not coding) gene other than the immunoglob-
IMGT-ONTOLOGY for immunogenetics and immunoin- ulin (IG) or T cell receptor (TR) genes:
formatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
these leafconcepts “conventional-with-leader”
Coelho I, Combres K, Ginestoux C, Giudicelli V, and “conventional-without-leader” identify any
Chaume D, Lefranc G (2005) IMGT-Choreography for (coding or not coding) gene with or without
immunogenetics and immunoinformatics. In Silico Biol leader L region (or signal peptide), respectively.
5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008)
• Four leafconcepts refer to the IG and TR genes
IMGT, a system and an ontology that bridge biological of the vertebrate adaptive immune response and
and computational spheres in bioinformatics. Brief are specific to immunogenetics: three of these
Bioinform 9:263–275 leafconcepts, “variable” (V), “diversity” (D), and
Magdelaine-Beuzelin C, Kaas Q, Wehbi V, Ohresser M,
Jefferis R, Lefranc M-P, Watier H (2007) Structure-function
“joining” (J) (▶ Variable (V) Gene, ▶ Diversity (D)
relationships of the variable domains of monoclonal anti- Gene, ▶ Joining (J) Gene) identify the IG and TR
bodies approved for cancer treatment. Crit Rev Oncol genes that rearrange at the DNA level in the B and
Hematol 64:210–225 T cells and code the V, D, and J regions, respec-
Pappalardo F, Lefranc M-P, Lollini P-L, Motta
S (2010) A novel paradigm for cell and molecular inter-
tively, of the IG and TR variable domains
action: from the CMM model to IMGT-ONTOLOGY. (▶ Variable (V) Domain); the fourth leafconcept,
Immunome Res 6:1 “constant” (C), (▶ Constant (C) Gene) identifies the
IMGT-ONTOLOGY, Highconcept 979 I
IG and TR genes that code the C region of the IG
and TR chains (▶ Chain Type) (Lefranc and IMGT-ONTOLOGY, Highconcept
Lefranc 2001a, b).
Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Cross-References Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France
▶ Chain Type
▶ Configuration Type
▶ Constant (C) Gene Definition
▶ Conventional Gene
▶ Diversity (D) Gene A highconcept is a type of concept of ▶ IMGT-
▶ IMGT-ONTOLOGY ONTOLOGY, the global reference in ▶ immunoge-
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom netics and ▶ immunoinformatics (Giudicelli and
▶ IMGT-ONTOLOGY, Leafconcept Lefranc 1999; Lefranc et al. 2004, 2005, 2008; Duroux
▶ IMGT ® Information System et al. 2008), built by IMGT ®, the international ImMu-
▶ Immunogenetics noGeneTics information system ® (http://www.imgt.
▶ Immunoinformatics org) (▶ IMGT ® Information System), that I
▶ Joining (J) Gene identifies a concept with different levels of granularity
▶ Variable (V) Domain corresponding to a hierarchy of concepts. A highconcept
▶ Variable (V) Gene provides a general characterization and the level in the
hierarchy is required for a precise characterization.
“▶ TaxonRank,” “▶ ChainType,” and “ReceptorType”
References are three examples of highconcepts of identification
(IDENTIFICATION axiom). The hierarchy of
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, “TaxonRank” is related to taxonomy levels and that
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, of “ChainType” and “ReceptorType” in part to concepts
the Formal IMGT-ONTOLOGY paradigm. Biochimie of classification.
90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge-
netics: the IMGT-ONTOLOGY. Bioinformatics 12:
1047–1054
Lefranc M-P, Lefranc G (2001a) The immunoglobulin
Cross-References
FactsBook. Academic Press, London, pp 1–458
Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook. ▶ Chain Type
Academic Press, London, pp 1–398 ▶ IMGT-ONTOLOGY
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
D, Thouvenin V, Combres K, Girod D, Jeanjean S, ▶ IMGT ® Information System
Protat C, Yousfi Monod M, Duprat E, Kaas Q, Pommié ▶ Immunogenetics
C, Chaume D, Lefranc G (2004) IMGT-ONTOLOGY ▶ Immunoinformatics
for immunogenetics and immunoinformatics. In Silico
Biol 4:17–29
▶ TaxonRank
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V,
Chaume D, Lefranc G (2005) IMGT-Choreography for
immunogenetics and immunoninformatics. In Silico Biol References
5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C,
IMGT, a system and an ontology that bridge biological Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope,
and computational spheres in bioinformatics. Brief the Formal IMGT-ONTOLOGY paradigm. Biochimie
Bioinform 9:263–275 90:570–583
I 980 IMGT-ONTOLOGY, IDENTIFICATION Axiom
▶ Chain Type
▶ Configuration Type
IMGT-ONTOLOGY, IDENTIFICATION ▶ Domain Type
Axiom ▶ Function Type
▶ FunctionalityType
Marie-Paule Lefranc ▶ Gene Type
Laboratoire d’ImmunoGénétique Moléculaire, ▶ IMGT-ONTOLOGY
Institut de Génétique Humaine UPR 1142, ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
Université Montpellier 2, Montpellier, France ▶ IMGT-ONTOLOGY, DESCRIPTION Axiom
▶ IMGT-ONTOLOGY, Highconcept
▶ IMGT-ONTOLOGY, LOCALIZATION Axiom
Definition ▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
▶ IMGT-ONTOLOGY, OPTENTION Axiom
The ▶ IDENTIFICATION axiom is an axiom of the ▶ IMGT-ONTOLOGY, ORIENTATION Axiom
Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope ▶ IMGT-ONTOLOGY, SpecificityType
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, ▶ IMGT ® Information System
2005, 2008; Duroux et al. 2008). The IDENTIFICA- ▶ Immunogenetics
TION axiom postulates that any molecule, cell, ▶ Immunoinformatics
tissue, organ, organism, or population, any process, ▶ Molecule Entity Type
and any relation, has to be identified. The IDENTIFICA- ▶ MoleculeType
TION axiom has generated the concepts of ▶ Structure Type
identification of ▶ IMGT-ONTOLOGY, the global ref- ▶ TaxonRank
erence in ▶ immunogenetics and ▶ immunoinformatics
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, 2005,
2008; Duroux et al. 2008), built by IMGT®, the interna-
tional ImMunoGeneTics information system® (http:// References
www.imgt.org) (▶ IMGT® Information System).
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C,
IMGT-ONTOLOGY major concepts of identifica-
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope,
tion comprise: the Formal IMGT-ONTOLOGY paradigm. Biochimie
“▶ Chain Type,” “▶ Configuration Type,” 90:570–583
“▶ Domain Type,” “▶ Function Type,” “▶ Functiona- Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
lityType,” “▶ Gene Type,” “▶ Molecule Entity Type,”
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
“▶ MoleculeType,” “ReceptorType,” “▶ IMGT- Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
ONTOLOGY, SpecificityType,” “▶ Structure Type” Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
IMGT-ONTOLOGY, LOCALIZATION Axiom 981 I
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, ▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics ▶ IMGT ® Information System
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P, ▶ Immunogenetics
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D, ▶ Immunoinformatics
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
a system and an ontology that bridge biological and
computational spheres in bioinformatics. Brief Bioinform References
9:263–275
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge-
netics: the IMGT-ONTOLOGY. Bioinformatics 12:
IMGT-ONTOLOGY, Leafconcept 1047–1054
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Marie-Paule Lefranc Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Laboratoire d’ImmunoGénétique Moléculaire, Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
Institut de Génétique Humaine UPR 1142, Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics I
and immunoinformatics. In Silico Biol 4:17–29
Université Montpellier 2, Montpellier, France
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics
Definition and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
a system and an ontology that bridge biological and
A leafconcept is a type of concept of ▶ IMGT- computational spheres in bioinformatics. Brief Bioinform
ONTOLOGY, the global reference in ▶ immunoge- 9:263–275
netics and ▶ immunoinformatics (Giudicelli and
Lefranc 1999; Lefranc et al. 2004, 2005, 2008; Duroux
et al. 2008), built by IMGT ®, the international ImMu-
noGeneTics information system ® (http://www.imgt.
org) (▶ IMGT ® Information System), that identifies IMGT-ONTOLOGY, LOCALIZATION Axiom
a concept that corresponds to the finest level of
granularity. A leafconcept is defined relative to Marie-Paule Lefranc
a parent concept (on a higher level). Leafconcepts Laboratoire d’ImmunoGénétique Moléculaire,
have been extensively characterized for the Institut de Génétique Humaine UPR 1142,
concepts of identification (▶ IMGT-ONTOLOGY, Université Montpellier 2, Montpellier, France
IDENTIFICATION Axiom), the concepts of
description (▶ IMGT-ONTOLOGY, DESCRIPTION
Axiom), the concepts of classification (▶ IMGT- Definition
ONTOLOGY, CLASSIFICATION axiom), and the
concepts of numerotation (▶ IMGT-ONTOLOGY, The ▶ LOCALIZATION axiom is an axiom of the
NUMEROTATION Axiom) of ▶ IMGT-ONTOLOGY. Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope
(Giudicelli and Lefranc 1999; Lefranc et al. 2004,
2005, 2008; Duroux et al. 2008). The LOCALIZATION
Cross-References axiom postulates that any molecule, cell, tissue,
organ, organism, or population, any process, and any
▶ IMGT-ONTOLOGY relation, has to be localized. The LOCALIZATION
▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom axiom has generated the concepts of localization
▶ IMGT-ONTOLOGY, DESCRIPTION Axiom of ▶ IMGT-ONTOLOGY, the global reference
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom in ▶ immunogenetics and ▶ immunoinformatics
I 982 IMGT-ONTOLOGY, LocationType
Definition
Cross-References
“LocationType” is a concept of identification
▶ IMGT-ONTOLOGY (generated from the ▶ IDENTIFICATION Axiom)
▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom of ▶ IMGT-ONTOLOGY, the global reference
▶ IMGT-ONTOLOGY, DESCRIPTION Axiom in ▶ immunogenetics and ▶ immunoinformatics
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom (Giudicelli and Lefranc 1999; Lefranc et al. 2004,
▶ IMGT-ONTOLOGY, NUMEROTATION Axiom 2005, 2008; Duroux et al. 2008), built by IMGT®, the
▶ IMGT-ONTOLOGY, OPTENTION Axiom international ImMunoGeneTics information system®
▶ IMGT-ONTOLOGY, ORIENTATION Axiom (http://www.imgt.org) (▶ IMGT® Information System),
▶ IMGT ® Information System that allows to identify, whatever the molecule type
▶ Immunogenetics (▶ MoleculeType) (gDNA, cDNA, mRNA, protein),
▶ Immunoinformatics the ▶ Molecule_entity type leafconcepts (▶ IMGT-
ONTOLOGY, Leafconcept) based on their location.
The “LocationType” concept comprises, for
instance, the following leafconcepts:
References • “Orphon” identifies, whatever the molecule type,
a gene that is found in vivo on a different locus from
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, the main locus (either on the same chromosome or
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, on another chromosome).
the Formal IMGT-ONTOLOGY paradigm. Biochimie
• “Transgene” identifies, whatever the molecule
90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunoge- type, a gene that is artificially introduced into a
netics: the IMGT-ONTOLOGY. Bioinformatics 12: multicellular organism (mouse, plant, etc.).
1047–1054 • “Translocated” identifies, whatever the molecule
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
type, a gene that results from a translocation (in vivo).
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, • “Transposed” identifies, whatever the molecule
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, type, a transgene or a retrotransposon that is
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics inserted in a chromosome.
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics Cross-References
and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
▶ IMGT-ONTOLOGY
a system and an ontology that bridge biological and compu-
tational spheres in bioinformatics. Brief Bioinform ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
9:263–275 ▶ IMGT-ONTOLOGY, Leafconcept
IMGT-ONTOLOGY, Molecule_EntityType 983 I
▶ IMGT ® Information System 2005, 2008; Duroux et al. 2008), built by IMGT®, the
▶ Immunogenetics international ImMunoGeneTics information system®
▶ Immunoinformatics (http://www.imgt.org) (▶ IMGT® Information System),
▶ Molecule Entity Type that allows to identify the molecule entity type.
▶ MoleculeType The “Molecule_EntityType” concept is defined by the
“▶ MoleculeType,” “▶ GeneType,” and “▶ Configura-
tionType” concepts.
References The “Molecule_EntityType” concept comprises
38 leafconcepts (▶ IMGT-ONTOLOGY, Leafconcept).
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, The ten leafconcepts that are the most classical
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope, the
representatives of IG and TR data identification in
Formal IMGT-ONTOLOGY paradigm. Biochimie 90:
570–583 the ▶ IMGT ® information system are shown as exam-
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- ples. They are also illustrated in ▶ immunoglobulin
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 synthesis.
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
• V-gene identifies, for gDNA (▶ MoleculeType),
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C, molecule entities with a germline V gene (▶ Gene
Yousfi MM, Duprat E, Kaas Q, Pommié C, Chaume D, Type). V-gene is in germline configuration
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics (▶ Configuration Type). I
and immunoinformatics. In Silico Biol 4:17–29
• D-gene identifies, for gDNA, molecule entities with
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chestellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D, a germline D gene (▶ Gene Type). D-gene is in
Lefranc G (2005) IMGT-Choreography for immunogenetics germline configuration.
and immunoinformatics. In Silico Biol 5:45–60 • J-gene identifies, for gDNA, molecule entities with
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT,
a germline J gene (▶ Gene Type). J-gene is in
a system and an ontology that bridge biological and
computational spheres in bioinformatics. Brief Bioinform germline configuration.
9:263–275 • C-gene identifies, for gDNA, molecule entities
with a C gene in undefined configuration (▶ Gene
Type). C-gene is in undefined configuration
(▶ Configuration Type).
• V-J-gene identifies, for gDNA, molecule entities
IMGT-ONTOLOGY, Molecule_EntityType with a rearranged V gene and a rearranged J gene
(▶ Gene Type). V-J-gene is in rearranged configu-
Marie-Paule Lefranc ration (▶ Configuration Type).
Laboratoire d’ImmunoGénétique Moléculaire, • V-D-J-gene identifies, for gDNA, molecule entities
Institut de Génétique Humaine UPR 1142, with a rearranged V gene, at least one rearranged
Université Montpellier 2, Montpellier, France D gene and a rearranged J gene (▶ Gene Type).
V-D-J-gene is in rearranged configuration
(▶ Configuration Type).
Synonyms • L-V-J-C-sequence identifies, for cDNA
(▶ MoleculeType), molecule entities with a leader
Molecule entity type L region, a rearranged V region, a rearranged
J region, and a C region in undefined configuration
(▶ Gene Type). L-V-J-C-sequence is in rearranged
Definition configuration (▶ Configuration Type).
• L-V-D-J-C-sequence identifies, for cDNA, molecule
Molecule_EntityType is a major concept of identifi- entities with a leader L region, a rearranged V region,
cation (generated from the ▶ IDENTIFICATION at least one rearranged D region, a rearranged
Axiom) of ▶ IMGT-ONTOLOGY, the global refer- J region, and a C region in undefined configuration
ence in ▶ immunogenetics and ▶ immunoinformatics (▶ Gene Type). L-V-D-J-C-sequence is in rearranged
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, configuration (▶ Configuration Type).
I 984 IMGT-ONTOLOGY, NUMEROTATION Axiom
Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142,
References
Université Montpellier 2, Montpellier, France Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583
Definition Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
The ▶ OBTENTION axiom is an axiom of the Formal Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner
IMGT-ONTOLOGY or IMGT-Kaleidoscope (Giudicelli D, Thouvenin V, Combres K, Girod D, Jeanjean S,
I 986 IMGT-ONTOLOGY, ORIENTATION Axiom
Protat C, Yousfi Monod M, Duprat E, Kaas Q, • Antisense (synonyms: Minus or Not coding). The
Pommié C, Chaume D, Lefranc G (2004) IMGT- complementary 30 - > 50 strand which is the
ONTOLOGY for immunogenetics and immunoin-
formatics. In Silico Biol 4:17–29 strand transcribed by the RNA polymerase is
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P, also designated as “template” DNA.
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D, 2. The Genomic_Orientation concept is
Lefranc G (2005) IMGT-Choreography for immunogenetics a highconcept that defines the orientation of
and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, genomic entities (chromosome, locus, and gene)
a system and an ontology that bridge biological and and comprises:
computational spheres in bioinformatics. Brief Bioinform • The Gene_Orientation concept, with a unique
9:263–275 ▶ leafconcept: “50 ! 30 ”. It is by convention
the orientation of transcription of the gene.
• The Locus_Orientation concept, with a unique
leafconcept: “50 ! 30 ”, defined by the orientation
IMGT-ONTOLOGY, ORIENTATION Axiom of transcription of the majority of the genes in the
locus, and/or for immunoglobulin (IG) and T cell
Marie-Paule Lefranc receptor (TR) loci, by the orientation of transcrip-
Laboratoire d’ImmunoGénétique Moléculaire, tion of the constant (C) gene(s) (▶ Constant (C)
Institut de Génétique Humaine UPR 1142, gene).
Université Montpellier 2, Montpellier, France • The Gene_Orientation_in_Locus concept, for
a gene or a cluster (set of genes), with two
exclusive leafconcepts:
Definition – Direct: same orientation as the locus.
– Opposite: opposite orientation.
The ▶ ORIENTATION axiom is an axiom of the • The Orientation_on_Chromosome concept,
Formal IMGT-ONTOLOGY or IMGT-Kaleidoscope for a gene or a locus, with two exclusive
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, leafconcepts:
2005, 2008; Duroux et al. 2008). The ORIENTATION – Forward (synonyms: Watson, FWD):
axiom postulates that any molecule, cell, tissue, organ, “50 ! 30 ” gene or locus orientation is from
organism, or population, any process, and any rela- pter (short arm telomeric end) to qter (long
tion, has to be orientated. The ORIENTATION arm telomeric end).
axiom has generated the concepts of orientation – Reverse (synonym: REV): “50 ! 30 ” gene or
of ▶ IMGT-ONTOLOGY, the global reference in locus orientation is from qter to pter.
▶ immunogenetics and ▶ immunoinformatics The ORIENTATION axiom is one of seven
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, axioms in the Formal IMGT-ONTOLOGY, the others
2005, 2008; Duroux et al. 2008), built by IMGT ®, being “▶ IDENTIFICATION axiom,” “▶ DESCRIP-
the international ImMunoGeneTics information sys- TION axiom,” “▶ CLASSIFICATION axiom,”
tem ® (http://www.imgt.org) (▶ IMGT ® Information “▶ NUMEROTATION axiom,” “▶ LOCALIZATION
System). axiom,” and “▶ OBTENTION axiom.”
IMGT-ONTOLOGY concepts of orientation com-
prise (IMGT Index, http://www.imgt.org):
1. The DNA_strand_orientation concept with two Cross-References
exclusive leafconcepts:
• Sense (synonyms: Plus or Coding): The 50 - > 30 ▶ Constant (C) Gene
DNA strand is designated, for a given gene, as ▶ IMGT-ONTOLOGY
“sense,” “plus,” or “coding” because its ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
sequence is identical to the sequence of the ▶ IMGT-ONTOLOGY, DESCRIPTION Axiom
premessenger (premRNA) [except for uracile ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
(u) in RNA, instead of thymine (t) in DNA]. ▶ IMGT-ONTOLOGY, Leafconcept
This coding strand is not transcribed. ▶ IMGT-ONTOLOGY, LOCALIZATION Axiom
IMGT-ONTOLOGY, SpecificityType 987 I
▶ IMGT-ONTOLOGY, NUMEROTATION Axiom the type of specificity for the immunoglobulins (IG) or
▶ IMGT-ONTOLOGY, OPTENTION Axiom antibodies and the T cell receptors (TR).
▶ IMGT ® Information System IG and TR are the specific antigen receptors of the
▶ Immunogenetics adaptive immune response (and, therefore, major
▶ Immunoinformatics leafconcepts of the “ReceptorType” concept) character-
ized by their function of binding and recognition of the
antigen (▶ Function Type). The “SpecificityType”
References concept allows an amino acid characterization of this
binding.
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P, The “SpecificityType” concept includes two major
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal
concepts:
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet- • “▶ Paratope” identifies the amino acids of the anti-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054 gen receptor variable domains (V-DOMAIN)
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, (▶ Variable (V) Domain) that recognize the antigenic
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
determinant (antigen (Ag) for IG or peptide/major
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, histocompatibility (pMH) for TR).
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics • “▶ Epitope” identifies the antigenic determinant
and immunoinformatics. In Silico Biol 4:17–29 (Ag or pMH) recognized by the antigen receptor I
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
(IG or TR, respectively).
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics Paratope and epitope are used in IMGT/3Dstructure-
and immunoinformatics. In Silico Biol 5:45–60 DB (http://www.imgt.org) for analysis of interactions
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a in IG/Ag and TR/peptide/major histocompatibility
system and an ontology that bridge biological and computa-
(TR/pMH) complexes (Kaas et al. 2008).
tional spheres in bioinformatics. Brief Bioinform 9:263–275
Cross-References
IMGT-ONTOLOGY, SpecificityType ▶ Epitope
▶ Function Type
Marie-Paule Lefranc ▶ IMGT-ONTOLOGY
Laboratoire d’ImmunoGénétique Moléculaire, ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
Institut de Génétique Humaine UPR 1142, ▶ IMGT ® Information System
Université Montpellier 2, Montpellier, France ▶ Immunogenetics
▶ Immunoinformatics
▶ Paratope
Synonyms ▶ Variable (V) Domain
Function type
References
Definition Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
“SpecificityType” is a concept of identification ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for Immunogenet-
(generated from the ▶ IDENTIFICATION Axiom) of
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–
▶ IMGT-ONTOLOGY, the global reference in ▶ immu- 1054
nogenetics and ▶ immunoinformatics (Giudicelli and Kaas Q, Duprat E, Tourneur G, Lefranc M-P (2008) IMGT
Lefranc 1999; Lefranc et al. 2004, 2005, 2008; Duroux standardization for molecular characterization of the T cell
receptor/peptide/MHC complexes, chap. 2. In: Schoenbach C,
et al. 2008), built by IMGT®, the international ImMuno-
Ranganathan S, Brusic V (eds) Immunoinformatics,
GeneTics information system® (http://www.imgt.org) Immunomics Reviews, Series of Springer Science and
(▶ IMGT® Information System) that allows to identify Business Media LLC. Springer, New York
I 988 IMGT-ONTOLOGY, StructureType
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, (“engineered,” “humanized,” “chimeric,” “fusion”) (“chi-
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, meric” and “fusion” can also be obtained in vivo)
Thouvenin V, Combers K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, (Giudicelli and Lefranc 1999; Lefranc et al. 2004).
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P, Cross-References
Coelho I, Combers K, Ginestoux C, Giudicelli V, Chaume D,
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60 ▶ IMGT-ONTOLOGY
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
system and an ontology that bridge biological and computa- ▶ IMGT-ONTOLOGY, Leafconcept
tional spheres in bioinformatics. Brief Bioinform 9:263–275
▶ IMGT ® Information System
▶ Immunogenetics
▶ Immunoinformatics
▶ Molecule Entity Type
IMGT-ONTOLOGY, StructureType ▶ MoleculeType
Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire, References
Institut de Génétique Humaine UPR 1142,
Université Montpellier 2, Montpellier, France Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
ONTOLOGY paradigm. Biochimie 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
Synonyms ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
Molecule structure type; Structure type
Lefranc G (2005) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Definition Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
“StructureType” is a concept of identification Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
(generated from the ▶ NUMEROTATION Axiom) and immunoinformatics. In Silico Biol 4:17–29
of ▶ IMGT-ONTOLOGY, the global reference Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
system and an ontology that bridge biological and computa-
in ▶ immunogenetics and ▶ immunoinformatics
tional spheres in bioinformatics. Brief Bioinform 9:263–275
(Giudicelli and Lefranc 1999; Lefranc et al. 2004,
2005, 2008; Duroux et al. 2008), built by IMGT ®,
the international ImMunoGeneTics information sys-
tem ® (http://www.imgt.org) (▶ IMGT ® Information
System), that allows to identify, whatever the molecule IMGT-ONTOLOGY, Unproductive
type (▶ MoleculeType) (gDNA, cDNA, mRNA,
protein), the type de structure of ▶ Molecule_ Marie-Paule Lefranc
EntityType leafconcepts (▶ IMGT-ONTOLOGY, Laboratoire d’ImmunoGénétique Moléculaire,
Leafconcept). Institut de Génétique Humaine UPR 1142,
The “StructureType” concept comprises 16 Université Montpellier 2, Montpellier, France
leafconcepts that identify structures with a classical orga-
nization (“regular”), from those which have been modi-
fied either naturally in vivo (“processed,” “unprocessed,” Definition
“partially-processed,” “sterile-transcript,” “unspliced,”
“partially-spliced,” “spliced,” “secreted,” “membrane,” “Unproductive” is a ▶ leafconcept of the
“mature-form,” “immature-form”), or artificially in vitro “▶ FunctionalityType” concept of identification
Immobilized Template Assay 989 I
(generated from the ▶ IDENTIFICATION Axiom) Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
of ▶ IMGT-ONTOLOGY, the global reference in ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N,
immunogenetics and immunoinformatics (Giudicelli Folch G, Guiraudou D, Jabado-Michaloud J, Magris S,
and Lefranc 1999; Lefranc et al. 2004, 2005, Scaviner D, Thouvenin V, Combres K, Girod D, Jeanjean S,
2008; Duroux et al. 2008), built by IMGT®, the inter- Protat C, Yousfi Monod M, Duprat E, Kaas Q, Pommié C,
national ImMunoGeneTics information system® (http:// Chaume D, Lefranc G (2004) IMGT-ONTOLOGY for
immunogenetics and immunoinformatics. In Silico Biol
www.imgt.org) (▶ IMGT® Information System). 4:17–29
“Unproductive” identifies, whatever the molecule Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
type (▶ MoleculeType), the functionality of Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume
▶ Molecule_EntityType leafconcepts in rearranged or D, Lefranc G (2005) IMGT-Choreography for immunoge-
netics and immunoinformatics. In Silico Biol 5:45–60
partially-rearranged configuration (▶ Configuration Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
Type), whose coding region has stop codon(s) and/or system and an ontology that bridge biological and computa-
frameshift mutation(s), and/or whose junction is out-of- tional spheres in bioinformatics. Brief Bioinform 9:263–275
frame (for immunoglobulin (IG) and T cell receptor
(TR)), and/or for which a mutation affects the initiation
codon, and/or for which there are defects in the splicing
sites and/or in the regulatory element(s), and/or there are
unusual features (translocated, gene fusion, etc.), and/or Imitation Switch I
changes of conserved amino acids demonstrated as lead-
ing to uncorrect folding. ▶ ISWI
“Unproductive” is one of the two leafconcepts
(the other one being “▶ Productive”) that identify the
functionality of Molecule_EntityType leafconcepts in
rearranged or partially-rearranged configuration Immobilized Template Assay
(▶ Configuration Type) (IG and TR entities after
DNA rearrangements, and by extension fusion entities Tetsuro Kokubo
resulting from translocations, and hybrid entities Department of Supramolecular Biology, Graduate
obtained by biotechnology molecular engineering). School of Nanobioscience, Yokohama City
University, Yokohama, Kanagawa, Japan
Cross-References
Synonyms
▶ Configuration Type
▶ FunctionalityType Immobilized template system; In vitro transcription
▶ IMGT-ONTOLOGY assay using an immobilized template
▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
▶ IMGT-ONTOLOGY, Leafconcept
▶ IMGT ® information system Definition
▶ Immunogenetics
▶ Immunoinformatics The immobilized template assay is a powerful tech-
▶ Molecule Entity Type nique for analyzing the correlation between transcrip-
▶ MoleculeType tional activity and the transcription factors involved in
▶ Productive the reaction. The assay is composed of two experimen-
tal parts: an in vitro transcription assay using an
immobilized template (promoter DNA) bound to mag-
References netic beads via a biotin-streptavidin linkage (Fig. 1a),
and immunoblot analyses that examine the presence or
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal absence of specific transcription factor(s) in the PIC
IMGT-ONTOLOGY paradigm. Biochimie 90:570–583 (pre-initiation complex) (Fig. 1b).
I 990 Immobilized Template System
magnetic beads
Mediator RNAPII
DNA TFIIF
TFIIB TFIIE
UAS core promoter
PstI activator TFIID
TFIIA TFIIH
DNA
UAS core promoter
+cell extracts, activator PstI
magnetic
PIC beads
wash out
Mediator RNAPII
TFIIF
magnetic beads TFIIB TFIIE
activator TFIID
TFIIA TFIIH Mediator RNAPII
DNA TFIIF
UAS core promoter TFIIB TFIIE
PstI TFIID
activator TFIIH
TFIIA
DNA
UAS core promoter
+rNTP PstI
transcription
Mediator immunoblot analyses
RNAPII
TFIIF magnetic
magnetic beads TFIIB TFIIE
beads
activator TFIID
TFIIA TFIIH
DNA
UAS core promoter
PstI
Immobilized Template Assay, Fig. 1 Immobilized template subjected to immunoblot analyses, as described in B; (b) PstI
assay, (a) In vitro transcription assay. Template DNA containing digestion and immunoblot analyses. To minimize contamination
a core promoter and binding site (UAS) for the transcriptional by nonspecific proteins in the system, a PstI recognition site (or
activator of interest is immobilized on magnetic beads via that of another appropriate restriction enzyme) is introduced into
a biotin-streptavidin linkage. Crude cell lysates (or purified the template DNA in advance. After extensive washing, the PIC
transcription factors) and the transcription activator of interest is then released into the supernatant by mild treatment with PstI.
are added to the immobilized template and incubated for the The component(s) in the PIC is then analyzed by immunoblot
appropriate period to assemble the PIC. Once the PIC is assem- analyses
bled, transcription is initiated by the addition of rNTP or
a b
A variable proportion of protein X in All of protein X in a cell is
a cell is visualized by X-FP reporter visualized by X-FP
X
X-FP
X-FP
Immune Pathway Dynamics, Biological Methods, also express protein X from the native gene X on its resident
Fig. 1 Many imaging approaches visualize a fraction of the chromosome. Shown are two cells that have different propor-
protein population, leaving the native protein “in the dark.” (a) tions of X tagged. (b)If the fusion reporter is introduced to cells
A fluorescent reporter is prepared by fusing the gene of interest that lack the endogenous gene X, or if X is replaced by X-FP at
(X) with the gene of a fluorescent protein (FP). The tagged the native genomic location, then all the cells express X-FP. This
protein X-FP is typically expressed by transient transfection or results in complete visualization of the total population
delivery into a random genomic site by viral infection. The cells
of the unnatural promoter and context of the transfected nuclear processes (Hakim et al. 2010). Furthermore, the
DNA. Another concern with transfection-based methods natural promoter-driven or the BAC-driven expression
is the cell stress induced by the transfection procedure vector can be delivered to “knockout” cells that lack the
itself. The harsh transfection administered before each expression of the endogenous gene X, when available.
experiment induces specific cellular responses, which is This has the advantage of tagging and visualizing all the
unavoidable with all common delivery methods that use protein molecules X in the cell, providing faithful mea-
lipids, electroporation, or ultrasound. surements of the total pool of protein X (Fig. 1).
Another approach is to have the reporter mimic Ideally, replacement of the native gene with the
the naturally regulated expression of protein X by fusion reporter would guarantee complete tagging
incorporating the promoter sequence of the native gene (Fig. 1b) as well as endogenous regulation of the
X upstream of the fusion reporter. Even a ▶ bacterial reporter. This can be achieved, albeit technically
artificial chromosome (BAC) containing a large genomic demanding, by ▶ homologous recombination to gener-
DNA sequence surrounding the gene X, now replaced by ate the desired cell line or a transgenic animal. In this
X-FP, can be integrated to the genome and expressed in case, the X-FP fusion gene would reside in the natural
cells. The BAC contains all the naturally occurring reg- context of the gene X, complete with the promoter, distal
ulatory DNA sequences in the neighborhood of the gene regulatory sequences, and long-range DNA elements in
X. This aspect increases the likelihood that it will support close spatial proximity that may be important for the
an expression pattern close to the endogenous gene regulation of X.
regardless of where the BAC is integrated in the genome. In all of the above approaches, the functionality of
However, it may still be affected by the different spatial the fluorescently tagged protein X-FP should be
organization around the gene which seems to play an rigorously tested especially in the case where X-FP is
increasingly important role in transcription and other the only version of protein X present in the cells.
Immune Pathway Dynamics, Biological Methods 993 I
Immune Pathway Dynamics, Biological Methods, excluded from consideration such as those moving into/out
Fig. 2 An example analysis flow for time lapse images from of view or those where automatic identification failed.
live cell imaging. A segmentation algorithm identifies the The procedure is applied to each image in the time series, and
objects of interest (nuclei, membranes, cytoplasm, or other then a tracking algorithm identifies and links the images from
necessary regions for pixel intensity quantification). A noise the same cell over time. Finally, intensity measurements
filtering step reduces fine-grain pixel fluctuations without are collected from the relevant regions found during the
altering the major features of the image. Certain cells can be segmentation step
Sometimes the addition of a fluorescent tag to a protein during the entire course of the experiment including the
alters its conformation or interferes with molecular setup. For the same reason, photo-damage must be min-
interactions with other molecules, which disrupts its imized to ensure the natural cell physiology. Therefore,
cellular function. the excitation setting for the FP should be the minimum
Once the cells expressing the fusion reporter are pre- necessary to obtain quantifiable images (low laser power,
pared, it is critical to select the appropriate imaging setup. short exposure time, etc.), and the detection setting should
An important goal for live cell imaging is to quantify allow maximum light output (large pinhole, optimal
images and obtain measurements that are useful for sys- filters, etc.). In addition, the time interval between
tems biology. This requires a microscope that has high- image acquisitions must be just short enough to resolve
quality optical components including a light source suit- the relevant temporal dynamics, in order to minimize the
able for the particular FPs, appropriate objective lenses, extent of ▶ photobleaching (gradual loss of fluorescence
a sensitive camera, and high precision control of stage by repeated imaging).
and other devices. An autofocus mechanism is necessary Analysis of time lapse images can involve
for long-term imaging to minimize intensity fluctuations background adjustment, noise filtering, ▶ image
even from slight focus drifts. Another basic requirement segmentation (identifying the regions/objects of
is built-in microscope components that provide standard- interest, e.g., cell boundaries, nuclear boundaries, organ-
ized cell culture conditions (temperature, humidity, CO2). elles), tracking of objects over time, and the extraction of
A good rule of thumb in performing any live cell imaging the relevant intensities measured within specified objects
experiment is to minimize deviations from the conditions (Fig. 2). Certain tasks can be performed using commer-
of the tissue culture incubator that the cells experience cially available image processing software, but often
I 994 Immune Pathway Dynamics, Modeling
▶ Immunomonitoring
Immunofluorescence Immunogen
Definition
Definition
Immunogen is a molecule which is recognized as for-
Immunofluorescence is a technique that allows visual- eign and leads to the activation of the host immune
ization of specific proteins or antigens in cells or system.
I 998 Immunogenetics
References
Immunogenetics
Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C, Lefranc M-P,
Giudicelli V (2008) IMGT-Kaleidoscope, the Formal IMGT-
Marie-Paule Lefranc
ONTOLOGY paradigm. Biochimie 90:570–583
Laboratoire d’ImmunoGénétique Moléculaire, Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
Institut de Génétique Humaine UPR 1142, ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Université Montpellier 2, Montpellier, France Lefranc M-P (2005) IMGT, the international ImMunoGeneTics
information system ®: a standardized approach for immuno-
genetics and immunoinformatics. Immunome Res 1:3, http://
www.immunome-research.com/content/1/1/3
Lefranc M-P, Lefranc G (2001a) The immunoglobulin
Definition FactsBook. Academic Press, London, pp 1–458
Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
Academic Press, London, pp 1–398
Immunogenetics is the science that studies, in relation Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
with immunology, the genetic data of the immune Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
system and immune response, at the molecule, cell, Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
tissue, organ, organism, and/or population levels, in
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
vertebrates and invertebrates. and immunoinformatics. In Silico Biol 4:17–29
Immunogenetics data are increasingly studied in Lefranc M-P, Duprat E, Kaas Q, Tranne M, Thiriot A, Lefranc G
▶ immunoinformatics. Immunogenetics molecules (2005a) IMGT unique numbering for MHC groove
G-DOMAIN and MHC superfamily (MhcSF) G-LIKE-
comprise the antigen receptors, immunoglobulins
DOMAIN. Dev Comp Immunol 29:917–938
(IG) or antibodies (▶ Immunoglobulin Synthesis) Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
(Lefranc and Lefranc 2001a) and T cell receptors Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
(TR) (Lefranc and Lefranc 2001b), and the major Lefranc G (2005b) IMGT-Choreography for immunogenetics
and immunoinformatics. In Silico Biol 5:45–60
histocompatibility (MH) (Lefranc et al. 2005a), that
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
characterize the adaptive immune response, and mol- system and an ontology that bridge biological and computa-
ecules of the innate immune response, that include tional spheres in bioinformatics. Brief Bioinform 9:263–275
cytokines, chemokines, integrins, related proteins of
the immune system (RPI) of the ▶ immunoglobulin
superfamily (IgSF) and of the ▶ MH superfamily Immunogenic Peptides
(MhSF).
Immunogenetics knowledge management is ▶ T Cell Epitopes
based on the concepts of ▶ IMGT-ONTOLOGY,
the global reference in ▶ immunogenetics and
▶ immunoinformatics (Giudicelli and Lefranc 1999; Immunoglobulin Superfamily
Lefranc et al. 2004, 2005b, 2008; Duroux et al.
2008), built by IMGT ®, the international ImMuno- ▶ Immunoglobulin Superfamily (IgSF)
GeneTics information system® (http://www.imgt.org)
(▶ IMGT® Information System) (Lefranc 2005).
Immunoglobulin Superfamily (IgSF)
Marie-Paule Lefranc
Cross-References Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142,
▶ IMGT ® Information System Université Montpellier 2, Montpellier, France
▶ IMGT-ONTOLOGY
▶ Immunoglobulin Superfamily (IgSF)
▶ Immunoglobulin Synthesis Synonyms
▶ Immunoinformatics
▶ MH Superfamily (MhSF) IgSF; Immunoglobulin superfamily
Immunoglobulin Synthesis 999 I
Definition Giudicelli V, Lefranc M-P (1999) Ontology for immunogenet-
ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
Lefranc M-P, Lefranc G (2001a) The immunoglobulin
The immunoglobulin superfamily (IgSF) is a superfamily FactsBook. Academic Press, London, pp 1–458
of proteins that share domain structural characteristics: Lefranc M-P, Lefranc G (2001b) The T cell receptor FactsBook.
a protein belongs to the immunoglobulin superfamily Academic Press, London, pp 1–398
(IgSF) if it has at least one ▶ variable (V) domain or Lefranc M-P, Pommié C, Ruiz M, Giudicelli V, Foulquier E,
Truong L, Thouvenin-Contet V, Lefranc G (2003) IMGT
one ▶ constant (C) domain (Lefranc et al. 2003, 2005a). unique numbering for immunoglobulin and T cell receptor
A protein of the immunoglobulin superfamily (IgSF) variable domains and Ig superfamily V-like domains. Dev
may also belong to the ▶ MH superfamily (MhSF). Comp Immunol 27:55–77
Proteins of the immunoglobulin superfamily (IgSF) Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G,
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D,
are important components of the immune response. Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
They comprise the antigen receptors, immunoglobu- Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D,
lins (IG) or antibodies (▶ Immunoglobulin Synthesis) Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics
(Lefranc and Lefranc 2001a) and T cell receptors (TR) and immunoinformatics. In Silico Biol 4:17–29
Lefranc M-P, Pommié C, Kaas Q, Duprat E, Bosc N, Guiraudou D,
(Lefranc and Lefranc 2001b) of the adaptive immune Jean C, Ruiz M, Da Piedade I, Rouard M, Foulquier E,
response, characterized by V-DOMAIN (▶ Variable Thouvenin V, Lefranc G (2005a) IMGT unique numbering
(V) domain) and C-DOMAIN (▶ Constant (C) for immunoglobulin and T cell receptor constant domains
domain), and related proteins of the immune system and Ig superfamily C-like domains. Dev Comp Immunol I
29:185–203
(RPI) characterized by at least one V-LIKE-DOMAIN Lefranc M-P, Clément O, Kaas Q, Duprat E, Chastellan P,
(▶ Variable (V) domain) or one C-LIKE-DOMAIN Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume D,
(▶ Constant (C) domain). IgSF genes and proteins are Lefranc G (2005b) IMGT-Choreography for immunogenetics
extensively studied in ▶ immunogenetics and and immunoinformatics. In Silico Biol 5:45–60
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a
▶ immunoinformatics. system and an ontology that bridge biological and computa-
IgSF knowledge management is based on the tional spheres in bioinformatics. Brief Bioinform 9:263–275
concepts of ▶ IMGT-ONTOLOGY, the global
reference in ▶ immunogenetics and ▶ immunoinformatics
(Giudicelli and Lefranc 1999; Lefranc et al. 2004, 2005b,
2008; Duroux et al. 2008), built by IMGT®, the interna- Immunoglobulin Synthesis
tional ImMunoGeneTics information system®, http://
www.imgt.org (▶ IMGT® Information System). Marie-Paule Lefranc
Laboratoire d’ImmunoGénétique Moléculaire,
Institut de Génétique Humaine UPR 1142,
Cross-References Université Montpellier 2, Montpellier, France
“MoleculeType” “ConfigurationType”
V-gene D-gene J-gene C-gene V-gene J-gene C-gene concept concept
gDNA germline
Genome 5′ 3′ 5′ 3′
1 DNA rearrangements 1
5′ 3′ 5′ 3′ rearranged
gDNA
2 2
Transcription
L-V-D-J-C-sequence L-V-J-C-sequence
mRNA rearranged
Transcriptome 5′ 3′ 5′ 3′
(or in vitro cDNA)
3 Translation
3
V-D-J-C-sequence
V-J-C-sequence
protein rearranged
Proteome IG-Light-Chain
IG-Heavy-Chain
IG (or antibody)
(standardized keywords), the concepts of description code the IG (and TR): “variable” (V), “diversity” (D),
(▶ IMGT-ONTOLOGY, DESCRIPTION Axiom) “joining” (J), and “constant” (C) (▶ Variable (V)
describe the composition of the sequences and structures Gene, ▶ Diversity (D) Gene, ▶ Joining (J) Gene,
with standardized labels (▶ Labels and Relations), the ▶ Constant (C) Gene). The configuration of the
concepts of classification (▶ IMGT-ONTOLOGY, V-gene, D-gene, and J-gene is identified as “germline”
CLASSIFICATION axiom) classify the genes and (Fig. 1), the configuration of the C-gene is “undefined”
alleles with a standardized nomenclature (▶ Gene and (▶ Configuration Type). During the differentiation of the
Allele Nomenclature), and the concepts of numerotation B lymphocytes in the bone marrow, the genomic DNA is
(▶ IMGT-ONTOLOGY, NUMEROTATION Axiom) rearranged first in the IGH locus and then in the IGK and
numerotate the nucleotide and amino acid numbering IGL loci. The rearrangements in the IGH locus lead to the
within sequences and structures (▶ IMGT Unique Num- junction of a D-gene and a J-gene to form a D-J-gene, and
bering, ▶ IMGT Collier de Perles). then to the junction of a V-gene to the D-J-gene to form
An IG or antibody is composed of two identical a V-D-J-gene. The rearrangements in the IGK or IGL loci
heavy chains associated with two identical light lead to the junction of a V-gene and a J-gene to form
chains, kappa or lambda. In humans, heavy chain a V-J-gene. The configuration of these genes is
genes (locus IGH), light chain kappa genes (locus identified as “rearranged” (▶ Configuration Type). After
IGK), and light chain lambda genes (locus IGL) are transcription and maturation of the pre-messenger by
located on the chromosomes 14 (14q32.3), 2 (2p11.2) splicing, the messenger RNA (mRNA) L-V-D-J-C-
and 22 (22q11.2), respectively (Lefranc and Lefranc sequence Molecule_EntityType (▶ IMGT-ONTOL-
2001). The synthesis of an immunoglobulin requires OGY, Molecule_EntityType) and L-V-J-C-sequence
rearrangements of the IGH, IGK, and IGL genes dur- (L for leader) are obtained and then translated into the
ing the differentiation of the B lymphocytes. heavy chain (IG-Heavy-Chain) (▶ Chain Type) and the
In the human genome (genomic DNA or gDNA) light chain (IG-Light-Chain) of an IG (or antibody)
(▶ MoleculeType), four types of genes (▶ Gene Type) (Fig. 1) (Lefranc and Lefranc 2001).
Immunoglobulin Synthesis 1001 I
Immunoglobulin Synthesis, VH
Fig. 2 The variable domains CDR2-IMGT
VL
VH and VL (Variable (V) CDR3-IMGT
domain) of the heavy and light CDR3-IMGT CDR1-IMGT
Q66
chains of an IG or antibody. CDR1-IMGT
N
I
The variable domains VH and VL (▶ Variable (V) ▶ Diversity (D) Gene
Domain) are coded by the V-D-J-REGION and the ▶ Epitope
V-J-REGION (Fig. 2). Each domain includes four ▶ Framework Region (FR-IMGT)
framework regions (FR) (▶ Framework Region ▶ Gene and Allele Nomenclature
(FR-IMGT)) (in pale gray in Fig. 2) and three hyper- ▶ IMGT Collier de Perles
variable loops or complementarity determining ▶ IMGT Unique Numbering
regions (CDR) (▶ Complementarity Determining ▶ IMGT-ONTOLOGY
Region (CDR-IMGT)). The CDR, and more particu- ▶ IMGT-ONTOLOGY, CLASSIFICATION Axiom
larly the CDR3 that results from the junction of the ▶ IMGT-ONTOLOGY, DESCRIPTION Axiom
V-D-J genes (in the VH domain) and V-J genes (in the ▶ IMGT-ONTOLOGY, IDENTIFICATION Axiom
VL domain), are involved in the antigen recognition. ▶ IMGT-ONTOLOGY, NUMEROTATION Axiom
The VH and VL amino acids in contact with the anti- ▶ IMGT-ONTOLOGY, SpecificityType
gen constitute the ▶ paratope (▶ IMGT-ONTOLOGY, ▶ IMGT ® Information System
SpecificityType). The part of the antigen recognized by ▶ Joining (J) Gene
the antibody is the ▶ epitope. The number of potential ▶ Labels and Relations
V-D-J and V-J rearrangements depends on the number ▶ Molecule Entity Type
of functional V, D, and J genes in the genome. Addi- ▶ MoleculeType
tional mechanisms (N diversity at the V-D-J and V-J ▶ Paratope
junctions and somatic hypermutations) allow to ▶ Variable (V) Domain
reach 1012 different antibodies per individual (Lefranc ▶ Variable (V) Gene
and Lefranc 2001a) (IMGT Repertoire, http://www.
imgt.org).
References
Cross-References Duroux P, Kaas Q, Brochet X, Lane J, Ginestoux C,
Lefranc M-P, Giudicelli V (2008) IMGT-Kaleidoscope,
▶ Chain Type the Formal IMGT-ONTOLOGY paradigm. Biochimie
▶ Complementarity Determining Region (CDR- 90:570–583
Giudicelli V, Lefranc M-P (1999) Ontology for Immunogenet-
IMGT) ics: the IMGT-ONTOLOGY. Bioinformatics 12:1047–1054
▶ Configuration Type Lefranc M-P, Lefranc G (2001) The immunoglobulin
▶ Constant (C) Gene FactsBook. Academic, London, pp 1–458
I 1002 Immunohistochemistry
Lefranc M-P, Giudicelli V, Ginestoux C, Bosc N, Folch G, Immunoinformatics is crucial for understanding
Guiraudou D, Jabado-Michaloud J, Magris S, Scaviner D, the immune response, particularly the adaptive
Thouvenin V, Combres K, Girod D, Jeanjean S, Protat C,
Yousfi Monod M, Duprat E, Kaas Q, Pommié C, Chaume D, immune response of vertebrates. Implementation of
Lefranc G (2004) IMGT-ONTOLOGY for immunogenetics specific databases and tools (Flower 2007; Lefranc
and immunoinformatics. In Silico Biol 4:17–29 2008; Schoenbach et al. 2008), creation of the
Lefranc M-P, Clément O, Kaas Q, Duprat E, Chestellan P, ▶ IMGT® information system in 1989, and development
Coelho I, Combres K, Ginestoux C, Giudicelli V, Chaume
D, Lefranc G (2005) IMGT-Choreography for immunoge- of ▶ IMGT-ONTOLOGY, the global reference in
netics and immunoinformatics. In Silico Biol 5:45–60 ▶ immunogenetics and ▶ immunoinformatics (Giudicelli
Lefranc M-P, Giudicelli V, Regnier L, Duroux P (2008) IMGT, a and Lefranc 1999; Lefranc et al. 2004, 2005, 2008; Duroux
system and an ontology that bridge biological and computa- et al. 2008), built by IMGT®, the international ImMuno-
tional spheres in bioinformatics. Brief Bioinform 9:263–275
GeneTics information system® (http://www.imgt.org)
(▶ IMGT® Information System), have contributed to the
coming of age of immunoinformatics.
Immunohistochemistry
Cross-References
Barbara J. Davis
Section of Pathology, Tufts Cummings School ▶ IMGT ® Information System
of Veterinary Medicine Biomedical Sciences, ▶ IMGT-ONTOLOGY
North Grafton, MA, USA ▶ Immunogenetics
▶ Structural Immunoinformatics
Definition
Cross-References
Immunomonitoring
▶ Lymphocyte Dynamics and Repertoires, Biological
Bertrand Bellier, Adrien Six, Véronique
Methods
Thomas-Vaslin and David Klatzmann
▶ Systems Immunology, Novel Evaluation of Vaccine
Immunology-Immunopathology-Immunotherapy,
UPMC University Paris 06, UMR7211, CNRS,
UMR7211, INSERM, U959, Paris, France
References
of success in preventing and treating infectious dis- versus nonself discrimination (Viret and Janeway
eases. Vaccines, in particular, offer protection against 1999). The short antigenic peptides, derived from
infectious diseases. However, new threats to health parent molecules have been proposed as interesting
continually emerge as bacteria and viruses evolve, targets of vaccine design. Thus, a new generation of
are transported to new environments, or develop resis- high efficiency, low toxicity epitope-based vaccines
tance to drugs and vaccines. Also, the current threat of are in development.
the potential use of viruses and bacteria as biological Accessing subdominant specificities might be of
weapons demands efficient methodologies that ana- particular value in the case of tumor antigens, where
lyze genetic information related to potential viral and self-tolerance might have inactivated T-cells recogniz-
bacterial threats, identify potential vaccine target, and ing the most dominant specificities. This approach also
then develop diagnostic tests and vaccines for them. allows elicitation of immune responses against multi-
With the advances in genomics, understanding of ple conserved epitopes, a factor of crucial importance
pathogen variability as well as of the diversity of the in the case of rapidly mutating pathogens, such as HIV
human immune system, has greatly improved. and HCV. It is also possible to overcome limitations in
The availability of human genome data and various recombinant protein delivery by combining epitopes
microbial genome sequences provides opportunity to from multiple antigens into a single immunogen. They
establish computational methodologies to meet the can also increase potency and break tolerance. T-cell
demands. To date, genome sequences are availab