
Chapter 10

Integrative Systems Biology: Implications for the Understanding of Human Disease

M. Michael Barmada and David C. Whitcomb

“What is now proved was once only imagined.”
William Blake

“All models are wrong, but some are useful.”
George Box [1]

INTRODUCTION

From the smallest collections of elementary particles to the largest collections of galaxies, all matter is constructed of interacting collections of elements, or networks. Interactions within and between these networks give rise to the observable properties of matter. Biological networks are no exception, from the level of interacting enzymes and substrates which form a pathway, up to the level of the biosphere, incorporating all living things and their relationships. Networks have recently generated considerable interest in biological sciences, not because of any change in our basic knowledge about them, but because of key technological changes that have enhanced our ability to interrogate them and thereby develop descriptive models allowing prediction of their behavior. This shift in ability has fueled new perspectives on how to apply the scientific method to biomedical science, emphasizing integration instead of reduction. Systems (or network) biology focuses on the investigation of complex interactions in biological systems. Much of the promise of systems biology in biomedicine revolves around its potential to explain disease as a perturbation of a normal system. Many disease syndromes include elements from multiple tissue-specific and systemic systems. For example, chronic pancreatitis is defined by variable amounts of maldigestion (loss of pancreatic acinar cell function), diminished bicarbonate secretion (loss of duct cell function), diabetes mellitus (loss of islet cell function), inflammation (immune system), fibrosis (regenerative systems), pain (nervous system), and cancer risk (DNA repair and cell cycle regulation systems). Since genetic and environmental factors affect each of these systems in different ways, it is not surprising that dysfunction of these systems in response to stress or injury will be variable between patients. However, it is also clear that these systems interact with one another. Furthermore, for the systemic systems, the commonly dysfunctional components of effector or regulatory pathways will also be dysfunctional in parallel disorders of other tissues or organs. Careful modeling of these different relationships within the phenome (the set of all phenotypes of an organism) or within the interactome (the set of all interactions of an organism) can greatly increase the power to identify functional genomic variation important in disease phenotypes and lead to a better understanding of how to readjust the system to return to a normal state.

At its most basic, systems biology comprises three challenges: (i) how to generate a sufficient quantity of data to analyze variability in networks, (ii) how to properly integrate data from multiple disparate sources into a usable corpus of knowledge, and (iii) how to use that corpus of knowledge to model a component system and optimize it. Model construction allows predictions of the behavior of a system and, ultimately, an understanding of how to change that behavior in predictable ways. The promise of this approach

Molecular Pathology © 2009, Elsevier, Inc. All Rights Reserved.


Part II Concepts in Molecular Biology and Genetics

is vast because the exact definition of a system is not fixed: anything from a particular biochemical pathway up to the level of an entire organism can be considered a system in biomedical applications. In this regard, population-level biosciences, such as epidemiology and population genetics, have long practiced a form of systems biology, but their focus has always been on a collection of individuals as the system of interest. Current systems biology approaches model systems of interest to clinical outcomes, such as metabolic pathways, individual cells, and organs. Modeling at this level has the appeal of allowing the results of decades of population-based laboratory research to be translated and applied to individualized clinical medicine.

The field of systems biology borrows from control theory, which deals with the behavior of dynamic systems (systems that fluctuate in a dependent fashion). Control theory proposes the idea that systems can best be modeled as a cycle of controllers which modify inputs to produce the desired outputs, and sensors which provide feedback to the system, producing a steady-state system in which equilibrium is achieved by balancing the input and output concentrations and the feedback signals. In a similar fashion, systems biology includes the definition and measurement of the components of a system, formulation of a model, and the systematic perturbation (either genetic or environmental) and remeasurement of the system. The experimentally observed responses are then compared with those predicted by the model, and new perturbation experiments are designed and performed to distinguish between multiple or competing models. This cycle of test-model-retest is repeated until the final model predicts the reaction of the system under a broad range of perturbations.

Systems Biology as a Paradigm Shift

Although systems biology is itself a new discipline, the study of systems in biology is not new. Early studies of enzyme kinetics employed a similar cycle of testing and modeling, followed by retesting of new hypotheses. In a similar fashion, modeling of neurophysiological processes like the propagation of action potentials along a neuron is illustrative of early forays into mathematical modeling of cellular processes. However, like all next-generation methodological shifts, systems biology involves a rethinking of basic principles: a return to a more basic understanding or a more simplistic framework in which to generate hypotheses. Traditional science follows a bottom-up paradigm; that is, break a problem down into its individual components (reductionism), learn everything there is to know about those individual components, and then integrate the information together to get information about a system. Systems biology can work within this paradigm, creating models for individual components, and then integrating the models (creating, in essence, models of models, to explain the behavior of the original system). However, this approach requires that knowledge of the individual components is sufficient to explain the behavior of the system. Apart from a few exceptions, current reductionist research models have not been successful at explaining the large-scale behavior of biologic systems. To this end, systems biology endorses a more top-down approach, allowing modeling to occur at a higher level of organization (not at the level of the components, but at the level of the organ or the individual). From this point of view, a better understanding of the system as a whole can create a better understanding of the components. “Systems biology . . . is about putting together rather than taking apart, integration rather than reduction. It requires that we develop ways of thinking about integration that are as rigorous as our reductionist programmes, but different” [2]. Or, put more simply [3]: “You [can] study each part of a Boeing 777 aircraft and describe how it functions, but that still wouldn’t tell you how the airplane flies.” This has led to the recognition that new paradigms (integration rather than reduction) may be necessary to understand and model systems with multiple interacting quantitative components.

DATA GENERATION

Central to the success of any modeling endeavor is the generation of large quantities of data. Because of the increased interest in systems biology, the theme of the late 1990s and early 21st century reflected exactly this necessity, exemplified by the fact that several papers were published on methods of dealing with the so-called “data deluge” [4–6]. Genomic technologies were the first to enter the high-throughput realm and subsequently the first to produce large quantities of high-quality data. Coupled with a concurrent explosion in computing power, high-throughput data generation made realistic modeling feasible. New technologies produced during this time vastly increased the ability to query an entire system at once and led to the advent of the major international efforts of the early 21st century, such as the Human Genome Project (the effort to sequence the entire complement of human chromosomes) [27], the HapMap Project (the effort to catalog common genetic variation across multiple human ethnic populations) [7], the ENCODE Project (basically a merger of the Human Genome and HapMap projects: an effort to resequence particular portions of the human genome in multiple individuals from different ethnic populations to explore common and rare genetic variation at a higher resolution) [8], and the 1000 Genomes Project (an effort to completely sequence 1000 human genomes) [28]. Specifically, microarray technology coupled with advances in mass spectroscopy, combinatorial chemistry, and robotics created an explosion of data and unique visualizations of cellular processes. Capitalizing on this explosion of data, the first quantitative model of the metabolism of a whole (hypothetical) cell was published in 1997 [9].
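To make concrete what a quantitative model can look like, the controller-and-sensor cycle that systems biology borrows from control theory can be written as a very small simulation. The Python sketch below is only an illustration under stated assumptions: the function name `simulate`, the first-order loss term, and every parameter value are invented for this example and are not taken from the cited whole-cell model or any other reference. It perturbs the system by doubling the loss rate and compares the response with and without feedback.

```python
# Toy negative-feedback loop: a sensor compares the output level with a target,
# and a controller adjusts the rate at which input is converted into output.
# All names and parameter values are illustrative assumptions.

def simulate(gain, decay, base_rate=1.0, setpoint=1.0, x0=1.0, dt=0.01, steps=5000):
    """Integrate dx/dt = rate - decay * x with the forward Euler method.

    gain = 0 switches the feedback off (open loop).
    """
    x = x0
    for _ in range(steps):
        error = setpoint - x                       # sensor: deviation from target
        rate = max(0.0, base_rate + gain * error)  # controller: feedback-adjusted rate
        x += dt * (rate - decay * x)               # production minus first-order loss
    return x

# Perturbation experiment: the loss rate doubles (decay 1.0 -> 2.0).
baseline = simulate(gain=5.0, decay=1.0)     # unperturbed system with feedback
open_loop = simulate(gain=0.0, decay=2.0)    # perturbed, feedback removed
closed_loop = simulate(gain=5.0, decay=2.0)  # perturbed, feedback intact

print(f"baseline={baseline:.3f} open_loop={open_loop:.3f} closed_loop={closed_loop:.3f}")
```

Under these assumptions the steady state is (base_rate + gain * setpoint) / (decay + gain), so the perturbed system settles near 0.5 without feedback and near 6/7 with it. Comparing such predictions with remeasured data is the step that the test-model-retest cycle repeats.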


Microarrays

Microarray technology grew out of a confluence of trends and technologies. On the experimental level, microarrays are most similar to Southern blotting, where fragmented DNA is attached to a substrate and then probed (hybridized) with a known gene or fragment to identify complementary sequences. The trend through the 1990s was to increase capacity in individual experiments, allowing the analysis of more and more samples (or more and more markers) in a single experiment. With the advent of paper-blotting techniques (allowing DNA molecules to be immobilized or spotted on special paper, then hybridized), and subsequently glass-blotting techniques, coupled with advances in robotic pipetting (allowing smaller and smaller volumes of liquid to be spotted, in greater and greater densities), microarrays became feasible. Initial microarray experiments [10] began with small numbers of immobilized probes, owing mostly to limited knowledge about the genes that make up a genome. This was quickly followed by arrays with hundreds, thousands, tens of thousands, and now (currently) millions of probes as our understanding of the components of genomes grew (owing, in large part, to early efforts to identify and catalog all expressed genes in organisms and to large-scale efforts like the Human Genome and ENCODE projects).

Microarrays have now found use in multiple experimental venues. Most commonly, the molecules being immobilized on the array are DNA molecules, allowing measurement of expression levels or detection of polymorphisms (such as single-nucleotide polymorphisms or SNPs). A DNA microarray consists of thousands of microscopic spots (or features), each containing a specific DNA sequence. Each feature is typically labeled with a fluorescent tag and then used as a probe in hybridization experiments with a DNA or RNA sample (the target), which is labeled with a complementary fluorescent label. Hybridization is quantified by fluorescence-based scanning of the array, allowing determination of the relative abundance of nucleic acid sequences in the target by characterization of the relative abundance of each fluorophore (Figure 10.1).

Figure 10.1 Example of an approximately 40,000 probe spotted oligonucleotide microarray with enlarged inset to show detail (Image from Wikimedia Commons).

Transcriptomics

Transcriptomics is the study of relative RNA transcript abundances, using microarray technologies. Chips that are specialized for this purpose are known as RNA microarrays and are typically prepared with a library of transcripts of known origin (representative tags for a known complement of genes; for example, current human RNA arrays contain approximately 60,000 probes, representative of a majority of known RNA species from the 20,000 or so human genes). These are then interrogated with RNA (typically reverse transcribed into cDNA) from two different samples, labeled with different dyes (commonly green and red dyes). This allows the relative abundance (in one sample versus the other) of each RNA transcript to be assessed. Experiments of this type are routinely used to determine which RNAs are upregulated or downregulated in a disease sample versus a normal sample. However, note that the utility of the information generated from transcriptomics studies is highly variable, as transcription levels are potentially influenced by a wide range of factors, including disease phenotypes. To be of general use, transcriptomic data must be generated under a wide variety of conditions and compared, to eliminate the trivial sources of variation.

Genotyping

One of the earliest explosions of data on a whole-genome scale came from early genetic linkage studies. The first whole-genome genetic linkage scans were performed in the early 1990s using panels of 350 genetic markers scattered throughout the autosomal chromosomes, as well as the X-chromosome [11]. Subsequent advances in mapping techniques and marker discovery increased the number of markers in the genetic maps, and included both sex chromosomes, as well as the mitochondrial genome. Markers for whole-genome genotyping have followed a rather amusing pattern of oscillation between the use of simple polymorphisms (with two alleles) and complex polymorphisms (with multiple alleles). The earliest genetic markers to be used (easily observable phenotypes such as eye color, sex, handedness, and others) were generally treated as simple binary traits. The first protein biomarkers (blood group protein polymorphisms and HLA polymorphisms) were complex, having many alleles with complicated ethnic and regional variations in frequency. Restriction fragment length polymorphisms (or RFLPs), having only two alleles indicating the presence or absence of a recognition site for restriction enzymes, were the next marker type to be in vogue, and were the first to be suggested as usable for whole-genome analysis because of their frequency throughout the genome. This was followed by the discovery of DNA-length polymorphisms, in the form of tandem repeated regions (called Variable Numbers of Tandem Repeats or VNTRs), which are complex polymorphisms. Recognition of the existence of tandem repeats led to the discovery of microsatellite markers, basically short tandem repeats (STRs), on the order of 2–5 base pairs repeated several times. The frequency of STRs in the genome enabled generation of the first whole-genome maps. This was followed in quick


succession by the discovery of single nucleotide polymorphisms (or SNPs), which initially were described as having only two alleles, and so continue in use as simple markers (though in fact many SNPs have more than two alleles). Due to their abundance (estimated at 1 SNP per 100 base pairs of DNA) and the relative ease of genotyping these markers, the development of DNA microarray-based genotyping technologies for SNPs occurred quickly. SNPs are now the standard for genome-wide genetic studies. When SNP content on genotyping arrays became sufficiently dense (and also due to findings from other genomic technologies like array-based comparative genomic hybridization (arrayCGH) techniques), the widespread occurrence of copy number variations (deletions and duplications) was detected, heralding a return to more complex (multiallelic) markers. Current genotyping arrays contain a mixture of simple polymorphic markers (SNPs) and so-called copy-number probes, allowing the assessment of copy-number changes across the genome, at a density which provides excellent coverage for most ethnic groups (1,000,000 SNPs and nearly that many copy number probes, meaning each array carries almost 2 million features). It is expected that chip densities will continue to grow, allowing for higher-throughput genotyping. However, at the same time rapid low-cost sequencing technologies are becoming available that may well allow for affordable whole-genome sequence-based assessment of genomic variation, replacing any need for increased array densities.
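The distinction drawn in this section between simple markers (two alleles) and complex markers (multiple alleles) can be illustrated with a few lines of Python. This is a minimal sketch, not a method from the chapter: the function name `allele_frequencies` and both sets of genotypes are invented for illustration, with each diploid genotype stored as a pair of alleles (nucleotides for a SNP, repeat counts for an STR).

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} for a list of diploid genotypes (allele pairs)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # two alleles are counted per individual
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical data for four individuals.
snp_genotypes = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "G")]  # simple marker
str_genotypes = [(12, 14), (14, 15), (12, 12), (15, 16)]          # complex marker (repeat counts)

snp_freqs = allele_frequencies(snp_genotypes)
str_freqs = allele_frequencies(str_genotypes)

print(len(snp_freqs), "alleles at the SNP:", snp_freqs)
print(len(str_freqs), "alleles at the STR:", str_freqs)
```

The same function handles both marker types because it makes no assumption about how many distinct alleles exist, which is one reason a generic genotype representation is convenient when arrays mix SNP and copy-number content.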

Other Omic Disciplines

Recognizing that many properties of biological systems are emergent, or a result of complex interactions between components that are not understandable at the level of individual system components, scientists in the 21st century turned to analysis of different levels of organization in an organism. Naturally, given the penchant of science for fancy names, each level of organization had to be given an appropriate name. For example, since the genome is the entire set of genes of an organism, genomics is the name given to the study of the whole set of genes of a biological system. Various methods, such as genotyping, sequencing (examining the linear sequence of DNA), and transcriptomics (examining the expression of all genes in an organism) can be used to interrogate the genome. Likewise, given that the proteome is the collection of proteins expressed in a system, proteomics is the name given to the study of the proteome. Proteomic methods include traditional techniques such as mass spectroscopy, which identifies compounds based on the mass-to-charge ratio of ionized particles. But just as microarrays are a modification of traditional genomic techniques for high-throughput experiments, proteomics modifies the techniques of mass spectroscopy in a similar fashion, creating experimental and informatics pipelines that allow a sample (like a blood draw) to be partitioned into various fractions, have those fractions examined via mass spectroscopy, and have the results of the mass spectroscopy experiments compared to other mass spectroscopy profiles to allow for identification of individual components.

Other omic disciplines are similarly named, the general rule being that the respective discipline arises from the study of the associated “ome” (or set). This gives rise to disciplines such as (i) metabolomics, the study of the entire range of metabolites taking part in a biological process; (ii) interactomics, the study of the complete set of interactions between proteins or between these and other molecules; (iii) localizomics, the study of the localization of transcripts, proteins, and other molecules; and (iv) phenomics, the study of the complete set of phenotypes of a given organism. These various “omes” can be organized hierarchically, as demonstrated in Figure 10.2, based on the relationships recognized from years of biological experimentation. As with genomics and proteomics, experimental procedures for these other “omics” disciplines are adaptations of traditional methods (such as microscopy for localizomics, or single metabolite measurements for metabolomics) for high-throughput protocols.

Figure 10.2 Omics technologies gather data on numerous levels. Various “omics” fields are displayed here along with the laboratory techniques used to generate the data, as well as the relationship of data from one level to another. Adapted from [25].

DATA INTEGRATION

While data generation is certainly of paramount importance for systems biology, because the technologies that generate the data have their own unique (and often


proprietary) formats for storing the data, systems biologists must also concern themselves with integrating data that span multiple resources. Since 1998, the number of recognized online databases related to biological information has increased 10-fold (from just under 100 in 1998 to over 1000 in 2008) [12]. Resources like the Bioinformatics Links Directory (http://bioinformatics.ca/links_directory/) extend this further by collecting links to molecular resources and tools as well as databases, and currently list 2300 links. However, the problem is more than just quantity. Issues of data quality, lack of standards, lack of interfaces allowing for integration, and longevity (or lack thereof) continue to plague online biological data resources, and make cross-resource querying or integration difficult [12,13]. Nonetheless, tools continue to appear to assist in the integration of data from disparate sources [14]. Using technology adopted from the burgeoning Semantic Web (sometimes called Web 3.0) projects [15], biologists and bioinformaticists are able to capture instances of data sufficient for systems biology scale efforts.

Semantic Web Technologies

Many of the problems inherent to integration of biological data resources are similar to those being faced by the larger community of World Wide Web users. The Semantic Web is a vision for how to have computers infer information relating one web page (or element) to another [15]. It is an extension of the current web protocols (primarily the HyperText Transfer Protocol or HTTP), which allows for meaning to be embedded together with content, such that automated agents can make associations between data without needing user input. In essence, creating the Semantic Web involves recasting the information in the World Wide Web (which currently stores relationships in the form of hyperlinks, which can link anything to anything, and so do not represent information content or meaning) into a format which allows relationships to be represented. The eXtensible Markup Language (or XML) is an early example of this type of recasting, allowing content to be stored with representative tags that describe the content. Resource Description Framework (or RDF) builds on XML by using triples (subject-predicate-object) to represent the information in XML tags (or in hyperlinks), and defining a standard set (or schema) of RDF triples to describe a particular object. Each part of a triple names a resource using either a Uniform Resource Identifier (URI) or Uniform Resource Locator (URL), or a literal. The advantage of this format is that RDF schemas are predefined so that meaning can be embedded in the definition, or by using a hierarchy or ontology, describing the relationship within and between schemas. Also, since the RDF triple can contain a location as well as an attribute-value pair, the components of a schema do not need to be located in the same place. Thus, if we have a schema representing a web page in which the sections (header, footer, left panel, right panel, title, and others) are defined by RDF triples, these web pages can easily integrate data from multiple sources.

The representation of information by RDF allows the location of data and data resources to be independent of the location of user interfaces or analytic resources. In essence, this allows the integration of data from multiple repositories and the querying of the integrated data. Coupled with the concept of RDF schemas to accurately describe knowledge about objects and ontologies to organize the relationship of objects to one another, the use of RDF can solve several of the problems mentioned for data integration (namely, the lack of standards and the lack of interfaces for integration). Another current trend, the use of wikis (websites that allow collaborative editing of their content by their users), addresses the problem of data quality by allowing users to mark data as reliable or not, or by allowing users to update out-of-date or incorrect data.

MODELING SYSTEMS

Once the appropriate types of data have been generated, and the various sources of data collected and integrated, the resulting information is turned into knowledge by interpreting what the data actually mean, and how they address questions that need to be answered. Data modeling is used to understand the relationships important in defining the system. As noted previously, systems biology draws heavily from control theory, which itself is derived from mathematical modeling of physical systems. Although the mathematical derivation of control problems is complex, the fundamental concepts governing the formulation of control models are more intuitive and relatively limited. Three basic concepts can be thought of as central to forming control models: (i) the need for control (regulation or feedback), (ii) the need for fluctuation, and (iii) the need for optimization. A simple control model is presented in Figure 10.3.

Key characteristics of this model include a controller (responsible for the conversion between input and output) and sensor (responsible for determining the degree and direction of the feedback), each of which can be the result of a single or multiple elements. An initial model of this form can often be derived from an initial data generation step, which can measure a baseline set of conditions for the system and also provide an idea of the components of the system. For example, an expression network (groups of genes that are co-regulated in some fashion) that regulates a biochemical pathway can be thought of as a control model. A single transcriptomics experiment can provide information on genes that are potentially co-regulated (sets of genes that are overexpressed compared to control, and so which might be providing the function of a controller), as well as information about potential sensors (genes that differ in response, one being overexpressed, the other underexpressed). However, the problem with a single experiment (or a single snapshot of the transcriptome) is that the information derived is not sufficient to disentangle the true positives (elements that truly should be components of the model) from the false positives (random variation which transiently mimics the behavior


of the model). This reflects the need for fluctuation. By perturbing (changing an aspect of the system to produce a predictable outcome) and remeasuring the system, additional confidence regarding the true positives can be achieved, assuming the new measurements reflect the predictions of the model. If not, another model which explains the previous data measurements is selected, and another test is devised. This cycle continues until the final model accurately represents the data measured in all tests. In this way, the optimal model is selected (Figure 10.4).

Figure 10.3 Example of a control module. This module represents a simple feedback loop, with output from the sensor either upregulating or downregulating the process that converts the input into output.

Figure 10.4 The iterative nature of systems biology research. Note that every refinement of the model needs additional data generation for retesting of specific hypotheses. Adapted from [26].

Lastly, although not immediately obvious from the preceding discussions, data modeling also requires large amounts of computational power. Due to the size of the high-throughput data sets currently in use (expression data on 30,000 genes for multiple time points; SNP genotypes for millions of markers; occurrence of all known protein-protein interactions; computational prediction and annotation of all known protein-DNA binding sites; and others), the advanced mathematical and visualization frameworks currently in use, and the iterative nature of analysis, large amounts of computing power are necessary, often pushing the limits of even parallelized or grid-enabled computational clusters. Continued growth in computing power will be necessary to support the ongoing use of systems biology as the models in use become more and more elaborate and incorporate information from greater numbers of high-throughput methodologies.

IMPLICATIONS FOR UNDERSTANDING DISEASE

A new revolution in medicine was made possible at the end of the 20th century by the sequencing of the human genome. This remarkable feat led to the revelation that millions of variations exist in the DNA sequence of individual humans, with an effectively unlimited number of possible combinations. Furthermore, genetic studies revealed that many chronic inflammatory diseases, such as Crohn’s disease, ulcerative colitis, chronic pancreatitis, rheumatologic disease, heart diseases, and others do not have a single genetic factor causing the disease, but rather several dozen common genetic variations that, in the context of many possible interacting environmental factors, cause a disease that is defined by the location of inflammation and associated signs and symptoms. Unfortunately, the simplicity and early success of the germ theory of disease, in which a single pathological agent was responsible for a specific disease with characteristic signs and symptoms, lulled physicians into thinking that all disease would follow a similar pattern. This reductionist-like approach in medicine allowed physicians and scientists from different disciplines to triangulate on the same target using different methods and perspectives. It has identified numerous components of individual systems (many of which are reused over and over, leading to a modular view of system components) and has provided numerous insights into


human disease. Despite these advances, the reductionist approach in medicine has been less successful in identifying the many complex interactions between disease components, and in explaining how system properties (like disease phenotypes) emerge, a concept which becomes important when considering manipulation of systems to produce predictable and desirable outcomes, as required for preventative medicine. As such, new paradigms are required, likewise requiring new approaches and new methods.

One such new paradigm is the advent of personalized medicine, which can be thought of as an application of systems biology-type thinking to medicine. Personalized medicine represents a transition from population-based thinking (describing risks on a population level) to an individual-based approach (describing risks for an individual based on his or her personal genomic/proteomic/metabolomic/phenomic profiles). Reductionist approaches have successfully identified many of the biomarkers required to define disease states in complex disorders on a population level (that is, attributing population risks to various changes), but have not been able to describe how individual exposures or biomarkers (or combinations of these components) interact to create the disease state in individuals. To achieve this transition, systemic models which explain the function of systems (cells, organs, etc.) are needed, from which the implications of changes in particular exposures or biomarkers (genetic changes, proteomic changes, and others) can be understood at the system level.

Redefining Human Diseases

The first major hurdle in transitioning from allopathic medicine to personalized medicine is to redefine common human diseases. In allopathic medicine, diseases are typically defined by characteristic signs and symptoms that occur together and meet accepted criteria. Chronic inflammatory diseases are defined by the location of the inflammation, the persistence of inflammation, and the presence of tissue destruction or scarring. These definitions indicate that the specific mechanism causing a specific organ to develop and to sustain an inflammatory reaction is not known. Fur-

which have looked at the clustering of disease phenotypes based on their representative genomics (associated gene/SNP polymorphisms or expression profiles) [16] have demonstrated that even disease phenotypes we once thought were roughly homogeneous (for instance, disease subtypes like Crohn’s disease with ileal involvement in Caucasian-only populations) are in fact associated with different genetic loci (heterogeneous) [17]. Personalized medicine must deconvolute systemic and tissue-specific pathways and reconstruct them in a way that leads to precise, patient-specific treatments.

The Transition to Personalized Medicine

The effect of approaching complex inflammatory disorders as a single disease rather than as a complex process is illustrated through the comparison of multiple small studies by meta-analysis. While some of the variance in estimated effect sizes from small genetic studies can be attributed to random chance, the possibility also exists that populations from which the samples were taken were not equivalent, and that different etiologies will lead to the same end-stage signs and symptoms through different, parallel pathways. In cases where a candidate gene is critical to some pathologic pathway, but not others, the wide variance between small genetic studies may reflect the fraction of subjects that progress to disease through that specific gene-associated pathway.

Evidence of this effect was recently demonstrated in an evaluation of reports on the effect of the pancreatic secretory trypsin inhibitor gene (SPINK1) N34S polymorphism in chronic pancreatitis [18]. A model was developed to test the hypothesis that alcohol, which is known to be associated with chronic pancreatitis and fibrosis via the collagen-producing pancreatic stellate cell (PSC), drives pancreatic fibrosis through a recurrent trypsin activation pathway as seen in hereditary pancreatitis [19,20], or through a trypsin-independent pathway, as illustrated in Figure 10.5 [18].

We identified 24 separate genetic association studies of the effect of the SPINK1 N34S mutation on chronic pancreatitis with effect sizes (odds ratio, OR) reported to range from nonsignificant to 80. Using
thermore, it has been impossible to identify the cause meta-analysis, we determined an overall effect of the
when using the traditional scientific methods used to SPINK1 pN34S on risk of chronic pancreatitis to be
identifying a single, causative infectious agent. Instead, high (OR 11.00; 95% CI: 7.59–15.93), but with signifi-
the possibility that disease affecting one or more cant heterogeneity. Subdividing the patients into four
organs can occur through any one of multiple path- groups based in proximal etiological factors (alcohol,
ways, and that each pathway has multiple steps and tropical region, family history, and idiopathic), we
regulatory components, and that multiple effects on found that the effect of SPINK1 pN34S on alcohol
multiple systems and multiple environmental expo- (OR 4.98, 95% CI: 3.16–7.85) was significantly smaller
sures may be required before disease is manifest must than idiopathic chronic pancreatitis (OR 14.97, 95%
be embraced to transition to personalized medicine. CI: 9.09–24.67) or tropical chronic pancreatitis (OR
Indeed, physicians recognized that diseases located in 19.15, 95% CI: 8.83–41.56). Thus, we conclude that
specific organs share some common systemic features alcohol acts through Factor A-type pathway, while trop-
such as the type of inflammatory response (autoim- ical chronic pancreatitis and idiopathic chronic pan-
mune or fibrosing), or chronic pain since they use a creatitis act through trypsin-associated pathways. The
limited number of anti-inflammatory or pain medica- fact that a higher percentage of alcoholic patients than
tions for a variety of diseases. Additionally, studies the control populations had SPINK1 mutations may

191
Part II Concepts in Molecular Biology and Genetics

Figure 10.5 Hypothesis of etiology-defined pathways to pancreatic fibrosis. Hypothetical influence diagram illustrating
pathologic pathways linking proximal factor (Factor A and B) to PSC (pancreatic stellate cell) and fibrosis through multiple
steps (a1, a2, a3). Etiological factors of type B activate trypsinogen to trypsin, and therefore their pathologic pathway to the
PSC can be interrupted by SPINK1. Etiological factors of type A are independent of trypsin, and therefore will not be
influenced by variations in SPINK1 expression or function.

also mean that some idiopathic patients were misdiagnosed as being alcoholics since the threshold for alcohol-associated risk was previously unknown [21]. It could also mean that alcohol accelerates the progression from recurrent acute pancreatitis to chronic pancreatitis as demonstrated in animal models [22]. Thus, these results are important for understanding the etiology and progression of alcoholic chronic pancreatitis, but have broader implications for understanding the presence and effects of heterogeneity of pathways in complex disorders.

The chronic pancreatitis model also illustrates a major problem for transitioning to personalized medicine. Very large association studies require the recruitment of subjects from many large medical centers in different regions and different countries. This approach will tend to obscure mechanism-based heterogeneity and converge on only the most common features of the disease (population-level effects), even though other factors may have a stronger biological effect in a limited number of patients, while being irrelevant in others. This argues for more detailed phenotyping of study populations (phenomics) and careful analysis of context-dependent effects (interactomics). The application of systems biology to these data would allow for a better understanding of the various pathways leading to disease states, and so a better understanding of the possible interactions (or context-dependent effects) that accurately predict disease in a single individual.

Applications of Systems Biology to Medicine

The realization of personalized medicine requires not only a more systems-like perspective regarding risk factors, but also a rethinking of existing classifications, which are largely derived from observation and a reductionist approach (the assumption that common observable phenotypes result from common underlying factors). As an example of this rethinking, a recent study undertook the reclassification of the disease phenome (the space of all associations between disease phenotypes) using the explosion of current information on genetic disease associations [23,24]. Disease-associated genes were clustered based on information about gene-gene or protein-protein interactions (from transcriptomics or interactomics studies), and superimposed with a representation of the organ system of the associated disease phenotype. The resulting network graph (Figure 10.6) demonstrates the association of different genes together in modules, many of which represent associations of proteins in macromolecular complexes (protein-protein interactions), or association of proteins in metabolic or regulatory pathways.

This view of the network of disease genes can at the same time inform us about novel associations we might not have been aware of (such as PAX6 in the ophthalmological cluster, or PTEN and KIT in the cancer cluster), and also direct us toward pathways (clusters) that might be of most benefit in understanding the network more completely (that is, those that are involved in multiple disease states or that have the most connections to them). This network-centric view of disease can also inform clinical medicine in terms of the choice of medications (indicating that medications used in one disease might also function appropriately in another, based on clustering of associated genetic factors).

DISCUSSION

We have outlined many of the reasons that a systems biology-based approach is needed for understanding complex disorders. Once the key components are organized into a logical series, formal modeling of the disease process can be applied and tested, additional experimental data added, and the system calibrated and retested. The optimal model or models are yet to be determined experimentally for any system, but with the continued rapid increase in high-throughput data generation and the continued increase in computational power, large-scale integrations and models can be achieved. The challenge will be to accurately anticipate the information necessary to be included in the model, as accurate assessment of the context is essential to the appropriate understanding of the effect of individual variation and the integration of that understanding into clinical practice. It is not enough to know that certain changes affect a phenotype (like disease risk) in a population, as these overall effects are typically very small (on the order of an odds ratio of 1.1 to 1.2). However, with a proper understanding of the context in which each variant is important, appropriate medical interventions can be implemented.


Figure 10.6 Disease gene network. Each node is a single gene, and any two genes are connected if implicated in the same
disorder. In this network map, the size of each node is proportional to the number of specific disorders in which the gene is
implicated. Reproduced with permission from [24].
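
The construction rule stated in the caption of Figure 10.6 can be sketched in a few lines. The disorder and gene names below are placeholders, not data from [23,24], and `build_gene_network` is a hypothetical helper. The sketch only shows the rule itself (an edge between any two genes implicated in the same disorder, and a node size equal to the number of disorders per gene), not the clustering or layout used for the published figure.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder disorder-to-gene associations (illustrative only, not real data).
disorder_genes = {
    "disorder_A": {"GENE1", "GENE2"},
    "disorder_B": {"GENE2", "GENE3"},
    "disorder_C": {"GENE2"},
}


def build_gene_network(associations):
    """Return (edges, node_sizes) for a disease gene network.

    edges: set of sorted gene pairs that share at least one disorder.
    node_sizes: number of disorders in which each gene is implicated.
    """
    edges = set()
    node_sizes = defaultdict(int)
    for genes in associations.values():
        for gene in genes:
            node_sizes[gene] += 1
        # Every pair of genes implicated in the same disorder is connected.
        for a, b in combinations(sorted(genes), 2):
            edges.add((a, b))
    return edges, dict(node_sizes)


edges, node_sizes = build_gene_network(disorder_genes)
print(sorted(edges))
print(node_sizes)
```

In this toy input, GENE2 is implicated in three disorders and therefore becomes the largest, most connected node, which is the kind of hub the network-centric view singles out for further study.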

REFERENCES

1. Box GEP, Draper NR. Empirical Model-Building and Response Surfaces. Wiley;1987;424.
2. Noble D. The Music of Life: Biology Beyond the Genome. Oxford: Oxford University Press;2006;21.
3. UCSF School of Pharmacy: News Archive, Systems Biology Reshapes Science, March 9, 2004;http://pharmacy.ucsf.edu/news/2004/03/09/1/
4. Blake JA, Bult CJ. Beyond the data deluge: Data integration and bio-ontologies. J Biomedical Informatics. 2006;39:314–320.
5. Lanfear J. Dealing with the data deluge. Nature Reviews Drug Discovery. 2002;1:479.
6. Preston JD. Taking advantage of the data deluge. Int J Prosthodon. 1996;9:509.
7. The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–1320.
8. ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816.
9. Tomita M, Hashimoto K, Takahashi K, et al. E-CELL: Software environment for whole cell simulation. Genome Informatics Workshop Genome Informatics. 1997;8:147–155.
10. Schena M, Shalon D, Davis RW, et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467–470.
11. Peltomaki P, Aaltonen LA, Sistonen P, et al. Genetic mapping of a locus predisposing to human colorectal cancer. Science. 1993;260:810–812.
12. Galperin MY. The Molecular Biology Database Collection: 2008 update. Nucleic Acids Research. 2007;36:D2–D4.
13. Merali Z, Giles J. Databases in peril. Nature. 2005;435:1010–1011.
14. Cohen-Boulakia S, Biton O, Davidson S, et al. BioGuideSRS: Querying multiple sources with a user-centric perspective. Bioinformatics. 2007;23:1301–1303.
15. Berners-Lee T, Hendler J, Lassila O. The Semantic Web: A new form of web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American. May 17, 2001;29–37.
16. Oti M, Brunner HG. The modular nature of genetic diseases. Clinical Genetics. 2007;71:1–11.
17. Rioux JD, Xavier RJ, Taylor KD, et al. Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nature Genet. 2007;39:596–604.
18. Aoun E, Chang CC, Greer JB, et al. Pathways to injury in chronic pancreatitis: Decoding the role of the high-risk SPINK1 N34S haplotype using meta-analysis. PLoS ONE. 2008;3:e2003.
19. Whitcomb DC, Gorry MC, Preston RA, et al. Hereditary pancreatitis is caused by a mutation in the cationic trypsinogen gene. Nature Genet. 1996;14:141–145.
20. Gorry MC, Gabbaizedeh D, Furey W, et al. Mutations in the cationic trypsinogen gene are associated with recurrent acute and chronic pancreatitis. Gastroenterology. 1997;113:1063–1068.
21. Yadav D, Hawes RH, Brand RE. Alcohol Consumption, Cigarette Smoking and the Risk of Recurrent Acute and Chronic Pancreatitis. Pancreatology [in press].
22. Deng X, Wang L, Elm MS, et al. Chronic alcohol consumption accelerates fibrosis in response to cerulein-induced pancreatitis in rats. Am J Pathol. 2005;166:93–106.
23. Goh KI, Cusick ME, Valle D, et al. The human disease network. Proceed Natl Acad Sci USA. 2007;104:8685–8690.
24. Loscalzo J, Kohane I, Barabási AL. Human disease classification in the postgenomic era: A complex systems approach to human pathobiology. Mol Syst Biol. 2007;3:124.
25. Fischer HP. Towards quantitative biology: Integration of biological information to elucidate disease pathways and drug discovery. Biotechnol Annu Rev. 2005;11:1–68.
26. Studer SM, Kaminski N. Towards systems biology of human pulmonary fibrosis. Proceed Am Thorac Soc. 2007;4:85–91.
27. Human Genome Project: http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
28. 1000 Genomes Project: http://www.1000genomes.org/page.php
