You are on page 1of 10

Applying Genomics to the Study of Complex

Disease
Brian D. Juran, B.S.,1 and Konstantinos N. Lazaridis, M.D.1

ABSTRACT

The interest in dissecting the genetic and environmental components of complex


human disease is growing, fueled by the emerging advances in the field of genomics and
related disciplines. Improved understanding of the pathogenesis of complex liver diseases
such as gallbladder stones, nonalcoholic fatty liver disease, viral hepatitis, and hepatocel-
lular carcinoma remains a goal of the clinical and experimental hepatologist alike. Despite
the scientific progress and technological advancement, elucidating the underlying mech-
anisms of complex hepatic diseases from the genomic standpoint will be demanding.
Complexity of genomic structure and function, disease heterogeneity, influence of the
environment on disease development and progression, and epigenetics all contribute to the
challenge. To overcome these obstacles, novel conceptual frameworks regarding biological
systems and human diseases are necessary in addition to a coordinated endeavor among
different scientific disciplines. Deciphering in an integrated fashion the genomic, tran-
scriptional, and translational aspects of the pathogenesis of complex liver diseases will lead
to their better prediction, diagnostics, and treatment.

KEYWORDS: Systems biology, disease susceptibility, genetics

C omplex diseases are heterogeneous, the cumu- ies toward achieving a better understanding of complex
lative result of a wide array of gene variants (both disease.
common and rare), somatic mutations, epigenetic mod-
ifications, and environmental exposures, the combina-
tions of which are apt to be significantly varied among DISEASE COMPLEXITY
the spectrum of affected individuals.1 Thus, inherited
genetic variation is not directly the cause of complex Systems Biology: Robustness, Modularity,
disease but instead acts to mediate the risk of disease and Redundancy
development in response to environmental exposures. We humans are in essence complex biological machines,
The clinical and genetic heterogeneity inherent in these shaped over millennia by evolutionary forces and defined
disorders greatly complicates our ability to dissect the by our genome. All of the information necessary for life
underpinnings of their etiology and pathogenesis. is encoded in our DNA. However, its utilization is
In the following pages we provide a brief overview dependent on the cellular context in which it resides,
of the concepts involved with disease complexity and the allowing the development, organization, and sustain-
field of genomics. We then discuss the current strategies ment of the diverse set of cells that comprise us. The
and future challenges of applying genomics-based stud- ability of the genome to generate and coordinate this

1
Division of Gastroenterology and Hepatology, Center for Basic Genetics and Genomics of Complex Diseases in Hepatology; Guest
Research in Digestive Diseases, Mayo Clinic College of Medicine, Editor, Konstantinos N. Lazaridis, M.D.
Rochester, Minnesota. Semin Liver Dis 2007;27:3–12. Copyright # 2007 by Thieme
Address for correspondence and reprint requests: Konstantinos N. Medical Publishers, Inc., 333 Seventh Avenue, New York, NY 10001,
Lazaridis, M.D., Mayo Clinic College of Medicine, 200 First Street USA. Tel: +1(212) 584-4662.
SW, Rochester, MN 55905. DOI 10.1055/s-2006-960167. ISSN 0272-8087.
3
4 SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1 2007

tremendous level of diversity has evolved over the billions another gene or module occurs (Fig. 1). Thus, redun-
of years that DNA-based life has existed.2,3 Robustness is dancy can and often does generate disconnect between
the driving force behind this process and is a fundamental genotype and phenotype, a process that is dependent on
feature of complex biological systems.4 Simply put, robust higher order pathway and network interactions and the
properties provide phenotypic stability in the presence of context in which the genome is utilized.8,9 These funda-
unpredictable environmental or genetic challenges. Com- mental features of complex biological systems signifi-
plex systems become robust through modularity and cantly affect the genetic mechanisms underlying the
redundancy, both of which act to mitigate the potential etiology of complex human disease and pose limits on
for system-wide damage.5 Modular structure is pervasive our current ability to study them.
in life and often exists in a hierarchy (i.e., organs, tissues,
cells, and organelles). In addition to physical structure,
functional and regulatory mechanisms such as metabo- Disease Genetics
lism, cell cycle, and signal transduction are widely modu- Genetic predisposition is thought to play a role in most
larized, taking the form of numerous networks and human diseases.10 Currently, three classifications based
pathways operating at the postgenome level.5,6 Intercon- on genetic involvement with disease are recognized:
nection of these networks and widespread gene duplica- chromosomal, Mendelian, and complex.11 Chromosomal
tion derive the means for use of alternative genes or disorders are characterized by gross abnormalities in
pathways generating redundancy,7 a compensatory proc- chromosome number or structure and often result in
ess to achieve the desired phenotype when failure in preterm death related to developmental abnormalities.

Figure 1 Interconnected biological pathways generate redundancy. The interrelation of numerous pathways generates a redundant
network, buffering the effect of input variability and genetic polymorphism on phenotype. Shown is a simplified hypothetical signaling
network through which phenotypic effects are stimulated by a primary input that is detected by members of two distinct modular
pathways (delineated by boxes). In this example, these pathways communicate through a primary node (dark gray square) that acts to
mediate a large portion of the signal in the network and provides a feedback loop to diminish the effect of input variation on the
phenotype (illustrated by dotted lines). However, some of the signal from each pathway, as well as a secondary input (dark gray circle), is
able to bypass this node and directly stimulate an effect on phenotype. Genetic polymorphism in individual components of the network
(i.e., genes) is unlikely to have a great effect on phenotype because of the many means through which the input stimulus can be passed.
However, a slight phenotypic effect could be demonstrated by these putative variants, potentially contributing to risk of disease.
Furthermore, genetic variants of the primary node and secondary nodes (light gray circles) are more likely to display a detectable effect.
APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE/JURAN, LAZARIDIS 5

Mendelian diseases run in families and display classic Disease concordance in monozygotic (MZ) twins
patterns of inheritance, such as autosomal dominant, is presently the best means for establishing the strength
autosomal recessive, or X-linked. In general, these of the inherited genetic determinates of complex dis-
disorders are rare, arise early in life, and can be attributed ease.13 As MZ twins have identical DNA sequences,
to mutation in a single gene that, when present, directly disease concordance is suggestive of genetic influence,
causes the disease phenotype. Often these causative whereas discordance indicates a greater role for environ-
mutations are family specific. mental or stochastic effects. Furthermore, the difference
The vast majority of human diseases are genet- in disease concordance between MZ and dizygotic (DZ)
ically complex, wherein the direct correspondence be- twin pairs may provide additional insight into the
tween causative genotype and disease phenotype mechanisms at play in complex disease development.
characteristic of Mendelian disorders is not present.12 For example, large differences in disease concordance
Instead, complex diseases develop as the cumulative between MZ and DZ twin pairs could signify the
result of environmental exposures, exerting their effect involvement of numerous risk-modifying gene variants
over time, in genetically susceptible individuals. There- and, conversely, slight differences in concordance might
fore, the genotypic components of complex diseases are imply a stronger shared-environment effect. However, it
not causative but rather mediate disease risk. Many such should be noted that de novo genetic and/or epigenetic
susceptibility genotypes are expected for each complex effects prior to or after MZ twinning could have an effect
disorder, some of which are common to similar diseases on MZ disease concordance and MZ/DZ concordance
(e.g., autoimmune disorders) and some of which may be ratios, obscuring the reality of the inherited genetic
disease specific. Regardless, the individual contributing contribution to disease.14 Familial aggregation also
variants are likely to have only a slight contribution to provides a way to estimate the genetic impact on
the overall risk of each specific disease in the affected complex diseases,11 as family members share more
population (Fig. 2). genetic material between themselves than with the
Complex diseases are diverse in clinical presenta- general population. Relative risk ratios (l) are often
tion, progression, and response to treatment. Because of used to illustrate the risk of disease development in the
this heterogeneity, it is useful to break down complex families of affected individuals. The l is calculated by
disorders to a series of disease traits or phenotypes for dividing the prevalence of a complex disease among
consideration in genomic studies. These traits can be family members (often specifically siblings, ls) by the
qualitative, such as the presence of a comorbid disease, an prevalence of the disease in the population at large.11 In
associated diagnostic marker, or a previously determined general, higher l values suggest a greater role of the
risk factor. In addition, these traits could be quantitative genetic component in disease.
measures such as age of disease onset, results of serum
liver tests, or gene expression profiles. Comprehensive
characterization, assessment, and utilization of these GENOMICS
traits will be essential in the dissection of the genetic From the earliest stages of life the human genome is
and environmental contributors to complex disease. selectively activated in response to both internal and

Figure 2 Comparison between Mendelian and complex diseases. In general, Mendelian diseases have lower frequency in the
population, display higher prevalence, and are caused by a single or a few genes, each of which has a high effect on the disease
phenotype. In contrast, complex diseases are usually more common, have reduced prevalence, and are mediated by several to
numerous genes, each of which has a small contribution to the phenotype.
6 SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1 2007

external cues to guide the development and continued (i.e., intergenic SNPs) are thought to be far less likely
function of numerous cell types organized into higher to affect function. Nevertheless, it has been suggested
order structures such as our tissues and organ systems. that a large portion of this intergenic sequence is under
Genomics is the field of study that seeks to understand positive selection and, thus, potentially functional.20 To
and characterize the genome’s role in this process. The facilitate a better understanding of genome operation,
achievement of sequencing the human genome is in- the National Human Genome Research Institute
deed momentous and has provided new impetus to the (NHGRI) has launched the ENCODE (ENCyclope-
study of its involvement in disease pathogenesis. How- dia of DNA Elements) program, which seeks to per-
ever, the intricacies of genomic function cannot be form an exhaustive determination of all functional
understood by solely focusing on our species. To this elements in the human genome. Information regard-
extent, the genomes of a wide variety of organisms ing ENCODE can be found on the web at http://
ranging from bacteria to mammals have been eluci- www.genome.gov/10005107.
dated, and the effort continues. Complete lists of these
genome sequencing projects are freely available through
the National Center for Biotechnology Informa- Genomic Diversity
tion (NCBI) website (http://www.ncbi.nlm.nih.gov/ Diversity of the genome is primarily driven by random
entrez/query.fcgi?db=genomeprj). mating and meiotic recombination, the phenomenon
during which regions between pairs of equivalent chro-
mosomes are exchanged in the course of gametogenesis,
Sequence Variation generating discreet genetic differences between parents
The sequence of the human genome is varied, as and offspring.21 As we have transmitted the genetic
illustrated by the nearly 12 million polymorphisms material through relatively few generations since our
currently cataloged on the NCBI’s dbSNP website ancestral origins, contemporary chromosomes have re-
(http://www.ncbi.nlm.nih.gov/SNP/). Single nucleo- gions of variation in common, or display linkage dis-
tide polymorphisms (SNPs) are the most prevalent of equilibrium (LD) (Fig. 3). Thus, the relatively young
these variants, accounting for over 90% of human age of our species limits the diversity of the human
polymorphic loci.15 Distribution of SNPs across the genome.
genome is rather uniform and primarily dependent on The pattern of LD across the genome is incon-
the frequency of the minor allele (MAF). For instance, sistent because of the presence of recombination hot
a SNP with a MAF of 1% is expected to be present in spots, regions of the genome in which recombination
every 300 bases of genomic DNA, whereas a SNP with occurs more readily.22,23 The result is that regions of low
MAF of 40% is expected in only every 3300 bases.16 LD flank regions of high LD, creating haplotype
Resequencing portions of the genome in 137 individ- blocks.23,24 Limited allelic diversity within these blocks
uals has confirmed that SNPs are quite common, can potentially simplify genomic studies aimed at deci-
occurring approximately every 180 base pairs.17 In phering the mechanisms of complex disease.25,26 To
addition, a majority of these SNPs (64%) are rare, facilitate such efforts, the International Human Haplo-
with MAF < 5%.17 type Map (HapMap) project (http://www.hapmap.org/)
The location of a SNP, to some extent, suggests was initiated to determine the structure of human
its potential for functional significance and impact on haplotype blocks and identify SNPs that are predictive
disease risk.18 SNPs in gene coding sequence resulting for the variation in a larger set (i.e., tag SNPs). To date,
in premature termination (i.e., nonsense SNPs) or the HapMap project has assessed some 3 million SNPs
amino acid substitution (i.e., nonsynonymous SNPs) in 269 individuals from four racial/ethnic groups. The
alter protein sequence and possibly function and, there- HapMap effort has confirmed that significant redun-
fore, are more likely to demonstrate a phenotypic effect. dancy exists among common SNPs such that local
Interestingly, resequencing of two genes in a large genomic variation can be reliably determined using a
reference sample of 450 individuals identified a large subset of tagging SNPs.22,27
number of rare nonsynonymous SNPs that were not
detected in a smaller screening set,19 suggesting that
many SNPs with a high likelihood to affect phenotype The Horizon
(and by correlate disease) are present in the population, Genome sequencing projects, variant cataloging, and
albeit at low prevalence. SNPs located in splice site haplotype block mapping efforts all provide a solid
recognition, transcription factor binding, or enhancer basis for the study of genomics but together amount to
sequences also have the potential to alter phenotype by only the ‘‘tip of the iceberg’’ in the level of under-
affecting gene expression, splicing, or stability. How- standing that will be required if genomic studies are to
ever, we are not yet able to predict these functional affect globally the way in which we approach the
consequences a priori. SNPs located between genes diagnosis, prognosis, and treatment of complex disease.
APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE/JURAN, LAZARIDIS 7

Figure 3 Linkage disequilibrium (LD) limits the diversity of the human genome. LD is the nonrandom association of alleles at two or
more loci. In humans, this is largely driven by the limited number of generations, and thus recombination events, through which the
genome has passed since our common ancestors, generating allele combinations that are widely shared among contemporary humans
(i.e., common haplotypes). Patterns of LD are influenced by population structure and dynamics, such as founder effects and population
admixture, and as a result display differences between racial and ethnic groups.

Increasing our knowledge regarding sequence func- ill equipped to study the effects of inherited genetic
tionality, currently the aim of the ENCODE program, variation across numerous physically separated loci and
will take us one step further in this endeavor by determine how these combinations of alleles interact
providing a basis for the prediction of variant-induced with the environment and lead to the development of
consequences outside the nonsense and nonsynony- complex disease.28 However, utilizing the genomic ap-
mous polymorphisms for which this approach is cur- proaches described in the following pages, we are begin-
rently feasible. Elucidation of the rules governing ning to identify simple associations between genetic
contextual genome utilization, including the genera- variants and complex disease traits.
tion of cell type–specific transcriptomes and proteomes Common single-variant and haplotypic associ-
(i.e., all gene transcripts and functional proteins) as ations are expected to be weak (e.g., odds ratio [OR]
well as the principles determining postgenome protein < 2.0) and, thus, poor predictors of disease develop-
network, environmental, and physiological interac- ment,29 but they may account for a large portion of
tions, will be required before genomic effects on the disease risk experienced by the affected popula-
high-order structure and redundant processes will be tion. Conversely, rare genetic variants significantly
fully appreciated. increasing susceptibility to disease development (e.g.,
OR > 5.0) may be widespread but individually are
likely to account for disease predisposition in only a
APPLYING GENOMICS TO THE STUDY small subset of the affected population. Moreover, the
OF COMPLEX DISEASE role of emergent genetic phenomena such as somatic
The knowledge gained from our current efforts at mutation and epigenetic modification in complex
sequencing and cataloging the variation of the human disease development is becoming more appreciated.
genome provides a sound basis for exploring its role in At present, it is unclear which mechanisms will prove
modulating traits of complex disease. Admittedly, we are to have the greater overall impact on disease in
8 SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1 2007

general, although it is becoming apparent that all are plex disease could prove useful, especially when ap-
likely to play a role in the spectrum of most complex plied to families in whom the genetic component is
disorders. likely to be enriched, such as those with an unusually
high rate of disease occurrence or exceptionally early
age of disease onset. Although these families might
Investigating Inherited Disease Susceptibility not be representative of the disease in the majority of
the population, findings could implicate major net-
LINKAGE works or pathways involved in the disease and would
Genetic linkage analysis seeks to identify the cosegre- certainly provide a basis for further investigations.
gation of polymorphic genetic markers in affected
family members to map disease-related genomic loci ASSOCIATION
by exploiting the nature of meiotic recombination. In general, association studies look for a statistical
That is, genes near each other will be inherited difference in the frequency of alleles between affected
together more often than those located farther apart. and unaffected individuals. Often this involves pop-
Accordingly, the genetic markers segregating with the ulation-based comparisons between cases and unre-
disease are assumed to be located near (linked to) the lated controls; however, family-based tests of
causative gene. The loci identified by traditional ge- association can also be quite useful.32 Association
nome-wide linkage approaches have often been large studies take advantage of the LD generated by meiotic
(5 to 10 Mb) because of historically applied low marker recombination throughout our ancestral history to
densities and to some extent the limited number of identify genomic loci contributing to disease pheno-
observable meioses in family groups,30 generally re- type, in contrast to linkage approaches, which observe
quiring an extensive fine-mapping effort to pinpoint only recent meioses in family groups. The former
the offending alleles. However, significantly better approach has traditionally been applied to the study
resolution could be achieved in many studies by utiliz- of genetic variants in candidate genes preselected for
ing higher densities of genetic markers (e.g., SNPs), a their potential involvement in specific disease proc-
proposition that is becoming more affordable as the esses such as genes or loci identified by prior whole-
cost of genotyping continues to drop. Both model- genome linkage scans or known to be involved
based (i.e., parametric) and model-free (i.e., nonpara- biochemically with disease. Lately, genome-wide asso-
metric) linkage approaches can be employed in the ciation studies utilizing hundreds of thousands to
search for disease-related alleles.31 Parametric linkage millions of SNPs spread across the genome have
analysis calls for the researcher to specify a model of become reality.33,34
inheritance and estimate the frequency and penetrance Overall, association studies are capable of identi-
of the disease genes, and it is a powerful approach fying substantial genetic effects (i.e., OR > 2.0) on dis-
when applied to extended pedigrees with many affected ease phenotype with relatively small sample sizes (n
individuals. Nonparametric linkage analysis is less !200)35 and have high power to detect small effects of
powerful but does not require the specification of a genetic variation (i.e., OR < 2.0) but require the sample
genetic model and instead looks simply for excessive sizes to be quite large (n ! 1000). SNPs are the genotyp-
sharing of alleles identical by descent among affected ing markers most often employed in association studies
family members. as they are quite abundant and easy to type. Depending
The tenets underlying linkage analysis make on the individual marker being tested, detected associ-
this a powerful approach for the identification of ations could directly affect the phenotype (i.e., suscept-
genes involved with Mendelian diseases but limit its ibility variant) or may be in LD with the true phenotypic
application to complex disorders.31 Clinical, locus, effector.36 In general, follow-up functional studies aimed
and allelic heterogeneities, all of which are common at explaining the disease mechanism underlying any
features of complex diseases, effectively dilute the detected associations would be beneficial. However,
linkage signal, significantly reducing the chance of such studies are often not performed because of lack of
identifying plausible candidate regions. Parametric a suitable model system and/or adequate specimens from
approaches are impractical, as models of inheritance which to derive RNA or protein. Moreover, when they
and estimations of the frequency and penetrance of are performed, the results can be unsatisfying and
the disease genes are not readily assumable for com- difficult to interpret, as the contextual milieu in which
plex disorders. Moreover, the late age of onset of function is affected and contributes to disease is likely to
most complex diseases often precludes the assessment be altered or lost.
of multiple generations, significantly reducing the Population-based association studies are suscep-
power of both parametric and nonparametric ap- tible to false positive findings related to population
proaches. However, the use of linkage analysis to stratification, a phenomenon stemming from unequal
identify genes involved in the development of com- genetic backgrounds (i.e., allele frequencies) between the
APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE/JURAN, LAZARIDIS 9

case and control populations. In fact, loci completely confirm the role of leptin (the cause of extreme obesity in
unlinked to the tested disease will exhibit association the obese ob/ob mouse model) genetic polymorphism in
when the allele frequencies in the populations are con- human obesity43 and also used in the identification of a
siderably different.37 Moreover, less extreme manifesta- chemokine receptor (CCR5-delta 32) mutation that is
tions of selection bias can work to cloud the results of protective against human immunodeficiency virus (HIV)
association studies. For example, control populations infection and progression.44 As genotyping costs con-
derived from blood-bank donors may not be representa- tinue to decline, gene resequencing will become a more
tive of the case population as blood donors tend to be attractive approach to the identification of the genetic
rigorously screened and possibly in better general health determinants of complex diseases and their phenotypes
than the cases, especially when collected for elderly as it allows the simultaneous assessment of both com-
populations.38 Thus, positive associations may reflect mon and rare variants. Toward this end, the NHGRI is
genetic variations involved with overall health and not aggressively funding the advancement of revolutionary
specifically the disease of interest. Another source of genome sequencing technologies, with the goal of re-
false positive findings in association studies is multiple sequencing the entire human genome for $1000 (some
testing. For instance, when P values of .05 are considered four orders of magnitude less than currently feasible).
significant, 1 out of 20 positive associations are likely to When this lofty goal becomes reality, the way in which
be false. When extrapolated to thousands of tests, we view and approach genome science will be forever
numerous false associations are likely to be detected. changed.
Several methods to correct for multiple testing have been
employed, but many of these are quite draconian and
probably drive the exclusion of true positive findings.39 Investigating Emergent Disease Susceptibility:
Replication in an independent data set is the best Epigenetics, Somatic Mutation, and Aging
method to verify the research findings, and great care Although inherited genetic variation is likely to play a
should be taken to ensure that the follow-up study is significant role in determining susceptibility to complex
adequately powered and the controls are well matched.40 diseases, it is becoming apparent that epigenetic changes
On the other hand, family-based association tests are not and mutation in somatic cells may contribute a greater
prone to stratification biases and, thus, offer an alter- role in the etiology of these disorders than previously
native to study and confirm the observations of case- thought.14,45 This has become widely apparent in cancer,
control studies.41 where some level of inherited risk is seemingly present,
Genotyping and statistical methods keep im- but the mechanisms leading to malignant transformation
proving, and LD-based association studies will con- primarily involve misregulated epigenetics and accumu-
tinue to demonstrate usefulness in the future. The lation of somatic mutation.46 The extent of these epi-
collection of large sets of cases, their family members, genetic and mutagenic phenomena is possibly somewhat
and well-matched controls remains the primary chal- determined by inherited variation, invoking a vicious
lenge to the investigator wishing to study complex cycle, but they are thought to be largely driven by
disorders. To this extent, the case for developing a large stochastic phenomena and aging.
United States prospective cohort of !200,000 individ- Epigenetics involves stable changes in gene ex-
uals to study the role of genes and environment in pression that do not entail alteration of DNA sequence
disease development has been raised42 and is currently and are decoupled from labile, reactionary transcrip-
debated between scientists, policy makers, and the tional control processes.14 This form of transcriptional
federal government. regulation is known to involve the methylation of
cytosine bases at cytosine-guanine dinucleotides but
may involve some of the classes of noncoding RNAs as
Gene Resequencing well.14 Epigenetic events play a major role in mamma-
In general, association-based approaches to the identi- lian development and in the maintenance of tissue-
fication of alleles involved with complex disease are not specific cellular function, and aberrant methylation is
well suited to identify rare, recently arising variants that becoming increasingly evident in the expression of
may be involved with disease risk or protection, as these disease phenotypes.14,46 However, the methylation
variants are not likely to be directly assessed or indirectly profile of the human genome remains largely un-
detected by haplotypic association. An approach to known. To this extent, the Human Epigenome Project
identifying such rare variants is the complete resequenc- has been established, which aims to analyze DNA
ing of a particularly interesting candidate gene/gene- methylation patterns in the regulatory regions of all
region in affected patients, most likely focused on the known human genes in most of the major cell types in
protein coding exons and splice junctions, although a healthy state and their diseased counterparts.47 In-
potentially across whole genes or multigene regions formation regarding the current status of this effort
when relatively small. This approach has been used to can be found at http://www.epigenome.org.
10 SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1 2007

Somatic mutations ranging from individual point banks needed to gather comprehensive data sets on
mutations to large rearrangements, duplications, or de- genomic variation and environmental exposures. Third,
letions are increasingly noted as playing a role in the the definition and classification of several complex dis-
development of complex diseases, primarily in cancers eases and associated traits, whether clinical, biochemical,
for which diseased tissue (i.e., tumor) is often observable. or other, have to be expanded and improved to minimize
DNA rearrangements or deletion resulting in the loss of disease heterogeneity.
heterozygosity (LOH) at one allele can unmask the Pursuing clinical genomic research requires di-
deleterious effect of a recessive-acting mutation in a verse expertise of investigators (i.e., clinicians, labora-
tumor suppressor gene and lead to the development of tory-based researchers, genetic epidemiologists,
malignancy.48 LOH is identified when heterozygosity statistical geneticists, bioinformatics specialists). This
for a genetic marker is noted in the germline DNA but need comes in antithesis to the conventional model of
not in the DNA of the tumor. DNA rearrangements can research programs where an individual principal inves-
also generate fusion genes, sometimes resulting in a tigator is responsible for the direction of the study. The
gain-of-function abnormal hybrid protein. Previously cost of current genomic-based technology methods is
identified fusion genes have often been classified as falling but still quite high, limiting the application of
oncogenes because of their involvement with malignancy these exciting approaches to large, well-funded research
and poor outcome; one example is BCR-ABL, which is programs. Improvements of high-throughput solutions
strongly associated with the development of chronic are necessary to reduce the price tag of these technologies
myeloid leukemia.49 Moreover, segmental gene duplica- and make them affordable to more researchers and
tion, occurring by a process involving low-copy repeat applicable across the spectrum of complex diseases.
sequences, can affect gene copy number and, therefore, Finally, but most important, the protection of human
dosage of gene product, which can ultimately have an participants, whether patients, unaffected family mem-
effect on phenotype. Such copy number changes have bers, or unrelated healthy controls, has to be ensured.
been associated with susceptibility to HIV infection.50 These individuals are the key component of genomic
At present, the extent to which somatic mutation may research and their legal rights need to be protected if we
play a role in complex disease phenotype is unclear. wish to continue on with genomic science and to
Moreover, this putative effect is hard to assess. eventually apply genomic-based medicine for the good
Finally, some portion of most common complex of humankind.
disease cases may be attributed solely to stochastic
processes related to aging, primarily the result of accu-
mulated somatic mutation and cellular damage due to CONCLUSIONS
long-term exposure to harmful endogenous and environ- Genomics offers the potential to change the way we
mental agents.51,52 diagnose and treat complex disease. The foremost goals
of medical research are to advance the prognostication of
disease and to develop safe, effective novel drugs. A
CHALLENGES clearer understanding of the roles the genome as a whole
Presently, we stand in the midst of scientific promise and and environmental interaction play in complex disease
challenges of applying genomics to the study of complex pathogenesis will provide the keystone for future medical
disease. Ideally, these obstacles can turn into opportu- breakthroughs.
nities leading to discovery of better methods for disease
prognosis and novel therapies. Because of the Human
Genome Project and subsequent initiatives (e.g., Hap- ACKNOWLEDGMENTS
Map, ENCODE), we are now able to read the instruc- Supported by NIH grant DK68290, the Palumbo Foun-
tion of our genome and we possess the first tools to begin dation, and the Morgan Foundation. The authors thank
elucidating its functionality. However, we still lack Stacy Roberson for secretarial assistance.
methods to systematically examine the interactions
among the numerous genes of the genome as well as to
comprehend how variation can leave some of us more ABBREVIATIONS
vulnerable to developing disease than others. DZ dizygotic
The barriers we have to overcome in genomic HIV human immunodeficiency virus
research are considerable. First, it is important to realize LD linkage disequilibrium
the intricate nature of human biological systems. Clearer LOH loss of heterozygosity
understanding of the redundancy of networked arrange- MAF minor allele frequency
ments is needed in the context of the genome itself and the MZ monozygotic
high-order processes it encodes. Second, we lack the large NHGRI National Human Genome Research
patient, family member, and matched control specimen Institute
APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE/JURAN, LAZARIDIS 11

OR odds ratio 23. Stumpf MP. Haplotype diversity and the block structure of
SNP single nucleotide polymorphism linkage disequilibrium. Trends Genet 2002;18:226–228
24. Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of
haplotype blocks in the human genome. Science 2002;296:
2225–2229
REFERENCES
25. Cardon LR, Abecasis GR. Using haplotype blocks to map
1. Chakravarti A, Little P. Nature, nurture and human disease. human complex trait loci. Trends Genet 2003;19:135–
Nature 2003;421:412–414 140
2. Koonin EV. Orthologs, paralogs, and evolutionary genomics. 26. Johnson GC, Esposito L, Barratt BJ, et al. Haplotype tagging
Annu Rev Genet 2005;39:309–338 for the identification of common disease genes. Nat Genet
3. Makalowski W. The human genome structure and organ- 2001;29:233–237
ization. Acta Biochim Pol 2001;48:587–598 27. Zeggini E, Rayner W, Morris AP, et al. An evaluation of
4. Kitano H. Biological robustness. Nat Rev Genet 2004;5:826– HapMap sample size and tagging SNP performance in large-
837 scale empirical and simulated data sets. Nat Genet 2005;37:
5. Dover G. How genomic and developmental dynamics affect 1320–1322
evolutionary processes. Bioessays 2000;22:1153–1159 28. Carlson CS, Eberle MA, Kruglyak L, Nickerson DA.
6. Bortoluzzi S, Romualdi C, Bisognin A, Danieli GA. Disease Mapping complex disease loci in whole-genome association
genes and intracellular protein networks. Physiol Genomics studies. Nature 2004;429:446–452
2003;15:223–227 29. Holtzman NA. Putting the search for genes in perspective.
7. Gu Z, Steinmetz LM, Gu X, et al. Role of duplicate genes in Int J Health Serv 2001;31:445–461
genetic robustness against null mutations. Nature 2003; 30. Botstein D, Risch N. Discovering genotypes underlying
421:63–66 human phenotypes: past successes for mendelian disease,
8. Flatt T. The evolutionary genetics of canalization. Q Rev future approaches for complex disease. Nat Genet 2003;
Biol 2005;80:287–316 33(suppl):228–237
9. Nowak MA, Boerlijst MC, Cooke J, Smith JM. Evolution of 31. Mayeux R. Mapping the new frontier: complex genetic
genetic redundancy. Nature 1997;388:167–171 disorders. J Clin Invest 2005;115:1404–1407
10. Collins FS. The human genome project and the future of 32. Majumder PP, Ghosh S. Mapping quantitative trait loci in
medicine. Ann NY Acad Sci 1999;882:42–55; discussion 56– humans: achievements and limitations. J Clin Invest 2005;
65 115:1419–1424
11. Lazaridis KN, Juran BD. American Gastroenterological 33. Maraganore DM, de Andrade M, Lesnick TG, et al. High-
Association future trends committee report: the application resolution whole-genome association study of Parkinson
of genomic and proteomic technologies to digestive disease disease. Am J Hum Genet 2005;77:685–693
diagnosis and treatment and their likely impact on gastro- 34. Puppala S, Dodd GD, Fowler S, et al. A genomewide search
enterology clinical practice. Gastroenterology 2005;129: finds major susceptibility loci for gallbladder disease on
1720–1752 chromosome 1 in Mexican Americans. Am J Hum Genet
12. Strohman R. Maneuvering in the complex path from 2006;78:377–392
genotype to phenotype. Science 2002;296:701–703 35. Whitcomb DC, Aoun E, Vodovotz Y, et al. Evaluating
13. MacGregor AJ, Snieder H, Schork NJ, Spector TD. Twins: disorders with a complex genetics basis: the future roles of
novel uses to study complex traits and genetic diseases. meta-analysis and systems biology. Dig Dis Sci 2005;50:
Trends Genet 2000;16:131–134 2195–2202
14. Jiang YH, Bressler J, Beaudet AL. Epigenetics and human 36. Palmer LJ, Cardon LR. Shaking the tree: mapping complex
disease. Annu Rev Genomics Hum Genet 2004;5:479–510 disease genes with linkage disequilibrium. Lancet 2005;366:
15. Lander ES, Linton LM, Birren B, et al. Initial sequencing and 1223–1234
analysis of the human genome. Nature 2001;409:860–921 37. Gordon D, Finch SJ. Factors affecting statistical power in the
16. Kruglyak L, Nickerson DA. Variation is the spice of life. detection of genetic association. J Clin Invest 2005;115:
Nat Genet 2001;27:234–236 1408–1418
17. Crawford DC, Akey DT, Nickerson DA. The patterns of 38. Vineis P, McMichael AJ. Bias and confounding in molecular
natural variation in human genes. Annu Rev Genomics Hum epidemiological studies: special considerations. Carcinogen-
Genet 2005;6:287–312 esis 1998;19:2063–2067
18. Tabor HK, Risch NJ, Myers RM. Opinion: candidate-gene 39. Shephard N, John S, Cardon L, et al. Will the real disease gene
approaches for studying complex genetic traits—practical please stand up? BMC Genet 2005;6(suppl 1):S66
considerations. Nat Rev Genet 2002;3:391–397 40. Colhoun HM, McKeigue PM, Davey Smith G. Problems
19. Glatt CE, DeYoung JA, Delgado S, et al. Screening a large of reporting genetic associations with complex outcomes.
reference sample to identify very low frequency sequence Lancet 2003;361:865–872
variants: comparisons between two genes. Nat Genet 2001; 41. Cardon LR, Bell JI. Association study designs for complex
27:435–438 diseases. Nat Rev Genet 2001;2:91–99
20. Waterston RH, Lindblad-Toh K, Birney E, et al. Initial 42. Collins FS. The case for a US prospective cohort study of
sequencing and comparative analysis of the mouse genome. genes and environment. Nature 2004;429:475–477
Nature 2002;420:520–562 43. Montague CT, Farooqi IS, Whitehead JP, et al. Congenital
21. Moens PB. The double-stranded DNA helix in recombina- leptin deficiency is associated with severe early-onset obesity
tion at meiosis. Genome 2003;46:936–937 in humans. Nature 1997;387:903–908
22. International HapMap Consortium. A haplotype map of the 44. Dean M, Carrington M, Winkler C, et al. Genetic restriction
human genome. Nature 2005;437:1299–1320 of HIV-1 infection and progression to AIDS by a deletion
12 SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1 2007

allele of the CKR5 structural gene. Hemophilia Growth and 48. Knudson AG. Antioncogenes and human cancer. Proc Natl
Development Study, Multicenter AIDS Cohort Study, Acad Sci USA 1993;90:10914–10921
Multicenter Hemophilia Cohort Study, San Francisco City 49. Randolph TR. Chronic myelocytic leukemia. Part II:
Cohort, ALIVE Study. Science 1996;273:1856–1862 approaches to and molecular monitoring of therapy. Clin
45. Dean M. Approaches to identify genes for complex human Lab Sci 2005;18:49–56
diseases: lessons from Mendelian disorders. Hum Mutat 50. Gonzalez E, Kulkarni H, Bolivar H, et al. The influence
2003;22:261–274 of CCL3L1 gene-containing segmental duplications on
46. Laird PW. The power and the promise of DNA methylation HIV-1/AIDS susceptibility. Science 2005;307:1434–1440
markers. Nat Rev Cancer 2003;3:253–266 51. Kirkwood TB. Understanding the odd science of aging. Cell
47. Rakyan VK, Hildmann T, Novik KL, et al. DNA 2005;120:437–447
methylation profiling of the human major histocompatibility 52. Suh Y, Vijg J. Maintaining genetic integrity in aging:
complex: a pilot study for the human epigenome project. a zero sum game. Antioxid Redox Signal 2006;8:559–
PLoS Biol 2004;2:e405 571