Applying Genomics to the Study of Complex Disease

Brian D. Juran, B.S.,1 and Konstantinos N. Lazaridis, M.D.1

ABSTRACT

The interest in dissecting the genetic and environmental components of complex human disease is growing, fueled by the emerging advances in the field of genomics and related disciplines. Improved understanding of the pathogenesis of complex liver diseases such as gallbladder stones, nonalcoholic fatty liver disease, viral hepatitis, and hepatocellular carcinoma remains a goal of the clinical and experimental hepatologist alike. Despite the scientific progress and technological advancement, elucidating the underlying mechanisms of complex hepatic diseases from the genomic standpoint will be demanding. Complexity of genomic structure and function, disease heterogeneity, influence of the environment on disease development and progression, and epigenetics all contribute to the challenge. To overcome these obstacles, novel conceptual frameworks regarding biological systems and human diseases are necessary in addition to a coordinated endeavor among different scientific disciplines. Deciphering in an integrated fashion the genomic, transcriptional, and translational aspects of the pathogenesis of complex liver diseases will lead to their better prediction, diagnostics, and treatment.
KEYWORDS: Systems biology, disease susceptibility, genetics

omplex diseases are heterogeneous, the cumulative result of a wide array of gene variants (both common and rare), somatic mutations, epigenetic modifications, and environmental exposures, the combinations of which are apt to be significantly varied among the spectrum of affected individuals.1 Thus, inherited genetic variation is not directly the cause of complex disease but instead acts to mediate the risk of disease development in response to environmental exposures. The clinical and genetic heterogeneity inherent in these disorders greatly complicates our ability to dissect the underpinnings of their etiology and pathogenesis. In the following pages we provide a brief overview of the concepts involved with disease complexity and the field of genomics. We then discuss the current strategies and future challenges of applying genomics-based stud1 Division of Gastroenterology and Hepatology, Center for Basic Research in Digestive Diseases, Mayo Clinic College of Medicine, Rochester, Minnesota. Address for correspondence and reprint requests: Konstantinos N. Lazaridis, M.D., Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905.

C

ies toward achieving a better understanding of complex disease.

DISEASE COMPLEXITY Systems Biology: Robustness, Modularity, and Redundancy We humans are in essence complex biological machines, shaped over millennia by evolutionary forces and defined by our genome. All of the information necessary for life is encoded in our DNA. However, its utilization is dependent on the cellular context in which it resides, allowing the development, organization, and sustainment of the diverse set of cells that comprise us. The ability of the genome to generate and coordinate this
Genetics and Genomics of Complex Diseases in Hepatology; Guest Editor, Konstantinos N. Lazaridis, M.D. Semin Liver Dis 2007;27:3–12. Copyright # 2007 by Thieme Medical Publishers, Inc., 333 Seventh Avenue, New York, NY 10001, USA. Tel: +1(212) 584-4662. DOI 10.1055/s-2006-960167. ISSN 0272-8087.

3

4

SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1

2007

tremendous level of diversity has evolved over the billions of years that DNA-based life has existed.2,3 Robustness is the driving force behind this process and is a fundamental feature of complex biological systems.4 Simply put, robust properties provide phenotypic stability in the presence of unpredictable environmental or genetic challenges. Complex systems become robust through modularity and redundancy, both of which act to mitigate the potential for system-wide damage.5 Modular structure is pervasive in life and often exists in a hierarchy (i.e., organs, tissues, cells, and organelles). In addition to physical structure, functional and regulatory mechanisms such as metabolism, cell cycle, and signal transduction are widely modularized, taking the form of numerous networks and pathways operating at the postgenome level.5,6 Interconnection of these networks and widespread gene duplication derive the means for use of alternative genes or pathways generating redundancy,7 a compensatory process to achieve the desired phenotype when failure in

another gene or module occurs (Fig. 1). Thus, redundancy can and often does generate disconnect between genotype and phenotype, a process that is dependent on higher order pathway and network interactions and the context in which the genome is utilized.8,9 These fundamental features of complex biological systems significantly affect the genetic mechanisms underlying the etiology of complex human disease and pose limits on our current ability to study them.

Disease Genetics Genetic predisposition is thought to play a role in most human diseases.10 Currently, three classifications based on genetic involvement with disease are recognized: chromosomal, Mendelian, and complex.11 Chromosomal disorders are characterized by gross abnormalities in chromosome number or structure and often result in preterm death related to developmental abnormalities.

Figure 1 Interconnected biological pathways generate redundancy. The interrelation of numerous pathways generates a redundant network, buffering the effect of input variability and genetic polymorphism on phenotype. Shown is a simplified hypothetical signaling network through which phenotypic effects are stimulated by a primary input that is detected by members of two distinct modular pathways (delineated by boxes). In this example, these pathways communicate through a primary node (dark gray square) that acts to mediate a large portion of the signal in the network and provides a feedback loop to diminish the effect of input variation on the phenotype (illustrated by dotted lines). However, some of the signal from each pathway, as well as a secondary input (dark gray circle), is able to bypass this node and directly stimulate an effect on phenotype. Genetic polymorphism in individual components of the network (i.e., genes) is unlikely to have a great effect on phenotype because of the many means through which the input stimulus can be passed. However, a slight phenotypic effect could be demonstrated by these putative variants, potentially contributing to risk of disease. Furthermore, genetic variants of the primary node and secondary nodes (light gray circles) are more likely to display a detectable effect.

APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE/JURAN, LAZARIDIS

5

Mendelian diseases run in families and display classic patterns of inheritance, such as autosomal dominant, autosomal recessive, or X-linked. In general, these disorders are rare, arise early in life, and can be attributed to mutation in a single gene that, when present, directly causes the disease phenotype. Often these causative mutations are family specific. The vast majority of human diseases are genetically complex, wherein the direct correspondence between causative genotype and disease phenotype characteristic of Mendelian disorders is not present.12 Instead, complex diseases develop as the cumulative result of environmental exposures, exerting their effect over time, in genetically susceptible individuals. Therefore, the genotypic components of complex diseases are not causative but rather mediate disease risk. Many such susceptibility genotypes are expected for each complex disorder, some of which are common to similar diseases (e.g., autoimmune disorders) and some of which may be disease specific. Regardless, the individual contributing variants are likely to have only a slight contribution to the overall risk of each specific disease in the affected population (Fig. 2). Complex diseases are diverse in clinical presentation, progression, and response to treatment. Because of this heterogeneity, it is useful to break down complex disorders to a series of disease traits or phenotypes for consideration in genomic studies. These traits can be qualitative, such as the presence of a comorbid disease, an associated diagnostic marker, or a previously determined risk factor. In addition, these traits could be quantitative measures such as age of disease onset, results of serum liver tests, or gene expression profiles. Comprehensive characterization, assessment, and utilization of these traits will be essential in the dissection of the genetic and environmental contributors to complex disease.

Disease concordance in monozygotic (MZ) twins is presently the best means for establishing the strength of the inherited genetic determinates of complex disease.13 As MZ twins have identical DNA sequences, disease concordance is suggestive of genetic influence, whereas discordance indicates a greater role for environmental or stochastic effects. Furthermore, the difference in disease concordance between MZ and dizygotic (DZ) twin pairs may provide additional insight into the mechanisms at play in complex disease development. For example, large differences in disease concordance between MZ and DZ twin pairs could signify the involvement of numerous risk-modifying gene variants and, conversely, slight differences in concordance might imply a stronger shared-environment effect. However, it should be noted that de novo genetic and/or epigenetic effects prior to or after MZ twinning could have an effect on MZ disease concordance and MZ/DZ concordance ratios, obscuring the reality of the inherited genetic contribution to disease.14 Familial aggregation also provides a way to estimate the genetic impact on complex diseases,11 as family members share more genetic material between themselves than with the general population. Relative risk ratios (l) are often used to illustrate the risk of disease development in the families of affected individuals. The l is calculated by dividing the prevalence of a complex disease among family members (often specifically siblings, ls) by the prevalence of the disease in the population at large.11 In general, higher l values suggest a greater role of the genetic component in disease.

GENOMICS From the earliest stages of life the human genome is selectively activated in response to both internal and

Figure 2 Comparison between Mendelian and complex diseases. In general, Mendelian diseases have lower frequency in the population, display higher prevalence, and are caused by a single or a few genes, each of which has a high effect on the disease phenotype. In contrast, complex diseases are usually more common, have reduced prevalence, and are mediated by several to numerous genes, each of which has a small contribution to the phenotype.

6

SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1

2007

external cues to guide the development and continued function of numerous cell types organized into higher order structures such as our tissues and organ systems. Genomics is the field of study that seeks to understand and characterize the genome’s role in this process. The achievement of sequencing the human genome is indeed momentous and has provided new impetus to the study of its involvement in disease pathogenesis. However, the intricacies of genomic function cannot be understood by solely focusing on our species. To this extent, the genomes of a wide variety of organisms ranging from bacteria to mammals have been elucidated, and the effort continues. Complete lists of these genome sequencing projects are freely available through the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/ entrez/query.fcgi?db=genomeprj).

(i.e., intergenic SNPs) are thought to be far less likely to affect function. Nevertheless, it has been suggested that a large portion of this intergenic sequence is under positive selection and, thus, potentially functional.20 To facilitate a better understanding of genome operation, the National Human Genome Research Institute (NHGRI) has launched the ENCODE (ENCyclopedia of DNA Elements) program, which seeks to perform an exhaustive determination of all functional elements in the human genome. Information regarding ENCODE can be found on the web at http:// www.genome.gov/10005107.

Sequence Variation The sequence of the human genome is varied, as illustrated by the nearly 12 million polymorphisms currently cataloged on the NCBI’s dbSNP website (http://www.ncbi.nlm.nih.gov/SNP/). Single nucleotide polymorphisms (SNPs) are the most prevalent of these variants, accounting for over 90% of human polymorphic loci.15 Distribution of SNPs across the genome is rather uniform and primarily dependent on the frequency of the minor allele (MAF). For instance, a SNP with a MAF of 1% is expected to be present in every 300 bases of genomic DNA, whereas a SNP with MAF of 40% is expected in only every 3300 bases.16 Resequencing portions of the genome in 137 individuals has confirmed that SNPs are quite common, occurring approximately every 180 base pairs.17 In addition, a majority of these SNPs (64%) are rare, with MAF < 5%.17 The location of a SNP, to some extent, suggests its potential for functional significance and impact on disease risk.18 SNPs in gene coding sequence resulting in premature termination (i.e., nonsense SNPs) or amino acid substitution (i.e., nonsynonymous SNPs) alter protein sequence and possibly function and, therefore, are more likely to demonstrate a phenotypic effect. Interestingly, resequencing of two genes in a large reference sample of 450 individuals identified a large number of rare nonsynonymous SNPs that were not detected in a smaller screening set,19 suggesting that many SNPs with a high likelihood to affect phenotype (and by correlate disease) are present in the population, albeit at low prevalence. SNPs located in splice site recognition, transcription factor binding, or enhancer sequences also have the potential to alter phenotype by affecting gene expression, splicing, or stability. However, we are not yet able to predict these functional consequences a priori. SNPs located between genes

Genomic Diversity Diversity of the genome is primarily driven by random mating and meiotic recombination, the phenomenon during which regions between pairs of equivalent chromosomes are exchanged in the course of gametogenesis, generating discreet genetic differences between parents and offspring.21 As we have transmitted the genetic material through relatively few generations since our ancestral origins, contemporary chromosomes have regions of variation in common, or display linkage disequilibrium (LD) (Fig. 3). Thus, the relatively young age of our species limits the diversity of the human genome. The pattern of LD across the genome is inconsistent because of the presence of recombination hot spots, regions of the genome in which recombination occurs more readily.22,23 The result is that regions of low LD flank regions of high LD, creating haplotype blocks.23,24 Limited allelic diversity within these blocks can potentially simplify genomic studies aimed at deciphering the mechanisms of complex disease.25,26 To facilitate such efforts, the International Human Haplotype Map (HapMap) project (http://www.hapmap.org/) was initiated to determine the structure of human haplotype blocks and identify SNPs that are predictive for the variation in a larger set (i.e., tag SNPs). To date, the HapMap project has assessed some 3 million SNPs in 269 individuals from four racial/ethnic groups. The HapMap effort has confirmed that significant redundancy exists among common SNPs such that local genomic variation can be reliably determined using a subset of tagging SNPs.22,27

The Horizon Genome sequencing projects, variant cataloging, and haplotype block mapping efforts all provide a solid basis for the study of genomics but together amount to only the ‘‘tip of the iceberg’’ in the level of understanding that will be required if genomic studies are to affect globally the way in which we approach the diagnosis, prognosis, and treatment of complex disease.

APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE/JURAN, LAZARIDIS

7

Figure 3 Linkage disequilibrium (LD) limits the diversity of the human genome. LD is the nonrandom association of alleles at two or more loci. In humans, this is largely driven by the limited number of generations, and thus recombination events, through which the genome has passed since our common ancestors, generating allele combinations that are widely shared among contemporary humans (i.e., common haplotypes). Patterns of LD are influenced by population structure and dynamics, such as founder effects and population admixture, and as a result display differences between racial and ethnic groups.

Increasing our knowledge regarding sequence functionality, currently the aim of the ENCODE program, will take us one step further in this endeavor by providing a basis for the prediction of variant-induced consequences outside the nonsense and nonsynonymous polymorphisms for which this approach is currently feasible. Elucidation of the rules governing contextual genome utilization, including the generation of cell type–specific transcriptomes and proteomes (i.e., all gene transcripts and functional proteins) as well as the principles determining postgenome protein network, environmental, and physiological interactions, will be required before genomic effects on high-order structure and redundant processes will be fully appreciated.

APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE The knowledge gained from our current efforts at sequencing and cataloging the variation of the human genome provides a sound basis for exploring its role in modulating traits of complex disease. Admittedly, we are

ill equipped to study the effects of inherited genetic variation across numerous physically separated loci and determine how these combinations of alleles interact with the environment and lead to the development of complex disease.28 However, utilizing the genomic approaches described in the following pages, we are beginning to identify simple associations between genetic variants and complex disease traits. Common single-variant and haplotypic associations are expected to be weak (e.g., odds ratio [OR] < 2.0) and, thus, poor predictors of disease development,29 but they may account for a large portion of the disease risk experienced by the affected population. Conversely, rare genetic variants significantly increasing susceptibility to disease development (e.g., OR > 5.0) may be widespread but individually are likely to account for disease predisposition in only a small subset of the affected population. Moreover, the role of emergent genetic phenomena such as somatic mutation and epigenetic modification in complex disease development is becoming more appreciated. At present, it is unclear which mechanisms will prove to have the greater overall impact on disease in

8

SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1

2007

general, although it is becoming apparent that all are likely to play a role in the spectrum of most complex disorders.

Investigating Inherited Disease Susceptibility
LINKAGE

Genetic linkage analysis seeks to identify the cosegregation of polymorphic genetic markers in affected family members to map disease-related genomic loci by exploiting the nature of meiotic recombination. That is, genes near each other will be inherited together more often than those located farther apart. Accordingly, the genetic markers segregating with the disease are assumed to be located near (linked to) the causative gene. The loci identified by traditional genome-wide linkage approaches have often been large (5 to 10 Mb) because of historically applied low marker densities and to some extent the limited number of observable meioses in family groups,30 generally requiring an extensive fine-mapping effort to pinpoint the offending alleles. However, significantly better resolution could be achieved in many studies by utilizing higher densities of genetic markers (e.g., SNPs), a proposition that is becoming more affordable as the cost of genotyping continues to drop. Both modelbased (i.e., parametric) and model-free (i.e., nonparametric) linkage approaches can be employed in the search for disease-related alleles.31 Parametric linkage analysis calls for the researcher to specify a model of inheritance and estimate the frequency and penetrance of the disease genes, and it is a powerful approach when applied to extended pedigrees with many affected individuals. Nonparametric linkage analysis is less powerful but does not require the specification of a genetic model and instead looks simply for excessive sharing of alleles identical by descent among affected family members. The tenets underlying linkage analysis make this a powerful approach for the identification of genes involved with Mendelian diseases but limit its application to complex disorders.31 Clinical, locus, and allelic heterogeneities, all of which are common features of complex diseases, effectively dilute the linkage signal, significantly reducing the chance of identifying plausible candidate regions. Parametric approaches are impractical, as models of inheritance and estimations of the frequency and penetrance of the disease genes are not readily assumable for complex disorders. Moreover, the late age of onset of most complex diseases often precludes the assessment of multiple generations, significantly reducing the power of both parametric and nonparametric approaches. However, the use of linkage analysis to identify genes involved in the development of com-

plex disease could prove useful, especially when applied to families in whom the genetic component is likely to be enriched, such as those with an unusually high rate of disease occurrence or exceptionally early age of disease onset. Although these families might not be representative of the disease in the majority of the population, findings could implicate major networks or pathways involved in the disease and would certainly provide a basis for further investigations.
ASSOCIATION

In general, association studies look for a statistical difference in the frequency of alleles between affected and unaffected individuals. Often this involves population-based comparisons between cases and unrelated controls; however, family-based tests of association can also be quite useful.32 Association studies take advantage of the LD generated by meiotic recombination throughout our ancestral history to identify genomic loci contributing to disease phenotype, in contrast to linkage approaches, which observe only recent meioses in family groups. The former approach has traditionally been applied to the study of genetic variants in candidate genes preselected for their potential involvement in specific disease processes such as genes or loci identified by prior wholegenome linkage scans or known to be involved biochemically with disease. Lately, genome-wide association studies utilizing hundreds of thousands to millions of SNPs spread across the genome have become reality.33,34 Overall, association studies are capable of identifying substantial genetic effects (i.e., OR > 2.0) on disease phenotype with relatively small sample sizes (n 200)35 and have high power to detect small effects of genetic variation (i.e., OR < 2.0) but require the sample sizes to be quite large (n  1000). SNPs are the genotyping markers most often employed in association studies as they are quite abundant and easy to type. Depending on the individual marker being tested, detected associations could directly affect the phenotype (i.e., susceptibility variant) or may be in LD with the true phenotypic effector.36 In general, follow-up functional studies aimed at explaining the disease mechanism underlying any detected associations would be beneficial. However, such studies are often not performed because of lack of a suitable model system and/or adequate specimens from which to derive RNA or protein. Moreover, when they are performed, the results can be unsatisfying and difficult to interpret, as the contextual milieu in which function is affected and contributes to disease is likely to be altered or lost. Population-based association studies are susceptible to false positive findings related to population stratification, a phenomenon stemming from unequal genetic backgrounds (i.e., allele frequencies) between the

APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE/JURAN, LAZARIDIS

9

case and control populations. In fact, loci completely unlinked to the tested disease will exhibit association when the allele frequencies in the populations are considerably different.37 Moreover, less extreme manifestations of selection bias can work to cloud the results of association studies. For example, control populations derived from blood-bank donors may not be representative of the case population as blood donors tend to be rigorously screened and possibly in better general health than the cases, especially when collected for elderly populations.38 Thus, positive associations may reflect genetic variations involved with overall health and not specifically the disease of interest. Another source of false positive findings in association studies is multiple testing. For instance, when P values of .05 are considered significant, 1 out of 20 positive associations are likely to be false. When extrapolated to thousands of tests, numerous false associations are likely to be detected. Several methods to correct for multiple testing have been employed, but many of these are quite draconian and probably drive the exclusion of true positive findings.39 Replication in an independent data set is the best method to verify the research findings, and great care should be taken to ensure that the follow-up study is adequately powered and the controls are well matched.40 On the other hand, family-based association tests are not prone to stratification biases and, thus, offer an alternative to study and confirm the observations of casecontrol studies.41 Genotyping and statistical methods keep improving, and LD-based association studies will continue to demonstrate usefulness in the future. The collection of large sets of cases, their family members, and well-matched controls remains the primary challenge to the investigator wishing to study complex disorders. To this extent, the case for developing a large United States prospective cohort of 200,000 individuals to study the role of genes and environment in disease development has been raised42 and is currently debated between scientists, policy makers, and the federal government.

confirm the role of leptin (the cause of extreme obesity in the obese ob/ob mouse model) genetic polymorphism in human obesity43 and also used in the identification of a chemokine receptor (CCR5-delta 32) mutation that is protective against human immunodeficiency virus (HIV) infection and progression.44 As genotyping costs continue to decline, gene resequencing will become a more attractive approach to the identification of the genetic determinants of complex diseases and their phenotypes as it allows the simultaneous assessment of both common and rare variants. Toward this end, the NHGRI is aggressively funding the advancement of revolutionary genome sequencing technologies, with the goal of resequencing the entire human genome for $1000 (some four orders of magnitude less than currently feasible). When this lofty goal becomes reality, the way in which we view and approach genome science will be forever changed.

Gene Resequencing In general, association-based approaches to the identification of alleles involved with complex disease are not well suited to identify rare, recently arising variants that may be involved with disease risk or protection, as these variants are not likely to be directly assessed or indirectly detected by haplotypic association. An approach to identifying such rare variants is the complete resequencing of a particularly interesting candidate gene/generegion in affected patients, most likely focused on the protein coding exons and splice junctions, although potentially across whole genes or multigene regions when relatively small. This approach has been used to

Investigating Emergent Disease Susceptibility: Epigenetics, Somatic Mutation, and Aging Although inherited genetic variation is likely to play a significant role in determining susceptibility to complex diseases, it is becoming apparent that epigenetic changes and mutation in somatic cells may contribute a greater role in the etiology of these disorders than previously thought.14,45 This has become widely apparent in cancer, where some level of inherited risk is seemingly present, but the mechanisms leading to malignant transformation primarily involve misregulated epigenetics and accumulation of somatic mutation.46 The extent of these epigenetic and mutagenic phenomena is possibly somewhat determined by inherited variation, invoking a vicious cycle, but they are thought to be largely driven by stochastic phenomena and aging. Epigenetics involves stable changes in gene expression that do not entail alteration of DNA sequence and are decoupled from labile, reactionary transcriptional control processes.14 This form of transcriptional regulation is known to involve the methylation of cytosine bases at cytosine-guanine dinucleotides but may involve some of the classes of noncoding RNAs as well.14 Epigenetic events play a major role in mammalian development and in the maintenance of tissuespecific cellular function, and aberrant methylation is becoming increasingly evident in the expression of disease phenotypes.14,46 However, the methylation profile of the human genome remains largely unknown. To this extent, the Human Epigenome Project has been established, which aims to analyze DNA methylation patterns in the regulatory regions of all known human genes in most of the major cell types in a healthy state and their diseased counterparts.47 Information regarding the current status of this effort can be found at http://www.epigenome.org.

10

SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1

2007

Somatic mutations ranging from individual point mutations to large rearrangements, duplications, or deletions are increasingly noted as playing a role in the development of complex diseases, primarily in cancers for which diseased tissue (i.e., tumor) is often observable. DNA rearrangements or deletion resulting in the loss of heterozygosity (LOH) at one allele can unmask the deleterious effect of a recessive-acting mutation in a tumor suppressor gene and lead to the development of malignancy.48 LOH is identified when heterozygosity for a genetic marker is noted in the germline DNA but not in the DNA of the tumor. DNA rearrangements can also generate fusion genes, sometimes resulting in a gain-of-function abnormal hybrid protein. Previously identified fusion genes have often been classified as oncogenes because of their involvement with malignancy and poor outcome; one example is BCR-ABL, which is strongly associated with the development of chronic myeloid leukemia.49 Moreover, segmental gene duplication, occurring by a process involving low-copy repeat sequences, can affect gene copy number and, therefore, dosage of gene product, which can ultimately have an effect on phenotype. Such copy number changes have been associated with susceptibility to HIV infection.50 At present, the extent to which somatic mutation may play a role in complex disease phenotype is unclear. Moreover, this putative effect is hard to assess. Finally, some portion of most common complex disease cases may be attributed solely to stochastic processes related to aging, primarily the result of accumulated somatic mutation and cellular damage due to long-term exposure to harmful endogenous and environmental agents.51,52

banks needed to gather comprehensive data sets on genomic variation and environmental exposures. Third, the definition and classification of several complex diseases and associated traits, whether clinical, biochemical, or other, have to be expanded and improved to minimize disease heterogeneity. Pursuing clinical genomic research requires diverse expertise of investigators (i.e., clinicians, laboratory-based researchers, genetic epidemiologists, statistical geneticists, bioinformatics specialists). This need comes in antithesis to the conventional model of research programs where an individual principal investigator is responsible for the direction of the study. The cost of current genomic-based technology methods is falling but still quite high, limiting the application of these exciting approaches to large, well-funded research programs. Improvements of high-throughput solutions are necessary to reduce the price tag of these technologies and make them affordable to more researchers and applicable across the spectrum of complex diseases. Finally, but most important, the protection of human participants, whether patients, unaffected family members, or unrelated healthy controls, has to be ensured. These individuals are the key component of genomic research and their legal rights need to be protected if we wish to continue on with genomic science and to eventually apply genomic-based medicine for the good of humankind.

CHALLENGES Presently, we stand in the midst of scientific promise and challenges of applying genomics to the study of complex disease. Ideally, these obstacles can turn into opportunities leading to discovery of better methods for disease prognosis and novel therapies. Because of the Human Genome Project and subsequent initiatives (e.g., HapMap, ENCODE), we are now able to read the instruction of our genome and we possess the first tools to begin elucidating its functionality. However, we still lack methods to systematically examine the interactions among the numerous genes of the genome as well as to comprehend how variation can leave some of us more vulnerable to developing disease than others. The barriers we have to overcome in genomic research are considerable. First, it is important to realize the intricate nature of human biological systems. Clearer understanding of the redundancy of networked arrangements is needed in the context of the genome itself and the high-order processes it encodes. Second, we lack the large patient, family member, and matched control specimen

CONCLUSIONS Genomics offers the potential to change the way we diagnose and treat complex disease. The foremost goals of medical research are to advance the prognostication of disease and to develop safe, effective novel drugs. A clearer understanding of the roles the genome as a whole and environmental interaction play in complex disease pathogenesis will provide the keystone for future medical breakthroughs.

ACKNOWLEDGMENTS

Supported by NIH grant DK68290, the Palumbo Foundation, and the Morgan Foundation. The authors thank Stacy Roberson for secretarial assistance.

ABBREVIATIONS DZ dizygotic HIV human immunodeficiency virus LD linkage disequilibrium LOH loss of heterozygosity MAF minor allele frequency MZ monozygotic NHGRI National Human Genome Research Institute

APPLYING GENOMICS TO THE STUDY OF COMPLEX DISEASE/JURAN, LAZARIDIS

11

OR SNP

odds ratio single nucleotide polymorphism

REFERENCES
1. Chakravarti A, Little P. Nature, nurture and human disease. Nature 2003;421:412–414 2. Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 2005;39:309–338 3. Makalowski W. The human genome structure and organization. Acta Biochim Pol 2001;48:587–598 4. Kitano H. Biological robustness. Nat Rev Genet 2004;5:826– 837 5. Dover G. How genomic and developmental dynamics affect evolutionary processes. Bioessays 2000;22:1153–1159 6. Bortoluzzi S, Romualdi C, Bisognin A, Danieli GA. Disease genes and intracellular protein networks. Physiol Genomics 2003;15:223–227 7. Gu Z, Steinmetz LM, Gu X, et al. Role of duplicate genes in genetic robustness against null mutations. Nature 2003; 421:63–66 8. Flatt T. The evolutionary genetics of canalization. Q Rev Biol 2005;80:287–316 9. Nowak MA, Boerlijst MC, Cooke J, Smith JM. Evolution of genetic redundancy. Nature 1997;388:167–171 10. Collins FS. The human genome project and the future of medicine. Ann NY Acad Sci 1999;882:42–55; discussion 56– 65 11. Lazaridis KN, Juran BD. American Gastroenterological Association future trends committee report: the application of genomic and proteomic technologies to digestive disease diagnosis and treatment and their likely impact on gastroenterology clinical practice. Gastroenterology 2005;129: 1720–1752 12. Strohman R. Maneuvering in the complex path from genotype to phenotype. Science 2002;296:701–703 13. MacGregor AJ, Snieder H, Schork NJ, Spector TD. Twins: novel uses to study complex traits and genetic diseases. Trends Genet 2000;16:131–134 14. Jiang YH, Bressler J, Beaudet AL. Epigenetics and human disease. Annu Rev Genomics Hum Genet 2004;5:479–510 15. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921 16. Kruglyak L, Nickerson DA. Variation is the spice of life. Nat Genet 2001;27:234–236 17. Crawford DC, Akey DT, Nickerson DA. The patterns of natural variation in human genes. Annu Rev Genomics Hum Genet 2005;6:287–312 18. Tabor HK, Risch NJ, Myers RM. Opinion: candidate-gene approaches for studying complex genetic traits—practical considerations. Nat Rev Genet 2002;3:391–397 19. Glatt CE, DeYoung JA, Delgado S, et al. Screening a large reference sample to identify very low frequency sequence variants: comparisons between two genes. Nat Genet 2001; 27:435–438 20. Waterston RH, Lindblad-Toh K, Birney E, et al. Initial sequencing and comparative analysis of the mouse genome. Nature 2002;420:520–562 21. Moens PB. The double-stranded DNA helix in recombination at meiosis. Genome 2003;46:936–937 22. International HapMap Consortium. A haplotype map of the human genome. Nature 2005;437:1299–1320

23. Stumpf MP. Haplotype diversity and the block structure of linkage disequilibrium. Trends Genet 2002;18:226–228 24. Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science 2002;296: 2225–2229 25. Cardon LR, Abecasis GR. Using haplotype blocks to map human complex trait loci. Trends Genet 2003;19:135– 140 26. Johnson GC, Esposito L, Barratt BJ, et al. Haplotype tagging for the identification of common disease genes. Nat Genet 2001;29:233–237 27. Zeggini E, Rayner W, Morris AP, et al. An evaluation of HapMap sample size and tagging SNP performance in largescale empirical and simulated data sets. Nat Genet 2005;37: 1320–1322 28. Carlson CS, Eberle MA, Kruglyak L, Nickerson DA. Mapping complex disease loci in whole-genome association studies. Nature 2004;429:446–452 29. Holtzman NA. Putting the search for genes in perspective. Int J Health Serv 2001;31:445–461 30. Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet 2003; 33(suppl):228–237 31. Mayeux R. Mapping the new frontier: complex genetic disorders. J Clin Invest 2005;115:1404–1407 32. Majumder PP, Ghosh S. Mapping quantitative trait loci in humans: achievements and limitations. J Clin Invest 2005; 115:1419–1424 33. Maraganore DM, de Andrade M, Lesnick TG, et al. Highresolution whole-genome association study of Parkinson disease. Am J Hum Genet 2005;77:685–693 34. Puppala S, Dodd GD, Fowler S, et al. A genomewide search finds major susceptibility loci for gallbladder disease on chromosome 1 in Mexican Americans. Am J Hum Genet 2006;78:377–392 35. Whitcomb DC, Aoun E, Vodovotz Y, et al. Evaluating disorders with a complex genetics basis: the future roles of meta-analysis and systems biology. Dig Dis Sci 2005;50: 2195–2202 36. Palmer LJ, Cardon LR. Shaking the tree: mapping complex disease genes with linkage disequilibrium. Lancet 2005;366: 1223–1234 37. Gordon D, Finch SJ. Factors affecting statistical power in the detection of genetic association. J Clin Invest 2005;115: 1408–1418 38. Vineis P, McMichael AJ. Bias and confounding in molecular epidemiological studies: special considerations. Carcinogenesis 1998;19:2063–2067 39. Shephard N, John S, Cardon L, et al. Will the real disease gene please stand up? BMC Genet 2005;6(suppl 1):S66 40. Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet 2003;361:865–872 41. Cardon LR, Bell JI. Association study designs for complex diseases. Nat Rev Genet 2001;2:91–99 42. Collins FS. The case for a US prospective cohort study of genes and environment. Nature 2004;429:475–477 43. Montague CT, Farooqi IS, Whitehead JP, et al. Congenital leptin deficiency is associated with severe early-onset obesity in humans. Nature 1997;387:903–908 44. Dean M, Carrington M, Winkler C, et al. Genetic restriction of HIV-1 infection and progression to AIDS by a deletion

12

SEMINARS IN LIVER DISEASE/VOLUME 27, NUMBER 1

2007

allele of the CKR5 structural gene. Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study. Science 1996;273:1856–1862 45. Dean M. Approaches to identify genes for complex human diseases: lessons from Mendelian disorders. Hum Mutat 2003;22:261–274 46. Laird PW. The power and the promise of DNA methylation markers. Nat Rev Cancer 2003;3:253–266 47. Rakyan VK, Hildmann T, Novik KL, et al. DNA methylation profiling of the human major histocompatibility complex: a pilot study for the human epigenome project. PLoS Biol 2004;2:e405

48. Knudson AG. Antioncogenes and human cancer. Proc Natl Acad Sci USA 1993;90:10914–10921 49. Randolph TR. Chronic myelocytic leukemia. Part II: approaches to and molecular monitoring of therapy. Clin Lab Sci 2005;18:49–56 50. Gonzalez E, Kulkarni H, Bolivar H, et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science 2005;307:1434–1440 51. Kirkwood TB. Understanding the odd science of aging. Cell 2005;120:437–447 52. Suh Y, Vijg J. Maintaining genetic integrity in aging: a zero sum game. Antioxid Redox Signal 2006;8:559– 571