28.12.2023

HUMAN DISEASE

KIVANÇ BİLECEN
PROF. OF MOLECULAR BIOLOGY & GENETICS

Human Disease

Life is a relationship between molecules, not a property of any one
molecule. So is therefore disease, which endangers life. While there
are molecular diseases, there are no diseased molecules. At the
level of the molecules, we find only variations in structure and
physicochemical properties.
—Emile Zuckerkandl and Linus Pauling (1962)

Human Genetic Disease: A Consequence of DNA Variation

 At the level of the individual within a species, some mutations improve fitness, most mutations have no effect on fitness,
and some are maladaptive (relative to some norm).
 Disease may be defined as maladaptive changes that afflict individuals within a population.
 Disease is also defined as an abnormal condition in which physiological function is impaired.

 There is a tremendous diversity to the nature of human diseases for several reasons:
 Mutations affect all parts of the human genome. There are
limitless opportunities for maladaptive mutations to occur,
and there are many mechanisms by which mutations can
cause disease:
 insertions, deletions, duplications, inversions, point
mutations (silent, nonsense, frame‐shift, splicing)
 Protein‐coding genes function by producing a protein as a
gene product. A disease‐causing mutation in a gene results in
the failure to produce the gene product with normal function.
 The interaction of an individual with his or her environment
has profound effects on disease phenotype. Genetically
identical twins may have entirely different phenotypes. Such
differences are attributable to environmental influences or to
epigenetic effects.

Human Genetic Disease: A Consequence of DNA Variation

A Bioinformatics Perspective on Human Disease

 Main approach >>> to describe genes and gene products that cause disease
 Main challenge >>> to connect the genotype to the phenotype
 Bioinformatics can have impact on our knowledge of diseases:
 DNA databases to compare DNA sequences (GenBank/EMBL/DDBJ/SRA; Online Mendelian Inheritance in Man
(OMIM); locus‐specific databases)
 Linkage studies, association studies >>> physical and genetic maps for the identification of mutant genes
 Mutations affecting proteins’ 3D structure – structural bioinformatics
 Functional studies to understand the effect of a mutation
 Gene function prediction – through identification of orthologs in simpler organisms

Human Genetic Disease: A Consequence of DNA Variation

Classification of Disease

 Several general categories of disease


 single‐gene disorders,
 complex disorders,
 chromosomal disorders,
 environmental disease
 Classification systems
 Mortalities, morbidities
 Disability‐adjusted life years (DALYs)

Human Genetic Disease: A Consequence of DNA Variation

Genetic Disorders/Diseases

[Figure: classification tree of genetic disorders/diseases. Labels recoverable from the slide: Hereditary; Sporadic; Disease Predisposition/Susceptibility; Somatic; Chromosomal; Mendelian; Non‐Mendelian; Common Late‐Onset Diseases; Cancer; Drug Response; Somatic Mosaicism; CNV; Structural; Autosomal; Sex‐Linked; Polygenic; Mitochondrial; Imprinting; TNR; Ploidy; Aneuploidy; Microdeletion Syndromes; Other Structural Anomalies; Dominant; Recessive; X‐Linked]

Human Genetic Disease: A Consequence of DNA Variation

Classification of Disease

 A far more extensive listing of morbidity data is provided by the International Statistical Classification
of Diseases and Related Health Problems (ICD)
 It provides a standard for coding patients at most
hospitals

 Rare diseases
 Diseases affecting <200,000 people (the threshold used in the United States)
 Why are they important?

Human Genetic Disease: A Consequence of DNA Variation

NIH Disease Classification: MeSH Terms

 Medical Subject Headings (MeSH)


 a unified language for biomedical literature database searches

Categories of Disease

 We can describe 4 main categories of diseases that afflict humans:


 Single‐gene (monogenic) disease
 Complex disease
 Genomic disease
 Environmental disease
 Additionally
 Somatic diseases (like cancer)
 Mitochondrial diseases

Categories of Disease

Allele Frequencies and Effect Sizes

 Minor Allele Frequency (MAF)


 Major allele >>> the most common allele for a given SNP
 Minor allele >>> the less common allele (the second most common) for a given SNP
 MAF provides information to differentiate between common and rare variants in the population
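
The definitions above can be made concrete with a small calculation. The following is a minimal sketch, assuming genotypes for one SNP are given as two‐letter strings (one per diploid individual); the function name and the example genotypes are illustrative and not taken from any database.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (minor_allele, MAF) for one SNP.

    genotypes: iterable of 2-character strings such as "AG",
    one string per diploid individual (an assumed input format).
    """
    # Count every allele copy across all individuals.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    ranked = counts.most_common()
    if len(ranked) < 2:
        # Only one allele observed: there is no minor allele.
        return None, 0.0
    # The minor allele is the second most common allele.
    minor_allele, minor_count = ranked[1]
    return minor_allele, minor_count / total


# Hypothetical genotypes for eight individuals at one SNP.
genotypes = ["AA", "AG", "AA", "GG", "AG", "AA", "AA", "AG"]
allele, maf = minor_allele_frequency(genotypes)
print(allele, maf)
```

Here the major allele is A (11 of 16 copies) and the minor allele is G (5 of 16 copies), so the MAF is 5/16.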

 The HapMap Project and the 1000 Genomes Project


 Cataloged millions of variants and reported each of
their allele frequencies
 Various approaches have shown which of these
variants are likely to be pathogenic or neutral

https://www.ncbi.nlm.nih.gov/snp/


Categories of Disease

Allele Frequencies and Effect Sizes

 Effect size
 Can be quantitated as an odds ratio (OR).
 An OR is a measure of association between an exposure (in our case a genetic variant) and an outcome (expression
of a disease).
 An OR of 1 implies that the presence of a variant does not affect the odds of a disease outcome;
 OR>1 implies an association with higher odds of disease occurrence, and OR<1 with lower odds.
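
As a worked illustration of the odds ratio, here is a minimal sketch that computes an OR from a 2×2 table of carriers and non‐carriers among cases and controls. The function name and all counts are hypothetical.

```python
def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio for a variant (exposure) and a disease (outcome)."""
    # Odds of carrying the variant among affected individuals.
    odds_in_cases = case_carriers / case_noncarriers
    # Odds of carrying the variant among unaffected individuals.
    odds_in_controls = control_carriers / control_noncarriers
    return odds_in_cases / odds_in_controls


# Hypothetical study: 100 cases and 100 controls.
or_value = odds_ratio(30, 70, 15, 85)
print(round(or_value, 2))

# Identical carrier frequencies in both groups give an OR of 1.
no_effect = odds_ratio(20, 80, 20, 80)
print(no_effect)
```

In the first example the odds are 30/70 in cases and 15/85 in controls, giving an OR of about 2.43, which is an association with higher odds of disease.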

Categories of Disease

Allele Frequencies and Effect Sizes

 A major goal of human genetics and genomics is to identify variants that cause disease (or confer risk for disease)

1. Rare alleles having large effect sizes tend to cause Mendelian diseases that are primarily monogenic.
2. Low‐frequency alleles tend to have effects that are less strong.
3. Some common alleles have a low effect size yet still contribute to common disease. Such common alleles have been captured by GWA studies.
4. There are very few examples of common variants having large effects in contributing to common diseases.
5. Rare variants having small effects are extremely difficult to identify.

Categories of Disease

Monogenic Disorders

 Simple traits are transmitted following the rules of Mendel


 Online Mendelian Inheritance in Man (OMIM) database currently lists over 5000 phenotypes for which the molecular
basis is known
 While each Mendelian disease tends to be rare in the population, cumulatively these conditions affect at least 1% of
liveborn infants


Categories of Disease

Complex Disorders

 Complex disorders such as Alzheimer’s disease and cardiovascular disease are caused by defects in multiple genes
 These disorders are also called multifactorial, reflecting that they are expressed as a function of both genetic and
environmental factors.
 These traits do not segregate in a simple, discrete, Mendelian manner
 asthma, autism, depression, diabetes, high blood pressure, obesity, osteoporosis

 Complex disorders are characterized by the following features


 Multiple genes are thought to be involved. It is the combination of mutations in multiple genes that defines the
disease.
 Complex diseases involve the combined effect of multiple genes, but they are also caused by both environmental
factors and behaviors that elevate the risk of disease
 Complex diseases are non‐Mendelian: they show familial aggregation but not segregation
 Susceptibility alleles have a high population frequency, that is, complex diseases are generally more frequent than
single‐gene disorders
 Susceptibility alleles have low penetrance. Penetrance is the frequency with which a dominant or homozygous
recessive gene produces its characteristic phenotype in a population. At the extremes, it is an all‐or‐none
phenomenon: a genotype is either expressed or it is not. In complex disorders, partial penetrance is common.

Categories of Disease

Genomic Disorders

 Genomic disorders >>> changes in the structure of the genome that cause disease
 Large‐scale chromosomal abnormalities are extremely common causes of disease in humans.
 Many developmental abnormalities involve a portion of a chromosome.
 Some involve cytogenetically detectable changes and span millions of base pairs
 If they are too small to be cytogenetically visible (e.g., smaller than about 3 Mb) they are usually referred to as cryptic
changes.
 Microdeletion syndromes include Cri‐du‐chat syndrome, Angelman syndrome, Prader‐Willi syndrome,
Smith‐Magenis syndrome, and various forms of intellectual disability that result from the gain (microduplication) or
loss (microdeletion) of chromosomal regions.

Genomic disorders that are inherited in a Mendelian fashion and involve only one or several genes


Categories of Disease

Genomic Disorders

A similar list of common structural variations that are associated with disease

Categories of Disease

Genomic Disorders

 Models for the molecular mechanisms of genomic disorders.



Categories of Disease

Genomic Disorders

 Chromosomal alterations may be considered to occur
along a spectrum from having little or no adverse effects
to causing disease
 Copy number variants may have no phenotypic
consequences and may be thought of as chromosomal
alterations (in contrast to chromosomal abnormalities).
 Some copy number variants may increase disease
susceptibility, perhaps contributing to common complex
(multigenic) disorders.
 Chromosomal disorders are an extremely common
feature of normal human development.
 Over 60% of spontaneous abortions that occur at 12
weeks gestation or earlier are aneuploid, suggesting
that early pregnancy failures are likely due to lethal
chromosome abnormalities.

Categories of Disease

Environmentally Caused Disease

 Environmental diseases are extremely common.


 Infectious diseases are caused by a pathogen
 Non‐infectious diseases are caused directly by the environment
 malnutrition (whether maternal, fetal, or in an independent individual), poisoning by toxicants such as
lead or mercury, or injury
 GWA studies have been used to compare the genotypes of large numbers of individuals who are susceptible to an
infectious disease relative to controls
 In some cases the variant confers resistance by impairing the function of a receptor for a pathogen, as in the
case of CCR5 for HIV‐1, FUT2 for norovirus, and DARC for Plasmodium vivax (a malaria pathogen)

Categories of Disease

Disease and Genetic Background

 Modifier genes are likely to be involved, and environmental factors are certain to have large roles in genetic
diseases.
 This reflects the concept that monogenic disorders may be caused primarily by the abnormal function of a single gene, yet
they always involve multiple genes
 Particular ethnic groups or other discrete groups have high susceptibility to some genetic diseases

Categories of Disease

Mitochondrial Disease
 Today, over 100 disease‐causing point mutations have been described in mitochondrial DNA
 The mitochondrial genome contains 37 genes, any of which can be associated
with disease.
[Figure: morbidity map of the human mitochondrial genome]
 Mitochondrial genetics differs from Mendelian genetics in three main ways
 Mitochondrial DNA is maternally inherited. A woman having a
mitochondrial DNA mutation may therefore transmit it to her children,
but only her daughters will further transmit the mutation to their
children.
 While nuclear genes exist with two alleles (one maternal and one
paternal), mitochondrial genes exist in hundreds or thousands of
haploid copies per cell. Some critical threshold of mutated
mitochondrial genomes is required before a disease is manifested.
 As cells divide the proportion of mitochondria having mutated genomes
can change, affecting the phenotypic expression of mitochondrial
disorders. Clinically, mitochondrial disorders present at different times
and in different regions of the body. An extremely broad variety of
diseases are associated with mutations in mitochondrial DNA.
 MITOMAP is a useful mitochondrial genome database
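
The threshold and cell‐division points above can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions: a fixed number of mitochondrial genomes per cell, purely random sampling at each division, and a hypothetical disease threshold of 60% mutated genomes (real thresholds vary by mutation and tissue). All names and numbers are illustrative.

```python
import random


def heteroplasmy(mutant_copies, total_copies):
    """Fraction of mitochondrial genomes in a cell that carry the mutation."""
    return mutant_copies / total_copies


def divide(mutant_copies, total_copies, rng):
    """Sample the mutant copy number of one daughter cell.

    Each genome in the daughter is drawn independently with probability
    equal to the parent's heteroplasmy (a simple random-drift assumption).
    """
    p = heteroplasmy(mutant_copies, total_copies)
    return sum(1 for _ in range(total_copies) if rng.random() < p)


THRESHOLD = 0.6  # hypothetical fraction of mutated genomes needed for disease

rng = random.Random(1)  # fixed seed so the run is repeatable
mutant, total = 500, 1000
lineage = [heteroplasmy(mutant, total)]
for _ in range(20):
    mutant = divide(mutant, total, rng)
    lineage.append(heteroplasmy(mutant, total))

# Whether each cell in the lineage is at or above the assumed threshold.
affected = [h >= THRESHOLD for h in lineage]
print([round(h, 3) for h in lineage])
```

The proportion of mutated genomes drifts from one division to the next, which is one way to picture why mitochondrial disorders can present at different times and in different tissues.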

Categories of Disease

Somatic Mosaic Disease

 Mosaicism is the occurrence of genetically distinct populations of cells within an organism
 Genetic changes may involve somatic cells such as skin or liver (somatic mosaicism),
or they may involve germline cells (germline mosaicism, also called gonadal
mosaicism)

Categories of Disease

Cancer: A Somatic Mosaic Disease

 Cancer is a somatic mosaic disease, arising from a clone having somatic mutations and leading to malignant
transformation
 Cancer occurs when DNA mutations confer selective advantage to cells that proliferate, often uncontrollably
 There are six hallmarks of cancer, described by Hanahan and Weinberg (2011):
 sustaining proliferative signaling
 evading growth suppressors
 resisting cell death
 enabling replicative immortality
 inducing angiogenesis
 activating invasion and metastasis
 There are >200 types of cancer and many disease mechanisms, and a growing number of key tumor suppressor
genes and other oncogenic genes have been identified
 A human cancer genome project has been launched to catalog the DNA sequence of a variety of cancer genomes

Categories of Disease

Cancer: A Somatic Mosaic Disease


 COSMIC (catalogue of somatic mutations in cancer)
 includes information on nearly 1 million cancer samples, >1.6 million mutations, and various types of mutations (fusions, genomic
rearrangements, and copy number variants).
 The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) are other major initiatives.
 Their goals are to analyze mutations in thousands of tumor samples to characterize genetic changes, as well as alterations in the
transcriptome and epigenome.
 The UCSC Cancer Genomics Browser is a resource offering extensive data on cancer, including from the TCGA project

Categories of Disease

Cancer: A Somatic Mosaic Disease


 The landscape of cancer includes two types of mutations
 Driver mutations >>> confer a selective growth advantage to cells, are implicated as causing the neoplastic process, and are
positively selected for during tumorigenesis.
 Passenger mutations >>> are retained by chance but confer no selective advantage and do not contribute to oncogenesis.
 A challenge is to identify driver mutations throughout the genome of a cancer cell and to distinguish them from passenger mutations.
 The goal is to relate such a molecular profile of a cancer to an appropriate therapy to eradicate the cancer

Categories of Disease

Cancer: A Somatic Mosaic Disease


 The advent of next‐generation sequencing has enabled deep cataloguing of many cancer
types.
 Some conclusions from such studies
 The rate of nonsynonymous mutations varies greatly.
 Cancers in which DNA repair function is compromised can have thousands of
nonsynonymous mutations per tumor.
 Cancers caused by mutagens such as tobacco or ultraviolet radiation from sunlight
tend to have ∼100–200 nonsynonymous mutations per tumor.
 Lung cancers in smokers may therefore have ten times as many somatic mutations
as lung cancers in nonsmokers
 Cancers having relatively few mutations include pediatric tumors and leukemias (∼10 per
tumor).
 One reason is that some tumors acquire mutations over time (particularly tumors in
self‐renewing tissues).
 Metastatic cancer develops through somatic mutations that are acquired over a period of
decades. Also, mutations in metastatic tumors were already present in many cells in
primary tumors.

Categories of Disease

Cancer: A Somatic Mosaic Disease


 Aneuploidy is common in cancer cells, including whole‐chromosome or segmental copy
number changes, inversions, and translocations.
 Translocations often result in the fusion of two genes to create an oncogene such
as BCR‐ABL.
 Chromosomal deletions are the most common form of aneuploidy in cancer, often
deleting a tumor suppressor gene
 It is challenging to determine whether a somatic mutation represents a driver or a
passenger.
 A driver gene contains driver gene mutation(s) but it may also contain passenger
mutations.
 Oncogenes tend to have recurrent mutations at one or a few amino acid positions (as for
PIK3CA and IDH1), while tumor suppressor genes tend to acquire truncating mutations
along their length.
 Vogelstein et al. describe a “20/20 rule” in which a gene is classified as an oncogene
if >20% of the recorded mutations are missense at recurrent positions.
 It is a tumor suppressor gene if >20% of its mutations are inactivating.
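
The "20/20 rule" described above can be sketched as a simple classifier. This is a minimal illustration, not the published implementation: the function name, the order in which the two criteria are checked, and the mutation counts are all assumptions made for the example.

```python
def classify_gene(recurrent_missense, inactivating, total_mutations):
    """Classify a gene from its recorded mutations using the 20/20 rule.

    recurrent_missense: mutations that are missense at recurrent positions
    inactivating: truncating or otherwise inactivating mutations
    total_mutations: all recorded mutations in the gene
    """
    if total_mutations == 0:
        return "unclassified"
    # Oncogene criterion: >20% missense mutations at recurrent positions.
    if recurrent_missense / total_mutations > 0.20:
        return "oncogene"
    # Tumor suppressor criterion: >20% inactivating mutations.
    # Checking this second is an assumption of this sketch.
    if inactivating / total_mutations > 0.20:
        return "tumor suppressor"
    return "unclassified"


# Hypothetical mutation counts for three genes.
print(classify_gene(80, 5, 100))   # mostly recurrent missense
print(classify_gene(2, 60, 100))   # mostly inactivating
print(classify_gene(5, 10, 100))   # neither criterion met
```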

Categories of Disease

Cancer: A Somatic Mosaic Disease


 Heterogeneity of cancer mutations may be observed:
(1) among the cells of a single tumor;
(2) among different metastatic lesions of one patient;
(3) among the cells of a single metastatic lesion;
(4) among tumors of different patients
 While the cancer genome appears extraordinarily complex, the vast majority of genetic
variants are passengers that do not influence neoplasia
 Functionally, driver mutations influence three cellular processes:
 Cell fate
 Cell survival
 Genome maintenance

Disease Databases

 Two major types of human disease databases:


 Central databases give information on thousands of human diseases
 Online Mendelian Inheritance in Man (OMIM) >>> an online catalog of human genes and genetic disorders
 The Human Gene Mutation Database (HGMD)
 ClinVar >>> aggregates information about genomic variation and its relationship to human health
 Locus‐specific mutation databases
 mutations associated with genes, with a focus on either one specific gene and/or one disease

Disease Databases

OMIM: Central Bioinformatics Resource for Human Disease

 A comprehensive database for human genes and genetic disorders, particularly rare (often monogenic) disorders having a genetic
basis.
 The OMIM database contains bibliographic entries for over 25,000 human diseases and relevant genes
 A focus of OMIM is inherited genetic diseases. The OMIM database is concerned with Mendelian genetics.
 It contains little information about genetic mutations in complex disorders or chromosomal disorders.


Disease Databases

OMIM: Central Bioinformatics Resource for Human Disease

 Each entry in OMIM is associated with a numbering system.
 There is a six‐digit code in which the first digit indicates the
mode of inheritance of the gene involved
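
The numbering scheme can be sketched as a lookup on the first digit. The mapping below reflects the OMIM numbering as commonly documented, but it is reproduced from memory and should be checked against OMIM's own documentation; the function name is hypothetical.

```python
# First digit of an OMIM (MIM) number and the class it is understood to indicate.
# Assumption: this table should be verified against OMIM's documentation.
OMIM_FIRST_DIGIT = {
    "1": "autosomal dominant (entries created before May 15, 1994)",
    "2": "autosomal recessive (entries created before May 15, 1994)",
    "3": "X-linked loci or phenotypes",
    "4": "Y-linked loci or phenotypes",
    "5": "mitochondrial loci or phenotypes",
    "6": "autosomal loci or phenotypes (entries created after May 15, 1994)",
}


def inheritance_from_mim(mim_number):
    """Return the inheritance class implied by the first digit of a MIM number."""
    text = str(mim_number)
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"expected a six-digit MIM number, got {mim_number!r}")
    return OMIM_FIRST_DIGIT.get(text[0], "unknown")


# Huntington's disease has OMIM number 143100.
print(inheritance_from_mim(143100))
```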

Disease Databases

Human Gene Mutation Database (HGMD)

 The Human Gene Mutation Database (HGMD) is another major source of information on disease‐associated
mutations.
 The database is partly commercial (requiring payment for full access). [purchased by Qiagen]
 HGMD emphasizes more comprehensive cataloguing of mutations, compared to OMIM
 In sequencing human genomes and exomes, it is common to filter variants based on whether they have been
previously associated with disease; HGMD has emerged as a basic resource in many analysis pipelines.

Disease Databases

varsome – The Human Genetics Search Engine https://varsome.com/

https://varsome.com/variant/hg19/MEFV%3AM694V?annotation-mode=germline

Disease Databases

ClinVar and Databases of Clinically Relevant Variants

 The ClinVar database provides data on human variants and their relationship to disease.
 It further provides links to the NIH Genetic Testing Registry (GTR), MedGen, Gene, OMIM, and PubMed
 There are five categories of content in ClinVar:
(1) Submitter >>> Submissions are from organizations and individuals.
(2) Variation >>> Includes sequences at one location (single allele) or multiple alleles (e.g., compound
heterozygotes in which two parents transmit different alleles at a single locus, sometimes causing a
phenotypic change). Variants are cross‐referenced to dbSNP and dbVar.
(3) Phenotype >>> May represent one concept or more and is annotated by MeSH term, OMIM number,
MedGen identifier, or Human Phenotype Ontology (HPO).
(4) Interpretation >>> Submitter‐driven and uses terms recommended by the American College of Medical
Genetics and Genomics (ACMG).
(5) Evidence >>> Typically consists of the number of individuals in which a given mutation was observed.

Disease Databases

GeneCards

 Includes a wealth of information on human disease genes (Stelzer et al., 2011).


 Differs from OMIM in that it collects and integrates data from several dozen independent databases including
OMIM, GenBank, UniGene, Ensembl, the University of California at Santa Cruz (UCSC), and the Munich
Information Center for Protein Sequences (MIPS).
 Relative to OMIM, GeneCards provides less descriptive text on human diseases and
more functional genomics data.


Disease Databases

Locus-Specific Mutation Databases and LOVD

 Central databases such as OMIM and HGMD attempt to comprehensively describe all disease‐related genes
without necessarily cataloguing every known allelic variant.
 In contrast, locus‐specific mutation databases describe variations in a single gene (or sometimes in several
genes) in depth.
 The coverage of known mutations also tends to be far deeper in locus‐specific databases as a group than in
central databases.
 A locus‐specific mutation database is a repository for allelic variations.
 The essential components of a locus‐specific database include the following
 a unique identifier for each allele;
 information on the source of the data;
 the context of the allele;
 information on the allele (e.g., its name, type, and nucleotide variation).

Disease Databases

Locus-Specific Mutation Databases and LOVD

 A main point of entry to locus‐specific databases is the Human Genome Variation Society (HGVS).
 HGVS‐nomenclature is used to report and exchange information regarding variants found in DNA, RNA and protein sequences and
serves as an international standard
 Provides access to 1,600 locus‐specific mutation databases.
 Its major categories include:
(1) locus‐specific mutation databases, organized by HUGO approved gene symbols;
(2) disease‐centered central mutation databases, such as the Asthma Gene Database;
(3) central mutation and SNP databases, such as OMIM, dbSNP, HGMD, and PharmGKB;
(4) national and ethnic mutation databases, such as databases for diseases affecting Finns or Turks;
(5) mitochondrial mutation databases, such as MITOMAP;
(6) chromosomal variation databases, such as the Mitelman database of chromosome aberrations in cancer;
(7) nonhuman mutation databases, such as OMIA (Online Mendelian Inheritance in Animals);
(8) clinical databases such as those of the National Organization for Rare Disorders (NORD)
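
To illustrate what a nomenclature standard buys, here is a minimal sketch that converts one‐letter protein shorthand (such as M694V, a common way of writing a MEFV variant) into HGVS‐style three‐letter protein notation. It handles only simple substitutions and stop codons; the function name and the regular expression are assumptions of this sketch, and a validated tool such as Mutalyzer should be used for real data.

```python
import re

# Standard one-letter to three-letter amino acid codes; "*" denotes a stop codon.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}

# Reference residue, position, variant residue (e.g., "M694V").
SHORTHAND = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def to_hgvs_protein(shorthand):
    """Convert shorthand like 'M694V' to HGVS-style 'p.Met694Val'."""
    match = SHORTHAND.match(shorthand)
    if match is None:
        raise ValueError(f"not a simple substitution: {shorthand!r}")
    ref, position, alt = match.groups()
    if ref not in THREE_LETTER or alt not in THREE_LETTER:
        raise ValueError(f"unknown amino acid code in {shorthand!r}")
    return f"p.{THREE_LETTER[ref]}{position}{THREE_LETTER[alt]}"


print(to_hgvs_protein("M694V"))
```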

Disease Databases

Locus-Specific Mutation Databases and LOVD

 The Leiden Open Variation Database (LOVD)


 Supports thousands of locus‐specific databases
 Provides software to establish locus‐specific databases and curate data on individuals, phenotypes, and DNA sequencing
variants following HGVS standards for nomenclature.
 Provides access to Mutalyzer, a software package that confirms variant data are presented in a consistent standard.

Disease Databases

Limitations of Disease Databases: The Growing Interpretive Gap

 Databases reporting which alleles are associated with human disease have critical roles in the interpretation of the clinical significance
of genomic variants.
 Data analysis pipelines for next‐generation sequencing studies
 filter‐exclude variants that are likely to be benign (neutral) because they appear in databases of apparently normal individuals
 filter‐include variants that are likely to be pathogenic because they have been annotated as disease‐associated
 Some of the challenges faced in assessing variants include the following:
 For monogenic disorders, some variants in a disease‐associated gene occur relatively frequently and their pathogenicity is
established. For other rare variants, the clinical significance is unknown.
 For multigenic disorders, allelic heterogeneity makes the interpretation of the clinical significance of variants even more difficult.
 There is a large “interpretive gap” as increasing numbers of variants are identified, but their significance has not yet been
assessed.
o Locus‐specific databases are excellent repositories for the cataloguing of variants, but they also need associated clinical or
phenotypic data.
 Databases such as the variants from the 1000 Genomes Project are currently used to define neutral variants, but clinical and
phenotypic data are not available for those individuals.
o Even if they are defined as “apparently normal,” all are susceptible to disease.
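
The filter‐exclude and filter‐include steps described above, and the interpretive gap they leave, can be sketched as a three‐way triage. This is a minimal illustration: the `Variant` fields, the 1% frequency cutoff, the order of the checks, and the example variants are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    reference_panel_frequency: float  # frequency among "apparently normal" individuals
    disease_annotated: bool           # previously annotated as disease-associated


def triage(variants, benign_frequency_cutoff=0.01):
    """Sort variants into include / exclude / uncertain buckets."""
    buckets = {"include": [], "exclude": [], "uncertain": []}
    for variant in variants:
        if variant.disease_annotated:
            # filter-include: already annotated as disease-associated
            buckets["include"].append(variant.name)
        elif variant.reference_panel_frequency >= benign_frequency_cutoff:
            # filter-exclude: common in the reference panel, likely benign
            buckets["exclude"].append(variant.name)
        else:
            # neither annotated nor common: the interpretive gap
            buckets["uncertain"].append(variant.name)
    return buckets


variants = [
    Variant("var1", 0.25, False),
    Variant("var2", 0.0001, True),
    Variant("var3", 0.0, False),
]
result = triage(variants)
print(result)
```

Rare, unannotated variants such as var3 fall into neither filter, which is where the interpretive gap grows as more genomes are sequenced.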

Disease Databases

Limitations of Disease Databases: The Growing Interpretive Gap

 MutationTaster (http://www.mutationtaster.org/)

Schwarz, J., Cooper, D., Schuelke, M. et al. MutationTaster2: mutation prediction for the deep‐sequencing age. Nat Methods 11, 361–362 (2014). https://doi.org/10.1038/nmeth.2890

Approaches to Identify Disease-Associated Genes & Loci

Linkage Analysis

 A genetic linkage map displays genetic information in reference to linkage groups (chromosomes) in a genome.
 The mapping units are centiMorgans
 Based on recombination frequency between polymorphic markers such as SNPs or microsatellites
 One cM corresponds to one recombination event per 100 meioses (a 1% recombination frequency); for the human genome, the recombination rate is typically 1–2 cM/Mb
 In linkage studies, genetic markers are used to search for coinheritance of chromosomal regions within families
 Two genes that are in proximity on a chromosome will usually cosegregate during meiosis.
 By following the pattern of transmission of a large set of markers in a large pedigree,
linkage analysis can be used to localize a disease gene based on its linkage to a genetic marker locus.
 Huntington’s disease (OMIM#143100) was the first autosomal disorder for which linkage analysis was used to identify the disease
locus
 Linkage is usually successful for single‐gene disease models rather than for complex traits.
 It also typically involves studies of large pedigrees.
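
The relationship between recombination frequency, centiMorgans, and physical distance can be shown with a short calculation. This is a minimal sketch: it uses the direct 1% = 1 cM conversion, which is only a reasonable approximation for closely linked markers, and the 1–2 cM/Mb range quoted above. Function names and the example counts are hypothetical.

```python
def map_distance_cm(recombinants, meioses):
    """Genetic distance in cM: recombination events per 100 meioses."""
    if meioses <= 0:
        raise ValueError("meioses must be positive")
    return 100.0 * recombinants / meioses


def physical_distance_range_mb(distance_cm, min_rate=1.0, max_rate=2.0):
    """Approximate physical distance range (Mb) for a rate of 1-2 cM/Mb."""
    # A higher recombination rate means fewer Mb per cM, so it gives the lower bound.
    return distance_cm / max_rate, distance_cm / min_rate


# Hypothetical pedigree data: 3 recombinants between marker and disease locus
# observed in 100 informative meioses.
distance = map_distance_cm(3, 100)
low_mb, high_mb = physical_distance_range_mb(distance)
print(distance, low_mb, high_mb)
```

Under these assumptions the marker lies about 3 cM from the disease locus, or roughly 1.5 to 3 Mb.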

Approaches to Identify Disease-Associated Genes & Loci

Genome-Wide Association Studies (GWAS)

 While the genetic basis of over a thousand single‐gene disorders has been found,
 it is far more difficult to identify the genetic causes of common human diseases that involve multiple genes
 a large number of genes may each make only a small contribution to the disease risk
 GWAS rely on SNP microarrays having several hundred thousand to more than a million SNPs represented on a single array
 There are two main experimental designs
 Family‐based design >>> markers are measured in affected individuals (probands) and unaffected individuals to identify differences
in the frequency of variants
 Population‐based design >>> large numbers of unrelated cases and controls are studied
 hundreds or thousands in each group (the larger the better in terms of statistical power)
 !!! GWAS which succeed in identifying strong evidence of association often implicate intergenic regions far removed from
protein‐coding genes.
 A key aspect of genome‐wide association studies is that replication studies are required to confirm that positive signals are authentic.
 NCI‐NHGRI Working Group on Replication in Association Studies >>> aims to eliminate false positive results that often occur.
 Repositories of GWAS data
 Catalog of Published Genome‐Wide Association Studies >>> at the National Human Genome Research Institute (NHGRI)
 Database of Genotype and Phenotype (dbGaP) >>> The National Library of Medicine (NLM)
o study documentation (e.g., protocols and data collection instruments); phenotypic data (of individuals and as a summary);
genetic data (genotypes, pedigrees, mapping results); statistical results (e.g., linkage and association results)
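
For the population‐based design, the test applied to each SNP can be sketched as a chi‐square test on allele counts in cases versus controls. This is a minimal illustration of one common choice of test, not a description of any specific study's pipeline; the allele counts are hypothetical, and real analyses also correct for testing hundreds of thousands of SNPs.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value). No continuity correction is applied.
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical SNP: 2,000 cases (4,000 alleles) and 3,000 controls (6,000 alleles).
chi2, p_value = allelic_chi_square(1400, 2600, 1800, 4200)
print(round(chi2, 2), p_value)

# Identical allele frequencies in cases and controls give no signal.
null_chi2, null_p = allelic_chi_square(1000, 3000, 1500, 4500)
print(null_chi2, null_p)
```

In the first example the minor allele frequency is 35% in cases and 30% in controls, which gives a chi‐square of about 27.6 with these sample sizes.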

Approaches to Identify Disease-Associated Genes & Loci

Genome-Wide Association Studies (GWAS)

 A large‐scale study by the Wellcome Trust Case Control Consortium (2007)
 50 research groups
 >16,000 individuals
 ~2,000 affected individuals (one of seven common familial
diseases)
 ~3,000 control individuals (!!!)
 ~500,000 SNPs measured for each individual
 24 strong association signals were found for 6 of the 7 diseases
 Some previously identified, many novel
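
Measuring ~500,000 SNPs means running ~500,000 statistical tests, so an uncorrected 0.05 cutoff would flag many SNPs by chance alone. The sketch below uses a Bonferroni correction to show the scale of the problem. Bonferroni is an assumption chosen for illustration; the slide does not state which significance threshold the consortium applied. The SNP identifiers in the usage example are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def significant_snps(p_values: dict, alpha: float = 0.05) -> list:
    """Return the SNP ids whose p-value passes the Bonferroni cutoff."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(snp for snp, p in p_values.items() if p < threshold)

n_snps = 500_000
# Per-SNP cutoff: 0.05 / 500,000, i.e. about 1e-7
print(bonferroni_threshold(0.05, n_snps))
# Expected chance hits at an uncorrected 0.05 cutoff if no SNP is
# truly associated: about 25,000
print(0.05 * n_snps)

# Hypothetical p-values for three SNPs (cutoff is 0.05 / 3)
print(significant_snps({"snp_a": 1e-3, "snp_b": 0.2, "snp_c": 0.04}))
```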

Approaches to Identify Disease-Associated Genes & Loci

Identification of Chromosomal Abnormalities

 The most common chromosomal aberrations in early development include the gain or loss of
whole chromosomes.
 Other common phenomena >>> large‐scale duplications, deletions, or rearrangements involving many millions of base pairs
 Can be detected using >>> standard cytogenetic approaches: karyotype analysis and fluorescence in situ hybridization (FISH)
 Techniques used / improved
 Spectral karyotyping/multiplex‐FISH (SKY/M‐FISH)
o Different fluorescence for each chromosome, facilitating the identification of
abnormal karyotypes
 Array comparative genomic hybridization (aCGH)
o A high‐resolution karyotype analysis solution for the detection of unbalanced
structural and numerical chromosomal alterations with high‐throughput capabilities
 Both genomic microarrays (aCGH) and SNP microarrays are used routinely to identify
disease‐associated chromosomal abnormalities.
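
aCGH results are commonly expressed as the log2 ratio of test to reference signal at each probe, so a copy‐number gain or loss shifts the ratio away from zero. The sketch below shows the ratios expected for an idealized sample and a simple classification of one probe. The cutoffs of ±0.3 and the function names are hypothetical choices for illustration; real analyses segment many probes and account for noise and mosaicism.

```python
import math

def expected_log2_ratio(copy_number: int, reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) ratio for a given copy number."""
    if copy_number <= 0:
        raise ValueError("a complete loss has no finite log2 ratio")
    return math.log2(copy_number / reference_copies)

# Hypothetical cutoffs, chosen only for this illustration
GAIN_CUTOFF = 0.3
LOSS_CUTOFF = -0.3

def classify_probe(log2_ratio: float) -> str:
    """Label one probe as a gain, a loss, or normal copy number."""
    if log2_ratio >= GAIN_CUTOFF:
        return "gain"
    if log2_ratio <= LOSS_CUTOFF:
        return "loss"
    return "normal"

for copies in (1, 2, 3):
    ratio = expected_log2_ratio(copies)
    print(copies, round(ratio, 3), classify_probe(ratio))
```

Because the method compares signal intensities, it detects unbalanced changes (gains and losses) but not balanced rearrangements such as inversions.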

Approaches to Identify Disease-Associated Genes & Loci

Human Genome Sequencing

 Genome Sequencing to Identify Monogenic Disorders


 Exome sequencing has been particularly useful for identifying variants that cause monogenic disorders
 The majority of Mendelian diseases are caused primarily by mutations affecting the coding region of a gene.
 Targeted sequencing
 A powerful approach to studying monogenic disorders
 Whole genome sequencing can be a powerful tool especially for complex
disorders

Approaches to Identify Disease-Associated Genes & Loci

Research Versus Clinical Sequencing and Incidental Findings

 Research studies must be approved by an Institutional Review Board (IRB) to confirm that appropriate procedures are in place
 In Turkey: Ethics Committee (or Board)
 Informed consent must be obtained from the research participants or from patients
 The informed consent document explains the risks and benefits of a study
o Risks of an exome study include the potential loss of sequence data by the research team
o Another risk is the possible negative impact of learning that a family member has a disease‐causing mutation
 Consider a research study involving whole‐exome sequencing of a child with autism and his/her parents. The inclusion of the
parents’ exomes is critical because it allows inherited variants to be distinguished from de novo variants.
 What procedure will be followed if a parent or child has a mutation in a cancer‐causing gene? This possibility should be
addressed as part of the informed consent process, and the IRB should review this procedure.
 For clinical sequencing
 The American College of Medical Genetics and Genomics (ACMG) issued recommendations for reporting incidental findings in
exome and genome sequencing
 A primary finding is defined as “pathogenic alterations in a gene or genes that are relevant to the diagnostic indication for which the sequencing was ordered”
 Incidental findings are unexpected positive findings.
o They are defined as the results of a deliberate search for pathogenic or likely pathogenic alterations in genes that are not apparently relevant to a diagnostic indication for which the sequencing test was ordered
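
The trio design described above (a child plus both parents) separates inherited from de novo variants by checking whether each of the child's variants also appears in a parent. The sketch below is a minimal illustration under strong assumptions: variant calls are error‐free and fully covered in all three exomes. The `Variant` type, the function name, and the example variants are hypothetical.

```python
from typing import Dict, NamedTuple, Set

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

def classify_child_variants(child: Set[Variant],
                            mother: Set[Variant],
                            father: Set[Variant]) -> Dict[Variant, str]:
    """Label each variant in the child by its apparent origin."""
    labels = {}
    for variant in child:
        in_mother = variant in mother
        in_father = variant in father
        if in_mother and in_father:
            labels[variant] = "inherited (both parents carry it)"
        elif in_mother:
            labels[variant] = "inherited (maternal)"
        elif in_father:
            labels[variant] = "inherited (paternal)"
        else:
            labels[variant] = "de novo candidate"
    return labels

# Hypothetical example variants
v1 = Variant("chr1", 1000, "A", "G")
v2 = Variant("chr2", 2000, "C", "T")
v3 = Variant("chr7", 3000, "G", "A")
labels = classify_child_variants({v1, v2, v3}, mother={v1}, father={v1, v2})
for variant in sorted(labels):
    print(variant, labels[variant])
```

In practice a variant absent from both parents is only a candidate: low coverage or a missed call in a parent produces the same pattern, so candidates are usually confirmed by an independent method.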

Approaches to Identify Disease-Associated Genes & Loci

Disease-Causing Variants in Apparently Normal Individuals

 How many disease‐associated variants occur in healthy people?


 How many variants that disrupt the function of protein‐coding genes occur in apparently normal people?
 The 1000 Genomes Project
o >2600 HGMD entries in the 1000 Genomes low‐coverage pilot data
o Each individual harbored 281–515 missense mutations
 40–85 of which were homozygous and predicted to be deleterious
o Furthermore, each individual had 40–110 variants identified as disease‐causing in the HGMD database
 Of these variants 3–24 were homozygous, meaning that both chromosomal copies carried the deleterious variants.
 Outcome 1 ‐ the number of deleterious alleles present in the genome of even apparently normal individuals is quite high
 Outcome 2 ‐ databases such as HGMD that catalog disease‐associated variants may include false positive entries
 Loss‐of‐function variants in healthy individuals may be categorized in several ways
o Severe recessive alleles may occur in the heterozygous state.
o Alleles that are not severe may still impact disease risk and phenotype.
o There is benign loss of function variation (perhaps an example would be the loss of an olfactory receptor gene).
o There are variants that do not appreciably disrupt gene function.
o Many variants represent sequencing and annotation artifacts.

Human Disease Genes in Model Organisms

 The study of human disease genes and gene products in other organisms is of fundamental importance in our efforts to understand
the pathophysiology of human disease
 While mutations in genes cause many diseases, it is the aberrant protein product that has the proximal functional consequence on
the cell and ultimately on the organism
 Once a human disease gene is identified in a model organism, it can often be knocked out or otherwise manipulated.
 This allows the phenotypic consequences of specific mutations to be assessed.

Human Disease Orthologs in Nonvertebrate Species


 A basic approach is to identify which known human disease genes have orthologs in model organisms.
 At the time that C. elegans was sequenced, about 65% of human disease genes had identifiable C. elegans orthologs.
 Organisms studied include Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Schizosaccharomyces pombe, Arabidopsis, and Dictyostelium discoideum

Human Disease Genes in Model Organisms

Human Disease Orthologs in Rodents

 Mouse Genome Sequencing Consortium


 Although mice and humans are separated by 75 million years of evolution, 90% of the mouse genome can be aligned with large segments of chromosomes in the human genome
 At the nucleotide level, approximately 40% of the human genome can be aligned to the mouse genome. These sequences
seem to represent most of the orthologous sequences that remain in both lineages from the common ancestor, with the rest
likely to have been deleted in one or both genomes.
 The number of genes estimated to be in the mouse genome was about 30,000. This is similar to the number estimated to be
in the human genome. As with humans, this number was later lowered to the 20,000‐25,000 range
 The Whole Mouse Catalog describes mouse models of human disease
 The public consortium that sequenced the mouse genome reported that 687 human disease genes have clear orthologs in mouse

 Sequencing of the genome of the Norway rat


 Of 1112 well‐characterized human disease genes from HGMD, 76% have 1:1 orthologs in rat
 Only six human disease genes were found to lack any rat ortholog.
 In general, the consortium concluded that human disease
genes tend to be well conserved in rat

Human Disease Genes in Model Organisms

Human Disease Orthologs in Primates

 Chimpanzee Sequencing and Analysis Consortium, 2005
 Although the chimpanzee and human genomes are extremely closely related, many common human disease variants correspond to the wild‐type allele in the chimpanzee
 It is possible that not all of these mutations are true positive disease‐associated alleles in humans
 Conceivably, specific changes in the human environment in the past several million years have made such ancestral sequences deleterious, such that an altered sequence in humans is adaptive.

Functional Classification of Disease Genes

 Analysis of 923 human genes that are associated with human disease.
 Primarily causing monogenic disorders
 Classification according to the function of the
protein product
 Enzymes represent the largest functional category and account for 31% of the total gene products associated with disease.
 In contrast, only 15% of positionally cloned disease genes encode enzymes
 ??? This may reflect a historical bias: our knowledge of disease‐causing mutations has been based largely on enzymatic defects
