
The Human Genome

• The human genome is the complete set of genetic
information for humans (Homo sapiens)

• This information is encoded as DNA sequences within:
– 23 chromosome pairs in cell nuclei
– A small DNA molecule found within individual mitochondria
Nuclear DNA:
– Organized into large linear segments, the chromosomes
– Chromosomes vary in length from 47 x 10^6 bp to 246 x 10^6 bp
– Human somatic cells are diploid and contain 23 pairs of
chromosomes (22 pairs of autosomes plus two sex
chromosomes; XX in females, XY in males)
– The pairs are not perfect pairs (they are not identical in DNA
sequences, as one originates from the father and one from the
mother)
– Haploid human genomes (contained in egg and sperm cells)
consist of ~3 x 10^9 base pairs
– Diploid genomes (found in somatic cells) have twice the DNA
content (~6 x 10^9 bp)
Genomic information

Graphical representation of the idealized
human diploid karyotype showing the
organization of the genome into
chromosomes
The figure shows both the female (XX) and
male (XY) versions of the 23rd chromosome
pair
Chromosomes are shown aligned at their
centromeres
The mtDNA is not shown

NCBI Genome ID 51
Ploidy Diploid
Number of chromosomes 23 pairs
Genome size ~3 x 10^9 bp
mtDNA:
•Small circular DNA (16,569 bp)
•Contains just 37 genes
– 13 of these genes code for proteins of the respiratory chain complexes, the
main biochemical component of energy-generating mitochondria (e.g.
genes for ATPase subunits 6 and 8; cyt c oxidase subunits I, II, III; cyt b;
NADH dehydrogenase subunits 1-6 and 4L)
– Other 24 specify ncRNA molecules required for expression of
mitochondrial genome [e.g. rRNA (12S and 16S) genes; tRNA genes]
– Genes are more tightly packed than nuclear genome and they contain no
introns
•The most abundant DNA molecule in the cell (~800
mitochondria/cell; ~10 copies of mtDNA/mitochondrion)
– The abundance of mtDNA has been exploited in the analysis of ancient
DNA and in forensic investigations where limiting starting material can
hinder genetic analysis
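The abundance argument can be made concrete with a little arithmetic. The sketch below uses only the approximate figures quoted above (~800 mitochondria per cell, ~10 mtDNA copies per mitochondrion, 16,569 bp per molecule); the comparison with two nuclear copies per autosomal locus is an assumption that follows from somatic cells being diploid, and the variable names are illustrative.

```python
# Approximate figures quoted in the text; real values vary by cell type.
MITOCHONDRIA_PER_CELL = 800
MTDNA_COPIES_PER_MITOCHONDRION = 10
MTDNA_LENGTH_BP = 16_569
NUCLEAR_COPIES_PER_LOCUS = 2  # assumption: diploid cell, autosomal locus

# Copies of the mitochondrial genome in one cell.
mtdna_copies_per_cell = MITOCHONDRIA_PER_CELL * MTDNA_COPIES_PER_MITOCHONDRION

# How many mtDNA templates exist for every copy of a nuclear locus.
copies_per_nuclear_copy = mtdna_copies_per_cell / NUCLEAR_COPIES_PER_LOCUS

# Total mitochondrial DNA per cell, in base pairs.
total_mtdna_bp = mtdna_copies_per_cell * MTDNA_LENGTH_BP

print(f"mtDNA copies per cell: {mtdna_copies_per_cell:,}")
print(f"mtDNA copies per nuclear copy of a locus: {copies_per_nuclear_copy:,.0f}")
print(f"Total mtDNA per cell: {total_mtdna_bp:,} bp")
```

With thousands of templates per cell, a mitochondrial target is far more likely than a nuclear one to survive in a degraded or very small sample.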
Human genomes include:
– Protein-coding DNA genes
– Noncoding DNA (ncDNA)
Protein coding DNA sequences:
– Defined as those sequences that can be transcribed into mRNA and translated
into proteins during the human life cycle
– ~20,000 protein-coding genes per haploid genome (fewer than anticipated)
– Account for only a very small fraction of the genome (<2%)
– Distributed unevenly across the chromosomes, with an especially high gene
density within chromosomes 19, 11, and 1
– Protein-coding sequences represent the most widely studied and best
understood component of the human genome
– These sequences lead to the production of all human proteins, although
several biological processes (e.g. DNA rearrangements and alternative pre-
mRNA splicing) can lead to the production of many more unique proteins than
the number of protein-coding genes
– The complete modular protein-coding capacity of the genome is contained
within the exome, and consists of DNA sequences encoded by exons that can
be translated into proteins
– Because of its biological importance and small size (<2% of the genome),
sequencing of the exome was the first major milestone of the Human Genome
Project (HGP)
• The size of protein-coding genes within the human
genome shows enormous variability
• For example,
– The gene for histone H1a (HIST1H1A) is relatively small and
simple, lacking introns and encoding an mRNA of 781 nt
and a 215 amino acid protein (648 nt open reading frame)
– Dystrophin (DMD) is the largest protein-coding gene in the
human reference genome, spanning a total of 2.2 Mbp
– Titin (TTN) has the longest coding sequence (80,780 bp), the
largest number of exons (364), and the longest single exon
(17,106 bp)
• Over the whole genome:
• The average size of an exon is 145 bp
• The average number of exons is 8.8
• The average coding sequence encodes 447 amino acids
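The relationship between protein length and open reading frame length used in the histone H1a example can be checked directly. This is a minimal sketch assuming the standard convention that an ORF has three nucleotides per amino acid plus one three-nucleotide stop codon; the function name is illustrative.

```python
def orf_length_nt(n_amino_acids: int) -> int:
    """Nucleotides in an open reading frame: 3 per amino acid plus a 3-nt stop codon."""
    return 3 * (n_amino_acids + 1)

H1A_PROTEIN_AA = 215      # histone H1a protein length from the text
H1A_MRNA_NT = 781         # histone H1a mRNA length from the text
AVERAGE_PROTEIN_AA = 447  # genome-wide average from the text

h1a_orf = orf_length_nt(H1A_PROTEIN_AA)
h1a_untranslated = H1A_MRNA_NT - h1a_orf   # what is left for the 5' and 3' UTRs
average_orf = orf_length_nt(AVERAGE_PROTEIN_AA)

print(f"H1a ORF: {h1a_orf} nt, untranslated mRNA: {h1a_untranslated} nt")
print(f"Average ORF: {average_orf} nt")
```

The 215-residue protein gives the 648 nt reading frame quoted above, leaving 133 nt of the 781 nt mRNA untranslated.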
Noncoding DNA:
– Defined as all of the DNA sequences that are not found within
protein-coding exons and so are never represented within
the sequence of proteins
– Accounts for >98% of the genome
– Plays a role in
• Regulation of gene expression
• Organization of chromosome architecture
• Controlling epigenetic inheritance
• Includes:
– Genes for noncoding RNA (e.g. tRNA and rRNA)
– Untranslated components of protein-coding genes
• e.g. introns (within most protein-coding genes of the human genome, the length of
intron sequences is 10- to 100-times the length of exon sequences)
• 5′- and 3′-UTRs of mRNA
– Regulatory DNA sequences
• Some play a role in gene expression, e.g. promoters, enhancers, silencers, operators
• Some regulate structural features of the chromosomes, e.g. telomeres,
centromeres
• Origins of replication
– Sequences related to mobile genetic elements
– Pseudogenes
– Repetitive DNA sequences
• Tandem repeats
• Interspersed repeats
– Sequences for which as yet no function has been elucidated
ncRNA (transcribed noncoding regions):
•Several regions are transcribed into functional noncoding RNA
•Human genome contains genes encoding several ncRNAs:
tRNA, rRNA, miRNA, siRNAs, snRNAs, snoRNAs, lncRNAs
•Contributes to epigenetics, transcription, RNA processing and
the translational machinery
– Regulates the expression of protein-coding genes
– Regulates splicing
– Regulates mRNA translation and stability
– Regulates chromatin structure (including histone modifications)
– Regulates DNA methylation
– Regulates DNA recombination
– Cross-regulates other noncoding RNAs
– No role (this transcription may be the product of non-specific RNA polymerase
activity)
Repetitive sequences:
•Microsatellites
– Tandem repeats of short units (fewer than 13 nucleotides)
– e.g. dinucleotide repeat (AC)n
– Trinucleotide repeats are of particular importance, as they sometimes
occur within the coding regions of protein-coding genes and may lead
to genetic disorders. For example, Huntington's disease results
from an expansion of the trinucleotide repeat (CAG)n within the
Huntingtin gene on human chromosome 4
– Telomeres end with a microsatellite hexanucleotide repeat of the
sequence (TTAGGG)n
•Minisatellites
– Tandem repeats of longer sequences
– Arrays of repeated sequences 14–60 nucleotides long
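Tandem repeats such as (AC)n, (CAG)n and (TTAGGG)n are easy to locate with a regular expression. The sketch below is illustrative only: the function name, the `min_copies` threshold and the toy sequence are assumptions made for the example, not real genomic data or clinical cut-offs.

```python
import re

def find_tandem_repeats(seq, unit, min_copies=3):
    """Return (start, copies) for each run of at least min_copies back-to-back copies of unit."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit)) for m in pattern.finditer(seq)]

# Toy sequence (made up): a CAG run, an AC run, and a telomere-like TTAGGG run.
toy = "GGT" + "CAG" * 6 + "CCA" + "AC" * 4 + "T" + "TTAGGG" * 3

for unit in ("CAG", "AC", "TTAGGG"):
    for start, copies in find_tandem_repeats(toy, unit):
        print(f"({unit})n: n = {copies}, starting at position {start}")
```

The same function covers minisatellites if `unit` is a longer sequence; it only finds perfect repeats, whereas real repeat finders also tolerate mismatches.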
Mobile genetic elements (Retrotransposons and
Transposons)
– An abundant component in the human genome
– Played a major role in sculpting the human genome
– Some of these sequences represent endogenous retroviruses
(DNA copies of viral sequences that have become permanently
integrated into the genome and are now passed on to
succeeding generations)
– Mobile elements within the human genome can be classified into
• LTR retrotransposons
• SINEs (including Alu elements)
• LINEs
• Class II DNA transposons
Types of genome-wide repeats in human genome
(source: IHGSC 2001 / Genomes 3):
Type of repeat     Subtype             Approximate number of
                                       copies in human genome
SINEs                                  1,558,000
                   Alu                 1,090,000
                   MIR                 393,000
                   MIR3                75,000
LINEs                                  868,000
                   LINE-1              516,000
                   LINE-2              315,000
                   LINE-3              37,000
LTR elements                           443,000
                   ERV class I         112,000
                   ERV(K) class II     8,000
                   ERV(L) class III    83,000
                   MaLR                240,000
DNA transposons                        294,000
                   hAT                 195,000
                   Tc-1                75,000
                   PiggyBac            2,000
                   Unclassified        22,000
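A quick consistency check on the repeat table is to confirm that the subtype counts add up to each class total. The sketch below copies the numbers from the table; treating "Unclassified" as a DNA transposon subtype is an assumption, made because it is the only grouping under which the DNA transposon subtypes sum to the stated class total.

```python
# Copy numbers from the table (IHGSC 2001 / Genomes 3).
# Assumption: "Unclassified" is counted as a DNA transposon subtype.
REPEATS = {
    "SINEs": (1_558_000, {"Alu": 1_090_000, "MIR": 393_000, "MIR3": 75_000}),
    "LINEs": (868_000, {"LINE-1": 516_000, "LINE-2": 315_000, "LINE-3": 37_000}),
    "LTR elements": (443_000, {"ERV class I": 112_000, "ERV(K) class II": 8_000,
                               "ERV(L) class III": 83_000, "MaLR": 240_000}),
    "DNA transposons": (294_000, {"hAT": 195_000, "Tc-1": 75_000,
                                  "PiggyBac": 2_000, "Unclassified": 22_000}),
}

def subtype_sum(repeat_class):
    """Sum of the subtype copy numbers listed for one repeat class."""
    _, subtypes = REPEATS[repeat_class]
    return sum(subtypes.values())

for name, (total, _) in REPEATS.items():
    status = "matches" if subtype_sum(name) == total else "differs from"
    print(f"{name}: subtype sum {subtype_sum(name):,} {status} class total {total:,}")

total_copies = sum(total for total, _ in REPEATS.values())
alu_share = REPEATS["SINEs"][1]["Alu"] / total_copies
print(f"All classes: {total_copies:,} copies; Alu share: {alu_share:.1%}")
```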
The Human Reference Genome
• With the exception of identical twins, all humans show
variation in genomic DNA sequences
• The Human Reference Genome (HRG) is used as a
standard sequence reference
• There are several important points concerning the
Human Reference Genome
– The HRG is a haploid sequence. Each chromosome is
represented once
– The HRG is a composite sequence, and does not correspond to
any actual human individual
– The HRG is periodically updated to correct errors and
ambiguities
– The HRG in no way represents an "ideal" or "perfect" human
individual; It is simply a standardized representation or model
that is used for comparative purposes
Table:
•Basic information about these DNA molecules and their
gene content, based on a reference genome
•Summarizes the physical organization and gene content of
the human reference genome (that does not represent the
sequence of any specific individual)
•Data source: Ensembl genome browser release 68, July
2012
Chromosome  Length   Base pairs   Confirmed  Putative  Pseudogenes  miRNA  rRNA  snRNA  snoRNA  Misc   Centromere
            (mm)                  proteins   proteins                                           ncRNA  position (Mbp)
1           85       249,250,621  2,012      31        1,130        134    66    221    145     106    125.0
2           83       243,199,373  1,203      50        948          115    40    161    117     93     93.3
3           67       198,022,430  1,040      25        719          99     29    138    87      77     91.0
4           65       191,154,276  718        39        698          92     24    120    56      71     50.4
5           62       180,915,260  849        24        676          83     25    106    61      68     48.4
6           58       171,115,067  1,002      39        731          81     26    111    73      67     61.0
7           54       159,138,663  866        34        803          90     24    90     76      70     59.9
8           50       146,364,022  659        39        568          80     28    86     52      42     45.6
9           48       141,213,431  785        15        714          69     19    66     51      55     49.0
10          46       135,534,747  745        18        500          64     32    87     56      56     40.2
11          46       135,006,516  1,258      48        775          63     24    74     76      53     53.7
12          45       133,851,895  1,003      47        582          72     27    106    62      69     35.8
13          39       115,169,878  318        8         323          42     16    45     34      36     17.9
14          36       107,349,540  601        50        472          92     10    65     97      46     17.6
15          35       102,531,392  562        43        473          78     13    63     136     39     19.0
16          31       90,354,753   805        65        429          52     32    53     58      34     36.6
17          28       81,195,210   1,158      44        300          61     15    80     71      46     24.0
18          27       78,077,248   268        20        59           32     13    51     36      25     17.2
19          20       59,128,983   1,399      26        181          110    13    29     31      15     26.5
20          21       63,025,520   533        13        213          57     15    46     37      34     27.5
21          16       48,129,895   225        8         150          16     5     21     19      8      13.2
22          17       51,304,566   431        21        308          31     5     23     23      23     14.7
X           53       155,270,560  815        23        780          128    22    85     64      52     60.6
Y           20       59,373,566   45         8         327          15     7     17     3       2      12.5
mtDNA       0.0054   16,569       13         0         0            0      2     0      0       22     N/A

Chromosome length: Estimated by multiplying the number of base pairs by 0.34
nanometers, the distance between base pairs in the DNA double helix
Number of proteins: Based on the number of initial precursor mRNA transcripts, and
does not include products of alternative pre-mRNA splicing, or modifications to protein
structure that occur after translation
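The length column of the table is derived from the base-pair counts by multiplying by 0.34 nm per base pair. A minimal sketch of that conversion follows; the function name and the selection of rows are illustrative, and rounding to the nearest millimetre is an assumption about how the table values were produced.

```python
NM_PER_BP = 0.34   # distance between base pairs in the DNA double helix
NM_PER_MM = 1e6    # nanometres in one millimetre

def length_mm(base_pairs: int) -> float:
    """Estimated stretched-out length of a DNA molecule in millimetres."""
    return base_pairs * NM_PER_BP / NM_PER_MM

# Base-pair counts copied from the table for a few chromosomes.
BASE_PAIRS = {
    "1": 249_250_621,
    "19": 59_128_983,
    "21": 48_129_895,
    "X": 155_270_560,
}

for chromosome, bp in BASE_PAIRS.items():
    print(f"Chromosome {chromosome}: {length_mm(bp):.1f} mm")
```

Rounded to whole millimetres, these reproduce the tabulated lengths for chromosomes 1, 19, 21 and X.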

Pseudogenes: The olfactory receptor gene family is one of the best-documented
examples of pseudogenes in the human genome. More than 60% of the genes in
this family are non-functional pseudogenes in humans
Organization of human genome
Human genome (~3 x 10^9 bp)
– Genes and gene-related sequences (1,200 Mb)
  • Genes (48 Mb)
  • Related sequences (1,152 Mb): pseudogenes, gene fragments, introns, UTRs
– Intergenic DNA (2,000 Mb)
  • Interspersed repeats (1,400 Mb): LINEs (640 Mb), SINEs (420 Mb),
    LTR elements (250 Mb), DNA transposons (90 Mb)
  • Other intergenic regions (600 Mb): microsatellites (90 Mb), others (510 Mb)
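The organization chart can be written as a nested structure, which makes it easy to check that the parts add up and to express each component as a share of the genome. This is a sketch using only the megabase figures from the chart; the dictionary layout and function name are illustrative.

```python
# Sizes in Mb, copied from the organization chart.
GENOME_MB = {
    "Genes and gene-related sequences": {
        "Genes": 48,
        "Related sequences": 1152,  # pseudogenes, gene fragments, introns, UTRs
    },
    "Intergenic DNA": {
        "Interspersed repeats": {
            "LINEs": 640,
            "SINEs": 420,
            "LTR elements": 250,
            "DNA transposons": 90,
        },
        "Other intergenic regions": {"Microsatellites": 90, "Others": 510},
    },
}

def total_mb(node):
    """Recursively sum the leaf sizes under a node of the chart."""
    if isinstance(node, int):
        return node
    return sum(total_mb(child) for child in node.values())

genome_total = total_mb(GENOME_MB)
genes_percent = 100 * GENOME_MB["Genes and gene-related sequences"]["Genes"] / genome_total
repeats_percent = 100 * total_mb(GENOME_MB["Intergenic DNA"]["Interspersed repeats"]) / genome_total

print(f"Total: {genome_total:,} Mb")
print(f"Genes: {genes_percent:.1f}% of the genome")
print(f"Interspersed repeats: {repeats_percent:.1f}% of the genome")
```

The 48 Mb of genes come to 1.5% of the total, consistent with the statement that protein-coding sequence accounts for less than 2% of the genome.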
• Comparative genomics studies of mammalian genomes
suggest that ~5% of human genome has been conserved by
evolution since the divergence of existing lineages ~200
million years ago, containing a vast majority of genes
• The genomes of individual humans differ from one another on the
order of 0.1%
• The genome of the chimpanzee (the closest living relative) differs
from the human genome by 1.23%
– The considerable differences between humans and chimps may be due as
much to genome-level variation in the number, function and expression
of genes as to DNA sequence changes in shared genes
– About one-third of human genes have exactly the same protein translation as
their chimpanzee orthologs
– A major difference between the two genomes is human chromosome 2,
which is equivalent to a fusion of chimpanzee chromosomes 12 and 13
(later renamed 2A and 2B, respectively)
Human Genome Project (HGP)

• An international scientific research project
• Goals:
– Determining the sequence of base pairs which make up human
DNA
– Identifying and mapping all of the genes of the human genome
from both a physical and functional standpoint
• Thousands of human genomes have been completely
sequenced, and many more have been mapped at lower
levels of resolution
• It is the largest collaborative biological project
Contributors
• The Human Genome Project originally aimed to map the
nucleotides contained in a human haploid reference genome
(more than three billion). Contributors included:
– US Department of Energy’s Office of Health and Environmental Research
– US National Institutes of Health (NIH) National Human Genome Research
Institute (Dr. James Watson)
– Celera Genomics (Dr. Craig Venter)
• Several groups have announced efforts to extend this to
diploid human genomes (the two copies of genetic information
derived from the two parents are not identical copies of the
haploid genome), including:
– The International HapMap Project
– Applied Biosystems
– Perlegen
– Illumina
– J. Craig Venter Institute
– Personal Genome Project
– Roche-454
Shotgun Approach
Genome broken into smaller pieces
(~150,000 bp long)
↓
DNA fragments ligated into BACs
(bacterial artificial chromosomes, vectors derived from
genetically engineered bacterial chromosomes)
↓
Vectors containing the fragments inserted into bacteria
(replication; multiple copies)
↓
Inserts sequenced separately as small "shotgun" projects
and then assembled
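The last step, assembling one insert from its shotgun reads, can be illustrated with a toy greedy overlap assembler. This is a sketch under strong simplifying assumptions, not the software used by the HGP: the 30 bp "insert" is made up and contains no repeated 4-mers, reads are cut at regular offsets rather than sheared randomly, and there are no sequencing errors or reverse-strand reads. All function names are illustrative.

```python
import random

def make_reads(insert, read_len=10, step=4):
    """Cut overlapping reads at regular offsets (toy stand-in for random shearing)."""
    return [insert[i:i + read_len] for i in range(0, len(insert) - read_len + 1, step)]

def overlap(a, b, min_len=4):
    """Length of the longest suffix of a that is a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of contigs with the longest suffix-prefix overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

INSERT = "ATGGCGTACGCTTGACCTAGGATCCGAAAT"  # made-up sequence, no repeated 4-mers
reads = make_reads(INSERT)
random.Random(0).shuffle(reads)  # the original order of the reads is unknown to the assembler

contigs = greedy_assemble(reads)
print(contigs)
```

Because the toy insert has no repeated 4-mers, every overlap of four or more bases is a genuine one and the reads collapse back into the original sequence. Real genomes are full of repeats, which is one reason the BAC-by-BAC strategy first confines each assembly to a ~150,000 bp insert.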
• The Human Genome Project was declared complete in
April 2003
– An initial rough draft of the human genome was available in June
2000
– By February 2001 a working draft was completed and published
– The final sequence mapping of the human genome was announced
on April 14, 2003
– Although this was reported to cover 99% of the human genome with
99.99% accuracy, a major quality assessment of the human
genome sequence published on May 27, 2004 indicated that
over 92% of sampling exceeded 99.99% accuracy, which is
within the intended goal
• Further analyses and papers on the HGP continue to
appear
Key findings
• Key findings of the draft (2001) and complete (2004)
genome sequences include:
– There are ~20,000 genes in human beings, the same range as
in mice
– Understanding how these genes express themselves will
provide clues to how diseases are caused
– The human genome has significantly more segmental
duplications (nearly identical, repeated sections of DNA) than
other mammalian genomes. These sections may underlie the
creation of new primate-specific genes
– Fewer than 7% of protein families appeared to be vertebrate-
specific
• Although the sequence of the human genome has been
(almost) completely determined by high-throughput DNA
sequencing and bioinformatics approaches, it is not yet
fully understood; much work still needs to be done to
further elucidate the biological functions of its protein
and RNA products
Applications

• The resulting data are used worldwide in biomedical
science, molecular medicine, advances in the diagnosis
and treatment of diseases, anthropology, forensics and
human evolution studies
