Origin of Genomes

Genome
• The genome of an organism can be defined as ‘the total
DNA content of the cell’
• Entire genetic complement of an organism
• It contains all the genetic information required to direct
the growth and development of the organism
• The term was coined in 1920 by Hans Winkler, Professor
of Botany at the University of Hamburg, Germany
• The Oxford English Dictionary suggests the name to be
a blend of the words gene and chromosome
Types of genomes
• Prokaryotic genomes
• Eukaryotic genomes
– Nuclear genomes
– Mitochondrial genomes
– Chloroplast genomes (Plastome)
• Additionally, the genome can comprise non-chromosomal genetic
elements such as viruses, plasmids and transposable elements
• The term genome can refer specifically to the information
stored in a complete set of nuclear DNA (i.e., the nuclear genome),
but it can also refer to the information stored within organelles that
contain their own DNA, as with the mitochondrial genome or
plastome
Genome composition
• Genome composition describes the makeup of the
contents of a haploid genome (a single copy of all the
genetic information present in the nucleus)
• Should include in detail:
– Genome size
– Proportions of non-repetitive DNA and repetitive DNA
• By comparing the genome compositions between
genomes, we can better understand the evolutionary
history of a given genome
Prokaryotes:
•Small genome size
•85-90% of the genome is non-repetitive DNA, which
means it consists mainly of coding DNA
Eukaryotes:
•Larger genome size
•Have the feature of exon-intron organization of protein
coding genes
•The variation of repetitive DNA content in eukaryotes is
also extremely high
•In mammals and plants, the major part of the genome is
composed of repetitive DNA
Genome size and complexity
• Genome size is the total number of DNA base pairs in
one copy of a haploid genome (units: bp, kbp, Mbp)
• There is a notable increase in genome complexity from
prokaryotes to multicellular eukaryotes
• The increase in genome size and complexity during
evolution from prokaryotes to single-celled eukaryotes
and further to multicellular eukaryotes is due to:
– Increased intracellular structural variety
– Cell differentiation and specialization
• The changes from prokaryotes to multicellular
eukaryotes include
– Gradual increases in gene number, resulting from the
retention of duplicate genes
– Expansion in the size and number of intragenic
spacers (introns)
– Dramatic proliferation of mobile genetic elements
– Increase in repetitive DNA sequences
Reasons for increasing genomic complexity
Increasing genomic complexity over evolutionary time is
due to:
• Introns and exons
• Mobile genetic elements
• Repetitive DNA sequences
Introns
• Origin unknown; introns probably arose in the single common
ancestor of eukaryotes
• Average of 4-9 introns per gene in multicellular organisms
• Average of 2 introns per gene in unicellular eukaryotes
• Virtually none have been found in prokaryotes
Repetitive non-coding DNA sequences
• In prokaryotes, genes are packed tightly together with
very little non-coding DNA being present
– E. coli, a prokaryote, has almost exclusively non-repetitive DNA
• Lower eukaryotes, such as C. elegans and the fruit fly, still
possess more non-repetitive DNA than repetitive DNA
• Higher eukaryotes contain a large amount of repetitive
non-coding DNA
– In some plants and amphibians, the proportion of
non-repetitive DNA is no more than 20%, making it a
minority component
Non-repetitive DNA: protein-coding sequences and RNA-encoding genes
• The proportion of repetitive DNA is calculated as the
total length of repetitive DNA divided by the genome size
• There are two categories of repetitive DNA in genome:
– Tandem repeats
– Interspersed repeats
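The proportion defined above can be sketched as a small calculation. This is only a minimal illustration: the function name `repeat_fraction` and the example lengths are hypothetical and are not taken from the slides.

```python
def repeat_fraction(repetitive_bp, genome_size_bp):
    """Return the fraction of a genome made up of repetitive DNA."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    if not 0 <= repetitive_bp <= genome_size_bp:
        raise ValueError("repetitive length must lie between 0 and the genome size")
    return repetitive_bp / genome_size_bp


# Hypothetical genome: 100 Mbp in total, of which 12 Mbp is annotated as repeats.
example = repeat_fraction(12_000_000, 100_000_000)
print(f"{example:.0%} repetitive")
```

Both lengths must be in the same unit (here bp) and refer to one haploid genome copy.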
• Tandem repeats are usually caused by
– Slippage during replication
– Unequal crossing-over
• Examples: microsatellites [repeat units shorter than 13 nt; e.g.
(CAG)n, telomeric repeats] and minisatellites [repeat units
14-60 nt long]
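A tandem repeat such as (CAG)n is simply the same unit repeated head to tail, so a run can be located with a regular expression. The sketch below is illustrative only: the function name `find_tandem_runs`, the `min_copies` threshold and the example sequence are all assumptions made for the demonstration.

```python
import re


def find_tandem_runs(sequence, motif, min_copies=3):
    """Return (start, copies) for each run of `motif` repeated at least `min_copies` times."""
    # (?:motif){n,} matches n or more back-to-back copies of the motif.
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_copies))
    runs = []
    for match in pattern.finditer(sequence):
        copies = (match.end() - match.start()) // len(motif)
        runs.append((match.start(), copies))
    return runs


# Made-up sequence with one run of four CAG units and one run of two.
seq = "TTACAGCAGCAGCAGGGTCAGCAGTT"
print(find_tandem_runs(seq, "CAG"))                # runs with at least 3 copies
print(find_tandem_runs(seq, "CAG", min_copies=2))  # runs with at least 2 copies
```

Real repeat annotation has to handle imperfect copies and unknown motifs, which this exact-match sketch does not attempt.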
• Account for a significant proportion of the genome
• In mammalian genomes, the largest proportion is made up of
interspersed repeats
• Interspersed repeats include:
– Retrotransposons (main contributors to genome
evolution in higher eukaryotes)
– Some protein-coding gene families
– Pseudogenes
• Transposable elements (TEs) can be classified into two categories:
– Class 1 (retrotransposons)
– Class 2 (DNA transposons)
• Retrotransposons
– Are transcribed into RNA, which is reverse
transcribed into cDNA and then inserted at another site
in the genome, leading to duplication
– Divided into:
• Long terminal repeat (LTR) retrotransposons
• Non-long terminal repeat (non-LTR) retrotransposons
• Long Terminal Repeats (LTRs)
– Similar to retroviruses: they carry the gag and pol
genes, which make cDNA from RNA and supply the proteins needed
to insert it into the genome, but they can act only within the cell
because they lack the env gene found in retroviruses
– Form the largest fraction of most plant genomes
– Account for the huge variation in genome size
• Non-Long Terminal Repeats (Non-LTRs)
– Widely spread in eukaryotic genomes
– Can be divided into
• Long interspersed nuclear elements (LINEs)
• Short interspersed nuclear elements (SINEs)
• Long interspersed nuclear elements (LINEs)
– Are able to encode two Open Reading Frames (ORFs) to
generate reverse transcriptase and endonuclease, which
are essential in retrotransposition
– The human genome has around 500,000 copies of LINE-1, making up
around 17% of the genome
– Examples: LINE-1, LINE-2, LINE-3
• Short interspersed nuclear elements (SINEs)
– Are usually shorter than 500 base pairs and rely on
the LINE machinery to move, functioning as nonautonomous
retrotransposons
– The Alu element is the most common SINE found in
primates; it is about 300 base pairs long and
makes up about 11% of the human genome, with more than
1,000,000 copies
• DNA transposons
– Generally move by "cut and paste" in the genome, but
duplication has also been observed
– Class 2 TEs do not use RNA as intermediate
– Common in bacteria
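The difference between the two TE classes can be pictured with a toy model in which a genome is just an ordered list of labelled segments. This is a deliberately simplified sketch: the function names, the list representation and the segment labels are hypothetical, and no RNA or enzyme step is modelled.

```python
def copy_and_paste(genome, source_index, target_index):
    """Class 1 style: the element stays in place and a new copy is inserted."""
    new_genome = list(genome)
    new_genome.insert(target_index, genome[source_index])
    return new_genome


def cut_and_paste(genome, source_index, target_index):
    """Class 2 style: the element is excised, then re-inserted elsewhere."""
    new_genome = list(genome)
    element = new_genome.pop(source_index)
    # target_index refers to a position in the genome after excision.
    new_genome.insert(target_index, element)
    return new_genome


genome = ["geneA", "TE", "geneB", "geneC"]
after_copy = copy_and_paste(genome, 1, 3)
after_cut = cut_and_paste(genome, 1, 3)
print(after_copy, after_copy.count("TE"))  # copy number rises to 2
print(after_cut, after_cut.count("TE"))    # copy number stays at 1
```

In this model only the copy-and-paste route lengthens the genome, which is why retrotransposons are tied to growth in genome size.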
• Protein coding gene families (Multigene families)
– Common components of all genomes
– An example of multigene families of nonidentical genes is the two
related families of genes that encode globins (α and β)
• Pseudogenes
– Inactive copies of protein-coding genes
– Often generated by gene duplication, followed by loss of
function through the accumulation of inactivating
mutations
– Example: the olfactory receptor gene family is one of the
best-documented examples of pseudogenes in the human
genome; more than 60% of the genes in this family are
non-functional pseudogenes in humans
C-Value
• The C-value is the measure of genome size, typically
expressed in base pairs of DNA per haploid genome
• The term haploid genome refers to a single copy of all
the genetic information present in the nucleus
• Diploid nuclei of sexually reproducing organisms
contain two complete, and not quite identical, copies of a
haploid genome, each derived from one of the parents
• There is enormous variation in the range of C-values,
from <10^6 bp for a mycoplasma to >10^11 bp for some
plants and amphibians
C-values of a range of different organisms (gene numbers are approximate predicted values):
Organism                                Genome size (bp)   Chromosome no. (n)   Predicted no. of genes
Mycoplasma genitalium                   ~10^6              1                    500
Escherichia coli K12                    4 x 10^6           1                    4,500
Saccharomyces cerevisiae (yeast)        12 x 10^6          16                   6,000
Caenorhabditis elegans (worm)           97 x 10^6          6                    20,000
Drosophila melanogaster (fly)           180 x 10^6         4                    14,000
Oryza sativa (rice)                     466 x 10^6         12                   40,000
Arabidopsis thaliana (weed)             119 x 10^6         5                    28,000
Fugu rubripes (pufferfish, vertebrate)  390 x 10^6         22                   25,000
Mouse                                   2,500 x 10^6       20                   25,000
Humans                                  3,300 x 10^6       23                   25,000
• DNA content of genomes:
Genome              Genome size or C-value (bp)
Mycoplasma          ~10^6
Bacteria            >10^6 to <10^7
Fungi               >10^7 to <10^8
Algae               >10^7 to <10^8
Worms               ~10^8
Insects             >10^8 to <10^10
Fish                <10^9 to 10^10
Amphibians          <10^9 to >10^11
Reptiles            >10^9 to <10^10
Birds               ~10^9
Mammals             >10^9 to <10^10
Flowering plants    >10^8 to <10^11
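One rough way to compare how widely C-values vary within each group is to count the orders of magnitude each range spans. The sketch below treats the powers of ten in the table above as approximate lower and upper limits, which is an assumption, since the table gives open-ended bounds such as "<10^9". Groups listed with a single approximate value (Mycoplasma, worms, birds) are left out, and the names used are hypothetical.

```python
# (lower exponent, upper exponent): a C-value range of roughly 10^lower to 10^upper bp.
# Open-ended bounds such as "<10^9" or ">10^11" are approximated by the stated power.
C_VALUE_EXPONENTS = {
    "Bacteria": (6, 7),
    "Fungi": (7, 8),
    "Algae": (7, 8),
    "Insects": (8, 10),
    "Fish": (9, 10),
    "Amphibians": (9, 11),
    "Reptiles": (9, 10),
    "Mammals": (9, 10),
    "Flowering plants": (8, 11),
}


def orders_of_magnitude(lower_exp, upper_exp):
    """Number of powers of ten between the two bounds."""
    return upper_exp - lower_exp


spans = {group: orders_of_magnitude(*bounds) for group, bounds in C_VALUE_EXPONENTS.items()}
for group, span in sorted(spans.items(), key=lambda item: item[1], reverse=True):
    print(f"{group:<18}{span} order(s) of magnitude")
```

Because the bounds are coarse, this only separates the wide-ranging groups (insects, amphibians, flowering plants) from the rest; it says nothing about finer differences within a group.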
• Predictions:
– As genome complexity increases, C-value should
increase
– C-values within a family should be similar, as its members
perform the same types of functions
– As C-value increases, the number of genes should
increase
1. As genome complexity increases, C-value should increase
Prediction true:
Prokaryote (E. coli) → Single-celled eukaryote (yeast) → Invertebrate (Drosophila) → Vertebrate (human)
Prediction incorrect:
Homo sapiens (~3 x 10^9 bp) and Xenopus laevis (3.1 x 10^9 bp) have genomes of nearly the same size
Flowering plants and insects have overlapping ranges of DNA content
2. C-values within a family should be similar, as its members
perform the same types of functions
Prediction true:
Fungi, algae, worms, fish, reptiles, birds and mammals have
similar amounts of DNA within the family/phylum
Prediction incorrect:
Insects, amphibians and flowering plants have wide ranges of
C-value
3. As C-value increases, the number of genes should increase
Prediction true:
E. coli → Yeast → Drosophila → Human
Monocots (e.g. Oryza) have more genes than
dicots (e.g. Arabidopsis)
Prediction incorrect:
The variation in genome size between E. coli and human is
~700X, but the variation in the number of genes is only ~7X
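The mismatch between the two ratios can be checked directly from the rounded table values for E. coli K12 (4 x 10^6 bp, 4,500 genes) and humans (3,300 x 10^6 bp, 25,000 genes). With these rounded inputs the ratios come out near 825X and 5.6X, the same order as the ~700X and ~7X quoted above; the variable names are illustrative.

```python
# Rounded values copied from the C-value table in these slides.
genome_size_bp = {"E. coli K12": 4e6, "Human": 3300e6}
predicted_genes = {"E. coli K12": 4500, "Human": 25000}

size_ratio = genome_size_bp["Human"] / genome_size_bp["E. coli K12"]
gene_ratio = predicted_genes["Human"] / predicted_genes["E. coli K12"]

print(f"genome size ratio: ~{size_ratio:.0f}X")
print(f"gene number ratio: ~{gene_ratio:.1f}X")
```

Genome size grows by roughly two orders of magnitude more than gene number, which is the observation the C-value paradox describes.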
C-Value Paradox
• One surprising outcome of analyzing the C-value from
different organisms is the so-called ‘C-value paradox’
• Describes the lack of relationship between the DNA
content (C-value) of an organism and its coding potential
• Refers to the fact that genome sizes (and hence the
C-values) do not always correlate with genetic and/or
morphological complexity
• Means that organisms with similar complexity may have
very different genome sizes and, conversely, organisms
with similar C-values may not be equally complex
• Means that a larger genome does not necessarily
contain more genes
• Examples of C-value paradox (facts):
– The toad Xenopus and humans have genomes of essentially the same
size, yet we assume that humans are more complex in terms of
genetic development
– In some phyla there are extremely large variations in DNA
content between organisms that do not vary much in complexity.
This is especially marked in insects, amphibians and plants
• A cricket has a genome 11X the size of that of a fruit fly
• In amphibians, the smallest genomes are <10^9 bp, while the
largest are >10^11 bp
• There is unlikely to be a large difference in the number of
genes needed to specify these amphibians
– This variation does not occur in birds, reptiles and
mammals, which all show little variation within the group, with an
~2X range of genome sizes
• Genome size cannot be used as a predictor of genetic or
morphological complexity (there is no consistent correlation between
genome size and genetic complexity)
– The organism with the largest genome is not necessarily the most
complex
– Genome size is positively correlated with morphological
complexity among prokaryotes and lower eukaryotes (an exception to
the C-value paradox)
– However, from mollusks onward through the higher eukaryotes,
this correlation no longer holds
• A bigger genome does not mean more genes
• The proportion of non-repetitive DNA may decrease
as genome size increases in higher eukaryotes
• Genome sizes of organisms may vary widely within many phyla
At present we are unable to explain why natural selection allows this
variation and whether it has evolutionary consequences
Genome size, gene number, introns and repetitive DNA (gene numbers are approximate predicted values; "-" means no value given):
Organism                      Genome size (bp)   Predicted no. of genes   Average no. of introns per gene   % of DNA that is repetitive
Mycoplasma genitalium         ~10^6              500                      0                                 0
Escherichia coli K12          4 x 10^6           4,500                    0                                 <1
Saccharomyces cerevisiae      12 x 10^6          6,000                    0.04                              3.4
Caenorhabditis elegans        97 x 10^6          20,000                   5                                 6.3
Drosophila melanogaster       180 x 10^6         14,000                   3                                 12
Oryza sativa                  466 x 10^6         40,000                   -                                 42
Arabidopsis thaliana (weed)   119 x 10^6         28,000                   3                                 -
Fugu rubripes                 390 x 10^6         25,000                   5                                 2.7
Mouse                         2,500 x 10^6       25,000                   -                                 -
Humans                        3,300 x 10^6       25,000                   6                                 46
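Gene density (genes per Mbp) makes the pattern in this table easier to see: as genomes grow, genes become sparser. The sketch below uses the genome sizes and gene counts from the table; treating the "~10^6" entry for Mycoplasma genitalium as exactly 10^6 bp is an assumption, and the names `TABLE` and `genes_per_mbp` are illustrative.

```python
# (organism, genome size in bp, predicted number of genes), copied from the table above.
TABLE = [
    ("Mycoplasma genitalium", 1e6, 500),  # "~10^6" taken as 1e6 (assumption)
    ("Escherichia coli K12", 4e6, 4500),
    ("Saccharomyces cerevisiae", 12e6, 6000),
    ("Caenorhabditis elegans", 97e6, 20000),
    ("Drosophila melanogaster", 180e6, 14000),
    ("Oryza sativa", 466e6, 40000),
    ("Arabidopsis thaliana", 119e6, 28000),
    ("Fugu rubripes", 390e6, 25000),
    ("Mouse", 2500e6, 25000),
    ("Humans", 3300e6, 25000),
]


def genes_per_mbp(genome_size_bp, gene_count):
    """Predicted genes per million base pairs of haploid genome."""
    return gene_count / (genome_size_bp / 1e6)


density = {name: genes_per_mbp(size, genes) for name, size, genes in TABLE}
for name, value in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<26}{value:8.1f} genes per Mbp")
```

On these figures E. coli carries over a thousand genes per Mbp, while mouse and human carry about ten or fewer, even though their predicted gene counts differ by less than sixfold.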

• Add ‘Duplication’ and ‘Lateral gene transfer’ from
‘Acquisition of new genes’
