Chapter I

Genome Science (Genomics)
Dereje Beyene (PhD)

College of Natural Science
Microbial Cellular and Molecular Biology
Department
Nov, 2013/14
Figure X: DNA is a double helix. (A) Francis Crick (left) and James Watson (right)
proposed that the DNA molecule has a double-helical structure. (B) Biochemists can now
pinpoint the position of every atom in a DNA molecule. To see that the essential features
of the original Watson-Crick model have been verified, follow with your eyes the
double-helical chains of sugar-phosphate groups and note the horizontal rungs of the bases.
Definition of the term

Genomics
The term genomics is derived from the term genome.
The term genomics was used for the first time in 1986, when the sequencing and
mapping of the entire human genome were being initiated.
Genomics: the field of genome studies. It includes intensive efforts to determine
the sequence of DNA and RNA using high-throughput sequencing strategies; generating
fine-scale genetic maps; using microchip arrays; collecting genome variation within
a population (e.g. the 1000 Genomes Project); ascertaining the transcriptional control
of genes; and employing digital technology and computationally intensive analysis to
understand the structure, function and evolution of diverse genomes.
Genomics (cont.)
Genomics provides essential tools to speed up the work of the
forward geneticists and is now a scientific discipline in its own right.
Application of genomics science in all areas of biology has
lowered the barriers that once separated the plant, animal and microbial
research communities.
Experimental techniques for studying transcriptomes and proteomes are providing
novel insights into genome expression, and the new discipline of systems biology
is linking genome biology with ...
The investigation of the roles and functions of single genes is a primary focus
of molecular biology and genetics and is a common topic of modern medical and
biological research.
Research on single genes does not fall under the definition of genomics unless
the aim of the genetic, pathway, and functional analysis is to elucidate the gene's
effect on, place in, and response to the networks of the entire genome.
A genome is the sum total of all of an individual organism's genes.
Thus, genomics is the study of all the genes of a cell or tissue at the DNA
(genotype), mRNA (transcriptome), or protein (proteome) level.
Genome: The entire genetic complement of a living organism
Gene: A DNA segment containing biological information and hence coding for an
RNA and/or polypeptide molecule
Genome expression: The series of events by which the biological information
carried by the genome is released and made available to the cell
The value of a genome sequence lies in its annotation:
Genome annotation: Characterizing genomic features using computational and
experimental methods (functional genomics)
Genes: four levels of annotation
Gene prediction: where are the genes? What do they look like?
A genome sequence can tell us:
Much about the organism's life
Its developmental program
Which genes are responsible for disease resistance or susceptibility
Novel genes (gene discovery)
Where we are going and where we came from (evolution)
How similar we are to apes, trees, and yeast (comparative genomics)
The minimum genome size of free-living organisms, which can then be exploited to
design a minimal synthetic genome that can support life (a cornerstone of
Synthetic Biology)
The Basic Scheme of Genome Annotation and Delivery Pipeline
Fig. XX. Bioinformatics uses information technology to manage and analyze
information generated by the life sciences (analysis pipeline, delivery pipeline,
outcome; STS: sequence-tagged site).
MAJOR RESEARCH AREAS OF GENOMICS

I) Bacteriophage Genomics
Bacteriophages have played and continue to play a key role in bacterial genetics
and molecular biology, e.g. as cloning vectors (M13 vector, ...).
Bacteriophage genome sequences can be obtained through direct sequencing of
isolated bacteriophages, but can also be derived as part of microbial genomes
(How?).
Analysis of bacterial genomes has shown that a substantial amount of microbial
DNA consists of prophage sequences and prophage-like elements.
Detailed database mining of these sequences offers insights into the role of
prophages in shaping the bacterial genome.
N.B: Bacteriophage genomes are especially mosaic: the genome of any one phage
species appears to be composed of numerous individual modules. These modules may
be found in other phage species in different arrangements.
II) Cyanobacteria Genomics
Cyanobacteria are prokaryotic organisms that have served as important model
organisms for studying oxygenic photosynthesis and have played a significant role
in the Earth's history as primary producers of atmospheric oxygen.
IV) Plant Genomics

Recent technological advancements have substantially expanded our
ability to analyze and understand plant genomes and to reduce the
gap existing between genotype and phenotype.
The fast-evolving field of genomics allows scientists to analyze
thousands of genes in parallel, to understand the genetic
architecture of plant genomes and also to isolate the genes
responsible for mutations.
Whole plant genomes can now be sequenced; about 33 plant species are
available at http://phytozome.net/
Fig. Phylogenetic tree of sequenced plant genomes, up to date as of September 4th 2012
Source: http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes
III) Human genomics

The Human Genome Project (HGP, initiated in 1990) is an international scientific
research project with the primary goal of determining the sequence of the chemical
base pairs (nitrogenous bases: purines (A & G) and pyrimidines (T & C)) which make
up DNA, and of identifying and mapping the approximately 20,000-25,000 genes of the
human genome from both a physical and a functional standpoint.
Display of the results of the project required significant bioinformatics
resources. The sequence of the human reference assembly can be explored using the
UCSC Genome Browser or Ensembl.
The estimated size of the human genome is ...
Fig. Organization of the Human Genome
Scope of Genomics (cont.)
V) Metagenomics
Definition: Metagenomics (Environmental Genomics or Community Genomics) is the
study of genomes recovered from environmental samples without the need for
culturing the organisms.
Metagenomics data are processed using bioinformatics tools.
Metagenomics is a powerful tool for revealing the diversity of microscopic life
that was previously hidden from culture-based studies.
Thus, it offers a powerful lens for viewing the microbial world and has the
potential to revolutionize our understanding of microscopic organisms
(e.g. bacteria and fungi).
Fig. Scheme of the major stages of an integrative metagenomic ecosystem study of
microbial ecology
Examining phylogenetic diversity using the hypervariable regions of 16S rRNA (Bacteria)
Fig. Map of the rRNA operon of bacteria
Why 16S rRNA for metagenomics of Bacteria (Eubacteria and Archaea)?

Conserved regions of the gene are nearly identical across bacteria, while the
variable regions contain specific sites unique to individual bacteria.
The uniqueness of the variable regions (V1 to V9) enables taxonomic positioning
and identification of bacteria.
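The contrast between conserved and variable regions can be illustrated with a small sketch. It assumes the sequences are already aligned to equal length; the function names and the toy alignment are hypothetical and are not real 16S rRNA data.

```python
from collections import Counter

def column_conservation(aligned_seqs):
    """For each alignment column, return the fraction of sequences
    that share the most common base in that column."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        column = [s[i] for s in aligned_seqs]
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

def variable_positions(aligned_seqs, threshold=1.0):
    """Indices of columns whose conservation is below the threshold."""
    scores = column_conservation(aligned_seqs)
    return [i for i, c in enumerate(scores) if c < threshold]

# Hypothetical toy alignment: only column 3 differs between sequences.
toy_alignment = ["ACGTACGT",
                 "ACGAACGT",
                 "ACGCACGT"]
print(variable_positions(toy_alignment))
```

Columns with a score of 1.0 behave like the conserved regions targeted by universal primers; columns with lower scores behave like the variable regions used for identification.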
Universal 16S rRNA Primers for Metagenomics and/or
Species Identification of Microbial Isolates
Importance of PCR primer pair selection
Broad-range primers used in the PCR reaction preceding the
sequencing target the conserved regions of the 16S rRNA gene
in order to amplify, without selection, all bacterial DNA present in the
sample.
Therefore, critical evaluation of the primer pair coverage is a
prerequisite for unbiased and comprehensive sequence
information.
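One way to think about primer coverage is the fraction of template sequences that contain a binding site for a (possibly degenerate) primer. The sketch below is an illustration under simplifying assumptions: it uses the standard IUPAC nucleotide codes, scans the forward strand only, and the primer and templates are hypothetical toy sequences rather than published 16S primers.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_matches(primer, site, max_mismatches=0):
    """True if the primer matches a site of equal length within the mismatch budget."""
    if len(primer) != len(site):
        return False
    mismatches = sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])
    return mismatches <= max_mismatches

def primer_binds(primer, template, max_mismatches=0):
    """True if any window of the template matches the primer (forward strand only)."""
    n = len(primer)
    return any(primer_matches(primer, template[i:i + n], max_mismatches)
               for i in range(len(template) - n + 1))

def coverage(primer, templates, max_mismatches=0):
    """Fraction of templates that contain at least one binding site."""
    if not templates:
        return 0.0
    hits = sum(primer_binds(primer, t, max_mismatches) for t in templates)
    return hits / len(templates)

# Hypothetical primer with one degenerate position (M = A or C).
toy_primer = "GATCMTGG"
toy_templates = ["TTGATCATGGCC", "TTGATCCTGGCC", "TTGATCGTGGCC", "AAAAAAAAAAAA"]
print(coverage(toy_primer, toy_templates))      # exact matches only
print(coverage(toy_primer, toy_templates, 1))   # allowing one mismatch
```

A real evaluation would also check the reverse primer on the complementary strand and use a curated reference database; those steps are left out here.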
Microbial DNA sequence analysis pipeline
Microbial genomic DNA extraction from the samples
PCR amplification of 16S rRNA genes with the most conserved universal primers
and cloning of the products into a vector
Sequencing of the PCR products
Quality confirmation of the resulting 16S rRNA gene fragments
Removal of vector and primer sequences
Data validation and consensus building
Comparison of the sequence data with public databases
Fig. Map of the nuclear rRNA genes and their ITS regions in fungi
Fig. Primer map of the ITS1 and ITS2 regions of the nuclear rRNA of fungi
Aims of Metagenomics
Diversity patterns of microorganisms can be used for monitoring and predicting
environmental conditions and change. How? E.g. the microbiome of the human gut.
Examining genes/operons for desirable enzyme candidates (e.g. cellulases,
chitinases, lipases) and for antibiotics and other natural products.
Identified genes may be exploited for industrial or medical applications.
Examining secretory, regulatory, and signal transduction mechanisms associated
with samples or genes of interest.
Examining bacteriophage and/or plasmid sequences. These potentially influence
the diversity and structure of microbial communities.
Examining potential lateral gene transfer events. Knowledge of genome plasticity
may give us an idea of the types of selective pressure that exist for gene capture
and evolution within a habitat.
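Diversity patterns are usually summarized with a numerical index before they are compared across samples. The slides do not name a particular index; the sketch below uses the Shannon index as one common choice, with hypothetical read counts per taxon.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical read counts per taxon in two samples.
even_community = [25, 25, 25, 25]      # four equally abundant taxa
dominated_community = [97, 1, 1, 1]    # one taxon dominates

print(shannon_index(even_community))
print(shannon_index(dominated_community))
```

An even community gives a higher value than one dominated by a single taxon, so a drop in the index between time points is one signal that the community, and possibly its environment, has changed.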
Metagenomics (cont.)
Examining metabolic pathways.
This helps in designing culture media for the growth of previously uncultured
microbes.
Examining genes that predominate in a given environment compared to others.
Finally, metagenomic data and metadata can be leveraged towards designing low-
and high-throughput experiments focused on defining the roles of genes and
microorganisms in the establishment of a dynamic microbial community.
Why is Metagenomics Important?
All reasons lead to more knowledge:

Organisms can be studied directly in their environments, bypassing the need to
isolate each species.
There are significant advantages for viral metagenomics, because of the
difficulty of cultivating the appropriate host. How? E.g. bacteriophages.
It is important for designing low- and high-throughput experiments focused on
defining the roles of genes and microorganisms in the establishment of a dynamic
microbial community.
Fig. Two approaches to studying a microbial community.
Common steps: microbial community, sampling, DNA extraction, total community
genomes (DNA).
Microbial diversity approach: amplifying a single gene (e.g. 16S rRNA), then
sequencing and generating a phylogenetic tree.
Outcomes:
1. Phylogenetic snapshot of most members of the community
2. Identification of novel phylotypes
Community genomics approach: REs digest total DNA, followed by shotgun
sequencing, then assembly and annotation of genomes, giving the total gene pool
of the community.
Outcomes:
1. Identification of all gene categories
2. Discovery of new genes
3. Linking genes to particular phylotypes
Pharmacogenomics
Patients vary in their response to medication, in both efficacy and toxicity.
Inter-patient variability is due in part to polymorphisms (variants for which the
frequency of the most common allele is at most 99%, i.e. a less common allele
occurs in at least 1% of the population) in genes encoding:
I. Drug metabolizing enzymes,
II. Drug transporters, and/or
III. Drug targets (e.g. enzymes, receptors)
Thus, pharmacogenomics is a field of study that aims to elucidate the genetic
basis for differences in drug efficacy and toxicity. It uses genome-wide
approaches to identify the network of genes that govern an individual's response
to drug therapy.
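The frequency criterion for a polymorphism can be checked directly from allele counts. This is a small sketch with hypothetical counts; the 0.99 threshold follows the definition above, and the function name is illustrative.

```python
def is_polymorphic(allele_counts, max_major_freq=0.99):
    """True if the most common allele has a frequency of at most max_major_freq."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    major_freq = max(allele_counts.values()) / total
    return major_freq <= max_major_freq

# Hypothetical allele counts at two sites in a sample of 1000 chromosomes.
site_common_variant = {"A": 950, "G": 50}   # major allele frequency 0.95
site_rare_variant = {"C": 999, "T": 1}      # major allele frequency 0.999

print(is_polymorphic(site_common_variant))
print(is_polymorphic(site_rare_variant))
```

The first site counts as a polymorphism under this criterion, while the second is a rare variant.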
Fig. Population: DNA sequencing of target gene.
Better patient treatment through advanced diagnostics and personalized medicine:
diagnostic tests will guide clinical decision-making on prescribing a specific
drug, depending on whether the patient is predicted to be a responder or a
non-responder to a given medication.
Some Terminologies
1) Allelic variant: A variation in the normal sequence of a gene.
2) Genotype: The genetic makeup of an organism.
3) Genotype-phenotype correlation: The association between the presence of
genetic variation and the resulting physical characteristics or abnormality.
4) Pharmacogenetics vs. Pharmacogenomics:
Pharmacogenetics: the study of genetic variation in drug metabolizing enzymes
and its effect on drug response. It is often a study of the variations in a
targeted gene, or a group of functionally related genes.
Pharmacogenomics: the general study of the entire spectrum of genes that affect
drug behavior, i.e. a much broader investigation of genetic variation at the
level of the genome.
5) Phenotype: Observable features resulting from the expression of genes.
Drug metabolizing enzymes, DMEs (Phase I enzymes/cytochrome P450 enzymes,
e.g. CYP2D6; Phase II enzymes, e.g. N-acetyltransferases)
Drug transporters (solute carrier (SLC) and ATP-binding cassette (ABC)
transporters, e.g. organic cation transporters, OCTs, as members of the SLC
family)
Drug receptors (ligand-controlled ion channels or class 1 receptors, e.g.
glutamate receptor; G-protein coupled receptors (GPCRs) or class 2 receptors,
e.g. β-receptor; enzymatic receptors, e.g. insulin receptor; receptors
regulating gene expression, e.g. steroid hormone receptor)
Genetic variability is seen both in the area of pharmacokinetics (absorption,
distribution, metabolism and excretion) and in the area of pharmacodynamics
(drug effects).
Pharmacogenomics (cont.)
E.g. polymorphisms:
Thiopurine S-methyltransferase: monogenic traits can have a marked effect on
pharmacokinetics (drug metabolism); individuals who inherit an enzyme deficiency
must be treated with markedly different doses of the affected medications
(e.g. 5%-10% of the standard thiopurine dose).
Beta-adrenergic receptor: polymorphisms can alter the sensitivity of patients to
treatment (e.g. with beta-agonists), changing the pharmacodynamics of the drug
response.
N.B: Most drug effects are determined by the interplay of several gene products
that govern the pharmacokinetics (drug absorption, distribution, metabolism and
excretion) and pharmacodynamics (effect of the drug and mechanisms of action) of
medications.
[Pharmacokinetics may be simply defined as what the body does to the drug,
whereas pharmacodynamics may be defined as what the drug does to the body.]
The goal of pharmacogenomics research is to elucidate these polygenic
determinants of drug effect.
Research outcome of pharmacogenomics:
Provide new strategies for optimizing drug therapy based on each ...
TPMT: Thiopurine S-methyltransferase
This gene encodes the enzyme that metabolizes thiopurine drugs via S-methylation,
using S-adenosyl-L-methionine as the S-methyl donor and producing
S-adenosyl-L-homocysteine as a byproduct.
Thiopurine drugs such as 6-mercaptopurine are used as chemotherapeutic agents.
Genetic polymorphisms that affect this enzymatic activity are correlated with
variations in sensitivity and toxicity to such drugs within individuals.
Table: Summary of TPMT deficiency
Intolerance (definition in medicine): inability to withstand or consume;
inability to absorb or metabolise nutrients.
Drug intolerance:
I. Inability to continue taking, or difficulty in taking, a medication because
of an adverse side effect that is not immune-mediated
II. The state of reacting to normal pharmacologic doses of a drug with the
symptoms of overdosage
The methyl group (CH3) attached to the methionine sulfur atom in
S-adenosyl-L-methionine (SAM) is chemically reactive. This allows donation of this
group to an acceptor substrate in transmethylation reactions. More than 40
metabolic reactions involve the transfer of a methyl group from SAM to various
substrates, such as nucleic acids, proteins, lipids and secondary metabolites.
1. Thiopurine S-methyltransferase (EC 2.1.1.67)
2. Hypoxanthine phosphoribosyltransferase (EC 2.4.2.8)
http://www.kegg.jp/kegg-bin/show_pathway?hsa00983+7172
Summary
The adrenergic receptors (subtypes alpha 1, alpha 2, beta 1,
and beta 2) are a prototypic family of guanine nucleotide
binding regulatory protein-coupled receptors that mediate the
physiological effects of the hormone epinephrine and the
neurotransmitter norepinephrine. Specific polymorphisms in this
gene have been shown to affect the resting heart rate and can
be involved in heart failure.
Potential of Pharmacogenomics
To identify patients within a population with the same diagnosis who are
genetically predisposed either not to respond to therapy or to develop
unacceptable toxicity, and then to prospectively alter their therapy to avoid
treatment that is not likely to be optimal. The remaining, now more homogeneous
population can then be treated with conventional therapy to which they are not
genetically predisposed to fail.
These approaches promise the advent of personalized medicine, in which drugs and
drug combinations are optimized for each individual's unique genetic makeup.
Significance of Genome Size in Genomics

Definition: Genome size is the total amount of DNA contained within one copy of
a genome (haploid). It is measured in picograms (pg) or megabase pairs (Mb).
There is a genome size database that you can search to get this information.
The significance of knowing genome size:
It is important to genomics and the broader scientific community as a
fundamental feature of genome structure
It is used for genomics-based comparative biodiversity studies
It is a direct estimator of the cost and workload of genome projects
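Because genome size is reported either as a mass (pg) or as a length (Mb), it helps to be able to convert between the two. The sketch below assumes the commonly cited conversion of about 978 Mb per picogram of double-stranded DNA; that factor is not given in the slides and is stated here as an assumption.

```python
# Assumed conversion factor: 1 pg of double-stranded DNA is about 978 Mb.
MB_PER_PG = 978.0

def pg_to_mb(picograms):
    """Convert a DNA mass in picograms to megabase pairs."""
    return picograms * MB_PER_PG

def mb_to_pg(megabases):
    """Convert a genome length in megabase pairs to picograms."""
    return megabases / MB_PER_PG

# A 0.58 Mb genome expressed as mass, and 1 pg expressed as length.
print(mb_to_pg(0.58))
print(pg_to_mb(1.0))
```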
Genome size (cont.)
Mycoplasma genitalium (genome size: 0.58 Mb) is a small parasitic bacterium that
lives on the ciliated epithelial cells of the primate genital and respiratory
tracts. M. genitalium has the smallest known genome that can constitute a cell,
and is the second-smallest bacterium after the endosymbiont Carsonella ruddii.
Until the discovery of Nanoarchaeum in 2002, M. genitalium was also considered to
be the organism with the smallest genome.
N.B: There is a difference between the smallest parasitic bacteria and the
smallest free-living bacteria.
Alpha-proteobacteria living in the ocean, at about 25% abundance
The first cultured members of the clade
The smallest genome among free-living organisms
Synthetic Biology and Genome Size
Synthetic Biology: its primary focus is building a minimal artificial cell.
Streamlining the genomes of model bacteria revealed that genome reduction leads
to unanticipated beneficial properties such as:
High electroporation/transformation efficiency
Accurate propagation of recombinant genes and plasmids
Suitability for constructing a robust minimal synthetic genome; a minimal cell
provides a good chassis on which to assemble many kinds of functional modules.
Synthetic Biology refers to reliably engineering biological systems that perform
human-defined functions.
C. Smolke (Nature 441: 277-279)
Promise of Synthetic Biology

The field of synthetic biology holds great promise for the:
Design,
Construction and
Development of artificial (i.e. man-made) biological (sub)systems
It thus offers potentially viable new routes to genetically modified organisms
and smart drugs, as well as model systems for examining artificial genomes and
proteomes.
The informed manipulation of such biological (sub)systems could have an enormous
positive impact on our societies, with its effects being felt across a range of
activities such as the provision of healthcare, environmental protection and
remediation, etc.
Genome Organization
Fig. Prokaryotic & Eukaryotic Ribosomes
Prokaryotes: a single, circular DNA molecule, localized within the nucleoid (the
lightly staining area in the center of the cell)
Eukaryotes: DNA is linearly arranged, complexed with histones and packaged in an
organized fashion
Prokaryote genomes
Example: E. coli
89% coding
> 4,000 genes
122 structural RNA genes
Prophage remnants
Transposable elements: insertion sequences (IS) and transposons
(composite and non-composite)
Horizontal transfers (conjugation)
The Genetic Features of Prokaryotic Genomes

Genome sequence inspection can be used to locate genes because genes are not
random series of nucleotides but instead have distinct features.
Fig X. A dsDNA molecule has six reading frames (computational prediction!). Both
strands are read in the 5' to 3' direction. Each strand has three reading frames,
depending on which nucleotide is chosen as the starting position.
However, simple ORF scans are less effective with the genomes of higher
eukaryotes, partly because their genes are often split by introns.
Fig X. A protein-coding gene is an open reading frame (ORF) of triplet codons.
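A simple ORF scan over the six reading frames can be sketched as follows. This is a minimal illustration, not a gene finder: it assumes ATG as the only start codon and the standard stop codons TAA, TAG and TGA, and the function names and toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement, i.e. the other strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_frame(seq, frame, min_codons=2):
    """Return (start, end) pairs of ATG...stop ORFs in one frame.
    end is exclusive and includes the stop codon; min_codons counts
    codons before the stop."""
    orfs = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                orfs.append((start, i + 3))
            start = None
    return orfs

def six_frame_orfs(seq, min_codons=2):
    """Scan three frames on each strand; return (strand, frame, orf_sequence)."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                results.append((strand, frame, s[start:end]))
    return results

# Hypothetical toy sequence with a single short ORF on the forward strand.
print(six_frame_orfs("CCATGAAATTTTAGGG"))
```

In practice a minimum ORF length of around a hundred codons is used to suppress chance ORFs; the tiny threshold here only keeps the example short.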
Prokaryote genomes (cont.)
Gene organization in the prokaryotic genome
Fig. A simplified version of prokaryotic operon organization (labels: promoter,
repressor protein-encoding gene, promoter, operator, structural gene(s)).
Genes A, B, and C are transcribed together onto a single polycistronic
transcript, which is then translated to produce three separate proteins.
Proteins originating from genes of a common operon often have similar functions,
interact physically through protein-protein interactions, or participate in
shared biochemical pathways.
Operon
In genetics, an operon is a functioning unit of genomic
DNA containing a cluster of genes under the control of a
single regulatory signal or promoter.
The genes are transcribed together into one mRNA strand
and are either translated together in the cytoplasm, or
undergo trans-splicing to create monocistronic mRNAs
that are translated separately, i.e. several strands of
mRNA that each encode a single gene product. As a result,
the genes contained in the operon are
either expressed together or not at all.
In short, several genes must be both co-transcribed
and co-regulated to define an operon.
The operon is a main feature of prokaryotic genomes; it is made
of four components: promoter, regulator, operator and structural genes.
An operon is made up of four basic DNA components:
Promoter: a nucleotide sequence that enables a gene to be
transcribed. The promoter is recognized by RNA polymerase,
which then initiates transcription. In RNA synthesis,
promoters indicate which genes should be used for
messenger RNA creation and, by extension, control which
proteins the cell produces.
Regulator: a gene that controls the operator in cooperation with
certain compounds, called inducers and co-repressors, present in the
cytoplasm. A regulator gene is not necessarily adjacent to the operator it
controls. The regulator gene codes for a protein called the repressor. The
repressor binds to the operator to repress its action.
Operator: a segment of DNA that a repressor binds to. It is
classically defined in the lac operon as a segment between
the promoter and the genes of the operon. In the case of a
repressor, the repressor protein physically obstructs the RNA
polymerase from transcribing the genes.
Prokaryotic genome (cont.)

Operons (groups of genes located adjacent to one another in the genome, with
perhaps just one or two nucleotides between the end of one gene and the start of
the next) are a characteristic feature of prokaryotic genomes
All genes in an operon are expressed as a single unit
Hence, prokaryotic genomes have a more compact genetic organization, with little
space between genes. How? Give your explanation
(Hint: compare with eukaryotic coding gene organization)
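The adjacency property described above suggests a very rough way to flag candidate operons in an annotated genome: group neighbouring genes on the same strand whose intergenic gap is small. The sketch below is only an illustration; the 50 bp gap threshold, the gene names and the coordinates are all hypothetical, and real operon prediction uses additional evidence such as promoters, terminators and expression data.

```python
def candidate_operons(genes, max_gap=50):
    """Group adjacent same-strand genes separated by at most max_gap bases.

    genes: iterable of (name, start, end, strand) tuples.
    Returns a list of gene-name lists, one per group of two or more genes.
    """
    groups = []
    current = []
    for gene in sorted(genes, key=lambda g: g[1]):
        if current:
            prev = current[-1]
            gap = gene[1] - prev[2]
            if gene[3] == prev[3] and gap <= max_gap:
                current.append(gene)
                continue
            groups.append(current)
        current = [gene]
    if current:
        groups.append(current)
    return [[g[0] for g in grp] for grp in groups if len(grp) > 1]

# Hypothetical annotation: (name, start, end, strand)
toy_genes = [
    ("geneA", 100, 1000, "+"),
    ("geneB", 1003, 2000, "+"),
    ("geneC", 2010, 2800, "+"),
    ("geneD", 3500, 4200, "-"),
    ("geneE", 4205, 5000, "-"),
    ("geneF", 7000, 8000, "+"),
]
print(candidate_operons(toy_genes))
```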
Beta-galactosidase:
This enzyme hydrolyzes the bond between the two sugars of lactose, glucose and
galactose. It is coded for by the gene lacZ.
Lactose permease:
This protein spans the cell membrane and brings lactose into the cell from the
outside environment. The membrane is otherwise essentially impermeable to
lactose. It is coded for by the gene lacY.
Thiogalactoside transacetylase:
The function of this enzyme is not known. It is coded for by the gene lacA.
Regulation of the tryptophan operon
Fig. The trp operon, showing premature termination
The operon contains five structural genes involved in the biosynthesis
of tryptophan: trpE, D, C, B and A.
Expression of these genes is controlled at two levels:
1. The trpR gene encodes a repressor that, in the presence of
tryptophan, binds to the operator (O) and blocks transcription.
2. In addition, expression is mediated by an attenuator sequence that
prematurely terminates transcription when high levels of
tryptophan are present. In this case, the attenuated RNA consists of
only a short leader sequence (L). P = promoter
Regulation of the tryptophan operon (cont.)

Attenuation (in genetics) is a proposed mechanism of control in some bacterial
operons which results in premature termination of transcription and which is
based on the fact that, in bacteria, transcription and translation proceed
simultaneously.
Attenuation involves a provisional stop signal (attenuator), located in the DNA
segment that corresponds to the leader sequence of the mRNA.
During attenuation, the ribosome becomes stalled (delayed) in the attenuator
region in the mRNA leader.
Depending on the metabolic conditions, the attenuator either stops transcription
at that point or allows it to continue into the structural genes.
Attenuation, or dampening, of the trp operon is made possible by the fact that
the rate of translation influences RNA structure, which in turn influences the
rate of transcription.
Translation therefore interferes with transcription, making this an example of
translation-mediated transcription attenuation.
Mechanistically, this kind of attenuation is achieved because special sequences
located near the beginning of the transcript, called the leader (trpL), interact
to create two possible RNA conformations: one that terminates transcription (the
terminator stem), and one that allows transcription to continue.
Fig. Mechanism of transcriptional attenuation of the trp operon
Locating genes for functional RNA

An ORF scan is appropriate for protein-coding genes, but what about genes for
functional RNAs such as rRNA and tRNA? They have their own distinctive features,
which can be used to aid their discovery in the genome sequence.
They have the ability to fold into secondary structures, such as the tRNA
cloverleaf (formed by intramolecular base pairing), which carries the anticodon.
Fig. X. A 50 kb segment of the E. coli genome.
Insertion sequences (IS) are examples of transposable elements (TEs).
Fig. % of DNA non-coding for protein
According to the diagram, as genome size increases, the percentage of DNA that
does not code for protein also increases.
Non-protein-coding sequences make up only a small fraction of prokaryotic
genomes.
Eukaryotic genome
They have multiple linear chromosomes, each chromosome containing multiple
origins of replication
(Think about the comparative genome sizes of the two groups, prokaryotes and
eukaryotes)
They have specialized sequences at the ends of chromosomes (telomeres) that
ensure proper replication of the chromosome ends and also protect them from
nuclease degradation
They have special sequences (centromeres) that ensure the correct segregation of
chromosomes during cell division
Eukaryotic genome (cont.)
How many genes are there? Surprisingly, this question is not very important:
gene number has little to do with an organism's complexity.
There is more to genomes than protein-coding genes alone.
Protein-coding genes
Although most prokaryotic chromosomes consist almost entirely of protein-coding
genes, such elements make up a small fraction of most eukaryotic genomes
(Figure).
As a prime example, the human genome might contain as few as 20,000 genes,
comprising less than 1.5% of the total genome sequence.
Introns
Shortly after their discovery, the non-coding intervening sequences
within coding genes (introns) were suggested to account for the
pronounced discrepancy between gene number and genome size. It has
also recently been suggested that most non-coding DNA in animals (but
not plants) is intronic, which would imply that most of the genome is
transcribed even though protein-coding regions represent a tiny minority.
Fig. Initiation of transcription in eukaryotes
A transcription factor (sometimes called a sequence-specific DNA-binding
factor) is a protein that binds to specific DNA sequences, thereby
controlling the flow (or transcription) of genetic information from DNA to mRNA.
Exon and Intron Splicing Motifs Search

Most introns start with the dinucleotide GU (GT in DNA) and end with the
dinucleotide AG (in the 5' to 3' direction of the mRNA).
GT and AG are referred to as the splice donor and splice acceptor sites,
respectively.
These consensus sequences are known to be critical, because changing one of the
conserved nucleotides results in inhibition of splicing.
Upstream from the AG there is a region high in pyrimidines (C and U), the
polypyrimidine tract. Upstream from the polypyrimidine tract is the branch point.
The branch point always contains an adenine, but it is otherwise loosely
conserved.
A typical sequence is YNYYRAY, where Y indicates a pyrimidine (C or U), N denotes
any nucleotide, R denotes any purine (G or A), and A denotes adenine.
Fig. Exon-intron consensus sequences in eukaryotes (branch point YNYYRAY,
polypyrimidine tract)
Splicing is controlled by specific intron sequences, called splice-donor (GU)
and splice-acceptor (AG) sequences, which flank the exons. Mutations in these
sequences may lead to retention of large segments of intronic DNA by the mRNA, or
to entire exons being spliced out of the mRNA. These changes could result in
production of a nonfunctional protein.
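The consensus motifs above translate naturally into a pattern search. The sketch below works on DNA (T in place of U), checks a candidate intron for the GT...AG boundaries, and lists matches to the YNYYRAY branch-point pattern. The function names and the toy intron are hypothetical, and matching these short motifs alone would produce many false positives in a real genome.

```python
import re

# YNYYRAY on DNA: Y = C or T, N = any base, R = A or G.
# The lookahead lets overlapping matches be reported.
BRANCH_POINT = re.compile(r"(?=([CT][ACGT][CT][CT][AG]A[CT]))")

def has_canonical_splice_sites(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

def branch_point_candidates(intron):
    """Return (position, motif) for every YNYYRAY match in the intron."""
    intron = intron.upper()
    return [(m.start(1), m.group(1)) for m in BRANCH_POINT.finditer(intron)]

# Hypothetical toy intron: donor GT, one branch-point match, a T/C-rich tract, acceptor AG.
toy_intron = "GTAAGTCTCTGACTTTTTTCCCAG"
print(has_canonical_splice_sites(toy_intron))
print(branch_point_candidates(toy_intron))
```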
Fig. Mechanism of pre-mRNA splicing (www.wisc.edu/pharm).
The pre-mRNA (5'-Exon 1-Intron-Exon 2-3') carries a 5' splice site, a
branch-point adenosine and a 3' splice site.
First trans-esterification: cut at the 5' site, lariat formation.
Second trans-esterification: cut at the 3' site, exon joining, lariat release.
Products: ligated exons and the lariat intron.
Alternative splicing
It is a post-transcriptional modification by which a single
gene can code for multiple proteins (protein isoforms).
It occurs in eukaryotes, prior to mRNA translation, through the
differential inclusion or exclusion of regions of the pre-mRNA.
It is an important source of protein diversity.
During a typical gene splicing event, the pre-mRNA
transcribed from one gene can lead to different mature
mRNA molecules that generate multiple functional proteins.
In conclusion:
Gene splicing enables a single gene to increase its coding
capacity, allowing the synthesis of protein isoforms that are
structurally and functionally distinct. Gene splicing is
observed in a high proportion of genes. In human cells, about
40-60% of the genes are known to exhibit alternative splicing.
Gene Splicing Mechanism

There are several common types of gene splicing event. These events can occur
after the pre-mRNA is formed in the transcription step of the central dogma of
molecular biology.
Exon skipping: This is the most common known gene splicing mechanism, in which
exon(s) are included in or excluded from the final gene transcript, leading to
extended or shortened mRNA variants. The exons are the coding regions of a gene
and are responsible for producing proteins that are utilized in various cell
types for a number of functions.
Intron retention: An event in which an intron is retained in the final
transcript. In humans, 2-5% of the genes have been reported to retain introns.
The gene splicing mechanism retains the non-coding (intron) portions of the gene,
which can lead to a defect in protein structure and functionality.
Alternative 3' splice site and 5' splice site: Alternative gene splicing
includes the joining of different 5' and 3' splice sites.
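Exon skipping can be illustrated by enumerating the transcripts that result when some exons are optional. This is a toy combinatorial sketch: the exon names are hypothetical, and it assumes each optional ("cassette") exon is included or skipped independently, which real genes do not always allow.

```python
from itertools import combinations

def exon_skipping_isoforms(exons, cassette_exons):
    """List every isoform obtained by skipping any subset of the cassette exons.

    exons: ordered list of exon names in the gene.
    cassette_exons: names of exons that may be skipped.
    Returns a list of tuples, each an ordered exon combination.
    """
    optional = [e for e in exons if e in set(cassette_exons)]
    isoforms = []
    for k in range(len(optional) + 1):
        for skipped in combinations(optional, k):
            isoforms.append(tuple(e for e in exons if e not in skipped))
    return isoforms

# Hypothetical gene with four exons, two of which can be skipped.
for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"], {"E2", "E3"}):
    print("-".join(isoform))
```

Two cassette exons already give four possible mRNAs from one gene, which is the sense in which splicing increases coding capacity.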
Fig. Gene models are depicted as exons (colored rectangles) connected by introns
(black lines). Green arrows indicate transcription initiation sites, dotted lines
indicate splicing patterns and polyadenylation sites are denoted as poly(A). The
mRNA products generated by each type of AT are shown to the right of each gene
model. Simple transcription is contrasted with alternative transcript initiation,
the five major classes of alternative splicing, and alternative polyadenylation.
In each model, yellow exons are constitutive and blue ...
It should be coupled with exon skipping. Why?
Ran Elkon, Alejandro P. Ugalde & Reuven Agami. Alternative cleavage and
polyadenylation: extent, regulation and function.
Nature Reviews Genetics 14:496-506 (2013). doi:10.1038/nrg3482
The four different APA types
The simplest alternative polyadenylation (APA) type, which is termed
tandem 3' untranslated region (UTR) APA, involves the occurrence of
alternative poly(A) sites within the same terminal exon and hence
generates multiple isoforms that differ in their 3' UTR length
without affecting the protein encoded by the gene.
The other three types involve APA events, which potentially affect
the coding sequences in addition to the 3' UTRs. These types are:
alternative terminal exon APA, in which alternative splicing
generates isoforms that differ in their last exon; intronic APA,
which involves cleaving at the cryptic intronic poly(A) signal (PAS),
extending an internal exon and making it the terminal one; and
internal exon APA, which involves premature polyadenylation within the
Sequence-based methods for profiling transcript diversity
Hypothetical transcript sequences consisting of exons (green
rectangles) with intervening introns (black lines) are depicted as
gapped alignments to a reference genome. The following tracks
represent sequences generated by each sequence-based method. Human
genes have an average of 10 exons with an average length of 250 bp.
The methods are displayed in order of least to most quantitative.
Abbreviations: (EST) expressed sequence tag; (SAGE) serial analysis of
gene expression
Trans-splicing: Exons located on separate pre-mRNA molecules
(intragenic and/or intergenic trans-splicing) are selectively
joined to produce mature mRNAs encoding proteins with distinct
features and functions.
Take Home Message!

Have you noted that annotating eukaryotic genomes is more complex
than annotating prokaryotic genomes? Why?
What does it mean to say that prokaryotic genomes are more
dense/compact than eukaryotic genomes? How do you explain this statement?
Pseudogenes
The term, coined in 1977 by Jacq et al., is composed of the
prefix pseudo, which means false, and the root gene, which is the
central unit of molecular genetics.
Pseudogenes are dysfunctional relatives of genes that have lost their
protein-coding ability or are otherwise no longer expressed in the cell.
Some do not have introns or promoters (these pseudogenes are copied
from mRNA and incorporated into the chromosome and are called
processed pseudogenes).
Most have some gene-like features (such as promoters, CpG islands,
and splice sites); they are nonetheless considered nonfunctional, due
to their lack of protein-coding ability resulting from various
genetic disablements (premature stop codons, frameshifts, or a lack
of transcription).
Jacq C, Miller JR, Brownlee GG. A pseudogene structure in 5S DNA of
Xenopus laevis. Cell 12:109-120. 1977.
Pseudogenes cont
Figure labels: processed pseudogene; repeat motifs.
Figure X: Origins of pseudogenes: A. Retrotransposed pseudogenes: starting from the original gene (the
coding sequences are in black, the non-coding introns in gray, and the promoter element is indicated by the
large arrow upstream of the gene), transcription generates a primary mRNA (black and gray broken line), from
which the introns are excised by RNA splicing. This mature mRNA, which contains only exons and a polyadenosine tail, is transcribed back into DNA by enzymes called reverse transcriptases, and the DNA is
reinserted back into the genome. Hence, the pseudogene product will lack intron and promoter sequences, and
will bear characteristic repeat sequences at the insertion site, due to the integration mechanism. B. Duplicated
pseudogenes: DNA duplication generates a more-or-less faithful copy of the original gene, including introns
and, in many cases, promoter and other transcriptional regulatory elements. In most cases, this duplicated
gene will undergo crippling, inactivating mutations and turn into a pseudogene (in rarer cases, the duplicated
Pseudogenes cont
Pseudogenes from the point of view of
genome annotation
Pseudogenes are quite difficult to identify and characterize in genomes,
because the two requirements of homology and nonfunctionality are
implied through sequence calculations and alignments rather than
biologically proven.
Homology is implied by sequence identity between the DNA sequences
of the pseudogene and parent gene. After aligning the two sequences,
the percentage of identical base pairs is computed. A high sequence
identity (usually between 40% and 100%) means that it is highly likely
that these two sequences diverged from a common ancestral sequence
(are homologous), and highly unlikely that these two sequences were
independently created.
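The percentage calculation described above can be sketched as follows. This is a minimal illustration, not the method of any particular annotation pipeline: it assumes the two sequences have already been aligned (gaps written as "-"), and it assumes that gapped columns are left out of the denominator, which is only one of several conventions in use. The two example sequences are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical bases between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned parent gene and pseudogene fragments.
parent = "ATGGCCAAGT-TGACCTGA"
pseudogene = "ATGGCTAAGTATGA-CTGA"

print(f"{percent_identity(parent, pseudogene):.1f}% identity")
```

A value inside the 40% to 100% range quoted above would be taken as evidence of homology; the number alone says nothing about whether the copy is functional.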
Nonfunctionality can manifest itself in many ways. Normally, a gene
must go through several steps in going from a genetic DNA sequence
to a fully functional protein: transcription, pre-mRNA processing,
translation, and protein folding are all required parts of this process. If
any of these steps fails, then the sequence may be considered
nonfunctional.
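Some of the disablements mentioned for pseudogenes (premature stop codons, frameshifts) can be screened for directly in the sequence. The sketch below is a simplified illustration under stated assumptions: the input is a candidate coding sequence read from its first base, the standard stop codons TAA, TAG and TGA apply, and the function name and example sequences are hypothetical. It does not test transcription, splicing or protein folding, so an empty result does not prove that a sequence is functional.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_disablements(cds):
    """Return a list of simple signs that a coding sequence is disabled."""
    cds = cds.upper()
    problems = []
    if not cds.startswith("ATG"):
        problems.append("no start codon")
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    # Split into complete codons; a trailing partial codon is dropped.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # The last codon is expected to be the normal stop, so it is not checked.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index + 1}")
            break
    return problems


print(find_disablements("ATGGCCAAGTAA"))     # intact toy reading frame
print(find_disablements("ATGGCCTGAGGTTAA"))  # in-frame stop in the middle
print(find_disablements("ATGGCAAGTAA"))      # one base deleted
```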
Genome and gene duplication

Genome and gene duplication can occur by several mechanisms:
I. Polyploidization: autopolyploidization and allopolyploidization,
based on genome origin
II. Segmental duplication
III. Tandem gene duplication
Polyploidy is thought to be rare in animals because it can
disrupt the dosage compensation that is required for genetic balance.
How do they achieve dosage balance? Why? Hint: Genome imprinting!
Gene Duplication Cont
Fig. Four scenarios (1 to 4) for the outcome of gene duplication;
the panel labels include neofunctionalization and subfunctionalization.
Gene Duplication Cont

Neofunctionalization, one of the possible outcomes of
functional divergence, occurs when one gene copy, or paralog,
takes on a totally new function after a gene duplication
event.
Neofunctionalization is an adaptive mutation process, meaning
that one of the gene copies must mutate to develop a
function that was not present in the ancestral gene.

Subfunctionalization is one of the possible outcomes of
functional divergence that occurs after a gene duplication event, in
which pairs of genes that originate from duplication, or paralogs,
take on separate functions.
Subfunctionalization is a neutral mutation process, meaning that no
new adaptations are formed. After gene duplication, the paralogs
simply undergo a division of labor by retaining different parts
(subfunctions) of their original ancestral function.
This partitioning occurs because of segmental gene silencing,
leading to paralogs that are no longer duplicates because each
gene retains only a single function.
It is important to note that the ancestral gene was capable of
performing both functions, whereas the descendant duplicate genes
can now only perform one of the original ancestral functions.
Chromatin Structure
Chromatin is a term designating the structure in which DNA exists
within cells. The structure of chromatin is determined and stabilized
through the interaction of the DNA with DNA-binding proteins.
There are 2 classes of DNA-binding proteins. The histones are the major
class of DNA-binding proteins involved in maintaining the compacted
structure of chromatin.
There are 5 different histone proteins: H1 and the core histones
H2A, H2B, H3 and H4.
The other class of DNA-binding proteins is a diverse group of
proteins called simply non-histone proteins. This class of
proteins includes the various transcription factors, polymerases,
hormone receptors and other nuclear enzymes. In any given cell there
are more than 1000 different types of non-histone proteins
bound to the DNA.
Fig. Structure of the chromosome
Chromatin Structure Cont

The binding of DNA by the histones generates a structure called
the nucleosome.
The nucleosome core contains an octamer protein structure
consisting of 2 subunits each of H2A, H2B, H3 and H4.
Histone H1 occupies the internucleosomal DNA and is identified as
the linker histone.
The nucleosome core contains approximately 150 bp of DNA.
The linker DNA between each nucleosome can vary from 20 to more
than 200 bp.

Chromatin is found in two varieties: euchromatin and
heterochromatin. Originally, the two forms were distinguished
cytologically by how intensely they stained.
Euchromatin stains less intensely, while heterochromatin stains
intensely, indicating tighter packing.
Heterochromatin mainly consists of genetically inactive satellite
sequences, and many genes are repressed to various extents,
although some cannot be expressed in euchromatin at all.
Both centromeres and telomeres are heterochromatic, as is the
Barr body of the second, inactivated X chromosome in a female.

I. Heterochromatin is a tightly packed form of DNA, which comes
in different varieties. These varieties lie on a continuum between
the two extremes of constitutive and facultative heterochromatin.
Both play a role in the expression of genes: constitutive
heterochromatin can affect the genes near it (position-effect
variegation), whereas facultative heterochromatin is the
result of genes that are silenced through a mechanism such as
histone methylation or siRNA through RNAi.
The regions of DNA packaged in facultative heterochromatin will not
be consistent between the cell types within a species, and thus a
sequence in one cell that is packaged in facultative heterochromatin
(and the genes within poorly expressed) may be packaged in
euchromatin in another cell (and the genes within no longer silenced).
However, the formation of facultative heterochromatin is regulated,
and is often associated with morphogenesis or differentiation.
DNA Modification and Genome Expression

Important alterations of genome activity can also be achieved by
making chemical changes to the DNA itself.
These changes are associated with the semi-permanent silencing of
the genome, possibly of an entire chromosome, and often the
modified state is inherited by the progeny arising from cell division.
The modifications are brought about by DNA methylation.
CpG islands or CG islands are genomic regions that contain a high
frequency of CpG sites, but to date objective definitions for CpG
islands are limited.
In mammalian genomes, CpG islands are typically 300-3,000 base pairs
in length.
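Since the slide notes that objective definitions of CpG islands are limited, the sketch below uses one commonly cited working definition as an assumption (it is not taken from this lecture): a region of at least 200 bp with GC content above 50% and an observed-to-expected CpG ratio above 0.6. The function names and test sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio)."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # CpG dinucleotides on this strand
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: C * G / N.
    if c_count and g_count:
        obs_exp = cpg_count * n / (c_count * g_count)
    else:
        obs_exp = 0.0
    return n, gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds (length, GC content, obs/exp CpG)."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 150))  # CpG-rich toy sequence, 300 bp
print(looks_like_cpg_island("AT" * 150))  # no C or G at all
```

The thresholds are parameters so that a stricter or looser definition can be tried without changing the code.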
DNA Modification cont
Fig. Methylcytosine forms the same base pair with guanine as
cytosine, because the methyl group does not block the formation
of the inter-base hydrogen bonds.
Fig. Deamination of 5-methylcytosine to thymine (a mutation) has led
to the replacement of CpG sequences with TpG over time.
Fig. When cytosine is deaminated, it becomes uracil. Repair enzymes
recognize this as an abnormal DNA base and replace the uracil with
a cytosine. However, when 5-methylcytosine is deaminated, it becomes
thymine, which replaces the cytosine. The proofreading enzyme may
keep the change and edit G into A.
DNA Modification Cont
Figure 2. Methylation of CpG islands silences gene expression.
Figure 3. Methylation of CpG islands, together with histone deacetylation
and other modifications, silences genes through the mechanism of
chromatin remodeling and heterochromatin formation.
Figure 4. Methylation of CpG islands leads to long term gene silencing.
The region is also called the DNA methylation Domain (DMA) or Imprinting
Control Region (ICR).

Types of methylation
There are two types of methylation:
1. Maintenance methylation
In order for genes to remain permanently silenced by this mechanism,
the DNA methylation patterns must be stably transmitted to daughter
cells. This is accomplished through the activity of DNA maintenance
methylase, which detects CpG methylation in one strand of inherited
DNA and methylates the other daughter strand (Figure to the right).
2. De novo methylation
It adds methyl groups at totally new positions and so changes the
pattern of methylation in a localized region of the genome.
Reading assignment
Methylation is involved in genome imprinting and X chromosome
inactivation. How? What is its significance in dosage balance (in humans)?
Fig. Maintenance methylation and de novo methylation
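The difference between the two types of methylation can be pictured with a small bookkeeping model. This is a sketch under simplifying assumptions, not a description of the enzymes: each CpG site is identified by the position of its C on one strand, a replicated duplex is represented as two sets of methylated positions (parental strand and newly made daughter strand), and all names and the example sequence are hypothetical.

```python
def cpg_sites(seq):
    """Positions of the C in every CpG dinucleotide of the sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def replicate(methylated_sites):
    """After replication the parental strand keeps its marks; the new strand has none."""
    return {"parental": set(methylated_sites), "daughter": set()}


def maintenance_methylation(duplex):
    """Copy the parental pattern onto the daughter strand (hemimethylated sites only)."""
    duplex["daughter"] |= duplex["parental"]
    return duplex


def de_novo_methylation(duplex, new_sites, seq):
    """Add methyl groups at CpG sites that were not methylated before."""
    allowed = set(cpg_sites(seq))
    for site in new_sites:
        if site not in allowed:
            raise ValueError(f"position {site} is not a CpG site")
    for strand in ("parental", "daughter"):
        duplex[strand] |= set(new_sites)
    return duplex


sequence = "ACGTTCGGACGA"           # toy sequence with CpGs at positions 1, 5 and 9
duplex = replicate({1, 9})          # two of the three CpGs are methylated
print(duplex)                       # daughter strand starts unmethylated
print(maintenance_methylation(duplex))
print(de_novo_methylation(duplex, {5}, sequence))
```

Maintenance methylation restores the inherited pattern on the new strand, while de novo methylation changes the pattern by marking a previously unmethylated site.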
Histones
In biology, histones are highly alkaline proteins found in eukaryotic cell nuclei that
package and order the DNA into structural units called nucleosomes. They are the chief
protein components of chromatin, acting as spools around which DNA winds, and play a
role in gene regulation. Without histones, the unwound DNA in chromosomes would be
very long (a length to width ratio of more than 10 million to one in human DNA).
For example, each human cell has about 1.8 meters of DNA, but wound on the histones it
has about 90 micrometers (0.09 mm) of chromatin, which, when duplicated and
condensed during mitosis, results in about 120 micrometers of chromosomes.
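The figures quoted above allow a quick back-of-the-envelope check of how strongly DNA is compacted. The lengths (1.8 m, 90 micrometers, 120 micrometers) come from the text; the rise of 0.34 nm per base pair and the nucleosome repeat of about 200 bp are assumptions added here for illustration, so the derived counts are rough estimates only.

```python
DNA_LENGTH_M = 1.8            # from the text: DNA per human cell
CHROMATIN_LENGTH_M = 90e-6    # from the text: chromatin length
MITOTIC_LENGTH_M = 120e-6     # from the text: duplicated, condensed chromosomes

RISE_PER_BP_M = 0.34e-9       # assumption: rise per base pair of B-form DNA
NUCLEOSOME_REPEAT_BP = 200    # assumption: wrapped DNA plus linker, roughly

# How many times shorter the chromatin is than the naked DNA.
packing_ratio = DNA_LENGTH_M / CHROMATIN_LENGTH_M

# In mitosis the DNA has been duplicated, so twice the DNA length is packed.
mitotic_packing_ratio = 2 * DNA_LENGTH_M / MITOTIC_LENGTH_M

# Rough number of base pairs and nucleosomes implied by 1.8 m of DNA.
base_pairs = DNA_LENGTH_M / RISE_PER_BP_M
nucleosomes = base_pairs / NUCLEOSOME_REPEAT_BP

print(f"interphase packing ratio: about {packing_ratio:,.0f} to 1")
print(f"mitotic packing ratio:    about {mitotic_packing_ratio:,.0f} to 1")
print(f"base pairs per cell:      about {base_pairs:.2e}")
print(f"nucleosomes per cell:     about {nucleosomes:.2e}")
```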
Histones Cont
Class of Histone
Five major families of histones exist: H1/H5, H2A, H2B, H3, and H4.
Histones H2A, H2B, H3 and H4 are known as the core histones, while histones H1 and H5 are
known as the linker histones.
Two of each of the core histones assemble to form one octameric nucleosome core particle,
and 147 base pairs of DNA wrap around this core particle 1.65 times in a left-handed
superhelical turn.
The linker histone H1 binds the nucleosome and the entry and exit sites of the DNA, thus
locking the DNA into place and allowing the formation of higher order structure.
The most basic such formation is the 10 nm fiber or beads on a string conformation. This
involves the wrapping of DNA around nucleosomes with approximately 50 base pairs of DNA
separating each pair of nucleosomes (also referred to as linker DNA).
The assembled histones and DNA are called chromatin.
Higher-order structures include the 30 nm fiber (forming an irregular zigzag) and 100 nm
fiber, these being the structures found in normal cells. During mitosis and meiosis,
the condensed chromosomes are
Histones Cont
Fig. Histone modifications regulate chromatin structure and functions
Chemical Modifications of Histone

Acetylation (Ac): attachment of acetyl groups to lysine amino acids
in the N-terminal regions of each of the core molecules.
The enzyme that mediates acetylation is histone acetyltransferase (HAT).
Acetylation reduces the affinity of the histone for DNA and possibly
reduces the interaction between individual nucleosomes,
destabilizing the 30 nm chromatin fiber.
Histones in heterochromatin are unacetylated whereas those in
functional domains are acetylated; this indicates that the
mechanism is important for DNA packaging and gene expression regulation.
Gene activation is often reversible through deacetylation.
Histones Cont
Lysine acetylation is not the only type of histone modification, but
it is the best studied.
Methylation of lysine and arginine residues in the N-terminal
regions of H3 and H4; it is a reversible event.
Phosphorylation of serine residues in the N-terminal regions of
H2A, H2B, H3 and H4.
Ubiquitination of lysine residues at the C termini of H2A and H2B.
This modification involves addition of the small, common
(ubiquitous) protein called ubiquitin, or a related protein rather
unhelpfully called SUMO.
Histone Cont
Chromatin structure: role of acetylation

Some coactivators have HAT (histone acetyltransferase) activity
Links histone acetylation, chromatin structure and gene activation
HAT activity of the co-activator acetylates core histones bound to
promoter DNA, causing release of nucleosome core particles or
loosening of histone-DNA interaction
Subsequent binding of transcription factors and RNA polymerase
Once transcription is initiated, RNA polymerase is able to
transcribe DNA packaged into nucleosomes
Acetylation is dynamic: enzymes also remove acetyl groups
(histone deacetylases (HDACs))
Histone Cont
Chromatin structure: role of deacetylation

Removal of acetyl groups: histone deacetylases (HDACs)
HDACs are associated with transcriptional repression
HDACs are subunits of larger complexes (corepressors)
HDACs are guided to regions of DNA by methylation patterns
Example:
Inactive X chromosome of female: largely deacetylated histones
Active X chromosome has a normal level of histone acetylation
Hemizygous:
Human chromosome karyotype
Histone Cont
Chromatin structure: Acetylation / Deacetylation
END OF THE LESSON
