
Genome Science (Genomics)

Dereje Beyene (PhD)


College of Natural Science
Microbial Cellular and Molecular Biology
Department
Nov, 2013/14

Francis Crick

James Watson

Figure X: DNA is a double helix. (A) Francis Crick (left) and James Watson (right) proposed that the DNA molecule has a double-helical structure. (B) Biochemists can now pinpoint the position of every atom in a DNA molecule. To see that the essential features of the original Watson-Crick model have been verified, follow with your eyes the double-helical chains of sugar-phosphate groups and note the horizontal rungs of the bases.

Definition of the term Genomics

The term genomics is derived from the term genome.
The term genomics was used for the first time in 1986, when the sequencing and mapping of the entire human genome was initiated.
Genomics: the field of genome studies. It includes intensive efforts to determine DNA and RNA sequences using high-throughput sequencing strategies; generating fine-scale genetic maps; microarrays (microchips); collecting genome variation within a population (e.g. the 1000 Genomes Project); ascertaining the transcriptional control of genes; and employing digital technology and computationally intensive analysis to understand the structure, function and evolution of diverse genomes.

Genomics Cont.....
Genomics provides essential tools to speed up the work of the
forward geneticists and is now a scientific discipline in its own right.

Genetics

Genetics
Application of genomics science in all areas of biology has lowered the barriers that once separated the plant, animal and microbial research communities.
Experimental techniques for studying transcriptomes and proteomes are providing novel insights into genome expression, and the new discipline of systems biology is linking genome biology with

Genomics Cont.....
The investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research.
Research on single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.
A genome is the sum total of all an individual organism's genes.
Thus, genomics is the study of all the genes of a cell, or tissue, at the DNA (genotype), mRNA (transcriptome), or protein (proteome) levels.

Genomics Cont.....
Genome: the entire genetic complement of a living organism
Gene: a DNA segment containing biological information and hence coding for an RNA and/or polypeptide molecule
Genome expression: the series of events by which the biological information carried by the genome is released and made available to the cell

The value of a genome sequence lies in its annotation:
Genome annotation: characterizing genomic features using computational and experimental methods (functional genomics)

Genes: Four levels of annotation


Gene prediction: where are the genes? What do they look like?

A genome sequence can tell us:
Everything about the organism's life
Its developmental program
Which genes are responsible for disease resistance or susceptibility
Novel gene discovery
Where we are going and where we came from (evolution)
How similar we are to apes, trees, and yeast (comparative genomics)
The minimum genome size of free-living organisms, which can then be exploited to design a minimal synthetic genome that can support life (a cornerstone of synthetic biology)

The Basic Scheme of Genome Annotation and Delivery Pipeline

[Figure: analysis pipeline, delivery pipeline and outcome. STS: Sequence Tagged Site]

Fig. XX. Bioinformatics uses information technology to manage and analyze information generated by the life sciences

MAJOR RESEARCH AREAS OF GENOMICS


Bacteriophage Genomics
Bacteriophages have played and continue to play a key role in bacterial genetics and molecular biology, e.g. as cloning vectors (M13 vector ......).
Bacteriophage genome sequences can be obtained through direct sequencing of isolated bacteriophages, but can also be derived as part of microbial genomes (How?).
Analysis of bacterial genomes has shown that a substantial amount of microbial DNA consists of prophage sequences and prophage-like elements.
A detailed database mining of these sequences offers insights into the role of prophages in shaping the bacterial genome.
N.B: Bacteriophage genomes are especially mosaic: the genome of any one phage species appears to be composed of numerous individual modules. These modules may be found in other phage species in different arrangements.

II) Cyanobacteria Genomics

Cyanobacteria are prokaryotic organisms that have served as important model organisms for studying oxygenic photosynthesis and have played a significant role in the Earth's history as primary producers of atmospheric oxygen.

IV) Plant Genomics


Recent technological advancements have substantially expanded our ability to analyze and understand plant genomes and to reduce the gap existing between genotype and phenotype.
The fast-evolving field of genomics allows scientists to analyze thousands of genes in parallel, to understand the genetic architecture of plant genomes and also to isolate the genes responsible for mutations.
Whole plant genomes can now be sequenced; about 33 sequenced plant species are available at http://phytozome.net/

Fig. Phylogenetic tree, up to date as of September 4th, 2012
Source: http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes

III) Human genomics


The Human Genome Project (HGP, initiated in 1990) is an international scientific research project with a primary goal of determining the sequence of chemical base pairs (nitrogenous bases; purines (A & G); pyrimidines (T & C)) which make up DNA, and of identifying and mapping the approximately 20,000-25,000 genes of the human genome from both a physical and functional standpoint.
Display of the results of the project required significant bioinformatics resources. The sequence of the human reference assembly can be explored using the UCSC Genome Browser or Ensembl.
The estimated size of the human genome is

Organization of the Human Genome

Scope of Genomics Cont

V) Metagenomics
Definition: Metagenomics (Environmental Genomics or Community Genomics) is the study of genomes recovered from environmental samples without the need for culturing the organisms.

Metagenomics data are processed using bioinformatics tools.

Metagenomics:
It is a powerful tool to reveal the diversity of microscopic life that was previously hidden from culture-based studies.
Thus, it offers a powerful lens for viewing the microbial world that has the potential to revolutionize the understanding of all microscopic organisms (bacteria and fungi).

Scope of Genomics Cont

Fig. Scheme of the major stages of an integrative metagenomic ecosystem study of microbial ecology

Examining phylogenetic diversity using the hypervariable regions of 16S rRNA (Bacteria)

Fig. Map of the rRNA operon of bacteria

Why 16S rRNA for metagenomics of Bacteria (Eubacteria and Archaea)?

Conserved regions of the gene are nearly identical across all bacteria, while the variable regions contain specific sites unique to individual bacteria.
The uniqueness of the variable regions (V1 to V9) enables taxonomic positioning and identification of bacteria.

Universal 16S rRNA Primers for Metagenomics and/or Species Identification of Microbial Isolates

Importance of PCR primer pair selection
Broad-range primers used in the PCR reaction preceding the sequencing target the conserved regions of the 16S rRNA gene in order to unselectively amplify all bacterial DNA present in the sample.
Therefore, critical evaluation of the primer pair coverage is a prerequisite for unbiased and comprehensive sequence information.

Microbial DNA sequence analysis pipeline
1. Microbial genomic DNA extraction from the samples
2. PCR amplification of 16S rRNA genes with the most conserved universal primers and cloning of the products into a vector
3. Sequencing of the PCR products
4. Quality confirmation of the resulting 16S rRNA gene fragments
5. Removal of vector and primer sequences
6. Data validation and consensus building
7. Comparison of the sequence data with public databases

Map of the Nuclear rRNA Genes of Fungi and Their ITS Regions

Primer Map of the ITS1 and ITS2 Regions of the Nuclear rRNA Genes of Fungi

Aims of Metagenomics

Diversity patterns of microorganisms can be used for monitoring and predicting environmental conditions and change. How? E.g. the microbiome in the human gut!
Examining genes/operons for desirable enzyme candidates (e.g. cellulases, chitinases, lipases, antibiotics, other natural products). Identified genes may be exploited for industrial or medical applications.
Examining secretory, regulatory, and signal transduction mechanisms associated with samples or genes of interest.
Examining bacteriophage and/or plasmid sequences. These potentially influence the diversity and structure of microbial communities.
Examining potential lateral gene transfer events. Knowledge of genome plasticity may give us an idea of the types of selective pressure that exist for gene capture and evolution within a habitat.

Metagenomics Cont
Examining metabolic pathways. This facilitates the design of culture media for the growth of previously uncultured microbes.
Examining genes that predominate in a given environment compared to others.
Finally, metagenomic data and metadata can be leveraged towards designing low- and high-throughput experiments focused on defining the roles of genes and microorganisms in the establishment of a dynamic microbial community.

Why is Metagenomics Important?

All reasons lead to more knowledge:
Organisms can be studied directly in their environments, bypassing the need to isolate each species.
There are significant advantages for viral metagenomics, because of the difficulties of cultivating the appropriate host. How? Bacteriophages!
It is important for designing low- and high-throughput experiments focused on defining the roles of genes and microorganisms in the establishment of a dynamic microbial community.

[Figure: two approaches to studying a microbial community]

Microbial diversity approach:
Microbial community -> sampling -> DNA extraction -> total community genomes (DNA) -> amplifying a single gene, e.g. 16S rRNA -> sequence and generate tree -> phylogenetic tree
Outcomes:
1. Phylogenetic snapshot of most members of the community
2. Identification of novel phylotypes

Community genomics approach:
Microbial community -> sampling -> DNA extraction -> total community genomes (DNA) -> REs digest total DNA and then shotgun sequencing -> assembly and annotation -> assembled genomes -> total gene pool of the community
Outcomes:
1. Identification of all gene categories
2. Discovery of new genes
3. Linking genes to particular phylotypes

Pharmacogenomics

Pharmacogenomics:
There is inter-patient variability in the efficacy and toxicity of medications.
Inter-patient variability is due in part to polymorphisms (variants for which the frequency of the most frequent allele is at most 99%) in genes encoding:
I. Drug metabolizing enzymes,
II. Drug transporters, and/or
III. Drug targets (e.g. enzymes, receptors)
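
The 99% criterion for a polymorphism can be checked directly from genotype data. The following is a minimal sketch, not a standard tool: the function names and the sample genotype counts are hypothetical, and it assumes diploid individuals whose genotypes are written as two-letter strings.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes such as "AG"."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_polymorphic(genotypes, threshold=0.99):
    """A locus is called polymorphic when its most frequent allele
    has a frequency no higher than the threshold (99% by convention)."""
    freqs = allele_frequencies(genotypes)
    return max(freqs.values()) <= threshold

# Hypothetical sample: 90 AA, 9 AG and 1 GG individuals (200 alleles).
sample = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
freqs = allele_frequencies(sample)
```

Here the A allele accounts for 189 of 200 alleles, so the locus passes the criterion; a sample in which every individual is AA would not.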

Thus, pharmacogenomics is a field of study that aims to elucidate the genetic basis for differences in drug efficacy and toxicity, and it uses genome-wide approaches to identify the network of genes that govern an individual's response to drug therapy.

[Figure: population -> DNA sequencing of target gene]

Better patient treatments through advanced diagnostics and personalized medicine: diagnostic tests will guide clinical decision-making on prescribing a specific drug, depending on whether the patient is predicted to be a responder or non-responder to a given medication.

Some Terminologies
1) Allelic variant: A variation in the normal sequence of a gene.
2) Genotype: The genetic constitution or genetic makeup of an organism.
3) Genotype-phenotype correlation: The association between the presence of genetic variation and the resulting physical characteristics or abnormality.
4) Pharmacogenetics vs. Pharmacogenomics:
Pharmacogenetics: the study of genetic variation in drug metabolizing enzymes and its effect on drug response. It is often a study of the variations in a targeted gene, or group of functionally related genes.
Pharmacogenomics: the general study of the entire spectrum of genes that affect drug behavior, i.e. it is a much broader investigation of genetic variations at the level of the genome.
5) Phenotype: Observable features of the expression of genes.

Drug metabolizing enzymes, DMEs (Phase I enzymes/cytochrome P450 enzymes, e.g. CYP2D6; Phase II enzymes, e.g. N-acetyltransferases)
Drug transporters (Solute Carrier (SLC) and ATP-Binding Cassette (ABC) transporters, e.g. organic cation transporters, OCTs, as members of the SLC family)
Drug receptors (ligand-controlled ion channels or class 1 receptors, e.g. glutamate receptor; G-protein coupled receptors (GPCRs) or class 2 receptors, e.g. -receptor; enzymatic receptors, e.g. insulin receptor; receptors regulating gene expression, e.g. steroid hormone receptor)

Genetic variability is seen both in the area of pharmacokinetics (absorption, distribution, metabolism and excretion) and in the area of pharmacodynamics (drug effects).

E.g. Polymorphisms:
Thiopurine S-methyltransferase: monogenic traits have a marked effect on pharmacokinetics (drug metabolism); individuals who inherit an enzyme deficiency must be treated with markedly different doses of the affected medications (e.g. 5%-10% of the standard thiopurine dose).
Beta-adrenergic receptor: polymorphisms can alter the sensitivity of patients to treatment (e.g. beta-agonists), changing the pharmacodynamics of drug response.
N.B: Most drug effects are determined by the interplay of several gene products that govern the pharmacokinetics (drug absorption, distribution, metabolism and excretion) and pharmacodynamics (effect of drug and mechanisms of action) of medications.
[Pharmacokinetics may simply be defined as what the body does to the drug, whereas pharmacodynamics may be defined as what the drug does to the body]
The goal of pharmacogenomics research is to elucidate these polygenic determinants of drug effect.
Research outcome of pharmacogenomics:
Provide new strategies for optimizing drug therapy based on each

TPMT: Thiopurine S-methyltransferase
This gene encodes the enzyme that metabolizes thiopurine drugs via S-adenosyl-L-methionine as the S-methyl donor and S-adenosyl-L-homocysteine as a byproduct.
Thiopurine drugs such as 6-mercaptopurine are used as chemotherapeutic agents.
Genetic polymorphisms that affect this enzymatic activity are correlated with variations in sensitivity and toxicity to such drugs within individuals.

Table: Summary of TPMT Deficiency

Intolerance (definition in medicine): inability to withstand or consume; inability to absorb or metabolise nutrients.

Drug intolerance:
I. Inability to continue taking, or difficulty in taking, a medication because of an adverse side effect that is not immune-mediated
II. The state of reacting to normal pharmacologic doses of a drug with the symptoms of overdosage

The methyl group (CH3) attached to the methionine sulfur atom in SAM is chemically reactive. This allows donation of this group to an acceptor substrate in transmethylation reactions. More than 40 metabolic reactions involve the transfer of a methyl group from SAM to various substrates, such as nucleic acids, proteins, lipids and secondary metabolites.

1. Thiopurine S-methyltransferase (EC:2.1.1.67)
2. Hypoxanthine phosphoribosyltransferase (EC:2.4.2.8)

http://www.kegg.jp/kegg-bin/show_pathway?hsa00983+7172

Summary
The adrenergic receptors (subtypes alpha 1, alpha 2, beta 1,
and beta 2) are a prototypic family of guanine nucleotide
binding regulatory protein-coupled receptors that mediate the
physiological effects of the hormone epinephrine and the
neurotransmitter norepinephrine. Specific polymorphisms in this
gene have been shown to affect the resting heart rate and can
be involved in heart failure

Potential of Pharmacogenomics

To identify patients within a population with the same diagnosis who are genetically predisposed either not to respond to therapy or to develop unacceptable toxicity, and then to prospectively alter their therapy to avoid treatment that is not likely to be optimal. The remaining, now more homogeneous, population can then be treated with conventional therapy for which they are not genetically predisposed to fail.
The approaches promise the advent of personalized medicine, in which drugs and drug combinations are optimized for each individual's unique genetic makeup.

Significance of Genome Size in Genomics

Definition: Genome size is the total amount of DNA contained within one copy of a genome (haploid). It is measured in picograms (pg) or megabase pairs (Mb). There is a genome size database; you can search it and get the information.
The significance of knowing genome size:
It is of importance to the genomics and broader scientific community as a fundamental feature of genome structure
It is used for genomics-based comparative biodiversity studies, and
It is a direct estimator of the cost and workload of genome projects
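
The two units of genome size can be converted into each other, and genome size feeds directly into a rough workload estimate. The sketch below assumes the commonly used conversion of about 978 Mb per picogram of double-stranded DNA; the function names and the 30x coverage figure are illustrative assumptions, not part of the lecture material.

```python
MB_PER_PG = 978.0  # assumed conversion: 1 pg of DNA is roughly 978 Mb

def pg_to_mb(picograms):
    """Convert a genome size in picograms to megabase pairs."""
    return picograms * MB_PER_PG

def mb_to_pg(megabases):
    """Convert a genome size in megabase pairs to picograms."""
    return megabases / MB_PER_PG

def raw_bases_needed(genome_mb, coverage):
    """Bases that must be sequenced to reach a given average coverage."""
    return genome_mb * 1e6 * coverage

# Example: a small 0.58 Mb bacterial genome at a hypothetical 30x coverage.
small_genome_pg = mb_to_pg(0.58)
workload = raw_bases_needed(0.58, 30)
```

The workload grows linearly with genome size, which is why genome size is a direct estimator of project cost.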


Genome size Cont

Mycoplasma genitalium (genome size: 0.58 Mb) is a small parasitic bacterium that lives on the ciliated epithelial cells of the primate genital and respiratory tracts. M. genitalium is the smallest known genome that can constitute a cell, and the second-smallest bacterium after the endosymbiont Carsonella ruddii. Until the discovery of Nanoarchaeum in 2002, M. genitalium was also considered to be the organism with the smallest genome.
N.B: There is a difference between the smallest parasitic bacteria and the smallest free-living bacteria.

Genome size Cont

Alpha-proteobacteria that live in the ocean, 25% abundance
The first cultured members of the clade
The smallest genome among free-living organisms


Synthetic Biology and Genome Size

Synthetic biology: its primary focus is building a minimal artificial cell.

Streamlining the genomes of model bacteria revealed that genome reduction leads to unanticipated beneficial properties such as:
High electroporation/transformation efficiency
Accurate propagation of recombinant genes and plasmids
Suitability for constructing a robust minimal synthetic genome: a minimal cell provides a good chassis on which to assemble various kinds of functional modules.

Synthetic biology refers to reliably engineering biological systems that perform human-defined functions.
C. Smolke (Nature 441: 277-279)

Promise of Synthetic Biology

The field of synthetic biology holds great promise for the:
Design,
Construction and
Development of artificial (i.e. man-made) biological (sub)systems
It thus offers potentially viable new routes to 'genetically modified' organisms and smart drugs, as well as model systems to examine artificial genomes and proteomes.
The informed manipulation of such biological (sub)systems could have an enormous positive impact on our societies, with its effects being felt across a range of activities such as the provision of healthcare, environmental protection and remediation, etc.

Genome Organization

Prokaryotic & Eukaryotic Ribosomes

Prokaryotes: a single, circular DNA molecule, localized within the nucleoid (the lightly staining area in the center of the cell)
Eukaryotes: linearly arranged DNA, complexed with histones and packaged in an organized fashion

Prokaryote genomes

Example: E. coli
89% coding
> 4,000 genes
122 structural RNA genes
Prophage remnants
Transposable elements: insertion sequences (IS) and transposons (composite and non-composite)
Horizontal transfers (conjugation)

The Genetic Features of Prokaryotic Genomes

Genome sequence inspection can be used to locate genes because genes are not random series of nucleotides but instead have distinct features.

Fig X. A dsDNA molecule has six reading frames (computational prediction!). Both strands are read in the 5' to 3' direction. Each strand has three reading frames, depending on which nucleotide is chosen as the starting position.
Fig X. A protein-coding gene is an open reading frame (ORF) of triplet codons
However, simple ORF scans are less effective with the genomes of higher eukaryotes, partly because their genes are often split by introns.

Gene organization in the prokaryotic genome

Fig. A simplified version of prokaryotic operon organization. Genes A, B, and C are transcribed together onto a single polycistronic transcript, which is then translated to produce three separate proteins.
[Figure labels: promoter; repressor protein encoding gene; promoter; operator; structural gene(s)]

Proteins originating from genes of a common operon often have similar functions, interact physically through protein-protein interactions, or participate in shared biochemical pathways.
Operon
In genetics, an operon is a functioning unit of genomic DNA containing a cluster of genes under the control of a single regulatory signal or promoter.
The genes are transcribed together into an mRNA strand and either translated together in the cytoplasm, or undergo trans-splicing to create monocistronic mRNAs that are translated separately, i.e. several strands of mRNA that each encode a single gene product. The result of this is that the genes contained in the operon are either expressed together or not at all.
In short, several genes must be both co-transcribed and co-regulated to define an operon.
It is the main feature of the prokaryotic genome and is made of four components: promoter, regulator, operator and structural gene(s).

An operon is made up of four basic DNA components:
Promoter: a nucleotide sequence that enables a gene to be transcribed. The promoter is recognized by RNA polymerase, which then initiates transcription. In RNA synthesis, promoters indicate which genes should be used for messenger RNA creation and, by extension, control which proteins the cell produces.
Regulator: these genes control the operator gene in cooperation with certain compounds called inducers and co-repressors present in the cytoplasm. A regulator gene is not necessarily adjacent to the operator gene it controls. The regulator gene codes for a protein called the repressor. The repressor combines with the operator gene to repress its action.
Operator: a segment of DNA that a repressor binds to. It is classically defined in the lac operon as a segment between the promoter and the genes of the operon. In the case of a repressor, the repressor protein physically obstructs the RNA polymerase.
Prokaryotic genome cont


Operons (groups of genes that are located adjacent to one another in the genome, with perhaps just one or two nucleotides between the end of one gene and the start of the next) are characteristic features of prokaryotic genomes.
All genes in an operon are expressed as a single unit.
Hence, prokaryotic genomes have a more compact genetic organization, with little space between genes. How? Give your explanation (Hint: compare with eukaryotic coding gene organization)

Beta-galactosidase: this enzyme hydrolyzes the bond between the two sugars of lactose, glucose and galactose. It is coded for by the gene lacZ.
Lactose permease: this enzyme spans the cell membrane and brings lactose into the cell from the outside environment. The membrane is otherwise essentially impermeable to lactose. It is coded for by the gene lacY.
Thiogalactoside transacetylase: the function of this enzyme is not known. It is coded for by the gene lacA.

Regulation of the Tryptophan Operon

[Figure label: premature termination]
The operon contains five structural genes involved in the biosynthesis of tryptophan: trpE, D, C, B and A.
Expression of these genes is controlled at two levels:
1. The trpR gene encodes a repressor that, in the presence of tryptophan, binds to the operator (o) and blocks transcription.
2. In addition, expression is mediated by an attenuator sequence that prematurely terminates transcription when high levels of tryptophan are present. In this case, the attenuated RNA consists of only a short leader sequence (L). P = promoter

Regulation of the Tryptophan Operon Cont..

Attenuation (in genetics) is a proposed mechanism of control in some bacterial operons which results in premature termination of transcription and which is based on the fact that, in bacteria, transcription and translation proceed simultaneously.
Attenuation involves a provisional stop signal (attenuator), located in the DNA segment that corresponds to the leader sequence of mRNA.
During attenuation, the ribosome becomes stalled (delayed) in the attenuator region in the mRNA leader.
Depending on the metabolic conditions, the attenuator either stops transcription at that point or
Regulation of the Tryptophan Operon Cont..

Attenuation, or dampening, of the trp operon is made possible by the fact that the rate of translation influences RNA structure, which in turn influences the rate of transcription.
Translation therefore interferes with transcription, making this an example of translation-mediated transcription attenuation.
Mechanistically, this kind of attenuation is achieved because special sequences located near the beginning of the transcript, called the leader (trpL), interact to create two possible RNA conformations: one that terminates transcription (the terminator stem), and one that

Fig. Mechanism of transcriptional attenuation of
Locating genes for functional RNA

An ORF scan is appropriate for protein-coding genes, but what about genes for functional RNAs such as rRNA and tRNA? They have their own distinctive features, which can be used to aid their discovery in the genome sequence.
They have the ability to fold into secondary structures, such as the tRNA cloverleaf (intramolecular base pairing).
[Figure label: anticodon]
Fig. X. A 50 kb segment of the E. coli genome.

Insertion sequences (IS) are examples of transposable elements (TEs).

% of DNA non-coding for protein

According to the diagram, as genome size increases, the proportion of DNA that does not code for protein also increases.
Non-protein-coding sequences make up only a small fraction of prokaryotic genomes.
Eukaryotic genome
They have multiple linear chromosomes, each chromosome containing multiple origins of replication
(Think about the comparative genome sizes of the two groups: prokaryotes and eukaryotes)
They have specialized sequences at the ends of chromosomes (telomeres) that ensure proper replication of the essential components of chromosomes and also protect them from nuclease degradation
They have special sequences (centromeres) that ensure the correct segregation of homologous chromosomes during cell divisions

Eukaryotic genome Cont

How many genes are there? Surprisingly, this question is not very important: gene number says little about an organism's complexity.
There is more to genomes than protein-coding genes alone.

Eukaryotic genome Cont

Protein-coding genes
Although most prokaryotic chromosomes consist almost entirely of protein-coding genes, such elements make up a small fraction of most eukaryotic genomes (Figure).
As a prime example, the human genome might contain as few as 20,000 genes, comprising less than 1.5% of the total genome sequence.
Introns
Shortly after their discovery, the non-coding intervening sequences within coding genes (introns) were suggested to account for the pronounced discrepancy between gene number and genome size. It has also recently been suggested that most non-coding DNA in animals (but not plants) is intronic, which would imply that most of the genome is transcribed even though protein-coding regions represent a tiny minority.

Fig. Initiation of Transcription in Eukaryotes

A transcription factor (sometimes called a sequence-specific DNA-binding factor) is a protein that binds to specific DNA sequences, thereby controlling the flow (or transcription) of genetic information from DNA to mRNA.
Exon and Intron Splicing Motifs Search

Most introns start with the dinucleotide GU (GT in DNA) and end with the dinucleotide AG (in the 5' to 3' direction of the mRNA).
GT and AG are referred to as the splice donor and splice acceptor sites, respectively.
These consensus sequences are known to be critical, because changing one of the conserved nucleotides results in inhibition of splicing.
Upstream from the AG there is a region high in pyrimidines (C and U), the polypyrimidine tract. Upstream from the polypyrimidine tract is the branch point.
The branch point always contains an adenine, but it is otherwise loosely conserved.
A typical sequence is YNYYRAY, where Y indicates a pyrimidine (C or U), N denotes any nucleotide, R denotes any purine (G or A), and A denotes adenine.
Fig. Exon-intron consensus sequences in eukaryotes (branch point YNYYRAY, followed by the polypyrimidine tract)

Splicing is controlled by specific intron sequences, called splice-donor (GU) and splice-acceptor (AG) sequences, which flank the exons. Mutations in these sequences may lead to retention of large segments of intronic DNA by the mRNA, or to entire exons being spliced out of the mRNA. These changes could result in production of a nonfunctional protein.
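
A motif search for these splicing signals can be sketched with regular expressions. The example below works on the DNA version of an intron (T in place of U); the function names and the toy intron are hypothetical, and real splice-site prediction relies on weighted models rather than exact consensus matches.

```python
import re

# YNYYRAY on DNA: Y = C/T, N = any base, R = A/G. The lookahead lets
# overlapping candidate motifs be reported as well.
BRANCH_POINT = re.compile(r"(?=([CT][ACGT][CT][CT][AG]A[CT]))")

def has_canonical_splice_sites(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")

def branch_point_candidates(intron):
    """Return (index of the branch-point A, motif) for each YNYYRAY match."""
    intron = intron.upper()
    # The conserved A is the sixth base of the motif (offset 5).
    return [(m.start() + 5, m.group(1)) for m in BRANCH_POINT.finditer(intron)]

# Toy intron: donor GT, a branch-point motif, a pyrimidine-rich tract, acceptor AG.
toy_intron = "GTAAGT" + "CTCTGAC" + "TTTTCTTTTCAG"
candidates = branch_point_candidates(toy_intron)
```

In this toy intron the only match is CTCTGAC, with the branch-point adenine at index 11, upstream of the pyrimidine-rich tract as described above.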

Mechanism of pre-mRNA splicing

[Figure: pre-mRNA (5' to 3') with Exon 1, the intron (5' splice site, branch-point adenosine, 3' splice site) and Exon 2.
Step 1, trans-esterification: cut at the 5' site, lariat formation.
Step 2, trans-esterification: cut at the 3' site, exon joining, lariat release.
Products: ligated exons and the lariat intron. Source: www.wisc.edu/pharm]
Alternative splicing
It is a post-transcriptional modification in which a single gene can code for multiple proteins (protein isoforms).
It takes place in eukaryotes, prior to mRNA translation, by the differential inclusion or exclusion of regions of pre-mRNA.
It is an important source of protein diversity.
During a typical gene splicing event, the pre-mRNA transcribed from one gene can lead to different mature mRNA molecules that generate multiple functional proteins.
In conclusion:
Gene splicing enables a single gene to increase its coding capacity, allowing the synthesis of protein isoforms that are structurally and functionally distinct. Gene splicing is observed in a high proportion of genes. In human cells, about 40-60% of the genes are known to exhibit alternative splicing.
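
The differential inclusion or exclusion of pre-mRNA regions can be illustrated by enumerating the mature mRNAs that one gene can yield. This is a toy model under stated assumptions: only whole optional (cassette) exons are included or skipped, and the exon names, sequences and function name are hypothetical.

```python
from itertools import product

def exon_skipping_isoforms(exons, cassette):
    """Enumerate mature mRNAs made by including or skipping cassette exons.

    exons    -- ordered list of (name, sequence) pairs
    cassette -- set of exon names that may be skipped
    Returns {isoform label: spliced sequence}.
    """
    optional = [name for name, _ in exons if name in cassette]
    isoforms = {}
    for choice in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choice))
        # Constitutive exons are always kept; cassette exons follow `keep`.
        kept = [(n, s) for n, s in exons if keep.get(n, True)]
        label = "-".join(n for n, _ in kept)
        isoforms[label] = "".join(s for _, s in kept)
    return isoforms

# Hypothetical three-exon gene in which the middle exon is optional.
gene = [("E1", "ATGGCC"), ("E2", "GAAACC"), ("E3", "TTTTAA")]
isoforms = exon_skipping_isoforms(gene, {"E2"})
```

With n cassette exons the model yields 2^n isoforms, which shows how a single gene can increase its coding capacity.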

Gene Splicing Mechanism

There are several types of common gene splicing events. These events can occur simultaneously in a gene after the mRNA is formed in the transcription step of the central dogma of molecular biology.
Exon skipping: This is the most common known gene splicing mechanism, in which exon(s) are included in or excluded from the final gene transcript, leading to extended or shortened mRNA variants. The exons are the coding regions of a gene and are responsible for producing proteins that are utilized in various cell types for a number of functions.
Intron retention: An event in which an intron is retained in the final transcript. In humans, 2-5% of the genes have been reported to retain introns. The gene splicing mechanism retains the non-coding (intron) portions of the gene and leads to a deformity in protein structure and functionality.
Alternative 3' splice site and 5' splice site: Alternative gene splicing includes the joining of different 5' and 3' splice sites. In

Fig. Gene models are depicted as exons (colored rectangles) connected by introns (black lines). Green arrows indicate transcription initiation sites, dotted lines indicate splicing patterns and polyadenylation sites are denoted as poly(A). The mRNA products generated by each type of AT are shown to the right of each gene model. Simple transcription is contrasted with alternative transcript initiation, the five major classes of alternative splicing, and alternative polyadenylation. In each model, yellow exons are constitutive and blue

It should be coupled with exon skipping, why?
Ran Elkon, Alejandro P. Ugalde & Reuven Agami. Alternative cleavage and polyadenylation: extent, regulation and function. Nature Reviews Genetics 14: 496-506 (2013). doi:10.1038/nrg3482

The four different APA types

The simplest alternative polyadenylation (APA) type, which is termed tandem 3' untranslated region (UTR) APA, involves the occurrence of alternative poly(A) sites within the same terminal exon and hence generates multiple isoforms that differ in their 3'UTR length without affecting the protein encoded by the gene. The other three types involve APA events which potentially affect the coding sequences in addition to the 3'UTRs. These types are: alternative terminal exon APA, in which alternative splicing generates isoforms that differ in their last exon; intronic APA, which involves cleaving at the cryptic intronic poly(A) signal (PAS), extending an internal exon and making it the terminal one; and internal exon APA, which involves premature polyadenylation within the

Sequence-based methods for profiling transcript diversity

Hypothetical transcript sequences consisting of exons (green rectangles) with intervening introns (black lines) are depicted as gapped alignments to a reference genome. The following tracks represent sequences generated by each sequence-based method. Human genes have an average of 10 exons with an average length of 250 bp. The methods are displayed in order of least to most quantitative. Abbreviations: (EST) expressed sequence tag; (SAGE) serial analysis of

Trans-splicing: Exons located on separate pre-mRNA molecules (intragenic and/or intergenic trans-splicing) are selectively joined to produce mature mRNAs encoding proteins with distinct features and functions.

Take Home Message!


Did you note that the annotation of eukaryotic genomes is more complex than that of prokaryotic genomes? Why?
What does it mean that the prokaryotic genome is more dense/compact than the eukaryotic genome? How do you explain this statement?

Pseudogenes
The term, coined in 1977 by Jacq et al., is composed of the prefix pseudo, which means false, and the root gene, which is the central unit of molecular genetics.
They are dysfunctional relatives of genes that have lost their protein-coding ability or are otherwise no longer expressed in the cell.
Some do not have introns or promoters (these pseudogenes are copied from mRNA and incorporated into the chromosome and are called processed pseudogenes).
Most have some gene-like features (such as promoters, CpG islands, and splice sites); they are nonetheless considered nonfunctional, due to their lack of protein-coding ability resulting from various genetic disablements (premature stop codons, frameshifts, or a lack of transcription) or their

Jacq C, Miller JR, Brownlee GG. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12:109-120. 1977.


Pseudogenes cont

Processed pseudogene!

Repeat motifs
Figure X: Origins of pseudogenes. A. Retrotransposed pseudogenes: starting from the original gene (the coding sequences are in black, the non-coding introns in gray, and the promoter element is indicated by the large arrow upstream of the gene), transcription generates a primary mRNA (black and gray broken line), from which the introns are excised by RNA splicing. This mature mRNA, which contains only exons and a poly-adenosine tail, is transcribed back into DNA by enzymes called reverse transcriptases, and the DNA is reinserted back into the genome. Hence, the pseudogene product will lack intron and promoter sequences, and will bear characteristic repeat sequences at the insertion site, due to the integration mechanism. B. Duplicated pseudogenes: DNA duplication generates a more-or-less faithful copy of the original gene, including introns and, in many cases, promoter and other transcriptional regulatory elements. In most cases, this duplicated gene will undergo crippling, inactivating mutations and turn into a pseudogene (in rarer cases, the duplicated

Pseudogenes cont
Pseudogenes from the point of view of
genome annotation
Pseudogenes are quite difficult to identify and characterize in genomes,
because the two requirements of homology and nonfunctionality are
implied through sequence calculations and alignments rather than
biologically proven.
Homology is implied by sequence identity between the DNA sequences
of the pseudogene and parent gene. After aligning the two sequences,
the percentage of identical base pairs is computed. A high sequence
identity (usually between 40% and 100%) means that it is highly likely
that these two sequences diverged from a common ancestral sequence
(are homologous), and highly unlikely that these two sequences were
independently created.
Nonfunctionality can manifest itself in many ways. Normally, a gene
must go through several steps in going from a genetic DNA sequence
to a fully functional protein: transcription, pre-mRNA processing,
translation, and protein folding are all required parts of this process. If
any of these steps fails, then the sequence may be considered
nonfunctional.
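The two computational tests described above (sequence identity as evidence of homology, and a disabling feature as evidence of nonfunctionality) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the sequences are invented, the function names are hypothetical, the input is taken to be already aligned (gaps written as "-"), and identity is computed over gap-free columns only, which is just one of several conventions in use.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def percent_identity(aligned_a, aligned_b):
    """Percentage of gap-free alignment columns in which both bases are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

def first_premature_stop(coding_seq):
    """Return the 0-based index of the first in-frame stop codon that occurs
    before the final codon, or None if there is none."""
    codons = [coding_seq[i:i + 3] for i in range(0, len(coding_seq) - 2, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None

# Invented parent gene and candidate pseudogene (two substitutions, one creating TAG).
parent     = "ATGGCCGATTACAAGTGA"
pseudogene = "ATGGCCTATTAGAAGTGA"

print(round(percent_identity(parent, pseudogene), 1))
print(first_premature_stop(parent), first_premature_stop(pseudogene))
```

A high identity together with a premature stop codon is the kind of indirect, sequence-based evidence the slide refers to; it implies a pseudogene but does not biologically prove one.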

Genome and gene duplication


Genome and gene duplication can occur by several mechanisms:
I. Polyploidization - autopolyploidization and allopolyploidization, distinguished by genome origin
II. Segmental duplication
III. Tandem gene duplication

Polyploidy is thought to be rare in animals because it can disrupt the dosage compensation that is required for genetic balance.
How do they achieve dosage balance? Why? Hint: Genome imprinting!


Gene Duplication Cont

Fig. Four scenarios for the outcome of gene duplication. Panels 1) to 4); the panel labels include neofunctionalization and subfunctionalization.


Gene Duplication Cont .


Gene Duplication Cont


Neofunctionalization, one of the possible outcomes of functional divergence, occurs when one gene copy, or paralog, takes on a totally new function after a gene duplication event.
Neofunctionalization is an adaptive mutation process, meaning that one of the gene copies must mutate to develop a function that was not present in the ancestral gene.


Gene Duplication Cont


Subfunctionalization is one of the possible outcomes of
functional divergence that occurs after a gene duplication event, in
which pairs of genes that originate from duplication, or paralogs,
take on separate functions.
Subfunctionalization is a neutral mutation process; meaning that no
new adaptations are formed. During the process of gene
duplication paralogs simply undergo a division of labor by retaining
different parts (subfunctions) of their original ancestral function.
This partitioning event occurs because of segmental gene silencing
leading to the formation of paralogs that are no longer duplicates,
because each gene only retains a single function.
It is important to note that the ancestral gene was capable of performing both functions and the descendant duplicate genes can now only perform one of the original ancestral
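The distinction between the fates of duplicated genes can be made concrete with a small Python sketch that represents each gene simply as a set of named functions. This is a deliberately crude model, not a method from the lecture: the function labels, the name `classify_duplicate_fate`, and the order in which the rules are checked are all assumptions made for illustration.

```python
def classify_duplicate_fate(ancestral, copy_a, copy_b):
    """Classify the outcome of a duplication from the functions each paralog retains.

    Each argument is a set of function labels (hypothetical names).
    """
    if not copy_a or not copy_b:
        # One copy has lost every function: it has become a pseudogene.
        return "nonfunctionalization"
    if (copy_a - ancestral) or (copy_b - ancestral):
        # A copy performs something the ancestral gene could not.
        return "neofunctionalization"
    if copy_a == ancestral and copy_b == ancestral:
        return "both copies conserved"
    if copy_a | copy_b == ancestral and copy_a != copy_b:
        # The ancestral functions are divided between the two paralogs.
        return "subfunctionalization"
    return "unclassified"

ancestor = {"expressed in root", "expressed in leaf"}
print(classify_duplicate_fate(ancestor, {"expressed in root"}, {"expressed in leaf"}))
print(classify_duplicate_fate(ancestor, ancestor, ancestor | {"binds new ligand"}))
print(classify_duplicate_fate(ancestor, ancestor, set()))
```

In the first call neither paralog alone covers the ancestral role, which is why both copies must be retained after subfunctionalization.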

Chromatin Structure
Chromatin is a term designating the structure in which DNA exists
within cells. The structure of chromatin is determined and stabilized
through the interaction of the DNA with DNA-binding proteins.
There are 2 classes of DNA-binding proteins. The histones are the major
class of DNA-binding proteins involved in maintaining the compacted
structure of chromatin.
There are 5 different histone proteins, identified as H1, H2A, H2B, H3 and H4; H2A, H2B, H3 and H4 are the core histones.
The other class of DNA-binding proteins is a diverse group of proteins called simply non-histone proteins. This class of proteins includes the various transcription factors, polymerases, hormone receptors and other nuclear enzymes. In any given cell there are more than 1000 different types of non-histone proteins bound to the DNA.
Fig. Structure of the chromosome

Chromatin Structure Cont


The binding of DNA by the histones generates a structure called the nucleosome.
The nucleosome core contains an octamer protein structure consisting of 2 subunits each of H2A, H2B, H3 and H4.
Histone H1 occupies the internucleosomal DNA and is identified as the linker histone.
The nucleosome core contains approximately 150 bp of DNA.
The linker DNA between each nucleosome can vary from 20 to more than 200

Chromatin Structure Cont


Chromatin is found in two varieties: euchromatin and heterochromatin. Originally, the two forms were distinguished cytologically by how intensely they stained.
Euchromatin stains less intensely, while heterochromatin stains intensely, indicating tighter packing.
Heterochromatin mainly consists of genetically inactive satellite sequences, and many genes are repressed to various extents, although some cannot be expressed in euchromatin at all.
Both centromeres and telomeres are heterochromatic, as is the Barr body of the second, inactivated X-chromosome in a female.

Chromatin Structure Cont


I. Heterochromatin is a tightly packed form of DNA, which comes in different varieties. These varieties lie on a continuum between the two extremes of constitutive and facultative heterochromatin.
Both play a role in the expression of genes: constitutive heterochromatin can affect the genes near it (position-effect variegation), whereas facultative heterochromatin is the result of genes that are silenced through a mechanism such as histone methylation or siRNA through RNAi.
The regions of DNA packaged in facultative heterochromatin will not
be consistent between the cell types within a species, and thus a
sequence in one cell that is packaged in facultative heterochromatin
(and the genes within poorly expressed) may be packaged in
euchromatin in another cell (and the genes within no longer silenced).
However, the formation of facultative heterochromatin is regulated,
and is often associated with morphogenesis or differentiation.


DNA Modification and Genome Expression
Important alterations of genome activity can also be achieved by making chemical changes to the DNA itself.
These changes are associated with the semi-permanent silencing of the genome, possibly of an entire chromosome, and often the modified state is inherited by the progeny arising from cell division.
The modifications are brought about by DNA methylation.
CpG islands or CG islands are genomic regions that contain a high frequency of CpG sites, but to date objective definitions for CpG islands are limited.
In mammalian genomes, CpG islands are typically 300-3,000 base pairs long.
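Because the definition of a CpG island is not settled, any computational test has to pick thresholds. The Python sketch below is one hedged illustration, not the lecture's definition: it computes the GC fraction and the observed/expected CpG ratio of a sequence and applies a commonly cited rule of thumb (length of at least 200 bp, GC content above 50%, observed/expected CpG above 0.6). The function names and default thresholds are assumptions and can be changed freely.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # number of CpG dinucleotides on this strand
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were distributed independently: (C * G) / N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply assumed rule-of-thumb thresholds; these are not a universal definition."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp

cpg_rich = "CG" * 150    # artificial 300 bp sequence made only of CpG
at_rich = "ATTA" * 100   # artificial 400 bp sequence with no C or G
print(cpg_stats(cpg_rich), looks_like_cpg_island(cpg_rich))
print(cpg_stats(at_rich), looks_like_cpg_island(at_rich))
```

In practice such a test is run over sliding windows along a chromosome; the single-sequence version here is kept short on purpose.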

DNA Modification cont

Mutation

Fig. Methylcytosine forms the same base pair with guanine as cytosine, because the methyl group does not block the formation of the inter-base hydrogen bonds.

Fig. Deamination of 5-methylcytosine to thymine has led to the replacement of CpG sequences with TpG over time.

Fig. When cytosine is deaminated, it becomes uracil. Repair enzymes recognize this as an abnormal DNA base and replace the uracil with a cytosine. However, when 5-methylcytosine is deaminated, it becomes thymine, which replaces the cytosine. The proofreading enzyme may keep the change and edit G into A.
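The different consequences of deaminating cytosine and 5-methylcytosine can be written as a tiny Python model. It is a simplification under stated assumptions: repair of uracil is treated as always successful, the thymine produced from 5-methylcytosine is treated as never repaired, and the function name `deaminate` and the example sequence are invented.

```python
def deaminate(seq, position, methylated):
    """Return the strand after deamination of the cytosine at `position`.

    Unmethylated C becomes U, which repair enzymes recognize and restore to C,
    so the sequence is unchanged. 5-methylcytosine becomes T, a normal DNA base,
    so in this simplified model the change persists.
    """
    if seq[position] != "C":
        raise ValueError("this model only handles deamination of cytosine")
    if methylated:
        return seq[:position] + "T" + seq[position + 1:]
    return seq

strand = "AACGTT"                               # made-up sequence with one CpG
print(deaminate(strand, 2, methylated=False))   # CpG kept
print(deaminate(strand, 2, methylated=True))    # CpG converted to TpG
```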

DNA Modification Cont

Figure 2. Methylation of CpG islands silences gene expression.

Figure 3. Methylation of CpG islands, together with histone deacetylation and other modifications, silences genes through the mechanism of chromatin remodeling and heterochromatin formation.

Figure 4. Methylation of CpG islands leads to long term gene silencing.

The region is also called the DNA methylation Domain (DMA) or Imprinting Control Region (ICR).

DNA Modification Cont


Types of methylation
There are two types of methylation:
1. Maintenance methylation
In order for genes to remain permanently silenced by this mechanism, the DNA methylation patterns must be stably transmitted to daughter cells. This is accomplished through the activity of DNA maintenance methylase, which detects CpG methylation in one strand of inherited DNA and methylates the other daughter strand (Figure to the right).
2. De novo methylation
It adds methyl groups at totally new positions and so changes the pattern of methylation in a localized region of the genome.
Reading assignment
Methylation is involved in genome imprinting and X chromosome inactivation. How? What is its significance in dosage balance (in humans)?

Fig. Maintenance methylation and de novo methylation
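The difference between the two types of methylation can be sketched as operations on sets of methylated CpG positions. This Python model is illustrative only: positions are arbitrary integers, a duplex is represented as a dictionary with a "parental" and a "new" strand, and the three function names are hypothetical.

```python
def replicate(parent_methylated):
    """Semi-conservative replication: the newly made strand starts unmethylated,
    leaving every methylated CpG hemimethylated."""
    return {"parental": set(parent_methylated), "new": set()}

def maintenance_methylation(duplex):
    """Copy the parental pattern onto the new strand at hemimethylated CpG sites."""
    return {"parental": set(duplex["parental"]), "new": set(duplex["parental"])}

def de_novo_methylation(duplex, new_sites):
    """Add methyl groups at previously unmethylated CpG sites on both strands."""
    return {strand: marks | set(new_sites) for strand, marks in duplex.items()}

after_replication = replicate({10, 42})
after_maintenance = maintenance_methylation(after_replication)
after_de_novo = de_novo_methylation(after_maintenance, {77})

print(after_replication)
print(after_maintenance)
print(after_de_novo)
```

Maintenance methylation restores the inherited pattern without changing it, whereas de novo methylation is the only step in this sketch that alters the pattern.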

DNA Modification Cont

Histones
In biology, histones are highly alkaline proteins found in eukaryotic cell nuclei that
package and order the DNA into structural units called nucleosomes. They are the chief
protein components of chromatin, acting as spools around which DNA winds, and play a
role in gene regulation. Without histones, the unwound DNA in chromosomes would be
very long (a length to width ratio of more than 10 million to one in human DNA).
For example, each human cell has about 1.8 meters of DNA, but wound on the histones it has about 90 micrometers (0.09 mm) of chromatin, which, when duplicated and condensed during mitosis, results in about 120 micrometers of chromosomes.
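The figures quoted above imply a compaction factor that is easy to work out. The short Python calculation below uses only the numbers on the slide; the one interpretation added here, flagged as an assumption, is that the 120 micrometers of mitotic chromosomes contain the duplicated DNA, that is, twice 1.8 meters.

```python
def packing_ratio(dna_length_um, packaged_length_um):
    """How many times shorter the packaged form is than the extended DNA."""
    return dna_length_um / packaged_length_um

DNA_UM = 1_800_000      # 1.8 m of DNA per human cell, expressed in micrometers
CHROMATIN_UM = 90       # length when wound on histones
MITOTIC_UM = 120        # duplicated and condensed chromosomes

interphase_ratio = packing_ratio(DNA_UM, CHROMATIN_UM)
# Assumption: mitotic chromosomes hold the duplicated DNA (2 x 1.8 m).
mitotic_ratio = packing_ratio(2 * DNA_UM, MITOTIC_UM)

print(interphase_ratio, mitotic_ratio)
```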

H1

Histones Cont
Classes of histones

Five major families of histones exist: H1/H5, H2A, H2B, H3, and H4.
Histones H2A, H2B, H3 and H4 are known as the core histones, while histones H1 and H5 are known as the linker histones.
Two of each of the core histones assemble to form one octameric nucleosome core particle, and 147 base pairs of DNA wrap around this core particle 1.65 times in a left-handed superhelical turn.
The linker histone H1 binds the nucleosome and the entry and exit sites of the DNA, thus locking the DNA into place and allowing the formation of higher-order structure.
The most basic such formation is the 10 nm fiber or beads-on-a-string conformation. This involves the wrapping of DNA around nucleosomes with approximately 50 base pairs of DNA separating each pair of nucleosomes (also referred to as linker DNA).
The assembled histones and DNA are called chromatin.
Higher-order structures include the 30 nm fiber (forming an irregular zigzag) and the 100 nm fiber, these being the structures found in normal cells. During mitosis and meiosis, the condensed chromosomes are
Histones Cont

Fig. Histone modifications regulate chromatin structure and functions

Chemical Modifications of Histone

Acetylation (Ac) is the attachment of acetyl groups to lysine amino acids in the N-terminal regions of each of the core molecules.
The enzyme that mediates the acetylation is histone acetyltransferase (HAT).
Ac reduces the affinity of the histone for DNA and possibly reduces the interaction between individual nucleosomes, destabilizing the 30 nm chromatin fiber.
Histones in heterochromatin are unacetylated whereas those in functional domains are acetylated; this indicates that the mechanism is important for DNA packaging and the regulation of gene expression.
Gene activation is often reversible; the deacetylation

Histones Cont
Lysine acetylation is not the only type of histone modification, but it is the best studied.
Methylation of lysine and arginine residues in the N-terminal regions of H3 and H4; it is a reversible event.
Phosphorylation of serine residues in the N-terminal regions of H2A, H2B, H3 and H4.
Ubiquitination of lysine residues at the C termini of H2A and H2B. This modification involves addition of the small, common (ubiquitous) protein called ubiquitin or a related protein, rather unhelpfully called SUMO.

Histone Cont

Chromatin structure: role of acetylation

Some coactivators have HAT (histone acetyltransferase) activity
This links histone acetylation, chromatin structure and gene activation

HAT activity of the co-activator acetylates core histones bound to promoter DNA, causing release of nucleosome core particles or loosening of the histone-DNA interaction
Subsequent binding of transcription factors and RNA polymerase
Once transcription is initiated, RNA polymerase is able to transcribe DNA packaged into nucleosomes

Acetylation is dynamic: enzymes also remove acetyl groups (histone deacetylases (HDACs))

Histone Cont

Chromatin structure: role of deacetylation

Removal of acetyl groups by histone deacetylases (HDACs)

HDACs are associated with transcriptional repression
HDACs are subunits of larger complexes (corepressors)
HDACs are guided to regions of DNA by methylation patterns
Example:
Inactive X chromosome of a female: largely deacetylated histones
Active X chromosome has a normal level of histone acetylation
Hemizygous:

Human chromosome Karyotype


Histone Cont
Chromatin structure Acetylation / Deacetylation


END OF THE LESSON
