Figure X: DNA is a double helix. (A) Francis Crick (left) and James Watson (right) proposed that the DNA molecule has a double-helical structure. (B) Biochemists can now pinpoint the position of every atom in a DNA molecule. To see that the essential features of the original Watson-Crick model have been verified, follow with your eyes the double-helical chains of sugar-phosphate groups and note the horizontal rungs of the bases.
Genomics Cont.....
Genomics provides essential tools to speed up the work of forward geneticists and is now a scientific discipline in its own right.
The application of genomics in all areas of biology has lowered the barriers that once separated the plant, animal, and microbial research communities.
Experimental techniques for studying transcriptomes and proteomes are providing novel insights into genome expression, and the new discipline of systems biology is linking genome biology with …
The investigation of the roles and functions of single genes is a primary focus of …
Genome: The entire genetic complement of a living organism.
Gene: A DNA segment that contains biological information and hence codes for an RNA and/or polypeptide molecule.
Genomic Expression: The series of events by which the biological information carried by the genome is released and made available to the cell.
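The gene and genome-expression definitions above can be made concrete with a toy model of the two core steps: transcription (DNA to mRNA) and translation (mRNA to polypeptide). This is a minimal Python sketch, not a bioinformatics tool. The function names are hypothetical, and the codon table is deliberately truncated to the few codons the example uses (the real genetic code has 64 codons).

```python
# Toy model of gene expression: DNA -> mRNA (transcription) -> polypeptide (translation).
# Assumption: the codon table is truncated to the codons used below; the real code has 64.
CODON_TABLE = {
    "AUG": "M",  # start codon, methionine
    "UUU": "F",
    "GGC": "G",
    "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding (sense) strand: T is replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG, codon by codon, until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # "X": codon not in this toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ccATGTTTGGCAAATAAgg")
print(mrna)             # CCAUGUUUGGCAAAUAAGG
print(translate(mrna))  # MFGK
```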
Cyanobacteria Genomics
Organization of the Human Genome
Scope of Genomics Cont
V) Metagenomics
Definition: Metagenomics (environmental genomics) is the study of genetic material recovered directly from environmental samples.
It is a powerful tool for revealing the diversity of microscopic life that was previously hidden from culture-based studies. Thus, it offers a powerful lens for viewing the microbial world, with the potential to revolutionize our understanding of microscopic organisms as a whole (e.g. bacteria and fungi).
Aims of Metagenomics
1. Examining metabolic pathways.
2. Facilitating the design of culture media for the growth of previously uncultured microbes.
3. Examining genes that predominate in a given environment compared to others.
4. Finally, metagenomic data and metadata can be leveraged to design low- and high-throughput experiments focused on defining the roles of genes and microorganisms in the establishment of a dynamic microbial community.
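Aim 3 (finding genes that predominate in one environment relative to another) usually comes down to comparing normalized gene counts between samples. The Python sketch below assumes reads have already been assigned to genes; the sample names and counts are hypothetical, and the pseudocount of 1 is an illustrative choice that avoids division by zero for genes absent from one sample.

```python
import math


def relative_abundance(counts):
    """Convert raw per-gene read counts into proportions of the sample total."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def log2_fold_changes(env_a, env_b, pseudocount=1):
    """log2(relative abundance in env_a / relative abundance in env_b) for each gene.

    A pseudocount is added to every gene in both samples so that genes absent
    from one environment do not cause division by zero (illustrative choice).
    """
    genes = set(env_a) | set(env_b)
    a = relative_abundance({g: env_a.get(g, 0) + pseudocount for g in genes})
    b = relative_abundance({g: env_b.get(g, 0) + pseudocount for g in genes})
    return {g: math.log2(a[g] / b[g]) for g in genes}


# Hypothetical read counts per gene in two environments.
soil = {"nifH": 120, "amoA": 40, "rpoB": 200}
gut = {"nifH": 5, "amoA": 0, "rpoB": 210}

changes = log2_fold_changes(soil, gut)
for gene in sorted(changes, key=changes.get, reverse=True):
    print(f"{gene}: {changes[gene]:+.2f}")  # positive = relatively enriched in soil
```

Normalizing to relative abundance first matters because the two samples can be sequenced to different depths; raw counts alone would confound depth with enrichment.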
Figure: Two strategies for analyzing a sampled microbial community after DNA extraction.
(a) Amplifying a single gene (e.g. 16S rRNA), then sequencing and generating a phylogenetic tree. Outcomes: 1. Phylogenetic snapshot of most members of the community; 2. Identification of novel phylotypes.
(b) Restriction enzymes (REs) digest total DNA, followed by shotgun sequencing, assembly, and annotation of genomes to obtain the total gene pool of the community. Outcomes: 1. Identification of all gene categories; 2. Discovery of new genes; 3. Linking genes to particular phylotypes.
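In the single-gene (16S rRNA) strategy, reads are typically grouped into operational taxonomic units (OTUs) before the phylogenetic snapshot is built. A 97% sequence-identity cutoff is a widely used convention for approximating species-level groups. The sketch below is a simplified greedy clustering under stated assumptions: reads are already aligned to equal length, identity is a plain position-by-position comparison, and the function names are hypothetical. Production pipelines also handle gaps, sequencing errors, and abundance sorting.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two reads already aligned to equal length."""
    if len(a) != len(b):
        raise ValueError("reads must be pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy clustering: each read joins the first OTU whose centroid (its first read)
    it matches at >= threshold identity; otherwise it founds a new OTU."""
    centroids, otus = [], []
    for read in reads:
        for index, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                otus[index].append(read)
                break
        else:  # no existing OTU was close enough
            centroids.append(read)
            otus.append([read])
    return otus


# Hypothetical 100-nt reads: r2 differs from r1 at 2 positions (98% identity),
# r3 differs from r1 at 50 positions (50% identity).
r1 = "A" * 100
r2 = "A" * 98 + "CC"
r3 = "C" * 50 + "A" * 50
for otu in greedy_otus([r1, r2, r3]):
    print(len(otu), "read(s)")  # 2 read(s), then 1 read(s)
```

Because the clustering is greedy, the result depends on input order; sorting reads by abundance first is a common way to make the most frequent sequences act as centroids.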
Pharmacogenomics
Patients vary in their response to medications, in terms of both efficacy and toxicity.
This inter-patient variability is due in part to polymorphisms (a locus is conventionally called polymorphic when its most frequent allele has a frequency of less than 99%) in genes encoding:
I. Drug-metabolizing enzymes,
II. Drug transporters, and/or
III. Drug targets (e.g. enzymes, receptors).
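The frequency criterion above can be checked directly from genotype data. This is a minimal Python sketch, assuming diploid genotypes given as allele pairs. The 0.99 cutoff is the one stated above; the function names and the example cohorts are hypothetical.

```python
def allele_frequencies(genotypes):
    """genotypes: iterable of diploid genotypes, e.g. ("A", "G"). Returns allele -> frequency."""
    counts = {}
    for genotype in genotypes:
        for allele in genotype:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(genotypes, max_major_freq=0.99):
    """Polymorphic if the most frequent allele has a frequency below max_major_freq."""
    return max(allele_frequencies(genotypes).values()) < max_major_freq


# Hypothetical cohorts of 100 people each.
common_variant = [("A", "A")] * 90 + [("A", "G")] * 10  # G allele frequency 0.05
rare_variant = [("A", "A")] * 99 + [("A", "G")] * 1     # G allele frequency 0.005
print(is_polymorphic(common_variant))  # True
print(is_polymorphic(rare_variant))    # False
```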
Figure: Population DNA sequencing of a target gene.
Terminologies
1) Allelic variant: A variation in the normal sequence of a gene.
2) Genotype: The genetic makeup of an organism.
3) Genotype-phenotype correlation: The association between the presence of a genetic variation and the resulting physical characteristics or abnormality.
4) Pharmacogenetics vs. Pharmacogenomics:
Pharmacogenetics is the study of genetic variation in drug-metabolizing enzymes and its effect on drug response; it often focuses on variations in a targeted gene or a group of functionally related genes.
Pharmacogenomics is the broader study of the entire spectrum of genes that affect drug behavior, i.e. an investigation of genetic variation at the level of the whole genome.
5) Phenotype: The observable features resulting from the expression of genes.
TPMT: Thiopurine S-methyltransferase
This gene encodes the enzyme that metabolizes thiopurine drugs, using S-adenosyl-L-methionine as the S-methyl donor and producing S-adenosyl-L-homocysteine as a byproduct.
Thiopurine drugs such as 6-mercaptopurine are used as chemotherapeutic agents.
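TPMT is a standard example of genotype-guided prescribing: people who inherit non-functional TPMT alleles inactivate thiopurines poorly and are at higher risk of toxicity at standard doses. The Python sketch below maps a diploid TPMT genotype to a metabolizer category by counting functional alleles. The allele table is a deliberate simplification and an assumption (*1 as normal function; *2, *3A, *3B, *3C as no function). Clinical guidelines use larger curated tables and may also consider measured enzyme activity. The function returns a category only, not a dose.

```python
# Simplified allele-function table (assumption, not a clinical reference).
NORMAL_FUNCTION = {"*1"}
NO_FUNCTION = {"*2", "*3A", "*3B", "*3C"}

PHENOTYPES = {
    2: "normal metabolizer",
    1: "intermediate metabolizer",
    0: "poor metabolizer",
}


def tpmt_phenotype(allele1: str, allele2: str) -> str:
    """Predict a TPMT metabolizer category by counting functional alleles."""
    functional = 0
    for allele in (allele1, allele2):
        if allele in NORMAL_FUNCTION:
            functional += 1
        elif allele not in NO_FUNCTION:
            raise ValueError(f"{allele} is not in this simplified table")
    return PHENOTYPES[functional]


print(tpmt_phenotype("*1", "*1"))    # normal metabolizer
print(tpmt_phenotype("*1", "*3A"))   # intermediate metabolizer
print(tpmt_phenotype("*3A", "*3C"))  # poor metabolizer
```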
The methyl group (CH3) attached to the methionine sulfur atom in SAM (S-adenosyl-L-methionine) is chemically reactive. This allows donation of this group to an acceptor substrate in transmethylation reactions. More than 40 metabolic reactions involve the transfer of a methyl group from SAM to various substrates, such as nucleic acids, proteins, lipids and secondary metabolites.
1. Thiopurine S-methyltransferase (EC 2.1.1.67)
2. Hypoxanthine phosphoribosyltransferase (EC 2.4.2.8)
http://www.kegg.jp/kegg-bin/show_pathway?hsa00983+7172
Summary
The adrenergic receptors (subtypes alpha 1, alpha 2, beta 1, and beta 2) are a prototypic family of guanine nucleotide-binding regulatory protein (G protein)-coupled receptors that mediate the physiological effects of the hormone epinephrine and the neurotransmitter norepinephrine. Specific polymorphisms in this gene have been shown to affect the resting heart rate and can be involved in heart failure.
Potential of Pharmacogenomics
To identify patients within a population with the same diagnosis who are genetically predisposed either not to respond to therapy or to develop unacceptable toxicity, and then to prospectively alter their therapy to avoid treatment that is not likely to be optimal. The remaining, now more homogeneous, population can then be treated with conventional therapy, which they are not genetically predisposed to fail.
These approaches promise the advent of personalized medicine, in which drugs and drug combinations are optimized for each individual's unique genetic makeup.
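The two-step strategy above (flag genetically predisposed patients, then treat the rest conventionally) is essentially a partition of the cohort. This is a minimal Python sketch; the patient IDs and genotype labels are hypothetical placeholders, and deciding which genotypes count as "risk" is the hard, evidence-driven part that this code simply takes as given.

```python
def stratify(patients, risk_genotypes):
    """Partition patients by whether they carry any genotype flagged as risky.

    patients: dict of patient ID -> set of genotype labels
    risk_genotypes: labels linked to non-response or toxicity (taken as given here)
    Returns (alter_therapy, conventional_therapy) lists of patient IDs.
    """
    alter_therapy, conventional = [], []
    for patient_id, genotypes in patients.items():
        if genotypes & risk_genotypes:
            alter_therapy.append(patient_id)
        else:
            conventional.append(patient_id)
    return alter_therapy, conventional


# Hypothetical cohort sharing one diagnosis; labels are placeholders.
cohort = {
    "P1": {"geneX:ref/ref"},
    "P2": {"geneX:ref/var"},
    "P3": {"geneX:var/var"},
}
alter, conventional = stratify(cohort, risk_genotypes={"geneX:var/var"})
print(alter)         # ['P3']
print(conventional)  # ['P1', 'P2']
```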
Mycoplasma genitalium (genome size: 0.58 Mb) is a small parasitic bacterium that lives on the ciliated epithelial cells of the primate genital and respiratory tracts. Its genome is the smallest known genome that can constitute a cell, and it is the second-smallest bacterium after the endosymbiont Carsonella ruddii. Until the discovery of Nanoarchaeum in 2002, M. genitalium was also considered to be the organism with the smallest genome.
N.B.: There is a difference between the smallest parasitic bacteria and the smallest free-living bacteria.
Synthetic biology refers to the reliable engineering of biological systems that perform human-defined functions.
C. Smolke (Nature 441: 277-279)
Genome Organization
Prokaryotes: a single, circular DNA molecule, localized within the nucleoid (the lightly staining area in the center of the cell).
Eukaryotes: linearly arranged DNA, complexed with histones and packaged in an organized fashion.
Prokaryote genomes
Example: E. coli
89% coding
> 4,000 genes
122 structural RNA genes
Prophage remains
Transposable elements: insertion sequences (IS) and transposons (composite and non-composite)
Horizontal transfers (conjugation)
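A figure such as "89% coding" is the fraction of genome positions covered by at least one coding sequence. The Python sketch below computes it from CDS coordinates. The interval convention (1-based, inclusive), the function name, and the toy numbers are assumptions for illustration; overlapping genes are merged so that shared bases are counted once.

```python
def coding_fraction(genome_length, cds_intervals):
    """Fraction of genome positions covered by at least one CDS.

    cds_intervals: (start, end) pairs, 1-based and inclusive (assumed convention).
    Overlapping intervals are merged so that shared bases are counted once.
    """
    merged = []
    for start, end in sorted(cds_intervals):
        if merged and start <= merged[-1][1] + 1:  # overlaps or touches the previous block
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    return covered / genome_length


# Toy 1,000-bp "genome" with two overlapping genes and one separate gene.
print(coding_fraction(1000, [(1, 300), (250, 400), (601, 900)]))  # 0.7
```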
Figure: A simplified version of prokaryotic operon organization (labels: promoter, repressor protein-encoding gene, promoter, gene(s)). Genes A, B, and C are transcribed together onto a single polycistronic transcript, which is then translated to produce three separate proteins.
Operon
In genetics, an operon is a functioning unit of genomic DNA containing a cluster of genes under the control of a single regulatory signal or promoter.
The genes are transcribed together into an mRNA strand and are either translated together in the cytoplasm or undergo trans-splicing to create monocistronic mRNAs that are translated separately, i.e. several strands of mRNA that each encode a single gene product. As a result, the genes contained in the operon are either expressed together or not at all.
In short, several genes must be both co-transcribed and co-regulated to define an operon.
The operon is a main feature of the prokaryotic genome; it is made of four components: promoter, regulator, operator and structural genes.
The trp operon contains five structural genes involved in the biosynthesis of tryptophan: trpE, D, C, B and A.
Expression of these genes is controlled at two levels:
1. The trpR gene encodes a repressor that, in the presence of tryptophan, binds to the operator (o) and blocks transcription.
2. In addition, expression is modulated by an attenuator sequence that prematurely terminates transcription when high levels of tryptophan are present. In this case, the attenuated RNA consists of only a short leader sequence (L). P = promoter.
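The two control levels can be expressed as two checks applied in the order the events happen: the repressor acts at transcription initiation, and the attenuator acts on transcripts that have already started. This Python sketch is a deliberately coarse, threshold-based caricature. The tryptophan levels and both thresholds are hypothetical arbitrary units, whereas real repression and attenuation respond in a graded way (attenuation in fact senses the supply of charged tRNA-Trp).

```python
# Hypothetical thresholds in arbitrary tryptophan units (not measured values).
REPRESSION_THRESHOLD = 5.0   # above this, the TrpR-tryptophan complex occupies the operator
ATTENUATION_THRESHOLD = 2.0  # above this, transcripts that do start stop at the attenuator


def trp_operon_output(trp_level: float) -> str:
    """Coarse two-level model of trp operon control."""
    # Level 1: repression acts at transcription initiation.
    if trp_level >= REPRESSION_THRESHOLD:
        return "no transcription (repressor bound to operator)"
    # Level 2: attenuation acts on transcripts that have already initiated.
    if trp_level >= ATTENUATION_THRESHOLD:
        return "leader RNA only (transcription attenuated)"
    return "full-length trpEDCBA mRNA"


for level in (10.0, 3.0, 0.5):
    print(level, "->", trp_operon_output(level))
```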
Eukaryotic genome
Eukaryotes have multiple linear chromosomes, each containing multiple origins of replication.
(Think about the comparative genome sizes of the two groups: prokaryotes and eukaryotes.)
How many genes are there? Surprisingly, this question is not very important: gene number does not track an organism's complexity in any simple way.
There is more to genomes than protein-coding genes alone.
Eukaryotic genomes cont
Protein-coding genes
Although most prokaryotic chromosomes consist almost entirely of protein-coding genes, such elements make up only a small fraction of most eukaryotic genomes (see figure).
As a prime example, the human genome might contain as few as 20,000 genes, comprising less than 1.5% of the total genome sequence.
Introns
Shortly after their discovery, the non-coding intervening sequences within coding genes (introns) were suggested to account for the pronounced discrepancy between gene number and genome size. It has also recently been suggested that most non-coding DNA in animals (but not plants) is intronic, which would imply that most of the genome is transcribed even though protein-coding regions represent a tiny minority.
Figure: Conserved intron sequence elements (branch-point consensus YNYYRAY; polypyrimidine tract).
Splicing is controlled by specific intron sequences, called the splice-donor (GU) and splice-acceptor (AG) sequences, which lie at the 5' and 3' ends of the intron, at its junctions with the flanking exons. Mutations in these sequences may lead to retention of large segments of intronic DNA in the mRNA, or to entire exons being spliced out of the mRNA. These changes could result in production of a nonfunctional protein.
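The GU-AG rule can be checked mechanically before an intron is removed. The Python sketch below takes intron coordinates as given (0-based, end-exclusive; an assumed convention), verifies that each intron begins with GU and ends with AG, and joins the exons. Real spliceosomes also rely on the branch point and polypyrimidine tract, and gene predictors search for these signals rather than being told the coordinates. The function name and the sequences in the example are made up.

```python
def splice(pre_mrna: str, introns):
    """Remove introns from a pre-mRNA after checking the GU...AG rule.

    introns: (start, end) pairs, 0-based and end-exclusive (assumed convention).
    DNA letters are accepted; T is converted to U first.
    """
    seq = pre_mrna.upper().replace("T", "U")
    pieces, previous_end = [], 0
    for start, end in sorted(introns):
        intron = seq[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} does not follow the GU...AG rule")
        pieces.append(seq[previous_end:start])  # exon before this intron
        previous_end = end
    pieces.append(seq[previous_end:])  # final exon
    return "".join(pieces)


# Made-up example: exon 1 + intron + exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCUUUCAG", "GCAUAA"
pre = exon1 + intron + exon2
print(splice(pre, [(len(exon1), len(exon1) + len(intron))]))  # AUGGCCGCAUAA
```

A mutation in the donor GU (for example GU changed to CU) makes the check fail, which mirrors the point above that donor and acceptor mutations disrupt normal splicing.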
Figure: Pre-mRNA splicing. Two trans-esterification reactions, at the 5' and then the 3' splice site, excise the intron from the pre-mRNA and ligate Exon 1 to Exon 2 (www.wisc.edu/pharm).
Alternative splicing
It is a post-transcriptional modification in which a single gene can code for multiple proteins (protein isoforms).
It occurs in eukaryotes, prior to mRNA translation, through the differential inclusion or exclusion of regions of the pre-mRNA.
It is an important source of protein diversity.
During alternative splicing, the pre-mRNA transcribed from one gene can give rise to different mature mRNA molecules that generate multiple functional proteins.
In conclusion:
Alternative splicing enables a single gene to increase its coding capacity, allowing the synthesis of protein isoforms that are structurally and functionally distinct. It is observed in a high proportion of genes: early estimates were that about 40-60% of human genes exhibit alternative splicing, and later transcriptome-sequencing studies put the figure above 90% of multi-exon genes.
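To see how quickly alternative splicing multiplies coding capacity, consider only one mechanism, exon skipping: every optional (cassette) exon can be included or skipped, so k cassette exons give 2^k possible mRNAs. The Python sketch below enumerates them. The exon names and the constitutive/cassette labels are hypothetical, and other events (alternative 5'/3' splice sites, intron retention, mutually exclusive exons) are ignored.

```python
from itertools import product


def enumerate_isoforms(exons):
    """List every exon combination reachable by exon skipping alone.

    exons: ordered (name, is_cassette) pairs. Constitutive exons are always kept;
    each cassette exon is either included or skipped, giving 2**k isoforms for k cassettes.
    """
    options = [((name,), ()) if is_cassette else ((name,),) for name, is_cassette in exons]
    return [tuple(name for part in combo for name in part) for combo in product(*options)]


# Hypothetical gene: E1 and E4 constitutive, E2 and E3 cassette exons.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
for isoform in enumerate_isoforms(gene):
    print("-".join(isoform))
```

For this gene the loop prints four isoforms, from the full E1-E2-E3-E4 down to E1-E4, in which both cassette exons are skipped.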
The simplest alternative polyadenylation (APA) type, termed tandem 3' untranslated region (UTR) APA, involves alternative poly(A) sites within the same terminal exon and hence generates multiple isoforms that differ in their 3'UTR length without affecting the protein encoded by the gene. The other three types involve APA events that potentially affect the coding sequences in addition to the 3'UTRs. These types are: alternative terminal exon APA, in which alternative splicing generates isoforms that differ in their last exon; intronic APA, which involves cleavage at a cryptic intronic poly(A) signal (PAS), extending an internal exon and making it the terminal one; and internal exon APA, which involves premature polyadenylation within the …
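Tandem 3'UTR APA can be illustrated by locating candidate poly(A) signals in a 3'UTR and predicting how far each would let the UTR extend. The Python sketch assumes (i) that the canonical AAUAAA hexamer and its common AUUAAA variant are the only signals, and (ii) that cleavage occurs a fixed 20 nt downstream of the hexamer. In reality cleavage usually falls roughly 10-30 nt downstream and depends on additional sequence elements. The function name and the example UTR are hypothetical.

```python
import re

# Assumption: only the canonical signal and its most common variant are searched.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")


def tandem_utr_isoforms(utr: str, cleavage_offset: int = 20):
    """Predict 3'UTR lengths produced by using each poly(A) signal in a terminal exon.

    Cleavage is assumed to occur a fixed `cleavage_offset` nt after the end of the
    hexamer (a simplification). Lengths are counted from the start of the UTR.
    """
    utr = utr.upper().replace("T", "U")
    lengths = set()
    for hexamer in PAS_HEXAMERS:
        # Zero-width lookahead so that overlapping matches are also found.
        for match in re.finditer(f"(?={hexamer})", utr):
            site = match.start() + len(hexamer) + cleavage_offset
            if site <= len(utr):
                lengths.add(site)
    return sorted(lengths)


# Made-up 112-nt UTR with a proximal AAUAAA and a distal AUUAAA.
utr = "G" * 30 + "AAUAAA" + "G" * 40 + "AUUAAA" + "G" * 30
print(tandem_utr_isoforms(utr))  # [56, 102]
```

Both predicted isoforms share the same coding sequence and differ only in 3'UTR length, which is exactly what distinguishes tandem 3'UTR APA from the other three types.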
Figure: Hypothetical transcript sequences consisting of exons (green rectangles) with intervening introns (black lines) are depicted as gapped alignments to a reference genome. The following tracks represent sequences generated by each sequence-based method. Human genes have an average of 10 exons with an average length of 250 bp. The methods are displayed in order of least to most quantitative. Abbreviations: (EST) expressed sequence tag; (SAGE) serial analysis of gene expression; …
Pseudogenes
The term, coined in 1977 by Jacq et al., is composed of the prefix pseudo, which means false, and the root gene, which is the central unit of molecular genetics.
Pseudogenes are dysfunctional relatives of genes that have lost their protein-coding ability or are otherwise no longer expressed in the cell.
Some do not have introns or promoters (these pseudogenes are copied from mRNA and incorporated into the chromosome, and are called processed pseudogenes). Although most have some gene-like features (such as promoters, CpG islands, and splice sites), they are nonetheless considered nonfunctional, due to their lack of protein-coding ability resulting from various genetic disablements (premature stop codons, frameshifts, or a lack of transcription) or their …
Jacq C, Miller JR, Brownlee GG. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12:109-120, 1977.
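The disablements listed above (premature stop codons, frameshifts) are what annotation pipelines look for when deciding that a gene copy is a pseudogene. The Python sketch below runs crude versions of these two checks on a putative coding sequence that starts at its ATG. It is a simplification under stated assumptions: the standard genetic code, a length-not-divisible-by-3 test as a rough frameshift proxy, no alignment to the parent gene, and no test of transcription. The function name and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def disablements(cds: str):
    """Flag simple coding disablements in a putative CDS that starts at its ATG."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    for number, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {number}")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    return problems


print(disablements("ATGAAATTTTAA"))   # []
print(disablements("ATGTAGTTTTAA"))   # ['premature stop codon TAG at codon 2']
print(disablements("ATGAAATTTTAAC"))  # ['length not a multiple of 3 (possible frameshift)']
```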
Pseudogenes cont
Figure X: Origins of pseudogenes (labels: processed pseudogene; repeat motifs). A. Retrotransposed pseudogenes: starting from the original gene (the coding sequences are in black, the non-coding introns in gray, and the promoter element is indicated by the large arrow upstream of the gene), transcription generates a primary mRNA (black and gray broken line), from which the introns are excised by RNA splicing. This mature mRNA, which contains only exons and a poly-adenosine tail, is copied back into DNA by enzymes called reverse transcriptases, and the DNA is reinserted into the genome. Hence, the pseudogene product will lack intron and promoter sequences, and will bear characteristic repeat sequences at the insertion site, due to the integration mechanism. B. Duplicated pseudogenes: DNA duplication generates a more-or-less faithful copy of the original gene, including introns and, in many cases, promoter and other transcriptional regulatory elements. In most cases, this duplicated gene will undergo crippling, inactivating mutations and turn into a pseudogene (in rarer cases, the duplicated …)
Pseudogenes from the point of view of genome annotation
Pseudogenes are quite difficult to identify and characterize in genomes, because the two requirements, homology and nonfunctionality, are inferred from sequence calculations and alignments rather than proven biologically.
Homology is implied by sequence identity between the DNA sequences of the pseudogene and the parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity (usually between 40% and 100%) means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that they were independently created.
Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps in going from a genetic DNA sequence to a fully functional protein: transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional.
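The identity percentage described above depends on how gap columns are counted, and conventions differ between tools. The Python sketch below takes an existing pairwise alignment (two equal-length strings with "-" for gaps; producing the alignment itself is out of scope) and divides identical columns by all columns except those that are gaps in both rows. That denominator is one choice among several, and the function name and sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two rows of a pairwise alignment ('-' marks a gap).

    Denominator: all alignment columns except those that are gaps in both rows
    (one convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Hypothetical alignment of a pseudogene (top) against its parent gene (bottom).
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))  # 80.0
```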
Figure: Fates of duplicated genes, panels 1) to 4), including neofunctionalization and subfunctionalization.
Note that, in subfunctionalization, the ancestral gene was capable of performing both functions, and the descendant duplicate genes can now only perform one of the original ancestral functions.
Chromatin Structure
Chromatin is a term designating the structure in which DNA exists within cells. The structure of chromatin is determined and stabilized through the interaction of the DNA with DNA-binding proteins.
There are 2 classes of DNA-binding proteins. The histones are the major class of DNA-binding proteins involved in maintaining the compacted structure of chromatin.
There are 5 different histone proteins, identified as H1, H2A, H2B, H3 and H4; H2A, H2B, H3 and H4 are the core histones, and H1 is the linker histone.
The other class of DNA-binding proteins is a diverse group of proteins called simply nonhistone proteins. This class includes the various transcription factors, polymerases, hormone receptors and other nuclear enzymes. In any given cell there are more than 1000 different types of nonhistone proteins bound to the DNA.
Fig. Structure of the chromosome.
The linker DNA between adjacent nucleosomes can vary in length from 20 to more than 200 bp.
Figure 2. Methylation of CpG islands silences gene expression.
Fig. Maintenance methylation and de novo methylation.
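Whether a stretch of DNA counts as a CpG island is itself defined computationally. A common rule of thumb, the Gardiner-Garden and Frommer criteria, requires a length of at least 200 bp, a G+C content above 50%, and an observed/expected CpG ratio above 0.6, where expected CpG = (count of C × count of G) / length. The Python sketch below applies these thresholds to a single given region; genome-wide island finders slide a window along the sequence instead. The function names are hypothetical.

```python
def cpg_island_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a non-empty DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    gc_fraction = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0  # expected CpG = c * g / n
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_length=200, min_gc=0.5, min_obs_exp=0.6) -> bool:
    """Apply Gardiner-Garden and Frommer-style thresholds to one candidate region."""
    if len(seq) < min_length:
        return False
    gc_fraction, obs_exp = cpg_island_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 150))  # True  (300 bp, GC 100%, obs/exp 2.0)
print(looks_like_cpg_island("AT" * 150))  # False (no C or G)
print(looks_like_cpg_island("CG" * 50))   # False (only 100 bp)
```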
Histones
In biology, histones are highly alkaline proteins found in eukaryotic cell nuclei that package and order the DNA into structural units called nucleosomes. They are the chief protein components of chromatin, acting as spools around which DNA winds, and play a role in gene regulation. Without histones, the unwound DNA in chromosomes would be very long (a length-to-width ratio of more than 10 million to one in human DNA).
For example, each human cell has about 1.8 meters of DNA, but wound on the histones it has about 90 micrometers (0.09 mm) of chromatin, which, when duplicated and condensed during mitosis, results in about 120 micrometers of chromosomes.
Histones Cont
Classes of histones
Five major families of histones exist: H1/H5, H2A, H2B, H3, and H4. Histones H2A, H2B, H3 and H4 are known as the core histones, while histones H1 and H5 are known as the linker histones.
Two of each of the core histones assemble to form one octameric nucleosome core particle, and 147 base pairs of DNA wrap around this core particle 1.65 times in a left-handed superhelical turn.
The linker histone H1 binds the nucleosome at the entry and exit sites of the DNA, thus locking the DNA into place and allowing the formation of higher-order structure.
The most basic such formation is the 10 nm fiber, or beads-on-a-string conformation. This involves the wrapping of DNA around nucleosomes with approximately 50 base pairs of DNA (also referred to as linker DNA) separating each pair of nucleosomes.
The assembled histones and DNA are called chromatin.
Higher-order structures include the 30 nm fiber (forming an irregular zigzag) and the 100 nm fiber, these being the structures found in normal cells. During mitosis and meiosis, the condensed chromosomes are …
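The numbers in this section (147 bp per core particle, about 50 bp of linker, 1.8 m of DNA packed into about 90 µm of chromatin) support two quick back-of-the-envelope calculations. The Python sketch below treats the linker length as constant, which is a simplification since linker length actually varies from about 20 to more than 200 bp; the function names are hypothetical.

```python
CORE_BP = 147   # DNA wrapped around one histone octamer (stated in the text)
LINKER_BP = 50  # typical linker DNA (stated in the text; really varies ~20 to >200 bp)


def nucleosome_count(dna_length_bp: int, core_bp: int = CORE_BP, linker_bp: int = LINKER_BP) -> int:
    """Largest n with n cores and n - 1 linkers fitting on the DNA:
    n * core + (n - 1) * linker <= length, i.e. n <= (length + linker) / (core + linker)."""
    return (dna_length_bp + linker_bp) // (core_bp + linker_bp)


def compaction_ratio(dna_length_m: float, packed_length_m: float) -> float:
    """How many times shorter the packaged form is than the naked DNA."""
    return dna_length_m / packed_length_m


print(nucleosome_count(10_000))             # 51
print(round(compaction_ratio(1.8, 90e-6)))  # 20000 (1.8 m of DNA -> ~90 micrometres)
```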
Lysine acetylation is the best-studied form of histone modification, but it is not the only one. Others include:
Methylation of lysine and arginine residues in the N-terminal regions of H3 and H4; this is a reversible event.
Phosphorylation of serine residues in the N-terminal regions of H2A, H2B, H3 and H4.
Ubiquitination of lysine residues at the C termini of H2A and H2B. This modification involves the addition of the small, common (ubiquitous) protein called ubiquitin, or of a related protein rather unhelpfully called SUMO.
Figure: Chromatin structure, acetylation / deacetylation.