Regulatory sequences and promoter search
Bq. Francisco Duarte
Ph.D. Biotechnology

Contents
1. Background
2. Representation of regulatory motifs
3. Promoter search algorithms
4. Databases related to promoter search

Probability of occurrence of each nucleotide

for -10 sequence

Base       T    A    T    A    A    T
Frequency  77%  76%  60%  61%  56%  82%

for -35 sequence

Base       T    T    G    A    C    A
Frequency  69%  79%  61%  56%  54%  54%
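As a rough illustration of how these consensus frequencies could be used in a promoter search, the sketch below scores a hexamer against the -10 or -35 consensus and slides that score along a sequence. The scoring rule (sum of the frequencies at matching positions, divided by the maximum possible sum) and the function names `consensus_score` and `best_hit` are assumptions made for this example; they are not the method of any particular promoter-prediction tool.

```python
# Consensus base and its observed frequency at each position,
# copied from the frequencies listed above.
MINUS_10 = [("T", 0.77), ("A", 0.76), ("T", 0.60),
            ("A", 0.61), ("A", 0.56), ("T", 0.82)]
MINUS_35 = [("T", 0.69), ("T", 0.79), ("G", 0.61),
            ("A", 0.56), ("C", 0.54), ("A", 0.54)]


def consensus_score(hexamer, element):
    """Score a hexamer between 0.0 (no match) and 1.0 (full consensus).

    Illustrative weighting (an assumption): each matching position
    contributes the frequency of its consensus base.
    """
    hexamer = hexamer.upper()
    if len(hexamer) != len(element):
        raise ValueError("hexamer and element must have the same length")
    matched = sum(freq for base, (cons, freq) in zip(hexamer, element)
                  if base == cons)
    total = sum(freq for _, freq in element)
    return matched / total


def best_hit(sequence, element):
    """Return (score, offset, window) for the best-scoring window.

    Ties are resolved in favour of the leftmost window.
    Returns None if the sequence is shorter than the element.
    """
    width = len(element)
    best = None
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = consensus_score(window, element)
        if best is None or score > best[0]:
            best = (score, offset, window)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequence with a perfect -10 hexamer at offset 2.
    print(best_hit("ccTATAATcc", MINUS_10))
```

A real search would also take into account the spacing between the -35 and -10 elements and the frequencies of the non-consensus bases, which the figures above do not provide.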

TRANSFAC

Structure of a eukaryotic gene

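[Figure: a contig containing a gene and its regulatory elements. Transcription produces the primary transcript; splicing produces the splice variants (mRNA, alternative exon). The mRNA is divided into 5'-UTR, CDS and 3'-UTR.]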

General scheme of the hierarchical structure of the transcriptional regulatory regions in eukaryotic genes
[Figure labels: TSS; enhancer 1; enhancer 2; promoter; boxes A to G; composite element; TATA box; initiator (Inr).]

What is a transcription factor?
A transcription factor is a protein that regulates transcription
after nuclear translocation by specific interaction with DNA
or by stoichiometric interaction with a protein that can be
assembled into a sequence-specific DNA-protein complex.
http://www.gene-regulation.com/pub/databases/transfac/clSM.html

Regulatory regions

Gene regulation
Virtually every cell in your body contains a complete set of genes
But they are not all turned on in every tissue
Each cell in your body expresses only a small subset of genes at any time
During development different cells express different sets of genes in a precisely regulated fashion.

Gene regulation occurs at the level of transcription or production of mRNA
A given cell transcribes only a specific set of genes and not others

Insulin is made by pancreatic cells

Characteristics of regulatory regions

Check:
http://www.ccg.unam.mx/Computational_Genomics/PromoterTools/
http://molbiol-tools.ca/Promoters.htm
http://www.phisite.org/main/index.php?nav=tools&nav_sel=hunter
http://www.fruitfly.org/seq_tools/promoter.html
http://linux1.softberry.com/berry.phtml?topic=bprom&group=programs&subgroup=gfindb

Central dogma
Genetic information generally flows from DNA to RNA to protein
Gene regulation has been well studied in E. coli

Trp Operon
When a bacterial cell encounters a potential food source it will manufacture the enzymes necessary to metabolize that food
E. coli uses several proteins encoded by a cluster of 5 genes to manufacture the amino acid tryptophan.
All 5 genes are transcribed together as a unit called an operon, which produces a single long piece of mRNA for all the genes.
RNA polymerase binds to a promoter located at the beginning of the first gene and proceeds down the DNA transcribing the genes in sequence

Gene Regulation
In addition to sugars like glucose and lactose E. coli cells also require amino acids
One essential amino acid is tryptophan.
When E. coli is swimming in tryptophan (milk & poultry) it will absorb the amino acids from the media
When tryptophan is not present in the media then the cell must manufacture its own amino acids

Gene regulation
In addition to amino acids, E. coli cells also metabolize sugars in their environment.
In 1959 Jacques Monod and François Jacob looked at the ability of E. coli cells to digest the sugar lactose.
In the presence of the sugar lactose, E. coli makes an enzyme called beta-galactosidase.
Beta-galactosidase breaks down the sugar lactose so that E. coli can digest it for food.
It is the lac Z gene in E. coli that codes for the enzyme beta-galactosidase.

Lac Z Gene
The tryptophan genes are turned on when there is no tryptophan in the media.
That is when the cell wants to make its own tryptophan.
E. coli cells cannot make the sugar lactose.
They can only have lactose when it is present in their environment.
Then they turn on genes to break down lactose.
E. coli only needs beta-galactosidase if there is lactose in the environment to digest. There is no point in making the enzyme if there is no lactose to break down.
It is the combination of the promoter and the DNA that regulates when a gene will be transcribed.

This combination of a promoter and a gene is called an OPERON

THE OPERON
An operon is a cluster of genes encoding related enzymes that are regulated together
An operon consists of:
a promoter site where RNA polymerase binds and begins transcribing the message.
a region that makes a repressor.
The repressor sits on the DNA at a spot between the promoter and the gene to be transcribed.
This site is called the operator.

LAC Z GENE
E. coli regulates the production of beta-galactosidase by using a regulatory protein called a repressor
The repressor binds to the lac Z gene at a site between the promoter and the start of the coding sequence
The site the repressor binds to is called the operator

LAC Z GENE
Normally the repressor sits on the operator, repressing transcription of the lac Z gene
In the presence of lactose the repressor binds to the sugar and comes off the operator, which allows the polymerase to move down the lac Z gene

LAC Z GENE
This results in the production of beta galactosidase
which breaks down the sugar.
When there is no sugar left the repressor will
return to its spot on the chromosome and stop
the transcription of the lac Z gene.
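
The on/off logic of the last three slides can be written as a tiny truth-table model. This is only a sketch of the repressor/operator switch as described here; the function name `lacz_transcribed` and the `repressor_functional` flag (standing in for a hypothetical mutant without a working repressor) are assumptions added for illustration, and any other layer of control is ignored.

```python
def lacz_transcribed(lactose_present, repressor_functional=True):
    """Toy model of the lac Z switch.

    The repressor sits on the operator unless lactose is present
    (it then binds the sugar instead). Transcription proceeds only
    when the operator is free.
    """
    repressor_on_operator = repressor_functional and not lactose_present
    return not repressor_on_operator


if __name__ == "__main__":
    for lactose in (False, True):
        state = "on" if lacz_transcribed(lactose) else "off"
        print(f"lactose present: {lactose} -> lac Z is {state}")
```

With a working repressor the gene follows the presence of lactose; with `repressor_functional=False` the model reports the gene as always on.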

Mechanism
operon off

GENE REGULATION
In eukaryotic organisms like ourselves there are several methods of regulating protein production
Most regulatory sequences are found upstream from the promoter
Genes are controlled by regulatory elements in the promoter region that act like on/off switches or dimmer switches

GENE REGULATION
Specific transcription factors bind to these regulatory
elements and regulate transcription.
Regulatory elements may be tissue specific and will
activate their gene only in one kind of tissue
Sometimes the expression of a gene requires the
function of two or more different regulatory elements

INTRONS AND EXONS


Eukaryotic DNA differs from prokaryotic DNA in that the coding sequences along the gene are interspersed with noncoding sequences.
The coding sequences are called EXONS
The noncoding sequences are called INTRONS

INTRONS AND EXONS


After the initial transcript is produced the
introns are spliced out to form the completed
message ready for translation
Introns can be very large and numerous, so
some genes are much bigger than the final
processed mRNA

INTRONS AND EXONS


Muscular dystrophy
The DMD gene is about 2.5 million base pairs long
It has more than 70 introns
The final mRNA is only about 14,000 nucleotides long

RNA Splicing
Provides a point where the expression of a gene can be
controlled
Exons can be spliced together in different ways
This allows a variety of different polypeptides to be
assembled from the same gene
Alternative splicing is common in insects and vertebrates, where 2 or 3 different proteins are produced from one gene
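
To make the idea of splicing exons together in different ways concrete, the sketch below enumerates the transcripts obtained by including or skipping a set of optional exons while keeping the exons in genomic order. It only models exon skipping, which is one of several kinds of alternative splicing; the function name `splice_variants` and the exon labels are hypothetical.

```python
from itertools import combinations


def splice_variants(exons, optional):
    """List every exon combination obtained by skipping optional exons.

    exons    -- all exons of the gene, in genomic order
    optional -- the exons that may be skipped; all others are always kept
    """
    optional_set = set(optional)
    skippable = [exon for exon in exons if exon in optional_set]
    variants = []
    for count in range(len(skippable) + 1):
        for skipped in combinations(skippable, count):
            variants.append([exon for exon in exons if exon not in skipped])
    return variants


if __name__ == "__main__":
    # Hypothetical gene with four exons, two of which are optional.
    for variant in splice_variants(["E1", "E2", "E3", "E4"], ["E2", "E3"]):
        print("-".join(variant))
```

With n optional exons this gives 2^n candidate transcripts, so two optional exons already allow four different messages from one gene.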

Protein domains in regulatory sequences

TFBS: transcription factor binding sites

Motif representations: from alignments to motifs

Transcription factors
[Figure labels: sequence-specific DNA binding; non-DNA binding; HAT; co-activator; adapter; TF1, TF2, TF3, TF4; DNA; Layer I, Layer II, Layer III.]

Structure of transcription factors
[Figure labels: USF-1, dimer; ligand binding domain; oligomerization domain; activation domain; protein-protein interaction domain; DNA binding domain.]

[Table: examples of composite elements (CE), rows 1 to 10. Columns: Gene; Schema and positions of a CE; TRANSCompel accession number.
Genes: Scavenger receptor, Homo sapiens; GM-CSF, Mus musculus; Collagenase, Homo sapiens; IgH, Mus musculus; Interleukin 2, Homo sapiens (two entries); 2, Mus musculus; IRF-1, Mus musculus; IgH, Homo sapiens; 1, Rattus norvegicus.
Factors shown in the schemas: Ets, AP-1, NFAT, NF-κB, Oct-2, CBF, C/EBP, STAT-1.
Positions shown: Enhancer 4500/-4100; -53, -40; -89, -82; -72, -66; Enhancer at 3' flank; -283; -167, -142 (twice); -268; -117, -73; -123, -113; -49, -40.
Accession numbers: C00080, C00081, C00083, C00133, C00109, C00165, C00158, C00173, 00101, C00192.]

Ternary complex NFATp - AP1 - DNA

Composite elements
Minimal functional units where both protein-DNA and protein-protein
interactions contribute to a highly specific pattern of gene expression
and provide cross-coupling of different signal transduction pathways.

[Figure: F1 alone gives a low level of transcription; F2 alone gives a low level of transcription; F1 and F2 together give synergistic activation of transcription.]
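
One simple way to look for candidate composite elements in a promoter is to take the predicted binding sites of individual factors and report pairs of sites for different factors that lie close together. The sketch below assumes the site predictions are already available as `(factor, start, end)` tuples; the function name `find_composite_candidates`, the `max_gap` threshold of 10 bp and the coordinates in the example are all hypothetical, and proximity alone does not show that two factors actually cooperate.

```python
from itertools import combinations


def find_composite_candidates(hits, max_gap=10):
    """Return pairs of nearby sites bound by different factors.

    hits    -- list of (factor, start, end) tuples, start <= end,
               in any consistent coordinate system (e.g. relative to +1)
    max_gap -- largest number of bases allowed between the two sites;
               overlapping sites count as gap 0
    """
    ordered = sorted(hits, key=lambda hit: hit[1])
    pairs = []
    for left, right in combinations(ordered, 2):
        if left[0] == right[0]:
            continue  # two sites for the same factor are not a composite
        gap = max(0, right[1] - left[2] - 1)
        if gap <= max_gap:
            pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    # Hypothetical site predictions, coordinates relative to +1.
    example_hits = [("Ets", -60, -52), ("AP-1", -48, -42), ("NFAT", -200, -192)]
    for left, right in find_composite_candidates(example_hits):
        print(left, right)
```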

Integration of signals. Cross-coupling of signal transduction pathways


[Figure labels: membrane receptor; Ca2+ dependent channel; Src; Ras (GTP/GDP); SH2; SH3; phosphorylation; PLC; adaptors; PI3-K; IP3; calcineurin; PKB/Akt; NFATp; ERK; JNK; P38MAPK; c-Fos; c-Jun; ATF-2; IL-2; composite element. Compartments: cytoplasm and nucleus.]

Mouse IL-4 promoter
[Figure labels: AP-1; HMG Y; STAT 6; NF-Y; c-MAF; NFAT; CE; TATA; positions -249, -180, -150, -114, -88, -60, -28; ST +1.]

GM-CSF Homo sapiens
[Figure labels: T-cell specific inducible enhancer at 3500 bp with CBF, AP-1, NFAT and CE; promoter with AP-1, CBF, NF-κB (c-Rel/p65, p50/p65), HMG Y(I), CD28 response element, TATTT and CE; positions -114, -88, -54; ST +1.]

Enhanceosome

Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X,
X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined W-binding
protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as
the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X,
X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF-1) are not
required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains
(AD), which contact the RNA polymerase II basal transcription machinery.
Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66

[Figure labels: closed nucleosomes; site-specific TF; acetylase; PCAF; co-activator p300/CBP; acetylation; TFIID; TFIIA; TFIIB; TFIIF; TFIIE; RNA pol II; TFIIH.]

Databases on gene regulation

http://regulondb.ccg.unam.mx/

Exercise

Find the .gbk file and the 100 base pairs upstream

BLASTp against NR to find probable orthologs
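
For the first step of the exercise, the sketch below shows one way the 100 bases upstream of a gene could be cut out once the genome sequence and the gene coordinates have been read from the .gbk file (the parsing itself is not shown). The function names, the 0-based end-exclusive coordinate convention and the toy genome are assumptions for this example. GenBank feature tables use 1-based inclusive coordinates, so real values would need converting, and circular chromosomes are not handled.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome, gene_start, gene_end, strand, length=100):
    """Return up to `length` bases immediately 5' of a gene.

    genome     -- forward-strand sequence as a string
    gene_start -- 0-based index of the first base of the gene (forward strand)
    gene_end   -- 0-based index one past the last base of the gene
    strand     -- +1 for a forward-strand gene, -1 for a reverse-strand gene
    The result is read 5' to 3' on the coding strand and is shorter than
    `length` if the gene lies close to the end of the sequence.
    """
    if strand == 1:
        return genome[max(0, gene_start - length):gene_start]
    if strand == -1:
        return reverse_complement(genome[gene_end:gene_end + length])
    raise ValueError("strand must be +1 or -1")


if __name__ == "__main__":
    toy_genome = "TTTTACGATGCCC"  # hypothetical sequence, gene starts at index 7
    print(upstream_region(toy_genome, 7, 13, 1, length=3))
```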

>malE - 100 bases upstream
aaagaactacctgaatttcgagattaggccttgatcgcgccggggtgaaagcgttatact
gacgcgcaaacgtttgcgcaatttgggcacagagggggtt

>malE - 100 bases upstream
aggaggatggaaagaggatgtcatagaaagaaactaaagaccgttaagcgacctctgcgt
atccacgagcaatatacacaaatggaaaaggacgggttat

http://molbiol-tools.ca/Promoters.htm

http://www.prodoric.de/vfp/vfp_promoter.php

http://www.phisite.org/main/index.php?nav=tools&nav_sel=hunter
