
,

.
2006

. ()
. ()
. , . ()
.. ( )


()

-
(cI)

-
(Cro)

(..)


,



100 000



Pattern
Positional weight matrix (PWM), profile


Gene       Site
codB       CCCACGAAAACGATTGCTTTTT
purE       GCCACGCAACCGTTTTCCTTGC
pyrD       GTTCGGAAAACGTTTGCGTTTT
purT       CACACGCAAACGTTTTCGTTTA
cvpA       CCTACGCAAACGTTTTCTTTTT
purC       GATACGCAAACGTGTGCGTCTG
purM       GTCTCGCAAACGTTTGCTTTCC
purH       GTTGCGCAAACGTTTTCGTTAC
purL       TCTACGCAAACGGTTTCGTCGG
consensus     ACGCAAACGTTTTCGT
pattern       aCGmAAACGtTTkCkT


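Count matrix N(b,j) over the 16 pattern positions of the nine sites:

j     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
      a     C     G     m     A     A     A     C     G     t     T     T     k     C     k     T
A     6     0     0     2     9     9     8     0     0     1     0     0     0     0     0     0
C     1     8     0     7     0     0     1     9     0     0     0     0     0     9     1     0
G     1     1     9     0     0     0     0     0     9     1     1     0     4     0     5     0
T     1     0     0     0     0     0     0     0     0     7     8     9     5     0     3     9

$$I = \sum_j \sum_b f(b,j)\,\log\frac{f(b,j)}{p(b)}$$

where f(b,j) is the frequency of nucleotide b at position j and p(b) is its background frequency.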
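The count matrix and the information content formula can be checked with a few lines of Python. This is a minimal sketch, not the author's code: the function names are hypothetical, and it assumes a uniform background p(b) = 0.25, a base-2 logarithm, and that terms with f(b,j) = 0 contribute zero.

```python
import math

# The nine aligned sites from the slide; the 16-nt pattern covers
# positions 4-19 of each 22-nt sequence (same offset as the consensus line).
SITES = [
    "CCCACGAAAACGATTGCTTTTT", "GCCACGCAACCGTTTTCCTTGC", "GTTCGGAAAACGTTTGCGTTTT",
    "CACACGCAAACGTTTTCGTTTA", "CCTACGCAAACGTTTTCTTTTT", "GATACGCAAACGTGTGCGTCTG",
    "GTCTCGCAAACGTTTGCTTTCC", "GTTGCGCAAACGTTTTCGTTAC", "TCTACGCAAACGGTTTCGTCGG",
]
CORE = [s[3:19] for s in SITES]


def count_matrix(sites):
    """N[b][j]: number of sites with nucleotide b at position j."""
    k = len(sites[0])
    return {b: [sum(s[j] == b for s in sites) for j in range(k)] for b in "ACGT"}


def information_content(sites, background=None):
    """I = sum_j sum_b f(b,j) * log2(f(b,j) / p(b)), skipping f = 0 terms."""
    p = background or {b: 0.25 for b in "ACGT"}  # assumption: uniform background
    counts = count_matrix(sites)
    n = len(sites)
    total = 0.0
    for j in range(len(sites[0])):
        for b in "ACGT":
            f = counts[b][j] / n
            if f > 0:
                total += f * math.log2(f / p[b])
    return total


N = count_matrix(CORE)
I = information_content(CORE)
```

With a base-2 logarithm and a uniform background, a fully conserved position contributes 2 bits, so I is bounded by 32 bits for a 16-nt pattern.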

Logo


()
Positional weight matrix W(b,j):

j     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
      a     C     G     m     A     A     A     C     G     t     T     T     k     C     k     T
A   1.1  -1.0  -0.7   0.5   2.2   2.2   1.9  -0.7  -0.7  -0.1  -1.0  -0.7  -1.1  -0.7  -1.4  -0.7
C  -0.4   1.9  -0.7   1.6  -0.7  -0.7   0.1   2.2  -0.7  -1.2  -1.0  -0.7  -1.1   2.2  -0.3  -0.7
G  -0.4   0.1   2.2  -1.1  -0.7  -0.7  -1.0  -0.7   2.2  -0.1   0.1  -0.7   1.0  -0.7   1.0  -0.7
T  -0.4  -1.0  -0.7  -1.1  -0.7  -0.7  -1.0  -0.7  -0.7   1.5   1.9   2.2   1.2  -0.7   0.6   2.2

$$W(b,j) = \ln\bigl(N(b,j)+0.5\bigr) - 0.25\sum_i \ln\bigl(N(i,j)+0.5\bigr)$$
: -
( )
: z-score (
)
:
(
)
(pseudocounts)
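The weight formula with its 0.5 pseudocount can be evaluated directly from the nine sites. The sketch below is illustrative only; `weight_matrix` and its `pseudocount` parameter are hypothetical names, not part of the original slides.

```python
import math

# The nine aligned sites; the 16-nt pattern covers positions 4-19.
SITES = [
    "CCCACGAAAACGATTGCTTTTT", "GCCACGCAACCGTTTTCCTTGC", "GTTCGGAAAACGTTTGCGTTTT",
    "CACACGCAAACGTTTTCGTTTA", "CCTACGCAAACGTTTTCTTTTT", "GATACGCAAACGTGTGCGTCTG",
    "GTCTCGCAAACGTTTGCTTTCC", "GTTGCGCAAACGTTTTCGTTAC", "TCTACGCAAACGGTTTCGTCGG",
]
CORE = [s[3:19] for s in SITES]


def weight_matrix(sites, pseudocount=0.5):
    """W(b,j) = ln(N(b,j) + 0.5) - 0.25 * sum_i ln(N(i,j) + 0.5)."""
    k = len(sites[0])
    W = {b: [0.0] * k for b in "ACGT"}
    for j in range(k):
        # Log of the pseudocount-corrected count of each nucleotide in column j.
        logs = {b: math.log(sum(s[j] == b for s in sites) + pseudocount)
                for b in "ACGT"}
        mean = 0.25 * sum(logs.values())
        for b in "ACGT":
            W[b][j] = logs[b] - mean
    return W


W = weight_matrix(CORE)
```

Subtracting the column mean makes the four weights in every column sum to zero, and the pseudocount keeps the logarithm finite for nucleotides that never occur at a position.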

,
..

GenBank

()
( )




:
:
: -



..

Bacillus subtilis
Gene   Sequence (ending with the start codon)
dnaN   ACATTATCCGTTAGGAGGATAAAAATG
gyrA   GTGATACTTCAGGGAGGTTTTTTAATG
serS   TCAATAAAAAAAGGAGTGTTTCGCATG
bofA   CAAGCGAAGGAGATGAGAAGATTCATG
csfB   GCTAACTGTACGGAGGTGGAGAAGATG
xpaC   ATAGACACAGGAGTCGATTATCTCATG
metS   ACATTCTGATTAGGAGGTTTCAAGATG
gcaD   AAAAGGGATATTGGAGGCCAATAAATG
spoVC  TATGTGACTAAGGGAGGATTCGCCATG
ftsH   GCTTACTGTGGGAGGAGGTAAGGAATG
pabB   AAAGAAAATAGAGGAATGATACAAATG
rplJ   CAAGAATCTACAGGAGGTGTAACCATG
tufA   AAAGCTCTTAAGGAGGATTTTAGAATG
rpsJ   TGTAGGCGAAAAGGAGGGAAAATAATG
rpoA   CGTTTTGAAGGAGGGTTTTAAGTAATG
rplM   AGATCATTTAGGAGGGGAAATTCAATG

Aligned by start codon (cons. = consensus; num. = number of sequences carrying the consensus nucleotide, two digits read top to bottom):
cons.  aaagtatataagggagggttaataATG
num.   001000000000110110000000111
       760666658967228106888659666

Second alignment of the same sequences:
cons.  tacataaaggaggtttaaaaat
num.   0000000111111000000001
       5755779156663678679890
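The cons. and num. lines for the alignment by start codon can be recomputed from the sixteen sequences. This is a minimal sketch with a hypothetical helper name; ties between equally frequent nucleotides are broken arbitrarily here, so individual consensus letters may differ from the slide even where the counts agree.

```python
from collections import Counter

# Upstream regions of 16 Bacillus subtilis genes, each ending with ATG.
RBS = {
    "dnaN": "ACATTATCCGTTAGGAGGATAAAAATG", "gyrA": "GTGATACTTCAGGGAGGTTTTTTAATG",
    "serS": "TCAATAAAAAAAGGAGTGTTTCGCATG", "bofA": "CAAGCGAAGGAGATGAGAAGATTCATG",
    "csfB": "GCTAACTGTACGGAGGTGGAGAAGATG", "xpaC": "ATAGACACAGGAGTCGATTATCTCATG",
    "metS": "ACATTCTGATTAGGAGGTTTCAAGATG", "gcaD": "AAAAGGGATATTGGAGGCCAATAAATG",
    "spoVC": "TATGTGACTAAGGGAGGATTCGCCATG", "ftsH": "GCTTACTGTGGGAGGAGGTAAGGAATG",
    "pabB": "AAAGAAAATAGAGGAATGATACAAATG", "rplJ": "CAAGAATCTACAGGAGGTGTAACCATG",
    "tufA": "AAAGCTCTTAAGGAGGATTTTAGAATG", "rpsJ": "TGTAGGCGAAAAGGAGGGAAAATAATG",
    "rpoA": "CGTTTTGAAGGAGGGTTTTAAGTAATG", "rplM": "AGATCATTTAGGAGGGGAAATTCAATG",
}


def consensus_and_counts(seqs):
    """Return the most frequent nucleotide per column and how often it occurs."""
    cons, nums = [], []
    for column in zip(*seqs):
        base, n = Counter(column).most_common(1)[0]
        cons.append(base)
        nums.append(n)
    return "".join(cons), nums


cons, nums = consensus_and_counts(list(RBS.values()))
```

Only the last three columns (the start codon) are fully conserved in this alignment; the counts in the middle columns stay well below 16 because the ribosome binding site sits at a variable distance from the start codon.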


( aGGAGG)


(ab initio)
:


k (k-)
k-
,

( )
k-

:

:
,

:


k-
k-
,

( )
k-

: k- -
?
1 : ,

: ( k-)
.

2 : ,
.
:
,
;

-
k-
. k-
, (,
, h , h<<k).
n- (n
).
( )


,

,

...
(
)

.
Expectation - Maximization

(, k-
)
:

,

.

, .
: .


:
I

$$I = \sum_j \sum_b f(b,j)\,\log\frac{f(b,j)}{p(b)}$$


,


: A (
), I(A)
.
B ,
, I(B)
.
If I(B) ≥ I(A), B is accepted.
If I(B) < I(A), B is accepted with probability
$$P = \exp\left[\frac{I(B) - I(A)}{T}\right]$$
T ,
,
( 1).
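The acceptance rule for a candidate alignment B given the current alignment A can be written as a short function. The sketch below assumes the standard simulated-annealing reading of the formula (improvements are always accepted); the function names and the `rng` parameter are hypothetical.

```python
import math
import random


def accept_probability(i_old, i_new, temperature):
    """Probability of replacing alignment A (I = i_old) with B (I = i_new)."""
    if i_new >= i_old:
        return 1.0  # B is at least as informative: always accept
    return math.exp((i_new - i_old) / temperature)


def accept(i_old, i_new, temperature, rng=random):
    """Draw the accept/reject decision for one proposed change."""
    return rng.random() < accept_probability(i_old, i_new, temperature)
```

Lowering T makes a given drop in I less likely to be accepted, so the search becomes greedier as the temperature falls.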

Gibbs sampler
, A , I(A)
.


$$P \sim \exp\left[I(A_{\mathrm{new}})\right]$$

, .
(: )
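One sampling step under the rule P ~ exp[I(A_new)] might look as follows. This is a sketch under assumptions: the candidate alignments are represented only by their precomputed information contents, and the function names are hypothetical.

```python
import math
import random


def sampling_probabilities(info_values):
    """Normalize exp(I) over the candidate alignments."""
    m = max(info_values)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(v - m) for v in info_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_candidate(info_values, rng=random):
    """Pick the index of the next alignment with probability ~ exp(I)."""
    probs = sampling_probabilities(info_values)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Unlike a greedy choice of the best candidate, sampling leaves a nonzero chance of taking a worse alignment, which is what lets the procedure leave local optima.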

:
()

$$W(b,j) = \ln\bigl(N(b,j)+0.5\bigr) - 0.25\sum_i \ln\bigl(N(i,j)+0.5\bigr)$$

$b_1 \dots b_k$

:
$$S(b_1 \dots b_k) = \sum_{j=1}^{k} W(b_j, j)$$
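Scoring a word and scanning a sequence with the weight matrix can be sketched as below. The weights are built from the nine sites listed earlier in the slides; `score`, `scan` and the `threshold` argument are hypothetical names, and only the forward strand is scanned.

```python
import math

SITES = [
    "CCCACGAAAACGATTGCTTTTT", "GCCACGCAACCGTTTTCCTTGC", "GTTCGGAAAACGTTTGCGTTTT",
    "CACACGCAAACGTTTTCGTTTA", "CCTACGCAAACGTTTTCTTTTT", "GATACGCAAACGTGTGCGTCTG",
    "GTCTCGCAAACGTTTGCTTTCC", "GTTGCGCAAACGTTTTCGTTAC", "TCTACGCAAACGGTTTCGTCGG",
]
CORE = [s[3:19] for s in SITES]  # the 16-nt pattern positions


def weight_matrix(sites, pseudocount=0.5):
    """W(b,j) = ln(N(b,j) + 0.5) - 0.25 * sum_i ln(N(i,j) + 0.5)."""
    W = {b: [] for b in "ACGT"}
    for column in zip(*sites):
        logs = {b: math.log(column.count(b) + pseudocount) for b in "ACGT"}
        mean = 0.25 * sum(logs.values())
        for b in "ACGT":
            W[b].append(logs[b] - mean)
    return W


def score(word, W):
    """S(b1...bk) = sum over j of W(b_j, j)."""
    return sum(W[b][j] for j, b in enumerate(word))


def scan(sequence, W, threshold):
    """Return (offset, score) for every window scoring at least threshold."""
    k = len(W["A"])
    hits = []
    for i in range(len(sequence) - k + 1):
        s = score(sequence[i:i + k], W)
        if s >= threshold:
            hits.append((i, s))
    return hits


W = weight_matrix(CORE)
CONSENSUS = "ACGCAAACGTTTTCGT"
```

The consensus word collects the largest weight in every column, so its score is the maximum any 16-mer can reach with this matrix; a practical threshold would be set somewhere below it.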



() - ()

:
4k (),

(/)
2k (/,
AT/GC)

(/-)




:
( )
(
)
,

(
)

:
-
,
.
.
, .
,
,


:
/
:
/
:


E. coli
, 1
2000 , :
25% ,
60% ()

CRP (E. coli)


[Figure: OV and UN (%, vertical axis from 0 to 110) plotted against the threshold (horizontal axis from 3.2 to 4.8)]

GenBank E. coli
     gene            complement(120178..121551)
                     /note="b0112"
                     /gene="aroP"
     CDS             complement(120178..121551)
                     /gene="aroP"
                     /product="aromatic amino acid transport
                     protein"
     protein_bind    complement(121599..121617)
                     /bound_moiety="TyrR documented site"
     protein_bind    complement(121622..121640)
                     /bound_moiety="TyrR documented site"
     protein_bind    complement(121653..121664)
                     /bound_moiety="PutA predicted site"
     promoter        complement(121683..121711)
                     /note="factor Sigma70; promoter aroP;
                     documented +1 at 121671"
     protein_bind    complement(121810..121823)
                     /bound_moiety="OxyR predicted site"
     protein_bind    complement(121813..121835)
                     /bound_moiety="ArgR predicted site"

[Diagram labels: aroP, TyrR, TyrR, PutA, Pr., OxyR, ArgR]
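The feature locations in this annotation are easy to process programmatically, for example to measure site lengths or to detect overlapping sites. The sketch below handles only the simple `start..end` and `complement(start..end)` forms that appear in this record; the function names are hypothetical.

```python
import re

# Matches "120178..121551" and "complement(120178..121551)" only.
LOCATION = re.compile(r"^(complement\()?(\d+)\.\.(\d+)\)?$")


def parse_location(text):
    """Return (start, end, strand) for a simple GenBank location string."""
    m = LOCATION.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported location: {text!r}")
    strand = "-" if m.group(1) else "+"
    return int(m.group(2)), int(m.group(3)), strand


def overlaps(loc_a, loc_b):
    """True if two parsed locations share at least one position."""
    return loc_a[0] <= loc_b[1] and loc_b[0] <= loc_a[1]


tyrR = parse_location("complement(121599..121617)")
oxyR = parse_location("complement(121810..121823)")
argR = parse_location("complement(121813..121835)")
```

In this record the predicted OxyR and ArgR sites overlap, while the two documented TyrR sites are separate.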

?
:


purL
ST AGCGGCATTTTGCGTAACAATGCGCCAGTTGGCAACTT-ATT-CGCAACGATAGCCGCACC--GTATGACAAGAAAAAGC
EC AGCGGCATTTTGCGTAAACCTGCGCCAGATGGCAACTT-ATT-ACAGCCATTGGCGGCACG--CGTTGCTAATTCACGAT
YP AGTGGCATTTTGCGCAACAAAACGCCAGTGTGCAACTTTATTGCGAGCTATTTGCTGAGTCTGCGTTACACACACATAGC
** *********** **
******
******* ***
* ** *
*
*
*
ST GG-TGATT---------TTATTTCT-------ACGCAAACGGTTTCGTCGGCGCGTCAGATTCTTTATAATGACGGCCGT
EC GG-TGATT---------TTATTTCC-------ACGCAAACGGTTTCGTCAGCGCATCAGATTCTTTATAATGACGCCCGT
YP GGCTGTTTCTGACTGAATTATTAATAATAGATACGCAAACGGTTTCGTCGGCGGCTCAGATTCACTATAATGGCGCGCGT
** ** **
*****
***************** *** ******** ******* ** ***
ST TTCCCCCC-------------------TTGCGCACACCAAA--------------GCTTAGAAGACGAGAGA--CTTA-
EC TTCCCCCCC------------------TTGGGTACACCGAAA-------------GCTTAGAAGACGAGAGA--CTTA-
YP TTTGCCCTGTTGTTGCGCCAATGAATGTTGCGCCCAATGAAGTGCTGTTCCAGCCGCTTCGAAGACGAGAGAAACTTAGA
** ***
*** * **
**
**** ************ ****
ST TGATGGAAATTCTGCGTGGTTCGCCTGCACTGTCTGCATTCCGTATCAATAAACTGCTGGCGCGCTTTCAGGCTGCCAAC
EC TGATGGAAATTCTGCGTGGTTCGCCTGCACTGTCGGCATTCCGAATCAACAAACTGCTGGCACGTTTTCAGGCTGCCAGG
YP TTATGGAAATACTGCGTGGTTCACCCGCTTTGTCGGCTTTTCGTATCACCAAACTGTTGTCCCGTTGCCAGGATGCTCAC
* ******** *********** ** ** **** ** ** ** **** ****** ** * ** * **** ***


yjcD
ST AAA-GCATAAAAAGCGGCAAAGTTCAGTTGAAAAAGCGTTGATGATCGCTGGATAATCGTTTGCTTTTTTTTG---CCAC
EC AAA-GAGAAAAAAGCAGCAAACTTCGGTTGAAAAAGCCGCTATGATCGCCGGATAATCGTTTGCTTTTTTTA----CCAC
YP AAATGTATTAAATGTCGCATTCGGGTGTTGATTAGTCACCACTGATGGCTAGATAATCGTTTGCCTTAAATGACATCTGC
*** *
*** * ***
***** * *
**** ** ************* **
*
* *
ST CC--------GTTTTGT--------ATACGTG----GAGCTAAACGTTTGCTTTTTTGCGGCGCCCCG-G-TTGTCGTAA
EC CC--------GTTTTGT--------ATGCGCG----GAGCTAAACGTTTGCTTTTTTGCGACGCAGCA-AATTGTCGCAA
YP CCTAAACTTCGATTTTTTTTCAGTCATGCGTTCTCCCAGCTAATCGTTTGCTATTTTTCCCCGCTCTATGAGTCAGGGAG
**
* *** *
** **
****** ******** **** * ***
*
* *
ST ATGTAGC----------ACAAGGA-GATAACGTTGCGCTGTTAGTGGATTACCTCCCACGTATACCGACGAATAATAAAT
EC ACCTGGA----------GCAGGAA-GATAACGTTTCGCTGGCAGGGGATTGTCCGCCACGCATCTTGACGAAAATTAAAC
YP AGTTAGTGAGTTCATCGACAGGAACGGAAACGATTACGTAGAGAAGGGCGCTTGGCTTGGCATGCTATTTTAAAATGA-C
* * *
** * * * **** *
*
**
*
* **
* * * *
ST TCTCAGGGGATGTTTTCT-ATGTCT------ACGCCTTCAGCGCGTACCGGCGGTTCACTCGACGCCTGGTTTAAAATTT
EC TCTCAGGGGATGTTTTCTTATGTCT------ACGCCATCAGCGCGTACCGGCGGTTCACTCGACGCCTGGTTTAAAATTT
YP ACACAGGGGACATCACC--ATGTCTAGCAGCAACCCTCAAGCACAGCCAAAGGGCACGCTTGATGCATTCTTTAAGCTTA
* ******* *
* ******
* **
*** *
*
** * ** ** ** * ***** **

rbsD :
Sty AGGGTTACACTGCGGC-CAGCGAAACGTTTCGCTAGTGGAGCAGAAAAATGAAGAAAGGC
Sen AGGGTTACACTGCGGC-CAGCGAAACGTTTCGCTAGTGGAGCAGAAAAATGAAGAAAGGC
Stm GGGGTTACACTGCGGC-CAGCGAAACGTTTCGCTAGTGGAGCAGAAAAATGAAGAAAGGC
Eco AGGATTAAACTGTGGGTCAGCGAAACGTTTCGCTGATGGAGAA-AAAAATGAAAAAAGGC
Ype TTTTCTAAACTCCTTGTTAGCGAAACGTTTCGCTCTTGGAGTA-GATCATGAAAAAAGGT
         ** ***       ****************  ***** *  *  ***** *****

Sty ACCGTACTCAACTCTGAAATCTCGTCGGTCATTTCCCGTCTGGGGCATACTGATACTCTG
Sen ACCGTACTCAACTCTGAAATCTCGTCGGTCATTTCCCGTCTGGGGCATACTGATACTCTG
Stm ACCGTACTCAACTCTGAAATCTCGTCGGTCATTTCCCGTCTGGGGCATACTGATACTCTG
Eco ACCGTTCTTAATTCTGATATTTCATCGGTGATCTCCCGTCTGGGACATACCGATACGCTG
Ype GTATTACTGAACGCTGATATTTCCGCGGTTATCTCCCGTCTGGGCCATACCGATCAGATT
        * ** **  **** ** **  **** ** *********** ***** ***    *
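The conservation line under a multiple alignment marks the columns where all sequences agree. A minimal sketch, using the first rbsD block as input; the function name is hypothetical, and treating an all-gap column as not conserved is an assumption.

```python
# First block of the rbsD alignment (Sty, Sen, Stm, Eco, Ype).
ROWS = [
    "AGGGTTACACTGCGGC-CAGCGAAACGTTTCGCTAGTGGAGCAGAAAAATGAAGAAAGGC",
    "AGGGTTACACTGCGGC-CAGCGAAACGTTTCGCTAGTGGAGCAGAAAAATGAAGAAAGGC",
    "GGGGTTACACTGCGGC-CAGCGAAACGTTTCGCTAGTGGAGCAGAAAAATGAAGAAAGGC",
    "AGGATTAAACTGTGGGTCAGCGAAACGTTTCGCTGATGGAGAA-AAAAATGAAAAAAGGC",
    "TTTTCTAAACTCCTTGTTAGCGAAACGTTTCGCTCTTGGAGTA-GATCATGAAAAAAGGT",
]


def conservation_line(rows):
    """Return '*' for columns identical in all rows (gaps excluded), else ' '."""
    marks = []
    for column in zip(*rows):
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


stars = conservation_line(ROWS)
```

The longest conserved run in this block is the 16-column stretch AGCGAAACGTTTCGCT, the kind of conserved island in an otherwise variable upstream region that this approach looks for.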




(.)

rVISTA: / /
