2006
Examples of DNA-binding regulatory proteins: cI, Cro, etc.
100 000
Ways to describe a signal: pattern (a consensus that may use degenerate symbols); positional weight matrix (PWM), or profile
Example: PurR binding sites (E. coli)
gene       site
codB       CCCACGAAAACGATTGCTTTTT
purE       GCCACGCAACCGTTTTCCTTGC
pyrD       GTTCGGAAAACGTTTGCGTTTT
purT       CACACGCAAACGTTTTCGTTTA
cvpA       CCTACGCAAACGTTTTCTTTTT
purC       GATACGCAAACGTGTGCGTCTG
purM       GTCTCGCAAACGTTTGCTTTCC
purH       GTTGCGCAAACGTTTTCGTTAC
purL       TCTACGCAAACGGTTTCGTCGG
consensus     ACGCAAACGTTTTCGT
pattern       aCGmAAACGtTTkCkT
(m = A or C, k = G or T)
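Counts N(b, j) of each nucleotide b at each pattern position j (9 sites):
j   a  C  G  m  A  A  A  C  G  t  T  T  k  C  k  T
A   6  0  0  2  9  9  8  0  0  1  0  0  0  0  0  0
C   1  8  0  7  0  0  1  9  0  0  0  0  0  9  1  0
G   1  1  9  0  0  0  0  0  9  1  1  0  4  0  5  0
T   1  0  0  0  0  0  0  0  0  7  8  9  5  0  3  9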
Sequence logo of the same sites (figure)
Weight matrix W(b, j) for the same positions (table; the largest weight, 2.2, occurs at positions where all nine sites agree)
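$$W(b,j) = \ln\bigl(N(b,j) + 0.5\bigr) - 0.25 \sum_{i \in \{A,C,G,T\}} \ln\bigl(N(i,j) + 0.5\bigr)$$
where N(b, j) is the number of sites with nucleotide b at position j.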
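The counts and weights can be recomputed directly from the table of sites. A minimal Python sketch, assuming the 16-nt core starts at offset 3 of each 22-nt sequence (the offset at which the consensus ACGCAAACGTTTTCGT lines up); the names SITES, count_matrix and weight_matrix are ours:

```python
import math

# The nine PurR sites from the table; the 16-nt core starts at offset 3 (our alignment)
SITES = {
    "codB": "CCCACGAAAACGATTGCTTTTT",
    "purE": "GCCACGCAACCGTTTTCCTTGC",
    "pyrD": "GTTCGGAAAACGTTTGCGTTTT",
    "purT": "CACACGCAAACGTTTTCGTTTA",
    "cvpA": "CCTACGCAAACGTTTTCTTTTT",
    "purC": "GATACGCAAACGTGTGCGTCTG",
    "purM": "GTCTCGCAAACGTTTGCTTTCC",
    "purH": "GTTGCGCAAACGTTTTCGTTAC",
    "purL": "TCTACGCAAACGGTTTCGTCGG",
}
CORE_START, CORE_LEN = 3, 16
NUCLEOTIDES = "ACGT"


def count_matrix(sites, start, length):
    """N(b, j): how many sites carry nucleotide b at position j of the core."""
    cores = [s[start:start + length] for s in sites]
    return [{b: sum(core[j] == b for core in cores) for b in NUCLEOTIDES}
            for j in range(length)]


def weight_matrix(counts, pseudocount=0.5):
    """W(b, j) = ln(N(b,j) + 0.5) - 0.25 * sum_i ln(N(i,j) + 0.5)."""
    weights = []
    for column in counts:
        logs = {b: math.log(column[b] + pseudocount) for b in NUCLEOTIDES}
        mean_log = sum(logs.values()) / 4  # the 0.25 * sum over the four nucleotides
        weights.append({b: logs[b] - mean_log for b in NUCLEOTIDES})
    return weights


N = count_matrix(SITES.values(), CORE_START, CORE_LEN)
W = weight_matrix(N)
for b in NUCLEOTIDES:
    print(b, " ".join(f"{W[j][b]:5.1f}" for j in range(CORE_LEN)))
```

A position where all nine sites agree gets a weight of about 2.2, and a nucleotide absent from such a column gets about -0.7; the pseudocount 0.5 is what keeps the latter finite.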
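Other ways to define the weights are also used, e.g. z-scores.
Pseudocounts (such as the 0.5 added to every count N(b, j)) keep the logarithm finite for nucleotides that never occur at a position; their influence is largest when the sample of sites is small.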
GenBank
Bacillus subtilis: 24 nt upstream of the start codon, plus the start codon ATG
gene   sequence
dnaN   ACATTATCCGTTAGGAGGATAAAAATG
gyrA   GTGATACTTCAGGGAGGTTTTTTAATG
serS   TCAATAAAAAAAGGAGTGTTTCGCATG
bofA   CAAGCGAAGGAGATGAGAAGATTCATG
csfB   GCTAACTGTACGGAGGTGGAGAAGATG
xpaC   ATAGACACAGGAGTCGATTATCTCATG
metS   ACATTCTGATTAGGAGGTTTCAAGATG
gcaD   AAAAGGGATATTGGAGGCCAATAAATG
spoVC  TATGTGACTAAGGGAGGATTCGCCATG
ftsH   GCTTACTGTGGGAGGAGGTAAGGAATG
pabB   AAAGAAAATAGAGGAATGATACAAATG
rplJ   CAAGAATCTACAGGAGGTGTAACCATG
tufA   AAAGCTCTTAAGGAGGATTTTAGAATG
rpsJ   TGTAGGCGAAAAGGAGGGAAAATAATG
rpoA   CGTTTTGAAGGAGGGTTTTAAGTAATG
rplM   AGATCATTTAGGAGGGGAAATTCAATG
cons.  aaagtatataagggagggttaataATG
num.   7 6 10 6 6 6 6 5 8 9 6 7 12 12 8 11 10 6 8 8 8 6 5 9 16 16 16
(num.: number of the 16 sequences that carry the consensus nucleotide at each position)
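The cons. and num. lines are column-wise statistics of an ungapped alignment. A minimal Python sketch of how such lines can be computed; the tie-breaking rule (alphabetical) and the case rule (uppercase only where all sequences agree) are our assumptions, so individual letters in tied columns may differ from the slide:

```python
# The 16 B. subtilis sequences from the table (24 nt upstream + ATG)
SEQS = [
    "ACATTATCCGTTAGGAGGATAAAAATG", "GTGATACTTCAGGGAGGTTTTTTAATG",
    "TCAATAAAAAAAGGAGTGTTTCGCATG", "CAAGCGAAGGAGATGAGAAGATTCATG",
    "GCTAACTGTACGGAGGTGGAGAAGATG", "ATAGACACAGGAGTCGATTATCTCATG",
    "ACATTCTGATTAGGAGGTTTCAAGATG", "AAAAGGGATATTGGAGGCCAATAAATG",
    "TATGTGACTAAGGGAGGATTCGCCATG", "GCTTACTGTGGGAGGAGGTAAGGAATG",
    "AAAGAAAATAGAGGAATGATACAAATG", "CAAGAATCTACAGGAGGTGTAACCATG",
    "AAAGCTCTTAAGGAGGATTTTAGAATG", "TGTAGGCGAAAAGGAGGGAAAATAATG",
    "CGTTTTGAAGGAGGGTTTTAAGTAATG", "AGATCATTTAGGAGGGGAAATTCAATG",
]


def column_consensus(seqs):
    """Most frequent nucleotide in each column and how many sequences carry it."""
    letters, counts = [], []
    for column in zip(*seqs):
        base = max("ACGT", key=column.count)  # first (alphabetical) base wins ties
        n = column.count(base)
        # uppercase only for columns where every sequence agrees (our convention)
        letters.append(base if n == len(seqs) else base.lower())
        counts.append(n)
    return "".join(letters), counts


cons, num = column_consensus(SEQS)
print("cons.", cons)
print("num. ", " ".join(map(str, num)))
```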
The same sequences aligned on the Shine-Dalgarno box (aGGAGG):
cons.  tacataaaggaggtttaaaaat
num.   5 7 5 5 7 7 9 11 15 16 16 16 13 6 7 8 6 7 9 8 9 10
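Ab initio signal finding (no known sites to train on)
Word-based approach: k-mers (words of length k)
Count the occurrences of every k-mer in the sample; which k-mers are over-represented?
Variant 1: ...
Variant 2: ...
Real sites are inexact: k-mers are counted with up to h mismatches (h << k).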
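A word-counting pass over the B. subtilis sequences above illustrates the idea. A minimal Python sketch, assuming we count the number of sequences that contain each 5-mer and compare it with a rough expectation under a uniform random background; the function names are hypothetical:

```python
from collections import Counter

# The 16 B. subtilis sequences from the table (24 nt upstream + ATG)
SEQS = [
    "ACATTATCCGTTAGGAGGATAAAAATG", "GTGATACTTCAGGGAGGTTTTTTAATG",
    "TCAATAAAAAAAGGAGTGTTTCGCATG", "CAAGCGAAGGAGATGAGAAGATTCATG",
    "GCTAACTGTACGGAGGTGGAGAAGATG", "ATAGACACAGGAGTCGATTATCTCATG",
    "ACATTCTGATTAGGAGGTTTCAAGATG", "AAAAGGGATATTGGAGGCCAATAAATG",
    "TATGTGACTAAGGGAGGATTCGCCATG", "GCTTACTGTGGGAGGAGGTAAGGAATG",
    "AAAGAAAATAGAGGAATGATACAAATG", "CAAGAATCTACAGGAGGTGTAACCATG",
    "AAAGCTCTTAAGGAGGATTTTAGAATG", "TGTAGGCGAAAAGGAGGGAAAATAATG",
    "CGTTTTGAAGGAGGGTTTTAAGTAATG", "AGATCATTTAGGAGGGGAAATTCAATG",
]


def sequences_with_word(seqs, k):
    """For every k-mer: in how many sequences does it occur at least once?"""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def expected_sequences(length, k, n_seqs):
    """Rough expectation under a uniform random background (overlaps ignored)."""
    windows = length - k + 1
    return n_seqs * (1 - (1 - 0.25 ** k) ** windows)


def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


def sequences_with_fuzzy_word(seqs, word, h):
    """Number of sequences containing `word` with at most h mismatches."""
    k = len(word)
    return sum(
        any(mismatches(s[i:i + k], word) <= h for i in range(len(s) - k + 1))
        for s in seqs
    )


counts = sequences_with_word(SEQS, 5)
expected = expected_sequences(len(SEQS[0]), 5, len(SEQS))
for word, n in counts.most_common(3):
    print(word, n, f"expected ~{expected:.2f}")
print("GGAGG with <= 1 mismatch:", sequences_with_fuzzy_word(SEQS, "GGAGG", 1))
```

The exact word GGAGG occurs in 12 of the 16 sequences, against well under one expected by chance; allowing a single mismatch (h = 1) recovers all 16.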
Expectation-Maximization (EM): alignment of k-mers
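The EM idea can be sketched for the simplest case of exactly one site per sequence: the E-step computes, for every sequence, the posterior probability that the site starts at each position; the M-step re-estimates the matrix from the posterior-weighted counts. A minimal Python sketch, assuming a uniform background, a 0.5 pseudocount and a start from a random k-mer (all our choices); e_step, m_step and em_motif are hypothetical names:

```python
import math
import random

NUCLEOTIDES = "ACGT"


def e_step(seqs, pwm, background=0.25):
    """Posterior probability that the single site of each sequence starts at i."""
    k = len(pwm)
    posteriors = []
    for s in seqs:
        # log-likelihood ratio (motif vs. uniform background) of every window
        scores = [sum(math.log(pwm[j][s[i + j]] / background) for j in range(k))
                  for i in range(len(s) - k + 1)]
        top = max(scores)
        weights = [math.exp(x - top) for x in scores]  # subtract max for stability
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors


def m_step(seqs, posteriors, k, pseudocount=0.5):
    """Re-estimate nucleotide frequencies from posterior-weighted counts."""
    counts = [{b: pseudocount for b in NUCLEOTIDES} for _ in range(k)]
    for s, post in zip(seqs, posteriors):
        for i, p in enumerate(post):
            for j in range(k):
                counts[j][s[i + j]] += p
    return [{b: col[b] / sum(col.values()) for b in NUCLEOTIDES} for col in counts]


def em_motif(seqs, k, iterations=30, seed=0):
    rng = random.Random(seed)
    i = rng.randrange(len(seqs[0]) - k + 1)
    word = seqs[0][i:i + k]  # start from a random k-mer of the first sequence
    pwm = [{b: 0.7 if b == word[j] else 0.1 for b in NUCLEOTIDES} for j in range(k)]
    for _ in range(iterations):
        posteriors = e_step(seqs, pwm)
        pwm = m_step(seqs, posteriors, k)
    return pwm, posteriors


seqs = ["TTTTGGAGGTTTT", "CCCGGAGGCCCCC", "AAAAAAGGAGGAA"]
pwm, posteriors = em_motif(seqs, 5)
print([max(range(len(p)), key=p.__getitem__) for p in posteriors])
```

EM converges to a local optimum, so in practice it is restarted from several initial k-mers.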
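Simulated annealing
Quality function I: for the current state A (an alignment), I(A) is computed.
A new state B is generated from A, and I(B) is computed.
If I(B) ≥ I(A), the move to B is accepted.
If I(B) < I(A), the move to B is accepted with probability
P = exp[(I(B) - I(A)) / T],
where T is the temperature, which is gradually decreased during the run.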
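The acceptance rule above can be sketched as follows. This is a minimal Python illustration, assuming that a state is one k-mer start position per sequence, that I(A) is the information content of the resulting alignment (our choice of quality function), and that T falls geometrically; information, acceptance_probability and anneal are hypothetical names:

```python
import math
import random

NUCLEOTIDES = "ACGT"


def information(alignment):
    """Assumed quality I(A): information content (bits) of an ungapped
    alignment of k-mers, with +0.5 pseudocounts and a uniform background."""
    n, k = len(alignment), len(alignment[0])
    total = 0.0
    for j in range(k):
        column = [word[j] for word in alignment]
        for b in NUCLEOTIDES:
            f = (column.count(b) + 0.5) / (n + 2.0)
            total += f * math.log2(f / 0.25)
    return total


def acceptance_probability(i_old, i_new, temperature):
    """1 if I(B) >= I(A); otherwise exp((I(B) - I(A)) / T)."""
    if i_new >= i_old:
        return 1.0
    return math.exp((i_new - i_old) / temperature)


def anneal(seqs, k, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Returns the best state seen (one k-mer start per sequence) and its quality."""
    rng = random.Random(seed)

    def quality(starts):
        return information([s[i:i + k] for s, i in zip(seqs, starts)])

    state = [rng.randrange(len(s) - k + 1) for s in seqs]
    current = quality(state)
    best_state, best = list(state), current
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        candidate = list(state)
        r = rng.randrange(len(seqs))                        # move one k-mer
        candidate[r] = rng.randrange(len(seqs[r]) - k + 1)
        new = quality(candidate)
        if rng.random() < acceptance_probability(current, new, t):
            state, current = candidate, new
            if current > best:
                best_state, best = list(state), current
    return best_state, best


seqs = ["TTTTGGAGGTTTT", "CCCGGAGGCCCCC", "AAAAAAGGAGGAA"]
starts, score = anneal(seqs, 5, steps=2000)
print(starts, round(score, 2))
```

At high T almost every move is accepted; as T decreases, downhill moves become rare and the search settles into a (hopefully global) optimum.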
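Gibbs sampler
From the current state A, with quality I(A), the next state A_new is chosen at random with probability
P ~ exp[I(A_new)],
so better states are preferred, but worse ones can still be visited.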
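One sampling step following the rule P ~ exp[I(A_new)] literally can be sketched as below: pick one sequence, score every possible site position in it, and draw the new position with probability proportional to exp(I). A minimal Python sketch under the same assumptions as before (one k-mer per sequence, I = information content, our choice); the classic formulation instead scores positions against a profile built from the other sequences. selection_probabilities, gibbs_step and gibbs_sampler are hypothetical names:

```python
import math
import random

NUCLEOTIDES = "ACGT"


def information(alignment):
    """Assumed quality I(A): information content (bits), +0.5 pseudocounts."""
    n, k = len(alignment), len(alignment[0])
    total = 0.0
    for j in range(k):
        column = [word[j] for word in alignment]
        for b in NUCLEOTIDES:
            f = (column.count(b) + 0.5) / (n + 2.0)
            total += f * math.log2(f / 0.25)
    return total


def selection_probabilities(qualities):
    """P(A_new) proportional to exp(I(A_new)), normalised over the candidates."""
    top = max(qualities)
    weights = [math.exp(q - top) for q in qualities]
    total = sum(weights)
    return [w / total for w in weights]


def gibbs_step(seqs, k, starts, rng):
    """Resample the site position in one randomly chosen sequence."""
    r = rng.randrange(len(seqs))
    qualities = []
    for i in range(len(seqs[r]) - k + 1):
        trial = starts[:r] + [i] + starts[r + 1:]
        qualities.append(information([s[p:p + k] for s, p in zip(seqs, trial)]))
    new_starts = list(starts)
    probs = selection_probabilities(qualities)
    new_starts[r] = rng.choices(range(len(qualities)), weights=probs)[0]
    return new_starts


def gibbs_sampler(seqs, k, steps=200, seed=0):
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - k + 1) for s in seqs]
    for _ in range(steps):
        starts = gibbs_step(seqs, k, starts, rng)
    return starts


seqs = ["TTTTGGAGGTTTT", "CCCGGAGGCCCCC", "AAAAAAGGAGGAA"]
print(gibbs_sampler(seqs, 5))
```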
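Recognition with a positional weight matrix
Weights:
$$W(b,j) = \ln\bigl(N(b,j) + 0.5\bigr) - 0.25 \sum_{i \in \{A,C,G,T\}} \ln\bigl(N(i,j) + 0.5\bigr)$$
Score of a candidate site b_1...b_k:
$$S(b_1 \ldots b_k) = \sum_{j=1}^{k} W(b_j, j)$$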
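The score S is a plain sum of matrix entries, so recognition amounts to sliding a window along the sequence. A minimal Python sketch, assuming a matrix built from the 16-nt cores of the nine PurR sites in the codB...purL table and a target taken from the ST purL upstream block that contains the conserved site (gaps removed); the threshold 20 is an arbitrary illustrative value, and only one strand is scanned:

```python
import math

NUCLEOTIDES = "ACGT"

# 16-nt cores of the nine PurR sites from the table (offset 3 of each 22-nt site)
CORES = [
    "ACGAAAACGATTGCTT", "ACGCAACCGTTTTCCT", "CGGAAAACGTTTGCGT",
    "ACGCAAACGTTTTCGT", "ACGCAAACGTTTTCTT", "ACGCAAACGTGTGCGT",
    "TCGCAAACGTTTGCTT", "GCGCAAACGTTTTCGT", "ACGCAAACGGTTTCGT",
]
# ST purL upstream fragment from the alignment, with gaps removed
UPSTREAM = ("GG-TGATT---------TTATTTCT-------ACGCAAACGGTTTCGTCGGCGCGTCAGATT"
            "CTTTATAATGACGGCCGT").replace("-", "")


def weight_matrix(sites, pseudocount=0.5):
    """W(b, j) = ln(N(b,j) + 0.5) - 0.25 * sum_i ln(N(i,j) + 0.5)."""
    W = []
    for j in range(len(sites[0])):
        logs = {b: math.log(sum(s[j] == b for s in sites) + pseudocount)
                for b in NUCLEOTIDES}
        mean_log = sum(logs.values()) / 4
        W.append({b: logs[b] - mean_log for b in NUCLEOTIDES})
    return W


def site_score(W, word):
    """S(b_1...b_k) = sum over j of W(b_j, j)."""
    return sum(W[j][b] for j, b in enumerate(word))


def scan(W, seq, threshold):
    """All windows scoring at or above the threshold, as (start, word, score)."""
    k = len(W)
    hits = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        s = site_score(W, word)
        if s >= threshold:
            hits.append((i, word, s))
    return hits


W = weight_matrix(CORES)
for start, word, s in scan(W, UPSTREAM, threshold=20.0):
    print(start, word, round(s, 2))
```

The only window above the threshold is the conserved site ACGCAAACGGTTTCGT (score about 28.1); shifted windows score far lower because several of their positions hit nucleotides that never occur in the training sites.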
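Model complexity (number of parameters):
4k for the full nucleotide alphabet (4 nucleotides at each of k positions);
2k for a two-letter alphabet (e.g. purine/pyrimidine, AT/GC).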
E. coli example: 1 ... 2000 ...: 25% ..., 60% (...)
Figure: OV and UN (%) as functions of the score threshold (thresholds from 3.2 to 4.8)
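Curves of this kind show the trade-off in choosing a threshold: raising it loses true sites, lowering it admits spurious hits. A minimal Python sketch, assuming UN is the share of known sites scoring below the threshold and OV the share of background windows scoring at or above it (our definitions), and that "balanced" means minimising UN + OV; the score lists are made-up toy numbers, not data from the slide:

```python
def un_ov_curves(site_scores, background_scores, thresholds):
    """For each threshold T return (T, UN, OV), both in percent."""
    rows = []
    for t in thresholds:
        un = 100.0 * sum(s < t for s in site_scores) / len(site_scores)
        ov = 100.0 * sum(s >= t for s in background_scores) / len(background_scores)
        rows.append((t, un, ov))
    return rows


def balanced_threshold(rows):
    """Threshold minimising UN + OV (one simple compromise; first one wins ties)."""
    return min(rows, key=lambda row: row[1] + row[2])[0]


# toy scores: known sites vs. windows of background sequence (hypothetical)
rows = un_ov_curves([3.5, 4.0, 4.5, 5.0], [1.0, 3.3, 3.9, 4.6], [3.2, 4.0, 4.8])
for t, un, ov in rows:
    print(f"T={t}: UN={un:.0f}%  OV={ov:.0f}%")
print("balanced threshold:", balanced_threshold(rows))
```

By construction UN can only grow and OV can only fall as the threshold rises, which is why the two curves cross.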
GenBank E. coli (aroP region):
gene            complement(120178..121551)
                /note="b0112"
                /gene="aroP"
CDS             complement(120178..121551)
                /gene="aroP"
                /product="aromatic amino acid transport protein"
protein_bind    complement(121599..121617)
                /bound_moiety="TyrR documented site"
protein_bind    complement(121622..121640)
                /bound_moiety="TyrR documented site"
protein_bind    complement(121653..121664)
                /bound_moiety="PutA predicted site"
promoter        complement(121683..121711)
                /note="factor Sigma70; promoter aroP; documented +1 at 121671"
protein_bind    complement(121810..121823)
                /bound_moiety="OxyR predicted site"
protein_bind    complement(121813..121835)
                /bound_moiety="ArgR predicted site"
Figure: map of the region upstream of aroP (TyrR, TyrR, PutA, promoter, OxyR, ArgR)
Comparison of upstream regions of orthologous genes:
purL
ST AGCGGCATTTTGCGTAACAATGCGCCAGTTGGCAACTT-ATT-CGCAACGATAGCCGCACC--GTATGACAAGAAAAAGC
EC AGCGGCATTTTGCGTAAACCTGCGCCAGATGGCAACTT-ATT-ACAGCCATTGGCGGCACG--CGTTGCTAATTCACGAT
YP AGTGGCATTTTGCGCAACAAAACGCCAGTGTGCAACTTTATTGCGAGCTATTTGCTGAGTCTGCGTTACACACACATAGC

ST GG-TGATT---------TTATTTCT-------ACGCAAACGGTTTCGTCGGCGCGTCAGATTCTTTATAATGACGGCCGT
EC GG-TGATT---------TTATTTCC-------ACGCAAACGGTTTCGTCAGCGCATCAGATTCTTTATAATGACGCCCGT
YP GGCTGTTTCTGACTGAATTATTAATAATAGATACGCAAACGGTTTCGTCGGCGGCTCAGATTCACTATAATGGCGCGCGT

ST TTCCCCCC-------------------TTGCGCACACCAAA--------------GCTTAGAAGACGAGAGA--CTTA-
EC TTCCCCCCC------------------TTGGGTACACCGAAA-------------GCTTAGAAGACGAGAGA--CTTA-
YP TTTGCCCTGTTGTTGCGCCAATGAATGTTGCGCCCAATGAAGTGCTGTTCCAGCCGCTTCGAAGACGAGAGAAACTTAGA

ST TGATGGAAATTCTGCGTGGTTCGCCTGCACTGTCTGCATTCCGTATCAATAAACTGCTGGCGCGCTTTCAGGCTGCCAAC
EC TGATGGAAATTCTGCGTGGTTCGCCTGCACTGTCGGCATTCCGAATCAACAAACTGCTGGCACGTTTTCAGGCTGCCAGG
YP TTATGGAAATACTGCGTGGTTCACCCGCTTTGTCGGCTTTTCGTATCACCAAACTGTTGTCCCGTTGCCAGGATGCTCAC
yjcD
ST AAA-GCATAAAAAGCGGCAAAGTTCAGTTGAAAAAGCGTTGATGATCGCTGGATAATCGTTTGCTTTTTTTTG---CCAC
EC AAA-GAGAAAAAAGCAGCAAACTTCGGTTGAAAAAGCCGCTATGATCGCCGGATAATCGTTTGCTTTTTTTA----CCAC
YP AAATGTATTAAATGTCGCATTCGGGTGTTGATTAGTCACCACTGATGGCTAGATAATCGTTTGCCTTAAATGACATCTGC

ST CC--------GTTTTGT--------ATACGTG----GAGCTAAACGTTTGCTTTTTTGCGGCGCCCCG-G-TTGTCGTAA
EC CC--------GTTTTGT--------ATGCGCG----GAGCTAAACGTTTGCTTTTTTGCGACGCAGCA-AATTGTCGCAA
YP CCTAAACTTCGATTTTTTTTCAGTCATGCGTTCTCCCAGCTAATCGTTTGCTATTTTTCCCCGCTCTATGAGTCAGGGAG

ST ATGTAGC----------ACAAGGA-GATAACGTTGCGCTGTTAGTGGATTACCTCCCACGTATACCGACGAATAATAAAT
EC ACCTGGA----------GCAGGAA-GATAACGTTTCGCTGGCAGGGGATTGTCCGCCACGCATCTTGACGAAAATTAAAC
YP AGTTAGTGAGTTCATCGACAGGAACGGAAACGATTACGTAGAGAAGGGCGCTTGGCTTGGCATGCTATTTTAAAATGA-C

ST TCTCAGGGGATGTTTTCT-ATGTCT------ACGCCTTCAGCGCGTACCGGCGGTTCACTCGACGCCTGGTTTAAAATTT
EC TCTCAGGGGATGTTTTCTTATGTCT------ACGCCATCAGCGCGTACCGGCGGTTCACTCGACGCCTGGTTTAAAATTT
YP ACACAGGGGACATCACC--ATGTCTAGCAGCAACCCTCAAGCACAGCCAAAGGGCACGCTTGATGCATTCTTTAAGCTTA
rbsD:
Sty AGGGTTACACTGCGGC-CAGCGAAACGTTTCGCTAGTGGAGCAGAAAAATGAAGAAAGGC
Sen AGGGTTACACTGCGGC-CAGCGAAACGTTTCGCTAGTGGAGCAGAAAAATGAAGAAAGGC
Stm GGGGTTACACTGCGGC-CAGCGAAACGTTTCGCTAGTGGAGCAGAAAAATGAAGAAAGGC
Eco AGGATTAAACTGTGGGTCAGCGAAACGTTTCGCTGATGGAGAA-AAAAATGAAAAAAGGC
Ype TTTTCTAAACTCCTTGTTAGCGAAACGTTTCGCTCTTGGAGTA-GATCATGAAAAAAGGT

Sty ACCGTACTCAACTCTGAAATCTCGTCGGTCATTTCCCGTCTGGGGCATACTGATACTCTG
Sen ACCGTACTCAACTCTGAAATCTCGTCGGTCATTTCCCGTCTGGGGCATACTGATACTCTG
Stm ACCGTACTCAACTCTGAAATCTCGTCGGTCATTTCCCGTCTGGGGCATACTGATACTCTG
Eco ACCGTTCTTAATTCTGATATTTCATCGGTGATCTCCCGTCTGGGACATACCGATACGCTG
Ype GTATTACTGAACGCTGATATTTCCGCGGTTATCTCCCGTCTGGGCCATACCGATCAGATT
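The conservation marks under each alignment block can be recomputed from the aligned rows themselves. A minimal Python sketch, assuming the usual convention that `*` marks a column where every row carries the same nucleotide and gap columns are left unmarked; conservation_line is a hypothetical name, and the example uses the fourth purL block:

```python
def conservation_line(rows):
    """'*' under every column where all rows carry the same nucleotide (gaps excluded)."""
    return "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*rows)
    )


# purL, fourth alignment block (ST, EC, YP)
block = {
    "ST": "TGATGGAAATTCTGCGTGGTTCGCCTGCACTGTCTGCATTCCGTATCAATAAACTGCTGGCGCGCTTTCAGGCTGCCAAC",
    "EC": "TGATGGAAATTCTGCGTGGTTCGCCTGCACTGTCGGCATTCCGAATCAACAAACTGCTGGCACGTTTTCAGGCTGCCAGG",
    "YP": "TTATGGAAATACTGCGTGGTTCACCCGCTTTGTCGGCTTTTCGTATCACCAAACTGTTGTCCCGTTGCCAGGATGCTCAC",
}
for name, row in block.items():
    print(name, row)
print("  ", conservation_line(list(block.values())))
```

Long runs of `*` in otherwise divergent upstream regions, such as the PurR box ACGCAAACGGTTTCGT in the second purL block, are the signal that comparative analysis looks for.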
rVISTA