Comparing Biological Sequences with BLAST and ClustalX
Lesson objectives
Understand the principles of comparing biological sequences. Use the BLAST program to quickly find biological sequences homologous to a query sequence, if any exist in large databases such as NCBI, EMBL and DDBJ.
Sequence alignment
Sequence alignment is a method of arranging two or more sequences so as to maximize their similarity. Gaps (usually shown as hyphens) may be inserted into the sequences at suitable positions so that identical or similar characters line up in columns.

tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
Course introduction
This method is often used to study the evolution of sequences from a common ancestor, especially biological sequences such as protein or DNA sequences. Mismatches in the alignment correspond to mutations, and gaps correspond to insertions or deletions. The term "sequence alignment" also refers to the process of constructing such an alignment, or of finding the best alignments in a database of separate sequences.
Pairwise alignment is a method for finding the best-matching global or local alignment of two protein (amino acid) or DNA (nucleic acid) sequences. Its usual purpose is to find homologs of a gene or gene product in a database of known examples. This information is useful for answering a range of biological questions.
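As an illustration of local pairwise alignment, here is a minimal sketch of the Smith-Waterman dynamic-programming recurrence, a standard technique that the slides do not name. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen only for the example, and the function returns the best score rather than the alignment itself.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gap cost).

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

Keeping only two rows of the matrix is enough when just the score is needed; recovering the aligned residues would require storing the full matrix and tracing back.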
Applications
Some examples of questions that researchers use BLAST to answer:
Which bacterial species have proteins related by lineage to a protein whose amino-acid sequence is known?
Where does the DNA sequence we have just sequenced come from?
Which other genes encode proteins with structures or motifs similar to the one we have just determined?
BLAST is also used in combination with other algorithms that require approximate sequence matching.
BLAST
BLAST is an algorithm for comparing biological sequences, such as the amino-acid sequences of different proteins or the nucleotide sequences of DNA.
We use BLAST when the question is: does any sequence in the database match, or nearly match, our query sequence?
The BLAST algorithm has two parts: a search part, and a statistical evaluation of the results found. In the statistical evaluation, BLAST uses the score of a pair of aligned sequences to compute a value called the bit score. The higher the bit score, the more similar the aligned sequences. BLAST also computes an expectation value, the E-score (Expect score, usually reported as the E-value), which depends on the bit score.
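The dependence of the expectation value on the bit score can be sketched with the standard Karlin-Altschul relations, S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'), where m is the query length and n the database length. The function names and the sample numbers below are illustrative assumptions; in practice lambda and K depend on the scoring system and BLAST uses effective lengths.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S into a bit score S'.

    lam and k are the Karlin-Altschul parameters of the scoring system
    (supplied by the caller; no default values are assumed here).
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Hypothetical sizes: a 250-residue query against 1e9 database residues.
    for bits in (30, 40, 50):
        print(bits, expect_value(bits, 250, 1_000_000_000))
```

Each additional bit halves the E-value, which is why a higher bit score means a more significant match for a fixed search space.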
Step 2: BLAST continues by searching for further hit pairs, based on the hits found in step 1.
Parameters: Minimum Score (S), Neighborhood Score Threshold (T).
Step 3: Finally, BLAST extends the hit pairs it has found in both directions, scoring them as it goes. Extension stops when the score of a hit pair can no longer be improved.
RBP (query)          KENFDKARFSGTWYAMAKKDPEG 50
lactoglobulin (hit)  MKGLDIQKVAGTWYSLAMAASD. 44
                     <-extend- Hit! -extend->
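The extension step can be sketched as follows. This is a simplified, ungapped version under stated assumptions: residues are scored +1 for a match and -1 for a mismatch instead of with a BLOSUM or PAM matrix, and extension in each direction stops once the running score has fallen `x_drop` below the best score seen (an X-drop style stopping rule). The function name and parameter values are hypothetical.

```python
def extend_hit(query, subject, q_start, s_start, word_len,
               match=1, mismatch=-1, x_drop=3):
    """Extend a word hit in both directions without gaps.

    Returns (score, q_begin, q_end) where query[q_begin:q_end] is the
    extended segment. Scoring values are illustrative assumptions.
    """
    def walk(qi, si, step):
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n      # remember the best prefix
            elif best - score >= x_drop:
                break                          # score dropped too far: stop
            qi += step
            si += step
        return best, best_len

    seed = sum(match if query[q_start + k] == subject[s_start + k] else mismatch
               for k in range(word_len))
    right_score, right_len = walk(q_start + word_len, s_start + word_len, +1)
    left_score, left_len = walk(q_start - 1, s_start - 1, -1)
    return (seed + left_score + right_score,
            q_start - left_len,
            q_start + word_len + right_len)


if __name__ == "__main__":
    rbp = "KENFDKARFSGTWYAMAKKDPEG"
    lacto = "MKGLDIQKVAGTWYSLAMAASD"
    score, begin, end = extend_hit(rbp, lacto, 10, 10, 3)
    print(score, rbp[begin:end])
```

With these toy scores the three-letter hit at position 10 grows by one residue to the right and not at all to the left; a real substitution matrix would reward conservative substitutions and could extend further.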
Protein words
These subsequences (words) are scored using a substitution matrix, either BLOSUM or PAM.
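A minimal sketch of the word-lookup idea: split the query into overlapping words of length 3 and report every position where the same word occurs in a subject sequence. Protein BLAST also accepts "neighborhood" words whose substitution-matrix score against the query word reaches the threshold T; that part is omitted here to avoid embedding a matrix, so only exact word matches are found. The function names are hypothetical.

```python
from collections import defaultdict


def index_words(seq, w=3):
    """Map every overlapping word of length w to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def exact_word_hits(query, subject, w=3):
    """Return sorted (query_pos, subject_pos, word) for words shared by both."""
    index = index_words(query, w)
    hits = []
    for j in range(len(subject) - w + 1):
        word = subject[j:j + w]
        for i in index.get(word, ()):
            hits.append((i, j, word))
    return sorted(hits)


if __name__ == "__main__":
    rbp = "KENFDKARFSGTWYAMAKKDPEG"
    lacto = "MKGLDIQKVAGTWYSLAMAASD"
    for hit in exact_word_hits(rbp, lacto):
        print(hit)
```

Indexing the query once and scanning the subject makes the lookup proportional to the subject length, which is the reason word-based seeding is fast on large databases.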
Scoring
General method (DNA defaults):
Terminal mismatches: 0
Match score: +1
Mismatch penalty: -3
Gap penalty: -1
Gap extension penalty: -1
Example:
GGGGGGAGAA--GGG
11(1) + 2(-3) + 1(-1) + 1(-1) = 3
Cow and fish (DNA)
32  .ACAGGACATTTTACTACTCTGCAGATAATGGCTGACTTTGACATGGTAC 80
      |   | | |   | |    ||  |     |  || |   |  |||| |
51  TTCTTCAGACTGCGCCATGGGGCTCAGCGACGGGGAATGGCAGTTGGTGC 100

81  TGAAGTGCTGGGGTCCAATGGAGGCGGACCACGCAACCCACGGGAGTCTG 130
    ||||   ||||||     ||||||| ||   ||||  ||| |||     |
101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150

131 GTGCTGACCCGTTTATTCACAGAGCACCCAGAAACCCTAAAGTTATTCCC 180
    || || | | |  | |||||||  || || || |||||  ||  |||
151 GTCCTCATCAGGCTCTTCACAGGTCATCCCGAGACCCTGGAGAAATTTGA 200

181 CAAGTTTGCTGGC...ATCGCCCATGGGGACCTGGCCGGGGATGCAGGTG 227
    ||||||      |   |   |  | |  ||  ||   |     |  |
201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250
48% similarity
Cow and pig
1   CAGCTGTCGGAGACAGACACCCAGTCAGTCCCGCCCTTGTTCTTTTTCTC 50
           | ||| |||    ||    |  ||||| |||| ||| ||||||
1   .......CAGAGCCAGGACACCCAGTACGCCCGCACTTGCTCTGTTTCTC 43

51  TTCTTCAGACTGCGCCATGGGGCTCAGCGACGGGGAATGGCAGTTGGTGC 100
    |||| ||||||| |||||||||||||||||||||||||||||| ||||||
44  TTCTGCAGACTGTGCCATGGGGCTCAGCGACGGGGAATGGCAGCTGGTGC 93

101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150
    |||| | |||||||||||||||||||||||||||||||||||||||||||
94  TGAACGTCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 143

151 GTCCTCATCAGGCTCTTCACAGGTCATCCCGAGACCCTGGAGAAATTTGA 200
    ||||||||||||||||| |  ||||| |||||||||||||||||||||||
144 GTCCTCATCAGGCTCTTTAAGGGTCACCCCGAGACCCTGGAGAAATTTGA 193

201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250
    |||||| |||||||||||| |||||| ||||||||||||||| |||||||
194 CAAGTTTAAGCACCTGAAGTCAGAGGATGAGATGAAGGCCTCTGAGGACC 243
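A figure such as the 48% reported for the first comparison can be approximated by counting identical columns. The sketch below computes percent identity over the columns where neither sequence has a gap; this is one common convention and an assumption here, since the slides do not say how their similarity value was calculated. The function name is hypothetical.

```python
def percent_identity(a, b, gap_chars="-."):
    """Percent of identical residues over columns without a gap.

    a and b are two rows of an alignment and must have equal length.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = identical = 0
    for x, y in zip(a, b):
        if x in gap_chars or y in gap_chars:
            continue                    # skip gapped columns
        columns += 1
        identical += (x == y)
    return 100.0 * identical / columns if columns else 0.0


if __name__ == "__main__":
    # Second block of the cow and pig comparison (positions 51-100 and 44-93).
    cow = "TTCTTCAGACTGCGCCATGGGGCTCAGCGACGGGGAATGGCAGTTGGTGC"
    pig = "TTCTGCAGACTGTGCCATGGGGCTCAGCGACGGGGAATGGCAGCTGGTGC"
    print(percent_identity(cow, pig))
```

This block differs at three of its 50 columns, which illustrates how much closer the cow and pig sequences are than the pair in the first comparison.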
BLAST variants
[Diagram: BLAST variants classified by query and database type (DNA or protein); figure labels: 1, 6]
Blastn
Megablast
Large numbers of query sequences (megablast): when comparing a large number of input sequences with command-line BLAST, "megablast" is much faster than running BLAST many times.
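As a sketch of the command-line route, the helper below builds a single megablast call for a FASTA file holding many query sequences. It assumes the NCBI BLAST+ `blastn` executable is installed and on the PATH; the file names, database name and E-value cutoff are hypothetical placeholders, and the function only constructs the argument list without running anything.

```python
import shlex


def megablast_command(query_fasta, database, out_path, evalue=1e-5):
    """Build a BLAST+ blastn command that runs the megablast task.

    All queries in query_fasta are searched in one invocation;
    -outfmt 6 requests tab-separated output.
    """
    return [
        "blastn",
        "-task", "megablast",
        "-query", query_fasta,
        "-db", database,
        "-out", out_path,
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


if __name__ == "__main__":
    # Hypothetical file and database names.
    cmd = megablast_command("queries.fasta", "nt", "hits.tsv")
    print(shlex.join(cmd))
    # To execute: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.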
Protein-protein BLAST
Given a protein query, this program returns the most similar protein sequences from a protein database specified by the user.
Programs: blastp, PSI-BLAST, PHI-BLAST
Results
PHI-BLAST, PSI-BLAST
PSI-BLAST is one of the more recent BLAST programs; it is used to find distant relatives of a protein.
Blastx
Results
blastx translates the input DNA sequence into protein, in all six reading frames, before comparing it with a protein database.
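The translation step can be sketched with the standard genetic code. This example only produces the six conceptual translations (three frames on each strand) and does not perform any database search; the function names are hypothetical, stop codons are written as `*`, and codons containing characters other than A, C, G or T are written as `X`.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Fragment taken from the cow sequence in the DNA comparison above.
    for name, protein in six_frame_translation("ATGGGGCTCAGCGACGGGGAATGGCAG").items():
        print(name, protein)
```

Searching all six frames matters because the reading frame and strand of the coding region are usually unknown for a newly sequenced DNA fragment.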
Load sequence 1
ClustalX software
ClustalX is a program with a graphical (windowed) interface for comparing the similarity of two or more biological sequences. ClustalX displays the result using a system of colors and symbols that highlight the characteristic features of the similar regions.
ClustalX has become increasingly useful to researchers for finding conserved regions in DNA or protein sequences.
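To show what locating a conserved region involves, here is a minimal sketch that works on sequences that are already aligned. It marks fully conserved columns with `*`, the symbol ClustalX uses for identical columns, and finds the longest uninterrupted conserved run. It does not perform the alignment itself, and the function names are hypothetical.

```python
def conservation_line(aligned):
    """Return '*' for columns where all rows hold the same residue, else ' '."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return "".join(
        "*" if len(set(column)) == 1 and column[0] not in "-." else " "
        for column in zip(*aligned)
    )


def longest_conserved_run(aligned):
    """Return (start, end) of the longest run of conserved columns, end exclusive."""
    line = conservation_line(aligned)
    best = (0, 0)
    start = None
    for i, mark in enumerate(line + " "):   # trailing space closes a final run
        if mark == "*":
            if start is None:
                start = i
        elif start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


if __name__ == "__main__":
    rows = ["tcctctgcctctgccatcat---caaccccaaagt",
            "tcctgtgcatctgcaatcatgggcaaccccaaagt"]
    print(conservation_line(rows))
    start, end = longest_conserved_run(rows)
    print(rows[0][start:end])
```

The same functions accept any number of aligned rows, so they can be applied to a multiple alignment exported from ClustalX to shortlist candidate regions.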
ClustalX principles
Collect and select the set of sequences (protein, DNA or RNA).
Load the biological sequences into ClustalX.
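Exercises
1. Align the HSP70 protein sequences of several bacterial species.
2. Collect and select a set of sequences for a gene of interest (for example the C-prM gene of Dengue virus, which causes hemorrhagic fever in humans).
3. Choose the most conserved region in the aligned set of sequences.
4. Use the chosen conserved segment as the target sequence for amplification, with the primer-design software PDA.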
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/tut1.html