
COMPARING BIOLOGICAL SEQUENCES WITH BLAST AND CLUSTALX

Lesson objectives

Understand the principles of comparing biological sequences.
Use BLAST to quickly find biological sequences homologous to a query sequence, if any exist in large databases such as NCBI, EMBL, and DDBJ.

Obtain data on the degree of similarity and on the origin of the similar sequences.


Searching for biological sequences

Sequence alignment

Sequence alignment arranges two or more sequences so as to achieve maximum similarity. Gaps (usually written as hyphens) may be inserted into the sequences at suitable positions so that the columns contain identical or similar characters.

tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
Course introduction

This method is commonly used to study the evolution of sequences from a common ancestor, especially biological sequences such as protein or DNA sequences. Mismatches in the alignment correspond to mutations, and gaps correspond to insertions or deletions. The term "sequence alignment" also refers to the process of constructing such an alignment, or of finding the best alignments in a database of separate sequences.
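The match line in the example alignment can be derived mechanically from the two aligned rows. The sketch below is one minimal way to do that; the function names are hypothetical, and the identity percentage is computed over all alignment columns, which is only one of several conventions in use.

```python
def match_line(top: str, bottom: str, gap: str = "-") -> str:
    """Return '|' under identical columns and ' ' under mismatches or gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "|" if a == b and a != gap else " "
        for a, b in zip(top, bottom)
    )


def percent_identity(top: str, bottom: str, gap: str = "-") -> float:
    """Identical columns divided by all alignment columns, as a percentage."""
    line = match_line(top, bottom, gap)
    return 100.0 * line.count("|") / len(line)


# The two aligned rows from the slide.
top = "tcctctgcctctgccatcat---caaccccaaagt"
bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"

print(top)
print(match_line(top, bottom))
print(bottom)
print(f"{percent_identity(top, bottom):.1f}% identity")
```

The three mismatches and the three-column gap are the columns left blank in the match line.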

Pairwise alignment

Pairwise alignment is used to find the best-matching global or local alignment of two protein (amino acid) or DNA (nucleic acid) sequences. Typically the goal is to find homologs of a gene or gene product in a database of known examples. This information helps answer a wide range of biological questions.
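To make the difference between a global and a local alignment concrete, here is a minimal dynamic-programming sketch. It uses the classical Needleman-Wunsch (global) and Smith-Waterman (local) recurrences, which the slides do not name, and the match, mismatch, and gap values are illustrative assumptions rather than BLAST defaults.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1, local: bool = False) -> int:
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for leading gaps; a local one does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,  # align two residues
                       score[i - 1][j] + gap,       # gap in b
                       score[i][j - 1] + gap)       # gap in a
            if local:
                cell = max(cell, 0)                 # a local alignment may restart
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "ACT"))                        # global
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))   # local
```

The local variant ignores the unrelated flanking residues and scores only the best-matching region, which is the behaviour a database search needs.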


Applications

Some examples of questions that researchers use BLAST to answer:
Which bacterial species have proteins that are evolutionarily related to a protein whose amino acid sequence is known?
Where does the DNA sequence we have just sequenced come from?
Which other genes encode proteins with structures or motifs similar to the one we have just identified?
BLAST is also used in combination with other algorithms that require approximate sequence matching.

BLAST

BLAST is an algorithm for comparing biological sequences, such as the amino acid sequences of proteins or the nucleotide sequences of DNA.

We use BLAST to ask: is there any sequence in the database that is identical or similar to my sequence?


Principles of BLAST


Collect and select the sequences (protein, DNA, or RNA)
Analyze the BLAST results


The BLAST algorithm

The BLAST algorithm has two parts: a search part and a statistical evaluation of the search results. In the statistical evaluation, BLAST starts from the score of an aligned pair of sequences and computes a value called the bit score. The higher this value, the more similar the aligned sequences are. BLAST also computes an expected value, the E-score (Expect score, usually called the E-value), which depends on the bit score.
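The relation between the raw score, the bit score, and the expected value can be sketched with the standard Karlin-Altschul normalization, which the slides do not spell out: S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'). The values of lambda and K depend on the scoring system; the numbers used below, like the query and database lengths, are placeholders chosen for illustration only.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score S into bits: (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)


# Placeholder statistical parameters and search-space sizes (assumptions).
LAMBDA, K = 0.267, 0.041
QUERY_LEN, DB_LEN = 250, 50_000_000

for raw in (40, 80, 160):
    bits = bit_score(raw, LAMBDA, K)
    print(raw, round(bits, 1), expect_value(bits, QUERY_LEN, DB_LEN))
```

A higher bit score always gives a smaller expected value for the same search space, which is why hits are usually ranked by either one.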

Probability values in BLAST


Search steps in BLAST


Step 1: BLAST searches for short subsequences (words) of fixed length W that are highly similar.

Step 2: BLAST continues to search for further hit pairs based on the hits found in step 1.
Parameters: Minimum Score (S), Neighborhood Score Threshold (T)

Words whose score is at or above a threshold value T are considered found; BLAST calls them hits.
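The word search can be sketched as a scan over every pair of length-W words, keeping the pairs that reach the threshold T. This is a deliberately naive illustration: real BLAST indexes the words instead of comparing all pairs, and the scoring used here (+2 for identical residues, -1 otherwise) is a toy stand-in for a BLOSUM or PAM matrix. The function names and parameter values are assumptions.

```python
def word_score(w1: str, w2: str, match: int = 2, mismatch: int = -1) -> int:
    """Toy score for two equal-length words (stand-in for a substitution matrix)."""
    return sum(match if a == b else mismatch for a, b in zip(w1, w2))


def find_hits(query: str, subject: str, w: int = 3, t: int = 6):
    """Return (query_pos, subject_pos, score) for word pairs scoring at least t."""
    hits = []
    for i in range(len(query) - w + 1):
        qword = query[i:i + w]
        for j in range(len(subject) - w + 1):
            s = word_score(qword, subject[j:j + w])
            if s >= t:
                hits.append((i, j, s))
    return hits


# Protein fragments shown on the extension slide (trailing '.' dropped).
query = "KENFDKARFSGTWYAMAKKDPEG"
subject = "MKGLDIQKVAGTWYSLAMAASD"

for qi, sj, s in find_hits(query, subject):
    print(qi, sj, query[qi:qi + 3], subject[sj:sj + 3], s)
```

With W = 3 and T = 6 under this toy scoring only identical words qualify; lowering T admits similar but non-identical words, at the cost of more hits to extend.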



Extending the sequence comparison

Step 3: Finally, BLAST extends each hit pair in both directions while updating the score. Extension stops when the score of the hit pair can no longer be improved.

KENFDKARFSGTWYAMAKKDPEG 50   RBP (query)
MKGLDIQKVAGTWYSLAMAASD. 44   lactoglobulin (hit)

Extend <- Hit! -> Extend
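A minimal sketch of ungapped extension in both directions follows. It keeps the best score seen so far and gives up once the running score falls more than `x_drop` below it; that drop-off rule is how BLAST is usually described but is an assumption here, since the slide only says that extension stops when the score cannot improve. The residue scoring is a toy stand-in for a substitution matrix, and all names are hypothetical.

```python
def residue_score(a: str, b: str) -> int:
    """Toy residue score: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


def extend_one_way(query, subject, qi, sj, step, x_drop):
    """Walk from (qi, sj) in direction `step`; return (best_gain, best_length)."""
    best = running = 0
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        running += residue_score(query[qi], subject[sj])
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # score has dropped too far below the best seen
        qi += step
        sj += step
    return best, best_len


def extend_hit(query, subject, qi, sj, w, x_drop=3):
    """Extend a word hit of length w; return (score, query_start, query_end)."""
    seed = sum(residue_score(query[qi + k], subject[sj + k]) for k in range(w))
    right, rlen = extend_one_way(query, subject, qi + w, sj + w, +1, x_drop)
    left, llen = extend_one_way(query, subject, qi - 1, sj - 1, -1, x_drop)
    return seed + left + right, qi - llen, qi + w + rlen


query = "KENFDKARFSGTWYAMAKKDPEG"
subject = "MKGLDIQKVAGTWYSLAMAASD"
score, start, end = extend_hit(query, subject, qi=10, sj=10, w=3)
print(score, query[start:end])
```

Starting from the word hit at position 10 in both fragments, the extension gains one residue to the right and nothing to the left under this toy scoring.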


Nucleotide words in BLAST

Nucleotide words are scored with match and mismatch values; the BLOSUM and PAM substitution matrices apply to protein words.

Protein words

These words are scored using a substitution matrix, BLOSUM or PAM.

Scoring
General method:

DNA defaults
Terminal mismatches: 0
Match score: 1
Mismatch penalty: -3
Gap penalty: -1
Gap extension penalty: -1

Scoring a DNA alignment

GGGGGGAGAA
|||||*|*||
GGGGGAAAAAGGGGG
8(1) + 2(-3) = 2

GGGGGGAGAA--GGG
|||||*|*||  |||
GGGGGAAAAAGGGGG
11(1) + 2(-3) + 1(-1) + 1(-1) = 3
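The two worked scores can be reproduced with a short function that applies the listed defaults column by column. This is a sketch under two assumptions: unaligned terminal overhang is written as spaces and scored 0, and the first column of a gap pays the gap penalty while each further column pays the gap extension penalty. The function name is hypothetical.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -3,
                    gap_open: int = -1, gap_extend: int = -1) -> int:
    """Score two aligned rows of equal length ('-' = gap, ' ' = overhang)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == " " or b == " ":
            in_gap = False          # terminal overhang: scored 0
        elif a == "-" or b == "-":
            total += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            total += match if a == b else mismatch
            in_gap = False
    return total


subject = "GGGGGAAAAAGGGGG"
ungapped = "GGGGGGAGAA".ljust(len(subject))   # pad the overhang with spaces
gapped = "GGGGGGAGAA--GGG"

print(score_alignment(ungapped, subject))
print(score_alignment(gapped, subject))
```

Opening the two-column gap costs 2 points but gains three additional matches, so the gapped alignment scores higher than the ungapped one.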

Comparing genetic characteristics of species

Cow and fish (DNA); top row fish, bottom row cow

 32 .ACAGGACATTTTACTACTCTGCAGATAATGGCTGACTTTGACATGGTAC 80
      |   | | |   | |    ||  |     |  || |   |  |||| |
 51 TTCTTCAGACTGCGCCATGGGGCTCAGCGACGGGGAATGGCAGTTGGTGC 100

 81 TGAAGTGCTGGGGTCCAATGGAGGCGGACCACGCAACCCACGGGAGTCTG 130
    ||||   ||||||     ||||||| ||   ||||  ||| |||     |
101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150

131 GTGCTGACCCGTTTATTCACAGAGCACCCAGAAACCCTAAAGTTATTCCC 180
    || || | | |  | |||||||  || || || |||||  ||  |||
151 GTCCTCATCAGGCTCTTCACAGGTCATCCCGAGACCCTGGAGAAATTTGA 200

181 CAAGTTTGCTGGC...ATCGCCCATGGGGACCTGGCCGGGGATGCAGGTG 227
    ||||||      |   |   |  | |  ||  ||   |     |  |
201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250

48% similarity

Cow and pig; top row cow, bottom row pig

  1 CAGCTGTCGGAGACAGACACCCAGTCAGTCCCGCCCTTGTTCTTTTTCTC 50
           | ||| |||    ||    |  ||||| |||| ||| ||||||
  1 .......CAGAGCCAGGACACCCAGTACGCCCGCACTTGCTCTGTTTCTC 43

 51 TTCTTCAGACTGCGCCATGGGGCTCAGCGACGGGGAATGGCAGTTGGTGC 100
    |||| ||||||| |||||||||||||||||||||||||||||| ||||||
 44 TTCTGCAGACTGTGCCATGGGGCTCAGCGACGGGGAATGGCAGCTGGTGC 93

101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150
    |||| | |||||||||||||||||||||||||||||||||||||||||||
 94 TGAACGTCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 143

151 GTCCTCATCAGGCTCTTCACAGGTCATCCCGAGACCCTGGAGAAATTTGA 200
    ||||||||||||||||| |  ||||| |||||||||||||||||||||||
144 GTCCTCATCAGGCTCTTTAAGGGTCACCCCGAGACCCTGGAGAAATTTGA 193

201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250
    |||||| |||||||||||| |||||| ||||||||||||||| |||||||
194 CAAGTTTAAGCACCTGAAGTCAGAGGATGAGATGAAGGCCTCTGAGGACC 243

80% similarity (88% at the amino acid level!)

BLAST variants

Program   Query     Comparisons   Database
blastn    DNA       1             DNA
blastp    protein   1             protein
blastx    DNA       6             protein


blastn

Megablast
Discontiguous megablast


Comparing the input sequence with the database sequences


Megablast

Large numbers of query sequences (megablast): when many input sequences are compared in a single command-line BLAST run, "megablast" is much faster than running BLAST many times.
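As an illustration of a single command-line run over many queries, the sketch below only builds the command; it does not execute it. It assumes the current NCBI BLAST+ suite, where megablast is selected with `blastn -task megablast` rather than through a separate `megablast` program. The file names `queries.fasta` and `hits.tsv` and the database name `nt` are hypothetical; the query file is assumed to be a multi-FASTA file holding all input sequences.

```python
import shlex


def megablast_command(query_fasta: str, database: str, out_path: str) -> list:
    """Build a BLAST+ command that searches every sequence in one FASTA file."""
    return [
        "blastn",
        "-task", "megablast",    # megablast mode of blastn
        "-query", query_fasta,   # multi-FASTA file with all queries
        "-db", database,         # name of a formatted nucleotide database
        "-outfmt", "6",          # tabular output, one line per hit
        "-out", out_path,
    ]


cmd = megablast_command("queries.fasta", "nt", "hits.tsv")
print(shlex.join(cmd))
# To run it, pass `cmd` to subprocess.run(cmd, check=True) on a machine
# where BLAST+ and the database are installed.
```

Putting all queries in one file lets the program load the database once instead of once per query.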

Protein-protein BLAST
Given a protein query, this program returns the most similar protein sequences from the protein database that the user specifies.
blastp
PSI-BLAST
PHI-BLAST


Results
PHI-BLAST
PSI-BLAST


PSI-BLAST: iteration 1


PSI-BLAST: protein regions covered

One of the newest BLAST programs, PSI-BLAST is used to find distant relatives of a protein.

Results


Results


Blastx


Results
blastx translates the input DNA sequence into protein
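The translation step can be sketched as a six-frame translation: three reading frames on the given strand and three on its reverse complement, which corresponds to the 6 listed for blastx among the BLAST variants. The sketch uses the standard genetic code; the function names are hypothetical, and the test sequence is the start of the coding region visible in the cow sequence of the DNA comparison.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; unknown codons give 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna: str) -> list:
    """Translations of frames +1, +2, +3, then -1, -2, -3."""
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


for frame in six_frames("ATGGGGCTCAGCGACGGGGAATGG"):
    print(frame)
```

Each of the six protein strings is then compared against the protein database, so a coding region is found whatever its strand or frame.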


Comparing two sequences with BLAST


Comparing H5N1 and Streptococcus

Load sequence 1

Load sequence 2
Click to try


Result table for the two-sequence comparison


Results of comparing H5N1 and Streptococcus


The ClustalX software

ClustalX is a program with a windowed graphical interface for comparing the similarity of two or more biological sequences. ClustalX presents its results with a color scheme and symbols that highlight the characteristic features of the homologous regions.

ClustalX is increasingly useful to researchers looking for conserved regions in DNA or protein sequences.
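Finding conserved regions in a finished multiple alignment can be sketched as follows. The code marks fully conserved columns with `*`, the symbol Clustal programs use for such columns, and reports the longest unbroken run. The three aligned sequences are made-up toy data and the function names are hypothetical; ClustalX itself also marks partially conserved columns, which this sketch ignores.

```python
def conserved_columns(alignment: list) -> str:
    """Return a marker row: '*' where all sequences share one residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    return "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*alignment)
    )


def longest_conserved_run(markers: str) -> tuple:
    """Return (start, length) of the longest run of '*' in the marker row."""
    best_start = best_len = 0
    start = None
    for i, mark in enumerate(markers + " "):  # trailing sentinel closes a final run
        if mark == "*":
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


# Toy aligned sequences (hypothetical data).
aligned = ["ATGGC-TAGC",
           "ATGGCATAGC",
           "TTGGC-TAGA"]

markers = conserved_columns(aligned)
for seq in aligned:
    print(seq)
print(markers)
print(longest_conserved_run(markers))
```

The longest run is a natural first candidate when a conserved segment is needed, for example as a target region for primer design.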

Principles of ClustalX

Collect and select the set of sequences (protein, DNA, or RNA)
Load the biological sequences into ClustalX

Analyze the alignment results


Collecting and selecting the set of sequences


Before aligning, carefully choose the set of sequences to align.
These sequences should belong to the same protein, DNA, or RNA and share a common ancestor.
Depending on the purpose of the alignment, select a number of sequences to analyze with ClustalX.
For example, to detect mutations, obtain the gene sequence of the wild-type strain and the gene sequences of the strains thought to be mutants.
To find conserved regions, collect gene sequences from the same family (protease A, the LT toxin gene).


Aligning sequences with ClustalX


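Exercises
1. Align the HSP70 protein sequences of several bacterial species.
2. Collect and select a set of gene sequences of interest (for example the C-prM gene of Dengue virus, which causes hemorrhagic fever in humans).
3. Choose the most conserved region in the aligned set of sequences.
4. Use the selected conserved segment as the target sequence for amplification, with the primer design software PDA.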


Bioinformatics answers questions of relatedness

http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/tut1.html
