
BUILDING APPLICATIONS TO SUPPORT THE DEVELOPMENT OF BIOINFORMATICS

Tran Van Lang
HCMC Institute of Information Technology
Abstract: Building applications that support research and training in bioinformatics is essential: it helps us integrate quickly with the international community and opens opportunities for cooperation with countries in the region. This paper presents some results obtained in recent years by the bioinformatics research group of the HCMC Institute of Information Technology. The paper has three parts. The first part presents several basic bioinformatics problems, such as primer design, restriction enzyme search, phylogenetic tree drawing, and protein structure prediction. The second part presents how these problems were implemented in the group's bioinformatics application, HiBio, together with other research the group is currently pursuing. The final part gives some conclusions, compares the results with other software in the same field, and offers some recommendations.

1. Introduction

Over the past few decades, molecular biology has advanced rapidly. A series of applied biology tools has appeared and helped accelerate the decoding of a large number of genome sequences from many species. To date, many bacterial genomes have been decoded almost completely, and the project to sequence all 24 chromosomes of the human genome was completed at the end of 2000. Biological information has never been as rich and diverse as it is now. To cope with such a huge volume of information, information technology has been applied to biology quite thoroughly, and from this a completely new science was born: bioinformatics.

Bioinformatics is tied to many different sciences and spans many fields of research, with the aim of collecting, storing, and analyzing biological data. In this setting, algorithms, data processing systems, and network environments play a very important role. The outcome of this research is software that helps solve problems in the study of genes, proteins, and other topics in molecular biology. The achievements of the field have strongly supported the application of biology and medicine to practical work. In particular, they have contributed new scientific results such as new high-yield crop varieties and livestock breeds, faster diagnosis of disease, and new medicines.

At the same time, the large volume of interacting biological data raises many problems. For example, the human genome has been decoded, but understanding and using this code requires knowledge of protein structure and function; only then can knowledge of the genome be applied in practice and used to act on heredity. Moreover, the number of proteins in the human body is many times the number of genes. The genome is stored identically in all cells, but its encoded, expressed result, the set of proteins, differs from one organ or cell to another. The set of proteins changes with the state of the cell and responds to external influences. A gene map can be recorded in printed documents, but a protein map cannot. It can only be recorded as an electronic document, not because it is too large but because the interactions between one protein and another are physical in nature and occur continuously in the course of evolution ([4],[5],[6],[7],[8]).

Email: lang@hcmc.netnam.vn. Address: 1 Mac Dinh Chi, District 1, Ho Chi Minh City.

For these reasons, the application of informatics to biology has attracted, and continues to attract, the attention of most developed countries. In Asia, countries such as China, South Korea, and Japan have made breakthroughs in this field; their governments have mobilized people working in different disciplines to take part in the research. In particular, the world now has several well-known data banks:
- NCBI, the National Center for Biotechnology Information of the United States [9].
- EMBL, the European Molecular Biology Laboratory [10], and a part of it, EBI, the European Bioinformatics Institute, located in the United Kingdom [11].
- DDBJ, the DNA Data Bank of Japan [12].

On the software side, quite a few programs now help process DNA and protein sequences, for example FastA, Blast, Cn3D, Phylip, PepTool, GeneTool, ClustalX, BioEdit, SeqVISTA, SAGA, Primer3, PC-Genes, Discovery Studio Gene, DNASIS, DNAMAN, VECTOR NTI, AnnHyb, DNA Club, Plasmid Processor, and Oligos. These programs offer functions such as:
- Transcribing a DNA sequence into RNA.
- Displaying a sequence together with its paired sequence.
- Locating the sites of a restriction enzyme in a sequence.
- Drawing plasmid maps and supporting molecular design.
- Searching for primers.
- Comparing the degree of similarity between sequences.
- Drawing a tree that represents the similarity between sequences (a dendrogram).
- Searching databases for sequences, repeated segments (motifs), and enzymes.

In Vietnam, the field has appeared only in research institutes and a few large universities, and it remains confined to biotechnology researchers. The activities of the Faculty of Biotechnology at the University of Natural Sciences, Ho Chi Minh City; the Institute of Biotechnology of the Vietnamese Academy of Science and Technology; and the University of Medicine and Pharmacy, Ho Chi Minh City illustrate this.

2. Methods and results

2.1 Processing gene and protein data

Bioinformatics is highly developed worldwide. In Vietnam, however, very few informatics specialists take part in it, and software products put into practical use are not to be found. In recent years, the HCMC Institute of Information Technology has cooperated with researchers from the Institute of Biotechnology, from NCBI/NLM/NIH, and from NIAID/NIH to establish a line of research with two main goals [1]:
- Build a website and a data bank that provide genetic information for training and research in biotechnology.

- Build software for biotechnology research that can process and analyze biological sequences, thereby mastering the design principles and taking a first step toward a Vietnamese-branded software product in bioinformatics.

The group has produced a software product named HiBio, which provides the functions needed for studying molecular biology. These range from simple ones, such as checking whether a sequence is a valid DNA or protein sequence, or converting a sequence from DNA to RNA and then to protein using several translation tables and different reading frames, to more complex ones:
- Primer design, which displays the best primer pair, the forward primers, and the reverse primers, or sorts them by quality.
- Plasmid map design with the features needed for the different forms, including suggestion of restriction enzymes.
- Prediction of protein secondary structure and viewing of the tertiary structure of a protein.
- Drawing phylogenetic trees in rooted and unrooted forms.
- Motif search.
- Open-source products such as ClustalX, RasTop, and Blastn, integrated into the system with a high degree of localization.

HiBio is a collection of independent program modules, each of which performs a separate function, for example a module for entering and editing sequence data, a module for pairwise sequence comparison, and a module for creating cut vectors.

2.1.1 Sequence conversion

The DNA of animals and plants is composed mainly of four bases, A, T, G, and C. These can form 64 codons (each codon consists of 3 bases), which map onto 20 amino acids. The amino acids in turn form specific proteins. Organisms differ, however, so the mapping to amino acids is not the same everywhere: a codon that yields one amino acid in one species may yield a different amino acid in another. For example, the codon CTT yields leucine in vertebrates but threonine in yeast (in the yeast mitochondrial code). There are therefore many translation tables for different species and cases. HiBio currently provides 17 translation tables from DNA to protein.

The Six Frame algorithm, as its name suggests, groups bases into triplets for translation and produces six possible readings: three frames in the forward direction and three in the reverse direction. Only one of these six frames is taken to be the correct translation product.

2.1.2 Primer design
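The selection parameters this section lists for primers (length, a unique binding position, GC percentage, annealing temperature, and low self-pairing) can be illustrated with a small Python sketch. It is not HiBio's implementation: all function names are hypothetical, the temperature estimate uses the Wallace rule (2 degrees per A/T, 4 degrees per G/C), which is an assumption because the paper does not say which formula HiBio applies, and the self-pairing check is a deliberately crude heuristic.

```python
# Illustrative primer checks. Names, the Wallace rule, and the tail length k
# are assumptions for this sketch, not taken from HiBio.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(primer: str) -> float:
    """Percentage of G and C nucleotides in the primer."""
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees Celsius (Wallace rule)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def binding_sites(template: str, primer: str) -> list:
    """All 0-based positions where the primer matches the template exactly."""
    positions = []
    start = template.find(primer)
    while start != -1:
        positions.append(start)
        start = template.find(primer, start + 1)
    return positions


def has_self_complementary_tail(primer: str, k: int = 4) -> bool:
    """True if the last k bases could pair with a stretch of the primer itself."""
    return reverse_complement(primer[-k:]) in primer


def primer_report(template: str, primer: str) -> dict:
    """Collect the checks above for one candidate primer."""
    sites = binding_sites(template, primer)
    return {
        "length": len(primer),
        "gc_percent": gc_percent(primer),
        "tm_wallace": wallace_tm(primer),
        "binding_sites": sites,
        "unique_site": len(sites) == 1,
        "self_pairing_risk": has_self_complementary_tail(primer),
    }
```

A tool would typically filter candidates on such a report, for example keeping only primers with a single binding site and a GC percentage inside a user-chosen range; the actual thresholds are left to the user here, as in the paper.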

Applications of molecular biology often run into limits on the amount of genetic material available. Among amplification methods, the polymerase chain reaction (PCR) is widely used. To apply it, however, the biologist must know the sequence to be amplified and from it determine the required primers: short DNA segments capable of complementary pairing with one end of a specific template strand.

Primer design in HiBio reports how many primers satisfy the user's requirements and describes their technical characteristics. The properties of the primers to be searched for in a DNA sequence can be set through the program interface. Primers are selected on parameters such as:
- Primer length.
- A unique binding position.
- Percentage of G and C nucleotides.
- Annealing temperature.
- Low tendency of the primers to pair with one another.

2.1.3 Sequence search

The two database search algorithms most widely used in the world today are BLAST and FastA. BLAST is considered the faster of the two and is in widespread use. BLAST (Basic Local Alignment Search Tool) is a heuristic search algorithm. It is implemented in the NCBI programs blastp, blastn, blastx, tblastn, and tblastx, which work as follows:
- blastp: compares an amino acid sequence against a database of protein sequences.
- blastn: compares a nucleotide sequence against a database of nucleotide sequences.
- blastx: compares the six-frame translation of a nucleotide sequence against a protein database.
- tblastn: compares a protein sequence against a nucleotide database.
- tblastx: compares the six-frame translation of a nucleotide sequence against the six-frame translations of the sequences in a nucleotide database.

The main purpose of a database search is to find the genes in the database that contain a given sequence. These functions are usually provided through a Web application. The input in this case is a DNA sequence, and HiBio searches for known genes whose information is identical or similar to the input sequence. Besides the main parameter, the DNA sequence, the program accepts further parameters such as E (expect value), the expectation value used in the comparison, and W (word), the number of nucleotides in a word used for comparison.

2.1.4 Sequence comparison

Comparing biological sequences raises two problems: global alignment and local alignment. Global alignment aligns all elements of the two sequences. Each element of one sequence is aligned with an element of the other sequence or with a blank character (gap). In HiBioClustalX, multiple sequence alignment is based on the Needleman-Wunsch algorithm, which has the following characteristics [1]:
- It finds the best global alignment between any two sequences.
- It maximizes the number of matching elements between the two sequences and allows gaps to be inserted.
- All elements of the two sequences are used to build a two-dimensional matrix.
- Every possible alignment can be represented through this matrix.

Unlike global alignment, local alignment aligns only some parts of the sequences being compared. Local alignment is biologically more meaningful than global alignment, because usually not all elements of a sequence take part in determining its biological properties. The local alignment algorithm commonly used today is Smith-Waterman. Its steps are similar to those of the Needleman-Wunsch algorithm.

The comparison function resides in the HiBioClustalX module, a version implemented on the basis of ClustalX 1.83. The program allows sequences, or alignment files in formats such as fasta and phylip, to be loaded and compared.

2.1.5 Identifying restriction enzymes

A restriction enzyme (RE) is an enzyme that can hydrolyze double-stranded DNA at defined positions. The following issues in biotechnology involve restriction enzymes:
- Sequence analysis can only be carried out on short segments, one at a time, so a huge genome has to be cut into small parts.
- Gene segments must be delimited precisely so that they can be cloned in large quantities for transplantation, for medicine and pharmacy, for gene expression studies, for creating protective genes for beneficial bacteria, and so on.
- Restriction maps let biologists analyze and correctly identify the gene regions of interest.

The basic property of a restriction enzyme is its ability to recognize and cut a defined sequence on DNA. The points at which the enzyme acts are called restriction sites. Each enzyme recognizes a characteristic nucleotide sequence, usually 4 to 8 nucleotides long. Different REs that share the same recognition sequence are called isoschizomers. For some REs, the recognition sequence is not absolutely specific: the enzyme recognizes sequence regions of a characteristic form (in which some positions may be any character) rather than a single exact string.

The restriction enzyme search in HiBio identifies the cut sites in the input sequences by matching them against a database of common restriction enzymes. The result describes the cut positions on the sequence, the names of the enzymes, and their cut type (blunt or staggered).

2.1.6 Plasmid design

This is also an important function in biotechnology research. Plasmid design in HiBio displays DNA in linear and circular form, based on the definition of restriction sites, genes, and the multiple cloning site.

2.1.7 Secondary structure prediction

Predicting protein structure is an important task. It aims to build protein databases that serve the study of the function and significance of proteins and of the nature of life, and from there to improve the living environment. Finding protein structure through experimental analysis takes a great deal of time and effort. An important task of bioinformatics is to help predict the structure of a protein from the known chain of amino acids that forms it. Current prediction methods, however, are still not fully accurate: the accuracy of protein structure prediction methods has not yet exceeded 80%.

Understanding the function of a protein usually relies on its tertiary structure. Applying physicochemical experimental methods to find the tertiary structure, however, remains difficult and complex. Most methods for determining the tertiary structure of a protein therefore go through an important intermediate step: determining its secondary structure. Knowing the secondary structure raises the accuracy of tertiary structure determination by 25 to 50 percent.

The physicochemical experimental methods used to determine protein secondary structure are also difficult and time-consuming. These difficulties have driven researchers to look for ways of predicting the secondary structure of new proteins from the secondary structure of known ones. Many different algorithms have been developed, using expert systems, graph theory, statistical methods, nearest neighbors, neural networks, and so on.

The usual approach is to divide the amino acid chain into many segments, each a window of 13 to 21 amino acids. The amino acids in the same window are taken to have the same secondary structure. The problem of predicting the secondary structure of every amino acid in the protein is thus reduced to predicting the secondary structure of the central amino acid of each window. A model is then built that maps the amino acid windows of the chain to the corresponding secondary structure states.

The accuracy of secondary structure prediction improved considerably once artificial neural networks were used to train the prediction process. After training, the network is stored and used to predict the secondary structure of new protein chains. The neural network is usually built with 3 layers and trained on a large amount of data, which makes the training information more detailed and so increases the accuracy of the prediction algorithm. The accuracy of secondary structure prediction with neural networks is currently 78%.

2.1.8 Motif search

Motifs fall into two types. The first is the sequence motif, a characteristic segment found in two or more sequences. It represents a function, a structure, or membership in a family. Sequence motifs can be found in DNA, RNA, and protein. The second is the structural motif, a group of adjacent structures that combine into a specific 3D structure, also called a supersecondary structure.

2.2 Bioinformatics data bank

The group has also built a bioinformatics website at http://www.ioit-hcm.ac.vn. It provides the software built by the group as well as other software the group has collected from the Internet. The website can also be used to search for biological sequences, as on several existing websites around the world.

In particular, some functions, such as drawing plasmids and designing primers, can be carried out directly in the web browser. The data model is designed to store nucleotide information for many species. The database currently holds information on black tiger shrimp (Penaeus monodon), basa catfish (Pangasius), rice (Oryza sativa), and the malaria mosquito (Anopheles gambiae). The system collects data automatically to update the data bank, and tools such as FastA and Blast are used to search for a given gene segment.

PDB is one of the first molecular biology databases in the world, established even before the DNA databases. From the publication of the first protein structure in the 1950s until the 1970s, the number of known protein structures did not change significantly. In 1971, PDB was established at Brookhaven National Laboratory. Its main task is to store protein structure data on computers [17]. A standard data format developed by PDB is the .pdb file. The PDB database now holds 34065 structures (updated 06/12/2005, source: http://www.rcsb.org/pdb/). The system is designed to store the data of pdb files in a way that supports searching for information.

2.3 Bioinformatics problems in a grid computing environment

There are many grid computing projects related to bioinformatics around the world. Examples from major research centers include the following:
- myGrid, from UK e-Science, is supported by universities in the United Kingdom, the European Bioinformatics Institute (EBI), and many large industrial laboratories around the world. The project aims to build middleware for solving biological problems [13].
- The BioGrid Project of Indiana University, USA, titled "Bioinformatics data and compute grids for bioscientists", runs from 12/2002 to 11/2006 and aims to create a grid computing environment for biology researchers. Indiana University also has a 1 million USD, 4-year project to build a computing grid named "An e-Science Grid for Indiana University" [14].
- The North Carolina BioGrid project aims to research and implement grid technologies that give researchers and educators the means to study genome mechanisms [15].
- In the Asia-Pacific region, the Asia Pacific BioGrid initiative aims to create a home for distributed databases, developed from the templates of the DataGrid project [16].

In Vietnam, deploying biological problems on grid computing systems is entirely new. The group has applied grid computing to several bioinformatics problems through the Bioinformatics Grid Portal at http://biogrid.ioit-hcm.ac.vn. The grid system of the HCMC Institute of Information Technology is also connected to the grid system of KISTI (Korea Institute of Science and Technology Information) within South Korea's K*Grid.

3. Comparison and evaluation against other software

3.1 Plasmid drawing

The table below compares HiBio with plasmid design programs that are currently considered quite effective.

No. | Function | HiBio | SimVector | Visual Cloning
1 | Draw a plasmid manually when no DNA sequence content is available | Yes | Yes | Yes
2 | Import a plasmid from a DNA file | Yes | Yes | Yes
3 | Import a plasmid from a gene structure description file | No | No | Yes
4 | Add, remove, and display restriction enzymes in a gene segment | Yes | Yes | Yes
5 | Show the corresponding code segment when a component of the drawing is clicked | Yes | Yes | No
6 | Interact with and drag the design labels on the plasmid | Yes | Yes | Yes
7 | Edit the color, shape, and label of added components | Yes | Yes | Yes
8 | Insert, delete, and edit primers or DNA code segments in the plasmid | Yes | Yes | Yes
9 | Insert, delete, and edit marked segments on the plasmid | Yes | Yes | Yes
10 | Switch between circular and linear forms | Yes | Yes | Yes
11 | Create code segments and draw plasmids on the web | Yes | No | No
12 | Export the plasmid to a file | Yes | Yes | Yes

3.2 Sequence conversion

The sequence conversion programs Biology WorkBench, Molecular Toolkit, ExPASy, and HiBio can all analyze a rice (Oryza sativa) gene with the standard code table and produce the same ORF (Open Reading Frame) results. Of these, only Biology WorkBench supports selecting the longest ORF. However, among the four translation tools, only HiBio and the ExPASy web tool consider start possibilities other than the amino acid M (Met). For example, HiBio reports two different longest ORFs, one starting with L (Leu) and one with M (Met), in the analysis of Frame +3. This shows that the ORF starting with the amino acid L is the longest and the most likely to be a protein.

3.3 Restriction enzyme search

After many trials, the restriction enzyme search function shows the following strengths and weaknesses.

Strengths:
- A good visual interface for displaying the identified restriction sites.
- Easy selection of the restriction enzymes to be considered.
- Statistical reports with full information on cut sites, cut positions, and counts, ordered by the aspect of interest.
- Result images can be saved.
- Accurate results when searching for cut sites.

Weaknesses:
- The display of results is not yet as polished as in some commercial software such as VectorNTI, GeneTool, and PepTool.
- The enzyme database is not yet as complete as it needs to be.

3.4 Protein secondary structure prediction

The prediction results are evaluated with the Q3 measure on CASP3, a protein sample set with experimentally determined results. The probability of a correct prediction for secondary structure class i is given by the following formula:

\[
Q_i^{\%\mathrm{pred}} = \frac{N_{ic}}{N_i} \times 100
\]

where N_i is the number of amino acids predicted to belong to secondary structure class i, and N_ic is the number of those N_i amino acids that are predicted correctly. The probability of a correct prediction over all three secondary structure classes {E, H, C-L} is given by the following formula:

\[
Q_3 = \frac{N_{Ec} + N_{Hc} + N_{Lc}}{N_E + N_H + N_L} \times 100
\]
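The two measures above can be computed directly from a pair of equal-length label strings. The following is a minimal Python sketch under stated assumptions: the one-letter labels H, E, and L, the function names, and the toy input are illustrative only and do not come from HiBio or from the CASP3 data.

```python
from collections import Counter

# Assumed one-letter labels for the three classes (helix, strand, coil/loop).
CLASSES = ("H", "E", "L")


def _counts(observed: str, predicted: str):
    """Return (N_i, N_ic): per-class predicted counts and correct counts."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n_pred = Counter(predicted)
    n_correct = Counter(p for o, p in zip(observed, predicted) if o == p)
    return n_pred, n_correct


def q_pred_per_class(observed: str, predicted: str) -> dict:
    """Q_i = N_ic / N_i * 100 for each class that was predicted at least once."""
    n_pred, n_correct = _counts(observed, predicted)
    return {c: 100.0 * n_correct[c] / n_pred[c] for c in CLASSES if n_pred[c]}


def q3(observed: str, predicted: str) -> float:
    """Q3 = (N_Ec + N_Hc + N_Lc) / (N_E + N_H + N_L) * 100."""
    n_pred, n_correct = _counts(observed, predicted)
    total = sum(n_pred[c] for c in CLASSES)
    return 100.0 * sum(n_correct[c] for c in CLASSES) / total


# Toy example: 8 of 10 residues are predicted correctly.
observed = "HHHHEEELLL"
predicted = "HHHEEEELLH"
print(q_pred_per_class(observed, predicted))
print(q3(observed, predicted))
```

Because every residue is predicted as exactly one of the three classes, the denominator of Q3 equals the chain length, so Q3 is simply the percentage of residues whose predicted class matches the experimental one.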

Experiments on the CASP3 sample data set show that the software's prediction function performs relatively well compared with other methods, as the following table shows:

Software | HiBio | PhD | JPred | DSC
Q3 | 73.4% | 66.7% | 72.4% | 67.3%

The verification experiments show that the prediction function of the program has the following strengths:
- It predicts fairly well compared with the tools in common use today.
- It has a visual interface that is easy to inspect and whose output is easy to save for reports.

Its current shortcoming is that it cannot yet accurately predict the eight structure types for each amino acid.

4. Conclusion

Worldwide, bioinformatics is a relatively well-developed field, but in Vietnam it has appeared only in research institutes and large universities. In particular, very few software products have been created and put into use. HiBio is one product that meets the requirements of research and training in molecular biology. To complete it and obtain better results, however, more in-depth research is certainly needed. Building on these initial results, the group is working with researchers at NIAID/NIH (National Institute of Allergy and Infectious Diseases/National Institutes of Health, USA) on a project to complete the PCBioInf software.

Through this workshop, the research group hopes that informatics specialists will devote more effort to products of this kind, so as to further accelerate scientific computing activities in Ho Chi Minh City.

References

[1] Tran Van Lang, et al., Building applications to support the development of bioinformatics (in Vietnamese), 8th National Workshop on Information and Communication Technology, Hai Phong, 8/2005.
[2] Tran Van Lang, et al., Some results in the research, application, and deployment of bioinformatics (in Vietnamese), Scientific Conference on the 30th Anniversary of the Vietnamese Academy of Science and Technology, Hanoi, 5/2005.
[3] Tran Van Lang, et al., Research on building informatics tools for processing gene and protein information (in Vietnamese), research project of the Vietnamese Academy of Science and Technology, 2003-2004.
[4] Dao Van Tuyet, Le Phuoc Loc, et al., Vietnamese tool for studying on Bioinformatics, Workshop on Computational Biology: "From sequence to function", HCMC, September 2004.

[5] Le Phuoc Loc, Tran Van Lang, Ngo Phuoc Hau, Some common methods for protein structure prediction, Workshop on Computational Biology: "From sequence to function", HCMC, September 2004.
[6] Dan E. Krane, Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin Cummings, 2003.
[7] Teresa K. Attwood, David J. Parry-Smith, Introduction to Bioinformatics, Prentice Hall, 1999.
[8] Mark Gerstein, Bioinformatics introduction, Yale University Press, 2002.
[9] Peter Clote, Rolf Backofen, Computational Molecular Biology: An Introduction, John Wiley & Sons, 2000.
[10] Ian Korf, Mark Yandell, Joseph Bedell, BLAST, O'Reilly, 2003.
[11] Jean-Michel Claverie, Cedric Notredame, Bioinformatics for Dummies, Wiley Publishing, Inc., 2003.
[12] NCBI - National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov
[13] EMBL - European Molecular Biology Laboratory, http://www.embl-heidelberg.de
[14] EBI - European Bioinformatics Institute, http://www.ebi.ac.uk
[15] DDBJ - DNA Data Bank of Japan, http://www.ddbj.nig.ac.jp/Welcome.html
[16] myGrid, UK e-Science - Engineering and Physical Sciences Research Council, http://www.mygrid.org.uk/
[17] The North Carolina BioGrid project, http://www.ncbiogrid.org/
[18] BITS - Bioinformatics Resources, http://bits.sinica.edu.tw/en/keysite/index.php
[19] Asia Pacific BioGRID Initiative, http://www.apbionet.org/apbiogrid
[20] PDB - Protein Data Bank, http://www.rcsb.org/pdb/

BUILDING APPLICATIONS TO SUPPORT THE DEVELOPMENT OF BIOINFORMATICS

Tran Van Lang
HCMC Institute of Information Technology
Abstract: Building applications that support research and training in bioinformatics is essential. It allows us to keep pace with the rest of the world in bioinformatics and creates opportunities for cooperation with other countries. In this paper we present some results obtained in recent years by the bioinformatics research group of the HCMC Institute of Information Technology. The paper consists of three parts. In the first part, we consider fundamental problems in the analysis of biological sequences: primer design, restriction enzyme determination, protein structure prediction, inferring phylogenies, etc. In the second part, we introduce our bioinformatics software HiBio and some of the areas the bioinformatics research group is studying. In the final part, we present our conclusions and recommendations.
