
JOURNAL OF SCIENCE, Hue University, No. 59, 2010

DIFFERENTIAL STUDY OF THE CYTOKINE NETWORK IN THE IMMUNE SYSTEM BY AN EVOLUTIONARY ALGORITHM BASED ON BAYESIAN NETWORKS


Hoai-Tuong Nguyen, Gérard Ramstein, Philippe Leray
Computer Science Laboratory of Nantes-Atlantique
Yannick Jacques
Cancer Research Center of Nantes/Angers

ABSTRACT
This paper presents a Bayesian network approach for inferring how the mutual influence among cytokines (an important class of immune-system proteins) differs across experimental conditions. We introduce an evolutionary method for learning the structure of Bayesian networks, which selects a set of the best-scoring learned networks. Each learned network is then assessed with a statistical test on two populations of patient data: one treated with the drug and one untreated. The aim of the experiment is to evaluate how strongly the treatment affects the interactions among genes.

1. Introduction

Interleukin 15 (IL-15) [1], a cytokine with a very important role in the immune system, was identified only in recent years. Notably, its functions closely resemble those of other cytokines, to which it is tightly related. The question is therefore how IL-15 influences its neighboring cytokines under different experimental conditions. To answer it, specialists have turned to computational support, using advanced techniques that are being studied and widely applied in bioinformatics. Microarray technology now makes it possible to measure the expression levels of thousands of genes simultaneously. In addition, gene regulatory networks not only give an overall view of the interactions among genes but can also store parameters describing their expression levels. Inferring gene regulatory networks from microarray data has therefore become a central problem in bioinformatics research, and many methods have been proposed for building such networks (see Section 2.1). Among them, the Bayesian network approach has attracted considerable attention because it addresses most of the difficulties involved: (1) complex interactions produced by a large number of genes, analyzed from discrete and noisy data sources; (2) a huge number of variables (over 30,000 genes) with very few samples (a few dozen to a few hundred experiments); (3) the computational complexity of the network structures and the statistical significance of the relations among the variables in the network.

In this paper, we introduce an evolutionary approach that maintains a set of the best-scoring Bayesian networks learned from microarray data on IL-15. This set makes it possible to compare the results obtained from each network by statistical testing on two sets of patient data: one treated with the drug and one untreated (two different experimental conditions). In other words, we address the question: "How can Bayesian networks be used to infer the influence of IL-15 under different experimental conditions?"

2. Method

2.1. Reconstructing gene regulatory networks with the Bayesian network approach

Reconstructing gene regulatory networks is a well-known problem in bioinformatics. Many solutions have been proposed, among them clustering [4], Bayesian networks [6], [10], [3], [13], and graphical Gaussian models [11]. Each proposal has its own advantages and limitations. In this paper, we choose Bayesian networks as the main direction for reconstructing gene regulatory networks. The work regarded as the first on this problem is that of Professor Friedman and colleagues in 2000 [6]. It is considered the reference for later studies on reconstructing gene regulatory networks based on the principles of Bayesian networks.

Figure 1. The first model for reconstructing gene regulatory networks with Bayesian networks, proposed by Friedman et al. in 2000.

The first results of these authors were obtained on a data set of moderate size. They applied simple methods for discretization and for learning the network structure. The authors also raised several open problems for later studies: small sample sizes, the continuous nature of the data, the choice of discretization method, time-dependent expression data, inference capabilities, and finally agreement with expert knowledge.

The second model was introduced by Pe'er and colleagues one year after the first work (2001) [10]. They studied a larger data set and focused on analyzing and evaluating subnetworks using a confidence threshold determined by constraints on dominance/recessiveness relations between genes. Another notable difference from the first studies is that they worked directly on continuous data, without a separate discretization step before structure learning. One problem left open by this study is the discovery of hidden factors that interact with the genes already identified.

Figure 2. The improved model of Pe'er et al. (2001).

Returning to our own study, the key process of the solution shown in Figure 3 is Bayesian network structure learning. This process learns the structure of a Bayesian network from microarray data using evolutionary algorithms (Figure 3). One advantage of microarrays is the ability to measure tens of thousands of genes simultaneously. Moreover, microarray databases are now published and freely downloadable from well-known servers such as GEO Omnibus, Array Express and Oncomine, which are the result of contributions from many biological research centers around the world.

Figure 3. The authors' proposed model.

In the first stage, we use an evolutionary algorithm (described in detail in Section 2.3) to produce a set of Bayesian networks rated best according to the score obtained from analyzing the experimental data. In the second stage, depending on the characteristics of each experimental condition, we check the results of these networks by statistical testing (see the detailed model in Figure 4). More specifically, we apply hypothesis testing to two data populations: one treated with the drug and one untreated. The outcome of this study allows us to assess the influence of the treatment on gene interactions.

Figure 4. The authors' proposed model (detailed).

2.2. Structure learning: a key issue in building gene regulatory networks

A Bayesian network is a probabilistic graphical model used to represent dependence relations among objects. It is a directed acyclic graph (DAG). The structure of a Bayesian network G consists of a set of nodes and a set of directed edges (Figure 5).

Figure 5. Example of a simple Bayesian network.

In the reconstruction of gene regulatory networks, each gene plays the role of a node and the interactions between genes play the role of edges. If there is an edge from A to B, so that B depends directly on A (gene A acts on gene B), then A is called a parent of B. By the Markov property, each variable in a Bayesian network is conditionally independent of its non-descendants given its parents. The conditional distribution of A given its parents pa_A is then P(A | pa_A); this is called a network parameter. With this simple rule, we can work out how a particular Bayesian network explains the properties of the observed data. For example, for the Bayesian network of Figure 5 above, the joint distribution can be factorized as in formula (1):

$$P(G_1, G_2, G_3, G_4, G_5, G_6) = P(G_1)\,P(G_3)\,P(G_2 \mid G_1)\,P(G_4 \mid G_2)\,P(G_5 \mid G_2, G_3) \qquad (1)$$
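As a minimal sketch of how a factorization like formula (1) is evaluated, the Python fragment below multiplies one local conditional probability per node. The parent sets are read off the five factors printed in formula (1); the text gives no factor for G6, so G6 is left out here. The genes are assumed to be binary, and every number in the conditional probability tables is a made-up illustration, not a value from the study.

```python
from itertools import product

# Parent sets taken from the factors of formula (1); G6 is omitted because
# the text lists no factor for it.
PARENTS = {
    "G1": (),
    "G3": (),
    "G2": ("G1",),
    "G4": ("G2",),
    "G5": ("G2", "G3"),
}

# Hypothetical tables: P(node = 1 | parent values). Illustrative numbers only.
P_ONE = {
    "G1": {(): 0.3},
    "G3": {(): 0.6},
    "G2": {(0,): 0.2, (1,): 0.9},
    "G4": {(0,): 0.5, (1,): 0.1},
    "G5": {(0, 0): 0.05, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.95},
}


def joint_probability(assignment):
    """Multiply the local conditional probabilities, one factor per node."""
    prob = 1.0
    for node, parents in PARENTS.items():
        parent_values = tuple(assignment[p] for p in parents)
        p_one = P_ONE[node][parent_values]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def total_probability():
    """Sum the joint over all 2^5 assignments; a valid factorization gives 1."""
    nodes = list(PARENTS)
    return sum(
        joint_probability(dict(zip(nodes, values)))
        for values in product((0, 1), repeat=len(nodes))
    )
```

Calling `joint_probability({"G1": 1, "G2": 1, "G3": 1, "G4": 0, "G5": 1})` multiplies the five factors 0.3, 0.6, 0.9, 0.9 and 0.95, and `total_probability()` checks that the factorized distribution is normalized.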

In the simplest case, the structure of a Bayesian network is described and specified by experts, and we only need to use it for inference. However, determining this structure is in practice far too complex for humans. Both the network structure and the network parameters must therefore be learned automatically from data. This task is called Bayesian network learning. Learning a Bayesian network from data requires determining the structure G and the parameters P.

For learning the parameters, a common approach is to use a statistical scoring function. This function evaluates how well a Bayesian network fits the training data; the optimal network is then sought according to this score. One frequently used function is BIC (Bayesian Information Criterion).

For learning the structure, there are two families of approaches:
(1) Constraint-based methods search the data for conditional independence relations and then build graph structures called patterns. Each pattern represents a class of DAGs.
(2) Search-and-score methods search the space of valid candidate structures of a network. This approach has the advantage of combining easily with expert knowledge and of handling missing data well.

Which learning method, then, is suitable for reconstructing gene regulatory networks? In recent years, many studies have addressed this question: [7], [9], [8], [2], [5]. In each of them, the authors propose their own methods to improve the accuracy of inference. Among these studies, we are particularly interested in the recent work of C. Auliac [2], who defended a doctoral thesis in early 2009 entitled "Evolutionary approaches for reconstructing gene regulatory networks by learning Bayesian networks". This approach is presented in the next section.

2.3. Evolutionary algorithms for Bayesian network structure learning

Evolutionary algorithms (EA) are a branch of evolutionary computation: population-based heuristic optimization algorithms. An EA maintains a set of optimal solutions. One very familiar representative of EA is the genetic algorithm (GA).
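Returning to the scoring function of Section 2.2, the sketch below shows one standard way to compute a BIC score for a discrete Bayesian network: the maximized log-likelihood minus a penalty of (log N)/2 per free parameter. It is an illustration under stated assumptions, not the implementation used in this study; the function name `bic_score`, the data layout (a list of dictionaries) and the toy data are hypothetical.

```python
import math
from collections import Counter


def bic_score(data, parents, arity):
    """BIC of a discrete Bayesian network (higher is better).

    data    : list of dicts mapping variable name -> observed state
    parents : dict mapping variable name -> tuple of parent names
    arity   : dict mapping variable name -> number of possible states
    """
    n_samples = len(data)
    log_likelihood = 0.0
    n_params = 0
    for node, pa in parents.items():
        # Counts N_ijk (parent configuration and node state) and N_ij.
        joint_counts = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in pa) for row in data)
        for (pa_cfg, _state), n_ijk in joint_counts.items():
            log_likelihood += n_ijk * math.log(n_ijk / parent_counts[pa_cfg])
        # Free parameters: (r_i - 1) per parent configuration.
        n_params += (arity[node] - 1) * math.prod(arity[p] for p in pa)
    return log_likelihood - 0.5 * math.log(n_samples) * n_params


# Toy data (hypothetical): B always copies A.
data = [{"A": 0, "B": 0}, {"A": 1, "B": 1}] * 10
arity = {"A": 2, "B": 2}
score_dep = bic_score(data, {"A": (), "B": ("A",)}, arity)
score_ind = bic_score(data, {"A": (), "B": ()}, arity)
```

On this toy data the structure with the edge A → B scores higher than the empty structure, because the gain in likelihood outweighs the penalty for the extra parameter.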

Figure 6. Comparison of the GA and EDA processes.

In the last few years in particular, a successor of GA called EDA (Estimation of Distribution Algorithm) has been discussed by researchers in the field as a very promising improvement. With EDA, each population is associated with a probability distribution, and each new candidate is generated by sampling from this distribution. More precisely, the crossover and mutation steps of GA are replaced in EDA by building a probabilistic model and sampling the offspring population (Figure 6). This algorithm maintains a set of optimal solutions with good probability distributions, which matters for the subsequent statistical tests. It is also one of the main goals of our research.

In addition, how to find a good probability distribution remains a very open problem. Many versions of EDA have been proposed to address it, such as EBNA (Estimation of Bayesian Networks Algorithm), FDA (Factorized Distribution Algorithm), LFDA (Learning Factorized Distribution Algorithm) and BOA (Bayesian Optimization Algorithm). The topic is therefore likely to keep attracting research effort.

Applied to Bayesian network structure learning, EDA belongs to the search-and-score methods (see Section 2.2) [12], [2]. In this algorithm, each candidate Bayesian network is represented by a binary string c_ij of size n × n (formula 3).

In the language of genetics, each Bayesian network is a chromosome. That is, each chromosome represents one individual of the population and is encoded as a binary string of the following form (see Figure 7):

$$c_{11}c_{21}\cdots c_{n1}\;c_{12}c_{22}\cdots c_{n2}\;\cdots\;c_{1n}c_{2n}\cdots c_{nn} \qquad (3)$$

Figure 7. Example of representing a Bayesian network in the language of genetic algorithms.

The encoding follows the rule of formula (2). The fitness function used in this case is the scoring function (see Section 2.2), computed from the data for each Bayesian network.
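The binary-string representation of formula (3) can be sketched as follows. Formula (2) is not reproduced in this text, so the sketch assumes a common convention: c_ij = 1 when node j is a parent of node i, and 0 otherwise. The string is read column by column, as the index order in formula (3) suggests. The function names are hypothetical, and the acyclicity check is added because not every bit string decodes to a valid DAG.

```python
def encode(matrix):
    """Flatten an n x n connectivity matrix column by column, as in formula (3)."""
    n = len(matrix)
    return [matrix[i][j] for j in range(n) for i in range(n)]


def decode(bits, n):
    """Rebuild the n x n matrix from the column-major bit string."""
    return [[bits[j * n + i] for j in range(n)] for i in range(n)]


def is_acyclic(matrix):
    """Repeatedly remove nodes with no remaining parents; fail if none exist.

    Assumed convention: matrix[i][j] == 1 means node j is a parent of node i.
    """
    remaining = set(range(len(matrix)))
    while remaining:
        roots = {i for i in remaining if not any(matrix[i][j] for j in remaining)}
        if not roots:
            return False  # every remaining node has a parent: there is a cycle
        remaining -= roots
    return True


# Chain 0 -> 1 -> 2: node 0 is a parent of node 1, node 1 a parent of node 2.
chain = [[0, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
bits = encode(chain)  # [0, 1, 0, 0, 0, 1, 0, 0, 0]

# Adding the edge 2 -> 0 closes a cycle, so this matrix is not a valid network.
cycle = [[0, 0, 1],
         [1, 0, 0],
         [0, 1, 0]]
```

A fitness evaluation would decode each chromosome, reject or repair it if `is_acyclic` fails, and then apply the scoring function to the resulting structure.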

The algorithm and the training process are as follows:

1. A population is generated from probability vectors encoded from random Bayesian networks.
2. The fitness of each individual is evaluated and ranked in order to select the best individuals.
3. The population is updated based on the individuals ranked highest by fitness.
4. Mutation.
5. Steps 1-4 are repeated until the stopping condition is met (no new individual has a better fitness).
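The five steps above can be sketched as a small univariate EDA over edge probabilities. This is a simplified illustration under several assumptions, not the authors' implementation: the probability model is one independent probability per possible edge, initialized uniformly at 0.5 rather than from random networks; mutation randomly resets an edge probability; cycles are avoided by rejecting any sampled edge that would close one; and the fitness is a toy stand-in (negative distance to a hypothetical target structure) for the data-driven scoring function. All names and parameter values are hypothetical.

```python
import random


def is_acyclic(matrix):
    """Assumed convention: matrix[i][j] == 1 means node j is a parent of node i."""
    remaining = set(range(len(matrix)))
    while remaining:
        roots = {i for i in remaining if not any(matrix[i][j] for j in remaining)}
        if not roots:
            return False
        remaining -= roots
    return True


def sample_dag(prob, rng):
    """Draw each edge with probability prob[i][j], skipping edges that close a cycle."""
    n = len(prob)
    matrix = [[0] * n for _ in range(n)]
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    rng.shuffle(cells)
    for i, j in cells:
        if rng.random() < prob[i][j]:
            matrix[i][j] = 1
            if not is_acyclic(matrix):
                matrix[i][j] = 0
    return matrix


def eda_search(n, fitness, pop_size=40, n_best=10, rate=0.3,
               mutation=0.02, patience=10, max_generations=200, seed=0):
    rng = random.Random(seed)
    prob = [[0.0 if i == j else 0.5 for j in range(n)] for i in range(n)]
    best, best_fit, stale, generation = None, float("-inf"), 0, 0
    # Step 5: stop when no better individual appears for `patience` generations.
    while stale < patience and generation < max_generations:
        generation += 1
        population = [sample_dag(prob, rng) for _ in range(pop_size)]  # step 1
        ranked = sorted(population, key=fitness, reverse=True)          # step 2
        elite = ranked[:n_best]
        top_fit = fitness(ranked[0])
        if top_fit > best_fit:
            best, best_fit, stale = ranked[0], top_fit, 0
        else:
            stale += 1
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                freq = sum(m[i][j] for m in elite) / len(elite)
                prob[i][j] = (1 - rate) * prob[i][j] + rate * freq      # step 3
                if rng.random() < mutation:                             # step 4
                    prob[i][j] = rng.random()
    return best, best_fit


# Hypothetical target structure (chain 0 -> 1 -> 2 -> 3) used only by the toy fitness.
TARGET = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]


def toy_fitness(matrix):
    """Negative number of cells that differ from TARGET (0 is a perfect match)."""
    return -sum(a != b for row, ref in zip(matrix, TARGET) for a, b in zip(row, ref))


best, best_fit = eda_search(4, toy_fitness)
```

In a real application, `toy_fitness` would be replaced by a score such as BIC computed from the microarray data, and the search would keep the whole set of top-ranked networks rather than a single best one, since the later statistical tests operate on that set.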

Figure 8 below illustrates the application of this algorithm with a simple example that closes the paper. The example describes the computational steps of EDA for learning the structure of a Bayesian network. The output is the set of Bayesian networks with the highest fitness. These are the candidate networks for the next research step, in which statistical tests are used to assess the inferred degree of interaction among the objects of the gene network (cytokines):

Figure 8. Example of representing a Bayesian network in the language of genetic algorithms.

3. Conclusion and future work

The main goal of this study is to analyze, using Bayesian networks, how the influence among cytokines differs across experimental conditions. To this end, an evolutionary algorithm creates and maintains a set of networks with optimal structure. A statistical testing step is then applied to two populations with different experimental conditions in order to assess the practical inferential value of the results. Reconstructing gene regulatory networks with Bayesian networks is a research direction pursued by many researchers in bioinformatics. Our proposed solution is currently being validated by the research group with a prototype program, and the results will be published as soon as possible.

4. Acknowledgements

This project is funded by BIL (BioInformatique Ligérienne), Pays de la Loire region, France.

REFERENCES
1. Arena, R. A. Merendino, L. Bonina, D. Iannello, G. Stassi, and P. Mastroeni, The New Microbiologica, official journal of the Italian Society for Medical, Odontoiatric, and Clinical Microbiology (SIMMOC), 23(2), 2000.
2. C. Auliac, Approches évolutionnaires pour la reconstruction de réseaux de régulation génétique par apprentissage de réseaux bayésiens, PhD thesis, Université d'Evry-Val d'Essonne, France, 2008.
3. M. Dejori, Analyzing gene expression data with Bayesian networks, PhD thesis, Technical University of Graz, 2002.
4. Z. Dongxiao, O. H. Alfred, C. Hong, K. Ritu, and Anand S., Network constrained clustering for gene microarray data, Bioinformatics, 2005.
5. S. F. Emmert and M. Dehmer, Analysis of microarray data: A network-based approach, Wiley-VCH Publishing, 307-329, 2008.
6. N. Friedman, M. Linial, I. Nachman, and D. Pe'er, Using Bayesian networks to analyze expression data, Journal of Computational Biology, 7(3-4), 601-620, 2000.
7. F. Geier, T. Jens, and F. Christian, Reconstructing gene-regulatory networks from time series, knock-out data, and prior knowledge, BMC Systems Biology, 1(1):11, 2007.
8. Y. Huang, J. Wang, Zhang J., Sanchez M., and Y. Wang, Bayesian inference of genetic regulatory networks from time series microarray data using dynamic Bayesian networks, Bioinformatics, 2:46-56, 2007.
9. P. Li, Z. Chaoyang, P. Edward, G. Ping, and Youping D., Comparison of probabilistic Boolean network and dynamic Bayesian network approaches for inferring gene regulatory networks, BMC Bioinformatics, 8(Suppl 7):S13, 2007.
10. D. Pe'er, A. Regev, G. Elidan, and N. Friedman, Inferring subnetworks from perturbed expression profiles, Bioinformatics (Oxford, England), 17(1), 2001.
11. J. Schäfer and K. Strimmer, Learning large-scale graphical Gaussian models from genomic data, J. F. Mendes (Ed.), Proceedings of CNET, 2005.
12. G. Thibault, S. Bonnevay, and A. Aussem, Learning Bayesian network structures by estimation of distribution algorithms: An experimental analysis, IEEE International Conference on Digital Information Management (ICDIM 07), Lyon, France, 2007.
13. L. Tiefei, Learning gene network using Bayesian network framework, PhD thesis, National University of Singapore, 2005.

DIFFERENTIAL STUDY OF THE CYTOKINE NETWORK IN THE IMMUNE SYSTEM BY THE EVOLUTIONARY ALGORITHM BASED ON THE BAYESIAN NETWORK
Hoai-Tuong NGUYEN, Gérard RAMSTEIN, Philippe LERAY
LINA - Computer Science Laboratory of Nantes-Atlantique
Yannick JACQUES
CRCNA - Cancer Research Center of Nantes/Angers

SUMMARY
In this paper, we present a Bayesian network (BN) approach to infer how cytokine involvement differs across experimental conditions. We introduce an evolutionary method for BN structure learning that maintains a set of the best learned networks. Each of them will be assessed with a statistical test on two populations of patient data: one with treatment (drugs), the other without. The expected outcome is an answer to the question "How does the treatment influence gene regulation?"