
Identification of Mutational Hotspots in Effector Evolution in Puccinia species

Submitted to:
Mr. Inderjit Yadav
School of Agricultural Biotechnology
PAU, Ludhiana

Submitted by:
Jashanjot Singh (L-2014-A-20-Biotech)
Harpreet Singh (L-2014-A-59-Biotech)

CERTIFICATE

This is to certify that Mr. Jashanjot Singh, Admn. No. L-2014-A-20-Biotech, and Mr. Harpreet Singh, Admn. No. L-2014-A-59-Biotech, B.Sc. Biotechnology (Hons.) final year students, have successfully completed their "In-House Training" during the final semester of graduation with a project on "Identification of mutational hotspots in Effector evolution in Puccinia species" under my supervision.

Mr. Inderjit Singh Yadav
School of Agricultural Biotechnology
PAU, Ludhiana

Introduction

Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. They alter host cells to suppress host defense mechanisms and facilitate infection by the pathogen, so that it can derive nutrients from the host. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Eukaryotic filamentous plant pathogens secrete effector proteins that modulate the host cell to facilitate infection.

It is accepted that most fungal avirulence genes encode virulence factors that are called effectors. Most fungal effectors are secreted, cysteine-rich proteins, and a role in virulence has been shown for a few of them, including Avr2 and Avr4 of Cladosporium fulvum, which inhibit plant cysteine proteases and protect chitin in fungal cell walls against plant chitinases, respectively. In resistant plants, effectors are directly or indirectly recognized by cognate resistance proteins that reside either inside the plant cell or on plasma membranes. Several secreted effectors function inside the host cell, but the uptake mechanism is not yet known. Variation observed among fungal effectors shows two types of selection that appear to relate to whether they interact directly or indirectly with their cognate resistance proteins.

Target Host

Wheat leaf rust is a fungal disease that affects wheat, barley and rye stems, leaves and grains. In temperate zones it is destructive on winter wheat because the pathogen overwinters. Infections can lead to up to 20% yield loss. The pathogen is a Puccinia rust fungus.

• P. triticina causes leaf rust (also called "brown rust"),

• P. graminis causes "stem rust" (also called "black rust"),

• P. striiformis causes "yellow rust" (also called stripe rust).

Objective

Identification of potential hotspots or mutable sites in effector proteins for better management of disease in the future.

Protocol followed

I. Sequence download

II. EffectorP

III. Blast Alignment

IV. Fasta files creation

V. Clustalw

VI. Aln.fasta files creation

VII. Revtrans

STEP ONE - SEQUENCE DOWNLOAD

The required protein and mRNA sequences were downloaded from the official website of the Broad Institute. The institute was founded to seize the opportunity that arose from the Human Genome Project, the international effort that successfully deciphered the entire human genetic code. Despite that accomplishment, scientists knew they still lacked a clear understanding of the genetic basis of disease, and how to translate that understanding into more effective prevention, diagnosis, and treatment.

To reach these goals, it was clear that a new type of research institution had to be created. The traditional academic model of individual laboratories working within their specific disciplines was not designed to meet the emerging challenges of biomedicine. To gain a comprehensive view of the human genome and biological systems, they instead had to work in a highly integrated fashion.

NAME OF SPECIES        CONTIGS   GENES   NUMBER OF     AVERAGE PROTEIN   GENOME ASSEMBLY
                                         GENE MODELS   LENGTH (aa)       SIZE (Mbp)

Puccinia graminis      4557      1741    15979         395               88.64
Puccinia triticina     23148     1781    15685         398               135.34
Puccinia striiformis   29178     1270    18021         321               64.78

Table containing downloaded information

Searching the domain space with DoMosaics

DoMosaics is a tool for analysing and visualizing aspects of modular protein evolution. It allows users to, starting with a set of related protein sequences, annotate protein domains (using different domain annotation methods), and visualize domain arrangements (the N- to C-terminal order of domains in a protein) along a phylogenetic tree. It can be used to find out whether a domain of interest was lost, whether a group of proteins differs in its domain arrangements, or what the characteristic domains for a phylogenetic group are.

DoMosaics is not a tool for conducting detailed sequence analysis, or constructing phylogenies.

Loading data to DoMosaics

When DoMosaics starts, you will be required to select a working directory. The workspace (which includes all project data) will be saved in this directory. By default, the working directory will be under ${user_home}/domosaics-workspace.

The hmmscan and hmmpress programs and the profiles are required for the proper functioning of the DoMosaics tool. These files are downloaded separately from the download link provided at the official download site of the DoMosaics tool. There are three data types used within DoMosaics:
Trees
Sequences
Domains (arrangements)
These three types are separately loaded into a project (via File, Manage Projects). DoMosaics supports the following file formats: Newick for trees, Fasta for sequences, and xdom, hmmscan or pfamscan for domain annotation data.

The best way to start with DoMosaics is to load a file with fasta entries into a new project.

DOMOSAICS RESULTS

Reasons for rejection of results
i. A lesser number of avr gene predictions, and multiple pseudo-domain predictions for the same gene.
ii. The results obtained from DoMosaics were rejected because it was unable to provide a clearer and more specific identification of the effectors present in the Puccinia species.
iii. It did not report the probability on which the identification of the effectors depends.

STEP 2 - EFFECTORP

EffectorP is a software tool for identification of potential effector molecules in a set of secreted proteins. Computational effector candidate identification and subsequent functional characterization delivers valuable insights into plant-pathogen interactions. However, effector prediction in fungi has been challenging due to a lack of unifying sequence features such as conserved N-terminal sequence motifs. Fungal effectors are commonly predicted from secretomes based on criteria such as small size and being cysteine-rich, which suffers from poor accuracy.

EffectorP is a machine learning method for fungal effector prediction in secretomes and has been trained to distinguish secreted proteins from secreted effectors in plant-pathogenic fungi. EffectorP improves fungal effector prediction from secretomes based on a robust signal of sequence-derived properties, achieving sensitivity and specificity of over 80%.
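For reference, the two figures quoted above are usually defined as follows. This is the standard textbook definition, written here with a "positive" taken to mean a protein that really is an effector; how the EffectorP authors computed their values is not described in this report.

```latex
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
```

Here TP and FN count true effectors predicted as effector and as non-effector respectively, and TN and FP count non-effectors predicted as non-effector and as effector respectively.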

Installation of EffectorP

EffectorP has been written in Python and uses pepstats from the EMBOSS software and the WEKA 3.6 software. To get EffectorP to work on your local machine, you need to install the EMBOSS and WEKA software from source. To test the proper working of EffectorP the following command is used-

OUTPUT FORMAT

EffectorP will return the output as shown in the example below. First, it will return the predicted effectors in your set as FASTA sequences, if there are any.

Second, a summary table will be shown which shows the predictions (effector or non-effector) for each submitted protein.

OUTPUT FILES

EffectorP generates three types of output files.

i. ALL_OUTPUT.txt FILE - This file contains the names of all the effectors as well as non-effectors, including their effector probability.

ii. EFFECTOR.OUT FILE - This file contains only the names and effector probability of the effectors that have been detected by the EffectorP tool.

iii. EFFECTOR_OUTPUT.FA - This file contains the effectors that have been detected by EffectorP and also contains the DNA sequences of these effectors. This file is created in the fasta format.

EFFECTORP RESULTS

Puccinia graminis

Puccinia triticina

Puccinia striiformis

STEP 3 - Identification of homologous sequences for the predicted effector set using BLAST (Basic Local Alignment Search Tool)

In bioinformatics, BLAST (Basic Local Alignment Search Tool) is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence with a library or database of sequences, and identify library sequences that resemble the query sequence above a certain threshold.

BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance.

To commence the BLAST process, the CDS and the protein files of the given species are required. For the BLAST we used the EffectorP results for the three species as query sequences.

BLAST RESULTS

STEP 4 - PHYLOGENY of Effectors

Production of "cat" files

The BLAST result files and the EffectorP result files were kept in separate folders for both DNA and protein. Instead of containing just the names and sequences of the proteins, they contained some extra content along with them, e.g. effector probability, etc. The removal of this content was vital for the next step to occur. This extra content was removed using the following Linux command-

sed -i 's/|.*//' filename

This command removed the "|" symbol and all the extra content that was present after it. Now we were left with only the protein id and the protein sequence.
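As a minimal illustration of this header clean-up, the sketch below runs a close variant of the sed expression on a small made-up FASTA file. The file name demo.fa, the record IDs and the text after the "|" are all hypothetical. Two small changes are assumptions of this sketch and not part of the original command: the pattern also removes the spaces in front of the "|", and -i.bak is used so that a backup copy of the input is kept.

```shell
# Hypothetical two-record FASTA file with extra text after "|" in the headers.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>PGTG_00001T0 | Effector probability: 0.93\nMKLSTAVC\n>PGTG_00002T0 | Effector probability: 0.81\nMRCCPLA\n' > demo.fa

# Delete any spaces before the first "|", the "|" itself, and the rest of the line.
# -i.bak edits demo.fa in place and keeps the original as demo.fa.bak.
sed -i.bak 's/ *|.*//' demo.fa

cat demo.fa
```

Sequence lines are left untouched because they contain no "|" character.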

The cat (short for "concatenate") command is one of the most frequently used commands in Linux/Unix-like operating systems. The cat command allows us to create single or multiple files, view the contents of a file, concatenate files and redirect output to the terminal or to files.

The cat files are created using the following Linux command-

cat file1 file2 file3 file4 > newfile

The above-mentioned command will combine all the four files and merge them to create a single new file which will contain the contents of all the four files.
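A small sketch of the same merge, followed by a sanity check, is given below. The file names and sequences are invented for illustration, and only two input files are used to keep it short.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical single-record FASTA files.
printf '>seqA\nMKT\n' > file1
printf '>seqB\nMRC\n' > file2

# Concatenate in the order given; ">" overwrites newfile if it already exists.
cat file1 file2 > newfile

# Sanity check: the merged file should hold one header per input record.
grep -c '^>' newfile   # prints 2
```

Each input file should end with a newline; otherwise the last sequence line of one file is fused with the first header of the next.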

The cat files produced are as follows. Two cat files were created, one for protein sequences and the other for DNA sequences.

Production of multiple .txt files

The EffectorP result file is used to create a list file which contains the names of all the sequences. The command that will be used to create the multiple .txt files will read the names of the different ids from this file and create multiple new files. The command used for creating the list is as follows-

grep '>' effector_file > name.list

This grep command will collect all the lines containing the symbol '>', i.e. the FASTA header lines, and write them into a new list file.
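The listing step can be tried on a toy file as sketched below. The file contents are hypothetical. Anchoring the pattern with ^ and stripping the leading ">" are assumptions of this sketch, made so that the list holds bare ids; the original command keeps the ">" in each name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical EffectorP-style FASTA output with two records.
printf '>PGTG_00001T0\nMKLSTAVC\n>PGTG_00002T0\nMRCCPLA\n' > effector_file

# Keep only header lines, then remove the leading ">" from each of them.
grep '^>' effector_file | sed 's/^>//' > name.list

cat name.list
```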

Now the loop command is used to create the multiple .txt files. This command will use the names of the ids from the new list file that has been created and make multiple .txt files from them.

for i in $(cat name.list); do echo "$i"; grep "$i" *filename | cut -f2 > "$i.txt"; done

*filename means it will consider all the files present in the current folder whose names end with the given text. The text common to all the file names is placed after the symbol "*". All the files whose names contain this common text will be used to create the new multiple .txt files.
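The loop above can be sketched on toy data as follows. The layout of the searched files is an assumption, since the report does not state it: they are taken to be tab-delimited hit tables with the query id in column 1 and the hit id in column 2. All file names and ids are hypothetical. The sketch also swaps grep and cut for a single awk call so that the id is compared with column 1 exactly (a plain grep for Q1 would also match Q10).

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical hit tables: column 1 = query id, column 2 = hit id, column 3 = score.
printf 'Q1\tHIT_A\t98.5\nQ2\tHIT_B\t91.0\n' > sp1_hits.tsv
printf 'Q1\tHIT_C\t88.2\n' > sp2_hits.tsv
printf 'Q1\nQ2\n' > name.list

# Read the id list line by line; for each id, write the ids of its hits
# (column 2 of every row whose column 1 equals the id) to <id>.txt.
while IFS= read -r id; do
    awk -F '\t' -v q="$id" '$1 == q { print $2 }' ./*_hits.tsv > "$id.txt"
done < name.list

cat Q1.txt
```

With these toy inputs Q1.txt lists HIT_A and HIT_C, and Q2.txt lists HIT_B.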

.txt files created

Multiple .txt files will be created, and the content of each of these files will be as shown in these images. These files do not contain any sequences but only the names of the various ids.

Production of multiple .fasta files

Separate fasta files are created for both DNA and protein sequences. The cat files are copied into the folder containing the .txt files and a new list containing the names of the .txt files is created. The fasta files are created using the following command.

for i in $(cat list); do echo "$i"; seqtk subseq cat.file "$i" > "$i.fasta"; done

This command reads the names of the various ids from the new list and fetches their sequences from the cat file that has recently been placed in the folder. The new list has been created by using the same grep command that was used in the previous step.

Multiple fasta files will be created. Each file will contain the ids as well as their respective sequences. These newly created files will end in .fasta or .fa.
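To show what the extraction step does without requiring seqtk, the sketch below uses plain awk as a stand-in. It is not seqtk itself, only an illustration of the same idea: print every FASTA record whose id appears in a list. The file names follow the text above, while the ids and sequences are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical merged FASTA file and an id list for one effector.
printf '>HIT_A\nMKLS\nTAVC\n>HIT_B\nMRCC\n>HIT_C\nMPLA\n' > cat.file
printf 'HIT_A\nHIT_C\n' > Q1.txt

# First pass (NR == FNR) loads the wanted ids from Q1.txt. Second pass prints
# every record of cat.file whose id (first word of the header, without ">")
# is in that set, including all of its sequence lines.
awk 'NR == FNR { want[$1] = 1; next }
     /^>/      { id = substr($1, 2); keep = (id in want) }
     keep' Q1.txt cat.file > Q1.fasta

cat Q1.fasta
```

The records for HIT_A and HIT_C are written to Q1.fasta and HIT_B is skipped.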

Fasta files created

STEP 5 - CLUSTALW

All variations of the Clustalw software align sequences using a heuristic that progressively builds a multiple sequence alignment from a series of pairwise alignments. This method first aligns all pairs of sequences and generates a distance matrix from the pairwise scores. A guide tree is then calculated from the scores in the matrix using the UPGMA/Neighbor-joining method, and subsequently used to build the multiple sequence alignment by progressively aligning the sequences in order of similarity. Essentially, Clustalw creates multiple sequence alignments through three main steps:

1. Do a pairwise alignment using the progressive alignment method

2. Create a guide tree (or use a user-defined tree)

3. Use the guide tree to carry out a multiple alignment

Input/Output

This program accepts a wide range of input formats, including NBRF/PIR, FASTA, EMBL/Swiss-Prot, Clustal, GCG/MSF, GCG9 RSF, and GDE.
The output format can be one or many of the following: Clustal, NBRF/PIR, GCG/MSF, PHYLIP, GDE, or NEXUS.
By the use of Clustalw, two types of files were created, i.e. .aln and .dnd files. To create these files Clustalw was used in the following loop command-

for i in $(cat list); do echo "$i"; clustalw -infile="$i" -align; done

.aln files

.dnd files: tree for homologous sequences

The .dnd files thus created were of no further use, since the next step required only the newly created .aln files.

STEP 6 - ALN to FASTA conversion

After the formation of the .aln and .dnd files using Clustalw, the .aln files are separated for both DNA and protein sequences. A new list is created which contains the names of all the .aln files. This new list is created using the same grep command as used in the previous steps. The following command is used to create the aln.fasta files-

for i in $(cat list); do echo "$i"; seqret -sequence "$i" -outseq "$i.fasta"; done

The newly formed aln.fasta files are ready for Revtrans, where codon alignment will occur.
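To make the conversion concrete, the sketch below turns a small Clustal-format alignment into aligned FASTA with plain awk. It is only an illustrative stand-in for seqret and assumes the usual interleaved .aln layout: a header line first, then blocks of "name sequence" lines, each block followed by a conservation line that starts with whitespace. The alignment itself is made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-sequence alignment in interleaved Clustal format.
cat > demo.aln <<'EOF'
CLUSTAL W (1.83) multiple sequence alignment

seqA      MKT-LS
seqB      MKTALS
          *** **

seqA      AVC
seqB      A-C
          * *
EOF

# Skip the header line, blank lines and conservation lines, then join the
# blocks of each sequence in order of first appearance.
awk 'NR == 1 || /^[ \t]/ || NF < 2 { next }
     !($1 in seq) { order[++n] = $1 }
     { seq[$1] = seq[$1] $2 }
     END { for (i = 1; i <= n; i++) printf ">%s\n%s\n", order[i], seq[order[i]] }' demo.aln > demo.fasta

cat demo.fasta
```

Gap characters are kept, so the output is still an alignment and every record has the same length.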

Newly created fasta aligned files

STEP 7 - REVTRANS: CODON ALIGNMENT

RevTrans is useful in cases where a multiple alignment of coding DNA forms the basis for further investigations. This is often the case in phylogenetic analysis, where a multiple alignment is interpreted as a statement of homology with each column representing characters of common descent. In this context, proper alignment of codon boundaries is especially important for analyses that involve estimation of the ratio between non-synonymous and synonymous mutation rates.

Another use of RevTrans is as an aid for designing degenerate PCR primers. A scenario for this could be designing PCR primers targeted against a specific gene across a range of organisms. The traditional way of doing this is by aligning peptide sequences from all the organisms, identifying suitable regions for primer targeting and then designing primers that are degenerate with regard to the amino acids in the target area. By using knowledge of the actual codons used in the target area, it is possible to limit the degree to which the primers need to be degenerate. RevTrans makes such an analysis easy to perform, and is especially useful if the chosen target area aligns poorly in a DNA alignment.

RevTrans takes a set of DNA sequences, virtually translates them, aligns the peptide sequences, and uses this as a scaffold for constructing the corresponding DNA multiple alignment.
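The scaffold idea can be sketched for a single sequence as shown below. This is not RevTrans code, only an illustration of the principle: walk along the aligned peptide, copy one codon from the unaligned coding DNA for every residue, and write three gap characters for every gap column. The function name thread_codons and the example sequences are hypothetical, and the sketch does not check that the DNA really translates to the peptide.

```shell
# thread_codons ALIGNED_PEPTIDE CDS
# Prints the CDS with "---" inserted wherever the peptide alignment has a gap.
thread_codons() {
    awk -v pep="$1" -v dna="$2" 'BEGIN {
        pos = 1                                # next unread nucleotide (1-based)
        for (i = 1; i <= length(pep); i++) {
            if (substr(pep, i, 1) == "-") {
                out = out "---"                # gap column: three gap characters
            } else {
                out = out substr(dna, pos, 3)  # residue: copy its codon
                pos += 3
            }
        }
        print out
    }'
}

thread_codons "MK-T" "ATGAAAACT"   # prints ATGAAA---ACT
thread_codons "M-KT" "ATGAAGACA"   # prints ATG---AAGACA
```

Because gaps are always inserted in multiples of three, codon boundaries stay in the same columns for every sequence, which is what the per-codon Ka/Ks analysis needs.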
First of all, a new list containing the names of all the aln.fasta files is created. Then the aln.fasta files and the list file are copied into a new folder. The following command is used to create the new Revtrans files-

for i in $(cat list); do echo "$i"; python /path_of_revtrans/revtrans.py "$i".fasta "$i"_protein.aln.fasta "$i"_revtrans.aln; done

REVTRANS FILES

STEP 8 - SELECTON

The ratio of non-synonymous to synonymous substitutions, known as the Ka/Ks ratio, is used to estimate both purifying and positive Darwinian selection. A Ka/Ks ratio significantly greater than 1 is indicative of positive selection, whereas values significantly smaller than 1 are indicative of purifying selection. Selecton calculates the Ka/Ks ratio for each codon site in a codon-based multiple sequence alignment (MSA). In order to achieve maximum accuracy, the algorithm for site-specific Ka/Ks ratio estimation explicitly takes into account the evolutionary relationships among the sequences and the underlying stochastic process. The stochastic process assumed is a modification of the codon-based evolutionary model suggested by Goldman and Yang (1994), where the maximum likelihood estimates of the Ka/Ks ratios are computed for each site. The significance of the Ka/Ks scores is obtained by using the likelihood ratio test (LRT). This test compares two nested models: a null model which assumes no selection and an alternative model which does.
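In general form, the quantities named above can be written as follows. This is the standard textbook form of the likelihood ratio test, given here as a sketch; the exact models compared by Selecton and the degrees of freedom it uses are not stated in this report.

```latex
\omega = \frac{K_a}{K_s}, \qquad
\Lambda = 2\left(\ln L_{1} - \ln L_{0}\right)
```

Here $L_0$ and $L_1$ are the maximized likelihoods of the null and the alternative model. Under the null model, $\Lambda$ is approximately $\chi^2$-distributed with degrees of freedom equal to the difference in the number of free parameters between the two models, so a large $\Lambda$ favours the model that allows selection.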

Selecton was not used on all the files due to several technical issues. It generates multiple files with the same names, so it would have been difficult to identify particular files. Selecton is also a time-consuming process, and it would have taken weeks to run successfully on all the newly created codon-aligned aln.fasta files.

In spite of all the above-mentioned difficulties, Selecton was run online on only 3 of the many files, and the results are displayed as follows-

Figure 1 - PGTG_00743T0.txt_revtrans.aln1

Figure 2 - PGTG_00949T0.txt_revtrans.aln1

Figure 3 - PGTG_03877T0.txt_revtrans.aln1

As is clear from the legend, a score of one indicates positive selection, which means that these sites have the highest probability of being mutated in the future, whereas a score of seven indicates purifying selection, which means that these sites have the least probability of being mutated in the future. In figure 1 and figure 2 there are no positively selected sites, whereas in figure 3 quite a few positively selected sites are present. Hence, this sequence can be further studied to find mutational hotspots that might cause problems in the future, and steps might be taken beforehand in order to prevent these mutations from causing infection.
