
A

Optimum choice of dataset ratio (r)

Optimum choice of window size
B
Positive Dataset Preparation

blastpgp -d nr -i "P16386.seq" -j 3 -h 0.001 -Q "P16386.pssm"
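The -Q option writes the final position-specific scoring matrix as ASCII text. The Python sketch below reads it back. It assumes three things: each data row has the layout `<position> <residue> <20 log-odds scores> ...`, the scores follow the column order A R N D C Q E G H I L K M F P S T W Y V, and any trailing columns can be ignored. The function name `parse_ascii_pssm` is hypothetical.

```python
import re

# Assumed row layout: position, one-letter residue, then at least 20
# integer log-odds scores (A R N D C Q E G H I L K M F P S T W Y V).
_ROW = re.compile(r"^\s*(\d+)\s+([A-Z])\s+((?:-?\d+\s+){19}-?\d+)")

def parse_ascii_pssm(text):
    """Return (sequence, matrix) from blastpgp -Q style ASCII output.

    matrix[i] holds the 20 scores for residue i of the query. Header,
    blank and footer lines do not match the row pattern and are skipped;
    columns after the first 20 scores are ignored.
    """
    residues, matrix = [], []
    for line in text.splitlines():
        m = _ROW.match(line)
        if m is None:
            continue
        residues.append(m.group(2))
        matrix.append([int(v) for v in m.group(3).split()])
    return "".join(residues), matrix
```

The parser relies on the row shape instead of counting header lines, so it should tolerate minor differences in the surrounding text.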
Negative Dataset Preparation

Proposition 1 (1/k)

Phase 1: Three-fold cross-validation test


Phospho.ELM (A'')
Independent Benchmark Dataset (B)

Sequence Information
Annotation Information

PSI-BLAST

PSSM profiles of proteins from A''
PSSM profiles of proteins from B

SVM Instance Preparation

Positive Instances of A''
Negative Instances of A''
Positive Instances of B
Negative Instances of B

Equalizing both datasets & Merge
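The figure does not say how the datasets are equalized. One plausible reading is random undersampling of the negative instances to a fixed negative-to-positive ratio, followed by merging both classes into one labeled, shuffled training set. The Python sketch below follows that reading. The function name `equalize_and_merge` and its `ratio` and `seed` parameters are hypothetical.

```python
import random

def equalize_and_merge(positives, negatives, ratio=1.0, seed=0):
    """Undersample negatives to about ratio * len(positives), then merge.

    Returns a shuffled list of (instance, label) pairs, with label 1 for
    positives and 0 for negatives. If there are fewer negatives than the
    target, all of them are kept.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    n_neg = min(len(negatives), round(ratio * len(positives)))
    kept_neg = rng.sample(list(negatives), n_neg)
    merged = [(x, 1) for x in positives] + [(x, 0) for x in kept_neg]
    rng.shuffle(merged)
    return merged
```

With `ratio=1.0` the two classes come out the same size.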

Final Training Instances

3-Fold Cross-Validation Using SVM
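The Python sketch below shows three-fold cross-validation. It is stratified, which is an assumption, since the figure only specifies three folds. The `train` and `predict` callables stand in for whatever SVM package is used and are not the tool's actual interface. The indices of each class are shuffled and dealt round-robin, so every fold keeps roughly the same positive-to-negative ratio.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=3, seed=0):
    """Split indices into k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal indices round-robin into folds
    return folds

def cross_validate(X, y, train, predict, k=3, seed=0):
    """Return out-of-fold predictions; train/predict are placeholder callables."""
    preds = [None] * len(y)
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(model, X[i])
    return preds
```

Each instance is predicted exactly once, by a model that did not see it during training. The pooled predictions can therefore be scored with the Phase 2 assessment measures.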

Separate Knowledgebase (Models) for each of the 3 residues and each of the 5 window sizes

Choose Appropriate Model
Prediction Using SVM-PREDICT
Benchmark Test Result with Assessment
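There is one model per residue (3) and per window size (5). Choosing the appropriate model therefore means selecting among 15 cross-validation results. The sketch below assumes the selection picks the window size with the highest cross-validated Mcc for each residue. Both the criterion and the name `choose_models` are assumptions, and any window sizes used with it are placeholders.

```python
def choose_models(cv_scores):
    """Pick, for each residue, the window size with the highest CV score.

    cv_scores maps (residue, window_size) -> Mcc from cross-validation.
    Using Mcc as the criterion is an assumption; another measure could be used.
    """
    best = {}
    for (residue, window), score in cv_scores.items():
        if residue not in best or score > best[residue][1]:
            best[residue] = (window, score)
    return {residue: window for residue, (window, _) in best.items()}
```

The returned mapping tells the prediction step which stored model to load for each residue type.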

Example PSSM profile (excerpt, positions 191 to 205):

Pos Res   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
...
191   E   0   0   0   0   2   1   2  −1  −2  −2  −1   1   0   0   0   0  −1   3  −2  −1
192   E   0   1   1   0  −3   0   1  −2  −2   0  −2   1   0  −1   0   1   1  −3   0  −1
193   S   0  −1   1  −1   0   0   0   1  −2   0  −2  −1   0  −2   2   1   2  −3   0   0
194   D  −1   0   1   3   1   1   2   1   1  −3  −1   0  −2  −3   0   0  −1  −4  −3  −2
195   S   0   0   1   0   1   1   1   0   1  −2  −1   0   0  −3   1   1   0  −3  −2  −1
196   V   0   0  −1  −1  −2  −1   0   1   1   0   0   0   0   1   0   0   0  −2   1   0
197   D  −1   0   0   2  −3   0   1   0  −2  −1  −1   0  −2  −3   1   1   0  −3  −1   1
198   S   0  −2   0   0   1  −1   1  −2   0  −1  −1   0  −1   0   0   3   1  −3  −1  −1
199   A   1  −2   0   0   2   0   1   1   0  −3  −2   1  −2  −3   0   1   0  −4  −3  −1
200   D   0   0   0   2  −3   1   2  −1   2  −2  −1   0   1  −3   1   0   0  −3  −2  −1
201   A   1   0   1   0   2   0   0   0  −2  −1  −1   0  −2  −1   2   1   1  −3  −2  −1
202   E   1   1   1   0   3   0   1  −2  −2  −2  −1   0   0  −1   0   0   0   2   0  −1
203   E   0   1   1   1   2   1   1   0  −1  −2  −1   0  −1   0   1   0   0  −3  −2  −2
204   D   0   1   1   0  −2   1   0   0  −1  −1  −1   0   1  −2   1   0   0  −2   1   0
205   D   0   0   1   2   1   0   2  −1   1  −1  −3   0   0   0   0   0   0  −3   0  −1
...

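Phase 2: Prediction system testing with benchmark dataset

$$\mathrm{Ac} = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%$$

$$\mathrm{Sn} = \frac{TP}{TP + FN} \times 100\%$$

$$\mathrm{Sp} = \frac{TN}{TN + FP} \times 100\%$$

$$\mathrm{Mcc} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FN) \times (TN + FP) \times (TP + FP) \times (TN + FN)}}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \times 100\%$$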
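These measures follow directly from the confusion-matrix counts. The small Python helper below, with the hypothetical name `assess`, returns Ac, Sn, Sp and FPR as percentages along with Mcc. It assumes both classes are present in the test set; otherwise Sn, Sp or FPR would divide by zero. When the Mcc denominator is zero, it reports Mcc as 0, which is a common convention.

```python
import math

def assess(tp, tn, fp, fn):
    """Return Ac, Sn, Sp, FPR (percent) and Mcc from confusion-matrix counts."""
    ac = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    sn = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)
    fpr = 100.0 * fp / (fp + tn)
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Ac": ac, "Sn": sn, "Sp": sp, "Mcc": mcc, "FPR": fpr}
```

Because FP/(FP + TN) and TN/(TN + FP) sum to 1, FPR always equals 100% minus Sp.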

PPred
Non-annotated (unknown) Protein Sequence

PSI-BLAST

PSSM profiles of the protein

SVM Instance Preparation

Instances for each of the S, T, Y positions for each of the five different window sizes
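The Python sketch below prepares instances for an unannotated protein. Each S, T or Y position yields one feature vector, made by concatenating the PSSM rows inside a window centred on that site, so a window of w residues gives 20 × w features. It assumes an odd window size and uses zero rows for positions beyond either terminus. The padding scheme and the name `window_instances` are assumptions, since the figure only says instances are built for five window sizes.

```python
def window_instances(sequence, pssm, window, residues="STY"):
    """Build one feature vector per S/T/Y position from a PSSM.

    pssm is a list of 20-score rows aligned with sequence. Rows outside
    the sequence are replaced by zeros (assumed padding scheme). Returns
    {1-based position: (residue, features)}.
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd so the site is centred")
    half = window // 2
    zero_row = [0] * 20
    instances = {}
    for pos, aa in enumerate(sequence):
        if aa not in residues:
            continue
        features = []
        for j in range(pos - half, pos + half + 1):
            features.extend(pssm[j] if 0 <= j < len(sequence) else zero_row)
        instances[pos + 1] = (aa, features)
    return instances
```

Calling the function once per window size produces the five instance sets. Each set is then passed to the model chosen for that residue.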

Choose Appropriate Model
Prediction Using SVM-PREDICT
Benchmark Test Result with Assessment

