2010 - Evaluation of Protein Phosphorylation Site Predictors

64 Protein & Peptide Letters, 2010, 17, 64-69
Evaluation of Protein Phosphorylation Site Predictors
Shufu Que1,2,#, Yongfei Wang2,#, Peixiang Chen2, Yu-Rong Tang3, Ziding Zhang3, and
Huaqin He1,2,*
1
Key Laboratory of Ministry of Education for Genetic, Breeding and Multiple Utilization of Crops, Fujian Agriculture
and Forestry University, Fuzhou 350002, China; 2College of Life Science, Fujian Agriculture and Forestry University,
Fuzhou 350002, China; 3State Key Laboratory of AgroBiotechnology, College of Biological Sciences, China Agricul-
tural University, Beijing 100193, China
Abstract: A series of elegant phosphorylation site prediction methods have been developed, which are playing an increas-
ingly important role in accelerating the experimental characterization of phosphorylation sites in phosphoproteins. In this
study, we selected six recently published methods (DISPHOS, NetPhosK, PPSP, KinasePhos, Scansite and PredPhospho)
to evaluate their performance. First, we compiled three testing datasets containing experimentally verified phosphoryla-
tion sites for mammalian, Arabidopsis and rice proteins. Then, we present the prediction performance of the tested meth-
ods on these three independent datasets. Rather than quantitatively ranking the performance of these methods, we focused
on providing an understanding of the overall performance of the predictors. Based on this evaluation, we found the fol-
lowing results: i) current phosphorylation site predictors are not effective for practical use and there is substantial need to
improve phosphorylation site prediction; ii) current predictors perform poorly when used to predict phosphorylation sites
in plant phosphoproteins, suggesting that a rice-specific predictor will be required to obtain confident computational anno-
tation of phosphorylation sites in rice proteomics research; and iii) the tested predictors are complementary to some ex-
tent, implying that establishment of a meta-server might be a promising approach to developing an improved prediction
system.
Keywords: Protein, Phosphorylation site, Prediction, Evaluation.
INTRODUCTION mechanism of phosphorylation dynamics. Nevertheless,

these experiments are often time-consuming and costly [11].
Phosphorylation is one of the most important types of
Experimental difficulties in the large-scale identification of
post-translational modifications [1-4]. More than 30% of all
protein phosphorylation sites have stimulated the develop-
eukaryotic proteins are estimated to undergo reversible
ment of computational approaches to predict these sites from
phosphorylation [5]. Phosphorylation of substrate sites at protein sequences [12].
serine (Ser), threonine (Thr) or tyrosine (Tyr) residues is
performed by members of the protein kinase family. Bio- Over the past decade, a series of algorithms have been
chemically, phosphorylation involves the transfer of a phos- developed to predict phosphorylation sites from amino acid
phate moiety from adenosine triphosphate (ATP) to the ac- sequence. These approaches range from simple motif
ceptor residue, thereby generating adenosine diphosphate searches to more complex machine learning methods, such
(ADP). Protein kinases catalyze the phosphorylation events as Artificial Neural Networks (ANN) and Support Vector
that are essential for the regulation of cellular processes such Machines (SVM). Of the available resources, a few well-
as metabolism, proliferation, differentiation and apoptosis. maintained phosphorylation site prediction web sites have
Owing to the importance of protein phosphorylation in regu- been made freely available to the scientific community, in-
lating cellular signaling, a major goal of current proteomic cluding NetPhos [2], NetPhosK [12], KinasePhos [13],
efforts is to identify phosphoproteins in higher organisms DISPHOS [14], Scansite [15], PPSP [16], GPS [17], Pred-
[6]. Several large-scale phosphoproteomics studies have Phospho [18], and NetPhosYeast [19]. Generally, these
been carried out in yeast [7], mouse [8], human [9], plant phosphorylation-site predictors can be grouped into two
[10], and other species. categories. The first category of predictors focus on forecast-
ing whether a query Ser, Thr or Tyr residue is a phosphoryla-
Traditional experimental identification of protein kinase-
tion site or not. This group includes the NetPhos and
specific phosphorylation sites on substrates in vivo and in
DISPHOS. The second category of predictors that aim to
vitro has provided a foundation for understanding the identify protein kinase-family-specific phosphorylation sites.
This category includes NetPhosK, PSSP, PredPhospho, and
*Address correspondence to this author at the Key Laboratory of Ministry
of Education for Genetic, Breeding and Multiple Utilization of Crops, Fu-
KinasePhos. Comparatively, the kinase-family-specific
jian Agriculture and Forestry University, Fuzhou 350002, China and Col- phosphorylation site prediction results are more useful, be-
lege of Life Science, Fujian Agriculture and Forestry University, Fuzhou cause the corresponding kinase family information for every
350002, China; Tel: +86-591-83789443; Fax: +86-591-83789352; possible phosphorylation site can be inferred [13, 18]. How-
E-mail: hehq3@yahoo.com.cn ever, this type of prediction may be suitable only for those
#
These authors contributed equally to this work
0929-8665/10 $55.00+.00 © 2010 Bentham Science Publishers Ltd.

Evaluation of Protein Phosphorylation Site Predictors Protein & Peptide Letters, 2010, Vol. 17, No. 1 65
kinase families in which the number of known phosphoryla- with a quantitative understanding of the overall performance
tion sites is large. Therefore, these kinase-family-specific of the tested predictors. Moreover, we also point out some
prediction systems are inevitably limited to only a few kinase existing problems regarding protein phosphorylation site
families [18]. There are currently two important directions in prediction.
the field of protein phosphorylation site prediction. With the
increase of experimentally validated phosphorylation sites, MATERIALS AND METHODS
the first direction involves development of organism-specific
Datasets
predictors. Development of these predictors has been initi-
ated in some model organisms, specifically yeast [19] and We generated three phosphorylation site datasets consist-
Arabidopsis [20]. The second direction involves combining ing of experimentally verified mammalian, Arabidopsis and
phosphorylation site prediction with a network context of rice phosphoproteins, respectively. All the mammalian phos-
kinases and phosphoproteins. In addition to predicting phoproteins in Phospho.ELM (release version 7.0), which
kinase-family-specific phosphorylation sites, such a method contains 4,026 protein entries covering 16,428 phosphoryla-
would allow for the identification of specific kinases that are tion sites, were downloaded [22]. Three hundred mammalian
responsible in vivo for the predicted phosphorylation events. phosphoproteins were randomly selected from this total set
Existing phosphorylation site predictors play an increas- to compile the initial mammalian phosphoprotein dataset. To
ingly important role in complementing the experimental remove sequence redundancy, the initial mammalian phos-
characterization of protein phosphorylation. Generally, a phoproteins were filtered using a 30% sequence identity fil-
newly developed predictor must be intensively evaluated ter. Briefly, we performed pairwise BLAST searches for any
before its acceptance for publication in a peer-reviewed sequence pairs in the initial mammalian phosphoprotein
journal. Owing to the availability of many phosphorylation dataset. For sequence pairs with sequence identity > 30%,
site datasets, the accuracy of different predictors reported by only one sequence was left in the dataset. Thus, the phos-
the developers may be overestimated, because these methods phoprotein dataset size was reduced to 231 total proteins.
were optimized predominantly for the corresponding training The Arabidopsis and rice phosphorylation site datasets were
and testing data. Moreover, the current protein phosphoryla- obtained from recent literature, the feature table of the Swiss-
tion site predictors are derived primarily from mammalian Prot database (release 42.10) [23], and unpublished rice
phosphorylation site data and they may show lower predic- phosphorylation data from our laboratory. As in the mam-
tion accuracy for proteins from other organisms [19, 20]. To malian dataset, we filtered the Arabidopsis and rice phos-
our knowledge, different protein phosphorylation site predic- phoproteins using a sequence identity of 30% and compiled
tors have not been intensively benchmarked via some inde- them into the testing datasets for Arabidopsis and rice.
pendent datasets in the literature. Sikder and Zomaya (2009) The mammalian dataset consists of a total of 754 phos-
developed an independent dataset, collected from Phos- phoserine (pSer) sites, 135 phosphothreonine (pThr) sites
pho.ELM, to compare the performances of KinasePhos, Net- and 45 phosphotyrosine (pTyr) from 231 mammalian pro-
Phosk, DISPHOS, Scansite and PPSP [21]. However, this teins (Table 1). The Arabidopsis dataset contains a total of
independent dataset contained mainly mammalian proteins. 153 pSer sites, 47 pThr sites and 11 pTyr sites from 98
To explore the current phosphorylation site predictors’ appli- Arabidopsis proteins. The rice dataset consists of a total of
cation to plant proteomes, we decided to quantitatively in- 359 pSer sites, 263 pThr sites and 1 pTyr site from 105 rice
vestigate their performance when applied to Arabidopsis or proteins (Table 1). Since all of these phosphorylation sites
rice phosphoproteins. To this end, we constructed three in- are experimentally verified, they are regarded as (+) sites.
dependent phosphorylation site datasets and evaluated the The Ser, Thr and Tyr residues that are not annotated as phos-
performance of six phosphorylation site predictors on these phorylation sites within the above three datasets are regarded
datasets. We believe that this work will provide scientists
Table 1. The Number of Positive and Negative Phosphorylation Sites in the Three Phosphorylation Site Datasets
Datasets Sites pSer c pThr d pTyr e Total
(+) site a 359 263 1 623

rice
(-) site b 1961 1216 740 3917
(+) site 153 47 11 211

Arabidopsis
(-) site 4603 2625 1457 8685
(+) site 754 135 45 934

mammal
(-) site 15709 9825 4267 29801
a
(+) site: Phosphorylation sites annotated by experimental results
b
(-) site: Non-phosphorylation sites
c
pSer: Ser phosphorylation sites
d
pThr: Thr phosphorylation sites
e
pTyr: Tyr phosphorylation sites
66 Protein & Peptide Letters, 2010, Vol. 17, No. 1 Que et al.
as ( ) sites (i.e., non-phosphorylation sites). The number of kinase families. In this evaluation, prediction was made by
( ) sites in each dataset is also listed in Table 1. We have considering all kinase groups and families.
made the three phosphorylation site datasets freely available
at http://210.34.95.200/hlx/, to allow other researchers to Performance Assessment
benchmark their methods against our results.
Four measurements—Sensitivity (Sn), Specificity (Sp),
Accuracy (Ac) and the Matthews correlation coefficient
Six Tested Phosphorylation Site Predictors
(MCC) — were employed to evaluate the performance of the
Each sequence from the three independent datasets was tested predictors (definitions below):
individually run through all six methods (KinasePhos, Net-
TP ,
PhosK, DISPHOS, Scansite, PPSP and PredPhospho) using Sn =
the web site servers provided by the developers. Aside from TP + FN
DISPHOS, all the tested predictors belong to kinase-specific TN ,
programs. To allow for a fair comparison of these six predic- Sp =
TN + FP
tors, we evaluate only whether a candidate site is correctly
predicted. The correctness of the predicted kinase family TP + TN ,
Ac =
information is not considered in our analysis. The algorithms TP + FP + TN + FN
and parameter settings of the evaluated predictors are briefly
described below. and
DISPHOS (DISorder-enhanced PHOSphorylation site (TP TN ) ( FN FP) .

MCC =
predictor, http://core.ist.temple.edu/pred/) uses position- (TP + FN ) (TN + FP) (TP + FP) (TN + FN )
specific amino acid composition and predicted structural
disorder information to distinguish phosphorylation and non- where TP, FP, FN, and TN denote true positives, false posi-
phosphorylation sites. For our work, all the sequences in the tives, false negatives, and true negatives. Sn and Sp illustrate
testing datasets were analyzed using DISPHOS (version 1.3). the correct prediction ratios of positive and negative data
Since DISPHOS provides the option of different kingdoms sets, respectively. When the number of positive data and
and organisms, the “Eukaryotes” option was selected for negative data differ too much from each other, the MCC
predicting mammalian and rice proteins, and the “A. value is calculated to assess the prediction performance. The
thaliana” option was selected for Arabidopsis proteins. value of MCC ranges from -1 to 1, and larger values indicate
better prediction performance.
NetPhosK (http://www.cbs.dtu.dk/services/NetPhosK/)
uses an artificial neural network algorithm to predict 17 RESULTS AND DISCUSSION
kinase-family-specific phosphorylation sites. In this work, all
the tested proteins were analyzed with the NetPhosK 1.0 The Overall Performance of the Six Tested Predictors
server without a filtering method, and a threshold of 0.5 was For the mammalian dataset, the prediction performance
used to judge whether a site was predicted as phosphory- of the different methods indicated the following trend: Pred-
lated. Phospho>DISPHOS>Scansite>NetPhosK>PPSP>Kinase
KinasePhos (http://kinasephos.mbc.nctu.edu.tw/index. Phos (Table 2). PredPhospho demonstrated the best perform-
php) employs a Profile Hidden Markov Model (HMM) to ance, with an MCC value of 0.1971, and KinasePhos yielded
predict kinase-family-specific phosphorylation sites. In this the poorest prediction (MCC = 0.0076). When tested on the
evaluation, KinasePhos was run with the option of 100% Arabidopsis and rice datasets, the performance ranking of the
prediction specificity. No specific kinase was set, since we different predictors showed a trend that was similar to that of
evaluated only the predicted generic phosphorylation sites. the mammalian dataset. For the Arabidopsis dataset, the pre-
diction performance of PredPhospho demonstrated the high-
Scansite (http://scansite.mit.edu/) uses scores calculated est MCC (0.1141), DISPHOS was second (MCC = 0.0976),
from position-specific score matrices (PSSM) to search for NetphosK third (MCC = 0.052), PPSP fourth (MCC =
motifs within proteins that are likely to be phosphorylated by
0.0201), Scansite fifth (MCC = 0.0199), and KinasePhos
specific protein kinases. In this work, Scansite2.0 was run by yielded the lowest MCC (-0.0095). For the rice dataset,
searching all motifs and the “high stringency level” setting PredPhospho again yielded the highest MCC (0.1061),
was selected. whereas KinasePhos resulted in the lowest MCC (-0.0573).
PPSP (http://bioinformatics.lcd-ustc.org/PPSP/) utilizes Compared with the other five predictors, PredPhospho is the
Bayesian Decision Theory (BDT) to predict phosphorylation most favorable algorithm for phosphorylation site prediction
sites for approximately 70 kinase groups. In this work, pre- in mammalian, Arabidopsis and rice proteins.
diction was performed by considering all kinase groups and Generally, all six evaluated phosphorylation site predic-
the balance performance option was selected. tors showed poor prediction performance. Using the predic-
PredPhospho (http://pred.ngri.re.kr/PredPhospho.htm) tion of mammalian proteins as an example, the MCC values
predicts various kinase-specific phosphorylation sites by for all the tested predictors remained below 0.2, suggesting
training SVMs. Due to the limitations of phosphorylated that these predictors alone were not sufficient for practical
sequence data in public databases, the current PredPhospho use. The most critical issue contributing to low MCC values
can make classifiers for only four kinase groups and four in our study is that the number of non-phosphorylated Ser,
Thr or Tyr residues in a protein is far larger than that of
Table 2. The Overall Predictive Performance for the Phosphorylation Site Datasets
Tools DATASET Sn Sp Ac MCC
rice 94.86 2.4 15.09 -0.0573
KinasePhos Arabidopsis 95.73 3.17 5.36 -0.0095
mammalian 98.93 1.62 4.58 0.0076
rice 38.04 63 59.56 0.0075
DISPHOS Arabidopsis 63.98 66.47 66.41 0.0976
mammalian 86.51 61.42 62.18 0.1679
rice 59.07 52.41 53.33 0.079
NetPhosK Arabidopsis 66.82 50.26 50.65 0.052
mammalian 82.33 43.63 44.81 0.0901
rice 93.42 4.19 16.43 -0.0397
PPSP Arabidopsis 97.16 5.95 8.12 0.0201
mammalian 98.93 8.03 10.8 0.0445
rice 13.48 87.13 77.03 0.0063
Scansite Arabidopsis 18.48 86.06 84.45 0.0199
mammalian 42.08 85.08 83.78 0.128
rice 25.33 88.44 84.03 0.1061
PredPhospho Arabidopsis 59.61 77.01 76.7 0.1141
mammalian 78.88 77.65 77.68 0.1971
phosphorylated sites, inducing an imbalance of ( ) sites and Phos, PredPhospho, DISPHOS and NetPhosK showed the
(+) sites. Such a phenomenon is further exemplified in the highest MCC on the rice dataset, while the other two predic-
prediction performance of PredPhospho. When tested in tors still demonstrated their highest MCC values on the
mammalian proteins, PredPhospho performed reasonably mammalian dataset. Since the number of pTyr sites in the
well (Sn = 78.88% and Sp = 77.65%). However, the ratio of rice dataset is very limited, we are unable to conclude that
(+) sites to ( ) sites in the mammalian dataset is approxi- these four methods are favorable for pTyr site prediction.
mately 1:32, indicating that the prediction result of Pred-
Phospho indeed contains too many false positives, resulting The Tested Predictors Exhibited Better Performance on
in a relatively low MCC value (0.1971). For a machine Mammalian Proteins than Plant Proteins
learning method-based phosphorylation site predictor, it is
Each predictor exhibited the best performance (i.e., the
important to note that an optimized ratio of (+) sites to ( )
sites in the training dataset can often result in a better predic- highest MCC) on the mammalian dataset and the poorest
performance (i.e., the lowest MCC) on either the rice or
tion model.
Arabidopsis dataset (Table 2). It is likely that this is caused
by the predictors having been derived primarily from mam-
The Prediction Performance of pSer, pThr and pTyr
malian phosphorylation sites. We conclude that the tested
Sites
predictors might not adapt well to predicting plant protein
The tested predictors demonstrated different performance phosphorylation sites. Therefore, a plant-specific predictor
on pSer, pThr and pTyr site predictions (Fig. 1). Generally, may yield better performance when applied to plant proteins
the predictors have similar performance on pSer and pThr than a generic predictor. For instance, a plant-specific pre-
sites and better predictive performance on pTyr site. Because dictor called PhosPhAt has been developed to predict pSer
the number of pTyr sites is limited in all the testing datasets, sites in Arabidopsis. PhosPhAt was benchmarked to reveal
we can not conclude that the prediction of pTyr sites is easier better performance than some generic predictors [20]. With
than the prediction of pSer and pThr sites. In terms of pSer the accumulation of experimentally validated phosphoryla-
and pThr site prediction on the three datasets, all the tested tion sites in rice, it is hoped that a rice-specific predictor will
predictors indicated better performance (MCC) on the mam- be available in the near future. Such a tool will inevitably
malian dataset than on either plant (Arabidopsis and rice) facilitate more reliable computational annotation of phos-
dataset (Fig. 1). However, for pTyr site prediction, Kinase- phorylation sites in rice phosphoproteins.
68 Protein & Peptide Letters, 2010, Vol. 17, No. 1 Que et al.
weighted voting method to incorporate 15 phosphorylation-

site predictors into a meta-predictor, the performance of
which can exceed that of all individual predictors [24].
FUTURE PERSPECTIVES
In statistical predictions, three cross-validation methods
are often used to examine a predictor for its effectiveness in
a practical application: the independent dataset test, the sub-
sampling test, and the jackknife test [25]. However, as dem-
onstrated by Chou and Shen [26, 27], the jackknife test is
deemed the most objective of these three tests and can al-
ways yield a unique result for a given benchmark dataset.
For these reasons, the jackknife test has been increasingly
used by investigators to examine the accuracy of various
predictors [28-35]. However, a feasible way to examine the
quality of the existing web-site predictors is the independent
dataset test, as performed in this study.
In this study, we constructed three phosphorylation site
datasets (i.e., the datasets of mammal, Arabidopsis and rice
proteins) to benchmark the prediction performance of six
phosphorylation site predictors. It is worth mentioning that
the three testing datasets can not be regarded as the gold
standard. In particular, the unreliable (-) sites may impact the
evaluation. Some phosphorylation sites that have not been
experimentally validated are likely to be represented as (-)
sites in this study, contributing to the “noise” of our evalua-
tion. Moreover, the size of these three datasets may not be
large enough to represent the complete sequence space of
phosphoproteins. Despite these shortcomings, this evaluation
provides a quantitative understanding of the overall perform-
ance of current phosphorylation-site predictors.
We expect that both the developers of phosphorylation
site predictors and the interested experimental scientists will
benefit from our findings in this work. We would like to em-
phasize the following findings we have obtained. First, the
current phosphorylation site predictors are not effective for
practical use. In particular, the current predictors often yield
overprediction at the whole-sequence level (i.e., a lower frac-
tion of the predicted sites are actually true positives). There-
fore, reduction of overprediction remains an important direc-
tion for development of new phosphorylation-site predictors.
Second, the current predictors perform less well when they
are applied to predicting phosphorylation sites in Arabidop-
sis and rice phosphoproteins. Therefore, it is important to
develop organism-specific phosphorylation site predictors.
With the accumulated phosphorylation site data available in
many model organisms, such predictors have recently been
Figure 1. The predictive performance (MCC) of the six tested pre-
developed. Finally, we have found that the tested predictors
dictors on pSer (A), pThr (B) and pTyr (C) sites.
are complementary to some extent, suggesting that estab-
lishment of a meta-server may result in better predictive per-
The Complementarity Between Different Predictors
formance.
We found that different predictors are often complemen-
tary to some extent. For example, the prediction performance ACKNOWLEDGEMENTS
of KinasePhos was 98.93% (Sn) and 1.62% (Sp) on the
This study was supported in part by grant 20070019050
mammalian dataset and outperformed Scansite (Sn = 42.08%
of the State Education Ministry, China, the Key Science Pro-
and Sp = 85.08%) with a 56.85% higher sensitivity and an
gram of Fujian (no. 2009N0011), the National Natural Sci-
83.46% lower specificity. This suggested that an effective
ence Foundation of China (no. 30020170), and the Key Pro-
combination of KinasePhos and Scansite may result in a new gram of Ecology, Fujian Province (0608507).
prediction system with increased performance. Indeed, such
an effort has recently been initiated. Wan et al. (2008) used a
REFERENCES [19] Ingrell, C.R.; Miller, M.L.; Jensen, O.N.; Blom, N. NetPhosYeast:
prediction of protein phosphorylation sites in yeast. Bioinformatics,
[1] Peck, S.C. Early phosphorylation events in biotic stress. Curr. 2007, 23(7), 895-897.
Opin. Plant Biol., 2003, 6(4), 334-338. [20] Heazlewood, J.L.; Durek, P.; Hummel, J.; Selbig, J.; Weckwerth,
[2] Blom, N.; Gammeltoft, S.; Brunak, S. Sequence and structure- W.; Walther, D.; Schulze, W.X. PhosPhAt: a database of
based prediction of eukaryotic protein phosphorylation sites. J. phosphorylation sites in Arabidopsis thaliana and a plant-specific
Mol. Biol., 1999, 294(5), 1351-1362. phosphorylation site predictor. Nucleic Acids Res., 2008,
[3] Chou, K.C.; Watenpaugh, K.D.; Heinrikson, R.L. A model of the 36(Database issue), D1015-1021.
complex between cyclin-dependent kinase 5(Cdk5) and the [21] Sikder, A.R.; Zomaya, A.Y. Analysis of protein phosphorylation
activation domain of neuronal Cdk5 activator. Biochem. Biophys. site predictors with an independent dataset. Int. J. Bioinform. Res.
Res. Commun., 1999, 259, 420-428. Appl., 2009, 5(1), 20-37
[4] Chou, K.C. Review: Structural bioinformatics and its impact to [22] Diella, F.; Cameron, S.; Gemund, C.; Linding, R.; Via, A.; Kuster,
biomedical science. Curr. Med. Chem., 2004, 11, 2105-2134. B.; Sicheritz-Ponten, T.; Blom, N.; Gibson, T.J. Phospho.ELM: a
[5] Hubbard, M.J.; Cohen, P. On target with a new mechanism for the database of experimentally verified phosphorylation sites in
regulation of protein phosphorylation. Trends Biochem Sci., 1993, eukaryotic proteins. BMC Bioinformatics, 2004, 5, 79.
18(5), 172-177. [23] Boutet, E.; Lieberherr, D.; Tognolli, M.; Schneider, M.; Bairoch,
[6] Khan, M.M.; Jan, A.; Karibe, H.; Komatsu, S. Identification of A. UniProtKB/Swiss-Prot. Methods Mol. Biol., 2007, 406, 89-112.
phosphoproteins regulated by gibberellin in rice leaf sheath. Plant [24] Wan, J.; Kang, S.; Tang, C.; Yan, J.; Ren, Y.; Liu, J.; Gao, X.;
Mol. Biol., 2005, 58(1), 27-40. Banerjee, A.; Ellis, L.B.; Li, T. Meta-prediction of phosphorylation
[7] Ficarro, S.B.; McCleland, M.L.; Stukenberg, P.T.; Burke, D.J.; sites with weighted voting and restricted grid search parameter
Ross, M.M.; Shabanowitz, J.; Hunt, D.F.; White, F.M. selection. Nucleic Acids Res., 2008, 36(4), e22.
Phosphoproteome analysis by mass spectrometry and its [25] Chou, K.C.; Zhang, C.T. Review: Prediction of protein structural
application to Saccharomyces cerevisiae. Nat. Biotechnol., 2002, classes. Crit. Rev. Biochem. Mol. Biol., 1995, 30, 275-349.
20(3), 301-305. [26] Chou, K.C.; Shen, H.B. Cell-PLoc: A package of web-servers for
[8] Ballif, B.A.; Villen, J.; Beausoleil, S.A.; Schwartz, D.; Gygi, S.P. predicting subcellular localization of proteins in various organisms.
Phosphoproteomic analysis of the developing mouse brain. Mol. Nat. Protocols, 2008, 3, 153-162.
Cell Proteomics, 2004, 3(11), 1093-1101. [27] Chou, K.C.; Shen, H.B. Review: Recent progresses in protein
[9] Lim, Y.P.; Diong, L.S.; Qi, R.; Druker, B.J.; Epstein, R.J. subcellular location prediction. Analy. Biochem., 2007, 370, 1-16.
Phosphoproteomic fingerprinting of epidermal growth factor [28] Xiao, X.; Chou, K.C. Digital coding of amino acids based on
signaling and anticancer drug action in human tumor cells. Mol. hydrophobic index. Protein Pept. Lett., 2007, 14, 871-875.
Cancer Ther., 2003, 2(12), 1369-1377. [29] Munteanu, C.B.; Gonzalez-Diaz, H.; Magalhaes, A.L. Enzymes/
[10] Nuhse, T.S.; Stensballe, A.; Jensen, O.N.; Peck, S.C. Phospho- non-enzymes classification model complexity based on
proteomics of the Arabidopsis plasma membrane and a new composition, sequence, 3D and topological indices. J. Theor. Biol.,
phosphorylation site database. Plant Cell, 2004, 16(9), 2394-2405. 2008, 254, 476-482.
[11] Hjerrild, M.; Gammeltoft, S. Phosphoproteomics toolbox: compu- [30] Munteanu, C.R.; Gonzalez-Diaz, H.; Borges, F.; de Magalhaes,
tational biology, protein chemistry and mass spectrometry. FEBS A.L. Natural/random protein classification models based on star
Lett., 2006, 580(20), 4764-4770. network topological indices. J. Theor. Biol., 2008, 254, 775-783.
[12] Blom, N.; Sicheritz-Ponten, T.; Gupta, R.; Gammeltoft, S.; Brunak, [31] Rezaei, M.A.; Abdolmaleki, P.; Karami, Z.; Asadabadi, E.B.;
S. Prediction of post-translational glycosylation and phospho- Sherafat, M.A.; Abrishami-Moghaddam, H.; Fadaie, M.;
rylation of proteins from the amino acid sequence. Proteomics, Forouzanfar, M. Prediction of membrane protein types by means of
2004, 4(6), 1633-1649. wavelet analysis and cascaded neural networks. J. Theor. Biol.,
[13] Huang, H.D.; Lee, T.Y.; Tzeng, S.W.; Horng, J.T. KinasePhos: a 2008, 254, 817-820.
web tool for identifying protein kinase-specific phosphorylation [32] Kannan, S.; Hauth, A.M.; Burger, G. Function prediction of
sites. Nucleic Acids Res., 2005, 33(Web Server issue), W226-229. hypothetical proteins without sequence similarity to proteins of
[14] Iakoucheva, L.M.; Radivojac, P.; Brown, C.J.; O'Connor, T.R.; known function. Protein Pept. Lett., 2008, 15, 1107-1116.
Sikes, J.G.; Obradovic, Z.; Dunker, A.K. The importance of [33] Chen, C.; Chen, L.; Zou, X.; Cai, P. Prediction of protein
intrinsic disorder for protein phosphorylation. Nucleic Acids Res., secondary structure content by using the concept of chou's pseudo
2004, 32(3), 1037-1049. amino acid composition and support vector machine. Protein Pept.
[15] Obenauer, J.C.; Cantley, L.C.; Yaffe, M.B. Scansite 2.0: Proteome- Lett., 2009, 16, 27-31.
wide prediction of cell signaling interactions using short sequence [34] Shen, H.B.; Chou, K.C. QuatIdent: A web server for identifying
motifs. Nucleic Acids Res., 2003, 31(13), 3635-3641. protein quaternary structural attribute by fusing functional domain
[16] Xue, Y.; Li, A.; Wang, L.; Feng, H.; Yao, X. PPSP: prediction of and sequential evolution information. J. Proteome Res., 2009, 8,
PK-specific phosphorylation site with Bayesian decision theory. 1577-1584.
BMC Bioinformatics, 2006, 7, 163. [35] Xiao, X.; Wang, P.; Chou, K.C. Predicting protein quaternary
[17] Xue, Y.; Zhou, F.; Zhu, M.; Ahmed, K.; Chen, G.; Yao, X. GPS: a structural attribute by hybridizing functional domain composition
comprehensive www server for phosphorylation sites prediction. and pseudo amino acid composition. J. Appl. Crystallogr., 2009,
Nucleic Acids Res., 2005, 33(Web Server issue), W184-187. 42, 169-173.
[18] Kim, J.H.; Lee, J.; Oh, B.; Kimm, K.; Koh, I. Prediction of
phosphorylation sites using SVMs. Bioinformatics, 2004, 20(17),
3179-3184.
Received: March 05, 2009 Revised: May 23, 2009 Accepted: May 27, 2009

2010 - Evaluation of Protein Phosphorylation Site Predictors

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2010 - Evaluation of Protein Phosphorylation Site Predictors

Uploaded by

Copyright:

Available Formats

64 Protein & Peptide Letters, 2010, 17, 64-69

Evaluation of Protein Phosphorylation Site Predictors

INTRODUCTION mechanism of phosphorylation dynamics. Nevertheless,

0929-8665/10 $55.00+.00 © 2010 Bentham Science Publishers Ltd.

Datasets Sites pSer c pThr d pTyr e Total

(+) site a 359 263 1 623

(+) site 153 47 11 211

(+) site 754 135 45 934

DISPHOS (DISorder-enhanced PHOSphorylation site (TP TN ) ( FN FP) .

Tools DATASET Sn Sp Ac MCC

rice 94.86 2.4 15.09 -0.0573

KinasePhos Arabidopsis 95.73 3.17 5.36 -0.0095

mammalian 98.93 1.62 4.58 0.0076

rice 38.04 63 59.56 0.0075

DISPHOS Arabidopsis 63.98 66.47 66.41 0.0976

mammalian 86.51 61.42 62.18 0.1679

rice 59.07 52.41 53.33 0.079

NetPhosK Arabidopsis 66.82 50.26 50.65 0.052

mammalian 82.33 43.63 44.81 0.0901

rice 93.42 4.19 16.43 -0.0397

PPSP Arabidopsis 97.16 5.95 8.12 0.0201

mammalian 98.93 8.03 10.8 0.0445

rice 13.48 87.13 77.03 0.0063

Scansite Arabidopsis 18.48 86.06 84.45 0.0199

mammalian 42.08 85.08 83.78 0.128

rice 25.33 88.44 84.03 0.1061

PredPhospho Arabidopsis 59.61 77.01 76.7 0.1141

mammalian 78.88 77.65 77.68 0.1971

weighted voting method to incorporate 15 phosphorylation-

You might also like