Professional Documents
Culture Documents
Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA
H I G H L I G H T S
art ic l e i nf o
a b s t r a c t
Article history:
Received 19 October 2012
Received in revised form
24 January 2013
Accepted 18 March 2013
Available online 27 March 2013
Keywords:
Articial sequences
Evolved sequences
Protein threading
Protein structure modeling
Template-based modeling
1. Introduction
Systems biology has emerged to help understand how the
components of complex living systems interact and how their
n
Correspondence address: Department of Biological Sciences, Louisiana State
University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA.
Tel.: +1 225 578 2791.
E-mail address: michal@brylinski.org
0022-5193/$ - see front matter & 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.jtbi.2013.03.018
malfunction causes disease (Kitano, 2002). To perform a systemslevel analysis of molecular interactions, one requires a complete
list of functionally annotated genes and proteins specied by these
genes (or briey gene products) within an organism. Numerous
genome sequencing projects have already provided the community with a vast amount of sequence information; however, due to
the lack of functionally annotated close homologs, the molecular
functions of many of these gene products remain unknown (Mi
et al., 2003). To carry out functional inference in the low sequence
78
2. Methods
2.1. Dataset
The development and optimization of the scoring function and
simulation protocols as well as benchmarking calculations are
carried out for a non-redundant at the 35% sequence identity level
set of 10,558 proteins 50600 residues in length selected from the
Protein Structure Classication (CATH) database (Orengo et al.,
1997). The pairwise sequence identity threshold automatically
excludes close homologs from benchmarks. Pairwise structure
alignments for the dataset proteins are constructed by fr-TMalign (Pandit and Skolnick, 2008) and evaluated by a TM-score,
which is a length-independent measure that ranges from 0 to 1.
A value of 0.4 indicates a statistically signicant similarity with a
p-value of 3.4 105 (Zhang and Skolnick, 2004). We note that a
TM-score of 0.4 is an appropriate fold similarity assignment
threshold; template structures above a TM-score of 0.4 contain
sufcient information to enable the full-length reconstruction of
the target structure (Skolnick et al., 2012).
bur
bur
C bur
A;B N A;B = N i;B
i1
where N bur
i;B is the number of amino acids of type i within the
state B.
The burial score for a given amino acid A is dened as its
composition in the particular state B (calculated from the structure) normalized by its frequency of occurrence, fA, in the dataset:
Sbur
A;B
C bur
A;B
fA
1 n bur
S
n i 1 i;B
1 n sec
S
n i 1 i;D
1 n seq
P
ni1 i
including pseudo-counts:
p
cj F j m
p
P seq
j
m m
79
dFire C
1:8031 n18:669
EdFSC
dFire SC
1:5766 n44:863
Fig. 1. Normalized dFire pseudo-energy scores. Scores are calculated for CATH proteins using (A) C atoms and (B) side chain centers of mass and plotted as a function of
protein length.
80
Potj Potj
sj
!2
ln
1
p
sj 2
where Potj is a standard score of the Pot statistics for amino acid j.
These restraints are nally averaged over 20 residue types to
give anti-bunching pseudo-energy score, Epot:
Epot
1 20 pot
C
ni1 i
10
For template selection and the construction of target-totemplate alignments, we use two popular algorithms: HHpred
(Soding, 2005) and CSI-BLAST (Biegert and Soding, 2009). HHpred
detects distant homologous relationships between proteins based
on the pairwise alignments of prole hidden Markov models and
was shown to outperform other prole-prole comparison methods. CSI-BLAST is a modied version of PSI-BLAST (Altschul et al.,
1997) that derives context-specic amino acid similarities from
short windows centered on each query sequence residue, which
results in a signicant increase of sensitivity as well as alignment
quality, particularly for difcult cases. For each program, we
constructed three template libraries using wild-type sequences
from CATH, these articially evolved from random to stabilize
CATH structures as well as random protein-like sequences.
2.6. Data fusion
CSt r WT
r EV
t
t
w6 E
pot
11
The weight factors were optimized on the CATH dataset. First, for
each structure, we generated a large number of sequences, by
iteratively shufing the native one. Then, we selected 20 nativelike sequences, whose identity to the wild type sequence was
450% (on average every second residue is identical) as well as 30
decoy sequences with the identity of o10% (evidently dissimilar
sequences). Native-like sequences should exhibit similar behavior
in prole-based modeling while decoy sequences will have very
different properties. This procedure resulted in the total number of
527,900 sequences. We employed an algorithm that belongs to
Evolution Strategies, which imitate the principles of natural
evolution as a method to solve parameter optimization problems
(Back and Schwefel, 1993; Back et al., 1992), to select a set weight
factors that maximize the Z-score (the dimensionless ratio of the
rst and second moments of the pseudo-energy distribution
within the native-like pool and the decoy pool)
Zscore
Enat Edec
s
12
13
and r EV
are the template rank in the wild-type and
where r WT
t
t
evolved library, respectively.
2.7. Assessment measures
The ability to select structurally similar templates from the
library is assessed by the area under the accumulation curve
(AUAC), where positives are dened as these structures that have
a TM-score to the target of 0.4. The remaining templates are
considered negatives. Furthermore, we calculate the number of
good templates (TM-score 0.4) found among the top 10
identied templates. Alignment accuracy is evaluated by
Matthew's correlation coefcient (MCC) against reference structure alignments constructed by fr-TM-align (Pandit and Skolnick,
2008):
TP TNFP FN
MCC p
TP FPTP FNTN FPTN FN
14
3. Results
2.4. Sequence evolution engine
The optimization of an amino acid sequence to stabilize a given
structure is carried out by Simulated Annealing (SA). Here
we use the C++ implementation from GNU Scientic Library
(http://www.gnu.org/software/gsl), with the following SA parameters: N_TRIES 200, ITERS_FIXED_T 2000, K 1.0, T_INITIAL
5000, MU_T 1.002 and T_MIN 0.005. A single Monte Carlo step
swaps a pair of randomly selected residues in the evolving
sequence. The starting sequences are always random with a
generic protein-like composition according to amino acid frequencies provided by UniProtKB/Swiss-Prot (Boutet et al., 2007).
81
Fig. 2. Most effective scoring terms in discriminating between wild-type and random sequences. Distribution of selected pseudo-energy scores assigned to wild-type as well as
random protein-like sequences across the CATH dataset: (A) secondary structure score, (B) burial score, (C) dFire-SC score and (D) sequence prole score.
82
Fig. 3. Characteristics of evolved sequences. (A) Sequence identity to wild-type calculated for evolved and random sequences. (B) Degeneracy of sequence proles constructed
for wild-type, evolved and random sequences measured by Pearson's chi-squared test (2-statistic). Boxes end at the quartiles Q1 and Q3; a horizontal line in a box is the
median. Whiskers point at the farthest points that are within 3/2 times the interquartile range.
83
Fig. 4. Ability of evolved sequences to recognize the native-like fold. Accuracy of template selection by HHpred and CSI-BLAST from the wild-type, evolved as well as random
CATH libraries using evolved target sequences: (A) area under the accumulation curve (AUAC), (B) number of structurally similar proteins within the top 10 identied
templates.
Fig. 5. Performance of articially evolved templates in fold recognition. Accuracy of template selection by HHpred and CSI-BLAST from the wild-type, evolved as well as random
CATH libraries using wild-type target sequences: (A) area under the accumulation curve (AUAC), (B) number of structurally similar proteins within the top 10 identied
templates. Data fusion ranking is obtained by merging ranks assigned using wild-type and evolved libraries.
84
Fig. 6. Accuracy of target-to-template threading alignments. Alignments are constructed for wild-type target sequences by (A) HHpred and (B) CSI-BLAST using wild-type,
evolved as well as random CATH libraries and assessed by Matthew's correlation coefcient (MCC) against structure alignments. Inset plots show the average TM-score
calculated for threading alignments compared to the corresponding structure alignments for (W) wild-type, (E) evolved and (R) random CATH libraries.
85
Fig. 7. Template ranking using wild-type and articially evolved libraries. Area under the accumulation curve for template selection from wild-type and evolved libraries using
wild-type target sequences. (A) HHpred and (B) CSI-BLAST. Each dot represents one CATH target, the regression line and the diagonal is solid and dashed, respectively.
Fig. 8. Condence estimates for template selection. Distribution of the area under the accumulation curve (AUAC) for template selection from wild-type and evolved libraries
using wild-type target sequences. Targets are grouped into 5 condence bins based on the (A) probability values estimated by HHpred and (B) E-values by CSI-BLAST.
HHpred: 2hpkA00 (rank 22), 1qv0A00 (rank 41), 2zfdA00 (rank 12)
and 1yr5A00 (rank 24). Despite their low sequence identity to the
target (25.2%, 17.6%, 20.8% and 22.3%, respectively), all these
protein domains are structurally related with a TM-score of 0.49,
0.48, 0.43 and 0.43, respectively (see Fig. 9B and D). By using
HHpred and the evolved template library, these templates are
found within the top 10 ranks; 2hpkA00, 1qv0A00, 2zfdA00 and
1yr5A00 are now ranked 2, 3, 4 and 8, respectively. Note that the
similarity of evolved sequences to the target sequence remains at a
comparable level: 22.9%, 18.8%, 23.6% and 17.8%, respectively.
Another interesting example is Gram-negative porin from Rhodobacter capsulatus (CATH domain code: 2porA00) and its four
weakly homologous templates: 3dwnA00, 2gskA02, 2iahA03 and
1kmoA02. Their TM-score/wild-type/evolved sequence identity to
the target is 0.50/23.8%/18.9%, 0.57/23.6%/22.1%, 0.58/23.2%/22.3%
and 0.59/24.5%/19.8%, respectively. Structural alignments of these
templates to the target are shown in Fig. 9EH. By using evolved
template sequences rather than the wild-type ones, the ranking of
these templates systematically improves from 13 to 1, from 22 to 3,
from 4100 to 3, and from 4100 to 6, respectively.
Our second case study shows that even when threading of
wild-type target sequences using both wild-type and evolved
template libraries correctly identies structurally similar proteins,
86
Fig. 9. Examples of improved template recognition using evolved template library. Top panel: template-to-target structural alignments of 1sraA00 (target) and (A) 2hpkA00,
(B) 1qv0A00, (C) 2zfdA00, and (D) 1yr5A00 (templates). Bottom panel: template-to-target structural alignments of 2porA00 (target) and (E) 3dwnA00, (F) 2gskA02,
(G) 2iahA03, and (H) 1kmoA02 (templates). Target and template structures are colored in green and yellow, respectively; the aligned region is solid. (For interpretation of the
references to colour in this gure legend, the reader is referred to the web version of this article.)
4. Discussion
Many existing approaches to protein threading and fold recognition exhibit extremely high false negative rates due to the fact
that the vast majority of pairs of proteins with similar structures
populate the midnight zone of sequence identity (Rost, 1999).
Most of the scoring schemes use as a basis a strong sequence
prole component, which is designed to detect evolutionary
relationships at the sequence level rather than structure similarities (Biegert and Soding, 2009; Eddy, 1998; Hughey and Krogh,
1996; Sadreyev and Grishin, 2003; Soding, 2005). Since most
proteins with similar structures in the midnight zone are likely
the products of convergent or divergent evolution (Doolittle, 1994;
Rost, 1997), they remain undetectable even for very sensitive
homology-based methods. It has been demonstrated that for an
arbitrary protein, structural analogs are likely present in the PDB
(Zhang and Skolnick, 2005; Zhang et al., 2006), yet a considerable
fraction of protein sequences fall into the hard target category
(Peng and Xu, 2010). One possible solution to this problem would
be to develop a new class of purely structure-based algorithms;
however, extensive efforts in this direction brought about rather
limited success (Kryshtafovych et al., 2005; Moult et al., 2011,
2009). In this study we explore the possibility of using synthetic
amino acid sequences instead of the wild-type ones to enhance
the detection of structural analogs.
We developed a method for the optimization of generic
protein-like amino acid sequences to stabilize the respective
structures using several statistical potentials, which are compatible with these used in protein threading and fold recognition. In
extensive benchmarks, we show that the articially evolved
sequences, despite their low sequence identity to the wild-type
counterparts, have signicant capabilities to recognize the correct
structures. Furthermore, when state-of-the-art threading is
applied to both wild-type and articially evolved template
87
Fig. 10. Examples of more accurate target-to-template alignments constructed using evolved template sequences. (A) Alignments constructed by CSI-BLAST for 1j58A02 (target)
and 1gqgC02 (template); (B) alignments constructed by HHpred for 1gkpA01 (target) and 1rk6A01 (template). Left panel shows template-to-target structural alignment by
fr-TM-align. Target and template structures are colored in red and orange, respectively; the aligned region is solid. Plots on the right show C-RMSD per residue calculated
over residue positions aligned by fr-TM-align; reference structural alignments are compared to threading alignments obtained using wild-type and evolved template
sequences. (For interpretation of the references to colour in this gure legend, the reader is referred to the web version of this article.)
5. Conclusions
We introduce a novel modeling stratagem, which employs a
library of synthetic sequences to improve template ranking in fold
recognition by sequence prole-based methods. Regardless of its
current limitations, it represents a new direction in the development of more sensitive threading approaches with the enhanced
capabilities of targeting difcult, midnight zone templates.
Availability
Datasets and modeling results are available free of charge at
http://www.brylinski.org/evolver.
Acknowledgments
This study was supported by LSU Council on Research through
the 2012 Summer Stipend Program. Portions of this research were
conducted with high performance computational resources provided by Louisiana State University (http://www.hpc.lsu.edu).
References
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.,
1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs. Nucl. Acids Res. 25, 33893402.
Back, T., Schwefel, H.P., 1993. An overview of evolutionary algorithms for parameter
optimization. Evol. Comput. 1, 123.
Back, T., Hoffmeister, F., Schwefel, H.P., 1992. A Survey of Evolution Strategies. In:
Proceedings of the Fourth International Conference on Genetic Algorithms, San
Mateo, CA, pp. 29.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H.,
Shindyalov, I.N., Bourne, P.E., 2000. The protein data bank. Nucl. Acids Res.
28, 235242.
Biegert, A., Soding, J., 2009. Sequence context-specic proles for homology
searching. Proc. Natl. Acad. Sci. USA 106, 37703775, http://dx.doi.org/
10.1073/pnas.0810767106.
Boutet, E., Lieberherr, D., Tognolli, M., Schneider, M., Bairoch, A., 2007. UniProtKB/
Swiss-Prot. Methods Mol. Biol. 406, 89112.
Brylinski, M., Skolnick, J., 2008. A threading-based method (FINDSITE) for ligandbinding site prediction and functional annotation. Proc. Natl. Acad. Sci. USA 105,
129134.
Brylinski, M., Skolnick, J., 2011. FINDSITE-metal: integrating evolutionary information
and machine learning for structure-based metal-binding site prediction at the
proteome level. Proteins 79, 735751, http://dx.doi.org/10.1002/prot.22913. [doi].
Brylinski, M., Prymula, K., Jurkowski, W., Kochanczyk, M., Stawowczyk, E.,
Konieczny, L., Roterman, I., 2007. Prediction of functional sites based on the
fuzzy oil drop model. PLoS Comput. Biol. 3, e94.
Chen, H., Kihara, D., 2011. Effect of using suboptimal alignments in template-based
protein structure prediction. Proteins 79, 315334, http://dx.doi.org/10.1002/
prot.22885.
Dai, L., Zhou, Y., 2011. Characterizing the existing and potential structural space of
proteins by large-scale multiple loop permutations. J. Mol. Biol. 408, 585595,
http://dx.doi.org/10.1016/j.jmb.2011.02.056.
Doolittle, R.F., 1994. Convergent evolution: the need to be explicit. Trends Biochem.
Sci. 19, 1518.
Drew, K., Winters, P., Butterfoss, G.L., Berstis, V., Uplinger, K., Armstrong, J., Rife, M.,
Schweighofer, E., Bovermann, B., Goodlett, D.R., Davis, T.N., Shasha, D.,
Malmstrom, L., Bonneau, R., 2011. The Proteome Folding Project: proteomescale prediction of structure and function. Genome Res. 21, 19811994
http://dx.doi.org/10.1101/gr.121475.111.
Eddy, S.R., 1998. Prole hidden Markov models. Bioinformatics 14, 755763.
Elcock, A.H., 2001. Prediction of functionally important residues based solely on the
computed energetics of protein structure. J. Mol. Biol. 312, 885896, http://dx.
doi.org/10.1006/jmbi.2001.5009.
88
Frishman, D., Argos, P., 1995. Knowledge-based protein secondary structure assignment. Proteins 23, 566579, http://dx.doi.org/10.1002/prot.340230412.
Ginalski, K., 2006. Comparative modeling for protein structure prediction. Curr.
Opin. Struct. Biol. 16, 172177, http://dx.doi.org/10.1016/j.sbi.2006.02.003.
Ginn, C.M.R., Willett, P., Bradshaw, J., 2000. Combination of molecular similarity
measures using data fusion. Perspect. Drug Discov. Des. 20, 116.
Guharoy, M., Chakrabarti, P., 2005. Conservation and relative importance of
residues across protein-protein interfaces. Proc. Natl. Acad. Sci. USA 102,
1544715452, http://dx.doi.org/10.1073/pnas.0505425102.
Hall, D.L., Llinas, J., 1997. An introduction to multisensor data fusion. Proc. IEEE 85,
623.
Hert, J., Willett, P., Wilton, D.J., Acklin, P., Azzaoui, K., Jacoby, E., Schuffenhauer, A.,
2004. Comparison of ngerprint-based methods for virtual screening
using multiple bioactive reference structures. J. Chem. Inf. Comput. Sci. 44,
11771185, http://dx.doi.org/10.1021/ci034231b.
Hetenyi, C., van der Spoel, D., 2006. Blind docking of drug-sized compounds to
proteins with up to a thousand residues. FEBS Lett. 580, 14471450, http://dx.
doi.org/10.1016/j.febslet.2006.01.074.
Huang, B., Schroeder, M., 2006. LIGSITEcsc: predicting ligand binding sites using the
Connolly surface and degree of conservation. BMC Struct. Biol. 6, 19, http://dx.
doi.org/10.1186/1472-6807-6-19.
Hughey, R., Krogh, A., 1996. Hidden Markov models for sequence analysis:
extension and analysis of the basic method. Comput. Appl. Biosci. 12, 95107.
Jones, D.T., 1999. Protein secondary structure prediction based on position-specic
scoring matrices. J. Mol. Biol. 292, 195202, http://dx.doi.org/10.1006/
jmbi.1999.3091.
Jones, D.T., Taylor, W.R., Thornton, J.M., 1992. A new approach to protein fold
recognition. Nature 358, 8689, http://dx.doi.org/10.1038/358086a0.
Karchin, R., Cline, M., Karplus, K., 2004. Evaluation of local structure alphabets
based on residue burial. Proteins 55, 508518, http://dx.doi.org/10.1002/
prot.20008.
Kitano, H., 2002. Systems biology: a brief overview. Science 295, 16621664, http:
//dx.doi.org/10.1126/science.1069492.
Kryshtafovych, A., Venclovas, C., Fidelis, K., Molt, J., 2005. Progress over the rst
decade of CASP experiments. Proteins 61 (7), 225236, http://dx.doi.org/
10.1002/prot.20740.
Kurowski, M.A., Bujnicki, J.M., 2003. GeneSilico protein structure prediction metaserver. Nucl. Acids Res. 31, 33053307.
Kuziemko, A., Honig, B., Petrey, D., 2011. Using structure to explore the sequence
alignment space of remote homologs. PLoS Comput. Biol. 7, e1002175,
http://dx.doi.org/10.1371/journal.pcbi.1002175.
Liu, S., Vakser, I.A., 2011. DECK: Distance and environment-dependent, coarsegrained, knowledge-based potentials for protein-protein docking. BMC Bioinform. 12, 280, http://dx.doi.org/10.1186/1471-2105-12-280.
Lundstrom, J., Rychlewski, L., Bujnicki, J., Elofsson, A., 2001. Pcons: a neuralnetwork-based consensus predictor that improves fold recognition. Protein
Sci. 10, 23542362.
Marti-Renom, M.A., Stuart, A.C., Fiser, A., Sanchez, R., Melo, F., Sali, A., 2000.
Comparative protein structure modeling of genes and genomes. Annu. Rev.
Biophys. Biomol. Struct. 29, 291325, http://dx.doi.org/10.1146/annurev.
biophys.29.1.291.
McGufn, L.J., Smith, R.T., Bryson, K., Sorensen, S.A., Jones, D.T., 2006. High
throughput prole-prole based fold recognition for the entire human proteome. BMC Bioinform. 7, 288, http://dx.doi.org/10.1186/1471-2105-7-288.
Mi, H., Vandergriff, J., Campbell, M., Narechania, A., Majoros, W., Lewis, S., Thomas, P.D.,
Ashburner, M., 2003. Assessment of genome-wide protein function classication
for Drosophila melanogaster. Genome Res. 13, 21182128, http://dx.doi.org/10.1101/
gr.771603.
Moult, J., Fidelis, K., Kryshtafovych, A., Tramontano, A., 2011. Critical assessment of
methods of protein structure prediction (CASP)round IX. Proteins 79 (10),
15, http://dx.doi.org/10.1002/prot.23200.
Moult, J., Fidelis, K., Kryshtafovych, A., Rost, B., Tramontano, A., 2009. Critical
assessment of methods of protein structure predictionRound VIII. Proteins 77
(9), 14, http://dx.doi.org/10.1002/prot.22589.
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., Thornton, J.M., 1997.
CATHa hierarchic classication of protein domain structures. Structure 5,
10931108.
Pandit, S.B., Skolnick, J., 2008. Fr-TM-align: a new protein structural alignment
method based on fragment alignments and the TM-score. BMC Bioinform. 9,
531, http://dx.doi.org/10.1186/1471-2105-9-531.
Pandit, S.B., Skolnick, J., 2010. TASSER_low-zsc: an approach to improve structure
prediction using low z-score-ranked templates. Proteins 78, 27692780,
http://dx.doi.org/10.1002/prot.22791.
Peng, J., Xu, J., 2010. Low-homology protein threading. Bioinformatics 26, i294i300, http://dx.doi.org/10.1093/bioinformatics/btq192.
Peng, J., Xu, J., 2011. A multiple-template approach to protein threading. Proteins
79, 19301939, http://dx.doi.org/10.1002/prot.23016.
Rost, B., 1997. Protein structures sustain evolutionary drift. Fold Des. 2, S19S24.
Rost, B., 1999. Twilight zone of protein sequence alignments. Protein Eng. 12,
8594.
Sadreyev, R., Grishin, N., 2003. COMPASS: a tool for comparison of multiple protein
alignments with assessment of statistical signicance. J. Mol. Biol. 326,
317336.
Schmidt, H., 2000. A proposed measure for psi-induced bunching of randomly
spaced events. J. Parapsychol. 64, 301316.
Shah, M., Passovets, S., Kim, D., Ellrott, K., Wang, L., Vokler, I., LoCascio, P., Xu, D.,
Xu, Y., 2003. A computational pipeline for protein structure prediction and
analysis at genome scale. Bioinformatics 19, 19851996.
Skolnick, J., Brylinski, M., 2009. FINDSITE: a combined evolution/structure-based
approach to protein function prediction. Brief Bioinform. 10, 378391, doi:
bbp017 [pii]10.1093/bib/bbp017 [doi].
Skolnick, J., Zhou, H., Brylinski, M., 2012. Further evidence for the likely completeness of the library of solved single domain protein structures. J. Phys. Chem. B
116, 66546664, http://dx.doi.org/10.1021/jp211052j.
Skolnick, J., Arakaki, A.K., Lee, S.Y., Brylinski, M., 2009. The continuity of protein
structure space is an intrinsic property of proteins. Proc. Natl. Acad. Sci. USA
106, 1569015695, http://dx.doi.org/10.1073/pnas.0907683106.
Soding, J., 2005. Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951960, http://dx.doi.org/10.1093/bioinformatics/bti125.
Taylor, W.R., Chelliah, V., Hollup, S.M., MacDonald, J.T., Jonassen, I., 2009. Probing
the dark matter of protein fold space. Structure 17, 12441252, http://dx.doi.
org/10.1016/j.str.2009.07.012.
Wass, M.N., Kelley, L.A., Sternberg, M.J., 2010. 3DLigandSite: predicting ligandbinding sites using similar structures. Nucleic Acids Res 38, W469W473,
http://dx.doi.org/10.1093/nar/gkq406.
Wu, S., Zhang, Y., 2007. LOMETS: a local meta-threading-server for protein
structure prediction. Nucleic Acids Res. 35, 33753382, http://dx.doi.org/
10.1093/nar/gkm251.
Zhang, C., Liu, S., Zhou, H., Zhou, Y., 2004. An accurate, residue-level, pair potential
of mean force for folding and binding based on the distance-scaled, ideal-gas
reference state. Protein Sci. 13, 400411, http://dx.doi.org/10.1110/ps.03348304.
Zhang, Y., 2009. I-TASSER: fully automated protein structure prediction in CASP8.
Proteins 77 (9), 100113, http://dx.doi.org/10.1002/prot.22588.
Zhang, Y., Skolnick, J., 2004. Scoring function for automated assessment of protein
structure template quality. Proteins 57, 702710, http://dx.doi.org/10.1002/
prot.20264.
Zhang, Y., Skolnick, J., 2005. The protein structure prediction problem could be
solved using the current PDB library. Proc. Natl. Acad. Sci. USA 102, 10291034,
http://dx.doi.org/10.1073/pnas.0407152101.
Zhang, Y., Hubner, I.A., Arakaki, A.K., Shakhnovich, E., Skolnick, J., 2006. On the
origin and highly likely completeness of single-domain protein structures.
Proc. Natl. Acad. Sci. USA 103, 26052610, http://dx.doi.org/10.1073/pnas.
0509379103.