© The Author 2012. Published by Oxford University Press. All rights reserved.
For Permissions, please e-mail: journals.permissions@oup.com
L.Wang et al.
et al. (2007) performed spatial comparisons of physicochemical interactions shared by functionally similar protein–protein complexes and identified hot spots. Grosdidier and Fernandez-Recio (2008) predicted hot spots by computational docking without protein complex information. del Sol and O’Meara (2005) represented protein complexes as small-world networks and showed that a relatively small number of highly central residues correspond to hot spots. Li and Liu (2009) used a bipartite graph representation of protein complexes; maximal biclique subgraphs were then identified from the bipartite graphs to locate biclique patterns that might correspond to hot spots. Tuncbag et al. (2010a) analyzed residue contact networks of protein interfaces using minimum cut trees and observed that the highest degree node

mutated interface residues derived from 20 protein complexes was taken from ASEdb (Thorn and Bogan, 2001). The protein complexes are non-homologous: at least one protein in each interface shares <35% sequence identity with every other protein in the dataset. The sequence identity between proteins was calculated using the PISCES sequence culling server (Wang and Dunbrack, 2003). A residue was defined as a hot spot if its change of binding energy (ΔΔG) is ≥2.0 kcal/mol upon mutation to alanine. Thus, these interface residues were divided into 77 hot spots and 241 non-hot spots. This dataset was used as our training set. Additionally, the dataset of Cho et al. (2009), which was derived from BID (Fischer et al., 2003), was used as an independent test set. Two residues which are not in protein inter-
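The labeling rule for the training set can be illustrated with a short sketch. It assumes the 2.0 kcal/mol threshold is inclusive (ΔΔG ≥ 2.0); the class name, function names and example values are hypothetical and are not taken from ASEdb or from the authors' code.

```python
from dataclasses import dataclass

# Threshold stated in the text (kcal/mol). Treating it as inclusive (>=)
# is an assumption of this sketch.
HOT_SPOT_DDG = 2.0


@dataclass(frozen=True)
class AlanineMutation:
    residue: str  # hypothetical label, e.g. "RES1:A"
    ddg: float    # measured change of binding energy, kcal/mol


def is_hot_spot(mutation, threshold=HOT_SPOT_DDG):
    """Return True when the residue meets the hot-spot threshold."""
    return mutation.ddg >= threshold


def split_by_label(mutations):
    """Partition mutations into (hot spots, non-hot spots)."""
    hot = [m for m in mutations if is_hot_spot(m)]
    non_hot = [m for m in mutations if not is_hot_spot(m)]
    return hot, non_hot


# Made-up values for illustration; these are not ASEdb entries.
example = [
    AlanineMutation("RES1:A", 2.0),
    AlanineMutation("RES2:A", 0.4),
    AlanineMutation("RES3:B", 3.1),
]
hot, non_hot = split_by_label(example)
print([m.residue for m in hot])      # residues labeled as hot spots
print([m.residue for m in non_hot])  # residues labeled as non-hot spots
```

Applied to the full training set, the same partition would yield the 77 hot spots and 241 non-hot spots reported in the text.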
residue and any other residue located at the other face, i.e.

\[ \mathrm{atom\_contact\_areas}(i) = \sum_{j \in \text{the other face}} \; \sum_{a \in i} \; \sum_{b \in j} \mathrm{atom\_contact\_area}(a, b) \tag{2} \]

where atom_contact_area(a, b) is the contact area between two interacting atoms a and b.

Residue contacts and physicochemical features. Two residues will have residue contact information (residue contact) if there is one pair of contact atoms from them individually. Residue contacts of a residue are calculated by summarizing these residue contacts between the residue and other residues

upon complex formation are calculated as

\[ \mathrm{rel\_dpx}(i) = \frac{\mathrm{ave\_dpx_{comp}}(i) - \mathrm{ave\_dpx_{mono}}(i)}{\mathrm{ave\_dpx_{comp}}(i)} \tag{6} \]

\[ \mathrm{rel\_s\text{-}ch\_dpx}(i) = \frac{\mathrm{s\text{-}ch\_ave\_dpx_{comp}}(i) - \mathrm{s\text{-}ch\_ave\_dpx_{mono}}(i)}{\mathrm{s\text{-}ch\_ave\_dpx_{comp}}(i)} \tag{7} \]

Totally, there are 19 descriptors for each residue. For every target residue that we are predicting, the features are encoded from its descriptors and those of its two neighbor residues, i.e. mirror-contact residue and intra-contact residue. The mirror-contact residue of a target residue is defined as its
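The descriptors in equations (2), (6) and (7) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that equations (6) and (7) are relative changes (complex value minus monomer value, divided by the complex value), and the data layout, function names and numbers are hypothetical.

```python
def atom_contact_areas(contact_area, residue_atoms, other_face):
    """Eq. (2): total atom-atom contact area between residue i and the other face.

    contact_area  -- dict mapping (atom_a, atom_b) to a contact area in Å^2;
                     pairs that are absent are treated as not in contact
    residue_atoms -- atom identifiers of the target residue i
    other_face    -- dict mapping each residue j on the other face to its atoms
    """
    return sum(
        contact_area.get((a, b), 0.0)
        for atoms_j in other_face.values()
        for a in residue_atoms
        for b in atoms_j
    )


def relative_change(complexed, monomer):
    """Eqs (6)/(7), as read here: (complex - monomer) / complex."""
    if complexed == 0:
        raise ValueError("depth index in the complex must be non-zero")
    return (complexed - monomer) / complexed


# Hypothetical atoms and areas, for illustration only.
areas = {("i:CA", "j1:O"): 10.5, ("i:CB", "j2:N"): 4.5}
target_atoms = ["i:CA", "i:CB"]
face = {"j1": ["j1:O"], "j2": ["j2:N", "j2:C"]}

total_area = atom_contact_areas(areas, target_atoms, face)
rel_dpx = relative_change(complexed=4.0, monomer=1.0)
print(total_area, rel_dpx)
```

The same `relative_change` helper serves both the whole-residue depth index and the side-chain depth index, since the two equations differ only in which averaged depth values are passed in.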
related to the buried hydrophobic surface area, and large atom contact area is required to make a water-tight seal around a critical set of energetically favorable interactions (Bogan and Thorn, 1998). Figure 2D shows the box plots of atom_contact_areas_target in different residues. The means of hot spots, non-hot spots and all residues are 77.31, 43.72 and 51.85 Å², respectively. The feature differs significantly between hot spots and non-hot spots (P value = $9.46 \times 10^{-11}$). The result indicates that hot spots tend to have large contact surface areas between two interacting proteins.

Case studies

Thrombin binding with thrombomodulin. Serine proteinase alpha-thrombin (PDBID: 1dx5, chain N and chain B) causes blood clotting, and anticoagulation is activated when it binds to thrombomodulin (PDBID: 1dx5, chain J) (Fuentes-Prior et al., 2000). Three hot spots (ARG67:N, TYR76:N and GLU80:N) and 13 non-hot spots have been determined experimentally in thrombin. Among these 16 alanine-mutated residues, our method identified two residues (ARG67:N, TYR76:N) as hot spots and the rest as non-hot spots. Two of the three hot spots and all the non-hot spots were correctly predicted (shown in Supplementary Fig. S1). In contrast, MINERVA predicted two residues (GLN38:N, ILE82:N) as hot spots and the others as non-hot spots, so none of the 3 hot spots and 11 of the 13 non-hot spots were correctly predicted. KFC identified two residues (PHE34:N, TYR76:N) as hot spots and the rest as non-hot spots, correctly predicting 1 of the 3 hot spots and 12 of the 13 non-hot spots. Our method and KFC correctly predicted TYR76:N as a hot spot residue, which is colored green in Fig. 3. It closely contacts the other face of the interface with a high value of atom contact areas of the target residue (atom_contact_areas_target = 97 Å²). Hot spot residue ARG67:N, colored yellow in Fig. 3, was identified only by our method. Both the residue and its mirror-contact residue ILE414:J (colored pink in Fig. 3) are deeply buried, with high values of average depth indices (ave_dpx_target = 3.36 Å and ave_dpx_mirror = 3.23 Å). Although residue ARG67:N loosely contacts the other face with a low value of atom contact areas of the target residue (atom_contact_areas_target = 16.8 Å²), the side chain of its mirror-contact residue ILE414:J stretches into its face of the interface and forms many deeply buried atomic contacts with a high value of atom contact areas of the mirror-contact residue (atom_contact_areas_mirror = 102.7 Å²). Our method successfully identified ARG67:N by using the RF model of integrative features from its mirror-contact residue ILE414:J.

Calmodulin binding with a peptide of smooth muscle myosin light chain kinase. Calmodulin (CaM) is a versatile Ca²⁺-binding protein that regulates the activity of numerous effector proteins in response to Ca²⁺ signals (Meador et al., 1992). Calcium-bound calmodulin (Ca²⁺-CaM) (PDBID:
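The thrombin comparison reported in the case study above (3 hot spots, 13 non-hot spots) can be tallied as confusion counts. The counts below are read from the text; the choice of recall, specificity and precision as summary metrics, and all names in the code, are assumptions of this sketch rather than values reported by the authors for this complex.

```python
from collections import namedtuple

# tp: hot spots predicted as hot spots, fn: hot spots missed,
# tn: non-hot spots predicted as non-hot spots, fp: non-hot spots called hot.
Counts = namedtuple("Counts", "tp fn tn fp")

# Counts derived from the thrombin case study text.
thrombin = {
    "our method": Counts(tp=2, fn=1, tn=13, fp=0),
    "MINERVA": Counts(tp=0, fn=3, tn=11, fp=2),
    "KFC": Counts(tp=1, fn=2, tn=12, fp=1),
}


def recall(c):
    """Fraction of true hot spots that were predicted as hot spots."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Fraction of true non-hot spots that were predicted as non-hot spots."""
    return c.tn / (c.tn + c.fp)


def precision(c):
    """Fraction of predicted hot spots that are true hot spots (0 if none predicted)."""
    predicted = c.tp + c.fp
    return c.tp / predicted if predicted else 0.0


for name, c in thrombin.items():
    print(f"{name}: recall={recall(c):.2f} "
          f"specificity={specificity(c):.2f} precision={precision(c):.2f}")
```

With only 16 mutated residues, these ratios describe a single complex and should not be read as general performance estimates.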
Prediction of hot spots in protein interfaces using a RF model with hybrid features
Our prediction model offers a powerful tool for detecting candidate residues in alanine scanning mutagenesis studies of functional protein interaction sites. Our method is applicable when the 3D structure of a protein complex is available. Extending it to single-molecule structures or even sequences would enlarge the application domain and is an important research topic.

Supplementary data

Supplementary data are available at PEDS online.

Acknowledgements

References

Ahmad,S., Gromiha,M.M. and Sarai,A. (2004) Bioinformatics, 20, 477–486.
Assi,S.A., Tanaka,T., Rabbitts,T.H. and Fernandez-Fuentes,N. (2010) Nucleic Acids Res., 38, e86.
Bogan,A.A. and Thorn,K.S. (1998) J. Mol. Biol., 280, 1–9.
Breiman,L. (2001) Mach. Learn., 45, 5–32.
Chen,L., Wang,R. and Zhang,X. (2009) Biomolecular Network: Methods and Applications in Systems Biology. New Jersey: John Wiley & Sons.
Chen,L., Wang,R., Li,C. and Aihara,K. (2010) Modelling Biomolecular Networks in Cells: Structures and Dynamics. Berlin: Springer.
Cho,K., Kim,D. and Lee,D. (2009) Nucleic Acids Res., 37, 2672–2687.
Darnell,S.J., Page,D. and Mitchell,J.C. (2007) Proteins, 68, 813–823.
del Sol,A. and O’Meara,P. (2005) Proteins, 58, 672–682.
Fauchere,J. and Pliska,V. (1983) Eur. J. Med. Chem., 18, 369–375.
Fischer,T.B., Arunachalam,K.V., Bailey,D., et al. (2003) Bioinformatics, 19, 1453–1454.
Fuentes-Prior,P., Iwanaga,Y., Huber,R., Pagila,R., Rumennik,G., Seto,M., Morser,J., Light,D.R. and Bode,W. (2000) Nature, 404, 518–525.
Grosdidier,S. and Fernandez-Recio,J. (2008) BMC Bioinformatics, 9, 447.
Guerois,R., Nielsen,J.E. and Serrano,L. (2002) J. Mol. Biol., 320, 369–387.
Hu,Z., Ma,B., Wolfson,H. and Nussinov,R. (2000) Proteins, 39, 331–342.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.
Kawashima,S., Ogata,H. and Kanehisa,M. (1999) Nucleic Acids Res., 27, 368–369.
Keskin,O., Ma,B. and Nussinov,R. (2005) J. Mol. Biol., 345, 1281–1294.
Kortemme,T. and Baker,D. (2002) Proc. Natl Acad. Sci. USA, 99, 14116–14121.
Li,J. and Liu,Q. (2009) Bioinformatics, 25, 743–750.
Liaw,A. and Wiener,M. (2002) R News, 2, 18–22.
Liu,Z.P., Wu,L.Y., Wang,Y., Zhang,X.S. and Chen,L. (2010) Bioinformatics, 26, 1616–1622.
Ma,B., Elkayam,T., Wolfson,H. and Nussinov,R. (2003) Proc. Natl Acad. Sci. USA, 100, 5772–5777.
Meador,W.E., Means,A.R. and Quiocho,F.A. (1992) Science, 257, 1251–1255.
Mihel,J., Sikic,M., Tomic,S., Jeren,B. and Vlahovicek,K. (2008) BMC Struct. Biol., 8, 21.
Moreira,I.S., Fernandes,P.A. and Ramos,M.J. (2007) Proteins, 68, 803–812.
Ofran,Y. and Rost,B. (2007) PLoS Comput. Biol., 3, e119.
Robin,X., Turck,N., Hainard,A., Tiberti,N., Lisacek,F., Sanchez,J.C. and Muller,M. (2011) BMC Bioinformatics, 12, 77.
Schreiber,G. and Fersht,A.R. (1995) J. Mol. Biol., 248, 478–486.
Shen,J., Zhang,J., Luo,X., Zhu,W., Yu,K., Chen,K., Li,Y. and Jiang,H. (2007) Proc. Natl Acad. Sci. USA, 104, 4337–4341.
Shulman-Peleg,A., Shatsky,M., Nussinov,R. and Wolfson,H.J. (2007) BMC Biol., 5, 43.
Sobolev,V., Sorokine,A., Prilusky,J., Abola,E.E. and Edelman,M. (1999) Bioinformatics, 15, 327–332.
Thorn,K.S. and Bogan,A.A. (2001) Bioinformatics, 17, 284–285.
Tuncbag,N., Gursoy,A. and Keskin,O. (2009) Bioinformatics, 25,