
proteins

STRUCTURE • FUNCTION • BIOINFORMATICS

Conservation of structural fluctuations in homologous protein kinases and its implications on functional sites
Raju Kalaivani,1 Alexandre G. de Brevern,2,3,4,5 and Narayanaswamy Srinivasan1*
1 Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka 560012, India
2 INSERM, U 1134, DSIMB, Paris F-75739, France
3 Sorbonne Paris Cite, University of Paris Diderot, Paris F-75739, France
4 Institut National de la Transfusion Sanguine (INTS), Paris F-75739, France
5 Laboratoire d’Excellence GR-Ex, Paris F-75739, France

ABSTRACT

Our aim is to explore the similarities in structural fluctuations of homologous kinases. Gaussian Network Model based Normal Mode Analysis was performed on 73 active-conformation structures in the Ser/Thr/Tyr kinase superfamily. Categories of kinases with progressive evolutionary divergence, viz. (i) Same kinase with many crystal structures, (ii) Within-Subfamily, (iii) Within-Family, (iv) Within-Group, and (v) Across-Group, were analyzed. We identified a flexibility signature conserved in all kinases, involving residues in and around the catalytic loop with consistently low-magnitude fluctuations. However, the overall structural fluctuation profiles are conserved better in closely related kinases (Within-Subfamily and Within-Family) than in distant ones (Within-Group and Across-Group). A substantial 65.4% of the variation in flexibility was not accounted for by variation in sequences or structures. Interestingly, we identified substructural residue-wise fluctuation patterns characteristic of kinases of different categories. Specifically, we recognized statistically significant fluctuations unique to the families of protein kinase A, cyclin-dependent kinases, and nonreceptor tyrosine kinases. These fluctuation signatures localized to sites known to participate in protein-protein interactions typical of these kinase families. We report for the first time that residues characterized by fluctuations unique to the group/family are involved in interactions specific to the group/family. As highlighted for the Src family, local regions with differential fluctuations are proposed as attractive targets for drug design. Overall, our study underscores the importance of considering fluctuations, over and above sequence and structural features, in understanding the roles of sites characteristic of kinases.

Proteins 2016; 84:957–978.


© 2016 Wiley Periodicals, Inc.

Key words: homologous proteins; conservation of protein flexibility; structural fluctuations; functional site identification;
family-specific interactions; STY kinases; protein kinases.

Additional Supporting Information may be found in the online version of this article.
Abbreviations: Abl, Abelson murine leukemia homolog; Ack, activated cdc42-associated tyrosine kinase; AGC, group containing PKA, PKG, PKC, and related families; Akt, PKB or protein kinase B; CAMK, group containing calcium/calmodulin regulated kinases and related families; CDK, cyclin-dependent kinase; CDKN3, cyclin-dependent kinase inhibitor 3; CK1, group containing casein kinase 1 and related families; CKS, cyclin-dependent kinase regulatory subunit; CMGC, group containing CDK, MAPK, GSK3, CLK, and related families; DAPK, death-associated protein kinase; DMPK, myotonic dystrophy protein kinase; EGFR, epidermal growth factor receptor; ENM, elastic network model; FS score, flexibility similarity score; GNM, Gaussian network model; GRK, G-protein coupled receptor kinase; GSK, glycogen synthase kinase 3; InsR, family containing insulin receptor and associated kinases; MAPK, mitogen-activated protein kinase; MAST, microtubule-associated serine/threonine kinase; NDR, nuclear DBF2-related kinases; NMA, normal mode analysis; nRTK, nonreceptor tyrosine kinase; PDK1, phosphoinositide-dependent protein kinase 1; PHK, phosphorylase kinase; PKA, protein kinase A or cAMP-dependent protein kinase; PKC, protein kinase C; PKG, protein kinase G or cGMP-dependent protein kinase; PKN, protein kinase N; PTF, phototrophin and flippase kinases; RSK, ribosomal S6 kinase; RSKL, RSK-like kinases; RSKR, RSK-related kinases; RTK, receptor tyrosine kinase; SGK, family containing serum and glucocorticoid responsive kinase and related kinases; SH2, Src homology 2; Src, family containing SrcA, SrcB, Frk, and related subfamilies; SRPK, SR protein kinase, which phosphorylates serine/arginine-rich splicing factors; STE, group containing MAP kinase cascade kinases; STE20, family containing MAP4K (MAP kinase kinase kinase kinase) and related kinases; STY kinases, serine/threonine/tyrosine kinases; TK, tyrosine kinase; YANK, yet another novel kinase.

Grant sponsors: Indo-French Collaborative Grant CEFIPRA, Grant number: 5203–2 (to N.S. and A.dB.); Mathematical Biology Initiative, Department of Science and Technology and Department of Biotechnology, Government of India, J. C. Bose National Fellowship (to N.S.), National Institute for Health and Medical Research (INSERM), University Paris Diderot, Sorbonne Paris Cite, National Institute for Blood Transfusion (INTS), and laboratory of excellence GR-Ex (to A.dB.), and University Grants Commission, Government of India (to R.K.).
*Correspondence to: Narayanaswamy Srinivasan, Lab no. 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, Karnataka, India.
E-mail: ns@mbu.iisc.ernet.in
Received 20 August 2015; Revised 2 February 2016; Accepted 17 March 2016
Published online 30 March 2016 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/prot.25044



PROTEINS 957
R. Kalaivani et al.

INTRODUCTION

Consideration of homology has profoundly influenced our understanding of functional relatedness between proteins. However, traditional homology detection methods using sequence,1–5 structure,6–10 and functional motif11–14 similarities have shortcomings. A trait that is tightly related to a protein's function would be a more effective proxy for understanding relatedness among proteins. Recent studies have shown that the dynamics of proteins carry activity-specific information15 and are coupled to their functions.16 However, studies that comprehensively analyse the relationship between dynamics and functions of related proteins are sparse.17–20 In the current work, we characterized protein mobility using the fluctuations derived from Gaussian Network Model (GNM) based Normal Mode Analysis (NMA). Variations in structural fluctuations among homologous kinases are then viewed in the light of hierarchical categories such as subfamily and family. In summary, we primarily aim to understand how the mobility of homologous kinases varies, and how well this variation is consistent with the hierarchical grouping of kinases.

Protein kinases, which target the hydroxyl group in the side-chains of Ser, Thr, and Tyr residues, were chosen as the model system of homologous proteins. They form a large and well-studied protein superfamily, with several structures belonging to diverse families/subfamilies already available. They catalyse phosphotransfer from a nucleoside triphosphate to a Ser/Thr/Tyr residue on the cognate protein substrate. For this reason, we refer to them as STY kinases. Various studies have analyzed their sequence,21–23 structure,24,25 and function/mechanism.26,27 Reflecting the diverse regulatory mechanisms that tightly control the activity of different STY kinases, their catalytic domains share a modest 20% sequence identity across groups of families.28 On the other hand, consistent with their common ancestry and catalytic mechanism, all members of the superfamily have a conserved structural fold29 made of a smaller N-terminal lobe and a larger C-terminal lobe. While the C-lobe is predominantly helical, the N-lobe mainly contains β sheets and a conserved helical region referred to as the αC-helix. The αC-helix is a crucial structural motif whose conformational variation has significant consequences for the activity level of the kinase.25,30 Besides the αC-helix, the ATP-binding loop is the other functionally important flexible motif in the N-lobe; it is involved in binding the nucleoside in the catalytic cleft. Connected to the N-lobe through the hinge region is the C-lobe, which contains the catalytic loop, with a completely conserved aspartate,31 followed by the activation loop. Both these loops are known to play significant roles in the switch of the kinases25 between two extreme functional states: active and inactive. Through several structural studies,30 it has been shown that STY kinases adopt highly similar conformations in their active states, especially in the functional motif regions. In contrast, the inactive-state conformations of STY kinases are, in general, less similar.

Although the proteins of the STY kinase superfamily broadly perform the same catalytic function, they are involved in a myriad of signalling pathways,32 are regulated by varied mechanisms33–38 and localize in different cellular regions, resulting in distinct biochemical effects.39–46 The sequence and structure features of STY kinases alone cannot identify their cognate substrates, specific regulatory mechanisms, or interactions with other proteins. We hypothesise that the structural fluctuations, or mobility, of STY kinases are an important consideration for understanding the basis of their regulatory features, especially their specific recognition of other domains/proteins. If this hypothesis is valid, one would expect the structural fluctuations to be better conserved in closely related STY kinases than in distantly related ones. For example, fluctuations would be conserved better within a subfamily of kinases than across subfamilies within a family. If the structural fluctuations of related STY kinases are indeed better conserved, is this only a trivial consequence of the conservation of sequence and structure information? Or do the fluctuations truly contain information that is not inherent in sequences and structures? How do such conserved structural fluctuations, if present, relate to the specific association of kinases with other domains/proteins?

To answer these questions, we first grouped the STY kinases into five categories: (i) Same kinase with more than one crystal structure available, (ii) Within-Subfamily, (iii) Within-Family, (iv) Within-Group, and (v) Across-Group. These categories are derived from the KinBase21 classification scheme (see Methods) and correspond to progressive evolutionary divergence of kinases. STY kinases within a group have similar substrate site specificity; those within a family have related regulatory features; and those within a subfamily have similar sequence and regulatory features across phyla. GNM-based NMA was performed on 73 active-conformation structures of STY kinase catalytic domains (Table I), using only the spatial positions of Cα atoms (see “Gaussian network model based normal mode analysis” in Methods). We characterized the resulting structural fluctuations of STY kinases of different categories in the light of their regulatory features, such as specific association with domains/proteins.

METHODS

KinBase classification of STY kinases

Hanks and Hunter originally devised a scheme for classifying47 STY kinases based on the sequence similarity between their catalytic domains. Based on the

Structural Fluctuations in Protein Kinases

Table I
Data Set of 73 STY Kinases Used in the Present Study

     KinBase classification                              Kinase domain
no.  Group  Family       Subfamily  UniProt ID  PDB ID   ATOM residues  Res (Å)
1    AGC    Akt                     P31751      1O6K_A   152–409        1.7
                                                1O6L_A   152–409        1.6
2    AGC    DMPK         GEK        Q09013      2VD5_A   71–339         2.8
3    AGC    DMPK         ROCK       Q28021      2F2U_A   92–354         2.4
4    AGC    PDK1                    O15530      1H1W_A   82–342         2
5    AGC    PKA                     P05132      1APM_E   43–297         2
                                                1ATP_E   43–297         2.2
                                                1FMO_E   43–297         2.2
                                                1JBP_E   43–297         2.2
                                                1L3R_E   43–297         2
                                                1RDQ_E   43–297         1.26
                                                2CPK_E   43–297         2.7
                                                2ERZ_E   43–297         2.2
6    AGC    PKA                     P36887      1CDK_A   43–297         2
7    AGC    PKC          PKCa       P05771      2I0E_A   342–600        2.6
8    AGC    PKC          PKCd       Q04759      2JED_A   380–634        2.32
9    AGC    PKC          PKCi       P41743      1ZRZ_A   245–513        3
10   CAMK   CAMKL        CHK1       O14757      1IA8_A   9–265          1.7
11   CAMK   DAPK         DAPK       P53355      1JKK_A   13–275         2.4
                                                1IG1_A   13–275         1.8
12   CAMK   MAPKAPK      MK2        P49137      1NXK_A   64–325         2.7
13   CAMK   PHK                     P00518      1PHK_A   19–287         2.2
                                                2PHK_A   19–287         2.6
14   CAMK   PIM                     P11309      1XR1_A   38–290         2.1
15   CK1    CK1          CK1-D      Q06486      1CKJ_A   12–282         2.46
16   CMGC   CDK          CDK2       P24941      1FIN_A   4–286          2.3
                                                1JST_A   4–286          2.6
                                                1JSU_A   4–286          2.3
                                                1QMZ_A   4–286          2.2
                                                1W98_A   4–286          2.15
17   CMGC   CDK          CDK4       Q00534      1JOW_B   4–286          3.1
18   CMGC   CDK          CDK5       Q00535      1H4L_A   4–286          2.65
19   CMGC   DYRK         DYRK1      Q13627      2VX3_A   159–479        2.4
20   CMGC   GSK                     P49841      1GNG_A   56–340         2.6
                                                1O9U_A   56–340         2.4
21   CMGC   MAPK         ERK1       P63086      2ERK_A   23–311         2.4
22   CMGC   MAPK         p38        P53778      1CM8_A   27–311         2.4
23   CMGC   MAPK         p38        P47811      3PY3_A   24–308         2.1
24   CMGC   SRPK                    Q96SB4      1WAK_A   80–653         1.73
                                                1WBP_A   80–653         2.4
25   Other  Aur                     O14965      1OL5_A   133–383        2.5
26   STE    STE-Unique              Q99558      4DN5_A   400–655        2.5
27   STE    STE20        PAKA       Q13153      1YHV_A   270–521        1.8
                                                1YHW_A   270–521        1.8
                                                3Q52_A   270–521        1.8
                                                3Q53_A   270–521        2.09
28   STE    STE20        TAO        Q9JLS3      1U5Q_A   28–281         2.1
                                                1U5R_A   28–281         2.1
29   TK     Abl                     P00519      1OPL_A   261–512        3.42
                                                2F4J_A   242–493        1.91
                                                2G2I_A   242–493        3.12
                                                2GQG_A   242–493        2.4
30   TK     Ack                     Q07912      1U46_A   126–385        2
                                                1U4D_A   126–385        2.1
                                                1U54_A   126–385        2.8
31   TK     Csk                     P32577      1K9A_A   195–445        2.5
32   TK     EGFR                    P00533      1M14_A   688–955        2.6
                                                2GS2_A   688–955        2.8
                                                2ITP_A   712–979        2.74
33   TK     InsR                    P06213      1IR3_A   996–1271       1.9
34   TK     InsR                    P08069      1K3A_A   969–1244       2.1
35   TK     PDGFR                   P10721      1PKG_A   589–927        2.9
36   TK     Src          SrcA       P12931      1Y57_A   267–520        1.91
                                                1YOJ_A   269–522        1.95
                                                1YOL_A   269–522        2.3
                                                1YOM_A   269–522        2.9
37   TK     Src          SrcA       P00523      3DQW_A   267–520        2.02
38   TK     Src          SrcB       P06239      1QPC_A   245–498        1.6
                                                1QPD_A   245–498        2
                                                1QPE_A   245–498        2
                                                1QPJ_A   245–498        2.2
                                                3LCK_A   245–498        1.7
39   TK     VEGFR                   P35968      1VR2_A   834–1162       2.4

A total of 73 STY kinase domain structures, accounting for 39 unique STY kinases, are listed along with their hierarchical KinBase classification. KinBase classifies STY kinases broadly into groups. STY kinases within a group are further clustered into families, and some within-family members are further subcategorised into subfamilies. The UniProt ID, the Protein Data Bank identifier of each structure (PDB ID) and the chain (subunit) used are also listed. The range of ATOM residues (numbered according to the PDB structure file) that form the kinase catalytic domain and the resolution (Res) of the X-ray crystal structure are also tabulated.

phylogeny of catalytic domains, they classified the then-known STY kinases into a few major groups and subdivided them into families and subfamilies. This classification resulted in clusters of kinases with related regulatory features, such as association with specific domains/proteins, and similar structural features. Nonetheless, with the emergence of new kinomes and expanding knowledge of the mechanistic and regulatory features of kinases,48,49 the shortcoming of using only the catalytic domain sequence became apparent, and an empirical system of classification was required.

In 2002, the complete protein kinome of humans was mapped and classified into 9 major groups,21 subclassified into families and, more finely, into subfamilies. This classification, called the KinBase classification, was an augmented Hanks-Hunter system in which the primary mode of classification was still based on the catalytic domain sequence. Parameters such as sequence similarity outside the catalytic domain, domain architecture, biological functions, association with cellular pathways, localisation in the cell and orthology found in other kinomes were used to incorporate changes over and above the Hanks-Hunter method. This scheme also involves manual curation to attain a useful classification scheme that substantiates experimental results and evolutionary conservation within and across phyla. As a result, kinases within a group may have broadly similar substrate site specificity or similar strategies of regulation, such as binding of a second messenger and SH2 interaction; those within a family are related by similar regulatory features; and those within a subfamily have highly conserved sequence and regulatory features across species. With several kinomes now available, the KinBase classification system is a widely accepted, well-curated and revised database, providing classification of >3000 kinase sequences across phyla. Definitions of group, family and subfamily of the STY kinases in the current study are obtained from KinBase.

Definition of categories derived from KinBase classification

We performed all possible pairwise comparisons of structural fluctuations of the 73 STY kinases considered in the study. These pairs were grouped into five categories based on the KinBase classification of the entities: (i) Same-Protein, (ii) Within-Subfamily, (iii) Within-Family, (iv) Within-Group, and (v) Across-Group. These categories represent progressive divergence in the evolution of kinases, with Within-Subfamily corresponding to the most closely related kinases and Across-Group corresponding to the most divergent kinases. The Same-Protein category has 78 pairs, whose entities are different crystal structures of the same STY kinase. For example, PDB IDs 1Y57_A (KinBase classification: TK group, Src family, SrcA subfamily; UniProt ID P12931) and 1YOJ_A (KinBase classification: TK group, Src family, SrcA subfamily; UniProt ID P12931) are a pair in the Same-Protein category. The Within-Subfamily category has 14 pairs, whose entities are crystal structures of STY kinases with identical group, family and subfamily assignments but different UniProt IDs. For example, PDB IDs 1Y57_A (KinBase classification: TK group, Src family, SrcA subfamily; UniProt ID P12931) and 3DQW_A (KinBase classification: TK group, Src family, SrcA subfamily; UniProt ID P00523) form a pair in the Within-Subfamily category. The Within-Family category has 50 pairs, whose entities are crystal structures of STY kinases with identical group and family assignments but different subfamily assignments. For example, PDB IDs 1Y57_A (KinBase classification: TK group, Src family, SrcA subfamily; UniProt ID P12931) and 1QPC_A (KinBase classification: TK group, Src family, SrcB subfamily; UniProt ID P06239) are a pair in the Within-Family category. The Within-Group category consists of 441 pairs, whose entities are crystal structures of STY kinases of identical group but different family assignments. For example,


PDB IDs 1Y57_A (KinBase classification: TK group, Src family, SrcA subfamily; UniProt ID P12931) and 1IR3_A (KinBase classification: TK group, InsR family; UniProt ID P06213) are a pair in the Within-Group category. The Across-Group category has 2045 pairs, whose entities are crystal structures of STY kinases belonging to different groups. For example, PDB IDs 1Y57_A (KinBase classification: TK group, Src family, SrcA subfamily; UniProt ID P12931) and 1U5R_A (KinBase classification: STE group, STE20 family, TAO subfamily; UniProt ID Q9JLS3) are a pair in the Across-Group category.

Gaussian network model based normal mode analysis

Seventy-three STY kinase structures in active conformation (Table I) were all prepared by (i) retaining only a single kinase catalytic domain structure, without considering other domains/chains, and (ii) deleting any bound ligands, substrates or inhibitors. The structures thus prepared were each modelled as a three-dimensional mass-spring system using the coarse-grained Elastic Network Model (ENM).50 The backbone Cα atoms were modelled as point masses at their spatial coordinates, and all Cα-Cα pairs with an inter-Cα distance of less than 7 Å51 were connected by springs of identical spring constant.52 Assuming that the crystal structure of the STY kinase represents the equilibrium state, the network of harmonic potentials was solved for its vibrational modes around this stable state.53 The model was further simplified by assuming isotropic Gaussian fluctuations of the masses, i.e., the Cα atoms.54 The resulting Gaussian Network Model (GNM) of the STY kinases determines direction-nonspecific relative fluctuations at each mass or Cα atom. Despite the above-stated simplifications and the sampling of only the vicinity of the equilibrium state, GNM-based NMA fluctuations have been successful in identifying biologically relevant and physiologically feasible motions of proteins.16

Construction of network topologies and calculation of eigenvectors (normal modes) and eigenvalues (frequencies) of the structures were carried out as described in previous studies.15 Only the global mode, that is, the mode of lowest frequency, was used in all analyses in the study, because the global mode represents the slowest motion, which usually involves contributions from a large fraction of residues. As a result, the global mode usually denotes conformational changes or biologically relevant motions in the protein.55,56

Treatment of missing residues

Twenty-four of the seventy-three STY kinase structures analyzed in the study have one or more missing Cα atomic positions in the coordinate files. For the construction of the network topology, only the Cα atoms with resolved spatial coordinates were used. GNM-based NMA was performed on the network thus obtained, and square fluctuations of the Cα atoms with available positional coordinates were calculated. Square fluctuations of the two residues sequentially preceding and the two residues sequentially succeeding the missing residues were disregarded in all analyses. For instance, if the Cα atoms of the 10th and 11th residues were missing in the crystal structure of a kinase catalytic domain spanning residues 5 to 260, the spatial coordinates of Cα atoms 5 to 9 and 12 to 260 were used to construct the virtual mass-spring topology. However, the square fluctuations of only residues 5 to 7 and 14 to 260 were used in the analyses. This was done to remove any spurious fluctuations caused by missing residues in the immediate surroundings. For each of the 73 crystal structures used in the study, the kinase catalytic domain, missing regions, residues used for network construction and residues whose fluctuations were used in the study are tabulated in Supporting Information Table S1.

It could be argued that the results obtained in the study are an incidental effect of the missing residues in the structures and the consequently erroneous network topology. To address this concern, we performed two control experiments (“Missing residues control experiments” section in Supporting Information). In the first experiment, we used the 49 STY kinase structures containing no missing Cα coordinates and replicated the primary results of the study (Supporting Information Figs. S1–S3). This experiment shows that there is no significant change in the results even if the structures with missing residues are eliminated from the dataset. In the second experiment, we modelled the missing residues in 22 of the remaining 24 structures. A detailed comparative analysis shows that the square fluctuations of residues more than two amino acids away from the missing region are unaltered in the modelled structures compared with the native structures with missing residues (Fig. S4 in Supporting Information). These results indicate that the findings of the study are not a consequence of the missing regions in the crystal structures.

Residue-wise square fluctuations

For every residue $i$ in each STY kinase structure, the square fluctuation in the global mode $g$ is given by:

$$
\langle \Delta R_i^2 \rangle_g = \frac{3 k_B T}{\gamma} \left[ \frac{\mathbf{u}_g \mathbf{u}_g^{T}}{\lambda_g} \right]_{ii}
$$

where $\langle \Delta R_i^2 \rangle_g$ is the square fluctuation of residue $i$ in the global mode $g$, $k_B$ is the Boltzmann constant, $T$ is the


absolute temperature, $\gamma$ is the spring constant of all the Cα-Cα interactions defined by the Kirchhoff matrix, $\lambda_g$ is the lowest nonzero eigenvalue (frequency), and $\mathbf{u}_g$ is the eigenvector corresponding to that lowest frequency, i.e., the global normal mode.

Normalized global mode square fluctuations

The residue-wise square fluctuations derived in the global mode of a given structure were normalized by dividing them by the highest individual-residue square fluctuation in the same mode of the same structure. The fluctuation of residue $i$ in the global mode $g$ is normalized as:

$$
\langle \Delta R_i^2 \rangle_{\text{normalized}} = \frac{\langle \Delta R_i^2 \rangle_g}{\max\{\langle \Delta R^2 \rangle_g\}}
$$

where $\max\{\langle \Delta R^2 \rangle_g\}$ is a constant for every structure and is the square fluctuation of the most mobile residue in the global mode of the structure.

Calculation of flexibility similarity (FS) score

The FS score, or Flexibility Similarity score, is a quantitative measure of similarity between the global mode structural fluctuations of any two given STY kinases. In this study, FS scores were calculated using three different measures.17,19,20

FS score using Spearman rank order correlation

All seventy-three STY kinases were individually compared with one another, giving rise to 2628 possible pairwise comparisons. For each of the 2628 pairwise comparisons, structure-based sequence alignment of the kinase catalytic domains was performed using TM-Align57 to identify topologically equivalent residues in the kinase pair. GNM-based NMA was carried out individually for the two kinase catalytic domains, and the residue-wise square fluctuations were determined for the two constituent kinases in the pair. The Spearman rank order correlation coefficient was calculated between the square fluctuations of the two kinases, taking into account only those residues that are topologically equivalent in the two structures. Note that the number of aligned residues, i.e., residues with topological equivalence, varies depending on the kinases compared. This could introduce a bias in the calculated correlation coefficient. To counter this bias, each correlation coefficient was corrected for the number of data points (the number of aligned residues) using the formula:

$$
FS_{SP} = 1 - \left(1 - \gamma_{SP}^{2}\right) \frac{N - 1}{N - 2}
$$

where $FS_{SP}$ is the FS score calculated using Spearman rank order correlation, $\gamma_{SP}$ is the Spearman rank order correlation coefficient calculated between the square fluctuations of the two kinases, taking into account only the topologically equivalent residues, and $N$ is the number of aligned residues. The effect of the number of aligned residues $N$ on $FS_{SP}$, sequence identity and RMSD was investigated and found to be insignificant (Supporting Information Fig. S5).

FS score using Pearson correlation

This is calculated in a manner similar to the FS score calculated using Spearman rank order correlation, with the modification of using the Pearson correlation coefficient instead of Spearman's:

$$
FS_{PS} = 1 - \left(1 - \gamma_{PS}^{2}\right) \frac{N - 1}{N - 2}
$$

where $FS_{PS}$ is the FS score calculated using Pearson correlation, $\gamma_{PS}$ is the Pearson correlation coefficient calculated between the square fluctuations of the two kinases, taking into account only the topologically equivalent residues, and $N$ is the number of aligned residues.

FS score using overlap of eigenvectors

GNM-based NMA was carried out for the kinase catalytic domains, and the global mode eigenvector was determined for the two constituent kinases in each of the 2628 pairs. The dot product, or overlap, between the global mode eigenvectors of the two kinases was calculated, taking into account only the topologically equivalent residues. This corresponds to the FS score calculated using overlap of eigenvectors:

$$
FS_{OL} = \left| \widehat{EV}_{d1} \cdot \widehat{EV}_{d2} \right|
$$

where $FS_{OL}$ is the FS score calculated using the overlap of two eigenvectors, $\widehat{EV}_{d1}$ is the unit vector in the direction of the global mode eigenvector of the first constituent kinase of the pair, and $\widehat{EV}_{d2}$ is the unit vector in the direction of the global mode eigenvector of the second constituent kinase of the pair.

Multivariate correlation analysis

Multivariate correlation analysis was carried out to understand the extent to which sequence and structure information determine the global mode structural fluctuations. For each of the 2628 STY kinase pairs, sequence


identity and RMSD were calculated using structure based divergence was calculated by Jensen-Shannon entropy59
sequence alignment algorithm TM-Align.57 Further, the and structure divergence was calculated as Ca RMSD.
similarity in the flexibility profiles of the constituent
kinases in each of the kinase pairs was calculated as FS Interaction density analysis from iPfam
scores using Spearman rank order correlation (FSSP ), as structures
described above. The fraction of variance in FSSP scores iPfam60 is a database of all domain-domain interac-
that can be accounted for by the variance in sequence tions, validated through structure determination. We
identity and RMSD was calculated as: queried iPfam for all the PDB structures that contained a
0sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi12 protein kinase domain and another domain of interest
FSS2 1FSR2 2ð23FSS3FSR3SRÞA (say, C-terminal, SH2, and so forth). All the resulting
Vboth 5@ PDB entries (say, n hits) were parsed to identify the num-
12SR2
ber of interactions (inter-Ca distance of 6 Å or less) each
kinase residue had with the domain of interest. The
FSS5 gSP ðFSSP ; seqidÞ sequences of the kinase catalytic domains of the n struc-
$$FS_R = \rho_{SP}(FS_{SP}, \mathrm{RMSD})$$

$$\rho_{SR} = \rho_{SP}(\mathrm{seqid}, \mathrm{RMSD})$$

where $V_{both}$ is the fraction of variance in $FS_{SP}$ that can be explained by the variance in sequence identity and RMSD; $\rho_{SP}(FS_{SP}, \mathrm{seqid})$ is the Spearman rank order correlation coefficient between $FS_{SP}$ and sequence identity; $\rho_{SP}(FS_{SP}, \mathrm{RMSD})$ is the Spearman rank order correlation coefficient between $FS_{SP}$ and RMSD; $\rho_{SP}(\mathrm{seqid}, \mathrm{RMSD})$ is the Spearman rank order correlation coefficient between sequence identity and RMSD; and $FS_{SP}$ is the FS score calculated using Spearman rank order correlation after correcting for the number of aligned residues.

The fraction of variance in $FS_{SP}$ scores that can be accounted for by the variance in sequence identity alone is given by:

$$V_S = V_{both} - FS_R^2$$

The fraction of variance in $FS_{SP}$ scores that can be accounted for by the variance in RMSD alone is given by:

$$V_R = V_{both} - FS_S^2$$

Thus, the fraction of variance in $FS_{SP}$ that is explained by sequence identity and RMSD inseparably is given by:

$$V_{SR} = V_{both} - V_S - V_R$$

tures were aligned using T-Coffee61 to identify the topologically equivalent residues. At every aligned position, the number of interactions of the kinase residue with the domain of interest across the n structures was summed and referred to as the interaction density. This interaction density was mapped on the kinase fold for visualization.

Linear discriminant analysis

Linear discriminant analysis was performed using the Statistics and Machine Learning Toolbox of MATLAB62 to assess how well a simple classifier can assign the group and family of an STY kinase based solely on its global mode structural fluctuations. For group prediction, the linear classifier was trained and tested on a data set of 71 STY kinases (Supporting Information Fig. S6) using a leave-one-out approach, that is, training on 70 STY kinases and testing on the remaining one, repeated 71 times. The GNM-derived global mode normalized square fluctuations of the residues that were topologically equivalent in all 71 kinases, amounting to 326 alignment positions, were used for training and testing. The group prediction accuracy was calculated as the average over the 71 test structures. Similarly, for family prediction, the linear classifier was trained and tested on a data set of 62 STY kinases (Supporting Information Fig. S7) using the global mode fluctuations of the same 326 alignment positions, again with a leave-one-out approach.
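The variance partition defined by the $V_S$, $V_R$ and $V_{SR}$ equations in this section can be checked numerically. The sketch below is a minimal Python illustration, not the authors' code. It assumes that $V_{both}$ is the squared multiple correlation of $FS_{SP}$ on sequence identity and RMSD, obtained from the three pairwise Spearman coefficients with the standard two-predictor formula (the paper's own expression for $V_{both}$ precedes this excerpt), and that $FS_S = \rho_{SP}(FS_{SP}, \mathrm{seqid})$. The function names `two_predictor_r2` and `partition_variance` are hypothetical, and the inputs are the rounded coefficients quoted in Fig. 2 (0.56, −0.53 and −0.73).

```python
def two_predictor_r2(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of y on two predictors x1, x2,
    computed from the pairwise correlations r(y,x1), r(y,x2), r(x1,x2)."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


def partition_variance(fs_s: float, fs_r: float, rho_sr: float) -> dict:
    """Split the explained variance of FS_SP into unique and shared parts.

    fs_s   -- correlation of FS_SP with sequence identity (FS_S)
    fs_r   -- correlation of FS_SP with RMSD (FS_R)
    rho_sr -- correlation of sequence identity with RMSD (rho_SR)
    """
    v_both = two_predictor_r2(fs_s, fs_r, rho_sr)  # assumed form of V_both
    v_s = v_both - fs_r ** 2   # V_S: sequence identity alone
    v_r = v_both - fs_s ** 2   # V_R: RMSD alone
    v_sr = v_both - v_s - v_r  # V_SR: sequence identity and RMSD inseparably
    return {
        "V_both": v_both,
        "V_S": v_s,
        "V_R": v_r,
        "V_SR": v_sr,
        "unexplained": 1.0 - v_both,
    }


# Rounded Spearman coefficients: FS_SP vs seqid, FS_SP vs RMSD, seqid vs RMSD
parts = partition_variance(0.56, -0.53, -0.73)
for name, value in parts.items():
    print(f"{name:12s}{100 * value:6.1f}%")
```

With these rounded inputs the sketch gives about 34.5% explained in total (6.4% sequence identity alone, 3.1% RMSD alone, 24.9% shared) and 65.5% unexplained, close to the 6.27%, 3.41%, 24.9% and 65.4% reported in the Results; small differences of this size would be expected from rounding the input coefficients to two decimals.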


Sequence and structure similarity across 73 kinases

Topological equivalence of residues across the 73 STY kinases was derived from a multiple sequence alignment based on multiple structure superposition.58 For each of the topologically aligned positions, the sequence

RESULTS

Seventy-three STY kinase structures, solved in active conformations by X-ray crystallography at resolutions better than 3.4 Å, were investigated (Table I). Active conformations were selected for two reasons: first, inactive conformations are varied,30 and second, the structural fluctuations of active conformations differ systematically from those of inactive conformations.15 The dataset, comprising 39 unique kinases of 7 KinBase21

PROTEINS 963
R. Kalaivani et al.

groups, has an average pairwise sequence identity of 29.7% ± 15.3% (mean ± standard deviation) and an average RMSD of 2.27 ± 0.57 Å. In each of the 73 structures, only the spatial positions of the Cα atoms of the kinase catalytic domain were used to construct the Kirchhoff matrix for GNM based NMA (Gaussian network model based normal mode analysis section in Methods). All other atoms, including bound ligands/peptides and other domains, were omitted. GNM based NMA was performed on the structures thus processed, with identical spring constants and the standard cut-off value.54 Residue-wise square fluctuations were calculated for each kinase from the lowest-frequency mode, or global mode (Residue-wise square fluctuations section in Methods). For all analyses, the fluctuations of the two residues sequentially preceding and the two residues sequentially succeeding any missing region in the structure were disregarded (Treatment of missing residues section in Methods).

Fluctuations of closely related STY kinases are correlated better than those of distant homologues

Definition of relatedness from KinBase classification

To assess the similarity in fluctuation profiles of STY kinases, we performed all 2628 possible pairwise comparisons of residue-wise square fluctuations of the 73 STY kinases. Based on KinBase classification,21 78 of these pairs correspond to the same kinase in different crystal structures (Definition of categories derived from KinBase classification section in Methods); 14 pairs to kinases of the same subfamily; 50 pairs to kinases of the same family but different subfamilies; 441 pairs to kinases of the same group but different families; and 2045 pairs to kinases of different groups. These are referred to as the (i) Same-Protein, (ii) Within-Subfamily, (iii) Within-Family, (iv) Within-Group, and (v) Across-Group categories, respectively (Table S2 in Supporting Information).

The similarity between the global mode residue-wise square fluctuations of any two kinases, referred to as the Flexibility Similarity score (FS score), was calculated using three measures: (i) Spearman rank order correlation [Fig. 1(A–C)], (ii) Pearson correlation [Fig. 1(D–F)], and (iii) overlap of the two global mode eigenvectors [Fig. 1(G–I)]. Figure 1(A) shows the histogram of FS scores of all 2628 STY kinase pairs, calculated using Spearman rank order correlation ($FS_{SP}$) and segregated into the five categories. The $FS_{SP}$ scores of the Same-Protein (red) and Within-Subfamily (blue) categories accrue near 1.0; those of the Within-Family (green) and Within-Group (purple) categories accumulate around 0.95; and those of the Across-Group (orange) category are dispersed over a range of 0.60 to 0.96. The cumulative frequency plot of the $FS_{SP}$ scores [Fig. 1(B)] shows that the Across-Group similarity scores (orange curve) are the lowest in magnitude and are distributed over a wide range of values. The Within-Group (purple curve), Within-Family (green curve), Within-Subfamily (blue curve), and Same-Protein (red curve) $FS_{SP}$ scores are progressively shifted to the right of the Across-Group scores, indicating that the correlation coefficients increase systematically and hierarchically with relatedness. Kinase pairs that are more closely related (Same-Protein and Within-Subfamily) have higher $FS_{SP}$ scores than less closely related ones (Within-Family and Within-Group), which in turn have higher scores than the Across-Group pairs.

The median of the $FS_{SP}$ scores in the Across-Group category, denoted by $q_0$, is 0.84 [dotted gray line in Fig. 1(A,B)]. Thus, the fraction of Across-Group pairs with $FS_{SP}$ scores greater than $q_0$, $P(FS_{SP}^{AG} > q_0)$, is 0.5 by construction. The fraction of Within-Group pairs with $FS_{SP}$ scores greater than $q_0$, $P(FS_{SP}^{WG} > q_0)$, is 0.84. Hence, flexibility is more conserved in the Within-Group pairs than in the Across-Group pairs, that is, $P(FS_{SP}^{WG} > q_0) > P(FS_{SP}^{AG} > q_0) = 0.5$ (binomial test, P = 4.94E-52). The fractions of Within-Family, $P(FS_{SP}^{WF} > q_0)$, Within-Subfamily, $P(FS_{SP}^{WS} > q_0)$, and Same-Protein, $P(FS_{SP}^{SP} > q_0)$, pairs with $FS_{SP}$ scores greater than $q_0$ are all 1.0. This indicates that all of the Within-Family, Within-Subfamily, and Same-Protein pairs have FS scores higher than the median Across-Group score (binomial test, P < 1E-04). Further, an unpaired t-test [upper-right in Fig. 1(C)] and a Kolmogorov-Smirnov test [lower-left in Fig. 1(C)] also distinguish the $FS_{SP}$ scores of the five categories significantly from one another, indicating that the categories indeed have different score distributions. The corresponding analyses for FS scores calculated using Pearson correlation [$FS_{PS}$, Fig. 1(D–F)] and overlap of the global mode eigenvectors [$FS_{OL}$, Fig. 1(G–I)] yield similar results.

Relationship between flexibility similarity scores and sequence/structural similarity

We have demonstrated that Within-Subfamily kinase pairs have better conserved flexibility profiles than Across-Group kinase pairs, using category definitions from KinBase.21 However, because such an analysis should also be applicable to protein families that are not manually classified, we sought to understand the conservation of flexibility profiles purely as a function of sequence and structure similarity among kinases. To this end, we calculated the pairwise sequence identity and RMSD between the kinase catalytic domains of all 2628 pairs57 (Supporting Information Table S2). The mean ± standard error of the mean of pairwise sequence identity and RMSD are 0.992 ± 0.001 and 0.60 ± 0.04 Å for Same-Protein pairs; 0.938 ± 0.026 and 0.67 ± 0.12 Å for Within-Subfamily pairs; 0.575 ± 0.014 and 1.33 ± 0.05 Å for Within-Family pairs; 0.385 ± 0.003 and 1.77 ± 0.02 Å for Within-Group pairs; and 0.240 ± 0.001 and 2.48 ± 0.01 Å for Across-Group pairs. We plotted the $FS_{SP}$

Structural Fluctuations in Protein Kinases

Figure 1
Distribution of FS scores of different categories. The global mode square fluctuations of each of the 73 STY kinases were compared with those of every other kinase, resulting in Flexibility Similarity scores (FS scores) for 2628 kinase pairs. Based on the KinBase classification and UniProt IDs of the constituent kinases, the pairs and the corresponding FS scores were divided into the following categories: Same-Protein (red), Within-Subfamily (blue), Within-Family (green), Within-Group (purple), and Across-Group (orange). FS scores were calculated using the Spearman rank order correlation coefficient ($FS_{SP}$, A–C), the Pearson correlation coefficient ($FS_{PS}$, D–F), or the overlap of global mode eigenvectors ($FS_{OL}$, G–I). (A) Frequency distribution of $FS_{SP}$ scores for all five categories. The dotted gray line, labelled medianAG, represents the median value $q_0$ of the Across-Group similarity score ($FS_{SP}^{AG}$, orange) and corresponds to a Spearman rank order correlation coefficient of 0.84. (B) Cumulative frequency distribution of $FS_{SP}$ scores for all five categories. The medianAG, or $q_0$, corresponding to the median of $FS_{SP}^{AG}$ (orange), is plotted as a dotted gray line. (C) The $FS_{SP}$ score distribution of each of the five categories was compared with that of every other category using an unpaired t-test (upper-right triangle) and a Kolmogorov-Smirnov test (lower-left triangle); the resulting P values are tabulated with color codes in a brown-purple scheme. The predominance of brown indicates that the $FS_{SP}$ score distributions of the five categories differ from each other with statistical significance. (D–F) Same as (A–C), respectively, but with FS scores calculated using Pearson correlation coefficients ($FS_{PS}$). (G–I) Same as (A–C), respectively, but with FS scores calculated using the overlap of the global mode eigenvectors ($FS_{OL}$). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

scores as a function of their sequence identity [Fig. 2(A)] and structure similarity [Fig. 2(B)]. Only a weak correlation is seen between the flexibility similarity ($FS_{SP}$) scores and the corresponding sequence identity [Fig. 2(A), Spearman rank order correlation, cc = 0.56, P = 2.6E-215]. Similarly, RMSD values correlate only weakly with conservation of flexibility [Fig. 2(B), Spearman rank order correlation, cc = −0.53, P = 5.2E-192], even though the flexibility profiles are determined solely from the three-dimensional coordinates of Cα atoms. As the correlation of flexibility similarity with sequence identity and RMSD is weak, it is arguable that the conservation observed in flexibility is not a trivial consequence of sequence/structure similarity. In this regard, we hypothesize that the flexibility profiles contain more information than can be explained by sequence/structure alone.

To quantify how much of the similarity in flexibility is contributed by sequence and structure similarity, we conducted a multivariate correlation analysis (Multivariate correlation analysis section in Methods) on the $FS_{SP}$ scores. Only 6.27% of the variance in $FS_{SP}$ scores could be accounted for by the variance in sequence identity alone, 3.41% by the variance in RMSD alone, and 24.9% by the variance of sequence identity and RMSD inseparably. A considerable 65.4% of the variation in $FS_{SP}$ scores thus remains unexplained by sequence identity or RMSD. This implies that the conservation of flexibility profiles of STY kinases is not always accompanied by conservation of sequence/structure. In this context, one could speculate that the flexibility patterns important for the function, regulation, or stability of a protein were maintained by evolutionary selection against variations that compromised such patterns.

Fluctuations conserved in all protein kinases

Although STY kinases form a large and divergent superfamily, they share a common catalytic mechanism. Therefore,


Figure 2
FS scores as a function of sequence and structure similarity. (A) $FS_{SP}$ scores of 2628 kinase pairs plotted against their sequence identity, calculated from structure-based sequence alignment using TM-align.57 The scatter points are colored according to the category of the kinase pair: Same-Protein (red), Within-Subfamily (blue), Within-Family (green), Within-Group (purple), and Across-Group (orange). The $FS_{SP}$ scores, which indicate the similarity in fluctuation profiles, show a weak correlation with sequence identity (Spearman rank order correlation, cc = 0.56, P = 2.6E-215). (B) $FS_{SP}$ scores show a weak correlation with the corresponding RMSD (Spearman rank order correlation, cc = −0.53, P = 5.2E-192). (C) The RMSD of the 2628 kinase pairs plotted against their sequence identity, showing a strong correlation between sequence and structure similarity (Spearman rank order correlation, cc = −0.73, P ≈ 0.0). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

it is possible that a common flexibility trait, necessary for the catalytic function, is present in all STY kinases. We sought to identify this flexibility pattern, conserved through evolution and invariably present in all 73 STY kinases in our dataset.

After identifying the topologically equivalent residues from sequence-based alignment,61 we plotted the normalized global mode square fluctuations of the corresponding residues in the 73 STY kinases [Fig. 3(A), orange curves]. The variance in normalized square fluctuations across kinases at each residue position is color coded in a red-gray scheme below the zero ordinate, with red indicating the lowest and gray the highest variance. The Z-scores of the variance are also plotted [Fig. 3(A), blue dots]. Some residues clearly show lower variation in structural fluctuations across kinases [Fig. 3(A), red, Z-scores < −1] than others [Fig. 3(A), gray]. Such residues also have a low mean magnitude of normalized square fluctuations. When the residue-wise variance in structural fluctuations across kinases is mapped onto the kinase catalytic fold [Fig. 3(B)], the residues spatially proximal to the catalytic loop in the STY kinase structure are seen to possess highly conserved structural fluctuations. The residues around the catalytic loop show remarkable conservation of low-magnitude structural fluctuations, consistent across all kinases. This signature flexibility profile arguably represents the essential dynamics of an STY kinase. This is reasonable because the aspartate in the catalytic loop acts as a hydrogen acceptor in the catalysis of phosphotransfer, the single common function of all STY kinases. Previous studies on other protein families63 have shown that catalytic residues have lower structural fluctuations than other functional residues.

As in the previous section, we examined whether the observed essential flexibility profile of STY kinases is reflected in their sequence and structure. To this end, we structurally aligned the 73 kinase structures,58 calculated the sequence conservation score at every alignment position using Jensen-Shannon divergence,59 and mapped the scores onto the kinase fold [Fig. 3(C), red indicating the most and gray the least conserved residues]. Analogously, the RMSD at the aligned Cα positions of the multiple structure alignment is mapped onto the kinase fold in Figure 3(D) (red indicating the lowest and gray the highest RMSD values). We observe barely any correlation of the variance in square fluctuations [Fig. 3(B)] with sequence conservation [Fig. 3(C), Spearman rank order correlation, cc = 0.32, P = 1.5E-13] or RMSD [Fig. 3(D), Spearman rank order correlation, cc = −0.09, P = 0.03]. Thus, the flexibility profile found in all 73 kinases in the dataset is not a trivial consequence of sequence or structure information. We further probed whether residue-wise structural differences can explain residue-wise fluctuation differences. To this end, we measured the Spearman rank order correlation between the residue-wise difference in square fluctuations and the residue-wise Cα-Cα displacement for each of the 2628 kinase pairs [Supporting Information Fig. S8(A–D)]. We found that 92% of the 2628 pairs had a correlation coefficient of less than 0.25. Additionally, the residue-wise difference in Voronoi packing densities64 correlated weakly with the residue-wise difference in square fluctuations (Spearman rank order correlation; 98% of the 2628 pairs had cc < 0.1) [Supporting Information Fig. S8(A,E–G)]. Thus, we conclude that the variation in structural fluctuations cannot be explained by


Figure 3
Fluctuations conserved in all STY kinases are proximal to the catalytic loop. (A) Normalized square fluctuations of the 73 STY kinases listed in Table I are plotted (orange lines) as a function of topologically aligned positions. At every alignment position, the variance in normalized square fluctuations is calculated and color coded in a red-gray scheme below the zero ordinate. The Z-score of the variance in normalized square fluctuations at every alignment position is also plotted (blue dots). Some residues from noncontiguous regions exhibit much lower variance in structural fluctuations (marked red below the ordinate; Z-score < −0.1) than others. (B) The red-gray color code on the sequence depicted in (A) is mapped onto the kinase fold. All the noncontiguous regions with low variance are either in or proximal to the catalytic loop in three-dimensional space. (C) The catalytic domain structures of the 73 STY kinases were aligned using MUSTANG,58 and the Jensen-Shannon59 conservation score at every aligned position is mapped onto the kinase fold in a red-gray scheme. The variance in normalized square fluctuations (B) is weakly correlated with the Jensen-Shannon conservation score (C), with a Spearman rank order correlation coefficient of 0.32 (P = 1.5E-13). (D) The catalytic domain structures of the 73 STY kinases were aligned using MUSTANG,58 and the Cα RMSD at every aligned position is mapped onto the kinase fold in a red-gray scheme. The variance in normalized square fluctuations (B) is not correlated with the measure of structure similarity (D) (Spearman rank order correlation, cc = −0.09, P = 0.03). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

trivial sequence divergence, residue-wise structural difference, or global RMSD.

Flexibility patterns characteristic of a kinase group

Since the preceding analyses suggest that there is a characteristic low-fluctuation profile around the catalytic loop in all STY kinases, we explored whether signature flexibility profiles also exist for the different subtypes of STY kinases. To this end, we scrutinized two well-studied KinBase groups, AGC and TK, to find such group-specific fluctuation profiles and discuss their functional implications.

Case study: AGC group (with special reference to PKA)

The AGC group encompasses diverse families of STY kinases, viz. Akt, DMPK, GRK, MAST, NDR, PDK1, PKA, PKC, PKG, PKN, RSK, RSKL, RSKR, SGK, YANK, PTF. The member families participate in distinct signal transduction pathways, are modulated by diverse mechanisms, respond to assorted signalling molecules, and occur together with diverse domain architectures. However, there are common features in most, if not all, members of the AGC group: (i) the presence of a C-terminal tail outside the kinase domain containing a hydrophobic (HF) motif,65 (ii) repositioning of the αC-helix by binding of the C-terminal tail between the αC-helix and the β4-strand,66 (iii) activation loop phosphorylation by PDK1,67 which is docked to the kinase by the HF motif,68,69 and (iv) translocation of the kinase to the membrane.70 From these common attributes, it is evident that the C-terminal tail is the most distinguishing feature of the AGC kinases. In support of this argument, previous studies have reported remarkable conservation of the kinase catalytic domain residues that bind the C-terminal tail, and inactivity of the kinase upon mutation/deletion of such residues.71

We queried iPfam60 for all crystal structures containing both the STY kinase domain (Pfam IDs: PF00069 or PF07714) and the C-terminal tail (Pfam ID: PF00433). All 58 structure hits belong to the AGC group of STY


Figure 4
Fluctuations specific to the AGC group are seen in residues involved in AGC-specific protein-protein interactions. (A) An example AGC group STY kinase structure (PDB ID: 2JED_A). The C-terminal tail is represented as sticks (green). The catalytic domain is represented as a cartoon, with residues colored red if they interact with the C-terminal tail (inter-Cα distance of 6 Å or less) and gray otherwise. (B) The C-terminal tail interaction density at every residue position, calculated from 58 AGC structures retrieved from iPfam, is plotted on the kinase fold in a red-gray scheme. Residues that interacted with the C-terminal tail most frequently across the 58 structures are colored red, and those that interacted least frequently are colored gray. (C) Residues that show AGC-specific structural fluctuations are mapped (red) onto the kinase fold; these residues showed statistically different (unpaired t-test, P < 0.005) structural fluctuations in the 17 AGC kinase structures compared with the 56 non-AGC kinase structures in Table I. (D) Residue-wise mean of normalized square fluctuations of 17 AGC group kinase structures (green) and 56 non-AGC group kinase structures (purple), plotted as a function of the aligned residue positions. Alignment positions with statistically different (unpaired t-test, P < 0.005) structural fluctuation distributions between the AGC and non-AGC groups are marked red below the ordinate. (E) Residue-wise mean of (i) normalized square fluctuation differences (red), (ii) Cα-Cα displacements (blue), and (iii) Voronoi packing density differences (green) between all possible AGC–non-AGC kinase pairs, plotted as a function of the aligned residue positions. On average, fluctuation differences correlate only weakly with Cα-Cα displacements (Spearman rank order correlation, mean cc = 0.37) and Voronoi packing density differences (Spearman rank order correlation, mean cc = 0.23). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

kinases and form the dataset for the interaction analysis (Interaction density analysis from iPfam structures section in Methods). An example structure (PDB ID 2JED_A) is shown in Figure 4(A), where the C-terminal tail residues are colored green and the kinase domain residues are colored red or gray depending on whether or not they interact with the C-terminal tail.

Across the 58 structures, we counted the number of times each residue in the kinase domain interacts with the C-terminal tail. We mapped this interaction density onto the kinase fold in Figure 4(B), with red indicating maximum and gray indicating minimum interaction density. The C-terminal tail wraps around the kinase, anchoring predominantly at regions


near the αC-helix, β1-strand, hinge, and C-terminal end of the kinase. Previous studies65,71 have recognized a fraction of these residues to be indispensable for kinase function and C-terminal tail binding. Since C-terminal tail binding is a unique feature of most families within the AGC group, we examined whether the involved residues show differential dynamics, in the form of structural fluctuations, in AGC kinases compared with other kinases. To this end, we compared the normalized square fluctuations of AGC STY kinases against those of non-AGC STY kinases in our dataset. Figure 4(D) plots the residue-wise mean normalized square fluctuations across 17 AGC STY kinases (green curve), contrasted with those of 56 non-AGC STY kinases (purple curve). Every residue position that exhibits statistically different (unpaired t-test, P < 0.005) structural fluctuations between the AGC and non-AGC groups is marked red below the zero ordinate. Upon mapping onto the kinase fold [Fig. 4(C), red], we note that these residues cluster around the αC-helix, β1-strand, hinge, and C-terminal end. We observe remarkable agreement between the AGC kinase residues that structural complexes show to be involved in group-specific regulatory interactions [Fig. 4(B)] and those derived from the structural fluctuation analysis [Fig. 4(C)].

We probed whether the difference in flexibility profiles between AGC and non-AGC kinases could be explained by structural parameters such as Cα-Cα displacements or Voronoi packing density differences. To this end, we calculated the Cα-Cα displacements, differences in Voronoi packing densities, and differences in square fluctuations at every alignment position between all possible AGC–non-AGC structure pairs. In Figure 4(E), the residue-wise mean difference in square fluctuations (red curve), residue-wise mean Cα-Cα displacement (blue curve), and residue-wise mean difference in Voronoi packing densities (green curve) are plotted. Fluctuation differences correlate only weakly with Cα-Cα displacements (Spearman rank order correlation, mean cc = 0.37) and Voronoi packing density differences (Spearman rank order correlation, mean cc = 0.23). Thus, the signature of AGC-specific residue interactions, observed as differential fluctuations, is not a trivial consequence of structural differences. Repeating the analysis with only the Protein Kinase A (PKA, or cAMP-dependent protein kinase) family of the AGC group against non-AGC kinases (Fig. 5) yields similar results.

Case study: TK group (with special reference to nRTKs)

The second group of STY kinases examined for group-specific structural fluctuations is the tyrosine kinase (TK) group. The TK group consists of kinases that transfer the phosphate moiety from ATP specifically to a tyrosine residue in the substrate protein. It encompasses diverse families, including those that form cell surface receptors (receptor tyrosine kinases, or RTKs) and those that are localized in the cytoplasm (non-receptor tyrosine kinases, or nRTKs). Despite this diversity, a key regulatory feature commonly found in many TKs is their binding to Src Homology 2 (SH2) domains. In the case of RTKs, cytoplasmic target proteins72 and adapter proteins73 bind to the activated kinase domains via their SH2 domains. In the case of nRTKs, contiguous SH2 and SH3 domains interact with the kinase domain, thus switching the TK between active and inactive states.74 Thus, the SH2 domain, a key regulator of TK function that is not known to regulate any other group of STY kinases, was investigated as a plausible regulatory determinant of TKs that could be reflected in the structural fluctuations.

Structural fluctuation patterns observed only in members of the TK group were analyzed as described in the previous section. The residue-wise interaction density of the kinase domain (Pfam ID: PF07714) with the SH2 domain (Pfam ID: PF00017), determined from 24 available crystal structures, is plotted onto the kinase fold in Figure 6(A). The αC-helix, the loop connecting the β7-β8 strands, and parts of the activation loop, αF-helix, αH-helix and αI-helix are the known regions of interaction with the SH2 domain. Figure 6(C) shows the residue-wise mean normalized square fluctuations of 25 TKs (green curve) and 48 non-TKs (purple curve) from our dataset (Table I). Residue positions that show differential (unpaired t-test, P < 0.0005) structural fluctuations between the TK and non-TK kinases are marked in red below the zero ordinate. When mapped onto the kinase fold [Fig. 6(B), red], these residues span the αC-helix, the loop connecting the β7-β8 strands, and parts of the activation loop, αG-helix, αH-helix and αI-helix. We observe strong agreement between the residues found to be involved in group-specific regulatory interactions of TKs [Fig. 6(A)] and those derived from the structural fluctuation analysis [Fig. 6(B)]. As in the previous analyses, the fluctuation differences between TK and non-TK kinase pairs are only weakly correlated with Cα-Cα displacements (Spearman rank order correlation, mean cc = 0.46) and Voronoi packing density differences (Spearman rank order correlation, mean cc = 0.27). Analyses contrasting only the nRTKs with non-TKs (Fig. 7) also yield similar results.

Flexibility features characteristic of a family: A case study with CDKs

In the previous sections, we delineated the structural fluctuations essential for all STY kinases and those specific to the AGC and TK groups. We then investigated whether the residues characteristic of a kinase family reflect structural dynamics characteristic of that family. For this analysis, the cyclin-dependent kinase (CDK) family of the CMGC group was considered since


Figure 5
Fluctuations specific to PKA are seen in residues involved in AGC-specific protein-protein interactions. (A) An example PKA STY kinase structure (PDB ID: 1ATP_E). The C-terminal tail is shown as sticks (green) and the catalytic domain as a cartoon. (B) The C-terminal tail interaction density at every residue position, calculated from 1ATP_E, is plotted on the kinase fold in a red-gray scheme. Residues in the kinase catalytic domain that interact with the C-terminal tail (inter-Cα distance of 6 Å or less) are colored red and the rest gray. (C) From the analysis of 9 PKA and 56 non-AGC kinase structures in Table I, residues that show PKA-specific structural fluctuations (unpaired t-test, P < 0.005) are mapped in red on the kinase fold. (D) Residue-wise mean of normalized square fluctuations of 9 PKA kinase structures (green) and 56 non-AGC group kinase structures (purple), plotted as a function of the aligned residue positions. Alignment positions with statistically different (unpaired t-test, P < 0.005) structural fluctuation distributions between PKA and the non-AGC groups are marked red below the ordinate. (E) Residue-wise mean of (i) normalized square fluctuation differences (red), (ii) Cα-Cα displacements (blue), and (iii) Voronoi packing density differences (green) between all possible PKA–non-AGC kinase pairs, plotted as a function of the aligned residue positions. On average, fluctuation differences correlate only weakly with Cα-Cα displacements (Spearman rank order correlation, mean cc = 0.33) and Voronoi packing density differences (Spearman rank order correlation, mean cc = 0.22). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

they are stringently regulated through several protein-protein interactions. Some of these interactions are observed only in kinases of the CDK family; that is, a given protein interacts with most members of the CDK family and with no other family in the CMGC group. We identified four such CDK-family-specific interactions from iPfam:60 Cyclin_N (Pfam ID: PF00134), Ank_2 (Pfam ID: PF12796), CDKN3 (Pfam ID: PF05706), and CKS (Pfam ID: PF01111). Figure 8(A–D) illustrates cartoon representations of each of these interactions along with the interacting residues (red) in the kinase catalytic domain. As in the previous analyses, residues in the seven CDK-family structures [Fig. 8(E), green curve] in our dataset (Table I) that showed differential (unpaired t-test, P < 0.05) structural fluctuations compared with the eight structures of the non-CDK families [Fig. 8(E), purple curve] of the CMGC group were marked in red below the zero ordinate.
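Each case study flags alignment positions whose normalized square fluctuations differ between two sets of kinases by an unpaired t-test (P < 0.005 for AGC, P < 0.0005 for TK, P < 0.05 for CDKs). The standard-library Python sketch below illustrates such a per-position screen under stated assumptions: it uses the pooled-variance (Student's) form of the test, since the text does not say whether Welch's correction was applied, and it computes the two-sided P value from the regularized incomplete beta function via the usual continued-fraction expansion. The inputs are lists of per-kinase fluctuation profiles already mapped onto common alignment positions; the function names and the toy numbers are hypothetical and do not come from the dataset.

```python
import math
import statistics


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided P value of Student's t distribution with df degrees of freedom."""
    return regularized_incomplete_beta(0.5 * df, 0.5, df / (df + t * t))


def unpaired_t_test(xs, ys):
    """Pooled-variance (Student's) unpaired t-test; needs >= 2 values per group."""
    n1, n2 = len(xs), len(ys)
    m1, m2 = statistics.mean(xs), statistics.mean(ys)
    pooled = ((n1 - 1) * statistics.variance(xs)
              + (n2 - 1) * statistics.variance(ys)) / (n1 + n2 - 2)
    if pooled == 0.0:  # both groups constant: P is 1 if means agree, else 0
        return (0.0, 1.0) if m1 == m2 else (math.copysign(math.inf, m1 - m2), 0.0)
    t = (m1 - m2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, t_two_sided_p(t, n1 + n2 - 2)


def differential_positions(group_profiles, other_profiles, alpha=0.05):
    """Alignment positions whose fluctuations differ between the two sets (P < alpha)."""
    flagged = []
    for i in range(len(group_profiles[0])):
        xs = [profile[i] for profile in group_profiles]
        ys = [profile[i] for profile in other_profiles]
        _, p = unpaired_t_test(xs, ys)
        if p < alpha:
            flagged.append(i)
    return flagged


# Hypothetical toy profiles over two alignment positions (not data from Table I)
family = [[0.9, 0.5], [1.0, 0.4], [1.1, 0.6]]
others = [[0.1, 0.5], [0.2, 0.45], [0.0, 0.55], [0.15, 0.5]]
print(differential_positions(family, others, alpha=0.05))  # → [0]
```

The sketch applies the threshold to each alignment position independently, with no multiple-testing correction; swapping in Welch's test would only change how `unpaired_t_test` estimates the variance and degrees of freedom.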


Figure 6
Fluctuations specific to the TK group occur in residues involved in TK-specific protein-protein interactions. (A) The SH2 domain interaction density at every residue position, calculated from 24 TK structures retrieved from iPfam, is plotted on the kinase fold in a red-gray scheme. The residues that interacted with the SH2 domain most frequently across the 24 structures are colored red and those that interacted least are shown in gray. (B) The global mode normalized square fluctuations of 25 TK and 48 non-TK structures were analyzed. The residues that show TK-specific structural fluctuations significantly different from those of the non-TK groups (unpaired t-test, P < 0.0005) are mapped (red) on the kinase fold. (C) Residue-wise mean of normalized square fluctuations of 25 TK group kinase structures (green) and 48 non-TK group kinase structures (purple) are plotted as a function of aligned positions. Alignment positions that showed statistically different (unpaired t-test, P < 0.0005) structural fluctuation distributions between the TK and the non-TK groups are marked red below the ordinate. (D) Residue-wise mean of (i) normalized square fluctuation differences (red), (ii) Cα-Cα displacements (blue), and (iii) Voronoi packing density differences (green) between all possible TK-non-TK pairs are plotted as a function of the aligned residue positions. On average, we observe only a weak correlation of fluctuation differences with Cα-Cα displacements (Spearman rank-order correlation, mean cc = 0.46) and Voronoi packing density differences (Spearman rank-order correlation, mean cc = 0.27). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
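Panel (D) above, like the final panel of each of these figures, summarizes how well two per-position profiles agree using a mean Spearman rank-order correlation coefficient (cc). As a minimal sketch, assuming gap-free, equal-length lists of per-alignment-position values for each kinase pair (for example, absolute differences in normalized square fluctuation and Cα-Cα displacements), Spearman's coefficient is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The function names are hypothetical and are not taken from the authors' pipeline.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation; NaN if either profile is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        return float("nan")
    return cov / (sx * sy)


def mean_pairwise_spearman(pairs):
    """Average Spearman cc over (profile_a, profile_b) tuples, one per kinase pair."""
    coeffs = [spearman(a, b) for a, b in pairs]
    return sum(coeffs) / len(coeffs)
```

Averaging the per-pair coefficients with `mean_pairwise_spearman` mirrors the "mean cc" wording of the captions, on the assumption that per-pair coefficients were simply averaged.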

Figure 7
Fluctuations specific to nRTKs are seen in residues involved in TK-specific protein-protein interactions. (A) The SH2 domain interaction density at every residue position, calculated from 22 nRTK structures retrieved from iPfam, is plotted on the kinase fold in a red-gray scheme. The residues that interacted with the SH2 domain most frequently across the 22 structures are colored red and those that interacted least are shown in gray. (B) The residues that show nRTK-specific structural fluctuations are mapped (red) on the kinase fold: residues whose structural fluctuations in the 18 nRTK structures differed significantly (unpaired t-test, P < 0.0005) from those in the 48 non-TK structures in Table I are colored red. (C) Residue-wise mean of normalized square fluctuations of 18 nRTK structures (green) and 48 non-TK group kinase structures (purple) are plotted as a function of the aligned residue positions. Alignment positions that showed statistically different (unpaired t-test, P < 0.0005) structural fluctuation distributions between the nRTK and the non-TK groups are marked red below the ordinate. (D) Residue-wise mean of (i) normalized square fluctuation differences (red), (ii) Cα-Cα displacements (blue), and (iii) Voronoi packing density differences (green) between all possible nRTK-non-TK pairs are plotted as a function of the aligned residue positions. On average, we observe only a weak correlation of fluctuation differences with Cα-Cα displacements (Spearman rank-order correlation, mean cc = 0.46) and Voronoi packing density differences (Spearman rank-order correlation, mean cc = 0.32). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 8
Fluctuations specific to the CDK family are seen in residues involved in CDK-specific protein-protein interactions. CDK-specific protein-protein interactions with (A) Cyclin_N (PDB ID: 1JST), (B) Ank_2 (PDB ID: 1BLX), (C) CDKN3 (PDB ID: 1FQ1), and (D) CKS (PDB ID: 1BUH) are depicted. The CDK-specific interacting partners are represented as sticks (green). The kinase catalytic domain itself is represented as a cartoon, with residues colored red if they interacted (inter-Cα distance of 6 Å or less) with the CDK-specific partner and gray if they did not. (E) Residue-wise mean of normalized square fluctuations of seven CDK family structures of the CMGC group (green) and eight non-CDK family structures of the CMGC group (purple) in Table I are plotted as a function of the aligned residue positions. Alignment positions that showed statistically different (unpaired t-test, P < 0.05) structural fluctuation distributions between the CDK and the non-CDK families are marked red below the ordinate. (F) The residues that show CDK-specific structural fluctuations (unpaired t-test, P < 0.05) in the seven CDK structures, when compared with the eight non-CDK structures, are mapped (red) on the kinase fold. (G) Residue-wise mean of (i) normalized square fluctuation differences (red), (ii) Cα-Cα displacements (blue), and (iii) Voronoi packing density differences (green) between all possible CDK-non-CDK pairs are plotted as a function of the aligned residue positions. On average, we observe only a weak correlation of fluctuation differences with Cα-Cα displacements (Spearman rank-order correlation, mean cc = 0.54) and Voronoi packing density differences (Spearman rank-order correlation, mean cc = 0.33).

When mapped onto the kinase catalytic fold [Fig. 8(F), red], there is remarkable agreement between the residues involved in the binding of only CDKs to other proteins [Fig. 8(A–D)] and those derived from structural fluctuations [Fig. 8(F)]. Taken together, our analyses strongly suggest that the residues functionally specialized to carry out protein-protein interactions show conserved and differential fluctuations at the kinase, group and family levels.

Prediction of group and family

We performed a classical linear discriminant analysis to predict the group classification of 71 STY kinase structures (Supporting Information Fig. S6) using their structural fluctuations at 326 alignment positions in a leave-one-out procedure. Each of the 71 STY kinases in the dataset belonged to one of five groups: AGC, CAMK, CMGC, STE, and TK. Assuming equal distribution and sampling of kinases in the five groups, a random group classification would reach an accuracy of 20% purely by chance. If the distribution of the 71 kinases in the five groups is taken into account, we expect an accuracy of 18.8 ± 9.7% by chance. However, even this simple classifier predicted the groups of the STY kinases, based solely on the normalized global mode structural fluctuations, with an accuracy of 90.14%, significantly better than chance (χ² test, P = 1E-10). Similarly, family prediction was performed on a dataset of 62 STY kinases (Supporting Information Fig. S7) belonging to 16 families: Akt, DMPK, PKA, PKC, DAPK, PHK, CDK, GSK, MAPK, SRPK, STE20, Abl, Ack, EGFR, InsR, and Src. A random prediction of family would have an accuracy of 6.25% (1/16). Taking the distribution of kinases among the families into account gives a mean expected accuracy of 4.7 ± 4.2% by chance. Again, the classifier predictions reached an accuracy of 82.26%, better than chance (χ² test, P = 1E-16). This discriminant analysis suggests that the information present in the global mode fluctuations of the kinase residues is sufficient to classify them into groups and families. Previously, we demonstrated that the group-specific and family-specific residues, in terms of differential structural fluctuations, often corresponded to regions of specific modular interactions with other proteins/domains. Taken together, we infer that the alignment positions weighted most heavily by the classifier are likely sites of interactions with other proteins/domains observed only in the group or family concerned.
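The leave-one-out protocol and the chance baselines described above can be illustrated with a short, dependency-free sketch. With 326 alignment positions and at most 71 structures, the pooled covariance matrix of classical LDA has rank at most 71 − 5 = 66, so, as an explicitly swapped-in simplification, the code below uses diagonal LDA (one pooled variance per position, no covariances, class priors from training frequencies). It is not the classifier used in the study; the function names are hypothetical, and profiles are assumed to be gap-free lists over common aligned positions.

```python
import math
from collections import defaultdict


def fit_diag_lda(X, y, eps=1e-9):
    """Class means, pooled per-position variances and log priors (diagonal LDA)."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    n, p = len(X), len(X[0])
    means = {k: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
             for k, rows in rows_by_class.items()}
    dof = max(n - len(rows_by_class), 1)
    # Pooled within-class variance per position; eps guards against zero variance.
    var = [sum((r[j] - means[k][j]) ** 2
               for k, rows in rows_by_class.items() for r in rows) / dof + eps
           for j in range(p)]
    log_prior = {k: math.log(len(rows) / n) for k, rows in rows_by_class.items()}
    return means, var, log_prior


def predict(model, x):
    """Return the class with the highest diagonal-Gaussian discriminant score."""
    means, var, log_prior = model

    def score(k):
        d2 = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[k], var))
        return -0.5 * d2 + log_prior[k]

    return max(means, key=score)


def loo_accuracy(X, y):
    """Leave-one-out accuracy: refit without structure i, then predict structure i.

    A class represented by a single structure cannot be predicted once that
    structure is held out, so it always counts as an error.
    """
    correct = 0
    for i in range(len(X)):
        model = fit_diag_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


def uniform_chance(n_classes):
    """Expected accuracy of guessing uniformly among n_classes labels."""
    return 1.0 / n_classes
```

`uniform_chance` reproduces only the equal-distribution baselines quoted in the text (1/5 = 20% for five groups, 1/16 = 6.25% for sixteen families); the distribution-aware baselines (18.8% and 4.7%) depend on class counts and a procedure not restated here, so they are not reproduced.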

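Every comparison in this study starts from per-residue global mode square fluctuations computed with the Gaussian Network Model (GNM) from Cα positions alone. As a rough, dependency-free sketch, the code below builds the Kirchhoff (connectivity) matrix with an assumed 7 Å cutoff, diagonalizes it with a plain cyclic Jacobi routine (used only to avoid external libraries; a library eigensolver would normally be preferred), and takes the squared components of the slowest nonzero mode divided by its eigenvalue. The cutoff, the unit-sum normalization, and the function names are illustrative assumptions that are not restated in this section.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """GNM Kirchhoff matrix: -1 for Calpha pairs within cutoff, contact count on the diagonal."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def jacobi_eigh(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors of a small symmetric matrix (cyclic Jacobi)."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    eigvals = [a[i][i] for i in range(n)]
    eigvecs = [[v[k][i] for k in range(n)] for i in range(n)]  # eigvecs[i] pairs with eigvals[i]
    return eigvals, eigvecs


def global_mode_profile(coords, cutoff=7.0):
    """Square fluctuations in the slowest nonzero GNM mode, scaled to sum to 1."""
    vals, vecs = jacobi_eigh(kirchhoff(coords, cutoff))
    order = sorted(range(len(vals)), key=lambda k: vals[k])
    k = order[1]  # order[0] is the zero mode; assumes a single connected network
    sq = [u * u / vals[k] for u in vecs[k]]
    total = sum(sq)
    return [s / total for s in sq]
```

For a toy four-residue chain with 3.8 Å spacing, only sequential neighbours fall within the cutoff, and the profile is roughly [0.43, 0.07, 0.07, 0.43]: largest at the chain ends, as expected for the slowest mode of a linear chain.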

Application of structural fluctuations in inhibition of kinases

Designing proper kinase inhibitors remains a difficult task, especially in attaining target selectivity. Although many inhibitors have been designed75 for various STY kinases, recent high-throughput binding and functional assays have revealed extensive cross-reactivity of these drugs.76,77 One approach to obtaining highly selective drugs is to restrict their binding to the target STY kinase alone. This requires knowledge of the regions in the STY kinase that are specific to the target by way of differential binding abilities. From our previous analyses, we infer that such regions with differential binding abilities will also have differential structural fluctuations. We therefore aim to pinpoint, using GNM-based NMA, the regions of differential fluctuation that could serve as drug targets with selective binding modes. To this end, we analyzed the milestone studies of Fabian et al.76 and Karaman et al.77 and noted that Src inhibitors frequently bind to off-targets such as Abl and EGFR kinases. We undertook a case study of Src kinases to determine the regions whose structural fluctuations are statistically distinct from those of Abl and EGFR kinases.

From our dataset (Table I), we compared the normalized global mode square fluctuations of 10 Src kinases [Fig. 9(A), green curve] against those of 4 Abl kinases [Fig. 9(A), purple curve]. The residues with differential fluctuations (unpaired t-test, P < 0.05) are marked in red below the ordinate. When mapped to the Src kinase fold, these residues [Fig. 9(B), red] correspond to the β-sheet in the N-lobe and the αG-helix in the C-lobe. Likewise, when compared with the three EGFR family kinases [Fig. 9(D), purple curve], the Src kinases [Fig. 9(D), green curve] had differential (unpaired t-test, P < 0.01) structural fluctuations around the β-sheet in the N-lobe and the αG-helix in the C-lobe [Fig. 9(E), red]. We propose that these regions are potential targets for drugs with increased selectivity imparted by specific binding. Recently, the αG-helix has also been identified as essential for substrate recognition and binding in many kinases.78,79 Thus, the αG-helix is an attractive drug target whose binding could both inhibit kinase activity and impart selectivity. This prediction remains to be validated experimentally.

Figure 9
Residues with fluctuations specific to the Src family may be used for drug targeting. (A) Residue-wise mean of normalized square fluctuations of 10 Src family structures of the TK group (green) and 4 Abl family structures of the TK group (purple) in Table I are plotted as a function of the aligned residue positions. Alignment positions that showed statistically different (unpaired t-test, P < 0.05) structural fluctuation distributions between the Src and Abl families are marked red below the ordinate. (B) The residues that showed differential fluctuations (unpaired t-test, P < 0.05) in the 10 Src structures, when compared with the four Abl structures, are mapped (red) on the kinase fold. (C) Residue-wise mean of (i) normalized square fluctuation differences (red), (ii) Cα-Cα displacements (blue), and (iii) Voronoi packing density differences (green) between all possible Src-Abl kinase pairs are plotted as a function of the aligned residue positions. On average, we observe only a weak correlation of fluctuation differences with Cα-Cα displacements (Spearman rank-order correlation, mean cc = 0.38) and Voronoi packing density differences (Spearman rank-order correlation, mean cc = 0.22). (D) Residue-wise mean of normalized square fluctuations of 10 Src family structures of the TK group (green) and 3 EGFR family structures of the TK group (purple) in Table I are plotted as a function of the aligned residue positions. Alignment positions that showed statistically different (unpaired t-test, P < 0.01) structural fluctuation distributions between the Src and the EGFR families are marked red below the ordinate. (E) The residues that showed differential fluctuations (unpaired t-test, P < 0.01) in the 10 Src structures, when compared with the 3 EGFR structures, are mapped (red) on the kinase fold. (F) Residue-wise mean of (i) normalized square fluctuation differences (red), (ii) Cα-Cα displacements (blue), and (iii) Voronoi packing density differences (green) between all possible Src-EGFR kinase pairs are plotted as a function of the aligned residue positions. On average, we observe only a weak correlation of fluctuation differences with Cα-Cα displacements (Spearman rank-order correlation, mean cc = 0.57) and Voronoi packing density differences (Spearman rank-order correlation, mean cc = 0.20). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

DISCUSSION

Traditional methods that detect similarity in sequence, structure and motifs in order to gauge the relatedness among proteins have shortcomings. Dynamics is emerging as a key indicator of the function, regulation and specificity of proteins. Indeed, proteins exist as fluid molecules in constant motion inside the cell,80 and their movements are crucial to function, regulation, stability and evolution.81,82 Here, for the first time, the global mode fluctuations (derived from GNM-based NMA) of the homologous STY kinase superfamily have been scrutinised. STY kinases have been experimentally and manually curated over the years and categorized into groups, subcategorized into families, and further divided into subfamilies. STY kinases within a subfamily are more closely related to each other than those within a family and across subfamilies, which are in turn more closely related to each other than those within a group and across families. This clear hierarchy was established manually, taking into account sequence similarities, substrate specificities, concurring domain architectures, cellular localization, participating biological pathways, etc.21 This extensive understanding of the STY kinases makes it possible to validate the effectiveness of GNM-based NMA-derived structural fluctuations in understanding protein relatedness.

STY kinases form a large family with diverse substrate specificities, regulatory mechanisms and protein-protein interactions. These properties are comparable for closely related STY kinases (Within-Subfamily category) and diverge progressively as we move through the Within-Family, Within-Group, and Across-Group categories. Consistent with this, the global mode structural fluctuations are most correlated in the Same-Protein and Within-Subfamily categories, with a hierarchical decrease through the Within-Family, Within-Group and Across-Group categories. This is the first indication that the structural fluctuations derived from NMA reflect the diversification in regulatory features observed in homologous kinases. It is well known that functional or regulatory relatedness among proteins, in many cases, corresponds to similar sequence and structure. Our study points out that the similarity in structural fluctuations, although derived solely from the Cα positions, shows only weak correlations with structure similarity (such as RMSD, Cα-Cα displacements and packing density differences) and with sequence similarity. Taken together, the global mode structural fluctuations correspond well with domain/protein association attributes and only weakly with sequence and structure attributes. This trend has been observed previously17,19 in globins and other homologous families. This evidence strongly suggests that the fluctuations important for retaining attributes such as specificity, regulation, and localization are selectively conserved during evolution, whereas other fluctuations are not.

If structural fluctuations are indeed conserved in order to preserve common regulatory attributes of related kinases, we should be able to identify a conserved flexibility signature in all STY kinases that is crucial for the common phosphotransfer function. Such a conserved flexibility profile was identified around the catalytic loop of the kinase fold. The residues three-dimensionally in and around the catalytic loop show very little mobility compared with the other residues. This result also agrees with a previous study63 that reported relatively higher stability of catalytic residues in other enzyme families. Further, the restricted mobility of these residues is conserved throughout the STY kinase superfamily. This conserved flexibility signature can be understood in the light of the function of STY kinases. The aspartate in the catalytic
loop is a completely conserved residue that accepts a proton during phosphotransfer. The residues around the phosphosite and the nucleoside need to be stable for efficient transfer of the phosphate moiety, which explains the signature stability of the catalytic loop in the kinases.

After describing the structural fluctuations common to all STY kinases, the present study identified structural fluctuations conserved in closely related STY kinases, such as the AGC and TK groups. The structural fluctuations that were differentially present in the AGC group, as compared with all other groups, were indeed found in residues involved in AGC-group-specific protein-protein interactions. Similarly, the fluctuations that demarcated the TK group from all other groups were observed in residues involved in TK-group-specific protein-protein interactions. We report, for the first time, that residues that accomplish group-specific regulatory roles (like modular interaction with another domain/protein) also possess fluctuations that are significantly different from those of other groups. This indicates that residues involved in group-specific regulatory mechanisms can be identified by virtue of their differential fluctuation patterns, which could help identify regions of protein-protein interaction and regulatory modulation from structures alone. This result also holds in the much narrower and more specific category of family. The CDK family of STY kinases, belonging to the CMGC group, shows fluctuations remarkably different from those of other CMGC kinases in the residues involved in CDK-family-specific protein-protein interactions. Thus, we have identified structural fluctuations essential for STY kinases, specific for the AGC and TK groups, and unique to the CDK family.

The present analyses suggest that the global mode fluctuations of STY kinases contain information specific to functional attributes like protein binding, and are thus capable of identifying the group and family classification of a kinase structure. Testing this hypothesis with linear discriminant analysis, we find that it is possible to classify a kinase structure into its group and family solely on the basis of its global mode square fluctuations. Finally, we present a possible application of the global mode square fluctuations as a means to predict drug targets in STY kinases. As a case study, we identified the αG-helix of the Src kinases as a potential drug target. We predict that a drug designed to bind the Src kinase at the αG-helix will not only inhibit the kinase but also show minimal cross-reaction with similar kinases like Abl and EGFR. Indeed, the αG-helix has a sufficiently distinct fluctuation pattern in Src when compared with Abl and EGFR kinases. As an extension, we could interpret that the αG-helix will give rise to distinct binding modes and hence curtail cross-reaction. Laboratory experiments need to be carried out to test this prediction.

CONCLUSIONS

We have, for the first time, validated GNM-based NMA as a tool to understand the regulation attributes and specificities in a protein superfamily. Taking the well-studied STY kinases as the model system, we have established that the global mode structural fluctuations are conserved better in closely related proteins than in distantly related proteins. Such conservation of fluctuation profiles is not a trivial consequence of sequence and structure similarities, but corresponds to functional relatedness. The conservation of fluctuations is reliably observed at different levels of the STY kinase hierarchy: group, family, and subfamily. We identified fluctuation profiles essential to all STY kinases and unique to the AGC group, the TK group and the CDK family of kinases. Such group-characteristic and family-characteristic fluctuation patterns strongly localized to group-specific and family-specific protein-protein interaction sites. Finally, we tested whether the global mode fluctuations contained enough information to demarcate the groups and families of STY kinases. After successful classification, we identified, based on flexibility profile analysis, the αG-helix as a potential drug target site in Src kinase that would minimise cross-reaction with Abl and EGFR kinases.

ACKNOWLEDGMENT

The authors thank the anonymous reviewers whose comments helped improve the rigor and presentation of the study.

REFERENCES

1. D’Aquino JA, Ringe D. Determinants of the SRC homology domain 3-like fold. J Bacteriol 2003; 185:4081–4086.
2. Sudarsanam S. Structural diversity of sequentially identical subsequences of proteins: identical octapeptides can have different conformations. Proteins 1998; 30:228–231.
3. Minor DL, Kim PS. Context-dependent secondary structure formation of a designed protein sequence. Nature 1996; 380:730–734.
4. Joshi T, Xu D. Quantitative assessment of relationship between sequence similarity and function similarity. BMC Genomics 2007; 8:222.
5. Hegyi H, Gerstein M. Annotation transfer for genomics: measuring functional divergence in multi-domain proteins. Genome Res 2001; 11:1632–1640.
6. Dodson G, Wlodawer A. Catalytic triads and their relatives. Trends Biochem Sci 1998; 23:347–352.
7. Doolittle RF. Convergent evolution: the need to be explicit. Trends Biochem Sci 1994; 19:15–18.
8. Murzin AG. How far divergent evolution goes in proteins. Curr Opin Struct Biol 1998; 8:380–387.
9. Nagano N, Orengo CA, Thornton JM. One fold with many functions: the evolutionary relationships between TIM barrel families based on their sequences, structures and functions. J Mol Biol 2002; 321:741–765.
10. Neidhart DJ, Kenyon GL, Gerlt JA, Petsko GA. Mandelate racemase and muconate lactonizing enzyme are mechanistically distinct and structurally homologous. Nature 1990; 347:692–694.
11. Axelsen KB, Palmgren MG. Evolution of substrate specificities in the P-Type ATPase superfamily. J Mol Evol 1998; 46:84–101.
12. Ubersax JA, Ferrell JE. Mechanisms of specificity in protein phosphorylation. Nat Rev Mol Cell Biol 2007; 8:530–541.
13. Perona JJ, Craik CS. Evolutionary divergence of substrate specificity within the chymotrypsin-like serine protease fold. J Biol Chem 1997; 272:29987–29990.
14. Colicelli J. ABL tyrosine kinases: evolution of function, regulation, and specificity. Sci Signal 2010; 3:re6.
15. Kalaivani R, Srinivasan N. A Gaussian network model study suggests that structural fluctuations are higher for inactive states than active states of protein kinases. Mol Biosyst 2015; 11:1079–1095.
16. Bahar I, Jernigan RL. Vibrational dynamics of transfer RNAs: comparison of the free and synthetase-bound forms. J Mol Biol 1998; 281:871–884.
17. Maguid S, Fernandez-Alberti S, Ferrelli L, Echave J. Exploring the common dynamics of homologous proteins. Application to the globin family. Biophys J 2005; 89:3–13.
18. Keskin O, Jernigan RL, Bahar I. Proteins with similar architecture exhibit similar large-scale dynamic behavior. Biophys J 2000; 78:2093–2106.
19. Maguid S, Fernandez-Alberti S, Parisi G, Echave J. Evolutionary conservation of protein backbone flexibility. J Mol Evol 2006; 63:448–457.
20. Maguid S, Fernandez-Alberti S, Echave J. Evolutionary conservation of protein vibrational dynamics. Gene 2008; 422:7–13.
21. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S. The protein kinase complement of the human genome. Science (New York) 2002; 298:1912–1934.
22. Caenepeel S, Charydczak G, Sudarsanam S, Hunter T, Manning G. The mouse kinome: discovery and comparative genomics of all mouse protein kinases. Proc Natl Acad Sci USA 2004; 101:11707–11712.
23. Plowman GD, Sudarsanam S, Bingham J, Whyte D, Hunter T. The protein kinases of Caenorhabditis elegans: a model for signal transduction in multicellular organisms. Proc Natl Acad Sci USA 1999; 96:13603–13610.
24. Endicott JA, Noble MEM, Johnson LN. The structural basis for control of eukaryotic protein kinases. Annu Rev Biochem 2012; 81:587–613.
25. Nolen B, Taylor S, Ghosh G. Regulation of protein kinases; controlling activity through activation segment conformation. Mol Cell 2004; 15:661–675.
26. Shchemelinin I, Sefc L, Necas E. Protein kinases, their function and implication in cancer and other diseases. Folia Biol 2006; 52:81–100.
27. Adams JA. Kinetic and catalytic mechanisms of protein kinases. Chem Rev 2001; 101:2271–2290.
28. Hanks S, Quinn A, Hunter T. The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 1988; 241:42–52.
29. Zheng J, Trafny EA, Knighton DR, Xuong NH, Taylor SS, Ten Eyck LF, Sowadski JM. 2.2 Å refined crystal structure of the catalytic subunit of cAMP-dependent protein kinase complexed with MnATP and a peptide inhibitor. Acta Crystallogr D 1993; 49:362–365.
30. Huse M, Kuriyan J. The conformational plasticity of protein kinases. Cell 2002; 109:275–282.
31. Kannan N, Neuwald AF. Did protein kinase regulatory mechanisms evolve through elaboration of a simple structural component? J Mol Biol 2005; 351:956–972.
32. Pawson T, Scott JD. Protein phosphorylation in signaling–50 years and counting. Trends Biochem Sci 2005; 30:286–290.
33. Kim C, Cheng CY, Saldanha SA, Taylor SS. PKA-I holoenzyme structure reveals a mechanism for cAMP-dependent activation. Cell 2007; 130:1032–1043.
34. Gisler SM, Pribanic S, Bacic D, Forrer P, Gantenbein A, Sabourin LA, Tsuji A, Zhao Z-S, Manser E, Biber J, Murer H. PDZK1: I. a major scaffolder in brush borders of proximal tubular cells. Kidney Int 2003; 64:1733–1745.
35. Eggers CT, Schafer JC, Goldenring JR, Taylor SS. D-AKAP2 interacts with Rab4 and Rab11 through its RGS domains and regulates transferrin receptor recycling. J Biol Chem 2009; 284:32869–32880.
36. Masterson LR, Mascioni A, Traaseth NJ, Taylor SS, Veglia G. Allosteric cooperativity in protein kinase A. Proc Natl Acad Sci USA 2008; 105:506–511.
37. Kinderman FS, Kim C, von Daake S, Ma Y, Pham BQ, Spraggon G, Xuong N-H, Jennings PA, Taylor SS. A dynamic mechanism for AKAP binding to RII isoforms of cAMP-dependent protein kinase. Mol Cell 2006; 24:397–408.
38. Boettcher AJ, Wu J, Kim C, Yang J, Bruystens J, Cheung N, Pennypacker JK, Blumenthal DA, Kornev AP, Taylor SS. Realizing the allosteric potential of the tetrameric protein kinase A RIα holoenzyme. Structure (London, England: 1993) 2011; 19:265–276.
39. Ficarro SB, McCleland ML, Stukenberg PT, Burke DJ, Ross MM, Shabanowitz J, Hunt DF, White FM. Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Nat Biotechnol 2002; 20:301–305.
40. Corbin JD, Turko IV, Beasley A, Francis SH. Phosphorylation of phosphodiesterase-5 by cyclic nucleotide-dependent protein kinase alters its catalytic and allosteric cGMP-binding activities. Eur J Biochem 2000; 267:2760–2767.
41. Biggs WH, Meisenhelder J, Hunter T, Cavenee WK, Arden KC. Protein kinase B/Akt-mediated phosphorylation promotes nuclear exclusion of the winged helix transcription factor FKHR1. Proc Natl Acad Sci USA 1999; 96:7421–7426.
42. Bryant PB, Joel FH. Phosphorylation of the cAMP response element binding protein CREB by cAMP-dependent protein kinase A and glycogen synthase kinase-3 alters DNA-binding affinity, conformation, and increases net charge. Biochemistry 1998; 37:3795–3809.
43. Hans F, Dimitrov S. Histone H3 phosphorylation and cell division. Oncogene 2001; 20:3021–3027.
44. Hernandez SE, Krishnaswami M, Miller AL, Koleske AJ. How do Abl family kinases regulate cell shape and movement? Trends Cell Biol 2004; 14:36–44.
45. Acin-Perez R, Gatti DL, Bai Y, Manfredi G. Protein phosphorylation and prevention of cytochrome oxidase inhibition by ATP: coupled mechanisms of energy metabolism regulation. Cell Metab 2011; 13:712–719.
46. Dix MM, Simon GM, Wang C, Okerberg E, Patricelli MP, Cravatt BF. Functional interplay between caspase cleavage and phosphorylation sculpts the apoptotic proteome. Cell 2012; 150:426–440.
47. Hanks SK, Hunter T. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB J 1995; 9:576–596.
48. Rakshambikai R, Manoharan M, Gnanavel M, Srinivasan N. Typical and atypical domain combinations in human protein kinases: functions, disease causing mutations and conservation in other primates. RSC Adv 2015; 5:25132–25148.
49. Martin J, Anamika K, Srinivasan N. Classification of protein kinases on the basis of both kinase and non-kinase regions. PLoS One 2010; 5:e12460.
50. Tirion MM. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys Rev Lett 1996; 77:1905–1908.
51. Miyazawa S, Jernigan RL. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 1985; 18:534–552.
52. Rader AJ, Chennubhotla C, Yang L, Bahar I. The Gaussian network model: Theory and application. In: Cui Q, Bahar I, editors. Normal mode analysis. Theory and Applications to Biological and Chemical Systems, Boca Raton: Chapman & Hall/CRC; 2006. pp 41–64.
53. Haliloglu T, Bahar I, Erman B. Gaussian dynamics of folded proteins. Phys Rev Lett 1997; 79:3090–3093.
54. Bahar I, Atilgan AR, Erman B. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold Design 1997; 2:173–181.
55. Jernigan RL, Demirel MC, Bahar I. Relating structure to function through the dominant modes of motion of DNA topoisomerase II. Int J Quantum Chem 1999; 75:301–312.
56. Bahar I, Erman B, Jernigan RL, Atilgan AR, Covell DG. Collective motions in HIV-1 reverse transcriptase: examination of flexibility and enzyme function. J Mol Biol 1999; 285:1023–1037.
57. Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 2005; 33:2302–2309.
58. Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM. MUSTANG: a multiple structural alignment algorithm. Proteins 2006; 64:559–574.
59. Capra JA, Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics (Oxford, England) 2007; 23:1875–1882.
60. Finn RD, Miller BL, Clements J, Bateman A. iPfam: a database of protein family and domain interactions found in the Protein Data Bank. Nucleic Acids Res 2014; 42:D364–D373.
61. Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 2000; 302:205–217.
62. MATLAB and statistics toolbox release. Natick, MA: The MathWorks, Inc.; 2012.
63. Yang L-W, Bahar I. Coupling between catalytic site and collective dynamics: a requirement for mechanochemical activity of enzymes. Structure (London, England: 1993) 2005; 13:893–904.
64. Rother K, Hildebrand PW, Goede A, Gruening B, Preissner R. Voronoia: analyzing packing in protein structures. Nucleic Acids Res 2009; 37:D393–D395.
65. Kannan N, Haste N, Taylor SS, Neuwald AF. The hallmark of AGC kinase functional divergence is its C-terminal tail, a cis-acting regulatory module. Proc Natl Acad Sci USA 2007; 104:1272–1277.
66. Yang J, Cron P, Thompson V, Good VM, Hess D, Hemmings BA, Barford D. Molecular mechanism for the regulation of protein kinase B/Akt by hydrophobic motif phosphorylation. Mol Cell 2002; 9:1227–1240.
67. Belham C, Wu S, Avruch J. Intracellular signalling: PDK1 – a kinase at the hub of things. Curr Biol 1999; 9:R93–R96.
68. Biondi RM, Cheung PC, Casamayor A, Deak M, Currie RA, Alessi
69. Gao T, Toker A, Newton AC. The carboxyl terminus of protein kinase c provides a switch to regulate its interaction with the phosphoinositide-dependent kinase, PDK-1. J Biol Chem 2001; 276:19588–19596.
70. Peterson RT, Schreiber SL. Kinase phosphorylation: keeping it all in the family. Curr Biol 1999; 9:R521–R524.
71. Chestukhin A, Litovchick L, Schourov D, Cox S, Taylor SS, Shaltiel S. Functional malleability of the carboxyl-terminal tail in protein kinase A. J Biol Chem 1996; 271:10175–10182.
72. Koch CA, Anderson D, Moran MF, Ellis C, Pawson T. SH2 and SH3 domains: elements that control interactions of cytoplasmic signaling proteins. Science (New York, NY) 1991; 252:668–674.
73. Lowenstein EJ, Daly RJ, Batzer AG, Li W, Margolis B, Lammers R, Ullrich A, Skolnik EY, Bar-Sagi D, Schlessinger J. The SH2 and SH3 domain-containing protein GRB2 links receptor tyrosine kinases to ras signaling. Cell 1992; 70:431–442.
74. Filippakopoulos P, Müller S, Knapp S. SH2 domains: modulators of nonreceptor tyrosine kinase activity. Curr Opin Struct Biol 2009; 19:643–649.
75. Grant SK. Therapeutic protein kinase inhibitors. Cell Mol Life Sci 2009; 66:1163–1177.
76. Fabian MA, Biggs WH, Treiber DK, Atteridge CE, Azimioara MD, Benedetti MG, Carter TA, Ciceri P, Edeen PT, Floyd M, Ford JM, Galvin M, Gerlach JL, Grotzfeld RM, Herrgard S, Insko DE, Insko MA, Lai AG, Lelias JM, Mehta SA, Milanov ZV, Velasco AM, Wodicka LM, Patel HK, Zarrinkar PP, Lockhart DJ. A small molecule-kinase interaction map for clinical kinase inhibitors. Nat Biotechnol 2005; 23:329–336.
77. Karaman MW, Herrgard S, Treiber DK, Gallant P, Atteridge CE, Campbell BT, Chan KW, Ciceri P, Davis MI, Edeen PT, Faraoni R, Floyd M, Hunt JP, Lockhart DJ, Milanov ZV, Morrison MJ, Pallares G, Patel HK, Pritchard S, Wodicka LM, Zarrinkar PP. A quantitative analysis of kinase inhibitor selectivity. Nat Biotechnol 2008; 26:127–132.
78. Taylor SS, Haste NM, Ghosh G. PKR and eIF2alpha: integration of kinase dimerization, activation, and substrate docking. Cell 2005; 122:823–825.
79. Dar AC, Dever TE, Sicheri F. Higher-order substrate recognition of eIF2alpha by the RNA-dependent protein kinase PKR. Cell 2005; 122:887–900.
80. Wong W, Gough NR. Focus issue: the protein dynamics of cell signaling. Sci Signal 2009; 2:eg4.
81. Smock RG, Gierasch LM. Sending signals dynamically. Science (New
DR. Identification of a pocket in the PDK1 kinase domain that York, NY) 2009; 324:198–203.
interacts with PIF and the C-terminal residues of PKA. EMBO J 82. Tokuriki N, Tawfik DS. Protein dynamism and evolvability. Science
2000; 19:979–988. (New York, NY) 2009; 324:203–207.