
IEEE - 31661

Bioinformatics: Protein Structure Prediction

Chandrayani N. Rokde
Department of Computer Technology,
Yeshwantrao Chavan College of Engineering
Nagpur, Maharashtra, India-441110
chandrayanirokde@gmail.com

Dr. Manali Kshirsagar
Department of Computer Technology,
Yeshwantrao Chavan College of Engineering
Nagpur, Maharashtra, India-441110
Manali_kshirsagar@yahoo.com

4th ICCCNT 2013, July 4-6, 2013, Tiruchengode, India

Abstract- Proteins are essential parts of our life and participate in virtually every process within a cell. Understanding protein structures is vital to determining the function of a protein. Protein structure prediction (PSP) from amino acid sequence is one of the high-focus problems in bioinformatics today, because the biological function of a protein is determined by its three-dimensional structure. Thus, protein structure prediction is a fundamental area of computational biology. Its importance is intensified by the large amounts of sequence data coming from the PDB (Protein Data Bank) and by the fact that experimental methods such as X-ray crystallography or Nuclear Magnetic Resonance (NMR), which are used to determine protein structures, remain very expensive and time consuming. To reduce this time, computational methods are used for the protein folding and structure prediction problem. In this paper, results for protein p53 are discussed.

Keywords- Proteins, Protein Structure Prediction, Protein Folding, Computational tools used in PSP.

I. INTRODUCTION TO PROTEINS

Proteins are the main building blocks of life. They are responsible for catalyzing and regulating biochemical reactions and transporting molecules, and they form the basis of structures such as skin, hair, and tendon. The shape of a protein is specified by its amino acid sequence. There are 20 different kinds of amino acid, and each amino acid is identified by its side chain, which determines its properties. Amino acids are separated into four groups: non-polar, polar, basic, and acidic. Polar and non-polar amino acids are further categorized as hydrophilic (attracted to water) and hydrophobic (repelled by water), respectively. The combination of properties that allows a specific protein to form a certain structure is not completely known. Many inherent properties of amino acids are involved in determining the structure of a protein. One of the most important distinguishing factors of amino acids is their different tails, also called R groups. Other factors play key roles in determining the final structure of a protein; these include the energy level of the structure, which needs to be low and stable, and the links between amino acids [1]. A protein does not exhibit full biological activity until it folds into a three-dimensional structure. Information on the secondary and three-dimensional (3D) structures of a protein is important for understanding its biological activity, because the shape and nature of the protein molecule's surface account for the mechanisms of protein function.

II. PROTEIN STRUCTURE

The formation of a protein passes through different levels of structure [2]. The primary structure of a protein is simply the linear arrangement, or sequence, of the amino acid residues that compose it. Secondary structure occurs when residues in the sequence are linked by hydrogen bonds. Secondary structure prediction consists of assigning regions of the amino acid sequence as likely alpha helices or beta strands. The main goal in the prediction of secondary structure is to take the primary structure (sequence) of a protein and assign each region to one of these states. It is observed that, owing to the size, shape, and charge of its side chain, each amino acid may fit better in one type of secondary structure than another.

Tertiary structure refers to the overall conformation of a polypeptide chain, that is, the three-dimensional arrangement of all its amino acid residues. In contrast with secondary structures, which are stabilized by hydrogen bonds, tertiary structure is primarily stabilized by hydrophobic interactions between the non-polar side chains, hydrogen bonds between polar side chains, and peptide bonds. These stabilizing forces hold elements of secondary structure (helices, strands, turns, and random coils) compactly together. Because the stabilizing interactions are weak, however, the tertiary structure of a protein is not rigidly fixed but undergoes continual and minute fluctuation [3]. This variation in structure has important consequences for the function and regulation of proteins. The final level of protein structure is quaternary structure, which arises when more than one protein chain comes together to form a complex.

III. PROTEIN TERTIARY STRUCTURE PREDICTION

Protein structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence; it matters because all activities of a protein depend upon its three-dimensional structure. Structure prediction is fundamentally different from the inverse problem of protein design. The three-dimensional structure of a protein is determined by a network of covalent and non-covalent interactions [4]. Although proteins are constructed by the polymerization of only 20 different amino acids into linear chains, they carry out an incredible array of diverse tasks. A protein chain folds into a unique shape that is stabilized by noncovalent interactions between regions in the linear sequence of amino acids. This spatial organization of a protein, its shape in three dimensions, is a key to understanding its function. Only when a protein is in its correct three-dimensional structure, or conformation, is it able to function efficiently [5]. A key concept in understanding how proteins work is that function is derived from three-dimensional structure, and three-dimensional structure is specified by amino acid sequence.

IV. METHODS USED IN PSP

There are three main strategies for solving the PSP (protein structure prediction) problem: homology (comparative) techniques, protein threading (fold recognition), and ab initio (de novo) techniques. Homology modeling is a knowledge-based approach: given a sequence database, it uses multiple sequence alignment on this database to identify structurally conserved regions, constructs the structure backbone and loops based on these regions, restores the side chains, and refines the model through energy minimization. Homology modeling is for "easier" targets; the accuracy of the prediction is 60%. Protein threading is carried out when sequence similarity with known structures is less than 25%. It is for those targets with only fold-level homology, that is, "harder" targets; the accuracy of the prediction is 40%. The goal of ab initio protein structure prediction is to predict a protein's structure accurately by focusing on the chemical and physical properties of the amino acid sequence making up the mature protein [6]. This method is slow and inaccurate and is used for novel targets. Every two years, the performance of current methods is assessed in the CASP experiment; CASP stands for Critical Assessment of Techniques for Protein Structure Prediction.

V. FOLD RECOGNITION

Proteins fold due to the hydrophobic effect, van der Waals interactions, electrostatic forces, and hydrogen bonding. Protein threading, also known as fold recognition, is a method of protein modelling (i.e. computational protein structure prediction) used to model those proteins which have the same fold as proteins of known structure but do not have homologous proteins with known structure [7]. Protein folding is the process by which a protein assumes its 3D structure. All protein molecules are endowed with a primary structure consisting of the polypeptide chain [8]. Fold recognition requires a criterion to identify the best template for one target sequence. The protein fold-recognition approach to structure prediction aims to identify the known structural framework (that is, the backbone of an experimentally determined protein structure) that accommodates the target protein sequence in the best way. Typically, a fold-recognition program comprises four components:
(1) The representation of the template structures (usually corresponding to proteins from the Protein Data Bank database),
(2) The evaluation of the compatibility between the target sequence and a template fold,
(3) The algorithm to compute the optimal alignment between the target sequence and the template structure, and
(4) The way the ranking is computed and the statistical significance is estimated.

VI. EXPERIMENTAL WORK

As of today, hundreds of servers and tools are widely available for protein structure prediction. For protein threading, methods such as FASTA and the Basic Local Alignment Search Tool (BLAST) were developed to perform rapid searches for sequence homologues in large sequence databases. These methods produce relatively accurate approximate sequence alignments by quickly finding sub-sequences in the databases. The two most popular databases for protein structure are the Protein Data Bank (PDB) and the NCBI Protein Database.

The p53 tumor suppressor protein is a flexible molecule composed of four identical protein chains [9]. Flexible molecules are difficult to study by X-ray crystallography because they do not form orderly crystals. The p53 protein is a phosphoprotein made of 393 amino acids. It consists of four units (or domains):
- A domain that activates transcription factors.
- A domain that recognizes specific DNA sequences (core domain).
- A domain that is responsible for the tetramerization of the protein.
- A domain that recognizes damaged DNA, such as misaligned base pairs or single-stranded DNA.

Structure by parts

Most of the p53 mutations that cause cancer are found in the DNA-binding domain. The most common mutations are shown here, using PDB entry 1tup. This PDB entry includes three copies of the DNA-binding domain; only one (chain B in the file) is shown here. The mutations are found in and around the DNA-binding face of the protein. The most common mutation changes arginine 248, colored red here. Notice how it snakes into the minor groove of the DNA (shown in blue and green), forming a strong stabilizing interaction. When mutated to another amino acid, this interaction is lost. Other key sites of mutation are shown in pink, including arginine residues 175, 249, 273 and 282, and glycine 245. Some of these contact the DNA directly, and others are involved in positioning other DNA-binding amino acids.

Fig 1.1 Structure of protein p53
Fig 1.2 Folded structure of protein p53
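The four fold-recognition components listed in Section V can be illustrated with a deliberately small sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper or from any real threading program: the two-state environment alphabet ('B' buried, 'E' exposed), the hydrophobic residue set, the scores (+2 / -1), the gap penalty, and the template names. Statistical significance estimation is not modelled; templates are simply ranked by raw score.

```python
from typing import Dict, List, Tuple

# Assumed non-polar residue set, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def compatibility(residue: str, environment: str) -> int:
    """Component 2: score one target residue placed in one template environment.

    Hydrophobic residues are rewarded in buried ('B') positions and
    polar residues in exposed ('E') positions; mismatches are penalised.
    """
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 2 if buried == hydrophobic else -1


def threading_score(target: str, template_env: str, gap: int = -2) -> int:
    """Component 3: best global alignment score by dynamic programming."""
    n, m = len(target), len(template_env)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # target residues aligned to gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # template positions aligned to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + compatibility(
                target[i - 1], template_env[j - 1]
            )
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


def rank_templates(target: str, library: Dict[str, str]) -> List[Tuple[str, int]]:
    """Component 4 (ranking only): order templates by score, best first."""
    scored = [(name, threading_score(target, env)) for name, env in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Component 1: a toy template library with made-up names and environments.
LIBRARY = {
    "toy_fold_a": "BBEEBBEE",
    "toy_fold_b": "EEEEEEEE",
}

ranking = rank_templates("LVKDILSE", LIBRARY)
print(ranking)
```

In this toy case the target alternates hydrophobic and polar pairs, so it fits the buried/exposed pattern of `toy_fold_a` better than the fully exposed `toy_fold_b`. Real threading programs use much richer template descriptions and scoring functions.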


VII. PHYRE

The Phyre and Phyre2 servers predict the three-dimensional structure of a protein sequence using the principles and techniques of homology modeling. Because the structure of a protein is more conserved in evolution than its amino acid sequence, a protein sequence of interest (the target) can be modeled with reasonable accuracy on a very distantly related sequence of known structure (the template), provided that the relationship between target and template can be discerned through sequence alignment. Currently the most powerful and accurate methods for detecting and aligning remotely related sequences rely on profiles or hidden Markov models (HMMs). These profiles/HMMs capture the mutational propensity of each position in an amino acid sequence based on observed mutations in related sequences and can be thought of as an 'evolutionary fingerprint' of a particular protein.

IX. RESULTS

An amino acid sequence of protein p53 (Homo sapiens) was tested on the Phyre server, and the results are divided into three regions.

MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPD
DIEQWFTEDPGPDEAPRMPEAA
PRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGT
AKSVTCTYSPALNKMFCQLAKT
CPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSD
GLAPPQHLIRVEGNLRVEYLDDRN
TFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLED
SSGNLLGRNSFEVRVCACPGR
DRRTEKENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYF
TLQIRGRERFEMFRELNEALEL
KDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD

a. Domain Analysis

Fig 1.2 shows the domain analysis of the protein; clicking on the red portion gives detailed information about the template.

b. Secondary structure and disorder prediction

The user-submitted protein sequence is first scanned against a large sequence database using PSI-BLAST. The profile generated by PSI-BLAST is then processed by the neural network secondary structure prediction program PsiPred and the protein disorder predictor Disopred [9]. The predicted presence of alpha-helices, beta-strands and disordered regions is shown graphically together with a color-coded confidence bar.

Fig 1.3 Independent secondary structure prediction methods are shown together with a consensus prediction. Red horizontal bars indicate predicted alpha helices, blue bars represent beta strands and gray bars indicate coil. The color-coded numbers in the Cons_prob row indicate the confidence of the prediction at each position from 0 (low confidence) to 9 (high confidence). Similarly, in the Disorder prediction, positions are flagged as 'd' for disordered and 'o' for ordered, with a likelihood of disorder between 0 (low) and 9 (high) reported in the Diso_prob row.
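As a consistency check, the sequence submitted in Section IX can be compared with the residue numbers quoted for p53 in Section VI. The sketch below is only an illustration: the names `P53_SEQUENCE`, `HOTSPOTS`, `residue_at`, `check_hotspots`, and `coverage_percent` are hypothetical helpers that do not appear in the paper, and positions are taken to be 1-based as in the text. The final call uses the 231 modelled residues reported with the fold recognition results.

```python
from typing import Dict

# p53 (Homo sapiens) sequence exactly as printed in Section IX.
P53_SEQUENCE = (
    "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPD"
    "DIEQWFTEDPGPDEAPRMPEAA"
    "PRVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGT"
    "AKSVTCTYSPALNKMFCQLAKT"
    "CPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSD"
    "GLAPPQHLIRVEGNLRVEYLDDRN"
    "TFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLED"
    "SSGNLLGRNSFEVRVCACPGR"
    "DRRTEKENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYF"
    "TLQIRGRERFEMFRELNEALEL"
    "KDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD"
)

# Mutation sites named in Section VI: 1-based position -> expected residue
# (R = arginine, G = glycine).
HOTSPOTS: Dict[int, str] = {175: "R", 245: "G", 248: "R", 249: "R", 273: "R", 282: "R"}


def residue_at(sequence: str, position: int) -> str:
    """Return the one-letter residue at a 1-based sequence position."""
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} is outside 1..{len(sequence)}")
    return sequence[position - 1]


def check_hotspots(sequence: str, hotspots: Dict[int, str]) -> Dict[int, bool]:
    """Report, per position, whether the sequence has the expected residue."""
    return {pos: residue_at(sequence, pos) == aa for pos, aa in hotspots.items()}


def coverage_percent(modelled: int, total: int) -> float:
    """Percentage of the sequence covered by the modelled residues."""
    return 100.0 * modelled / total


print("length:", len(P53_SEQUENCE))
print("hotspots match:", check_hotspots(P53_SEQUENCE, HOTSPOTS))
print("coverage (%):", round(coverage_percent(231, len(P53_SEQUENCE))))
```

For the sequence as printed, the length is 393 residues, in agreement with Section VI, every listed site carries the expected arginine or glycine, and 231 of 393 residues rounds to the 59% coverage reported in the results.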


c. Detailed Template View

Fig 1.4 Cropped view of the top two hits in the primary table of the fold recognition results, including images of the protein models produced and descriptors of the fold and superfamily of the template used. Table cells with a red background highlight particularly confident hits.

Overall, the results show that 231 residues (59% of the sequence) were modelled with 100.0% confidence by the single highest-scoring template.

X. CONCLUSION

In spite of the great advances in experimental technique and the tremendous boost in computational power we have witnessed in the past decades, the protein folding problem is still far from solved. We already have a good understanding of some of the simpler protein-folding mechanisms, and we can simulate the folding of a few very small proteins. Therefore, research on folding continues, and in all likelihood it will bring about great advances in the development of new experimental techniques and new theoretical approaches.

XI. REFERENCES

[1] Gutachter: Prof. Dr. Martin Vingron, Walaa Fathy Ahmed Walid Gomaa, "Approaches to protein structures," 978-1-4577-0476-5/11, 2011 IEEE.

[2] Marco Vassura, Luciano Margara, Pietro Di Lena, Filippo Medri, Piero Fariselli, and Rita Casadio, "Reconstruction of 3D Structures from Protein Contact Maps," vol. 5, no. 3, July-September 2008.

[3] Maciej Kicinski, "Ab Initio Protein Structure Prediction Algorithms" (2011). Master's Projects. Paper 165.

[4] Hongyu Zhang, "Protein Tertiary Structures: Prediction from Amino Acid Sequences," Encyclopedia of Life Sciences, 2002, Macmillan Publishers Ltd.

[5] D.T. Jones, "THREADER: protein sequence threading by double dynamic programming," in Computational Methods in Biology (ed. S. Salzberg, D. Searl, and S. Kasif), Amsterdam: Elsevier Science, 1998.

[6] J. Skolnick, D. Kihara, and Y. Zhang, "Development and large scale benchmark testing of the PROSPECTOR 3 threading algorithm," Proteins, vol. 56, pp. 502-518, 2004.

[7] Daisuke Kihara, Hui Lu, Andrzej Kolinski, and Jeffrey Skolnick (2001), "TOUCHSTONE: An ab initio protein structure prediction method that uses threading based tertiary restraints."

[8] I. Cymerman, M. Feder, M. Pawłowski, M.A. Kurowski, J.M. Bujnicki, "Computational Methods for Protein Structure Prediction and Fold recognition," Nucleic Acids and Molecular Biology, Vol. 15, Springer-Verlag Berlin Heidelberg 2004.

[9] D. Lane, A.J. Levine, B. Vogelstein, "Additional information on p53 tumor suppressor," Nature 408, pp. 307-310, 200

[9] Kelley LA and Sternberg MJE, "Protein structure prediction on the web: a case study using the Phyre server," Nature Protocols 4, 363-371 (2009).

[10] Mount, David W. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press 2004.

[11] http://www.biophysics.org/blot/seq_empirical.html

[12] http://www.sbg.bio.ic.ac.uk/~phyre2/html/nprot.2009.2.pdf

[13] http://www.ncbi.nlm.nih.gov/gene/715
