
Protein Secondary Structure Prediction

K. Akila,
Lecturer,
Department of Bioinformatics
Jamal Mohamed College,
Trichy.
Proteins
• Proteins play a crucial role in virtually all biological processes and carry out a broad range of functions.
• The activity of an enzyme, or the function of any protein, is governed by its three-dimensional structure.

Example structures (figures not reproduced):
– H11_MOUSE: histocompatibility antigen
– VE2_BPV1: bovine DNA-binding domain
20 amino acids - the building blocks

Clickable map at: http://www.russell.embl-heidelberg.de/aas/


Secondary structure = spatial arrangement of amino acid residues that are adjacent in the primary structure
Reasons for Predicting Secondary Structure
• Starting point for prediction of tertiary and quaternary structure.
• Insight into the biological function of a protein.
• Facilitates alignment for homology modeling of distantly related proteins.
• Insight for data analysis and mutagenesis experiments when the structure is not known.
• Because secondary structure is local, only the amino acid sequence is needed.
Use of amino acid properties in prediction schemes

Diagram: Sequence → Propensity function (+ other inputs) → Vector of sequence propensities → Prediction function (+ other inputs) → Prediction
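
The two-stage flow in the diagram above can be sketched in Python. This is only an illustration under assumptions: the function names, the neutral default of 1.0 for residues missing from the table, the window size, and the threshold are hypothetical choices, not part of any published scheme. The helix propensities are the Pα values from the partial Chou-Fasman table later in this deck.

```python
from typing import Callable, List

# Helix propensities (P-alpha) copied from the partial Chou-Fasman table in this deck.
HELIX_P = {"E": 1.51, "M": 1.45, "A": 1.42, "I": 1.08, "V": 1.06,
           "Y": 0.69, "P": 0.57, "G": 0.57}


def propensity_function(sequence: str, default: float = 1.0) -> List[float]:
    """Stage 1: map each residue to a helix propensity.

    ASSUMPTION: residues absent from the partial table get a neutral 1.0.
    """
    return [HELIX_P.get(aa, default) for aa in sequence.upper()]


def prediction_function(propensities: List[float], window: int = 5,
                        threshold: float = 1.0) -> str:
    """Stage 2: turn the propensity vector into per-residue labels.

    ASSUMPTION: a centered sliding-window mean above `threshold` is
    labeled helix ('H'); everything else is labeled '-'.
    """
    half = window // 2
    labels = []
    for i in range(len(propensities)):
        lo, hi = max(0, i - half), min(len(propensities), i + half + 1)
        window_mean = sum(propensities[lo:hi]) / (hi - lo)
        labels.append("H" if window_mean > threshold else "-")
    return "".join(labels)


def predict(sequence: str,
            propensity_fn: Callable[[str], List[float]] = propensity_function,
            prediction_fn: Callable[[List[float]], str] = prediction_function) -> str:
    """Compose the stages: sequence -> propensity vector -> prediction."""
    return prediction_fn(propensity_fn(sequence))


print(predict("EEAAMMGGPPGG"))  # one label per residue
```

Passing the two stages as arguments mirrors the diagram: either box can be swapped (for example, a different propensity scale) without touching the other.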
Primary structure
Primary structure refers to the "linear" sequence of amino acids.
Types of Secondary Structures

• α Helices
• β Sheets
• Loops
• Coils
α Helix

• Most abundant secondary structure
• 3.6 amino acids per turn
• Hydrogen bond formed between every fourth residue (residue i bonds to residue i+4)
• Average length: 10 amino acids, or about 3 turns
• Length varies from 5 to 40 amino acids
Contd.

• Normally found on the surface of protein cores.
• Interact with the aqueous environment
– Inner-facing side has hydrophobic amino acids.
– Outer-facing side has hydrophilic amino acids.
• Roughly every third to fourth amino acid tends to be hydrophobic (consistent with 3.6 residues per turn).
• This pattern can be detected computationally.
• Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M).
• Poor in proline (P), glycine (G), tyrosine (Y), and serine (S).
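
One common way to detect this kind of periodic hydrophobic pattern is the hydrophobic moment, sketched below. This technique and the Kyte-Doolittle hydropathy scale are assumptions brought in for illustration; the slides do not name a specific method or scale. The function name and the example sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydrophobic_moment(segment: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment of a segment.

    Each residue is placed on a circle, advancing `angle_deg` per residue.
    100 degrees corresponds to 3.6 residues per turn (360 / 3.6), so a
    large value at 100 degrees suggests one hydrophobic face on a helix.
    """
    if not segment:
        raise ValueError("segment must not be empty")
    angle = math.radians(angle_deg)
    x = sum(KD[aa] * math.cos(i * angle) for i, aa in enumerate(segment))
    y = sum(KD[aa] * math.sin(i * angle) for i, aa in enumerate(segment))
    return math.hypot(x, y) / len(segment)


# A uniform segment has no preferred face; an alternating one peaks at 180 degrees.
print(hydrophobic_moment("A" * 18))
print(hydrophobic_moment("VKVKVKVK", 180.0), hydrophobic_moment("VKVKVKVK", 100.0))
```

Scanning a sequence with a fixed-length window and comparing the moment at 100 degrees against other angles is one way to flag candidate amphipathic helices; the window length would be another assumption to tune.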
β Sheet

• Hydrogen bonds form between 5-10 consecutive amino acids in one portion of the chain and another 5-10 farther down the chain.
• Interacting regions may be adjacent, joined by a short loop, or far apart with other structures in between.
• Slight counterclockwise rotation.
– Alpha carbons (as well as R side groups) alternate above and below the sheet.
– Prediction is difficult because of the wide range of φ and ψ angles.
Loop

• Regions between α helices and β sheets.
• Various lengths and three-dimensional configurations.
• Located on the surface of the structure.
• Hairpin loops: a complete turn in the polypeptide chain (as found in anti-parallel β sheets).
• More variable sequence structure.
• Tend to have charged and polar amino acids.
• Frequently a component of active sites.
Secondary structure prediction

• Historically, the first structure prediction methods predicted secondary structure
• Can be used to improve alignment accuracy
• Can be used to detect domain boundaries within proteins with remote sequence homology
• Often the first step towards 3D structure prediction
• Informative for mutagenesis studies


Contd.

• In either case, amino acid propensities should be useful for predicting secondary structure
• Two classical methods that use previously determined propensities:
– Chou-Fasman
– Garnier-Osguthorpe-Robson
Chou-Fasman method
• Uses a table of conformational parameters (propensities) determined primarily from measurements of secondary structure by CD spectroscopy
• Table consists of one “likelihood” for each structure for each amino acid
Chou-Fasman propensities (partial table)
(Pα: helix, Pβ: sheet, Pt: turn)

Amino Acid   Pα     Pβ     Pt
Glu          1.51   0.37   0.74
Met          1.45   1.05   0.60
Ala          1.42   0.83   0.66
Val          1.06   1.70   0.50
Ile          1.08   1.60   0.50
Tyr          0.69   1.47   1.14
Pro          0.57   0.55   1.52
Gly          0.57   0.75   1.56
Chou-Fasman method
• A prediction is made for each type of structure for each amino acid
– This can result in ambiguity if a region has high propensities for both helix and sheet (the higher value is usually chosen, with exceptions)
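
The "higher value usually chosen" rule can be sketched as a comparison of average propensities over a region. This is a minimal sketch: the function name `resolve_region` is hypothetical, the exceptions mentioned above are not modeled, and only the eight residues from the partial table in this deck are supported.

```python
from typing import Tuple

# Partial table from this deck: residue -> (P_alpha, P_beta).
P = {"E": (1.51, 0.37), "M": (1.45, 1.05), "A": (1.42, 0.83),
     "V": (1.06, 1.70), "I": (1.08, 1.60), "Y": (0.69, 1.47),
     "P": (0.57, 0.55), "G": (0.57, 0.75)}


def resolve_region(region: str) -> Tuple[str, float, float]:
    """Pick helix or sheet for a region by the higher average propensity.

    Raises KeyError for residues outside the partial table.
    ASSUMPTION: an exact tie is reported as 'sheet'; the slides do not
    say how ties are broken.
    """
    if not region:
        raise ValueError("region must not be empty")
    values = [P[aa] for aa in region]
    avg_alpha = sum(a for a, _ in values) / len(values)
    avg_beta = sum(b for _, b in values) / len(values)
    label = "helix" if avg_alpha > avg_beta else "sheet"
    return label, avg_alpha, avg_beta


# Both averages exceed 1.0 here, which is the ambiguous case described above.
print(resolve_region("MVIM"))
```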
Chou-Fasman method
• Calculation rules are somewhat ad hoc
• Example: method for helix
– Search for a nucleating region where 4 out of 6 a.a. have Pα > 1.03
– Extend until 4 consecutive a.a. have an average Pα < 1.00
– If the region is at least 6 a.a. long, has an average Pα > 1.03, and average Pα > average Pβ, consider the region to be helix
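
The helix rules above can be sketched as follows. This is one possible reading, not a reference implementation: extending in both directions, never extending back into a region already scanned, and giving residues outside the partial table a neutral value of 1.0 are all assumptions, and `find_helices` is a hypothetical name. Regions are reported as half-open index pairs (start, end).

```python
from typing import List, Tuple

# Partial table from this deck: residue -> (P_alpha, P_beta).
P = {"E": (1.51, 0.37), "M": (1.45, 1.05), "A": (1.42, 0.83),
     "V": (1.06, 1.70), "I": (1.08, 1.60), "Y": (0.69, 1.47),
     "P": (0.57, 0.55), "G": (0.57, 0.75)}


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def find_helices(seq: str, default: float = 1.0) -> List[Tuple[int, int]]:
    """Return predicted helix regions as half-open (start, end) index pairs."""
    pa = [P.get(aa, (default, default))[0] for aa in seq]
    pb = [P.get(aa, (default, default))[1] for aa in seq]
    n, helices, i, last_end = len(seq), [], 0, 0
    while i + 6 <= n:
        # Nucleation: at least 4 of 6 residues with P_alpha > 1.03.
        if sum(p > 1.03 for p in pa[i:i + 6]) < 4:
            i += 1
            continue
        start, end = i, i + 6
        # Extend right until the 4-residue window at the edge averages < 1.00.
        while end < n and _mean(pa[end - 3:end + 1]) >= 1.00:
            end += 1
        # Extend left the same way, without overlapping earlier regions.
        while start > last_end and _mean(pa[start - 1:start + 3]) >= 1.00:
            start -= 1
        # Acceptance: length, average P_alpha, and helix-versus-sheet check.
        if (end - start >= 6 and _mean(pa[start:end]) > 1.03
                and _mean(pa[start:end]) > _mean(pb[start:end])):
            helices.append((start, end))
        i = last_end = end
    return helices


print(find_helices("GGPPEAMEAMEAGGPP"))
```

Note that a nucleus needs only 4 strong residues out of 6, so weak residues such as Pro or Gly at the edges can end up inside a predicted helix; this is part of what makes the rules ad hoc.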
Accuracy of Chou-Fasman predictions
• Sequences whose 3D structures are known are processed so that each residue is “assigned” to a given secondary structure class by looking at the backbone angles
• Three classes are most often used (helix=H, sheet=E, turn=C), but four classes are sometimes used (helix, sheet, turn, loop)
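
Once every residue has an assigned class, a prediction can be scored residue by residue. The sketch below computes the percentage of correctly predicted residues (often called Q3 for three classes) and the raw confusion counts. The function names and the example strings are hypothetical and do not come from any real protein.

```python
from collections import Counter
from typing import Counter as CounterType, Tuple


def q3_accuracy(true_ss: str, pred_ss: str) -> float:
    """Percentage of residues whose predicted class matches the assigned class."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


def confusion_counts(true_ss: str, pred_ss: str) -> CounterType[Tuple[str, str]]:
    """Count (true class, predicted class) pairs over all residues."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("inputs must be of equal length")
    return Counter(zip(true_ss, pred_ss))


# Made-up assignment and prediction, using H = helix, E = sheet, C = turn.
true_ss = "HHHHEEEECC"
pred_ss = "HHHCEEHHCC"
print(q3_accuracy(true_ss, pred_ss))
print(confusion_counts(true_ss, pred_ss))
```

Dividing each row of the confusion counts by the number of residues in that true class gives percentages per true class, which is one common way such a table is presented.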
Confusion matrix for Chou-Fasman method on 78 proteins

True \ Predicted     H      E      C    Unknown
H                  47.5    3.0    4.3    45.2
E                  20.8   16.8    7.1    55.4
C                   6.4    3.6   38.0    52.0

Average accuracy = 54.4
Data from Z-Y Zhu, Protein Engineering 8:103-109, 1995
Garnier-Osguthorpe-Robson
• Uses a table of propensities calculated primarily from structures determined by X-ray crystallography
• Table consists of one “likelihood” for each structure for each amino acid for each position in a 17 amino acid window
Garnier-Osguthorpe-Robson
• Analogous to searching for “features” with a 17 amino acid wide frequency matrix
• One matrix for each “feature”
– α-helix
– β-sheet
– turn
– coil
• The highest scoring “feature” is found at each location
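
The window-scoring idea can be sketched as below. The matrices here are toy placeholders invented purely so the code runs; they are NOT real GOR parameters, and the function names are hypothetical. Each matrix maps an amino acid to 17 scores, one per window position, and the state with the highest summed score wins at each residue.

```python
from typing import Dict, List

WINDOW = 17
HALF = WINDOW // 2  # 8 residues on each side of the center

Matrix = Dict[str, List[float]]  # amino acid -> 17 position-specific scores

# Toy placeholder matrices (NOT real GOR values): Ala favors helix,
# Val favors sheet, Gly favors turn, Ser favors coil, at every position.
TOY_MATRICES: Dict[str, Matrix] = {
    "H": {"A": [0.1] * WINDOW},
    "E": {"V": [0.1] * WINDOW},
    "T": {"G": [0.1] * WINDOW},
    "C": {"S": [0.1] * WINDOW},
}


def score_position(seq: str, center: int, matrix: Matrix) -> float:
    """Sum the scores of all residues in the window around `center`."""
    total = 0.0
    for offset in range(-HALF, HALF + 1):
        j = center + offset
        if 0 <= j < len(seq):  # window positions past the ends contribute nothing
            scores = matrix.get(seq[j])
            if scores is not None:
                total += scores[offset + HALF]
    return total


def gor_like_predict(seq: str, matrices: Dict[str, Matrix]) -> str:
    """Label each residue with the highest-scoring state.

    Ties go to the state listed first in `matrices` (behavior of max()).
    """
    return "".join(
        max(matrices, key=lambda state: score_position(seq, i, matrices[state]))
        for i in range(len(seq))
    )


print(gor_like_predict("AAAAAAVVVVVV", TOY_MATRICES))
```

With real parameters, each of the 17 positions would carry a different score per amino acid, so a residue's neighbors up to 8 positions away influence the prediction at the center.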
Accuracy of predictions
• GOR is much better than Chou-Fasman at recognizing β-sheets
• Both methods are only about 55-65% accurate.