
Protein Secondary Structure


Department of Bioinformatics
Jamal Mohamed College,
 Proteins play a crucial role in virtually all biological
processes with a broad range of functions.
 The activity of an enzyme or the function of a
protein is governed by its three-dimensional
structure.

histocompatibility antigen

Bovine DNA-binding domain
20 amino acids - the building blocks


Secondary structure = spatial arrangement of amino acid
residues that are adjacent in the primary structure
Reasons for Predicting Secondary Structure
 Starting point for prediction of tertiary and
quaternary structure.
 Insight into biological function of protein.
 Facilitate alignment for homology modeling
of distantly related proteins.
 Insight for data analysis/mutagenesis
experiments when structure is not known.
 Because secondary structure is local, only the
amino acid sequence is needed.
Use of amino acid properties in
prediction schemes

[Figure: prediction scheme diagram; surviving labels: "Vector of ...", "Other inputs"]
Primary structure
Primary structure refers to the "linear" sequence of amino acids.
Types of Secondary Structure

 α Helices
 β Sheets
 Loops
 Coils
α Helix

 Most abundant secondary structure
 3.6 amino acids per turn
 Hydrogen bond formed between every fourth amino acid
 Average length: 10 amino acids, or about 3 turns
 Varies from 5 to 40 amino acids

 Normally found on the surface of protein cores.

 Interact with aqueous environment
– Inner facing side has hydrophobic amino acids.
– Outer-facing side has hydrophilic amino acids.

 Every third amino acid tends to be hydrophobic.

 Pattern can be detected computationally.
 Rich in alanine (A), glutamic acid (E), leucine (L), and
methionine (M).
 Poor in proline (P), glycine (G), tyrosine (Y), and serine (S).
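
One way the hydrophobic periodicity of a surface helix might be detected computationally is a hydrophobic-moment style calculation: each residue's hydrophobicity is treated as a vector pointing outward from the helix axis, successive residues are rotated by 100° (360° / 3.6 residues per turn), and the vectors are summed. A large resultant suggests one hydrophobic face and one hydrophilic face. The sketch below is illustrative only; the Kyte-Doolittle hydrophobicity scale and the function name `hydrophobic_moment` are assumptions, not part of these slides.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale;
# any hydrophobicity scale could be substituted).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(window, angle_deg=100.0):
    """Mean hydrophobic moment of a sequence window.

    angle_deg is the rotation per residue; 100 degrees corresponds
    to an alpha helix with 3.6 residues per turn.
    Raises KeyError for residues missing from the scale.
    """
    if not window:
        raise ValueError("window must not be empty")
    sin_sum = 0.0
    cos_sum = 0.0
    for n, residue in enumerate(window):
        h = KD[residue]
        theta = math.radians(n * angle_deg)
        sin_sum += h * math.sin(theta)
        cos_sum += h * math.cos(theta)
    # Length of the summed vector, normalised by window length.
    return math.hypot(sin_sum, cos_sum) / len(window)
```

A window that alternates leucine and lysine with helical periodicity, such as `"LKKLLKKLLKKLLKKLLK"`, should score well above a uniformly hydrophobic window such as eighteen leucines, whose vectors cancel over five full turns.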
β Sheet

 Hydrogen bonds between 5-10
consecutive amino acids in one portion
of the chain with another 5-10 farther
down the chain
 Interacting regions may be adjacent with
a short loop, or far apart with other
structures in between.
 Slight counterclockwise rotation
– Alpha carbons (as well as R side
groups) alternate above and below the sheet
– Prediction difficult, due to wide range
of φ and ψ angles.

Loops

 Regions between α helices and β sheets.

 Various lengths and three-dimensional configurations.
 Located on surface of the structure.
 Hairpin loops: complete turn in the polypeptide
chain (anti-parallel β sheets).
 More variable sequence structure.
 Tend to have charged and polar amino acids.
 Frequently a component of active sites.
Secondary structure
• Historically first structure prediction methods
predicted secondary structure

• Can be used to improve alignment accuracy

• Can be used to detect domain boundaries within
proteins with remote sequence homology

• Often the first step towards 3D structure prediction

• Informative for mutagenesis studies


 In either case, amino acid propensities
should be useful for predicting secondary
structure
 Two classical methods that use previously
determined propensities:
– Chou-Fasman
– Garnier-Osguthorpe-Robson
Chou-Fasman method
 Uses table of conformational parameters
(propensities) determined primarily from
measurements of secondary structure by
CD spectroscopy
 Table consists of one “likelihood” for each
structure for each amino acid
Chou-Fasman propensities
(partial table)

Amino Acid   Pα     Pβ     Pt
Glu          1.51   0.37   0.74
Met          1.45   1.05   0.60
Ala          1.42   0.83   0.66
Val          1.06   1.70   0.50
Ile          1.08   1.60   0.50
Tyr          0.69   1.47   1.14
Pro          0.57   0.55   1.52
Gly          0.57   0.75   1.56
Chou-Fasman method
 A prediction is made for each type of
structure for each amino acid
– Can result in ambiguity if a region has high
propensities for both helix and sheet (higher
value usually chosen, with exceptions)
Chou-Fasman method
 Calculation rules are somewhat ad hoc
 Example: Method for helix
– Search for nucleating region where 4 out of 6
a.a. have Pα > 1.03
– Extend until 4 consecutive a.a. have an average
Pα < 1.00
– If region is at least 6 a.a. long, has an average
Pα > 1.03, and average Pα > average Pβ,
consider region to be helix
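
The helix rules above can be sketched in a few lines of Python. This is a simplified reading of the rules, not the published algorithm: it uses only the partial propensity table from these slides, assumes a neutral value of 1.00 for any residue not listed, and interprets "extend until 4 consecutive a.a. have an average Pα < 1.00" as sliding a 4-residue window outward one residue at a time. The function name `find_helices` is hypothetical.

```python
# Partial Chou-Fasman table from the slides (one-letter codes).
P_ALPHA = {"E": 1.51, "M": 1.45, "A": 1.42, "V": 1.06,
           "I": 1.08, "Y": 0.69, "P": 0.57, "G": 0.57}
P_BETA = {"E": 0.37, "M": 1.05, "A": 0.83, "V": 1.70,
          "I": 1.60, "Y": 1.47, "P": 0.55, "G": 0.75}
NEUTRAL = 1.00  # ASSUMPTION: value used for residues not in the partial table


def mean(values):
    return sum(values) / len(values)


def find_helices(seq):
    """Return predicted helices as (start, end) half-open index pairs."""
    pa = [P_ALPHA.get(r, NEUTRAL) for r in seq]
    pb = [P_BETA.get(r, NEUTRAL) for r in seq]
    n = len(seq)
    helices = []
    i = 0
    while i + 6 <= n:
        # Nucleation: at least 4 of 6 residues with P-alpha > 1.03.
        if sum(p > 1.03 for p in pa[i:i + 6]) < 4:
            i += 1
            continue
        start, end = i, i + 6
        # Extend right while the 4-residue window ending at the
        # candidate residue still averages at least 1.00.
        while end < n and mean(pa[end - 3:end + 1]) >= 1.00:
            end += 1
        # Extend left in the same way.
        while start > 0 and mean(pa[start - 1:start + 3]) >= 1.00:
            start -= 1
        region_a = mean(pa[start:end])
        region_b = mean(pb[start:end])
        # Acceptance: length >= 6, mean P-alpha > 1.03 and > mean P-beta.
        if end - start >= 6 and region_a > 1.03 and region_a > region_b:
            helices.append((start, end))
            i = end
        else:
            i += 1
    return helices
```

For example, `find_helices("EEAAMMGGPP")` returns `[(0, 8)]` under these assumptions: the six strong helix formers nucleate the region, and the extension absorbs the two glycines before the window average drops below 1.00. A sequence of strong sheet formers such as `"VVIIYYVV"` nucleates but is rejected because its average Pα is below 1.03.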
Accuracy of Chou-Fasman
 Sequences whose 3D structures are known
are processed so that each residue is
“assigned” to a given secondary structure
class by looking at the backbone angles
 Three classes most often used (helix=H,
sheet=E, turn=C) but sometimes use four
classes (helix, sheet, turn, loop)
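
The per-residue comparison described above can be illustrated with a short sketch. It assumes observed and predicted structures are given as equal-length strings over the three-class alphabet (H, E, C); the three-class score is commonly called Q3, a name not used in these slides. Any other symbol in the prediction (for example `?` for "unknown") simply counts as a miss.

```python
from collections import Counter


def q3(observed, predicted):
    """Percentage of residues whose predicted class matches the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("strings must have the same length")
    if not observed:
        raise ValueError("strings must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def confusion_counts(observed, predicted):
    """Count (observed, predicted) class pairs, one per residue."""
    if len(observed) != len(predicted):
        raise ValueError("strings must have the same length")
    return Counter(zip(observed, predicted))
```

With `observed = "HHHEEECCC"` and `predicted = "HHHEECCCC"`, eight of nine residues agree, and `confusion_counts` records one sheet residue predicted as turn.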
Confusion matrix for Chou-Fasman method on 78 proteins

Actual   Predicted:  H      E      C      Unknown
H                    47.5   3.0    4.3    45.2
E                    20.8   16.8   7.1    55.4
C                    6.4    3.6    38.0   52.0

Average accuracy = 54.4
Data from Z-Y Zhu, Protein Engineering 8:103-109, 1995
Garnier-Osguthorpe-Robson (GOR) method
 Uses table of propensities calculated
primarily from structures determined by
X-ray crystallography
 Table consists of one “likelihood” for each
structure for each amino acid for each
position in a 17 amino acid window
 Analogous to searching for “features” with a
17 amino acid wide frequency matrix
 One matrix for each “feature”
– turn
– coil
 Highest scoring “feature” is found at each position
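
The window-based scoring described above can be sketched as follows. This is a toy illustration of the idea, not the GOR method itself: the matrices here are hypothetical placeholders, the function name `predict_features` is an assumption, and real GOR tables hold a different value for every amino acid at every one of the 17 window positions.

```python
HALF_WINDOW = 8  # 17-residue window: 8 residues on each side of the centre


def predict_features(seq, matrices):
    """Assign the highest-scoring feature to each residue.

    matrices maps feature name -> {offset: {residue: score}},
    with offsets from -8 to +8. Missing entries score 0.
    Ties go to the feature listed first in the dictionary.
    """
    labels = []
    for center in range(len(seq)):
        best_feature = None
        best_score = float("-inf")
        for feature, matrix in matrices.items():
            score = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                pos = center + offset
                if 0 <= pos < len(seq):  # skip positions past the chain ends
                    score += matrix.get(offset, {}).get(seq[pos], 0.0)
            if score > best_score:
                best_feature, best_score = feature, score
        labels.append(best_feature)
    return labels


# HYPOTHETICAL toy matrices: alanine anywhere in the window favours "H",
# glycine anywhere in the window favours "C".
OFFSETS = range(-HALF_WINDOW, HALF_WINDOW + 1)
TOY_MATRICES = {
    "H": {o: {"A": 1.0} for o in OFFSETS},
    "C": {o: {"G": 1.0} for o in OFFSETS},
}
```

Calling `predict_features("A" * 10 + "G" * 10, TOY_MATRICES)` labels the alanine half `H` and the glycine half `C`, because each residue's label depends on the composition of its whole window rather than on the residue alone.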
Accuracy of predictions
 GOR much better at recognizing β-sheets
 Both methods are only about 55-65% accurate