
STRUCTURAL AND

FUNCTIONAL BIOINFORMATICS
Primary & Secondary Structure
Prediction
LEVELS OF PROTEIN
STRUCTURE

 Proteins have complicated structures, which
are necessary for them to perform their
varied functions.

 Scientists have divided these structures into
four levels, ranging from primary to
quaternary.
 The primary structure is merely the order
of bonded amino acids in a protein. For
example, MAGTAK is a protein with the
sequence Met-Ala-Gly-Thr-Ala-Lys, where
Methionine is at the amino terminus and
Lysine is at the carboxyl terminus.
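
The one-letter to three-letter expansion in the example above can be sketched in a few lines. This is a minimal illustration, not part of the lecture; the function name `to_three_letter` is made up for this sketch, and the table holds the standard abbreviations for the 20 common amino acids.

```python
# Standard one-letter to three-letter amino acid abbreviations.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence):
    """Expand a one-letter sequence, written N-terminus first."""
    return "-".join(THREE_LETTER[residue] for residue in sequence.upper())


print(to_three_letter("MAGTAK"))  # Met-Ala-Gly-Thr-Ala-Lys
```

The first residue printed is the amino terminus and the last is the carboxyl terminus, matching the convention stated above.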

 The secondary structure is the first level
of folding the protein undergoes. There
are three basic types of secondary
structure: alpha helix, beta strand, and
coil.
 Relatively accurate structure prediction programs
that can successfully predict the secondary
structure of a protein when the sequence is known
have been developed.

 Once the protein begins to fold back onto itself, it
forms a tertiary structure.

 There are many types of tertiary structures found
in proteins, and predicting the tertiary structure
from a primary sequence is a challenge.
LEVELS OF DESCRIPTION

 Protein structure is often described at four
different scales:

 – primary structure
 – secondary structure
 – tertiary structure
 – quaternary structure
PRIMARY STRUCTURE

 The primary structure refers to the linear amino acid
sequence of the polypeptide chain.

 The primary structure is held together by covalent
bonds such as peptide bonds, which are made
during the process of protein biosynthesis or
translation.
 The two ends of the polypeptide chain are
referred to as the carboxyl terminus (C-terminus)
and the amino terminus (N-terminus) based on the
nature of the free group on each extremity.

 Counting of residues always starts at the
N-terminal end (NH2-group), which is the end where
the amino group is not involved in a peptide bond.

 The primary structure of a protein is determined
by the gene corresponding to the protein. A
specific sequence of nucleotides in DNA is
transcribed into mRNA, which is read by the
ribosome in a process called translation.
 Post-translational modifications such as
disulfide formation, phosphorylations and
glycosylations are usually also considered a
part of the primary structure, and cannot be
read from the gene.

 Example: Insulin is composed of 51 amino
acids in 2 chains. One chain (A) has 21 amino
acids and the other (B) has 30 amino acids.
PEPTIDE BOND

 When the carboxyl group (-COOH) attached to the
Cα of one amino acid condenses with the amino
group (-NH2) bonded to the Cα of the next amino
acid and a water molecule is released, a
peptide bond is formed.

 The peptide bond is also called an amide bond.
It is covalent, the strongest type of
interaction, because a pair of electrons is
shared between the bonded atoms.
 The atoms involved in forming the peptide
bond are referred to as the backbone atoms.
They are the nitrogen of the amino group,
the α carbon to which the side chain is
attached, and the carbon of the carbonyl group.
DIHEDRAL ANGLES
 Two dihedral angles describe the backbone
conformation at each residue. One is the angle
about the N–Cα bond, which is defined as phi (φ);
the other is the angle about the Cα–C bond,
which is called psi (ψ).

The angles phi and psi are free to rotate but are limited
by spatial (steric) constraints.
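
A dihedral angle is defined by four consecutive atoms, so phi and psi can be computed directly from backbone coordinates. The sketch below is illustrative only: the function name `dihedral` and the test coordinates are made up for this example. For residue i, phi would use the atoms C(i-1), N(i), Cα(i), C(i), and psi would use N(i), Cα(i), C(i), N(i+1).

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)  # normal to the plane p0, p1, p2
    n2 = cross(b2, b3)  # normal to the plane p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the first and last atoms are on opposite sides (trans).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

The result lies between -180 and 180 degrees; a cis arrangement gives 0 and a trans arrangement gives 180 in magnitude.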
PROTEIN SECONDARY STRUCTURE

 Secondary structure refers to highly regular
local sub-structures.

 There are two main types of secondary structure:
the alpha helix and the beta strand (which
forms beta sheets).
The secondary structure of
protein depends on hydrogen
bonding between C=O and
N-H groups.
Alpha Helix and Beta Pleated
sheets
EXPLANATION
 Proteins are polymers of 20 different amino
acids linked by a specific type of bond, the
peptide bond. The linear chain translated
from the genetic code within ribosomes, using
mRNA as a template, is called the primary
structure of the protein.

 When hydrogen bonds are formed
between the N-H and -C=O groups of the
invariant parts of the amino acids, the
backbone chain that contains them can
adopt either alpha helices or beta strands.
CLASSES
 1. Alpha class: proteins consisting only of alpha
helices connected by loops,
 2. Beta class: proteins consisting only of beta
sheets,
 3. Alpha/beta class: beta sheets and alpha
helices combined, e.g. parallel β-sheets
connected by α-helices,
 4. (Alpha+beta) class: alpha helices are
separated from beta sheets,
 5. (Alpha and beta) class: multi-domain, meaning
the alpha helices are not in contact with
the beta sheets; membrane and cell surface
proteins.
ALPHA HELICES
 Alpha-helices are regular cylindrical structures
and one of the most common secondary
structures in proteins.

 They are generated by hydrogen bonding
between the CO group of residue n and the
NH group of residue n+4, which lie close
together in the helix.

 All backbone carbonyl and amide groups are hydrogen
bonded, except those at the carboxy-terminal
and amino-terminal ends of the helix.
 There are 3.6 residues per turn in a common
α-helix.
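
The n to n+4 pairing rule can be enumerated to see which groups at the helix ends are left without a partner. This is a small sketch with a made-up function name, `backbone_hbonds`, and a made-up 10-residue helix; residues are numbered from 1 at the N-terminus.

```python
def backbone_hbonds(n_residues, offset=4):
    """Return (acceptor, donor) residue pairs, numbered from 1.

    The C=O of residue i pairs with the N-H of residue i + offset.
    offset=4 corresponds to the alpha helix rule described above.
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


# Rotation about the helix axis per residue, from 3.6 residues per turn.
DEGREES_PER_RESIDUE = 360 / 3.6

pairs = backbone_hbonds(10)
print(pairs)
print(round(DEGREES_PER_RESIDUE))
```

In this 10-residue example the N-H groups of residues 1 to 4 and the C=O groups of residues 7 to 10 have no partner inside the helix, which is why the terminal groups are the exception.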
BETA-SHEET
 Another common secondary structure. In contrast
to the alpha-helix, it is formed by hydrogen bonds
between backbone atoms on adjacent regions of
the peptide backbone, called beta-strands.

 These interactions do not involve side chains.
Thus, many different sequences can form a
beta-sheet.
 A beta-sheet is a regular and rigid structure
often represented as a series of flattened
arrows.

 Each arrow points towards the protein’s
C-terminus. Side chains alternate above and
below the sheet, which allows each side to have
distinct properties from the other.
PARALLEL AND ANTIPARALLEL
BETA SHEETS
 There are two types, depending on the
direction in which the strands run:

 Parallel beta sheets, when the strands run in
the same direction, and

 Antiparallel beta sheets, when adjacent strands
run in opposite directions.

 Mixed beta sheets have also been observed.


TURNS
 Turns are also known as hairpin turns, reverse turns,
or beta turns.

 A turn is considered the simplest secondary
structure element and the simplest way to satisfy
the hydrogen bonding capability of the peptide
bond.

 It is formed by a hydrogen bond between the
carbonyl oxygen (-CO) of residue n and the
amide hydrogen (-NH) of residue n+3.
Example of beta turns
LOOPS
 Loops are segments of the polypeptide chain that
connect regions of secondary structure. They
take part in hydrogen bonding and packing
interactions with the rest of the structure.
COILED COIL
 In a typical coiled-coil two alpha-helices wrap
around each other to form a stable structure.

 One side of each helix contains mostly aliphatic
amino acids, such as leucines and valines, while
the other side contains mostly polar residues.

 Helices containing distinct hydrophobic and
polar sides are called amphipathic.
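
One rough way to check whether a sequence would be amphipathic as a helix is to place each residue on a helical wheel, 100 degrees apart (360 degrees divided by 3.6 residues per turn), and see whether the hydrophobic residues cluster on one side. The sketch below is an illustration under stated assumptions: the `HYDROPHOBIC` grouping, the +1/-1 weighting, the function name `helical_moment`, and the example sequence are all choices made for this example, not an established scale.

```python
import math

HYDROPHOBIC = set("AVLIMFW")  # assumed grouping, for illustration only
DEGREES_PER_RESIDUE = 100.0   # 360 degrees / 3.6 residues per turn


def helical_moment(sequence):
    """Mean vector length of a +1/-1 hydrophobicity signal on a helical wheel."""
    x = y = 0.0
    for i, residue in enumerate(sequence.upper()):
        weight = 1.0 if residue in HYDROPHOBIC else -1.0
        angle = math.radians(DEGREES_PER_RESIDUE * i)
        x += weight * math.cos(angle)
        y += weight * math.sin(angle)
    return math.hypot(x, y) / len(sequence)


# A made-up sequence alternating leucine-rich and lysine-rich stretches.
print(helical_moment("LKKLLKKLLKKLLKKL"))
# A uniform sequence has no preferred side.
print(helical_moment("L" * 18))
```

A value near 0 means the hydrophobic and polar residues are spread evenly around the helix; larger values suggest one hydrophobic face and one polar face.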
NEXT CLASS
FUNCTIONAL ASPECTS OF
SECONDARY STRUCTURE ELEMENTS

 DNA binding

 α-helices have particular significance
in DNA binding motifs, including
helix-turn-helix motifs, leucine zipper motifs
and zinc finger motifs.
MEMBRANE SPANNING

 α-helices are also the most common protein
structure element that crosses biological
membranes.
COMMON STRUCTURAL MOTIFS

 A very simple structural motif involving β
sheets is the β hairpin, in which two
antiparallel strands are linked by a short loop
of two to five residues, of which one is
frequently a glycine or a proline.

 However, individual strands can also be linked
in more elaborate ways with long loops that
may contain alpha helices or even entire
protein domains.
 Other important structural motifs formed by
secondary structure elements include:
 Greek key motif
 The Greek key motif consists of four adjacent
antiparallel strands and their linking loops.

 The β-α-β motif

 β-meander motif
 A simple supersecondary protein topology
composed of 2 or more consecutive
antiparallel β-strands linked together
by hairpin loops.
SUPERSECONDARY STRUCTURES

• Arthur Lesk improved the
Linderstrøm-Lang
classification

• Secondary structures
interact to form
supersecondary structures

http://bigpictureeducation.com/sites/default/files/styles/gallery_large/public/GENERALB0002608_0.jpg?itok=MFBG1YTB
PROTEIN SECONDARY
STRUCTURE PREDICTION
 The secondary structure prediction methods can be
either ab initio based, which make use of single
sequence information only, or homology based, which
make use of multiple sequence alignment information.

 The ab initio methods, which belong to early
generation methods, predict secondary structures
based on statistical calculations of the residues of a
single query sequence.

 The homology-based methods do not rely on statistics
of residues of a single sequence, but on common
secondary structural patterns conserved among
multiple homologous sequences.
AB INITIO–BASED METHODS
 This type of method predicts the secondary
structure based on a single query sequence.

 It measures the relative propensity of each amino
acid belonging to a certain secondary structure
element. The propensity scores are derived from
known crystal structures.

 Examples of ab initio prediction are the
Chou–Fasman and Garnier, Osguthorpe, Robson (GOR)
methods. The ab initio methods were developed
in the 1970s.
 The Chou–Fasman algorithm
(http://fasta.bioch.virginia.edu/fasta/chofas.htm)
determines the propensity or intrinsic
tendency of each residue to be in the helix,
strand, and β-turn conformation using observed
frequencies found in protein crystal structures.

 For example, it is known that alanine, glutamic
acid, and methionine are commonly found in
α-helices, whereas glycine and proline are much
less likely to be found in such structures.
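
A sliding-window average of helix propensities shows how such a table can be applied to a query sequence. This is a simplified sketch, not the full Chou–Fasman procedure: the propensity values are approximate and listed only for the residues named above, residues missing from the table default to 1.0, and the window size, threshold, function names and example sequence are assumptions made for this illustration.

```python
# Approximate, illustrative helix propensities for the residues named above.
# Consult a published Chou-Fasman table for real work.
HELIX_PROPENSITY = {"A": 1.42, "E": 1.51, "M": 1.45, "G": 0.57, "P": 0.57}


def window_scores(sequence, window=6, table=HELIX_PROPENSITY, default=1.0):
    """Average helix propensity for every window along the sequence."""
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        total = sum(table.get(residue, default) for residue in segment)
        scores.append(total / window)
    return scores


def helix_favoring_windows(sequence, window=6, threshold=1.0):
    """Start positions (0-based) of windows whose average exceeds the threshold."""
    return [start for start, score in enumerate(window_scores(sequence, window))
            if score > threshold]


# Made-up sequence: a helix-favoring stretch followed by a Gly/Pro-rich stretch.
print(helix_favoring_windows("AEMAAEGPGPGP"))
```

Windows dominated by alanine, glutamic acid and methionine score above 1, while the glycine- and proline-rich end of the made-up sequence scores well below 1.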
CALCULATION OF PROPENSITY
SCORE
 The calculation of residue propensity scores is
simple. Suppose there are n residues in all
known protein structures from which m
residues are helical residues. The total
number of alanine residues is y of which x are
in helices.

 The propensity for alanine to be in a helix is
the ratio of the proportion of alanine in
helices to the proportion of alanine in the
overall residue population (using the formula
[x/m]/[y/n]).
 If the propensity for the residue equals 1.0 for
helices (P[α-helix]), it means that the residue has
an equal chance of being found in helices or
elsewhere.
 If the propensity ratio is less than 1, it indicates
that the residue has less chance of being found in
helices.
 If the propensity is larger than 1, the residue is
more favored by helices.
 Based on this concept, Chou and Fasman developed
a scoring table listing relative propensities of each
amino acid to be in an α-helix, a β-strand, or a
β-turn.
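
The propensity formula [x/m]/[y/n] can be worked through with numbers. The counts below are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from any real structure database, and the function names are made up for this sketch.

```python
def propensity(x, m, y, n):
    """Propensity of a residue type for helices.

    x: residues of this type found in helices
    m: all helical residues
    y: all residues of this type
    n: all residues in the data set
    """
    return (x / m) / (y / n)


def interpret(value, tolerance=1e-9):
    """Translate a propensity into the three cases described above."""
    if abs(value - 1.0) <= tolerance:
        return "equal chance of being in helices or elsewhere"
    return "favored by helices" if value > 1.0 else "less likely in helices"


# Hypothetical counts: 10000 residues in total, 3000 of them helical,
# 800 alanines in total, 340 of them in helices.
p_ala = propensity(x=340, m=3000, y=800, n=10000)
print(round(p_ala, 2), interpret(p_ala))
```

With these made-up counts, alanine makes up about 11.3% of helical residues but only 8% of all residues, so its propensity is above 1.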
HOMOLOGY-BASED METHODS
 This type of method combines the ab initio
secondary structure prediction of individual
sequences and alignment information from
multiple homologous sequences (>35%
identity).

 The idea behind this approach is that close
protein homologs should adopt the same
secondary and tertiary structure.
 When each individual sequence is predicted for
secondary structure using a method similar to
the GOR method, errors and variations may
occur.

 However, evolutionary conservation dictates
that there should be no major variations in
their secondary structure elements. Therefore,
aligning multiple sequences reveals
positional conservation, because
residues in the same aligned position are
assumed to have the same secondary structure.
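
The idea of pooling per-sequence predictions over an alignment can be sketched as a column-wise majority vote. This is a simplified illustration rather than the procedure of any particular program: the function name `consensus_structure`, the state letters (H for helix, E for strand, C for coil), the gap character and the three example predictions are all assumptions.

```python
from collections import Counter


def consensus_structure(aligned_predictions, gap="-"):
    """Majority vote per alignment column, ignoring gap positions."""
    length = len(aligned_predictions[0])
    if any(len(prediction) != length for prediction in aligned_predictions):
        raise ValueError("all aligned predictions must have the same length")
    consensus = []
    for column in zip(*aligned_predictions):
        votes = Counter(state for state in column if state != gap)
        consensus.append(votes.most_common(1)[0][0] if votes else gap)
    return "".join(consensus)


# Made-up predictions for three aligned homologs.
predictions = [
    "HHHHCCEEE",
    "HHHCCCEEE",
    "HHHHC-EEC",
]
print(consensus_structure(predictions))  # HHHHCCEEE
```

Isolated disagreements, such as the single C in the fourth column of the second sequence, are outvoted by the other homologs.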
PREDICTION WITH NEURAL
NETWORKS

 A neural network is a machine learning
model built from multiple
layers of interconnected variables, or nodes.

 In secondary structure prediction, the input
is an amino acid sequence and the output is
the probability of a residue to adopt a
particular structure.
 Between input and output are many
connected hidden layers where the machine
learning takes place to adjust the
mathematical weights of internal
connections.

 The neural network has to be first trained by
sequences with known structures so it can
recognize the amino acid patterns and their
relationships with known structures.
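
The input, hidden layer and output described above can be sketched as a single forward pass. Everything here is an assumption made for illustration: the window of 13 residues, the one-hot encoding with a padding slot, the 8 hidden nodes, the random untrained weights and all function names. The printed probabilities are therefore meaningless until the weights are trained on sequences with known structures.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ("helix", "strand", "coil")


def encode_window(sequence, center, half_width=6):
    """One-hot encode the residues around `center`.

    Positions beyond either end of the sequence use an extra padding slot.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        slot = [0.0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(sequence):
            slot[AMINO_ACIDS.index(sequence[pos])] = 1.0
        else:
            slot[-1] = 1.0
        features.extend(slot)
    return features


def random_layer(n_in, n_out, rng):
    """Untrained weights: one row of n_in weights per output node."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def forward(features, hidden_weights, output_weights):
    """Input -> tanh hidden layer -> softmax over the three states."""
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)))
              for row in hidden_weights]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in output_weights]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {state: e / total for state, e in zip(STATES, exps)}


rng = random.Random(0)
features = encode_window("MAGTAKLLEEA", center=5)
hidden_weights = random_layer(len(features), 8, rng)
output_weights = random_layer(8, len(STATES), rng)
print(forward(features, hidden_weights, output_weights))
```

Training would adjust `hidden_weights` and `output_weights` so that the highest probability matches the known structure of the central residue in each training window.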
Structure Prediction by Neural networks
TERTIARY STRUCTURE

 Tertiary structure refers to the
three-dimensional structure of a single protein
molecule.

 The alpha-helices and beta-sheets are
folded into a compact globule.
 The folding is driven by non-specific
hydrophobic interactions (the burial
of hydrophobic residues from water), but the
structure is stable only when the parts of a
protein domain are locked into place
by specific tertiary interactions, such as:

 salt bridges,
 hydrogen bonds,
 the tight packing of side chains,
 and disulfide bonds.
INTERACTIONS
 Tertiary protein structure refers to the complete
three-dimensional folding of a protein. Stabilization of a protein's
tertiary structure may involve interactions between amino
acids located far apart along the primary sequence.
 These may include:

 weak interactions such as hydrogen bonds and van der
Waals interactions.

 ionic bonds involving negatively charged and positively
charged amino acid side-chain groups.

 disulfide bonds, covalent linkages that may form as the
thiol groups of two cysteine residues are oxidized to a
disulfide.
Tertiary structure is determined by the
interactions between the side chains (R groups)
Quaternary structure is the overall protein structure
resulting from combinations of polypeptide subunits
Summary of the Four Levels of Protein Structure
CONFORMATIONAL CHANGES OF
PROTEINS
 A protein may undergo reversible structural
changes in performing its biological function.

 Protein function depends on specific
conformation (shape).

 The alternative structures of the same
protein are referred to as
different conformations, and transitions
between them are called conformational
changes.
WHAT DETERMINES CONFORMATION?
 In general, the amino acid sequence of a
protein determines its 3D shape
[Anfinsen et al., 1950s]
 • but there are some limitations:
 all proteins can be denatured
 some proteins are inherently disordered (i.e. lack a
regular structure)
 there are various mechanisms through which the
conformation of a protein can be changed in vivo
 post-translational modifications such as
phosphorylation
 etc.
WHAT DETERMINES CONFORMATION?
 Which physical properties of the protein
determine its fold?
 rigidity of the protein backbone
 interactions among amino acids, including
 electrostatic interactions
 van der Waals forces
 volume constraints
 hydrogen bonds and disulfide bonds
 interactions of amino acids with water
 hydrophobic and hydrophilic residues
ASSIGNMENT # 2
 Supersecondary structures: their examples
and functions
 Number of amino acids in alpha helices and
beta sheets. Minimum number of amino acids
involved in making alpha helices and beta
sheets.
 Assignment # 2 due next week on Monday
(30th September)

 Quiz 1 from lecture 1 and 2 on 26 September
