
Comparative Protein Modelling

Gajigan | Lopez | Palmario | Tan | Sotelo

THE PROBLEM
Can we predict the 3-dimensional shape of a protein given its amino acid sequence alone?

Generally, no.

But some methods give a partial description of the 3D structure of proteins.

PROTEINS
Amino acids + peptide bonds = proteins

R groups distinguish different amino acids


Fig. 1 General structure of an amino acid

PROTEINS

DIFFERENT AMINO ACIDS

WHAT DETERMINES PROTEIN FOLDING?

Generally, the amino acid (aa) sequence determines the 3D shape

Exceptions:
- Protein denaturation
- Multiple conformations
- Chaperones

What determines protein fold?
- Rigidity of the backbone
- Interactions among amino acids
- Amino acid interactions with water

LEVELS OF PROTEIN STRUCTURE

SECONDARY STRUCTURE
Folding of the linear sequence of proteins into regular repeating patterns:
- α helix
- β sheets
- Coil or loop

SECONDARY STRUCTURES

Beta pleated sheet conformation

Alpha helix conformation

DETERMINING PROTEIN STRUCTURE


- X-ray crystallography
- NMR
- Prediction by computational means


PDB Yearly Growth of Protein Structures

CATH TAXONOMY

Database containing hierarchical domain classifications of protein structures from PDB

- C: Class (C-level). Defined by secondary structure composition
- A: Architecture (A-level). Defined by overall shape of domain structure
- T: Topology (Fold family) (T-level). Defined by overall shape and connectivity of domain structures
- H: Homologous superfamily (H-level)
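The four levels are commonly written as a dotted code, one number per level (C.A.T.H). A minimal sketch of reading such a code follows; the function names and the example codes are illustrative assumptions, not part of the CATH software.

```python
from typing import NamedTuple

# Names of the first four CATH classes (C-level).
CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

class CathCode(NamedTuple):
    cath_class: int      # C-level
    architecture: int    # A-level
    topology: int        # T-level (fold family)
    superfamily: int     # H-level

def parse_cath(code: str) -> CathCode:
    """Split a dotted C.A.T.H code into its four levels (hypothetical helper)."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))

def same_fold(a: CathCode, b: CathCode) -> bool:
    """Two domains share a fold family when C, A and T all agree."""
    return a[:3] == b[:3]

if __name__ == "__main__":
    first = parse_cath("3.40.50.300")
    second = parse_cath("3.40.50.720")
    print(CLASS_NAMES[first.cath_class], same_fold(first, second))
```

Keeping the levels as separate integer fields makes it easy to compare domains at any depth of the hierarchy.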

PROTEIN STRUCTURE PREDICTION

Prediction in 1D
- Secondary structure
- Solvent accessibility
- Transmembrane helices

Prediction in 2D
- Inter-residue/strand contacts

Prediction in 3D
- Homology modeling
- Fold recognition
- Ab initio prediction

1D SECONDARY STRUCTURE
Given: amino acid sequence

What to do? Predict the secondary structure conformation of each amino acid: α, β, or c (coil)

SECONDARY STRUCTURE PREDICTION : ANOTHER APPROACH

- Make a prediction for a given residue by considering a window of n neighboring residues
- Determine a model that performs the mapping from a window of residues to a secondary structure state
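A minimal sketch of the window idea follows. The window extraction is generic; the "model" is only a toy rule with made-up residue sets, standing in for whatever trained model would be used in practice. The states are written H (helix), E (strand) and C (coil).

```python
# Toy assumptions for illustration only, not a real propensity table.
HELIX_FORMERS = set("AELM")
STRAND_FORMERS = set("VIY")

def extract_windows(sequence, half_width, pad="-"):
    """Return one window per residue: the residue plus half_width neighbours on each side."""
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]

def toy_model(window):
    """Map a window of residues to a state using simple counts (hypothetical rule)."""
    helix = sum(res in HELIX_FORMERS for res in window)
    strand = sum(res in STRAND_FORMERS for res in window)
    if helix > strand and helix >= 2:
        return "H"
    if strand > helix and strand >= 2:
        return "E"
    return "C"

def predict(sequence, half_width=1, model=toy_model):
    """Predict one state per residue by applying the model to each window."""
    return "".join(model(w) for w in extract_windows(sequence, half_width))

if __name__ == "__main__":
    print(predict("MAAGY"))
```

Passing the model in as a function keeps the windowing step separate from the mapping step, so the toy rule can be swapped for a trained classifier.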

Homology Modelling

Protein Structure Prediction

Protein Homology Modelling

Based on the observation that protein structure is more conserved than protein sequence

Assumption: homologous protein sequences have very similar 3D structures

Most accurate when the target and template have similar sequences
Template: a homologous protein whose structure was determined using high-resolution experimental methods (e.g., X-ray crystallography or NMR)

Steps in Homology Modelling


Selection of Template with known tertiary structure

Use sequence alignment search programs (e.g., BLAST) to identify homologous sequences in protein structure databases like PDB
Selection of the template can be:
- Select the template with the highest sequence identity
- Select a potentially different template for each similar segment of the protein sequence
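The first selection rule can be sketched as below, assuming each candidate template has already been aligned to the target. The template names are placeholders, and identity is computed over aligned, non-gap columns, which is only one of several conventions in use.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

def pick_template(alignments):
    """alignments: {template_id: (aligned_target, aligned_template)}.

    Returns the (template_id, identity) pair with the highest identity.
    """
    scored = {tid: percent_identity(t, s) for tid, (t, s) in alignments.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

if __name__ == "__main__":
    candidates = {
        "tmpl_A": ("MAAGY", "MAVGY"),  # hypothetical template identifiers
        "tmpl_B": ("MAAGY", "MAAGY"),
    }
    print(pick_template(candidates))
```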

Steps in Homology Modelling

Other factors in selecting a template

- Resolution of the template structure: better to use high-resolution structures as the model template
- Other sources of similarity: function, ligands, environment


Steps in Homology Modelling

Aligning protein sequence with templates

Target and template are aligned using:
- Pair-wise alignment (e.g., Smith-Waterman)
- Multiple sequence alignment (e.g., CLUSTAL)

Accuracy of the alignment is a critical parameter for successful homology modelling
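A minimal sketch of Smith-Waterman local alignment follows. The match, mismatch and gap values are assumptions chosen for readability; real protein alignments normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    print(smith_waterman("MAAGYAVLS", "GYAV"))
```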

Steps in Homology Modelling

Alignment methods either:
- maximize sequence similarity (typical)
- maximize structural similarity (others)

Steps in Homology Modelling

The alignment defines structurally equivalent positions

Steps in Homology Modelling

Building of model structures

Build a model using the known structures of homologous template proteins
Common modelling methods:
- by assembly of rigid bodies (e.g., COMPOSER, SWISS-MODEL)
- by segment matching or coordinate reconstruction (e.g., SEGMOD)
- by satisfaction of spatial restraints (e.g., MODELLER)

Steps in Homology Modelling

Modelling by assembly of rigid bodies

The model is assembled from a small number of rigid bodies obtained from the aligned protein structures
Proteins can be dissected into:
- conserved core regions
- variable loops that connect the conserved core regions
- sidechains that decorate the backbone

Steps in Homology Modelling


- Template structures are selected and superposed
- The framework is obtained by averaging the coordinates of the atoms of structurally conserved regions
- Loops are generated that fit the anchor core regions and have a compatible sequence
- Sidechains are modelled based on their intrinsic conformational preferences

Examples: COMPOSER and SWISS-MODEL
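The framework-averaging step can be sketched as follows, assuming the template structures are already superposed and that each one lists the same structurally equivalent atoms in the same order. This illustrates the averaging only; it is not how COMPOSER or SWISS-MODEL is implemented.

```python
def average_framework(superposed):
    """Average the coordinates of equivalent atoms across superposed structures.

    superposed: list of structures, each a list of (x, y, z) tuples.
    """
    if not superposed:
        raise ValueError("need at least one structure")
    n_atoms = len(superposed[0])
    if any(len(structure) != n_atoms for structure in superposed):
        raise ValueError("all structures must list the same equivalent atoms")
    count = len(superposed)
    return [
        tuple(sum(structure[k][axis] for structure in superposed) / count
              for axis in range(3))
        for k in range(n_atoms)
    ]

if __name__ == "__main__":
    template_1 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # made-up coordinates
    template_2 = [(2.0, 2.0, 0.0), (4.0, 0.0, 2.0)]
    print(average_framework([template_1, template_2]))
```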

Steps in Homology Modelling

Modelling by segment matching or coordinate reconstruction

Based on the finding that most hexapeptide segments of protein structure can be clustered into only 100 structurally different classes

Segments on the template, usually the conserved segments, serve as guiding positions

Segments of the target protein that fit these guiding positions are identified and assembled

The protein model is then constructed

Example: SEGMOD

Steps in Homology Modelling

Modelling by satisfaction of spatial restraints

Starts by generating many constraints or restraints on the structure of the target sequence
Restraints are obtained by:
- assuming that the corresponding distances between aligned residues in the template and the target structures are similar
- considering stereochemical restraints on bond lengths, bond angles, dihedral angles, and non-bonded atom-atom contacts

Steps in Homology Modelling

The model is then derived by minimizing the violations of all the restraints, which is achieved either by distance geometry or real-space optimization
Software used: MODELLER
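A toy sketch of minimizing restraint violations in real space follows. It uses plain gradient descent on the sum of squared distance violations for three points; the restraint values, step size and starting coordinates are made up, and MODELLER's actual objective function and optimizer are considerably more elaborate.

```python
import math

def violations(coords, restraints):
    """Return (distance - target) for each (i, j, target) restraint."""
    return [math.dist(coords[i], coords[j]) - target for i, j, target in restraints]

def minimize_violations(coords, restraints, step=0.05, iterations=1000):
    """Gradient descent on the sum of squared distance-restraint violations."""
    coords = [list(p) for p in coords]  # work on a mutable copy
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined; skip this restraint for this step
            factor = 2.0 * (d - target) / d
            for axis in range(3):
                delta = coords[i][axis] - coords[j][axis]
                grads[i][axis] += factor * delta
                grads[j][axis] -= factor * delta
        for point, grad in zip(coords, grads):
            for axis in range(3):
                point[axis] -= step * grad[axis]
    return [tuple(p) for p in coords]

if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (3.0, 1.0, 0.0), (5.0, 0.0, 0.0)]
    restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 5.5)]  # hypothetical targets
    model = minimize_violations(start, restraints)
    print(max(abs(v) for v in violations(model, restraints)))
```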

Steps in Homology Modelling

Evaluation of the constructed model

The validity of the constructed model must be checked
Evaluate the stereochemistry and other structural features of the model (e.g., bond lengths, dihedral angles, side chain rotamers, etc.)

Examples of programs: PROCHECK and WHATCHECK
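As a small illustration of a geometric check, the sketch below flags consecutive Cα atoms whose separation departs from the roughly 3.8 Å expected across a trans peptide bond. The tolerance is an arbitrary assumption, and programs such as PROCHECK perform far more extensive checks than this.

```python
import math

IDEAL_CA_CA = 3.8  # approximate Cα-Cα distance in Å for a trans peptide bond

def flag_ca_outliers(ca_coords, tolerance=0.2):
    """Return (index, distance) for each consecutive Cα pair outside the tolerance.

    index k refers to the pair formed by residues k and k + 1.
    """
    flagged = []
    for k in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[k], ca_coords[k + 1])
        if abs(d - IDEAL_CA_CA) > tolerance:
            flagged.append((k, d))
    return flagged

if __name__ == "__main__":
    trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 5.0, 0.0)]  # made-up Cα trace
    print(flag_ca_outliers(trace))
```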

Steps in Homology Modelling

Checking of spatial features of the model

Hydrophobic core, solvent accessibility, distribution of charged groups, atom-atom distances, atomic volumes, and main-chain hydrogen bonding

A number of online servers are available to evaluate 3D models, including PSVS, Eval123D, and JCSG.

Steps in Homology Modelling

The final model must be consistent with experimental observations:
- site-directed mutagenesis
- cross-linking data
- ligand binding

Common Errors in Homology Modelling

An inaccurate or incorrect model may arise from:

- mistakes in the alignment of the sequence to the template
- selecting the wrong template
- errors in modelling side chains
- errors in modelling sequence segments without a template

Limitations of Homology Modelling

- Large bias to the template
- Can't study conformational changes
- Can't find a new catalytic or active site
- Can't explain the activity or lack of activity of the protein


Protein Threading:
What It Is, When To Do It and How It Is Done

Homology modeling has its limitations, so protein threading makes up for them.
So, when should we do it?
1. We have a sequence of unknown structure.
2. The sequence has no detectable homology to anything of known structure.
3. There are no functional clues as to the structural class of the unknown.
But these situations aren't always recognised.

The ideas behind protein threading


There are limited numbers of basic folds found in nature (1,000 to 10,000).
Amino acid preferences for different structural environments provide sufficient information to choose among folds.
In other words, Protein Threading is a knowledge-based technique.

But what exactly is it?

So, protein threading is this: since there are only a limited number of folds in nature, we can find candidate folds, thread a protein through each of them, and score the protein's fit.

It's complicated and going obsolete, but here's how it works.

First, we have a sequence: M A A G Y A V L S

Second, we run it through candidate folds.

It won't work without math.

Third, we perform complicated math.

Finally, we get the highest score!

Here's our structural model.

How the scoring works

Score function: measures the match between the unknown sequence and the target sequence.
- Unknown sequence: the number of amino acids of type i in the environment m
- Target sequence: the score of amino acids of type i in the environment m

Let's see how it works.

Here's a sequence with unknown structure.

But the good thing is, we know the characteristics of the amino acids present.
Legend: H-bond donor, H-bond acceptor, glycine, hydrophobic

Let's run the sequence through a library of folded proteins.


Candidate #1: S = 20

Candidate #2: S = 5

Candidate #3: S = -3

And the winner is CANDIDATE #1.
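The pick-the-best-candidate idea can be sketched as below. Everything numeric here is hypothetical: the two residue classes, the buried/exposed environments, the score table and the three candidate folds are made up for illustration, and the sequence is threaded position by position without gaps, which is a simplification.

```python
# Hypothetical residue classes and environment scores (toy values).
HYDROPHOBIC = set("AVLMIF")
SCORES = {
    ("hydrophobic", "B"): 2,   # hydrophobic residue in a buried position
    ("hydrophobic", "E"): -1,  # hydrophobic residue in an exposed position
    ("other", "B"): -2,
    ("other", "E"): 1,
}

def residue_class(residue):
    return "hydrophobic" if residue in HYDROPHOBIC else "other"

def thread_score(sequence, environments):
    """Sum the environment score of every residue placed on the fold."""
    if len(sequence) != len(environments):
        raise ValueError("gapless threading needs equal lengths")
    return sum(SCORES[(residue_class(r), env)]
               for r, env in zip(sequence, environments))

def best_fold(sequence, library):
    """Return (fold_name, score) for the highest-scoring candidate fold."""
    scored = {name: thread_score(sequence, envs) for name, envs in library.items()}
    winner = max(scored, key=scored.get)
    return winner, scored[winner]

if __name__ == "__main__":
    library = {
        "candidate_1": "BBBEEBBBE",  # B = buried, E = exposed (made-up folds)
        "candidate_2": "EEEBBEEEB",
        "candidate_3": "BEBEBEBEB",
    }
    print(best_fold("MAAGYAVLS", library))
```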

Here's a sample scoreboard (amino acid type against position on the sequence).

The scores do have a basis


So, when do we get a high score?

We get it when the sequence of amino acids in the unknown corresponds closely to that of the target sequence.
The factors that account for the correspondence are as follows:
- amino acid preferences for solvent accessibility
- amino acid preferences for particular secondary structures
- interactions among spatially neighboring amino acids

Protein threading will be obsolesced without ever really having had a phase of glory. (Torda, 2003)
Less than 30% of the predicted first hits are true remote homologues.

So, let's not waste time on protein threading when:

- The sequence already has a very high homology with a known structure.
- The protein has unusual characteristics.
- It doesn't have a structure in the presence of a cofactor or a prosthetic group.
- The protein is membrane-bound (the scoring function assumes that water serves as the solvent).

But then again, times have changed. Tools now exist to get reliable scores.

Applications and Innovations

Application
Usually, homology modeling is applied in the following fields:
1. Drug design
2. Analysis of protein function
   a) Protein interaction
   b) Antigenic behavior of proteins
   c) Protein stability studies
3. Alternative path to experimental design

Innovation
MODELLER
- Freely available for academic use
- Can be used for protein modelling and docking
- Produces output that does not include H atoms
- Flexible

Thank you!