
Introduction to Homology Modeling

Yuk Sham
Supercomputing Institute
(612) 626-0802 (help), help@msi.umn.edu
(612) 624-7427 (direct), shamy@msi.umn.edu

Outline
Introduction
What is homology modeling

Theory
Basic principles and procedure

InsightII and homology module


How to start it

Building your first model from sequence


Dihydrofolate Reductase

Improving your model


Relaxation

Introduction
Goal
Create a 3-D structure of a protein

Why
When the 3-D structure is not available from X-ray or NMR studies

What do you need to know


Protein sequence of interest (target sequence)
Protein structure of at least one homologous protein (3-D template)

Introduction
What is a homologous protein
A protein that has a high sequence similarity to the target protein sequence

Basic assumption of Homology Modeling


Proteins sharing high sequence similarity should have a similar protein fold (if they don't misfold)

Introduction
There are exceptions!
A protein can be designed with a few mutations/insertions/deletions to completely unfold, misfold, or even fold into another secondary structure
Proteins may adopt completely different folds depending on environmental variables such as pH, temperature, salt concentration, etc., which are not addressed by homology modeling

Introduction
Why are we still doing it?
Suppose someone asks you: given a class of proteins that fold in a certain way, what do you think the structure of this protein, whose structure is not yet solved experimentally, should be?
The bottom line: what is your best guess?

In perspective
Humans have 30K to 35K genes
The number of proteins expressed in humans alone can be 10 times the number of genes
Total number of proteins expressed by all organisms: many
Number of protein structures at PDB as of May 05 is 31K
There is a good chance that the structure of the protein you are looking for has not been solved!

Some Basic Theory


When is homology modeling applicable
When the sequence similarity is greater than 40%
When there is more than one instance that supports our initial assumption, e.g. if you can find proteins from different organisms with slightly different sequences sharing a very similar protein structure (fold)

Procedure
Find homologous proteins (similar sequence)
Get homologous protein structure
Extract Sequence
Struct/Seq Alignment
Determine SCR
Load Target Sequence
Align Target Sequence to SCR sequence
Assign coordinates from SCR
Generate loops
Repair the ends
Optimize structure

Getting a Protein Structure

Go to the Protein Data Bank at http://www.rcsb.org
Find a series of protein structures that are homologous to your protein
Download the coordinate files (in PDB format) to your working directory
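
The download step can also be scripted. The sketch below is a minimal illustration, not part of the InsightII workflow: it assumes that RCSB serves PDB-format files under the address pattern stored in RCSB_DOWNLOAD, and the function names pdb_url and fetch_pdb are hypothetical helpers introduced here.

```python
from urllib.request import urlretrieve

# Assumption: PDB-format coordinate files are served under this pattern.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id: str) -> str:
    """Build the download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a valid PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id)


def fetch_pdb(pdb_id: str, directory: str = ".") -> str:
    """Download one entry into `directory` and return the local file path."""
    path = f"{directory}/{pdb_id.strip().upper()}.pdb"
    urlretrieve(pdb_url(pdb_id), path)  # needs network access
    return path
```

Calling fetch_pdb("1CD2") would place 1CD2.pdb in the current working directory, from where it can be loaded with Molecule/Get.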

InsightII: One stop Graphical User Interface for life sciences

InsightII features covered today

Molecular Visualization
Structure comparison
Sequence alignment
Structure building
Molecular Modeling
Energy Calculation
Minimization

How to run InsightII


Available on SGI and Linux workstations
module load insightII
start_insightII

InsightII
MSI logo

Viewer toolbar

Frequently Used buttons

Module toolbar

Frequently used buttons

Loading molecule

Note: Molecule/Get

Displaying Structure

Note: Molecule/Display

Homology
Homology toolbar Here I am

Note: Accelrys Logo/homology

Extracting Sequence

Sequence Viewing Window

Note: Sequences/Extract
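
Outside InsightII, the same idea can be sketched in a few lines: read the residue name of every C-alpha ATOM record of one chain and translate it to a one-letter code. This is an illustration under assumptions, not the Sequences/Extract implementation; the function name extract_sequence is hypothetical, and only the 20 standard amino acids are mapped (anything else becomes "X").

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def extract_sequence(pdb_text: str, chain: str = "A") -> str:
    """Return the one-letter sequence of one chain from its C-alpha ATOM records."""
    residues = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 22:
            continue
        # Fixed PDB columns: atom name 13-16, alternate location 17,
        # residue name 18-20, chain identifier 22.
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate conformation
        residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(residues)
```

Residues with no coordinates in the file are silently missing from the result, so the extracted sequence can be shorter than the full protein sequence.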

Manipulating Sequence Manually with the mouse


Toggle on Seq on Sequence Viewing Window

Moving Entire Sequence
Middle click on a residue and drag to move the entire sequence

Creating Gap
Right click on a residue and drag to the right to create gap to the right of residue
Left click on a residue and drag to the left to create gap to the left of residue

Moving Entire Gap
Right click on a gap and drag to the left to move entire gap to left
Left click on a gap and drag to the right to move entire gap to right

Splitting Gap
Right click on a gap and drag to the right to split gap to right
Left click on a gap and drag to the left to split gap to left

Multiple Sequence Alignment

Superimpose Structure using SCR

Note: Alignment/Multiple Sequence

Aligned Structure

Aligned Sequence

Structurally Conserved Regions (SCR)
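
To make the idea of conserved blocks concrete, the sketch below marks candidate regions in a multiple alignment of the template sequences. It is a deliberately simplified, sequence-only stand-in: a region is taken to be a run of alignment columns with no gap in any sequence. That criterion, the minimum length of 3, and the name find_scrs are assumptions made for illustration; they do not reproduce how the Homology module determines SCRs from the superimposed structures.

```python
def find_scrs(aligned, min_length=3):
    """Return (start, end) column ranges, end exclusive, that contain no gaps."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and equally long")
    width = len(aligned[0])
    regions, start = [], None
    # Run one column past the end so a region touching the last column is closed.
    for col in range(width + 1):
        gap_free = col < width and all(s[col] != "-" for s in aligned)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_length:
                regions.append((start, col))
            start = None
    return regions
```

Columns between two returned ranges correspond to the variable stretches that are later treated as loops.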

Get Sequence

Note: Sequences/Get

Pairwise Sequence Alignment

Note: Alignment/Pairwise_Sequence
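
For readers who want to see what a pairwise alignment computes, here is a minimal global alignment sketch using the classic Needleman-Wunsch dynamic programming scheme. The slides do not say which algorithm or scoring matrix InsightII uses, so the algorithm choice, the simple match/mismatch scores, and the linear gap penalty are all assumptions for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # leading gaps in b
    for j in range(1, cols):
        score[0][j] = j * gap  # leading gaps in a
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[len(a)][len(b)]
```

Real protein alignments normally use a substitution matrix and separate gap-open and gap-extend penalties; the flat scores here keep the sketch short.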

Aligned Sequence

Sequence Identity

Note: Alignment/Percent_Identity
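
As a rough cross-check of the value reported by InsightII, percent identity can be computed directly from two aligned sequences. The sketch below assumes one common definition: identical pairs divided by the number of columns where neither sequence has a gap. Other definitions use a different denominator, and the slides do not state which one InsightII applies, so treat the function percent_identity as hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)
```

Note that identity is stricter than similarity, which also counts chemically similar substitutions, so this number is a conservative reading of the 40% guideline given earlier.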

Manipulating Boxes Manually with the mouse


Toggle on Box on Sequence Viewing Window

Creating Boxes
Left click on one residue of a sequence and drag to another residue of another sequence to create a box

Altering Boxes
Right click on left or right side of box and drag to shrink or expand box

Moving Boxes
Middle click on edge of box to move box

Freezing/Unfreezing Boxes
Middle click then left click to freeze box
Repeat to unfreeze box

Deleting Boxes
Middle click, hold down control on keyboard, then left click to delete box

Assigning SCR manually

Assigning Coordinates from Aligned Sequence

Note: Sequences/Assign_Coords
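
Conceptually, this step copies template coordinates onto every target residue that is aligned to a template residue. The sketch below shows that bookkeeping for one coordinate per residue (for example the C-alpha position). It is an illustration only: the name assign_coordinates and the one-point-per-residue simplification are assumptions, and the real command works on full residues inside SCR boxes.

```python
def assign_coordinates(aligned_target, aligned_template, template_coords):
    """Return one entry per target residue: a template coordinate or None."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    if sum(c != "-" for c in aligned_template) != len(template_coords):
        raise ValueError("need exactly one coordinate per template residue")
    coords = iter(template_coords)
    assigned = []
    for target_res, template_res in zip(aligned_target, aligned_template):
        # Consume one template coordinate for every template residue.
        xyz = next(coords) if template_res != "-" else None
        if target_res != "-":
            assigned.append(xyz)  # None marks a residue with no template partner
    return assigned
```

Target residues that come back as None are the ones with nothing to copy from; they have to be built as loops or repaired at the ends.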

Unassignable Sequence

Generating Loops

Note: Loops/Generate

Displaying Loops

Note: Loops/Display

Assign Coordinates to Loops

Note: Loops/Display

Setting Variable to Local PDB

Note: Session/Env_Var /usr/local/db/pdb/current/entries

Searching Loops

Note: Loops/Search

End Repair

Note: Refine/EndRepair

Displaying Trace

Note: Molecule/Display

SCR and Loop Regions

Note: Molecule/Display

Relax Setup

Note: Refine/Relax

Relax

Note: Refine/Relax
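
Relaxation is an energy minimization: atoms are moved downhill on an energy function until strained geometry is relieved. The toy sketch below applies steepest descent to a chain of points connected by harmonic springs. Everything in it is an assumption for illustration: the single-term energy, the 3.8 Angstrom rest length (a typical distance between consecutive C-alpha atoms), the step size, and the function names. InsightII minimizes a full molecular force field instead.

```python
import math


def bond_energy_and_gradient(coords, r0=3.8, k=1.0):
    """Sum of k*(r - r0)**2 over consecutive points, plus its gradient."""
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i in range(len(coords) - 1):
        delta = [coords[i + 1][d] - coords[i][d] for d in range(3)]
        r = math.sqrt(sum(c * c for c in delta))
        energy += k * (r - r0) ** 2
        if r == 0.0:
            continue  # direction undefined for coincident points
        scale = 2.0 * k * (r - r0) / r
        for d in range(3):
            grad[i + 1][d] += scale * delta[d]
            grad[i][d] -= scale * delta[d]
    return energy, grad


def relax(coords, steps=200, step_size=0.05):
    """Steepest descent: repeatedly move every point against its gradient."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        _, grad = bond_energy_and_gradient(coords)
        for point, g in zip(coords, grad):
            for d in range(3):
                point[d] -= step_size * g[d]
    return coords
```

Starting from a chain with one stretched and one compressed link, the energy returned by bond_energy_and_gradient drops toward zero as the links approach the rest length.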

Compare with Experimental Structure (1CD2)

Note: Loops/Display
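
A common way to put a number on such a comparison is the root-mean-square deviation (RMSD) between matching atoms. The sketch below assumes the two coordinate lists are already superimposed and list the same atoms in the same order; it does not perform the superposition itself, and the function name rmsd is introduced here for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((a[d] - b[d]) ** 2
                for a, b in zip(coords_a, coords_b)
                for d in range(3))
    return math.sqrt(total / len(coords_a))
```

A lower value means the model backbone lies closer to the experimental structure; what counts as acceptable depends on the protein and on which atoms are compared.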

Running Fasta

Note: Databases/Input

Running Fasta

Note: Databases/Output

Running Fasta

Note: Databases/Run

Fasta Printout

Fasta Printout

Learning more about InsightII and its modules on your own


Pilot tutorials

InsightII Documentation
http://www.accelrys.com/doc/life/index.html
Username: msi
Passwd: msi-doc

To get help
By mail: help@msi.umn.edu
Web: www.msi.umn.edu
Phone: 612 626 0802

Appointment TBA

Contact Information
Yuk Sham
Computational Biology/Biochemistry Consultant
Phone: (612) 624 7427 (Walter Library)
Phone: (612) 624 0783 (VWL)
Email: shamy@msi.umn.edu
