Lecture 14:
Protein Structure Prediction

CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2004 Dr. Eric Rouchka
Review of Proteins
• Proteins: polypeptides with a three-dimensional structure

• Primary structure – sequence of amino acids constituting the polypeptide chain

• Secondary structure – local organization of the polypeptide chain into secondary structures such as α helices and β sheets

Review of Proteins
• Tertiary structure – three-dimensional arrangement of amino acids as they interact with one another due to polarity and interactions between side chains

• Quaternary structure – interaction of several protein subunits

Protein Structure
• Proteins: chains of amino acids joined by peptide bonds

• Amino acids:
– Polar (separate positively and negatively charged regions)
– Free C=O group (carboxyl) can act as a hydrogen bond acceptor
– Free NH group (amino) can act as a hydrogen bond donor

Protein Structure
• Many conformations are possible due to rotation around the alpha carbon (Cα) atom

• Conformational changes lead to differences in the three-dimensional structure of the protein

Protein Structure
• Polypeptide chain has a repeating N-Cα-C backbone pattern

• The rotation angle about the bond between the amino nitrogen and Cα is the PHI (φ) angle; the rotation angle about the bond between Cα and the carboxyl carbon is the PSI (ψ) angle
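
The φ and ψ angles are dihedral (torsion) angles, each defined by four consecutive backbone atoms: φ by C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The following is a minimal sketch of how such an angle could be computed from atomic coordinates; the function name `dihedral` and the tuple-based coordinate format are choices made for this illustration only.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    Each point is an (x, y, z) tuple. For phi pass C(i-1), N, CA, C;
    for psi pass N, CA, C, N(i+1).
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)

    # Unit vector along the central bond
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Components of the outer bonds perpendicular to the central bond
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Four atoms lying in one plane on the same side of the central bond give 0 degrees (cis); on opposite sides they give 180 degrees (trans).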

Differences between A.A.’s
• Difference between the 20 amino acids is the R side chain

• Amino acids can be separated based on the chemical properties of the side chains:
– Hydrophobic
– Charged
– Polar

Differences between A.A.’s
• Hydrophobic: Alanine (A), Valine (V), Phenylalanine (F), Proline (P), Methionine (M), Isoleucine (I), and Leucine (L)

• Charged: Aspartic acid (D), Glutamic acid (E), Lysine (K), Arginine (R)

• Polar: Serine (S), Threonine (T), Tyrosine (Y), Histidine (H), Cysteine (C), Asparagine (N), Glutamine (Q), Tryptophan (W)
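
As a rough illustration, the three-way grouping above can be turned into a lookup table keyed by the standard one-letter codes. The names `SIDE_CHAIN_CLASS` and `class_composition` are made up for this sketch, and any residue the slide does not list (for example glycine, G) is reported as "unlisted" rather than assigned a class.

```python
from collections import Counter

# Grouping taken from the slide, using standard one-letter codes.
_GROUPS = {
    "hydrophobic": "AVFPMIL",
    "charged": "DEKR",
    "polar": "STYHCNQW",
}

SIDE_CHAIN_CLASS = {
    residue: cls for cls, residues in _GROUPS.items() for residue in residues
}

def class_composition(sequence):
    """Count how many residues of a sequence fall into each class."""
    return Counter(SIDE_CHAIN_CLASS.get(aa, "unlisted") for aa in sequence.upper())
```

For example, `class_composition("AKDSG")` counts one hydrophobic, two charged, one polar, and one unlisted residue.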

Secondary Structure

• Image source: http://www.ebi.ac.uk/microarray/biology_intro.html


Secondary Structures
• Core of each protein is made up of regular secondary structures

• Regular patterns of hydrogen bonds are formed between neighboring amino acids

• Amino acids in secondary structures have similar φ and ψ angles

Secondary Structures
• Structures act to neutralize the polar groups on each amino acid

• Secondary structures are tightly packed in the protein core, in a hydrophobic environment

• Each amino acid side group has a limited space to occupy, and therefore a limited number of possible interactions

Types of Secondary
Structures
• α Helices
• β Sheets
• Loops
• Coils

α Helix
• Most abundant secondary structure

• 3.6 amino acids per turn

• Hydrogen bond formed between every fourth residue

• Average length: 10 amino acids, or 3 turns

• Varies from 5 to 40 amino acids

Image source: http://www.hhmi.princeton.edu/sw/2002/psidelsk/scavengerhunt.htm; http://www4.ocn.ne.jp/~bio/biology/protein.htm


α Helix
• Normally found on the surface of protein cores

• Interact with the aqueous environment
– Inner-facing side has hydrophobic amino acids
– Outer-facing side has hydrophilic amino acids

α Helix
• Roughly every third or fourth amino acid tends to be hydrophobic (3.6 residues per turn)

• Pattern can be detected computationally

• Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M)

• Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
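
One common way to detect this periodic pattern computationally is the hydrophobic moment: each residue contributes a vector whose length is its hydropathy and whose direction advances by 100 degrees per residue (360 / 3.6). The sketch below is not taken from the lecture; it assumes the Kyte-Doolittle hydropathy scale, and the function name `hydrophobic_moment` is illustrative.

```python
import math

# Kyte-Doolittle hydropathy values (an assumption of this sketch;
# the lecture does not name a particular scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(sequence, angle_deg=100.0):
    """Mean hydrophobic moment per residue for a given turn angle.

    angle_deg=100 corresponds to an alpha helix (3.6 residues per turn).
    """
    sin_sum = 0.0
    cos_sum = 0.0
    for n, aa in enumerate(sequence):
        h = KD[aa]
        theta = math.radians(n * angle_deg)
        sin_sum += h * math.sin(theta)
        cos_sum += h * math.cos(theta)
    return math.hypot(sin_sum, cos_sum) / len(sequence)
```

A segment with hydrophobic residues clustered on one face of the helix gives a large value, while a uniformly hydrophobic segment gives a value near zero.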
β Sheet

Image source: http://broccoli.mfn.ki.se/pps_course_96/ss_960723_12.html;
http://www4.ocn.ne.jp/~bio/biology/protein.htm

β Sheet
• Hydrogen bonds form between 5-10 consecutive amino acids in one portion of the chain and another 5-10 farther down the chain

• Interacting regions may be adjacent, joined by a short loop, or far apart with other structures in between

β Sheet
• Directions:
– Same: Parallel Sheet
– Opposite: Anti-parallel Sheet
– Mixed: Mixed Sheet

• Pattern of hydrogen bond formation differs between parallel and anti-parallel sheets

β Sheet
• Slight counterclockwise rotation

• Alpha carbons (as well as R side groups) alternate above and below the sheet

• Prediction is difficult, due to the wide range of φ and ψ angles

Interactions in Helices and Sheets

Loop
• Regions between α helices and β sheets

• Various lengths and three-dimensional configurations

• Located on the surface of the structure

Loop
• Hairpin loops: complete turn in the polypeptide chain (anti-parallel β sheets)

• More variable sequence structure

• Tend to have charged and polar amino acids

• Frequently a component of active sites

Coil
• Region of secondary structure that is not a helix, sheet, or loop

6 Classes of Protein Structure
1) Class α: bundles of α helices connected by loops on the surface of proteins

2) Class β: antiparallel β sheets, usually two sheets in close contact forming a sandwich

3) Class α/β: mainly parallel β sheets with intervening α helices; may also have mixed β sheets (metabolic enzymes)

6 Classes of Protein Structure
4) Class α+β: mainly segregated α helices and antiparallel β sheets

5) Multidomain (α and β) proteins: more than one of the above four domains

6) Membrane and cell-surface proteins and peptides, excluding proteins of the immune system

α Class Protein (hemoglobin)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=3hhb;page=;pid=&opt=show&size=250

β Class Protein (T-Cell CD8)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1cd8;page=;pid=&opt=show&size=500

α/β Class Protein (tryptophan synthase)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=2wsy;page=;pid=&opt=show&size=500

α+β Class Protein (1RNB)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1rnb;page=;pid=&opt=show&size=500

Membrane Protein (1OPF)

• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1opf;page=;pid=&opt=show&size=500

Protein Structure Databases
• Databases of three dimensional structures of
proteins, where structure has been solved
using X-ray crystallography or nuclear
magnetic resonance (NMR) techniques

• Protein Databases:
– PDB
– SCOP
– Swiss-Prot
– PIR

Protein Structure Databases
• Most extensive for 3-D structure is the Protein Data Bank (PDB)

• Current release of PDB (April 8, 2003) has 20,622 structures

Partial PDB File
ATOM 1 N VAL A 1 6.452 16.459 4.843 7.00 47.38 3HHB 162
ATOM 2 CA VAL A 1 7.060 17.792 4.760 6.00 48.47 3HHB 163
ATOM 3 C VAL A 1 8.561 17.703 5.038 6.00 37.13 3HHB 164
ATOM 4 O VAL A 1 8.992 17.182 6.072 8.00 36.25 3HHB 165
ATOM 5 CB VAL A 1 6.342 18.738 5.727 6.00 55.13 3HHB 166
ATOM 6 CG1 VAL A 1 7.114 20.033 5.993 6.00 54.30 3HHB 167
ATOM 7 CG2 VAL A 1 4.924 19.032 5.232 6.00 64.75 3HHB 168
ATOM 8 N LEU A 2 9.333 18.209 4.095 7.00 30.18 3HHB 169
ATOM 9 CA LEU A 2 10.785 18.159 4.237 6.00 35.60 3HHB 170
ATOM 10 C LEU A 2 11.247 19.305 5.133 6.00 35.47 3HHB 171
ATOM 11 O LEU A 2 11.017 20.477 4.819 8.00 37.64 3HHB 172
ATOM 12 CB LEU A 2 11.451 18.286 2.866 6.00 35.22 3HHB 173
ATOM 13 CG LEU A 2 11.081 17.137 1.927 6.00 31.04 3HHB 174
ATOM 14 CD1 LEU A 2 11.766 17.306 .570 6.00 39.08 3HHB 175
ATOM 15 CD2 LEU A 2 11.427 15.778 2.539 6.00 38.96 3HHB 176

Description of PDB File
• Second column: atom serial number

• Fourth column: current amino acid

• Sixth column: amino acid position in the polypeptide chain

• Columns 7, 8, and 9: x, y, and z coordinates (in angstroms)

• The 11th column: temperature factor, which can be used as a measurement of uncertainty
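
The column layout of the ATOM records can be made concrete with a small parser. This is only a sketch for whitespace-separated lines like those shown on the "Partial PDB File" slide; official PDB files use fixed-width columns, so splitting on whitespace is an assumption that can fail on real files. The `Atom` type and `parse_atom_line` function are hypothetical names.

```python
from typing import NamedTuple, Optional

class Atom(NamedTuple):
    serial: int         # column 2: atom serial number
    name: str           # column 3: atom name (N, CA, C, O, ...)
    residue: str        # column 4: amino acid (three-letter code)
    chain: str          # column 5: chain identifier
    position: int       # column 6: residue position in the chain
    x: float            # columns 7-9: coordinates in angstroms
    y: float
    z: float
    temp_factor: float  # column 11: temperature factor

def parse_atom_line(line: str) -> Optional[Atom]:
    """Parse one whitespace-separated ATOM line; return None otherwise."""
    fields = line.split()
    if len(fields) < 11 or fields[0] != "ATOM":
        return None
    return Atom(
        serial=int(fields[1]),
        name=fields[2],
        residue=fields[3],
        chain=fields[4],
        position=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        temp_factor=float(fields[10]),
    )
```

Applied to the second line of the partial file, this yields the Cα atom of VAL at position 1 with coordinates (7.060, 17.792, 4.760).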

Protein Structure
Classification Databases
• Structural Classification of Proteins (SCOP)

• Based on expert definition of structural similarities

• SCOP classifies by class, fold, superfamily, and family

• http://scop.mrc-lmb.cam.ac.uk/scop/

Protein Structure
Classification Databases
• Classification by Class, Architecture, Topology, and Homology (CATH)

• Classifies proteins into hierarchical levels by class

• α/β and α+β are considered to be a single class

• http://www.biochem.ucl.ac.uk/bsm/cath/

Protein Structure
Classification Databases
• Molecular Modeling Database (MMDB)

• Structures from PDB are categorized into structurally related groups using the Vector Alignment Search Tool (VAST)

• Looks for similar arrangements of secondary structural elements

• http://www.ncbi.nlm.nih.gov/Entrez

Protein Structure
Classification Databases
• Spatial Arrangement of Backbone Fragments (SARF)

• Categorized on structural similarities, similar to the MMDB

• http://www-lmmb.ncifcrf.gov/~nicka/sarf2.html

Visualization of Proteins
• A number of programs convert atomic coordinates of 3-D structures into views of the molecule

• Allow the user to manipulate the molecule by rotation, zooming, etc.

• Critical in drug design: yields insight into how the protein might interact with ligands at active sites

Visualization of Proteins
• Most popular program for viewing 3-dimensional structures is RasMol

RasMol: http://www.umass.edu/microbio/rasmol/
Chime: http://www.umass.edu/microbio/chime/
Cn3D: http://www.ncbi.nlm.nih.gov/Structure/
Mage: http://kinemage.biochem.duke.edu/website/kinhome.html
Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html

Alignment of Protein Structure
• Three-dimensional structure of one protein is compared against the three-dimensional structure of a second protein

• Atoms are fit together as closely as possible to minimize the average deviation

• Structural similarity between proteins does not necessarily mean an evolutionary relationship
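
The "average deviation" being minimized is usually reported as the root-mean-square deviation (RMSD) over paired atoms. The sketch below only evaluates the RMSD for two coordinate sets that are assumed to be already superimposed and already paired atom by atom; finding the optimal rotation and translation (for example with the Kabsch algorithm) is a separate step that is not shown. The function name `rmsd` is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Both arguments are equal-length lists of (x, y, z) tuples, where
    element i of one list corresponds to element i of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Two identical structures give 0; shifting every atom by 1 angstrom in one direction gives 1.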

Alignment of Protein Structure
• Positions of atoms in three-dimensional structures are compared

• Look for positions of secondary structural elements (helices and strands) within a protein domain

Alignment of Protein Structure
• Distances between carbon atoms are examined to determine the degree to which structures may be superimposed

• Side chain information can be incorporated
– Buried; visible

SSAP
• Sequential Structure Alignment Program

• Incorporates double dynamic programming to produce a structural alignment between two proteins

Steps in SSAP
• 1) Calculate vectors from the Cβ of one amino acid to a set of nearby amino acids
– Vectors from two separate proteins are compared
– Difference (expressed as an angle) is calculated and converted to a score

• 2) Matrix of scores of vector differences from one protein to the next is computed

Steps in SSAP
• 3) Optimal alignment is found using global dynamic programming, with a constant gap penalty

• 4) Next amino acid residue is considered, and the optimal path to align this amino acid to the second sequence is computed

Steps in SSAP
• 5) Alignments are transferred to a summary matrix
– If paths cross the same matrix position, scores are summed
– If part of an alignment path is found in both matrices, this is evidence of similarity

Steps in SSAP
• 6) Dynamic programming alignment is performed for the summary matrix
– Final alignment represents the optimal alignment between the protein structures
– Resulting score is converted so it can be compared to see how closely related two structures are
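
Steps 3 and 6 both rely on global dynamic programming over a matrix of position-to-position scores. The sketch below shows only that single building block, not SSAP itself: it takes a precomputed score matrix and returns the best global alignment score. It assumes that "constant gap penalty" means a fixed penalty per gapped position; the function name `global_align_score` and the default penalty value are made up for illustration.

```python
def global_align_score(score, gap=-1.0):
    """Best global alignment score over a score matrix.

    score[i][j] is the score for pairing position i of the first
    protein with position j of the second; gap is added for every
    position left unpaired (assumed fixed per position).
    """
    n = len(score)
    m = len(score[0]) if n else 0

    # dp[i][j] = best score aligning the first i and first j positions
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[i - 1][j - 1],  # pair i with j
                dp[i - 1][j] + gap,                      # gap in second protein
                dp[i][j - 1] + gap,                      # gap in first protein
            )
    return dp[n][m]
```

In a double dynamic programming scheme, a routine like this would be run once per residue pair at the lower level and once more on the summary matrix.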

Distance Matrix Approach
• Uses a graphical procedure similar to dot plots

• Identifies atoms that lie most closely together in the three-dimensional structure

• Two sequences with similar structure can have their dot plots superimposed

Distance Matrix Approach
• Values in the distance matrix represent distances between the Cα atoms in the three-dimensional structure

• Positions of closest-packing atoms are marked with a dot to highlight regions of interest

• Similar groups are superimposed as closely as possible by minimizing the sum of atomic distances
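
A Cα distance matrix and the corresponding "dot plot" of close contacts can be sketched in a few lines. The 8 angstrom cutoff used to place a dot is an assumption chosen for illustration, as are the function names; `math.dist` requires Python 3.8 or later.

```python
import math

def distance_matrix(ca_coords):
    """Pairwise distances between C-alpha atoms given as (x, y, z) tuples."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def contact_map(matrix, cutoff=8.0):
    """Mark residue pairs whose C-alpha distance is within the cutoff."""
    return [[d <= cutoff for d in row] for row in matrix]
```

Printing the contact map with one character per cell gives a picture comparable to a dot plot, and two proteins with similar folds should show similar patterns.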

DALI
• Distance-matrix ALIgnment (DALI)

• Uses the distance matrix method to align protein structures

• Assembly step uses Monte Carlo simulation to find submatrices that can be aligned

• Existing structures that have been compared are organized into the FSSP database

Fast Structural Similarity
Search
• Compare types and arrangements of secondary structures within two proteins

• If elements are similarly arranged, the three-dimensional structures are similar

• VAST and SARF are programs that use these fast methods

Structural Motifs Based on
Sequence Analysis
• Some structural elements can be determined by looking at sequence composition
– zinc finger motifs
– leucine zippers
– coiled-coil structures

Zinc Finger Motifs
• Found by looking at the order and spacing of cysteine and histidine residues

• Typical zinc finger motifs are composed of two cysteines followed by two histidines

Image source: www.bmb.psu.edu/faculty/tan/lab/tanlab_gallery_protdna.html
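
A search on the order and spacing of cysteines and histidines can be sketched as a regular expression. The spacing used here (C, 2 to 4 residues, C, 12 residues, H, 3 to 5 residues, H) is a simplified pattern assumed for illustration; curated motif databases use more specific patterns. The function name `find_zinc_fingers` is hypothetical.

```python
import re

# Simplified Cys2-His2 spacing pattern (an assumption of this sketch).
ZINC_FINGER = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_zinc_fingers(sequence):
    """Return (start, end) index pairs of non-overlapping candidate motifs."""
    return [m.span() for m in ZINC_FINGER.finditer(sequence.upper())]
```

A hit only says that the spacing is compatible with a zinc finger; it is a candidate to be checked, not a confirmed motif.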

Leucine Zippers
• Found by looking for two antiparallel alpha helices held together

• Interactions between hydrophobic leucine residues found at every seventh position in the helix

Image source: ww2.mcgill.ca/biology/undergra/c200a/sec3-5.htm
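
The "leucine at every seventh position" rule lends itself to a simple scan. The sketch below reports every start position where a run of leucines spaced seven residues apart begins; requiring four leucines in a row is an assumption made for illustration, and `find_leucine_repeats` is a hypothetical name.

```python
import re

def find_leucine_repeats(sequence, repeats=4):
    """Start indices where `repeats` leucines occur at 7-residue spacing.

    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = "L" + "(?:.{6}L)" * (repeats - 1)
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence.upper())]
```

As with other sequence-only searches, hits are candidates; the helical context still has to be confirmed by other means.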

Transmembrane Proteins
• Traverse the membrane back and forth through alpha helices

• Typical length: 20-30 residues

• Transmembrane alpha helices have hydrophobic residues on the inside-facing portions, and hydrophilic residues on the outside

Image source:
http://www.northwestern.edu/neurobiology/faculty/pinto2/pinto_12big.jpg
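
Because membrane-spanning helices are long hydrophobic stretches, a sliding-window hydropathy average is a simple first screen. This sketch is not one of the programs named in the lecture; it assumes the Kyte-Doolittle scale with a 19-residue window and a 1.6 threshold, which are commonly quoted settings but should be treated as tunable assumptions. The function name `tm_candidates` is illustrative.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def tm_candidates(sequence, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy meets the threshold."""
    values = [KD[aa] for aa in sequence.upper()]
    starts = []
    for i in range(len(values) - window + 1):
        mean = sum(values[i:i + window]) / window
        if mean >= threshold:
            starts.append(i)
    return starts
```

Adjacent hits would normally be merged into one predicted segment before comparing its length with the typical 20-30 residues.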

Membrane Prediction
Programs
• PHDhtm: employs a neural network approach; the network is trained to recognize sequence patterns and variations of helices in transmembrane proteins of known structure

• TMpred: functions by searching a protein against a sequence scoring matrix obtained by aligning the sequences of all known transmembrane alpha helix regions

Chou-Fasman Method
• Based on analyzing the frequency of amino acids in different secondary structures
– A, E, L, and M are strong predictors of alpha helices
– P and G are predictors of a break in a helix

• Table of predictive values created for alpha helices, beta sheets, and loops

• Structure with the greatest overall prediction value greater than 1 is used to determine the structure
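
The predictive values in such a table are propensities: how often an amino acid occurs in a given structure relative to how often any residue occurs in that structure. The sketch below derives propensities from a labeled sequence and averages them over a window. The one-character labels, the toy data in the usage note, and the function names are all made up for illustration and are not the published Chou-Fasman parameters.

```python
from collections import Counter

def propensities(sequence, labels, state):
    """Propensity of each amino acid for one secondary-structure state.

    labels has one character per residue (for example 'H' for helix).
    A value above 1 means the residue is over-represented in that state.
    """
    if len(sequence) != len(labels) or not sequence:
        raise ValueError("sequence and labels must be non-empty and equal length")
    total = Counter(sequence)
    in_state = Counter(aa for aa, s in zip(sequence, labels) if s == state)
    overall_fraction = sum(in_state.values()) / len(sequence)
    if overall_fraction == 0:
        raise ValueError("state does not occur in labels")
    return {aa: (in_state[aa] / total[aa]) / overall_fraction for aa in total}

def mean_propensity(window, table):
    """Average propensity over a window; unknown residues count as 1.0."""
    return sum(table.get(aa, 1.0) for aa in window) / len(window)
```

With the toy input `propensities("AAAGGP", "HHH---", "H")`, alanine gets a helix propensity of 2.0 and glycine and proline get 0.0; a window is then assigned to the structure whose mean propensity is highest and above 1.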

GOR Method
• Improves upon the Chou-Fasman method

• Assumes that the amino acids surrounding the central amino acid influence the secondary structure that the central amino acid is likely to adopt

• Scoring matrices used in the GOR method incorporate information theory and Bayesian statistics

• Mount, pp. 450-451

Neural Network Models
• Programs trained to recognize amino acid patterns located in known secondary structures

• Distinguish these patterns from patterns not located in structures

• PHD and NNPREDICT use neural networks

Nearest-neighbor
• Machine learning method

• Secondary structure conformation of an amino acid is calculated by identifying sequences of known structures similar to the query by looking at the surrounding amino acids

• Nearest-neighbor programs include PSSP, Simpa96, SOPM, and SOPMA

Prediction of 3D Structures
• Threading is the most robust technique
• Time consuming
• Requires knowledge of protein structure

Threading
• Searches for structures with similar folds without sequence similarity

• Threading takes a sequence with unknown structure and threads it through the coordinates of a target protein whose structure has been solved
– X-ray crystallography
– NMR imaging

Threading
• Considered position by position, subject to predetermined constraints

• Thermodynamic calculations are made to determine the most energetically favorable and conformationally stable alignment

Environmental Template
• Environment of each amino acid in each known structural core is determined
– secondary structure
– area of side chain buried by closeness to other atoms
– types of nearby side chains

Environmental Template
• Each position is classified into one of 18 types (6 × 3)
– 6 representing increasing levels of residue burial
– 3 classes of secondary structure (alpha helices, beta sheets, and loops)
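
The 18 environment types are simply every combination of a burial level with a secondary-structure class. A minimal sketch of one possible numbering follows; the 0-17 index scheme, the level numbering, and the name `environment_class` are assumptions for illustration, and how a residue is assigned its burial level is not shown.

```python
SECONDARY_STRUCTURES = ("helix", "sheet", "loop")
BURIAL_LEVELS = 6

def environment_class(burial_level, secondary_structure):
    """Map (burial level 0-5, secondary structure) to a class index 0-17."""
    if not 0 <= burial_level < BURIAL_LEVELS:
        raise ValueError("burial_level must be between 0 and 5")
    ss_index = SECONDARY_STRUCTURES.index(secondary_structure)
    return burial_level * len(SECONDARY_STRUCTURES) + ss_index
```

A threading profile can then store, for each position of a known structure, one such class index and score how well each amino acid of the query fits it.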

Upcoming Seminars
• Topic TBA
– Rafael Irizarry, Johns Hopkins University
• Friday, 4/23/2004
• 8:30 AM – 9:30 AM
• LOCATION: K-Building Room 2036 (HSC
Campus)

Presentations
• 4:45 – 5:00 Richard Jones
• 5:00 – 5:15 Steven Xu
• 5:15 – 5:30 Olutola Iyun
• 5:30 – 5:45 Frank Baker
• 5:45 – 6:00 Guanghui Lan
• 6:00 – 6:15 Tim Hardin
• 6:15 – 6:30 Satish Bollimpalli & Ravi
Gundlapalli
