The Structure Lectures: Boris Steipe
Lectures
Boris Steipe
boris.steipe@utoronto.ca http://biochemistry.utoronto.ca/steipe
9.0 1
Lecture 9.0:
Use of Protein Structure
(Some slides have been adapted from material by Chris Hogue, Toronto, prepared for CBW in 2002.)
9.0 2
Concepts
1. "Sequence" and "structure" are abstractions of biopolymers.
2. Structure can be determined experimentally.
3. Structure abstractions can be stored, retrieved and visualized.
4. Knowledge of structure allows mechanistic explanations.
5. Structure is not arbitrary, but comes in units - motifs, helices,
strands, domains and complexes.
6. Domains are folding units, functional units and units of
inheritance.
9.0 3
Concept 1:
"Sequence" and
"structure" are
abstractions of
biopolymers.
9.0 4
Physical Amino Acids and
Amino Acid Abstractions
Formula: C9H9NO2
SMILES string†: [CH]([NH][R])([C](=[O])[R])[CH2]-[c]1([cH][cH][c]([cH][cH]1)[OH])
Name: Tyrosine
3-Letter: Tyr
1-Letter: Y
ATOM 1091 N TYR 145 -35.676 -13.136 50.622 1.00 10.36
ATOM 1092 CA TYR 145 -36.931 -13.763 51.019 1.00 10.63
ATOM 1093 C TYR 145 -37.676 -12.879 52.016 1.00 11.16
ATOM 1094 O TYR 145 -37.061 -12.316 52.926 1.00 13.91
ATOM 1095 CB TYR 145 -36.660 -15.140 51.638 1.00 9.52
ATOM 1096 CG TYR 145 -37.845 -15.737 52.361 1.00 6.36
ATOM 1097 CD1 TYR 145 -38.144 -15.357 53.663 1.00 3.30
ATOM 1098 CD2 TYR 145 -38.691 -16.652 51.727 1.00 6.14
ATOM 1099 CE1 TYR 145 -39.248 -15.856 54.311 1.00 5.57
ATOM 1100 CE2 TYR 145 -39.804 -17.165 52.376 1.00 4.89
ATOM 1101 CZ TYR 145 -40.076 -16.757 53.670 1.00 4.35
ATOM 1102 OH TYR 145 -41.170 -17.231 54.345 1.00 4.44
† http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
9.0 5
The Concept of Abstract Amino Acids
Allows Highly Compressed Information
Y:
• H-bond donor
• Nucleophile
• Bulky
• Phospho-acceptor
• Hydrophobic
• H-bond acceptor
• Aromatic
• 2° side chain rotational freedom
9.0 6
The Concept of Abstract Amino
Acid Similarity is Lossy
Y:
• H-bond donor (CHKNQRSTWY)
• Nucleophile (CDESTY)
• Bulky (FILQRYW)
• Phospho-acceptor (STY)
• Hydrophobic (FAMILYVW)
• H-bond acceptor (DEHNQSTY)
• Aromatic (FWH)
• 2° side chain rotational freedom (CDFHSW)
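The property classes on this slide can be read as sets of one-letter codes. The following is a minimal sketch, not part of the original lecture: the names `PRINTED_CLASSES`, `classes_of` and `shared_classes` are made up for illustration, and Y is added to every class because the slide is centred on Y (the printed lists for two classes omit it). It shows why similarity is lossy: two residues can agree in some classes and disagree in others.

```python
# Property classes as printed on the slide (one-letter amino acid codes).
PRINTED_CLASSES = {
    "H-bond donor": "CHKNQRSTWY",
    "Nucleophile": "CDESTY",
    "Bulky": "FILQRYW",
    "Phospho-acceptor": "STY",
    "Hydrophobic": "FAMILYVW",
    "H-bond acceptor": "DEHNQSTY",
    "Aromatic": "FWH",
    "Side chain rotational freedom": "CDFHSW",
}

# Assumption: the slide describes Y, so Y belongs to every class shown.
CLASSES = {name: set(codes) | {"Y"} for name, codes in PRINTED_CLASSES.items()}


def classes_of(residue):
    """Return the names of all property classes that contain this residue."""
    return {name for name, members in CLASSES.items() if residue in members}


def shared_classes(a, b):
    """Return the property classes that two residues have in common."""
    return classes_of(a) & classes_of(b)


if __name__ == "__main__":
    # Y resembles F in some respects and S in others; neither comparison
    # recovers the full description of Y.
    print(sorted(shared_classes("Y", "F")))
    print(sorted(shared_classes("Y", "S")))
```

Replacing Y by F keeps the bulky, hydrophobic and aromatic character but loses the hydroxyl chemistry; replacing Y by S does the opposite.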
9.0 7
Structure Contextualizes Sequence
… V V I Y T T G … (Tyr262 in 1ERQ.pdb)
9.0 8
Structural Abstraction
9.0 9
Concept 2:
Structure can
be determined
experimentally.
9.0 10
Experimental sources of
structure
X-ray:
• Crystallization required
• Diffraction data collection
• The phase problem: MAD, heavy metal isomorphic derivatives ...
• ... or "Molecular replacement" give phase approximations
• Model building in electron density maps
• Refinement
9.0 11
Experimental sources of
structure
Crystallization is limiting.
X-ray Diffraction is not imaging!
Refinement is required.
http://www-structure.llnl.gov/Xray/101index.html
9.0 12
Experimental sources of
structure
NMR:
• High concentration required (~1 mM)
• Assignment of peaks ...
• ... determination of crosspeaks, distance constraints
• Calculation of models from distance constraints
• Refinement
9.0 13
Experimental sources of
structure
NMR: 1DRO.PDB
Ensemble of structures that are compatible with experimental distance constraints
Consensus model
Concentration/Solubility
Assignment and NOEs
Refinement
9.0 14
Assessing structure quality
Metrics:
• Resolution, R-factor and R-free
• Bond length and angle deviations
• Coordinate error can be estimated from diffraction data
http://www.sci.sdsu.edu/TFrey/Bio750/Bio750X-Ray.html
http://swift.cmbi.kun.nl/WIWWWI//fullcheck.html
http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html (also NMR)
9.0 15
Concept 3:
Structure
abstractions can
be stored,
retrieved and
visualized.
9.0 16
The PDB
The PDB is the primary repository of protein structure data.
http://www.rcsb.org/pdb
9.0 17
What’s in a Structure File?
• Population experiments
• X-ray, 1 structure
• NMR - sometimes many structures
• Incomplete - not all “atoms” are there
• Hydrogens, parts of the protein in motion
• Crystallographic “space”
• correct, but not always relevant
9.0 18
The PDB format
9.0 19
Header
HEADER IMMUNOGLOBULIN 01-MAR-93 2IMM 2IMM 2
COMPND IMMUNOGLOBULIN VL DOMAIN (VARIABLE DOMAIN OF KAPPA LIGHT 2IMM 3
COMPND 2 CHAIN) OF MCPC603 2IMM 4
SOURCE HUMAN (HOMO $SAPIENS) RECOMBINANT SYNTHETIC M603 GENE 2IMM 5
AUTHOR B.STEIPE,R.HUBER 2IMM 6
REVDAT 1 15-JUL-93 2IMM 0 2IMM 7
REMARK 1 2IMM 8
REMARK 1 REFERENCE 1 2IMM 9
REMARK 1 AUTH B.STEIPE,A.PLUCKTHUN,R.HUBER 2IMM 10
REMARK 1 TITL REFINED CRYSTAL STRUCTURE OF A RECOMBINANT 2IMM 11
REMARK 1 TITL 2 IMMUNOGLOBULIN DOMAIN AND A 2IMM 12
REMARK 1 TITL 3 COMPLEMENTARITY-DETERMINING REGION 1-GRAFTED MUTANT 2IMM 13
REMARK 1 REF J.MOL.BIOL. V. 225 739 1992 2IMM 14
REMARK 1 REFN ASTM JMOBAK UK ISSN 0022-2836 070 2IMM 15
[...]
REMARK 2 2IMM 23
REMARK 2 RESOLUTION. 2.00 ANGSTROMS. 2IMM 24
REMARK 3 2IMM 25
[...]
9.0 20
Seqres
[...]
SEQRES 1 114 ASP ILE VAL MET THR GLN SER PRO SER SER LEU SER VAL 2IMM 35
SEQRES 2 114 SER ALA GLY GLU ARG VAL THR MET SER CYS LYS SER SER 2IMM 36
SEQRES 3 114 GLN SER LEU LEU ASN SER GLY ASN GLN LYS ASN PHE LEU 2IMM 37
SEQRES 4 114 ALA TRP TYR GLN GLN LYS PRO GLY GLN PRO PRO LYS LEU 2IMM 38
SEQRES 5 114 LEU ILE TYR GLY ALA SER THR ARG GLU SER GLY VAL PRO 2IMM 39
SEQRES 6 114 ASP ARG PHE THR GLY SER GLY SER GLY THR ASP PHE THR 2IMM 40
SEQRES 7 114 LEU THR ILE SER SER VAL GLN ALA GLU ASP LEU ALA VAL 2IMM 41
SEQRES 8 114 TYR TYR CYS GLN ASN ASP HIS SER TYR PRO LEU THR PHE 2IMM 42
SEQRES 9 114 GLY ALA GLY THR LYS LEU GLU LEU LYS ARG 2IMM 43
[...]
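As a minimal sketch of how the SEQRES records above translate into the one-letter abstraction, the following Python fragment converts the first two records to a sequence string. It assumes whitespace-separated tokens as they appear in this transcript (a real PDB file is column oriented), and the names `THREE_TO_ONE` and `seqres_to_sequence` are illustrative, not taken from any library.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_sequence(lines):
    """Concatenate the residues of all SEQRES records into one-letter codes.

    Tokens that are not standard residue names (record serial numbers,
    the trailing entry ID and line number) are skipped; so are
    non-standard residues, which a real tool would need to handle.
    """
    sequence = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] != "SEQRES":
            continue
        # tokens[1] is the record serial, tokens[2] the chain length.
        for token in tokens[3:]:
            if token in THREE_TO_ONE:
                sequence.append(THREE_TO_ONE[token])
    return "".join(sequence)


records = [
    "SEQRES 1 114 ASP ILE VAL MET THR GLN SER PRO SER SER LEU SER VAL 2IMM 35",
    "SEQRES 2 114 SER ALA GLY GLU ARG VAL THR MET SER CYS LYS SER SER 2IMM 36",
]

if __name__ == "__main__":
    print(seqres_to_sequence(records))
```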
9.0 21
Pitfalls:
Atom name: a mix of chemical element and bond topology. "CA.." ≠ ".CA."
Sequence number: actually a string. Chain and insertion code are required to make it unique (e.g. B 123A).
ATOM 119 CA ARG 18 8.386 51.105 35.847 1.00 7.30 2IMM 179
Fields, left to right: record type, atom number, atom name, amino acid type, sequence number, X, Y, Z, Occ, B (temperature factors)
PDB format is strictly column oriented!
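Because the format is column oriented, a record is best read by character position rather than by splitting on whitespace. The sketch below is an illustration under stated assumptions: the column boundaries follow the published PDB ATOM record layout as recalled here, the functions `format_atom` and `parse_atom` are hypothetical helpers, and the example line is rebuilt with explicit padding because the transcript above has lost its original spacing.

```python
def format_atom(serial, name, res_name, chain, res_seq, icode, x, y, z, occ, b):
    """Build a fixed-column ATOM record (columns 1 to 66)."""
    return (
        f"ATOM  {serial:5d} {name:4s} {res_name:3s} {chain:1s}"
        f"{res_seq:4d}{icode:1s}   {x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}"
    )


def parse_atom(line):
    """Slice an ATOM record by column; positions are 0-based Python slices."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        # Keep the padding: " CA " (alpha carbon) differs from "CA  " (calcium).
        "name": line[12:16],
        "res_name": line[17:20],
        "chain": line[21],
        # Sequence number plus insertion code is kept as a string.
        "res_seq": line[22:26].strip() + line[26].strip(),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }


# The CA atom of ARG 18 from the slide, with blank chain and insertion code.
line = format_atom(119, " CA ", "ARG", " ", 18, " ", 8.386, 51.105, 35.847, 1.00, 7.30)
atom = parse_atom(line)

# A made-up record illustrating "B 123A": chain B, residue 123, insertion code A.
inserted = parse_atom(format_atom(1, " CA ", "GLY", "B", 123, "A", 0.0, 0.0, 0.0, 1.0, 10.0))

if __name__ == "__main__":
    print(line)
    print(atom["name"] == " CA ", atom["res_seq"], inserted["chain"], inserted["res_seq"])
```

Keeping the atom name unstripped and the sequence number as a string are the two design choices that address the pitfalls named on the slide.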
9.0 22
Hetero Atoms
[...]
HETATM 877 O HOH 1 -4.169 60.050 40.145 1.00 3.00 2IMM 937
[...]
http://xray.bmc.uu.se/hicup/
9.0 23
The crystallographic asymmetric unit does not necessarily contain a functional molecule
The contents of a crystal lattice unit cell can be generated from the asymmetric unit by applying the required symmetry operations for the crystallographic space group. But this is not trivial for the non-crystallographer, nor is it obvious which of the symmetry replicates might make physiological contacts.
Example: 1qpi.pdb, Tet-repressor/operator complex
9.0 24
... Biological Unit
PQS reasons automatically about how a monomer might be correctly completed to a functional biomolecular complex (and is often correct).
http://pqs.ebi.ac.uk/
9.0 25
NCBI
structure
group
MMDB - very
well integrated
but somewhat
impenetrable.
9.0 26
NDB
http://ndbserver.rutgers.edu/NDB/
urx035.pdb
(Hammerhead Ribozyme)
9.0 27
PDBsum - and "secondary"
structure databases
http://www.biochem.ucl.ac.uk/bsm/pdbsum/
9.0 28
PDBsum - Information
9.0 29
Others
Macromolecular Structure Database at EBI (Relibase, PQS ...)
http://www.ebi.ac.uk/msd/
Molecular Library
http://www.nyu.edu/pages/mathmol/library/
9.0 30
Concept 4:
Knowledge of
structure allows
mechanistic
explanations.
9.0 31
Structure as an integrated map
- Example questions
• Which part of my structure appears to be conserved ?
• Are two functionally important residues possibly in contact ?
• Where is Asn220 relative to the active site ?
• May the mutation E123A possibly have something to do with
protein stability ?
• Is Leu234 on the surface, or in the core ?
• I want to clone my protein into a yeast two-hybrid system: should I
fuse the DNA binding domain to the N- or the C- terminus ?
9.0 32
Geometric relationships
• Bonds
• Angles, plain and dihedral
• Surfaces
• Chemical potential, amino acid functions
• Static and dynamic disorder
• Structural similarity
• Electrostatics
• Conservation patterns (structural and functional)
• Quaternary structure
• Posttranslational modification sites
• Unexpected homology
• [...]
9.0 33
Distances from
coordinates
XYZ coordinates are vectors in an
orthogonal coordinate system, in Å.
All the rules of analytical geometry apply.
[...]
ATOM 687 OH TYR 86 7.415 62.584 32.900 1.00 3.37
[...]
ATOM 651 O ASP 82 9.996 62.571 32.488 1.00 5.18
[...]
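As a worked example of "the rules of analytical geometry", the distance between the two atoms listed above is the Euclidean norm of the difference of their coordinate vectors. This is a minimal sketch; the variable names are illustrative.

```python
import math

# Coordinates in Å, copied from the two ATOM records above.
tyr86_oh = (7.415, 62.584, 32.900)
asp82_o = (9.996, 62.571, 32.488)

# Euclidean distance: sqrt(dx^2 + dy^2 + dz^2). math.dist needs Python 3.8+.
distance = math.dist(tyr86_oh, asp82_o)

if __name__ == "__main__":
    print(f"TYR 86 OH to ASP 82 O: {distance:.2f} Å")
```

The result is about 2.6 Å, which lies in the range usually taken to indicate a hydrogen bond between two oxygen atoms.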
9.0 34
Dihedral angles
9.0 35
Backbone dihedral angles:
Ramachandran plots
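A dihedral angle can be computed directly from four atom positions. The sketch below uses the common atan2 formulation on three bond vectors; it is an illustration, not the method of any particular program, and the function name `dihedral` is made up here. For the backbone, phi would use the atoms C(i-1), N, CA, C and psi the atoms N, CA, C, N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Idealised test geometry, not real protein coordinates.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```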
9.0 36
Sidechain rotamers
100 randomly chosen Phe residues superimposed.
9.0 37
H-bond patterns
Example: TYR - Side Chain Donor
OH can donate a single hydrogen
(The OH-H bond is 1.00Å long and lies in the plane
of CE1, CE2, CZ and OH forming an angle of 110
degrees with the CZ-OH bond.)
9.0 38
Molecular surface
Chain "A" of
1AON.PDB -
GroEL/ES complex
Surface rendering
of GroEL/ES
complex
(D. Goodsell)
9.0 39
Molecular surface
A surface provides a visual metaphor and a useful tool for mapping properties.
9.0 40
Molecular surface
Probe: r = 1.4 Å
9.0 41
Molecular surface
Contact surface
Accessible surface
"Accessible"
9.0 42
Calculating solvent accessible
surfaces
1. Draw a sphere around each atom, with a radius of (VdW radius + solvent probe radius).
2. Erase all overlapping sphere surfaces.
3. The remaining area is the accessible surface.
Probe radius: r = 1.4 Å
VdW radii: C: 1.75 Å; N: 1.55 Å; O: 1.4 Å; H: 1.17 Å
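The three steps can be sketched numerically by covering each expanded sphere with test points and counting the points that do not fall inside any other expanded sphere. This is a simplified illustration, not the lecture's implementation: the golden-spiral point layout is a swapped-in deterministic choice, and the names `sphere_points` and `accessible_area` are hypothetical. The radii are the ones listed on the slide.

```python
import math

VDW = {"C": 1.75, "N": 1.55, "O": 1.4, "H": 1.17}  # Å, from the slide
PROBE = 1.4  # Å, solvent probe radius


def sphere_points(n):
    """n roughly evenly spread points on the unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * i), r * math.sin(golden_angle * i), z))
    return points


def accessible_area(atoms, n=500):
    """Per-atom accessible area in Å^2 for atoms given as (element, (x, y, z))."""
    unit = sphere_points(n)  # placed once, then scaled and translated per atom
    areas = []
    for i, (elem_i, centre_i) in enumerate(atoms):
        radius_i = VDW[elem_i] + PROBE  # step 1: expanded sphere
        exposed = 0
        for ux, uy, uz in unit:
            point = (
                centre_i[0] + radius_i * ux,
                centre_i[1] + radius_i * uy,
                centre_i[2] + radius_i * uz,
            )
            # Step 2: a point inside any other expanded sphere is erased.
            buried = any(
                math.dist(point, centre_j) < VDW[elem_j] + PROBE
                for j, (elem_j, centre_j) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        # Step 3: the remaining fraction of the sphere is accessible.
        areas.append(4.0 * math.pi * radius_i * radius_i * exposed / n)
    return areas


if __name__ == "__main__":
    # Two carbon atoms 1.5 Å apart (toy input, not from a structure file).
    print(accessible_area([("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0))]))
```

The cost of this naive version grows with the square of the number of atoms; a practical program would restrict the inner test to neighbouring atoms.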
9.0 43
Parameters and assumptions
Problem: Analytical solution inefficient.
Solution: Numerical solution with probe points
Problem: Regular placement of n probe points
Solution: Stochastic placement
Problem: Stochastic placement quite irregular
Solution: Enforce minimum separation
Problem: Efficiency
Solution: Place points only once, translate as needed
Problem: What is a good value for n?
Solution: Try different n, evaluate standard deviation
Problem: Should n be constant per atom, or per area?
Solution: dots/area - need to scale dots with r_VdW
Problem: Hydrogens - where to get united atom radii?
Solution: Literature search.
Problem: Reference areas for relative SAA needed
Solution: Model explicitly, as tripeptides
[...]
Random points on a sphere: u, v ∈ [0,1]; θ = 2πu; φ = cos⁻¹(2v − 1)
http://mathworld.wolfram.com/SpherePointPicking.html
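The stochastic placement with enforced minimum separation can be sketched as follows. The point picking uses the recipe from the MathWorld page cited on the slide (θ = 2πu, φ = cos⁻¹(2v − 1), with u and v uniform on [0, 1]); the rejection loop, the function names and the parameters `min_sep`, `seed` and `max_tries` are assumptions made for this illustration.

```python
import math
import random


def random_sphere_point(rng):
    """One uniformly distributed point on the unit sphere."""
    u, v = rng.random(), rng.random()
    theta = 2.0 * math.pi * u        # azimuth
    phi = math.acos(2.0 * v - 1.0)   # polar angle; uniform in cos(phi)
    return (
        math.sin(phi) * math.cos(theta),
        math.sin(phi) * math.sin(theta),
        math.cos(phi),
    )


def spaced_sphere_points(n, min_sep, seed=0, max_tries=100000):
    """Up to n random points, each at least min_sep away from all others."""
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < n and tries < max_tries:
        tries += 1
        candidate = random_sphere_point(rng)
        # Reject candidates that crowd an already accepted point.
        if all(math.dist(candidate, p) >= min_sep for p in points):
            points.append(candidate)
    return points


if __name__ == "__main__":
    dots = spaced_sphere_points(50, 0.2)
    print(len(dots), dots[0])
```

Picking θ and φ both uniformly would crowd points at the poles; drawing cos φ uniformly avoids that, which is the reason for the inverse cosine.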
9.0 44
Mapping properties on
surfaces
•Properties of atoms (B-factors)
•Ensemble properties of residues
(hydrophobicity, conservation)
•Geometry (local curvature)
•Fields and potentials
(isosurfaces, binding potential)
9.0 45
Concept 5:
Structure is not
arbitrary, but
contains
recurring units.
9.0 46
Basic building blocks of
structure:
E.g. PROMOTIF, as used in PDBsum
9.0 47
Unbiased structure motifs:
alignment with added value
Motif alignments ... Why are particular
amino acids conserved? What is
essential in a sequence ?
9.0 48
A schematikon motif example: complex loop
[Figure: motif plot over positions 1-7, vertical axis 0.0 to 3.8]
Motif: 1icf 215
Length: 7
Support: 7
Unique: 7
Rank: 399
9.0 49
A schematikon motif example: strand N-cap
[Figure: motif plot over positions 1-4, vertical axis 0.0 to 3.8]
Motif: 1whi 35
Length: 4
Support: 7
Unique: 7
Rank: 444
9.0 50
Concept 6:
Domains are
folding units,
functional units, and
units of inheritance.
9.0 51
Domains are ubiquitous in
proteins
Large proteins are composed of compact,
semi-independent units - domains.
Reason:
Modularity
Folding efficiency
2MCP.PDB
9.0 52
Domains in proteins:
Number of
domains in 787
representative
proteins used as
the basis for the
CATH database
9.0 53
Domains in proteins:
Non-random
relationship
between domain
number and
chain length in
the 787
representative
proteins used as
the basis for the
CATH database
9.0 54
Domains in proteins:
Domain size in
the 787
representative
proteins used as
the basis for the
CATH database
9.0 55
There is no universal
definition of "domains"
Possible definitions are based on independently inherited (sub)sequences
(sequence domain), modular protein functions (functional domain), folding
unit or atomic contacts (structural domain).
9.0 56
Further complications:
Analogous structure, domain insertions, circular permutations, domain swapping.
Domain insertion:
1A2J.PDB Protein disulfide isomerase
2TRX.PDB Thioredoxin
9.0 57
Further complications:
Analogous structure, domain insertions, circular permutations, domain swapping.
Circular permutation:
1ERQ.PDB beta lactamase
1ALQ.PDB beta lactamase
9.0 58
Further complications:
Analogous structure, domain insertions, circular permutations, domain swapping.
Domain swapping:
11BG.PDB Bull seminal ribonuclease
9.0 59
Domains can be elusive:
9.0 60
Why care ?
Function:
Evolution works on sequence, but selects for function.
Defining domains in a structure can uncover functional units that may evolve independently. Sequence searches, alignments etc. with domains are much more specific.
9.0 61
Automated (objective) domain definition: Sequence (CDD)
http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
CDD: from Smart and Pfam
CDART: from CDD and Genbank
9.0 62
Semi-automated consensus domain definition: Structure (CATH)
Dihydrolipoamide dehydrogenase 1LPFA:
Jones S et al. (1998) Domain assignment for protein structures using a consensus approach: Characterization and analysis. Protein Science 7:233-242
9.0 63
SCOP & CATH: structural classification
The eight most frequent SCOP Superfolds
http://scop.mrc-lmb.cam.ac.uk/scop/
http://www.biochem.ucl.ac.uk/bsm/cath/
9.0 64
CATH - Class
Class 1: Mainly Alpha
Class 2: Mainly Beta
Class 3: Mixed Alpha/Beta
Class 4: Few Secondary Structures
9.0 65
CATH - Architecture
9.0 66
CATH - Topology
9.0 67
CATH - Homology
9.0 68
CATH - Entry (Example)
9.0 69
IV: Open Issues
9.0 70
Bioinformaticians apparently
do not like structure !
Sequence:
• Discrete alphabet
• Easy to manipulate
• Well developed datastructures
• Well developed libraries
Structure:
• Continuous space
• Linear algebra, complicated energy functions
• Databases and datastructures are difficult
• Paucity of libraries