
Unit 1: Structure Determination

Protein Structure Database


PDB
PDB File format
Ramachandran Plot
Worldwide PDB (wwPDB)
• The wwPDB organization manages the PDB archive, which stores
structure data and metadata for biological macromolecules, to
promote basic and applied research and education across the
sciences.
• Missions:
– Manage the wwPDB Core Archives according to the FAIR Principles.
– Provide expert deposition, validation, biocuration, and remediation
services at no charge to Data Depositors worldwide.
– Ensure universal open access to public domain structural biology data with
no limitations on usage.
– Develop and promote community-endorsed data standards for archiving
and exchange of global structural biology data.
wwPDB Members
• Protein Data Bank in Europe
• Biological Magnetic Resonance Data Bank
• Protein Data Bank Japan
• Research Collaboratory for Structural Bioinformatics Protein Data Bank
RCSB PDB
• The Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data
Bank (PDB) is a comprehensive database of three-dimensional protein and
nucleic acid structures determined by X-ray crystallography, NMR, and
cryo-electron microscopy.
• The PDB was established in 1971 at Brookhaven National Laboratory (BNL)
with the deposition of seven structures.
• Management of the PDB was transferred to the RCSB in fall 1998, and around
154,735 structures had been deposited by August 2019.
• The PDB has become an international resource for macromolecular structural
coordinates.
• Most peer-reviewed journals now require the deposition of coordinates in this
database prior to publishing structural data.
• PDB structures can be easily accessed through the main web site
(http://www.rcsb.org/pdb) or six other international mirror sites by searching
for keywords or PDB entry codes. An advanced search form is also available.
Information about structures
1. Download/Display File
– Allows downloading or displaying the coordinate file
2. Medline
– Provides abstract of the primary publication describing the structure
3. View Structure
– Permits interactive viewing of the structure with graphics programs
such as RasMol, Chime, Swiss-PdbViewer, VRML, or MICE
4. Structural Neighbors
– Provides links for that particular structure within the CATH, CE, FSSP,
SCOP, or VAST databases
5. Geometry
– Provides structural analyses of the entry, such as the Ramachandran plot
6. Sequence Details
– Provides the sequence(s) of that particular structure in FASTA format
Other Existing Structural Databases
MSD:
• The Macromolecular Structure Database (MSD;
http://www.ebi.ac.uk/msd/index.html) at the European
Bioinformatics Institute (EBI) manages and distributes
macromolecular structural data.
MMDB:
• The Molecular Modeling Database
(MMDB; http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml)
is maintained by the National Center for
Biotechnology Information (NCBI), part of the National
Institutes of Health (NIH).
Understanding Structure File
• The primary information stored in a structure file
is the coordinate information for the biological
molecules.
• These files list the atoms in each protein, and their 3D
location in space (coordinates).
• These files are available in several formats (PDB, mmCIF,
XML).
• A typical PDB formatted file includes a large "header"
section of text that summarizes the protein, citation
information, and the details of the structure solution,
followed by the sequence and a long list of the atoms and
their coordinates.
• The archive also contains the experimental observations
that are used to determine these atomic coordinates.
Interpreting Coordinates
• A Protein Data Bank (PDB) data file for a protein
structure contains mainly the x, y, and z coordinates of atoms
rather than an explicit list of bonds, so the most basic
requirement for a visualization program is to build the
connectivity between atoms to produce a view of the molecule.
• The visualization program should also be able to display
molecular structures in different styles, which include
wire frames, balls and sticks, space-filling spheres, and
ribbons.
• The main feature of computer visualization programs is
interactivity, which allows users to visually manipulate
the structural images through a graphical user interface.
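Building connectivity from coordinates alone is commonly done with a distance rule: two atoms are treated as bonded when they are closer than the sum of their covalent radii plus a small tolerance. The Python sketch below illustrates that rule; the radii table and the 0.4 Å tolerance are illustrative assumptions, not values prescribed by any particular viewer, and `infer_bonds` is a hypothetical helper name.

```python
import math

# Approximate single-bond covalent radii in angstroms (illustrative values only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}

def infer_bonds(atoms, tolerance=0.4):
    """Return index pairs (i, j) of atoms close enough to be considered bonded.

    atoms: list of (element, (x, y, z)) tuples.
    A pair counts as bonded if its distance is below r_i + r_j + tolerance.
    """
    bonds = []
    for i in range(len(atoms)):
        elem_i, pos_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            elem_j, pos_j = atoms[j]
            cutoff = COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j] + tolerance
            if math.dist(pos_i, pos_j) < cutoff:
                bonds.append((i, j))
    return bonds

# Hypothetical coordinates: a C-C pair 1.53 Å apart and a distant O atom.
example = [("C", (0.0, 0.0, 0.0)), ("C", (1.53, 0.0, 0.0)), ("O", (5.0, 0.0, 0.0))]
print(infer_bonds(example))  # [(0, 1)]
```

This all-pairs loop is quadratic in the number of atoms; real visualization programs typically speed it up with a spatial grid, but the bonding criterion is the same idea.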
Protein Data Bank (PDB) File

• A file that describes a structure.


• PDB format is a standard for files containing atomic
coordinates.
• It is used for structures in the Protein Data Bank and is
read and written by many programs.
• A complete PDB file also provides information such as
the authors, literature references, and the method of
structure determination.
• PDB format consists of lines of information in a text
file.
(A) Wireframes. (B) Balls and sticks. (C) Space-filling spheres. (D) Ribbons
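Example: 1gcn
[Figure: annotated ATOM records from PDB entry 1gcn]
• Fields labelled in each record: atom serial number, atom name, residue type,
chain identifier, residue number, X, Y and Z coordinate values, occupancy,
temperature factor (B-factor), and element symbol.
• Atom names combine the element symbol, a remoteness indicator code (Greek
letters written as Roman capitals: α-A β-B γ-G δ-D ε-E ζ-Z η-H), and a branch
indicator where needed; for example, CG1 is the γ carbon on branch 1.
• Record types shown: ATOM, HETATM, TER, HELIX, SHEET, SSBOND.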
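Because each field of an ATOM or HETATM record occupies fixed columns, a record can be parsed by slicing the line. The Python sketch below assumes the column layout of the PDB format version 3.3 specification (record name 1-6, serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54, occupancy 55-60, B-factor 61-66, element 77-78). The sample line is built from hypothetical values, not copied from 1gcn, and `parse_atom_line` is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass
class AtomRecord:
    record: str
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str

def parse_atom_line(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM record using fixed PDB columns (1-based columns -> 0-based slices)."""
    line = line.rstrip("\n").ljust(80)  # pad short lines so every slice exists
    return AtomRecord(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )

# Hypothetical record assembled with explicit field widths so the columns line up.
SAMPLE = (
    f"{'ATOM':<6}{2:>5} {' CA':<4}{'':1}{'HIS':>3} {'A':1}{1:>4}{'':1}   "
    f"{49.668:8.3f}{24.248:8.3f}{10.436:8.3f}{1.00:6.2f}{25.00:6.2f}"
    f"{'':10}{'C':>2}"
)

rec = parse_atom_line(SAMPLE)
print(rec.name, rec.res_name, rec.x)  # CA HIS 49.668
```

Slicing by column rather than splitting on whitespace matters because adjacent fields can run together (for example, large negative coordinates leave no space between them). This sketch assumes the occupancy and B-factor fields are present; blank fields would raise a `ValueError`.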
Example: 1gcn

• OXT - extra oxygen atom on the terminal carboxyl group.


• HXT - extra hydrogen atom on the terminal carboxyl group (bonded to OXT); rarely seen
• TER - terminates the amino acid chain
• The last residue in the chain is THR (threonine).
• The TER record indicates the end of the peptide chain.
• It is important to have TER records at the end of peptide chains so a bond
is not drawn from the end of one chain to the start of another.
Example: 3hhb

• At the end of chain A, the heme group records appear.


• The last residue in the alpha chain is an ARG (arginine).
• Again, the extra oxygen atom OXT appears in the terminal carboxyl group.
• The TER record indicates the end of the peptide chain.
• In the example above, the TER record is correct and should be present, but the
molecule chain would still be terminated at that point even without a TER record,
because HETATM residues are not connected to other residues or to each other.
• The heme group is a single residue made up of HETATM records.
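The chain-termination rules described above (a TER record or a change of chain ID ends a chain, and HETATM groups such as heme are not bonded into the polymer chain) can be sketched in Python as follows. This is a simplified illustration that only inspects the record name (columns 1-6) and chain ID (column 22); `split_chains` and `atom_line` are hypothetical helpers, not part of any PDB library.

```python
def split_chains(lines):
    """Group ATOM lines into chains, starting a new chain after TER or on a chain-ID change.

    HETATM and other records are skipped, since they are not bonded into the chain.
    """
    chains, current, current_id = [], [], None
    for line in lines:
        record = line[0:6].strip()
        if record == "TER":
            if current:
                chains.append(current)
            current, current_id = [], None
        elif record == "ATOM":
            chain_id = line[21] if len(line) > 21 else " "
            if current and chain_id != current_id:
                chains.append(current)  # chain ID changed without a TER record
                current = []
            current.append(line)
            current_id = chain_id
    if current:
        chains.append(current)
    return chains

def atom_line(record, chain_id):
    # Minimal hypothetical record: only columns 1-6 and column 22 are filled in.
    return f"{record:<6}" + " " * 15 + chain_id

lines = [atom_line("ATOM", "A"), atom_line("ATOM", "A"), "TER",
         atom_line("HETATM", "A"), atom_line("ATOM", "B")]
print([len(chain) for chain in split_chains(lines)])  # [2, 1]
```

A viewer that connected consecutive ATOM records without such a check would draw a spurious bond from the last atom of chain A to the first atom of chain B.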
Hydrogen Atoms Example: 1vm3

• Hydrogen atom records follow the records of all other atoms of a particular residue.
• A hydrogen atom name starts with H. The next part of the name is based on the name of the
connected nonhydrogen atom.
• For example, in amino acid residues, H is followed by the remoteness indicator (if any) of the
connected atom and then by its branch indicator (if any); a hydrogen on CA, for instance, is named HA.
• If more than one hydrogen is connected to the same atom, an additional digit is appended so
that each hydrogen atom will have a unique name.
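The naming rule above can be sketched as a small Python function. It assumes the connected heavy atom's name starts with a one-letter element symbol (true for C, N, O and S in standard amino acids), and it takes the first numbering digit as a parameter because the starting digit depends on the group: in current PDB files, methyl hydrogens are typically numbered 1 to 3 (e.g. HG11-HG13 on valine CG1), while methylene hydrogens are typically numbered 2 and 3 (e.g. HB2, HB3). `hydrogen_names` is a hypothetical helper, and special cases such as the N-terminal H1/H2/H3 are not handled.

```python
def hydrogen_names(parent_atom, count, start=1):
    """Build hydrogen atom names from the name of the connected heavy atom.

    parent_atom: heavy-atom name such as "N", "CA", "CB" or "CG1"; the first
        character is assumed to be a one-letter element symbol.
    count: number of hydrogens attached to that atom.
    start: first digit appended when several hydrogens need unique names.
    """
    suffix = parent_atom.strip()[1:]  # remoteness indicator, then branch indicator
    base = "H" + suffix
    if count == 1:
        return [base]
    return [f"{base}{i}" for i in range(start, start + count)]

print(hydrogen_names("CA", 1))           # ['HA']
print(hydrogen_names("CG1", 3))          # ['HG11', 'HG12', 'HG13']
print(hydrogen_names("CB", 2, start=2))  # ['HB2', 'HB3']
```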
Common Errors in PDB Format Files
• Spurious Long Bonds
– bonds drawn between distant atoms, for example from the end of one
chain to the start of the next
• Missing TER Cards
– either a TER card or a change in the chain ID is needed to mark the
end of a chain
• Improper Use of ATOM Records Instead of HETATM Records
• Misaligned Atom Names
– atom names placed in the wrong columns of a PDB record can be misread
by programs (e.g. the alpha carbon " CA " versus calcium "CA  ")
• Duplicate Atom Names
– failure to uniquely name all atoms within a given residue
• Residues Out of Sequence
– for example, the second residue in the file is numbered incorrectly
• Common Typos
– the letter l is sometimes accidentally substituted for the number 1
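Two of the errors listed above can be detected automatically once the atom records are parsed. The Python sketch below flags duplicate atom names within a residue and residue numbers that decrease within a chain; it assumes the records have already been reduced to (chain ID, residue number, atom name) tuples, and `check_records` is a hypothetical name.

```python
def check_records(atoms):
    """Flag duplicate atom names and out-of-sequence residue numbers.

    atoms: iterable of (chain_id, res_seq, atom_name) tuples in file order.
    Returns (duplicates, out_of_sequence), where out_of_sequence holds
    (chain_id, previous_res_seq, current_res_seq) triples.
    """
    seen = set()
    duplicates, out_of_sequence = [], []
    last_res = {}  # last residue number seen in each chain
    for chain_id, res_seq, name in atoms:
        key = (chain_id, res_seq, name)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
        prev = last_res.get(chain_id)
        if prev is not None and res_seq < prev:
            out_of_sequence.append((chain_id, prev, res_seq))
        last_res[chain_id] = res_seq
    return duplicates, out_of_sequence

atoms = [("A", 1, "N"), ("A", 1, "CA"), ("A", 1, "CA"),  # CA named twice
         ("A", 20, "N"), ("A", 3, "N")]                  # 20 then 3: out of sequence
print(check_records(atoms))  # ([('A', 1, 'CA')], [('A', 20, 3)])
```

This sketch ignores alternate-location indicators, so atoms that legitimately share a name in alternate conformations would also be reported as duplicates.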
• Missing Coordinates and Biological Assemblies
• Because of the limitations of structure determination methods,
most entries do not include coordinates for every single
atom in the identified molecule.
• In some cases, the experimental method does not observe
certain atoms. For example, hydrogen atoms are usually not
observed in X-ray crystallographic experiments, and highly
flexible regions may be too disordered to model, so these
atoms are often absent from the PDB coordinate files.
• A few of the common situations you might encounter are:
– Asymmetric and Biological Assemblies (PDB ID:1hho)
– Alpha-Carbon Coordinate Files (PDB ID:1f6g)
– Missing Loops and Tails (PDB ID:1az5)
– Fragments and Domains (PDB ID:2a7u)
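A quick way to spot a missing loop is to look for jumps in the residue numbers of a chain. The Python sketch below is only a heuristic: numbering can skip deliberately or use insertion codes, missing tails at either end cannot be seen from the numbers alone, and the definitive list of unobserved residues is given in the REMARK 465 records of the PDB file. `find_gaps` is a hypothetical helper.

```python
def find_gaps(residue_numbers):
    """Return (first_missing, last_missing) ranges between observed residue numbers."""
    observed = sorted(set(residue_numbers))
    gaps = []
    for prev, curr in zip(observed, observed[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps

print(find_gaps([1, 2, 3, 7, 8, 10]))  # [(4, 6), (9, 9)]
```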
Exercise: Understanding PDB Files
• Go to www.rcsb.org
• or search PDB in Google
• Search and download 1gcn
• Search and download 3hhb
• Search and download 1vm3
• Do not double-click the file to open it; instead, right-click it and
choose the ‘Open with’ option.
• Choose ‘WordPad’ as the program to open it.
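The exercise files can also be fetched from a script instead of the browser. The Python sketch below assumes the RCSB file-download address pattern https://files.rcsb.org/download/<ID>.pdb and a working network connection; `download_pdb` and `pdb_download_url` are hypothetical helpers, not part of any RCSB library.

```python
import urllib.request
from typing import Optional

# Assumed RCSB download address for legacy PDB-format files.
DOWNLOAD_BASE = "https://files.rcsb.org/download"

def pdb_download_url(pdb_id: str) -> str:
    """Build the download URL for a four-character PDB ID (IDs are case-insensitive)."""
    return f"{DOWNLOAD_BASE}/{pdb_id.upper()}.pdb"

def download_pdb(pdb_id: str, path: Optional[str] = None) -> str:
    """Save the PDB-format file for pdb_id to path (default '<id>.pdb') and return the path."""
    target = path or f"{pdb_id.lower()}.pdb"
    urllib.request.urlretrieve(pdb_download_url(pdb_id), target)
    return target
```

For example, `for code in ("1gcn", "3hhb", "1vm3"): download_pdb(code)` would save the three exercise files to the current directory, ready to open in a text editor such as WordPad. Very large entries may be distributed only in mmCIF format, in which case the `.pdb` download is not available.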
Alternate version of the exercise

• Go to www.rcsb.org
• or search PDB in Google
• Search 1gcn
• Search 3hhb
• Search 1vm3
• Click ‘Display Files’
• Explore ‘PDB Format’
Need More info?
• Check the following links…
• Introduction to PDB Data
• http://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/introduction
Ramachandran Plot
• A special way of plotting protein torsion angles was
introduced by Ramachandran and co-authors and was
subsequently named the Ramachandran plot.
• The Ramachandran plot provides an easy way to view the
distribution of torsion angles in a protein structure.
• The two torsion angles describe the rotations of the
polypeptide backbone around the N-Cα bond (called phi, φ)
and the Cα-C bond (called psi, ψ).
• Torsion angles are among the most important
local structural parameters that control protein
folding; essentially, if we could predict the
Ramachandran angles for a particular protein,
we would be able to predict its fold.
• The torsion angles phi and psi provide the
flexibility required for the polypeptide backbone
to adopt a certain fold, since the third possible
torsion angle within the protein backbone (called
omega, ω) is essentially fixed at about 180 degrees
because the peptide bond is planar.
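The torsion angles can be computed directly from backbone coordinates: φ of residue i is the dihedral C(i-1)-N-Cα-C, and ψ is N-Cα-C-N(i+1). The Python sketch below uses the standard atan2 dihedral formula with the IUPAC sign convention; the helper names and the dictionary-per-residue input are illustrative assumptions.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(residues):
    """residues: list of dicts with "N", "CA", "C" coordinates, in chain order.

    Returns one (phi, psi) pair per residue; None marks an undefined angle
    (phi of the first residue, psi of the last).
    """
    angles = []
    last = len(residues) - 1
    for i, res in enumerate(residues):
        phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"]) if i > 0 else None
        psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"]) if i < last else None
        angles.append((phi, psi))
    return angles

# Four points arranged in a trans (anti) geometry give a torsion of 180 degrees.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # 180.0
```

The same `dihedral` function gives ω when applied to Cα(i), C(i), N(i+1), Cα(i+1); for a trans peptide bond the result is close to ±180°.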
• The horizontal axis shows φ values, while the vertical axis shows ψ
values.
• Both axes run from -180° at the lower left corner to +180°.
• Each dot on the plot shows the (φ, ψ) pair of one amino acid residue.
• This allows a clear distinction of the characteristic regions of α-helices
and β-sheets.
• The regions on the plot with the highest density of dots are the
so-called “allowed” regions, also called low-energy regions.
• Some values of φ and ψ are forbidden since the involved atoms
will come too close to each other, resulting in a steric clash.
• For a high-quality, high-resolution experimental structure, the
generously allowed and disallowed regions are usually empty or almost
empty; very few amino acid residues in proteins have their torsion angles
within these regions.
• There are, however, exceptions to this rule: such values can be found, and
they most likely result in some strain in the polypeptide chain.
• In such cases, additional interactions are usually present to stabilize the
structure. These residues may have functional significance and may be
conserved within a protein family.
• Another exception to the principle is the torsion angle distribution of one
residue type, glycine.
• Glycine has only a hydrogen atom as its side chain, which allows high
flexibility in the polypeptide chain and makes otherwise forbidden torsion
angles accessible.
• That is why glycine is often found in loop regions, where the polypeptide
chain needs to make a sharp turn.
• This is also the reason for the high conservation of glycine residues in
protein families, since the presence of turns at certain positions is a
characteristic of a particular fold of a structure.
• Another residue with special properties is proline, which, in contrast to
glycine, restricts its φ angle to a narrow range (around -60°) because its
side chain forms a ring with the backbone nitrogen.
• Proline is often found at the end of helices and functions as a “helix
disruptor”.
Structure Quality Assessment
• When a protein X-ray structure has not been properly
refined, and especially for poor or incorrect homology models,
we may find torsion angles in disallowed regions of the
Ramachandran plot; such deviations usually
indicate problems with the structure.
• For this reason, the Ramachandran plot is commonly used to
assess the quality of experimental structures and
homology models.
• Torsion angles outside the low-energy regions, whenever
observed, should be carefully examined.
• They may indicate problems in the structure, but they may
also be true and may provide some interesting insights into
the function of the protein.
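Validation tools such as PROCHECK summarize the plot as the percentage of residues in each region. Assuming each residue has already been assigned to a region by such a tool (the region labels and residue names below are hypothetical), the summary and the list of residues to examine can be sketched in Python as:

```python
from collections import Counter

REGIONS = ("core", "allowed", "generous", "disallowed")

def summarize_regions(assignments):
    """assignments: non-empty list of (residue_label, region) pairs, region in REGIONS.

    Returns (percentages, flagged): the percentage of residues in each region,
    and the residues in generously allowed or disallowed regions to examine.
    """
    if not assignments:
        raise ValueError("no residues to summarize")
    counts = Counter(region for _, region in assignments)
    total = len(assignments)
    percentages = {r: 100.0 * counts[r] / total for r in REGIONS}
    flagged = [label for label, region in assignments
               if region in ("generous", "disallowed")]
    return percentages, flagged

# Hypothetical assignments for a 10-residue fragment.
sample = [(f"RES{i}", "core") for i in range(1, 9)] + [("GLY9", "allowed"), ("ASN10", "disallowed")]
percentages, flagged = summarize_regions(sample)
print(percentages["core"], flagged)  # 80.0 ['ASN10']
```

A flagged residue is a prompt for closer inspection, not automatic proof of an error: as noted above, it may be a genuine strained conformation with functional significance.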
Red indicates the low-energy (most favoured) regions, yellow the allowed
regions, pale yellow the so-called generously allowed regions, and white the
disallowed regions. A: good structure; B: bad structure.
Exercise – Ramachandran Plot
RAMPAGE can be accessed from
http://mordred.bioc.cam.ac.uk/~rapper/rampage.php
PROCHECK can be accessed through PDBsum:
http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=index.html
Upload a PDB file by clicking the ‘Browse’ button.
Provide an email address to receive the results by email.
