
Protein Structure Classification Guide

SCOP and CATH are hierarchical databases that classify protein domains based on structural and evolutionary relationships. SCOP classifies proteins into classes, folds, superfamilies, and families. CATH classifies proteins into classes, architectures, topologies, superfamilies, and families. Both aim to organize the growing number of protein structures and provide a framework for understanding protein evolution and function.

Uploaded by

rahil2989
Copyright
© Attribution Non-Commercial (BY-NC)

SCOP & CATH

Dr. M.I. Hassan


1. Protein Data Bank (PDB)

• Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics (RCSB)

• http://www.rcsb.org/pdb/
– 23997 Structures 20-Jan-2004
– 27570 Structures 05-Oct-2004
– 30060 Structures 15-Mar-2005
– 62787 Structures 20-Jan-2010

– Also contains structures of other bio-macromolecules: DNA, carbohydrates and protein-DNA complexes.
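As a rough illustration of the growth implied by these snapshot counts, the sketch below computes the average number of structures added per day between consecutive snapshots. The dates and counts are the ones listed on the slide; the function name `growth_per_day` is an illustrative choice, not part of any PDB tool.

```python
from datetime import date

# Snapshot counts as listed on the slide: (date, number of structures).
snapshots = [
    (date(2004, 1, 20), 23997),
    (date(2004, 10, 5), 27570),
    (date(2005, 3, 15), 30060),
    (date(2010, 1, 20), 62787),
]


def growth_per_day(points):
    """Return (start, end, structures added per day) for consecutive snapshots."""
    points = sorted(points)
    rates = []
    for (d0, n0), (d1, n1) in zip(points, points[1:]):
        rates.append((d0, d1, (n1 - n0) / (d1 - d0).days))
    return rates


for start, end, rate in growth_per_day(snapshots):
    print(f"{start} to {end}: {rate:.1f} structures/day")
```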
PDB Content Growth
Growth Of Unique Folds Per Year As Defined By SCOP
Growth Of Unique Topologies Per Year As Defined By CATH
Alternative Source of Structure: NCBI
Free Software for Protein Structure Visualization

• RASMOL: available for all platforms http://www.openrasmol.org
• Swiss PDB Viewer: from Swiss-Prot http://www.expasy.ch/spdbv/
• Chemscape Chime Plug-in: for PC and Mac http://www.mdl.com/downloads/downloadable/index.jsp
• YASARA: http://www.yasara.org/
• MOLMOL: MOLecule analysis and MOLecule display http://129.132.45.141/wuthrich/software/molmol/index.html
Hierarchical classification of protein domains: SCOP & CATH

• SCOP: Structural Classification of Proteins
University of Cambridge, UK
http://scop.mrc-lmb.cam.ac.uk/scop/
Mirror in Singapore: http://scop.bic.nus.edu.sg/

• CATH: Class, Architecture, Topology, Homologous Superfamily, Sequence family
University College London, UK
http://www.biochem.ucl.ac.uk/bsm/cath/
Basis for protein classification

Proteins adopt a limited number of topologies
More than 50,000 sequences fold into ~1000 unique folds.

Homologous sequences have similar structures
Usually, when sequence identity is >30%, proteins adopt the same fold. Even in the absence of sequence homology, some folds are preferred by vastly different sequences.

The “active site” is highly conserved
A subset of functionally critical residues is found to be conserved even when the folds vary.
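The 30% rule of thumb above can be made concrete with a minimal sketch. It assumes the two sequences have already been aligned (gaps written as `-`) and that identity is computed over columns where neither sequence has a gap; other conventions for the denominator exist. The function names are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def likely_same_fold(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Apply the slide's rule of thumb: identity above 30% suggests the same fold."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

A result below the threshold does not rule out a shared fold, since the slide notes that very different sequences can prefer the same fold.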
The hierarchy in SCOP

Root
Class: 5 classes: All-α, All-β, α/β, α+β, multi-domain
Fold: Have the same major secondary structure & topological connections
Superfamily: Probable common ancestry
Family: Clear evolutionary relationship
Protein
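One way to work with this hierarchy in code is through the SCOP concise classification string (sccs), for example `a.1.1.2`, whose four fields give class, fold, superfamily and family. The sketch below assumes the class letters a to e correspond to the five classes listed above in that order; the helper names are illustrative.

```python
from typing import NamedTuple

# Assumed mapping of sccs class letters to the five classes on the slide.
SCOP_CLASSES = {
    "a": "All-alpha",
    "b": "All-beta",
    "c": "alpha/beta",
    "d": "alpha+beta",
    "e": "multi-domain",
}


class ScopNode(NamedTuple):
    class_name: str
    fold: str
    superfamily: str
    family: str


def parse_sccs(sccs: str) -> ScopNode:
    """Split an sccs such as 'a.1.1.2' into its four hierarchy levels."""
    parts = sccs.strip().split(".")
    if len(parts) != 4 or parts[0] not in SCOP_CLASSES:
        raise ValueError(f"unrecognised sccs: {sccs!r}")
    cls, fold, sf, _fam = parts
    return ScopNode(SCOP_CLASSES[cls], f"{cls}.{fold}", f"{cls}.{fold}.{sf}", sccs.strip())


def deepest_shared_level(a: str, b: str) -> str:
    """Return the most specific level at which two sccs strings agree."""
    na, nb = parse_sccs(a), parse_sccs(b)
    for level in ("family", "superfamily", "fold", "class_name"):
        if getattr(na, level) == getattr(nb, level):
            return "class" if level == "class_name" else level
    return "root"
```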
How many unique folds do organisms use to express functions?

Sequence space: > 50,000
Many sequences form one unique fold
Conformational space: ~1,000 ???????
Growth of Protein Databases
[Figure: No. of Sequences (left axis, 0 to 90000) and No. of Structures and Folds (right axis, 0 to 12000) by year, 1986 to 2000; series: Sequences, Structures, Folds]
Structural Classification of Proteins
SCOP
• University of Cambridge, UK:
http://scop.mrc-lmb.cam.ac.uk/scop/
– mirrored at Singapore: http://scop.bic.nus.edu.sg/
– contains PDB entries grouped hierarchically by:
• Structural class,
• Fold,
• Superfamily,
• Family,
• Individual member
(domain-based)
Structural Classification of Proteins
SCOP
• Family

• Proteins are clustered together into families on the basis of one of two criteria that imply their having a common evolutionary origin:
– All proteins that have residue identities of 30% and greater;
– Proteins with lower sequence identities but whose functions and structures are very similar

Example: globins, with sequence identities of 15%.

Structural Classification of Proteins
SCOP
• Superfamily

• Families, whose proteins have low sequence identities but whose structures and, in many cases, functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies

• Example: actin, the ATPase domain of the heat-shock protein and hexokinase
Structural Classification of Proteins
SCOP
• Fold
• Superfamilies and families are defined as having a common fold if their proteins have the same major secondary structures in the same arrangement with the same topological connections.
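The Family, Superfamily and Fold criteria on the SCOP slides can be read as a nested decision rule. The sketch below is only an illustration of that reading: in SCOP these judgments are made by expert curation, so the boolean arguments are hypothetical stand-ins for a curator's assessment, and the function name is an assumption.

```python
def scop_shared_level(
    identity_pct: float,
    similar_structure_and_function: bool,
    probable_common_origin: bool,
    same_fold: bool,
) -> str:
    """Return the most specific SCOP level two proteins would share.

    The boolean flags stand in for manual curation decisions (assumption).
    """
    # Family: identity of 30% and greater, or lower identity with very
    # similar structures and functions (e.g. globins at 15%).
    if identity_pct >= 30.0 or similar_structure_and_function:
        return "family"
    # Superfamily: low identity, but a common evolutionary origin is probable.
    if probable_common_origin:
        return "superfamily"
    # Fold: same major secondary structures, arrangement and connections.
    if same_fold:
        return "fold"
    return "different folds"
```

For example, two globins at 15% identity with very similar structures and functions would be placed in the same family under this reading.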
Structural Classification of Proteins
SCOP
• Class
– For the convenience of users, the different folds have been grouped into classes. Most of the folds are assigned to one of a few structural classes on the basis of the secondary structures of which they are composed.
SCOP Class: All-α topologies
cytochrome b-562, ferritin
SCOP Class: All-β topologies
β sandwiches, β-barrels
SCOP Class: α/β topologies
α/β horseshoe
α/β barrels
SCOP Class: Alpha+Beta Topologies
Ubiquitin

1ubi
CATH database
http://www.biochem.ucl.ac.uk/bsm/cath/

CATH: Class, Architecture, Topology, Homologous Superfamily, Sequence family
Orengo et al. CATH: a hierarchical classification of protein domain structures (1997) Structure 5, 1093-1108

Sequence identity >30%: the same overall fold
Sequence identity >70%: the same overall fold + similar function
The hierarchy in CATH

Class: 3 classes: Mainly-α, Mainly-β, α-β
Architecture: Overall shape as determined by orientations of secondary structures
Topology: Both the overall shape & connectivity of secondary structure
Homologous Superfamily: Share a common ancestor
Sequence family: Classified based on sequence identity
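CATH nodes are commonly written as dotted numeric codes with one field per level, for example `3.40.50.300` for C.A.T.H. The sketch below parses such a code and compares two codes at a chosen level. It assumes class numbers 1 to 3 map to the three classes listed above in that order; the class and method names are illustrative.

```python
from dataclasses import dataclass

# Assumed mapping of CATH class numbers to the three classes on the slide.
CATH_CLASSES = {1: "Mainly-alpha", 2: "Mainly-beta", 3: "alpha-beta"}


@dataclass(frozen=True)
class CathCode:
    c: int  # Class
    a: int  # Architecture
    t: int  # Topology
    h: int  # Homologous superfamily

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        parts = code.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected C.A.T.H code, got {code!r}")
        return cls(*(int(p) for p in parts))

    @property
    def class_name(self) -> str:
        return CATH_CLASSES.get(self.c, "unlisted class")

    def prefix(self, level: str) -> str:
        """Return the code truncated at level 'C', 'A', 'T' or 'H'."""
        n = "CATH".index(level.upper()) + 1
        return ".".join(str(v) for v in (self.c, self.a, self.t, self.h)[:n])


def same_at_level(x: str, y: str, level: str) -> bool:
    """True if two CATH codes agree down to the given level."""
    return CathCode.parse(x).prefix(level) == CathCode.parse(y).prefix(level)
```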
CATH database
Class
Derived from secondary structure content; assigned automatically for more than 90% of protein structures.

Architecture
Describes the gross orientation of secondary structures, independent of connectivities; currently assigned manually.

Topology
Clusters structures according to their topological connections and numbers of secondary structures.

Homologous superfamilies
Cluster proteins with highly similar structures and functions. The assignments of structures to
topology families and homologous superfamilies are made by sequence and structure comparisons.

Sequence families
Structures within each H-level are further clustered on sequence identity. Domains clustered in the
same sequence families have sequence identities >35%.

Non-identical sequence domains, Identical sequence domains, Domains


CATH database
The class (C), architecture (A) and topology (T) levels in the CATH database
[Figure panels: Class, Architecture, Topology, Homologous Superfamily]
CATH – architectures
CATH – architectures (cont.)
The protein structure universe in the PDB (1997) by a CATH wheel
The distribution of non-homologous structures (i.e. a single representative from each homologous superfamily at the H-level in CATH) amongst the different classes (C), architectures (A) and fold families (T) in the CATH database.
SCOP / CATH -> DALI
SCOP & CATH

• Hierarchical and based on abstractions
• Include some manual aspects and are curated by experts in the field of protein structure

Dali
Presentation of results of computer classification, where the methods that underlie the classification remain internal

Structure comparison
DALI
Comparing protein structures in 3D

[Figure labels: β-α-β, anti-parallel, β-barrel, β-meander]

More information about DALI


Touring protein fold space with Dali/FSSP: Liisa Holm and Chris Sander
Compare 3D protein structures by Dali
http://www.ebi.ac.uk/dali/
• The FSSP database (Fold classification based on Structure-Structure alignment
of Proteins) is based on exhaustive all-against-all 3D structure comparison of
protein structures currently in the Protein Data Bank (PDB).
• The classification and alignments are automatically maintained and
continuously updated using the Dali search engine.

Dali Domain Dictionary

• Structural domains are delineated automatically using the criteria of recurrence and compactness. Each domain is assigned a Domain Classification number DC_l_m_n_p, where:
– l: fold space attractor region
– m: globular folding topology
– n: functional family
– p: sequence family
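A Domain Classification number of the form DC_l_m_n_p can be split into its four fields with a small parser. This is a minimal sketch: it assumes each field is a non-negative integer, and the example label `DC_3_12_5_1` used in the usage note is hypothetical, not taken from the Dali Domain Dictionary.

```python
import re
from typing import NamedTuple


class DomainClass(NamedTuple):
    attractor: int          # l: fold space attractor region
    topology: int           # m: globular folding topology
    functional_family: int  # n: functional family
    sequence_family: int    # p: sequence family


# Assumes all four fields are written as plain integers.
_DC_PATTERN = re.compile(r"^DC_(\d+)_(\d+)_(\d+)_(\d+)$")


def parse_dc(label: str) -> DomainClass:
    """Parse a label of the form DC_l_m_n_p into its four levels."""
    match = _DC_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a DC_l_m_n_p label: {label!r}")
    return DomainClass(*(int(g) for g in match.groups()))
```

For instance, `parse_dc("DC_3_12_5_1").topology` would give the m field of that hypothetical label.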
Compare 3D protein structures by Dali
http://www.ebi.ac.uk/dali/
Functional families
• Evolutionary relationships from strong structural similarities which are
accompanied by functional or sequence similarities.
• Functional families are branches of the fold dendrogram where all pairs
have a high average neural network prediction for being homologous.
Sequence families
• Representative subset of the Protein Data Bank extracted using a 25 %
sequence identity threshold.
• All-against-all structure comparison was carried out within the set of
representatives.
• Homologues are only shown aligned to their representative.
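The slide does not say how the representative subset is extracted. One common approach, shown here purely as an assumption, is a greedy pass: each chain either joins the first existing representative it matches above the 25% identity threshold, or becomes a new representative. The `identity` callable and the toy identity table are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def pick_representatives(
    ids: Sequence[str],
    identity: Callable[[str, str], float],
    threshold: float = 25.0,
) -> Dict[str, List[str]]:
    """Greedy selection: map each representative to the homologues it covers."""
    reps: Dict[str, List[str]] = {}
    for cid in ids:
        for rep in reps:
            if identity(cid, rep) > threshold:
                reps[rep].append(cid)  # shown aligned to its representative
                break
        else:
            reps[cid] = []  # no close representative yet: start a new one
    return reps


# Toy pairwise identities in percent (hypothetical values for illustration).
toy = {
    frozenset(("A", "B")): 40.0,
    frozenset(("A", "C")): 10.0,
    frozenset(("B", "C")): 12.0,
}


def toy_identity(x: str, y: str) -> float:
    return toy[frozenset((x, y))]


clusters = pick_representatives(["A", "B", "C"], toy_identity)
```

The result depends on input order, which is a known property of greedy selection; the all-against-all structure comparison would then be run only on the keys of `clusters`.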
Compare 3D protein structures by Dali
http://www.ebi.ac.uk/dali/
Fold types
• Fold types are defined as clusters of
structural neighbors in fold space with
average pairwise Z-scores (by Dali)
above 2.

Structural neighbours of 1urnA (top left). 1mli (bottom right) has the same topology even though there are shifts in the relative orientation of secondary structure elements.
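The fold-type criterion (clusters whose average pairwise Z-score is above 2) can be approximated with average-linkage agglomerative clustering. The slides state only the criterion, so the merging procedure below is an assumption, and the domain names and Z-scores in the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

ZTable = Dict[FrozenSet[str], float]


def average_z(cluster_a: List[str], cluster_b: List[str], z: ZTable) -> float:
    """Mean Z-score over all pairs drawn from two clusters (missing pairs count as 0)."""
    total = sum(z.get(frozenset((a, b)), 0.0) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def fold_types(names: Sequence[str], z: ZTable, cutoff: float = 2.0) -> List[List[str]]:
    """Merge the closest pair of clusters until no pair averages above the cutoff."""
    clusters = [[n] for n in names]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_z(clusters[ij[0]], clusters[ij[1]], z),
        )
        if average_z(clusters[i], clusters[j], z) <= cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Hypothetical pairwise Dali Z-scores for four made-up domains.
toy_z: ZTable = {
    frozenset(("d1", "d2")): 8.0,
    frozenset(("d3", "d4")): 5.0,
    frozenset(("d1", "d3")): 1.0,
    frozenset(("d2", "d3")): 1.5,
    frozenset(("d1", "d4")): 0.5,
    frozenset(("d2", "d4")): 0.5,
}

groups = fold_types(["d1", "d2", "d3", "d4"], toy_z)
```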
Summary

• Protein structure database (PDB)

• Protein structure visualization software

• Structural classification, databases and servers
