
III.

PROTEIN CLASSIFICATION
SCOP
CATH
• A hierarchical domain classification of protein
structures in the Protein Data Bank
• Only crystal structures solved to a resolution
better than 4.0 angstroms are considered,
together with NMR structures
• Class – Architecture – Topology – Homologous
superfamily
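A CATH classification is usually written as a dotted code with one number per level (C.A.T.H), for example 3.40.50.300. The Python sketch below splits such a code into its four levels and reports the deepest level that two domains share. The example codes are only illustrative, and the helper names (`CathCode`, `parse_cath_code`, `shared_level`) are hypothetical. They are not part of any CATH software.

```python
from dataclasses import dataclass
from typing import Optional

# The four CATH levels, from most general to most specific.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


@dataclass(frozen=True)
class CathCode:
    cls: int
    architecture: int
    topology: int
    homologous_superfamily: int

    def prefix(self, depth: int) -> str:
        """Return the dotted code truncated to the first `depth` levels."""
        parts = (self.cls, self.architecture, self.topology, self.homologous_superfamily)
        return ".".join(str(p) for p in parts[:depth])


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted C.A.T.H code such as '3.40.50.300' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def shared_level(a: str, b: str) -> Optional[str]:
    """Name the deepest CATH level at which two codes agree (None if even Class differs)."""
    ca, cb = parse_cath_code(a), parse_cath_code(b)
    deepest = None
    for depth, name in enumerate(LEVELS, start=1):
        if ca.prefix(depth) != cb.prefix(depth):
            break
        deepest = name
    return deepest
```

For example, `shared_level("3.40.50.300", "3.40.50.720")` returns `"Topology"`: the two codes agree on C, A and T but belong to different homologous superfamilies.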
CATH – classification
• All classification is performed on individual protein
domains

• To divide multidomain protein structures into their
constituent domains, a range of algorithms is used,
including structure-based methods (CATHEDRAL, SSAP,
DETECTIVE).

• If a given domain has sufficiently high sequence and structural
similarity (i.e. at least 35% sequence identity and an SSAP score >= 80)
with a domain that has already been classified in CATH, the
classification is automatically inherited from that domain.
Otherwise, the domain is classified manually.
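The inheritance rule above can be sketched as a small decision function. This is a minimal illustration that assumes both thresholds must be met (at least 35% identity and an SSAP score of at least 80). When several classified domains qualify, it assumes the one with the highest SSAP score is chosen. That tie-break and the `Match` record are assumptions made for the example, not CATH's documented procedure.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds quoted in the text.
MIN_SEQ_IDENTITY = 35.0  # percent sequence identity
MIN_SSAP_SCORE = 80.0    # SSAP structural similarity score


@dataclass
class Match:
    # Hypothetical record: one previously classified domain compared with the new one.
    cath_code: str
    seq_identity: float  # percent identity to the new domain
    ssap_score: float    # SSAP score against the new domain


def assign_classification(matches: List[Match]) -> Optional[str]:
    """Return an inherited CATH code, or None if the domain must be classified manually."""
    qualifying = [
        m for m in matches
        if m.seq_identity >= MIN_SEQ_IDENTITY and m.ssap_score >= MIN_SSAP_SCORE
    ]
    if not qualifying:
        return None  # no sufficiently similar classified domain: manual curation
    # Assumed tie-break: highest SSAP score, then highest sequence identity.
    best = max(qualifying, key=lambda m: (m.ssap_score, m.seq_identity))
    return best.cath_code
```

A `None` result stands for the "classified manually" branch of the rule.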
CATH - hierarchy
Class, C-level 
• Class is determined according to the
secondary structure composition and packing
within the structure.
• Three major classes are recognised: mainly-alpha,
mainly-beta and alpha-beta (which includes both
alternating alpha/beta structures and alpha+beta
structures). A fourth class contains proteins
with low secondary structure content.
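As a rough illustration of assigning a class from secondary-structure composition, the sketch below counts helix ('H') and strand ('E') residues in a DSSP-style per-residue string. The cut-off values are hypothetical and were chosen only for the example. Packing is ignored entirely, so this is not how CATH itself assigns classes.

```python
from collections import Counter

# HYPOTHETICAL cut-offs for illustration only.
MIN_SS_FRACTION = 0.15  # below this total helix+strand fraction: low secondary structure
MINOR_FRACTION = 0.05   # below this, a secondary-structure type is treated as absent


def assign_class(ss_string: str) -> str:
    """Guess a CATH-style class from a string of 'H' (helix), 'E' (strand), other (coil)."""
    n = len(ss_string)
    if n == 0:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss_string)
    alpha = counts["H"] / n  # fraction of helical residues
    beta = counts["E"] / n   # fraction of strand residues
    if alpha + beta < MIN_SS_FRACTION:
        return "low secondary structure"
    if beta < MINOR_FRACTION:
        return "mainly-alpha"
    if alpha < MINOR_FRACTION:
        return "mainly-beta"
    return "alpha-beta"
```

A composition-only rule cannot tell alternating alpha/beta from alpha+beta domains. Distinguishing them needs the packing information that the slide mentions.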
CATH - hierarchy
Architecture, A-level
• Describes the overall shape of the domain
structure, as determined by the orientations of
the secondary structures.
• Ignores the connectivity between the
secondary structures.
• It is currently assigned manually using a
simple description of the secondary structure
arrangement e.g. barrel or 3-layer sandwich.
CATH - hierarchy
Topology (Fold family), T-level
• Structures are grouped according to whether
they share the same topology or fold in the
core of the domain, that is, if they share the
same overall shape and connectivity of the
secondary structures in the domain core.
• Domains in the same fold group may have
different structural decorations to the
common core.
CATH - hierarchy
Homologous Superfamily, H-level
• This level groups together protein domains
which are thought to share a common
ancestor and can therefore be described as
homologous.
• Similarities are identified either by high
sequence identity or structure comparison
using SSAP.
Sequence Family Levels: (S,O,L,I,D)
• Domains within each H-level are subclustered
into sequence families using multi-linkage
clustering at the following levels:

Level  Name                         Sequence identity  Overlap
S      Sequence Family (S35)        35%                80%
O      Orthologous Family (S60) *   60%                80%
L      "Like" domain (S95)          95%                80%
I      Identical domain (S100)      100%               80%
D      Domain counter               Unique
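The identity and overlap thresholds in the table can be checked for a single pair of domains as shown below. This sketch only tests one pair against each level's cut-offs. It does not reproduce the multi-linkage clustering that actually builds the S35/S60/S95/S100 families, and the function name is hypothetical. The D level is omitted because it is just a unique counter.

```python
from typing import List

# (level letter, label, minimum % sequence identity, minimum % overlap), taken from the table.
SEQUENCE_LEVELS = [
    ("S", "S35", 35.0, 80.0),
    ("O", "S60", 60.0, 80.0),
    ("L", "S95", 95.0, 80.0),
    ("I", "S100", 100.0, 80.0),
]


def shared_sequence_levels(identity: float, overlap: float) -> List[str]:
    """List the sequence-family labels whose identity and overlap thresholds a pair meets."""
    return [
        label
        for _, label, min_identity, min_overlap in SEQUENCE_LEVELS
        if identity >= min_identity and overlap >= min_overlap
    ]
```

For example, a pair with 97% identity over 90% of their length meets the S35, S60 and S95 thresholds but not S100.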
Search in CATH
• Begin your search by typing a search term into
the search box in the right-hand corner of any
CATH page and clicking 'Quick Search'.
• To find a particular domain, use either the
CATH domain ID, CATH chain ID or the PDB
code as the search term. 
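The three kinds of identifier accepted as search terms can be told apart by their shape. The sketch below assumes the usual CATH naming convention. A PDB code has 4 characters (e.g. 1cuk). A chain ID appends a one-character chain label (1cukA). A domain ID appends a further two-digit domain number (1cukA01). Anything else is treated as free text. The function is hypothetical and does not query CATH.

```python
import re

# Assumed identifier formats (see lead-in):
#   PDB code        digit + 3 alphanumerics            e.g. 1cuk
#   CATH chain ID   PDB code + 1-character chain        e.g. 1cukA
#   CATH domain ID  chain ID + 2-digit domain number    e.g. 1cukA01
PDB_RE = re.compile(r"[0-9][A-Za-z0-9]{3}")
CHAIN_RE = re.compile(r"[0-9][A-Za-z0-9]{3}[A-Za-z0-9]")
DOMAIN_RE = re.compile(r"[0-9][A-Za-z0-9]{3}[A-Za-z0-9][0-9]{2}")


def identify_search_term(term: str) -> str:
    """Classify a search term as a domain ID, chain ID, PDB code or free text."""
    term = term.strip()
    if DOMAIN_RE.fullmatch(term):
        return "CATH domain ID"
    if CHAIN_RE.fullmatch(term):
        return "CATH chain ID"
    if PDB_RE.fullmatch(term):
        return "PDB code"
    return "free text"
```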
Protocol

• Go to the CATH database homepage:
http://www.cathdb.info/
• Select the option “Search CATH by ID/sequence/text”
• In the query box, paste the FASTA sequence of the
protein and click 'sequence search'
• In the results page, identify the CATH domain with
100% identity throughout the protein length (or
maximum identity over maximum length, whichever is
applicable) and click on the CATH code for the domain
• From the link that opens, note down the CATH code
and level description for the domain
• This gives the CATH classification for the protein
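The results-page step of the protocol can be expressed as a small selection rule. The `Hit` record and its fields are hypothetical stand-ins for what the results page shows. "Maximum identity for maximum length" is read here as a preference for the hit covering most of the query, with higher identity breaking ties. Other readings of that instruction are possible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    # Hypothetical record for one row of the sequence-search results.
    cath_code: str
    identity: float  # percent identity over the aligned region
    coverage: float  # percent of the query sequence covered by the alignment


def pick_domain(hits: List[Hit]) -> Optional[Hit]:
    """Prefer a full-length 100%-identity hit; otherwise the best coverage/identity compromise."""
    if not hits:
        return None
    perfect = [h for h in hits if h.identity >= 100.0 and h.coverage >= 100.0]
    if perfect:
        return perfect[0]  # first perfect hit in result order
    # Assumed reading: maximise coverage first, then identity.
    return max(hits, key=lambda h: (h.coverage, h.identity))
```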
