
Pfam is a database of protein families and domains that is widely used to analyse novel
genomes and metagenomes and to guide experimental work on particular proteins and
systems 1. Since its inception in 1998, Pfam has been designed to scale with the growth
in the number of new protein sequences deposited 2. To achieve this scalability 2, each
Pfam family has a seed alignment that contains a representative set of sequences for the
entry 1. The seed alignments are used to build profile hidden Markov models (HMMs)
that can be used to search any sequence database for homologues in a sensitive and
accurate fashion 2. Those homologues that score above the curated inclusion thresholds
are aligned against the profile HMMs to make a full alignment 2. By searching a protein
sequence against the Pfam library of profile HMMs, you can determine which domains it
carries 5. Pfam can also be used to analyse proteomes and questions of more complex
domain architectures 5.
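The workflow described above (seed alignment, profile HMM, thresholded search, full alignment, domain scan) maps naturally onto the HMMER command-line tools. The following is a minimal sketch, not Pfam's actual production pipeline: it only assembles the command lines rather than running them. All file names (seed.sto, family.hmm, seqdb.fasta, full.sto, query.fasta, domains.tbl) are hypothetical, and it assumes the seed alignment is in Stockholm format with a gathering threshold annotation, so that the --cut_ga option can apply it.

```python
import shlex


def family_build_commands(seed="seed.sto", hmm="family.hmm",
                          seqdb="seqdb.fasta", full="full.sto"):
    """Command lines mirroring: seed alignment -> profile HMM -> full alignment."""
    return [
        # Build a profile HMM from the curated seed alignment.
        ["hmmbuild", hmm, seed],
        # Search a sequence database with the HMM. --cut_ga keeps only hits
        # above the curated gathering threshold stored in the HMM file, and
        # -A saves an alignment of those hits (the "full alignment").
        ["hmmsearch", "--cut_ga", "-A", full, hmm, seqdb],
    ]


def domain_scan_commands(library="Pfam-A.hmm", query="query.fasta",
                         table="domains.tbl"):
    """Command lines for finding which domains a query protein carries."""
    return [
        # Index the HMM library once so that hmmscan can use it.
        ["hmmpress", library],
        # Scan the query against every profile HMM in the library and
        # write a per-domain table of hits.
        ["hmmscan", "--cut_ga", "--domtblout", table, library, query],
    ]


if __name__ == "__main__":
    for cmd in family_build_commands() + domain_scan_commands():
        print(shlex.join(cmd))
```

Returning argument lists instead of shell strings keeps the sketch easy to inspect, and each list can be passed directly to subprocess.run on a machine where HMMER is installed.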
The newest version, Pfam 36.0, contains a total of 20,795 families and 660 clans 4.
Some of the Pfam entries are grouped into clans 5. Pfam defines a clan as a collection of
entries that have arisen from a single evolutionary origin 5. Evidence of their evolutionary
relationship can be in the form of similarity in tertiary structures, or, when structures are
not available, from common sequence motifs 5. Since the last release, 1,191 new families
have been built, 28 families have been killed, and 5 new clans have been created 4.
Additionally, around 1.5% of existing Pfam entries have been updated 4. The boundaries of
2,818 families have changed, 281 of them by more than 50 residues; most of these were
trimmed or split into domains, often because of improved information from accurate
structural models 4.
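The release figures quoted above can be cross-checked with a little arithmetic. The sketch below uses only the numbers given in the text and assumes, as a simplification, that the family count changed only through the new and killed families and that no clans were removed; the implied totals for the previous release are therefore derived values, not figures taken from the release notes.

```python
# Figures quoted for Pfam 36.0.
families_now = 20_795
clans_now = 660
new_families = 1_191
killed_families = 28
new_clans = 5
boundary_changed = 2_818
changed_over_50_residues = 281

# Net growth in families since the previous release.
net_family_change = new_families - killed_families

# Totals implied for the previous release (assumption: no other changes).
implied_previous_families = families_now - net_family_change
implied_previous_clans = clans_now - new_clans

# Share of boundary changes that moved by more than 50 residues.
large_change_share = changed_over_50_residues / boundary_changed

if __name__ == "__main__":
    print(f"Net new families: {net_family_change}")
    print(f"Implied previous families: {implied_previous_families}")
    print(f"Implied previous clans: {implied_previous_clans}")
    print(f"Boundary changes over 50 residues: {large_change_share:.1%}")
```

Under these assumptions, roughly one in ten of the boundary changes involved more than 50 residues.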
Pfam families are divided into two categories, Pfam-A and Pfam-B 3. Each Pfam-A
family consists of a curated seed alignment containing a small set of representative
members of the family, profile HMMs built from the seed alignment and an automatically
generated full alignment which contains all detectable protein sequences belonging to the
family, as defined by profile HMM searches of primary sequence databases 3. Pfam-B
entries are automatically generated from the ProDom database and are represented by a
single alignment 3. ProDom, a protein domain family database, originates from the early
recognition that automated methods are needed to reach comprehensiveness in protein
domain analysis 6. This comprehensiveness makes ProDom a unique resource that usefully
complements expert-derived databases such as Pfam 6. The use of representative seed
alignments for Pfam-A families allows efficient and sustainable manual curation of
alignments and annotation, while the automatic generation of full alignments and Pfam-B
clusters ensures that Pfam is a comprehensive classification of protein families that scales
effectively with the growth of the sequence databases 3.
Pfam type definitions divide entries into one of six types, which can help users
select which Pfam families to use in their analyses 1. In particular, a large-scale screen
of Pfam families has been carried out using the ncoils software to identify families with
a high proportion of predicted coiled-coil, and after inspection of such families, their types
could be changed 1.
Pfam is now hosted by InterPro 7. InterPro is a
bioinformatics resource that provides functional analysis
of protein sequences by classifying them into families and
predicting the presence of domains and important sites 8.
To classify proteins in this way, InterPro uses predictive
models, known as signatures, provided by several
different databases (referred to as member databases) that
make up the InterPro Consortium 8. Pfam is one of the
member databases. Different member databases use
different methods to construct their signatures, and they
have their own particular focus of interest: structural
and/or functional domains, protein families, or protein
features such as active sites or binding sites 9.
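Because Pfam entries are now served through InterPro, a Pfam family can be looked up programmatically as an InterPro member-database entry. The sketch below is illustrative only: the base URL and the entry/<member database>/<accession> path layout are assumptions about the InterPro REST API that should be checked against its documentation, and PF00069 is used purely as an example accession. The helper names entry_url and fetch_entry are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the InterPro REST API (verify before relying on it).
API_BASE = "https://www.ebi.ac.uk/interpro/api"


def entry_url(accession, member_db="pfam"):
    """Build the assumed URL for one entry of a given member database."""
    return f"{API_BASE}/entry/{member_db}/{accession}"


def fetch_entry(accession, member_db="pfam", timeout=30):
    """Download and decode the JSON record for an entry (needs network access)."""
    with urlopen(entry_url(accession, member_db), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the URL here; call fetch_entry() to perform the request.
    print(entry_url("PF00069"))
```

Keeping URL construction separate from the network call makes it easy to check the address by hand before sending any request.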

The InterPro homepage is easily accessible by clicking
on the InterPro link in the Proteins panel on the EBI
services page 8.

The Pfam website has been retired. Its codebase was
first released over 20 years ago, and although it has been
updated from time to time, some of its core functionality still dates back to its origins 10.
There is a lot of technical debt in its current state, and it is becoming harder to maintain 10. By
retiring the website, the core of Pfam will be focused on producing the data 10. The deployment
and visualisation tasks are left to the InterPro website 10. InterPro was redesigned in
recent years, using up-to-date technologies, including a modern framework 10.
1 https://academic.oup.com/nar/article/49/D1/D412/5943818
2 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808889/
3 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238907/
4 https://xfam.wordpress.com/2023/09/18/pfam-36-0-release/
5 https://pfam-docs.readthedocs.io/en/latest/faq.html
6 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC539988/
7 Pfam is now hosted by InterPro (xfam.org)
8 https://www.ebi.ac.uk/training/online/courses/interpro-functional-and-structural-analysis/what-is-interpro/
9 https://www.ebi.ac.uk/training/online/courses/interpro-functional-and-structural-analysis/what-is-interpro/where-does-data-come-from/
10 https://xfam.wordpress.com/2022/08/04/pfam-website-decommission/
