Availability of Nucleotide and Protein Databases on the Internet

Course Name:
Bioinformatics-2

Question No 1:

Give a comprehensive account of the nucleotide and protein databases available on the internet.

Answer:

Protein sequence databases

Primary protein databases

Primary protein databases store protein sequences, many of which are obtained by conceptual translation of nucleotide
sequences. Such sequences are not determined experimentally at the protein level; they are inferred by interpreting nucleotide sequence data.
The main primary protein databases include PIR-PSD, SWISS-PROT and the Protein Data Bank.
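
Conceptual translation can be illustrated with a short sketch. The Python below is a minimal example, assuming the standard genetic code and reading frame 1; the helper name `translate` is hypothetical and is not taken from any database's own software. Real pipelines also handle alternative genetic codes, ambiguous bases and partial codons.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Unknown codons (for example, containing N) are reported as X.
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAATAA"))  # MAK
```

A protein sequence produced this way is a prediction from the nucleotide record, which is why such entries need not correspond to a protein that has been observed experimentally.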

PIR-PSD:

The PIR-PSD is a joint effort between the PIR (Protein Information Resource), MIPS (Munich Information Center
for Protein Sequences, Germany) and JIPID (Japan International Protein Information Database, Japan).
The PIR-PSD is a comprehensive, non-redundant protein sequence database implemented in an object-relational DBMS. It is available in the public domain.
One of its distinctive features is its classification of protein sequences, which is primarily based on the superfamily concept.
PIR-PSD sequences are also classified by homology domains and sequence motifs. Homology domains may correspond to
evolutionary building blocks, while sequence motifs represent functional sites. This classification approach permits a more
complete understanding of sequence-function-structure relationships.

SWISS-PROT:

SWISS-PROT is one of the most widely used protein databases. It is manually curated.
The basic aim of SWISS-PROT is to provide all the necessary information about a specific protein.
Like PIR-PSD, it is a curated protein sequence database, and it provides a high level of annotation.
The data in each entry can be considered separately as core data and annotation.
The core data consist of the sequence, entered in the common single-letter amino acid code, together with the associated references and
bibliography. The taxonomy of the organism from which the sequence was obtained also forms part of this core information.
The annotation contains information on the function of the protein; post-translational modifications such as phosphorylation and
acetylation; structural domains and sites such as ATP-binding sites, calcium-binding regions and zinc fingers; the quaternary structure
of the protein; similarities to other proteins, if any; associated diseases; and sequence conflicts or variants, which arise when different
authors publish different sequences for the same protein or from mutations in different strains of an organism.
TrEMBL (for Translated EMBL) is a computer-annotated protein sequence database released as a supplement to SWISS-PROT.
It contains the translations of all coding sequences present in the EMBL nucleotide database that have not yet been fully annotated.
It may therefore contain sequences of proteins that are never expressed and have never actually been identified in the organism.

Protein Data Bank:

It is primarily a protein structure database: a crystallographic database of the three-dimensional structures of large biological
molecules, including proteins.
The Protein Data Bank archives three-dimensional structures not only of proteins but of all biologically important large
molecules, such as RNA molecules, nucleic acid fragments, and large peptides such as the antibiotic gramicidin.
The database holds data derived mainly from three sources: structures determined by X-ray crystallography, NMR experiments
and molecular modelling.

Secondary protein databases:

Secondary databases are so named because they contain the results of analysing the sequences held in primary databases. Many
secondary protein databases are the result of looking for features that relate different proteins. Commonly used secondary databases
of sequence and structure include PROSITE, PRINTS, MHCPep and Pfam.

PROSITE:

Some databases collect the patterns found in protein sequences rather than the complete sequences. PROSITE is one such pattern database.
Its protein motifs and patterns are encoded as regular expressions.
The information for each entry in PROSITE takes two forms: the pattern and the related descriptive text.
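
Because PROSITE patterns are regular expressions in a notation of their own, they can be converted to the syntax of a general regex engine. The sketch below is a simplified illustration in Python, not PROSITE's own software: the function name `prosite_to_regex` is hypothetical, and it assumes only the basic pattern elements (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, and `<` or `>` anchors at the ends of the pattern). Rarer forms, such as an anchor inside square brackets, are not handled.

```python
import re

# One pattern element: optional "<", a residue class, optional repeat, optional ">".
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":                # any residue
            core = "."
        elif core.startswith("{"):     # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


# The N-glycosylation site pattern: N, then not P, then S or T, then not P.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)                                  # N[^P][ST][^P]
print(re.search(regex, "AANGSAA").group())    # NGSA
```

The descriptive text of an entry is what tells the user how to interpret a hit; the pattern alone only reports where the sequence matches.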

PRINTS:

In the PRINTS database, protein sequence patterns are stored as fingerprints. A fingerprint is a set of motifs or patterns rather than a single one.
The information in a PRINTS entry is divided into three sections.
In addition to the entry name, accession number and number of motifs, the first section contains cross-links to other databases
that hold more information about the characterized family.
The second section provides a table showing how many of the motifs that make up the fingerprint occur in how many of the sequences of
that family.
The third section contains the actual fingerprints, which are stored as multiple aligned sets of sequences. The alignment is made
without gaps, so there is one set of aligned sequences for each motif.
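
The summary table in the second section can be sketched as follows. This is a toy illustration in Python under loud assumptions: the motifs and sequences are invented, the function names are hypothetical, and a motif is counted as present when any of its aligned segments occurs exactly in the sequence. PRINTS itself scores motifs more flexibly than exact matching.

```python
from collections import Counter

# Hypothetical fingerprint: each motif is an ungapped alignment, so all
# segments within one motif have the same length.
FINGERPRINT = [
    ["GDSGG", "GDSGA"],  # motif 1
    ["WVLTA", "WVVTA"],  # motif 2
    ["CQGDS", "CKGDS"],  # motif 3
]


def motifs_found(sequence, fingerprint):
    """Count how many motifs of the fingerprint occur in the sequence."""
    return sum(
        any(segment in sequence for segment in motif) for motif in fingerprint
    )


def summary_table(sequences, fingerprint):
    """Map 'number of motifs matched' to 'number of sequences', highest first."""
    counts = Counter(motifs_found(s, fingerprint) for s in sequences)
    return {n: counts.get(n, 0) for n in range(len(fingerprint), 0, -1)}


family = ["AAWVLTAKKCQGDSGGPL", "MMWVVTAQQGDSGA", "TTTGDSGGTTT"]
print(summary_table(family, FINGERPRINT))  # {3: 1, 2: 1, 1: 1}
```

A sequence that matches every motif is a strong candidate for family membership, while partial matches may indicate a more distant relative.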

MHCPep:

It is a database comprising over 13000 peptide sequences known to bind the Major Histocompatibility Complex (MHC) of the immune
system.
Each entry in the database contains not only the peptide sequence, which may be 8 to 10 amino acids long, but also information on the
specific MHC molecule to which it binds, the experimental method used to assay the peptide, the binding affinity and degree
of activity observed, the source protein that gives rise to the peptide when broken down, the positions along the peptide where it anchors on the
MHC molecule, and cross-links to other information.

Pfam:

Pfam contains profiles built using Hidden Markov Models (HMMs).
An HMM models a pattern as a series of match, insert and delete states, with scores assigned for moving
from one state to another during alignment.
Every family or pattern in Pfam consists of four elements. The first is the annotation, which contains information on
the source used to make the entry, the method used and a few numbers that serve as figures of merit.
The second is the seed alignment, which is used to bootstrap the rest of the sequences into the multiple alignment and then the family.
The third is the HMM profile.
The fourth is the complete alignment of all the sequences identified in that family.
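
The match, insert and delete states described above can be made concrete with a small scoring sketch. The Python below is a deliberately simplified Viterbi-style alignment of a sequence to a profile, not Pfam's or HMMER's actual implementation. Everything in it is an assumption made for illustration: the function name `viterbi_score`, the toy consensus "ACD", the scores (+2 for a consensus residue, -1 otherwise, 0 for an inserted residue) and the transition penalties. Direct transitions between insert and delete states are left out.

```python
NEG_INF = float("-inf")

# Hypothetical transition scores between match (M), insert (I) and delete (D) states.
TRANSITIONS = {"MM": 0.0, "MI": -2.0, "II": -1.0, "IM": 0.0,
               "MD": -2.0, "DD": -1.0, "DM": 0.0}
MISMATCH = -1.0


def viterbi_score(seq, match_scores, t=TRANSITIONS):
    """Best score for aligning the whole of seq to the whole profile."""
    n, length = len(seq), len(match_scores)
    # vm[i][j], vi[i][j], vd[i][j]: best score with seq[:i] consumed and the
    # path currently in the match, insert or delete state of position j.
    vm = [[NEG_INF] * (length + 1) for _ in range(n + 1)]
    vi = [[NEG_INF] * (length + 1) for _ in range(n + 1)]
    vd = [[NEG_INF] * (length + 1) for _ in range(n + 1)]
    vm[0][0] = 0.0  # begin state, treated as match state 0
    for i in range(n + 1):
        for j in range(length + 1):
            if i > 0 and j > 0:  # residue i emitted by match state j
                prev = max(vm[i - 1][j - 1] + t["MM"],
                           vi[i - 1][j - 1] + t["IM"],
                           vd[i - 1][j - 1] + t["DM"])
                vm[i][j] = prev + match_scores[j - 1].get(seq[i - 1], MISMATCH)
            if i > 0:  # residue i inserted after position j (emission score 0)
                vi[i][j] = max(vm[i - 1][j] + t["MI"], vi[i - 1][j] + t["II"])
            if j > 0:  # position j skipped without consuming a residue
                vd[i][j] = max(vm[i][j - 1] + t["MD"], vd[i][j - 1] + t["DD"])
    return max(vm[n][length], vi[n][length], vd[n][length])


# Toy profile with consensus "ACD": each position rewards its consensus residue.
profile = [{residue: 2.0} for residue in "ACD"]
print(viterbi_score("ACD", profile))   # 6.0, all three positions matched
print(viterbi_score("AWCD", profile))  # 4.0, W handled by an insert state
print(viterbi_score("AD", profile))    # 2.0, position C handled by a delete state
```

In a real profile the emission and transition scores differ at every position and are estimated from the seed alignment, which is what lets the model tolerate variation where the family itself varies.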

Nucleic acid sequence databases:

The nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB. Genome, gene and
transcript sequence data provide the foundation for biomedical research.

Primary nucleotide sequence databases:

There are three main databases, created to make raw nucleic acid sequences available to researchers and the general public
alike: GenBank, EMBL and DDBJ.

GenBank:

It is physically located in the USA and is accessible through the NCBI portal over the internet. It is an open-access,
annotated collection of all publicly available nucleotide sequences and their protein translations.
The database is maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database
Collaboration. It has become an important database for research in biological fields.

EMBL: (European Molecular Biology Laboratory)

The EMBL nucleotide sequence database is a comprehensive collection of primary nucleotide sequences maintained at the European
Bioinformatics Institute. It receives data from genome sequencing centres, individual scientists and patent offices.

DDBJ: (DNA Data Bank of Japan)

This data bank is located at the National Institute of Genetics (NIG) in Shizuoka Prefecture, Japan, and it is the only nucleotide
sequence data bank in Asia. Although DDBJ receives most of its data from Japanese researchers, it also accepts data from
contributors in any other country.
