
Nucleotide Sequence Databases: GenBank, DDBJ, EMBL

A database is an organized collection of data, generally stored and accessed electronically from
a computer system.

BIOLOGICAL DATABASES:

Biological databases are libraries of biological information collected from:

• scientific experiments

• published literature

• high-throughput experiment technology

• computational analyses

The information includes genomics, proteomics, metabolomics, microarray gene expression,
phylogenetics, biological networks, etc.

• The design, development, and long-term management of biological databases is a core area of the
discipline of bioinformatics.

[[ Relibase: database for comprehensive analysis of protein-ligand interactions.

In molecular biology, REBASE is a database of information about restriction enzymes and DNA
methyltransferases. REBASE contains an extensive set of references, sites of recognition and
cleavage, sequences and structures. It also contains information on the commercial availability
of each enzyme.

GPCRdb contains data, diagrams and web tools for G protein-coupled receptors (GPCRs).
Users can browse all GPCR structures and the collections of receptor mutants.]]

Nucleotide Sequence Database

The International Nucleotide Sequence Database Collaboration (INSDC, http://insdc.org)
is a joint effort to collect, catalogue, and disseminate databases containing nucleotide
(DNA and RNA) sequences, and to provide open access to this published biological data. It involves
the following databases: the DNA Data Bank of Japan (DDBJ, Japan), GenBank (USA) and
the European Nucleotide Archive (ENA, UK). New and updated nucleotide sequence data contributed
by research teams to each of the three databases are synchronized on a daily basis through
continuous interaction between the staff at each of the collaborating organizations.

((Annotated: add notes to (a text or diagram) giving explanation or comment.))

This work began in the early 1980s when DNA sequence data began to accumulate in
the scientific literature. The EMBL Data Library (now the European Nucleotide
Archive) was developed to store DNA sequences published in the scientific literature.
The NCBI’s GenBank and NIG’s DDBJ followed.

• The databases EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases.
They include sequences submitted directly by scientists and genome sequencing groups, and sequences
taken from the literature and patents.

• There is comparatively little error checking, and there is a fair amount of redundancy
(repeated entries).

GenBank sequence records are owned by the original submitter and cannot be altered by a
third party. RefSeq sequences are not part of the INSDC but are derived from INSDC
sequences to provide non-redundant curated (organized professionally) data.
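
To make the idea of redundancy concrete, the following is a minimal sketch that groups records whose sequences are exactly identical. It is only an illustration: the function name, the record layout (accession, sequence) and the accessions are hypothetical, and this is not a description of how RefSeq is actually curated.

```python
from collections import defaultdict


def group_identical_sequences(records):
    """Return lists of accessions that share exactly the same sequence.

    `records` is an iterable of (accession, sequence) pairs; this layout
    is an assumption made for the example.
    """
    groups = defaultdict(list)
    for accession, sequence in records:
        # Normalise case and remove spaces so formatting differences
        # do not hide an identical sequence.
        key = sequence.upper().replace(" ", "")
        groups[key].append(accession)
    # Keep only sequences that occur under more than one accession.
    return [accessions for accessions in groups.values() if len(accessions) > 1]


# Invented example records (not real database entries).
EXAMPLE_RECORDS = [
    ("AA000001", "acgt acgt"),
    ("AA000002", "ACGTACGT"),
    ("AA000003", "ttgaccga"),
]
```

Calling `group_identical_sequences(EXAMPLE_RECORDS)` reports the first two invented accessions as one redundant group, because their sequences differ only in case and spacing.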

• The entries in the EMBL, GenBank and DDBJ databases are synchronized on a daily basis, and the
accession numbers are managed in a consistent manner between these three centers.

• The nucleotide databases have reached such large sizes that they are available in subdivisions
for quick searches and downloads.
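
Because accession numbers follow a shared convention across the three centers, they can be checked mechanically. The sketch below is a simplified assumption rather than the full INSDC rule set. It accepts only the common nucleotide layouts (1 letter + 5 digits, or 2 letters + 6 or 8 digits) with an optional ".version" suffix, and it ignores other formats such as whole-genome shotgun accessions.

```python
import re

# Simplified pattern (an assumption for illustration): one letter + 5 digits,
# or two letters + 6 or 8 digits, optionally followed by ".<version>".
ACCESSION_RE = re.compile(r"^([A-Z]\d{5}|[A-Z]{2}(?:\d{6}|\d{8}))(?:\.(\d+))?$")


def parse_accession(text):
    """Split 'ACCESSION.VERSION' into (accession, version) or return None."""
    match = ACCESSION_RE.match(text.strip().upper())
    if match is None:
        return None
    accession, version = match.groups()
    # The version is optional; report None when it is absent.
    return accession, (int(version) if version else None)
```

For example, `parse_accession("U49845.1")` yields the accession `U49845` with version 1, while a string that does not fit the simplified pattern yields `None`.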

NCBI (National Center for Biotechnology Information) is a part of the NIH (National Institutes of
Health in the United States). NCBI produces and maintains GenBank.

Submissions in GenBank:

Only original sequences can be submitted to GenBank. Direct submissions are made to GenBank using
BankIt, which is a Web-based form, or the stand-alone submission program, Sequin.

Upon receipt of a sequence submission, the GenBank staff examines the originality of the data,
assigns an accession number to the sequence, and performs quality assurance checks.

The submissions are then released to the public database, where the entries are retrievable by Entrez
(the NCBI search engine) or downloadable by FTP.
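
Retrieval through Entrez can also be scripted. The following is a minimal sketch that uses NCBI's E-utilities `efetch` endpoint to request a GenBank flat-file record by accession number. The helper names `build_efetch_url` and `fetch_record` are hypothetical, and the sketch assumes the `nuccore` database and plain-text output.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, rettype="gb"):
    """Build the request URL for one nucleotide record (no network access)."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # accession number, e.g. "U49845"
        "rettype": rettype,   # "gb" = GenBank flat file, "fasta" = FASTA
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_record(accession, rettype="gb"):
    """Download the record as text; requires a working internet connection."""
    with urlopen(build_efetch_url(accession, rettype), timeout=30) as response:
        return response.read().decode("utf-8")
```

`build_efetch_url("U49845")` only constructs the address, so it can be inspected offline; `fetch_record("U49845")` performs the actual download and depends on the service being reachable.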

• There are no legal restrictions on the use of the data in these databases. However, there are
patented sequences in the databases.

GENBANK

• GenBank offers a daily exchange of information with other major sequence databases, has a variety
of user interfaces, fairly detailed online help (with e-mail addresses for more information if what is
already available is not sufficient), and a speedy interface.

• Because of its popularity, however, GenBank can also be very slow during peak research hours.

• Established by the National Center for Biotechnology Information (NCBI), GenBank is a collection of
all publicly available nucleotide sequences submitted by scientists around the world.

• Searching GenBank is fairly straightforward and can be done with a variety of search tools.

• Submitting sequences to GenBank is also very easy and is required by most journals before articles
pertaining to the sequence are published (this provides easy access to the information for the journal's
readers).

EMBL:

• EMBL is good to use when you need a limited amount of data and when you are not trying to identify
a gene by sequence analysis.

• However, because EMBL and all of its mirror sites are located in Europe, connections from outside
Europe may often be slow.

• All of the information submitted to EMBL is mirrored daily in both GenBank and DDBJ, so searching
elsewhere might provide the same amount of information in less time.

• EMBL is the database of the European Molecular Biology Laboratory. It is a flat-file database that
can be searched with a variety of search engines.

• EMBL sequences are stored in a form corresponding to the biological state of the information in vivo.

• Thus, cDNA sequences are stored in the database as RNA sequences, even though they usually appear
in the literature as DNA.
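
As an illustration of what "flat-file database" means in practice, the sketch below reads one EMBL-style text record. It assumes the usual layout in which each line starts with a two-letter line code (ID, AC, DE, ...), `XX` lines are spacers, the sequence follows the `SQ` line, and `//` ends the record. The record itself is invented for the example and is not a real entry; the parser name is hypothetical.

```python
# Invented record in EMBL-like flat-file layout (not a real database entry).
SAMPLE_RECORD = """\
ID   ZZ999999; SV 1; linear; mRNA; STD; UNC; 12 BP.
XX
AC   ZZ999999;
XX
DE   Hypothetical example record
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     acgtacgtac gt                                                            12
//
"""


def parse_flat_file_record(text):
    """Return (fields, sequence) for a single flat-file record."""
    fields = {}
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):       # end-of-record marker
            break
        code = line[:2]
        if code == "SQ":                # sequence data starts on the next line
            in_sequence = True
        elif in_sequence:
            # Keep only the letters; spaces and the trailing count are dropped.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
        elif code.strip() and code != "XX":
            # Data conventionally begins at column 6.
            fields.setdefault(code, []).append(line[5:].strip())
    return fields, "".join(sequence_parts)
```

Running `parse_flat_file_record(SAMPLE_RECORD)` returns a dictionary keyed by line code together with the sequence as one continuous string. Real records contain many more line types (for example feature tables), which this sketch simply stores as text.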
DDBJ:

• Because DDBJ mirrors its information daily with GenBank and EMBL, beginning sequence searchers
might want to try a database with a friendlier searching interface.

• However, DDBJ also offers all of its pages in Japanese, so if you are more comfortable reading the
Japanese versions of the pages, it can be very useful.

• DDBJ, the DNA Data Bank of Japan, was established in 1986 to be one of the major international DNA
databases (with GenBank and EMBL).

• It is certified to collect information from researchers and assign accession numbers to submitted
entries.

• Searching DDBJ is somewhat awkward, as the only way to access most of the data is by accession
number via anonymous FTP.

-------------------------

• Biological databases are an important tool in assisting scientists to understand and explain biological
phenomena from in-depth knowledge of the structure of biomolecules and their interactions.

• This knowledge helps facilitate the fight against diseases, assists in the development of medications
and in discovering basic relationships amongst species in the history of life.
