GenBank:
DNA and proteins are complicated three-dimensional molecules composed of thousands or even
millions of atoms bonded together. Both are polymers: chains of repeating chemical units
held together by a common backbone. In DNA, four nucleotide monomers (A, T, C and G) are
used to build the polymer chain. The four nucleotides can occur in any order, and the order
in which they occur determines what the DNA does.
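As a minimal sketch of this idea, a DNA sequence can be modelled in Python as a string over
the four-letter alphabet. The helper name is_dna and the two example sequences are
illustrative assumptions, not taken from any library or database.

```python
# The four DNA bases named in the text; the constant name is illustrative.
DNA_ALPHABET = set("ATCG")


def is_dna(sequence: str) -> bool:
    """Return True if the non-empty sequence uses only A, T, C and G."""
    return len(sequence) > 0 and set(sequence.upper()) <= DNA_ALPHABET


# Two hypothetical sequences with the same base composition but a different order.
seq_a = "ATGCCGTA"
seq_b = "ATGCGCTA"

print(is_dna(seq_a), is_dna(seq_b))    # both are valid DNA strings
print(sorted(seq_a) == sorted(seq_b))  # same bases overall
print(seq_a == seq_b)                  # but not the same sequence
```

The two strings contain exactly the same bases, yet they are different sequences, which is
the sense in which order carries the information.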
The first major bioinformatics project was undertaken in 1965 by Margaret Dayhoff, who
developed the first protein sequence database, the Atlas of Protein Sequence and
Structure.
Subsequently, in the early 1970s, the Brookhaven National Laboratory established the
Protein Data Bank for archiving three-dimensional protein structures.
The first sequence alignment algorithm was developed by Needleman and Wunsch in
1970. This was a fundamental step in the development of the field of bioinformatics,
which paved the way for the routine sequence comparisons and database searching
practiced by modern biologists.
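To make the idea of global sequence alignment concrete, the following is a minimal sketch of
the dynamic-programming recurrence usually presented under the Needleman-Wunsch name. The
scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen
for illustration; they are not the parameters of the 1970 paper, and the sketch returns only
the optimal score, not the alignment itself.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # identical sequences
print(needleman_wunsch_score("ACGT", "AGT"))         # one residue missing
```

Filling the table takes time proportional to the product of the two sequence lengths, which
is why faster heuristic methods were later developed for searching whole databases.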
The 1980s saw the establishment of GenBank and the development of fast database
searching algorithms such as FASTA by William Pearson and BLAST by Stephen
Altschul and coworkers.
GenBank is the DNA database from the National Center for Biotechnology Information (NCBI).
NCBI is a division of the National Library of Medicine, located at the National Institutes of
Health (NIH) in Bethesda, Maryland. GenBank incorporates sequences from publicly available
sources, primarily direct author submissions and large-scale sequencing projects.
NCBI maintains sequences from every organism, every source and every type of DNA, from mRNA
to cDNA to clones to expressed sequence tags (ESTs) to high-throughput genome sequencing data
and information about sequence polymorphisms. There are approximately 126,551,501,141 bases
in 135,440,924 sequence records in the traditional GenBank divisions and 191,401,393,188 bases
in 62,715,288 sequence records in the WGS division as of April 2011.
The increasing size of the database and the diversity of data sources available have made it
convenient to split GenBank into smaller discrete divisions.
GENBANK SUBMISSION TYPE
GenBank accepts mRNA or genomic sequence data directly determined by the submitter.
The submission must include information about the source organism and annotation provided by
the submitter.
More details about adding annotation and sample files can be found in the GenBank Submissions
Handbook.
The following data are not accepted by GenBank:
Primer sequences.
Protein sequences with no underlying nucleotide submission.
Sequences containing a mix of genomic and mRNA sequence.
Sequences with length less than 200 nucleotides.
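Some of these rules can be checked mechanically before a sequence is sent. The sketch below
is a hypothetical pre-submission check, not a GenBank tool: the function name, the constant
names and the set of allowed characters (A, C, G, T plus N for an unknown base) are
assumptions made for illustration. It covers only the length rule and a simple test for
non-nucleotide characters.

```python
from typing import List

MIN_LENGTH = 200              # minimum length stated in the list above
ALLOWED_BASES = set("ACGTN")  # assumption: plain bases plus N for unknown


def presubmission_problems(sequence: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(sequence) < MIN_LENGTH:
        problems.append(
            f"sequence has {len(sequence)} nucleotides; minimum is {MIN_LENGTH}"
        )
    unexpected = sorted(set(sequence.upper()) - ALLOWED_BASES)
    if unexpected:
        # Characters outside the nucleotide alphabet may indicate protein data.
        problems.append("unexpected characters: " + "".join(unexpected))
    return problems


print(presubmission_problems("ACGT" * 60))  # 240 nt, nothing reported
print(presubmission_problems("ACGT" * 10))  # 40 nt, too short
```

A check like this cannot detect primer sequences or mixed genomic and mRNA sequence; those
still need to be judged by the submitter.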
HOW TO SUBMIT DATA TO GENBANK
The most important source of new data for GenBank is direct submissions from scientists.
GenBank depends on its contributors to help keep the database as comprehensive, current, and
accurate as possible.
NCBI provides timely and accurate processing and biological review of new entries and updates
to existing entries and is ready to assist authors who have new data to submit.
Data exchange between DDBJ, EMBL and GenBank occurs daily so it is only necessary to
submit the sequence to one database, whichever one is most convenient, without regard for
where the sequence may be published.
Updating or Revising a GenBank Sequence:
Revisions or updates to GenBank entries can be made by the submitters at any time. Information
about the correct format for different types of updates can be found on the Update guidelines
page.
Confidentiality:
Some authors are concerned that the appearance of their data in GenBank prior to publication
will compromise their work. GenBank will, upon request, withhold release of new submissions
for a specified period. However, if a paper citing the sequence or accession number is published
prior to the specified date, your sequence will be released upon publication.
Privacy:
If you are submitting human sequences to GenBank, do not include any data that could reveal the
personal identity of the source. GenBank assumes that you have obtained any informed consent
authorizations that your organization requires before submitting your sequences.