Biological Database
1982: EMBL initiated a DNA sequence database, followed within a year by GenBank (NCBI) and in 1984 by the DNA Data Bank of Japan (DDBJ).
1988: EMBL, GenBank and DDBJ agreed on a common format for data elements.
Autonomous: many independent maintainers.
Heterogeneous data formats: e.g., various data formats for the same data entities; various types of biological data (genomic, microarray, proteomic, ...).
Dynamic: frequent and continuous changes in data content.
Broad domain knowledge.
Workflow-oriented databases.
Rich set of analysis tools.
Information integration is essential: data aggregation from several databases.
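The points on heterogeneous formats and data aggregation can be illustrated with a minimal sketch. It assumes two hypothetical sources that describe the same sequences, one in FASTA-style text and one as tab-separated lines; the names `SequenceRecord`, `parse_fasta`, `parse_tabular` and `aggregate` are invented for this example and do not belong to any real database API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SequenceRecord:
    """Common representation shared by all (hypothetical) sources."""
    identifier: str
    sequence: str
    source: str


def parse_fasta(text: str, source: str) -> List[SequenceRecord]:
    """Parse FASTA-style text: a '>' header line followed by sequence lines."""
    records: List[SequenceRecord] = []
    identifier = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if identifier is not None:
                records.append(SequenceRecord(identifier, "".join(chunks).upper(), source))
            header = line[1:].split()
            identifier = header[0] if header else ""
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        records.append(SequenceRecord(identifier, "".join(chunks).upper(), source))
    return records


def parse_tabular(text: str, source: str) -> List[SequenceRecord]:
    """Parse 'identifier<TAB>sequence' lines (an assumed second format)."""
    records: List[SequenceRecord] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        identifier, sequence = line.split("\t")[:2]
        records.append(SequenceRecord(identifier.strip(), sequence.strip().upper(), source))
    return records


def aggregate(*record_lists: List[SequenceRecord]) -> Dict[str, List[SequenceRecord]]:
    """Group records from several sources by their shared identifier."""
    merged: Dict[str, List[SequenceRecord]] = {}
    for records in record_lists:
        for record in records:
            merged.setdefault(record.identifier, []).append(record)
    return merged
```

Calling `aggregate(parse_fasta(a, "db_a"), parse_tabular(b, "db_b"))` returns a dictionary keyed by identifier, so entries for the same entity from different sources end up side by side. Real integration also has to reconcile identifiers that differ between databases, which this sketch does not attempt.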
1078 databases were reported in "The Molecular Biology Database Collection: 2008 update" by Michael Y. Galperin, Nucleic Acids Research, 2008, Vol. 36, Database issue, D2-D4.
Metabase: a database of biological databases.
Update (adding new data) frequency: daily to annually.
Free accessibility (almost all).
Types of databases
Primary databases
Original submissions by experimentalists.
Content controlled by the submitter.
Examples: GenBank, GEO.
Derivative databases
Built from primary data.
Content controlled by a third party (NCBI).
Examples: RefSeq, RefSNP, UniGene, NCBI Protein, Conserved Domain, Gene.
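One practical way to tell the two kinds of record apart is the accession format. The sketch below relies on simplified patterns: RefSeq accessions start with two letters and an underscore (for example `NM_` or `NP_`), while classic GenBank nucleotide accessions are one letter plus five digits or two letters plus six or eight digits. The function name `classify_accession` is hypothetical, and the patterns deliberately ignore protein, WGS and other accession families.

```python
import re

# Simplified patterns; real accession rules cover more families than these.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")
GENBANK_NUCLEOTIDE_PATTERN = re.compile(
    r"^([A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(\.\d+)?$"
)


def classify_accession(accession: str) -> str:
    """Return a coarse label for an accession string (illustrative only)."""
    accession = accession.strip().upper()
    if REFSEQ_PATTERN.match(accession):
        return "derivative (RefSeq-style)"
    if GENBANK_NUCLEOTIDE_PATTERN.match(accession):
        return "primary (GenBank-style nucleotide)"
    return "unknown"
```

For instance, `classify_accession("NM_000546.6")` is labelled RefSeq-style and `classify_accession("U49845")` GenBank-style; anything that matches neither pattern is reported as unknown rather than guessed.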
Genomic database
Sequence Databases
The sequence databases are the oldest type of biological databases, and also the most widely used.
International Nucleotide Sequence Database Collaboration: includes EMBL, DDBJ and NCBI (GenBank).
Coding and non-coding DNA.
Gene structure: introns, exons and splice sites.
Transcriptional regulator sites and transcription factors.
RNA sequence databases.
General sequence databases, e.g. Swiss-Prot, UniProt, RefSeq.
Protein properties, e.g. BindingDB, PPT-DB.
Protein localization and targeting e.g.
Structure database
There are various types of structure databases:
Small molecules, e.g. AANT: Amino Acid - Nucleotide interaction database.
Carbohydrates, e.g. Glycoconjugate Data Bank: Structures, an annotated glycan structure database and N-glycan primary structure verification service.
Nucleic acids, e.g. MeRNA (Metals in RNA), a comprehensive compilation of all metal binding sites identified in RNA three-dimensional structures available from the Protein Data Bank (PDB) and the Nucleic Acid Database.
Protein structure, e.g. the Protein Data Bank (PDB), the single worldwide archive of structural data of biological macromolecules.
PDB
The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures.
The three-dimensional structures in the PDB are primarily derived from experimental data obtained by X-ray crystallography and NMR.
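To make the idea of archived three-dimensional structures concrete, the sketch below reads atomic coordinates from the legacy fixed-column PDB text format, in which each `ATOM` or `HETATM` line stores the atom serial number, atom name, residue name, chain, residue number and x, y, z coordinates at fixed character positions. The file format is not described in these slides, so treat it as an assumption of the example; `Atom` and `parse_atom_records` are hypothetical names.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_records(pdb_text: str) -> List[Atom]:
    """Extract ATOM/HETATM coordinate records from legacy PDB-format text."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        # Columns 1-6 hold the record type; other record types are skipped.
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue
        # Coordinates end at column 54; shorter lines are malformed.
        if len(line) < 54:
            continue
        atoms.append(
            Atom(
                serial=int(line[6:11]),       # columns 7-11
                name=line[12:16].strip(),     # columns 13-16
                res_name=line[17:20].strip(), # columns 18-20
                chain_id=line[21:22].strip(), # column 22
                res_seq=int(line[22:26]),     # columns 23-26
                x=float(line[30:38]),         # columns 31-38
                y=float(line[38:46]),         # columns 39-46
                z=float(line[46:54]),         # columns 47-54
            )
        )
    return atoms
```

Passing the text of a PDB-format file to `parse_atom_records` yields one `Atom` per coordinate line. Production code would more likely use an established structure-parsing library and the newer mmCIF format; the slicing here only shows what a coordinate record contains.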
SCOP
The SCOP database groups protein structures according to their evolutionary relationships. The evolutionary relationships of all known protein structures have been determined by manual inspection and automated methods.
The goal of SCOP is to provide detailed information about the close relatives of a protein and to serve as an evolution-based protein classification resource.
GENOMIC DATABASE
General genomic database
e.g. Entrez Gene: NCBI's database for gene-specific information. It does not include all known or predicted genes; instead, Entrez Gene focuses on the genomes that have been completely sequenced, ...
Microarray data and gene expression database.
Plant database.
Immunological database.
Human gene and disease.
Literature database.
EST databases.
ESTs
An EST (expressed sequence tag) is a partial, single-pass DNA sequence of a cDNA clone.
ESTs provide the most comprehensive evidence for the existence of genes and their structure.
They provide an inventory of likely genes and their variants, along with information regarding the functional roles played by these genes and their products.
UniGene is a database at NCBI that contains clusters (UniGene clusters) of sequences that represent unique genes. These clusters are made automatically by partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
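The partitioning idea can be sketched with a toy example. This is not UniGene's actual procedure: it simply places two sequences in the same cluster when they share an exact substring of length `k`, and merges clusters transitively with a union-find structure. The function name `cluster_by_shared_kmer` and the default `k` are assumptions made for illustration, and reverse-complement matches and sequencing errors are ignored.

```python
from typing import Dict, List


def cluster_by_shared_kmer(sequences: Dict[str, str], k: int = 8) -> List[List[str]]:
    """Partition named sequences into clusters linked by any shared k-mer."""
    parent = {name: name for name in sequences}

    def find(name: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first sequence in which each k-mer was seen.
    first_owner: Dict[str, str] = {}
    for name, sequence in sequences.items():
        sequence = sequence.upper()
        for start in range(len(sequence) - k + 1):
            kmer = sequence[start:start + k]
            if kmer in first_owner:
                union(first_owner[kmer], name)
            else:
                first_owner[kmer] = name

    clusters: Dict[str, List[str]] = {}
    for name in sequences:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

Each sequence ends up in exactly one cluster, which is the non-redundant partition described above. Because linkage is transitive, two sequences with no direct overlap can still be grouped through a third that overlaps both.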
Other EST cluster databases include the TIGR Gene Indices and Sputnik (annotation of clustered plant ESTs).
Entrez Browser (at NCBI).
ExPASy (home of Swiss-Prot).
Ensembl (open-source based system).