Biological Database

An ever-expanding reservoir of information.

Presented By: Mahesh Yadav
What is a biological database?

Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression and phylogenetics.
A brief history of biological databases
1965 M. O. Dayhoff et al. publish the Atlas of Protein Sequence and Structure
1982 EMBL initiates a DNA sequence database, followed within a year by GenBank (NCBI) and in 1984 by the DNA Database of Japan (DDBJ)
1988 EMBL, GenBank and DDBJ agree on a common format for data elements
Biological Databases: specific features
Autonomous: many independent maintainers.
Heterogeneous data formats: e.g., various data formats for the same data entities.
Various types of biological data: genomic, microarray, proteomic, ...
Dynamic: frequent and continuous changes in data content.
Broad domain knowledge.
Workflow-oriented databases.
Rich set of analysis tools.
Information integration is essential: data aggregation from several databases.
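The point about heterogeneous formats and information integration can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two sources, their field names, the accession "X0001" and the helper names `normalise` and `integrate` are hypothetical and do not correspond to any real database schema.

```python
# Two hypothetical sources describing the same entry with different field names.
source_a = [{"acc": "X0001", "organism": "Homo sapiens", "seq": "ATGGCC"}]
source_b = [{"accession": "X0001", "species": "Homo sapiens", "expression": 7.2}]

# Per-source mapping from local field names to one shared vocabulary (assumed).
FIELD_MAPS = {
    "a": {"acc": "accession", "organism": "organism", "seq": "sequence"},
    "b": {"accession": "accession", "species": "organism", "expression": "expression"},
}


def normalise(record, field_map):
    """Rename the fields of one record to the shared vocabulary."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


def integrate(sources):
    """Merge records from all sources, keyed by accession."""
    merged = {}
    for name, records in sources.items():
        for record in records:
            norm = normalise(record, FIELD_MAPS[name])
            merged.setdefault(norm["accession"], {}).update(norm)
    return merged


combined = integrate({"a": source_a, "b": source_b})
print(combined["X0001"])
```

In practice the hard part is agreeing on the shared identifiers and vocabulary; the merge itself is simple once records are normalised.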
Biological Databases: some statistics
More than 1000 different databases
1078 databases reported in "The Molecular Biology Database Collection: 2008 update" by Michael Y. Galperin, Nucleic Acids Research, 2008, Vol. 36, Database issue, D2-D4
Metabase: database of biological databases
Update (adding new data) frequency: daily to annually
Free accessibility (almost all)
Types of databases
Primary databases
Original submissions by experimentalists
Content controlled by the submitter
Examples: GenBank, GEO
Derivative databases
Built from primary data
Content controlled by a third party (NCBI)
Examples: RefSeq, RefSNP, UniGene, NCBI Protein, Conserved Domain, Gene
Biological databases can be grouped into:
Sequence databases
Structure databases
Genomic databases
Sequence Databases
The sequence databases are the oldest type of biological databases, and also the most widely used.
Sequence databases fall into two groups: nucleotide sequence databases and protein sequence databases.
Nucleotide sequence databases
International Nucleotide Sequence Database Collaboration: includes EMBL, DDBJ and NCBI (GenBank)
Coding and non-coding DNA
Gene structure, introns and exons, splice sites
Transcriptional regulator sites and transcription factors
RNA sequence databases
Protein sequence databases
General sequence databases, e.g. Swiss-Prot, UniProt, RefSeq
Protein properties, e.g. BindingDB, PPT-DB
Protein localization and targeting, e.g. DBSubLoc (Database of protein Subcellular Localization)
Protein sequence motifs and active sites, e.g. PROSITE
Databases of individual protein families, e.g. PlantTFDB
Structure database
There are various types of structure databases:
Small molecules, e.g. AANT: Amino Acid - Nucleotide interaction database.
Carbohydrates, e.g. Glycoconjugate Data Bank: Structures, an annotated glycan structure database and N-glycan primary structure verification service.
Nucleic acids, e.g. MeRNA (Metals in RNA), a comprehensive compilation of all metal binding sites identified in RNA three-dimensional structures available from the Protein Data Bank (PDB) and the Nucleic Acid Database.
Protein structures, e.g. the Protein Data Bank (PDB), the single worldwide archive of structural data of biological macromolecules.
PDB
The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures.
The three-dimensional structures in the PDB are primarily derived from experimental data obtained by X-ray crystallography and NMR.
SCOP
The SCOP database groups different protein structures according to their evolutionary relationships. The evolutionary relationships of all known protein structures have been determined by manual inspection and automated methods.
The goal of SCOP is to provide detailed information about close relatives of proteins and to provide an evolution-based protein classification resource.
Genomic database
General genomic database
e.g. Entrez Gene: NCBI's database for gene-specific information. It does not include all known or predicted genes; instead, Entrez Gene focuses on the genomes that have been completely sequenced.
Taxonomy and identification:

e.g. NCBI Taxonomy Database: it includes names and classifications for all of the organisms that are represented in the protein and sequence databases.
Prokaryotic genome database:

e.g. GeneDB
Viral genome database:

e.g. BioHealthBase
Fungal genome database:

e.g. Yeast Resource Center
Genome annotation terms and nomenclature:

e.g. BioThesaurus: a web-based system that maps a comprehensive collection of protein and gene names to protein entries in the UniProt Knowledgebase (UniProtKB).
Invertebrate genome database:

e.g. Drosophila microarray centre
Unicellular eukaryote genome database:

e.g. 1. TGD: Tetrahymena Genome Database
2. Full-Malaria: a database of full-length-enriched cDNA libraries of apicomplexan parasites, including the malaria parasites Plasmodium falciparum and P. yoelii, and Toxoplasma gondii
Some other databases
Microarray data and gene expression databases
Plant databases
Immunological databases
Human gene and disease databases
Literature databases
EST databases
ESTs
An EST (expressed sequence tag) is a partial, single-pass DNA sequence of a cDNA clone. ESTs provide the most comprehensive evidence for the existence of genes and their structure. They also provide an inventory of likely genes and their variants, along with information regarding the functional roles played by these genes and their products.
e.g. dbEST, HUNT: Annotated human full-length cDNA sequences
EST cluster databases
UniGene is a database at NCBI that contains clusters (UniGene clusters) of sequences that represent unique genes. These clusters are made automatically by partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
Other EST cluster databases include the TIGR Gene Indices and Sputnik (annotation of clustered plant ESTs).
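To make the idea of partitioning sequences into gene-oriented clusters concrete, here is a toy Python sketch. It is not the procedure used by UniGene or any other resource: it simply assumes that two sequences belong together if they share at least one exact k-mer, and it joins such sequences transitively with a union-find structure. The sequences, the names `est1` to `est4`, the value of `k` and the function names are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_sequences(seqs, k=8):
    """Group sequence names that are linked, directly or transitively, by a shared k-mer."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # k-mer -> first sequence in which it was seen
    for name, seq in seqs.items():
        for kmer in kmers(seq, k):
            if kmer in first_owner:
                parent[find(name)] = find(first_owner[kmer])
            else:
                first_owner[kmer] = name

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy reads: est1 overlaps est2, est2 overlaps est3, est4 is unrelated.
ests = {
    "est1": "ACGTACGTTAGCCGATAG",
    "est2": "TAGCCGATAGGGCTTAAC",
    "est3": "GGCTTAACCCTGAAGTCA",
    "est4": "TTTTGGGGCCCCAAAATT",
}
clusters = cluster_sequences(ests)
print(clusters)
```

Real clustering pipelines also have to deal with sequencing errors, reverse-complement reads, repeats and chimeric clones, none of which this sketch attempts.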
Some examples of integrated biological database resources are:

Entrez Browser (at NCBI)
ExPASy (home of Swiss-Prot)
Ensembl (open-source-based system)
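Integrated resources such as Entrez can also be queried by programs. As a minimal sketch, the Python snippet below builds a request URL for the `efetch` endpoint of NCBI's E-utilities, the programmatic interface to Entrez. The helper name `efetch_url` and the example accession are chosen only for illustration, and the snippet deliberately stops at building the URL rather than sending a request.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db, record_id, rettype="fasta", retmode="text"):
    """Build an efetch URL for one record (hypothetical helper, no network access)."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Example accession used purely for illustration.
url = efetch_url("nucleotide", "NM_000546")
print(url)
```

Opening the resulting URL, for example with `urllib.request.urlopen`, should return the record as FASTA text; NCBI's usage guidelines on request rates apply to any automated access.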
References

Lukas K. Buehler, Hooman H. Rashidi: Bioinformatics Basics
Marketa Zvelebil, Jeremy O. Baum: Understanding Bioinformatics
Yi-Ping Phoebe Chen: Bioinformatics Technologies
Maureen J. Donlin: Introduction to Genomics and Bioinformatics
Erik Bongcam-Rudloff: Biological databases, an introduction
Russ B. Altman (Stanford University): Building successful biological databases
Google, http://www.google.com
Nucleic Acids Research Database & Web Server issues, http://nar.oupjournals.org
