
Biological database

An ever-expanding reservoir of information.


Presented By: Mahesh Yadav

What is a biological database?


Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experimental technology, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.

A brief history of biological databases

1965 M. O. Dayhoff et al. publish the Atlas of Protein Sequence and Structure

1982 EMBL initiates a DNA sequence database, followed within a year by GenBank of NCBI and in 1984 by the DNA Data Bank of Japan (DDBJ)

1988 EMBL, GenBank, and DDBJ agree on a common format for data elements

Biological Databases: specific features

Autonomous: many independent maintainers.
Heterogeneous: various data formats for the same data entities, and various types of biological data (genomic, microarray, proteomic, ...).
Dynamic: frequent and continuous changes in data content.
Broad domain knowledge.
Workflow-oriented databases.
Rich set of analysis tools.
Information integration is essential: data must be aggregated from several databases.
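The point about information integration can be illustrated with a minimal sketch. It assumes two hypothetical sources that describe the same entries under different field names; the accessions, field names, and values are made up for illustration and do not come from any real database.

```python
from collections import defaultdict

# Hypothetical records from two independent sources. The key field is named
# differently in each source to mimic heterogeneous data formats.
source_a = [
    {"acc": "X00001", "organism": "Homo sapiens", "length": 1200},
    {"acc": "X00002", "organism": "Mus musculus", "length": 950},
]
source_b = [
    {"accession": "X00001", "function": "kinase"},
    {"accession": "X00003", "function": "transporter"},
]


def integrate(sources):
    """Merge records that share an accession and record where each came from.

    `sources` is a list of (source_name, key_field, records) tuples.
    """
    merged = defaultdict(lambda: {"sources": []})
    for source_name, key_field, records in sources:
        for record in records:
            accession = record[key_field]
            # Copy every field except the source-specific key field.
            fields = {k: v for k, v in record.items() if k != key_field}
            merged[accession].update(fields)
            merged[accession]["sources"].append(source_name)
    return dict(merged)


integrated = integrate([
    ("A", "acc", source_a),
    ("B", "accession", source_b),
])

for accession in sorted(integrated):
    print(accession, integrated[accession])
```

Entries found in only one source are kept with the fields that source provides, so the merged view never loses data. A real integration layer would also have to reconcile conflicting values and differing identifier schemes.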

Biological Databases: some statistics

More than 1000 different databases

1078 databases reported in "The Molecular Biology Database Collection: 2008 update" by Michael Y. Galperin, Nucleic Acids Research, 2008, Vol. 36, Database issue, D2-D4
Metabase: a database of biological databases

Update (adding new data) frequency: daily to annually
Free accessibility (almost all)

Types of databases
Primary databases
Original submissions by experimentalists
Content controlled by the submitter
Examples: GenBank, GEO

Derivative databases
Built from primary data
Content controlled by a third party (NCBI)
Examples: RefSeq, RefSNP, UniGene, NCBI Protein, Conserved Domain, Gene

Biological databases fall into three groups: sequence databases, structure databases, and genomic databases.

Sequence Databases

The sequence databases are the oldest type of biological databases, and also the most widely used.
Sequence databases are divided into nucleotide sequence databases and protein sequence databases.

Nucleotide sequence databases
International Nucleotide Sequence Database Collaboration: includes EMBL, DDBJ, and NCBI (GenBank)
Coding and non-coding DNA
Gene structure, introns and exons, splice sites
Transcriptional regulator sites and transcription factors
RNA sequence databases

Protein sequence databases
General sequence databases, e.g. Swiss-Prot, UniProt, RefSeq
Protein properties, e.g. BindingDB, PPT-DB
Protein localization and targeting, e.g. DBSubLoc (Database of protein Subcellular Localization)
Protein sequence motifs and active sites, e.g. PROSITE
Databases of individual protein families, e.g. PlantTFDB

Structure database
There are various types of structure databases:
Small molecules, e.g. AANT, the Amino Acid - Nucleotide interaction database.
Carbohydrates, e.g. Glycoconjugate Data Bank: Structures, an annotated glycan structure database and N-glycan primary structure verification service.
Nucleic acids, e.g. MeRNA (Metals in RNA), a comprehensive compilation of all metal binding sites identified in RNA three-dimensional structures available from the Protein Data Bank (PDB) and the Nucleic Acid Database.
Protein structures, e.g. the Protein Data Bank (PDB), the single worldwide archive of structural data of biological macromolecules.

PDB

The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures.
The three-dimensional structures in PDB are primarily derived from experimental data obtained by X-ray crystallography and NMR.
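In the legacy PDB flat-file format, the experimental technique is recorded in the EXPDTA record, so the method behind an entry can be read directly from its header. The sketch below assumes that fixed-column layout (technique text starting at column 11). The two header fragments are made-up minimal examples, not real PDB entries, and the function name is hypothetical.

```python
def experimental_methods(pdb_text):
    """Return the techniques listed in the EXPDTA records of a PDB-format text."""
    methods = []
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            # Columns 11-79 hold the technique(s); several are separated by ';'.
            for method in line[10:79].split(";"):
                method = method.strip()
                if method:
                    methods.append(method)
    return methods


# Made-up header fragments for illustration only.
xray_entry = "HEADER    EXAMPLE ENTRY\nEXPDTA    X-RAY DIFFRACTION\n"
nmr_entry = "HEADER    EXAMPLE ENTRY\nEXPDTA    SOLUTION NMR\n"

print(experimental_methods(xray_entry))
print(experimental_methods(nmr_entry))
```

For real work, an established parser such as the one in Biopython is a safer choice than hand-written column slicing.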

SCOP

The SCOP database groups protein structures according to their evolutionary relationships. The evolutionary relationships of all known protein structures have been determined by manual inspection and automated methods.
The goal of SCOP is to provide detailed information about the close relatives of a protein and to serve as an evolution-based protein classification resource.

GENOMIC DATABASE
General genomic database
e.g. Entrez Gene: NCBI's database for gene-specific information. It does not include all known or predicted genes; instead, Entrez Gene focuses on genomes that have been completely sequenced.

Taxonomy and identification:


e.g. NCBI Taxonomy Database: it includes names and classifications for all of the organisms that are represented in the protein and sequence databases.

Prokaryotic genome database:


e.g. GeneDB

Viral genome database:


e.g. BioHealthBase

Fungal genome database:


e.g. Yeast Resource Center

Genome annotation terms and nomenclature:


e.g. BioThesaurus: a web-based system that maps a comprehensive collection of protein and gene names to protein entries in the UniProt Knowledgebase (UniProtKB).

Invertebrate genome database:


e.g. Drosophila microarray centre

Unicellular eukaryote genome database:


e.g. 1. TGD (Tetrahymena Genome Database)
2. Full-Malaria: a database of full-length-enriched cDNA libraries of apicomplexan parasites, including the malaria parasites Plasmodium falciparum and P. yoelii and the related parasite Toxoplasma gondii

Some other databases

Microarray data and gene expression databases
Plant databases
Immunological databases
Human gene and disease databases
Literature databases
EST databases

ESTs

An EST (expressed sequence tag) is a partial, single-pass DNA sequence of a cDNA clone.
ESTs provide some of the most comprehensive evidence for the existence of genes and their structure.
They provide an inventory of likely genes and their variants, along with information on the functional roles played by these genes and their products.

e.g. dbEST; HUNT (annotated human full-length cDNA sequences)

EST cluster databases

UniGene is a database at NCBI that contains clusters (UniGene clusters) of sequences that represent unique genes. These clusters are built automatically by partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
Other EST cluster databases include the TIGR Gene Indices and Sputnik (annotation of clustered plant ESTs).
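The idea of partitioning sequences into non-overlapping clusters can be sketched with a union-find structure. This is only an illustration of the partitioning step, not the procedure UniGene itself uses. It assumes that pairwise overlaps between ESTs have already been detected by some other tool; the identifiers and overlap pairs are made up.

```python
def cluster(ids, links):
    """Partition ids into clusters, given pairs of ids known to belong together."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical EST identifiers and precomputed overlaps.
est_ids = ["EST1", "EST2", "EST3", "EST4", "EST5"]
overlaps = [("EST1", "EST2"), ("EST2", "EST3"), ("EST4", "EST5")]

clusters = cluster(est_ids, overlaps)
print(clusters)
```

Each sequence ends up in exactly one cluster, which is what makes the resulting set non-redundant: EST1 and EST3 are grouped together through their shared overlap with EST2 even though they were never compared directly.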

Some examples of integrated biological database resources are:


Entrez Browser (at NCBI)
ExPASy (home of Swiss-Prot)
Ensembl (open-source system)

References

Lukas K. Buehler, Hooman H. Rashidi: Bioinformatics Basics
Marketa Zvelebil, Jeremy O. Baum: Understanding Bioinformatics
Yi-Ping Phoebe Chen: Bioinformatics Technologies
Maureen J. Donlin: Introduction to Genomics and Bioinformatics
Dr. Erik Bongcam-Rudloff: Biological databases, an introduction
Russ B. Altman, Stanford University: Building successful biological databases
Nucleic Acids Research, Database and Web Server issues: http://nar.oupjournals.org
Google: http://www.google.com
