Overview
• Introduction
• What is a database
• What types of databases can we access
• What roles do they play
• What type of information can we get from them
• How do we access this information
What is a database?
• A convenient way to store, organise and retrieve
vast amounts of information
Why databases?
• Means to handle and share large volumes of
biological data
• Support large-scale analysis efforts
• Make data access easy and updated
• Link knowledge obtained from various
fields of biology and medicine
Different Database Types
• Depend on the nature of the information stored
(sequences, 2D gel or 3D structure images)
Features
• Most databases have a web interface for
searching data
Biological Databases
Type of database                                   Information contained
Bibliographic databases                            Literature
Taxonomic databases                                Classification
Nucleic acid databases                             DNA information
Genomic databases                                  Gene-level information
Protein databases                                  Protein information
Protein families, domains and functional sites     Classification of proteins and identification of domains
Enzymes / metabolic pathways                       Metabolic pathways
• Primary databases
• Secondary databases
• Composite databases
Primary databases
• Contain sequence data, such as nucleic acid
or protein sequences
• Examples of primary databases include:
Nucleic acid databases      Protein databases
• EMBL                      • SWISS-PROT
• GenBank                   • TrEMBL
• DDBJ                      • PIR
Secondary databases
• Sometimes known as pattern databases
• Contain results from the analysis of the
sequences in the primary databases
• Examples of secondary databases include:
  • PROSITE
  • Pfam
  • BLOCKS
  • PRINTS
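PROSITE stores short sequence motifs in its own pattern notation. As a rough sketch of how such a pattern can be applied to a sequence, the Python below translates a simplified subset of that notation into a regular expression. The function names and the test sequence are hypothetical and chosen only for illustration. The pattern N-{P}-[ST]-{P} is the N-glycosylation site motif. Rarer notation features are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # '<' pins the match to the N-terminus, '>' to the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Optional repeat count, e.g. x(2) or x(2,4).
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:
            element = counted.group(1)
            repeat = "{" + counted.group(2) + "}"
        if element == "x":
            core = "."  # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except those listed
        else:
            core = element  # a single residue or a [..] set of allowed residues
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list:
    """Return 0-based start positions of all matches, including overlapping ones."""
    regex = prosite_to_regex(pattern)
    # The lookahead lets matches that overlap each other all be reported.
    return [m.start() for m in re.finditer("(?=(" + regex + "))", sequence)]


if __name__ == "__main__":
    # Hypothetical protein fragment, used only to exercise the functions.
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_sites("N-{P}-[ST]-{P}", "MNGTANPSNVSA"))
```

Secondary databases distribute their own search tools, so this is only meant to show what "pattern database" means in practice.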
Composite databases
• Combine data from several different primary
databases.
• Make querying and searching efficient, without
the need to visit each of the primary databases
separately.
• Examples of composite databases include:
  • NRDB (Non-Redundant DataBase)
  • OWL
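The idea behind a non-redundant composite database can be sketched in a few lines of Python. This is a minimal illustration under a strong assumption: two entries are treated as redundant only when their sequences are exactly identical. The actual redundancy criteria used by NRDB or OWL are not described here, and the record type, source names, identifiers and sequences below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str      # name of the primary database the entry came from
    identifier: str  # identifier of the entry within that database
    sequence: str


def build_non_redundant(*sources):
    """Merge several record lists, keeping one entry per distinct sequence.

    Returns a dict mapping each distinct sequence to the list of
    (source, identifier) pairs under which it was found.
    """
    merged = {}
    for records in sources:
        for rec in records:
            key = rec.sequence.upper()  # ignore case differences between sources
            merged.setdefault(key, []).append((rec.source, rec.identifier))
    return merged


if __name__ == "__main__":
    # Hypothetical entries from two hypothetical primary sources.
    source_a = [Record("dbA", "A001", "MKTAYIAK"), Record("dbA", "A002", "MSLLPE")]
    source_b = [Record("dbB", "B100", "mktayiak")]
    for sequence, origins in build_non_redundant(source_a, source_b).items():
        print(sequence, origins)
```

A single query against the merged collection then covers every source, while the list of origins keeps the link back to each primary entry.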
DDBJ (DNA Data Bank of Japan): http://www.ddbj.nig.ac.jp
The International Nucleotide Sequence Database Collaboration
GenBank
EMBL
DDBJ
Amount Of Data Grows Rapidly
The Internet and WWW
National Center for Biotechnology Information (NCBI)
http://www.ncbi.nlm.nih.gov/
Entrez
Entrez is a search and retrieval
system that integrates information
from databases at NCBI.
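Besides the web interface, NCBI also exposes Entrez through a programmatic interface, the E-utilities, which these slides do not cover. The sketch below only builds request URLs and does not contact the network. The helper names are hypothetical, and the search term and database are examples rather than recommendations.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that looks up `term` in the Entrez database `db`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "/esearch.fcgi?" + urlencode(params)


def efetch_url(db, identifier, rettype="gb"):
    """Build an EFetch URL that retrieves one record as plain text."""
    params = {"db": db, "id": identifier, "rettype": rettype, "retmode": "text"}
    return EUTILS_BASE + "/efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Example query only; any Entrez search term could be used here.
    print(esearch_url("protein", "BNIP"))
```

The returned URL can be opened in a browser or fetched with any HTTP client. NCBI publishes usage guidelines for automated access that should be checked before sending many requests.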
BNIP
A sequence record contains:
• A brief description of the sequence
• The contact information of the submitter
• Information about the genes, gene products and
regions of biological significance reported in
the sequence
• The length of the sequence
• The scientific name of the source organism
• The Taxon ID number and map location
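To make these fields concrete, here is a small Python sketch that pulls a few of them out of a GenBank-style flat-file record. The embedded record is a made-up, heavily abbreviated example and not a real database entry. The parser is deliberately naive: it ignores continuation lines and most of the format, so it should be read as an illustration of where the fields live rather than as a complete reader.

```python
import re

# Hypothetical, abbreviated record in GenBank flat-file layout (illustration only).
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example sequence, for illustration only.
ACCESSION   EXAMPLE01
SOURCE      synthetic construct
  ORGANISM  synthetic construct
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="synthetic construct"
                     /db_xref="taxon:32630"
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_genbank(text):
    """Extract a few summary fields from one flat-file record."""
    summary = {}
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            length = re.search(r"(\d+) bp", line)
            if length:
                summary["length"] = int(length.group(1))
        elif line.startswith("DEFINITION"):
            summary["definition"] = line.split(None, 1)[1].strip()
        elif line.startswith("ACCESSION"):
            summary["accession"] = line.split()[1]
        elif line.strip().startswith("ORGANISM"):
            summary["organism"] = line.split(None, 1)[1].strip()
        elif 'db_xref="taxon:' in line:
            summary["taxon_id"] = int(re.search(r"taxon:(\d+)", line).group(1))
    return summary


if __name__ == "__main__":
    for field, value in parse_genbank(SAMPLE_RECORD).items():
        print(field, "=", value)
```

For real work an established parser, such as the one in Biopython, is a safer choice than hand-written string handling.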
How to understand the output
Unique identifiers:
• Each entry in a database must have a unique
identifier
• EMBL: Identifier (ID)
• GenBank: Accession Number (AC)
Sample GenBank record:
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
ExPASy
SWISS-PROT
A curated protein sequence database that
strives to provide a high level of annotation
(such as the description of the function of a
protein, its domain structure, post-translational
modifications, variants, etc.), a minimal level of
redundancy and a high level of integration with
other databases
http://tw.expasy.org/sprot/
TrEMBL
• Computer-annotated supplement to
SWISS-PROT
ENZYME
Enzyme nomenclature
database
http://tw.expasy.org/enzyme/
ENZYME Database
• A repository of information relative to
the nomenclature of enzymes
Access to ENZYME
• by EC number
• by enzyme class
• by description (official name) or
alternative name(s)
• by chemical compound
• by cofactor
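Access by EC number or by enzyme class relies on the hierarchical structure of EC numbers: four dot-separated levels, the first of which is the enzyme class. The Python sketch below splits an EC number into those levels. The function name and returned keys are hypothetical. Partial numbers such as 3.4.-.- are accepted, while preliminary serial numbers with a letter prefix are not handled.

```python
# Top-level enzyme classes of the EC nomenclature.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec_number(ec):
    """Split an EC number such as 'EC 1.1.1.1' into its four levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError("an EC number has four dot-separated levels: " + ec)
    # A '-' marks a level that is left unspecified (e.g. a whole sub-subclass).
    levels = [None if field == "-" else int(field) for field in fields]
    if levels[0] not in EC_CLASSES:
        raise ValueError("unknown enzyme class in: " + ec)
    return {
        "class": levels[0],
        "class_name": EC_CLASSES[levels[0]],
        "subclass": levels[1],
        "sub_subclass": levels[2],
        "serial": levels[3],
    }


if __name__ == "__main__":
    # EC 1.1.1.1 is alcohol dehydrogenase.
    print(parse_ec_number("EC 1.1.1.1"))
    print(parse_ec_number("3.4.-.-"))
```

Grouping entries by the first one, two or three levels gives the "by enzyme class" view, while the full four-level number identifies a single entry.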
KEGG (Kyoto Encyclopedia of Genes and Genomes)
A structured database containing
information about metabolic
pathways in many organisms.
KEGG
• Part of the GenomeNet database
system
Elements of a KEGG pathway map:
• Link to other pathways
• Enzyme
• Compound
Summary
• Database standards, nomenclature, and naming
conventions are not clearly defined for many aspects
of biological information. This makes information
extraction more difficult.
The End