
BIOLOGICAL DATABASES

(Secondary Databases)

BY,
VAISHNAVI KOTI
1st Yr M.Tech, BIOTECHNOLOGY
WHAT IS A DATABASE?
A computerized storehouse of data that provides a standardized way of
locating, adding, and changing data.
TYPES OF DATABASES:
Object-oriented database: attempts to model the structure of a given
data set as closely as possible.
Relational database: organizes information into tables, where each
column represents a field of information and each row stores a single
record.
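As a minimal sketch of the relational idea, the example below uses Python's built-in sqlite3 module to show the three operations from the definition: locating, adding, and changing data. The table name, column names, and the DEMO accessions are hypothetical and chosen only for illustration.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table: each column is a field, each row is one record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        "accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
    )
    # Adding data: hypothetical records, not real database entries.
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        [
            ("DEMO0001", "Homo sapiens", 1200),
            ("DEMO0002", "Mus musculus", 950),
            ("DEMO0003", "Homo sapiens", 430),
        ],
    )
    conn.commit()
    return conn

def accessions_for(conn, organism):
    """Locating data: return the accessions recorded for one organism."""
    rows = conn.execute(
        "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return [row[0] for row in rows]

def update_length(conn, accession, new_length):
    """Changing data: overwrite the length field of a single record."""
    conn.execute(
        "UPDATE sequences SET length = ? WHERE accession = ?",
        (new_length, accession),
    )
    conn.commit()

def length_of(conn, accession):
    """Return the stored length for one accession."""
    row = conn.execute(
        "SELECT length FROM sequences WHERE accession = ?", (accession,)
    ).fetchone()
    return row[0]
```

Calling accessions_for(build_demo_db(), "Homo sapiens") returns the two records whose organism field matches the query.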

The Central Dogma & Biological Data
Original DNA Sequences (Genomes)
Expressed DNA Sequences (= mRNA sequences = cDNA sequences)
Expressed Sequence Tags (ESTs)
Protein Sequences
-Inferred
-Direct sequencing
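An inferred protein sequence is typically obtained by translating an expressed (cDNA or mRNA) sequence codon by codon. The sketch below assumes the standard genetic code and a reading frame that starts at the first base; the function name translate is illustrative and not taken from any particular library.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cdna):
    """Translate a coding sequence into protein, stopping at the first stop codon.

    Assumes the reading frame starts at position 0 and that the sequence
    contains only A, C, G, T (or U, which is treated as T).
    """
    seq = cdna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, translate("ATGGCCTAA") reads ATG (M), GCC (A), and then stops at TAA.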
Primary and Secondary Databases
Primary Database:
Databases consisting of experimentally derived data, such as
nucleotide sequences and three-dimensional structures, are known as
primary databases.
Examples: GenBank, Trace, SRA, SNP, GEO
Secondary Database:
Data derived from the analysis or treatment of primary data, such as
secondary structures, hydrophobicity plots, and domains, are stored in
secondary databases.
Examples: NCBI Protein, RefSeq, TPA, RefSNP, GEO datasets, UniGene,
HomoloGene, Structure, Conserved Domain
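A hydrophobicity plot is a simple example of derived data: it is computed from a primary protein sequence rather than measured directly. The sketch below assumes the Kyte-Doolittle hydropathy scale and a sliding-window average; the window size of 5 and the function name are illustrative choices, not something specified in these slides.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=5):
    """Return the mean hydropathy of each sliding window along the sequence.

    The result has len(protein) - window + 1 values; plotting them against
    window position gives a hydrophobicity plot.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Positive values indicate hydrophobic stretches and negative values indicate hydrophilic ones.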
PRIMARY VS. SECONDARY SEQUENCE DATABASES
[Figure: diagram of primary and secondary sequence databases. Labels
recoverable from the figure: Labs, Sequencing Centers, GenBank,
Curators, RefSeq, Genome Assembly, Algorithms, UniGene, "Updated
continually by NCBI".]
TYPES OF SECONDARY DATABASES
PROSITE:
PROSITE is a protein database.
It consists of entries describing protein families, domains, and
functional sites, as well as the amino acid patterns and profiles
found in them.
These entries are manually curated by a team at the Swiss Institute of
Bioinformatics and are tightly integrated into Swiss-Prot protein
annotation.
As of 2012, it had 1,650 documentation entries, 1,308 patterns, and
1,039 profiles.
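PROSITE patterns use a compact syntax that maps closely onto regular expressions. The sketch below is a simplified converter that handles the common elements only (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, and < or > as terminal anchors outside brackets); the function names are illustrative, and it is not the scanning method used by PROSITE itself. The example pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern into a Python regular expression.

    Simplification: anchors that appear inside square brackets,
    such as [G>], are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        parts.append(element)
    return "".join(parts)

def find_sites(pattern, protein):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = prosite_to_regex(pattern)
    # A lookahead lets overlapping matches be reported as well.
    return [m.start() for m in re.finditer("(?=" + regex + ")", protein)]
```

With the made-up test sequence "MKNGSAANPTL", find_sites("N-{P}-[ST]-{P}.", ...) reports only the asparagine at position 2, because the one at position 7 is followed by proline.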
SWISS-PROT:
SWISS-PROT is a curated protein sequence database that strives to
provide a high level of annotation (such as the description of a
protein's function, its domain structure, post-translational
modifications, and variants), a minimal level of redundancy, and a
high level of integration with other databases.
UNIPROT:
UniProt is a comprehensive, high-quality, and freely accessible
database of protein sequence and functional information, with many
entries being derived from genome sequencing projects.
It contains a large amount of information about the biological
function of proteins derived from the research literature.
Online Mendelian Inheritance in Man (OMIM)
OMIM is a comprehensive, authoritative compendium of human genes and
genetic phenotypes that is freely available and updated daily.
This database was initiated in the early 1960s by Dr. Victor A.
McKusick as a catalog of Mendelian traits and disorders, entitled
Mendelian Inheritance in Man (MIM).
Prefix                                                     Autosomal  X Linked  Y Linked  Mitochondrial  Totals
* Gene description                                            14,399       704        48             35  15,186
+ Gene and phenotype, combined                                    81         2         0              2      85
# Phenotype description, molecular basis known                 4,312       301         4             29   4,646
% Phenotype description or locus, molecular basis unknown      1,504       126         5              0   1,635
Other, mainly phenotypes with suspected Mendelian basis        1,697       112         2              0   1,811
Totals                                                        21,993     1,245        59             66  23,363
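The OMIM counts above can be checked for internal consistency. The sketch below simply re-enters the figures from the table and confirms that each row and each column adds up to its reported total; the variable and function names are illustrative.

```python
# Counts per prefix: (autosomal, X linked, Y linked, mitochondrial, reported total).
OMIM_COUNTS = {
    "* Gene description": (14399, 704, 48, 35, 15186),
    "+ Gene and phenotype, combined": (81, 2, 0, 2, 85),
    "# Phenotype description, molecular basis known": (4312, 301, 4, 29, 4646),
    "% Phenotype description or locus, molecular basis unknown": (1504, 126, 5, 0, 1635),
    "Other, mainly phenotypes with suspected Mendelian basis": (1697, 112, 2, 0, 1811),
}
REPORTED_TOTALS = (21993, 1245, 59, 66, 23363)

def rows_consistent(counts):
    """True if the four category counts in every row sum to that row's total."""
    return all(sum(row[:4]) == row[4] for row in counts.values())

def column_totals(counts):
    """Sum each column across all prefixes, including the totals column."""
    return tuple(sum(column) for column in zip(*counts.values()))
```

Both checks pass for the figures shown: every row sums to its total, and column_totals(OMIM_COUNTS) equals the reported Totals row.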


GENOME DATABASES
These databases collect genome sequences, annotate and analyze them,
and provide public access. Some add curation of experimental
literature to improve computed annotations. These databases may hold
the genomes of many species or the genome of a single model organism.
ADVANTAGES
-More relevant, inter-related information in one place.
-Makes it easier to find additional relevant information related to the initial query.
-Potentially find information that is indirectly linked but relevant to your subject of interest.
-Uncover non-obvious genetic features that explain a phenotype or disease.
-Easier to build a story based on multiple pieces of biological evidence.
THANK
YOU
