
Course title: Basic Bioinformatics

Course Code: ZOL-602


Credit hours: 3(2-1)
Databases of NCBI

DR. SAIRA HINA


Contents include:

 Nucleotide Databases
 BioProject
 Assembly
 GenBank
 PopSet
 Protein Databases
 GenPept
 Protein clusters
Nucleotide Databases

 Contain collections of nucleotide sequences from different sources.

 The databases categorized under nucleotide databases are:


BioProject
 BioProject is a collection of biological research projects and provides
links to the primary data from these projects, which range from focused
genome sequencing projects to large international collaborations.

Assembly
 The Assembly database collects metadata about genome
assemblies that were either submitted to GenBank or are part of the RefSeq
database.
Genome
 The Genome database collects genomic sequencing projects
for a given species and provides links to corresponding
records in BioProject, Assembly, Nucleotide and Protein.

RefSeq
 The RefSeq database is a non-redundant set of curated
and computationally derived sequences for transcripts, proteins
and genomic regions.
GenBank
 GenBank is the primary nucleotide sequence archive at
NCBI and is a member of the International Nucleotide Sequence
Database Collaboration (INSDC).

 Sequences from GenBank are available from three Entrez databases:
 Nucleotide
 EST
 GSS
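As a rough illustration of how the Entrez Nucleotide database can be queried programmatically, the sketch below builds a search URL for NCBI's E-utilities ESearch service. It only constructs the URL and does not contact NCBI. The helper name `build_esearch_url`, the example query, and the default `retmax` value are assumptions made for this illustration; `nuccore` is the E-utilities name of the Nucleotide database.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities services.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="nuccore", retmax=5):
    """Return an ESearch URL for a text query against an Entrez database.

    build_esearch_url is a hypothetical helper written for this example.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Illustrative query: human BRCA1 records in the Nucleotide database.
url = build_esearch_url("Homo sapiens[Organism] AND BRCA1[Gene]")
print(url)
```

Opening the printed URL in a browser returns an XML list of matching record identifiers, which can then be retrieved in a chosen display format.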
PopSet
 Collection of related sequences and alignments derived from
population, phylogenetic, mutation and ecosystem studies that
have been submitted to GenBank.

Sequence Read Archive (SRA)
 Repository for raw sequence reads and alignments generated by
high-throughput nucleic acid sequencers.


Clone database (CloneDB)
 Resource for finding descriptions, sources, map
positions and distributor information about available
clones and libraries.

Probe
 Registry of nucleic acid reagents designed for use in a wide
variety of biomedical research applications including
genotyping, SNP discovery, gene expression, gene silencing
and gene mapping.


Protein Databases
 Include collections of protein sequences from different sources.
 Entrez Protein is the protein sequence database of NCBI.
 It receives protein sequences from:
 PIR (Protein Information Resource)
 PDB (Protein Data Bank)
 Translations of coding sequences from RefSeq (Reference Sequence) and GenBank
 CDD (Conserved Domain Database) is a protein annotation resource comprising
multiple sequence alignment (MSA) models of full-length proteins and ancient domains.
 Entrez Protein records are cross-linked with the Entrez Taxonomy database.
 The Taxonomy database contains information on more than 75,000 organisms.

GenPept
 GenPept is the flat-file format in which a protein sequence record is displayed in Entrez.
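To make the idea of a display format concrete, the following sketch builds an E-utilities EFetch URL that requests a protein record in GenPept format (`rettype=gp` on the `protein` database). It only constructs the URL and does not download anything. The helper name `build_genpept_url` is hypothetical, and the accession `NP_000537.3` is used purely as an illustrative RefSeq protein identifier.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_genpept_url(accession):
    """Return an EFetch URL asking for a protein record as GenPept text.

    build_genpept_url is a hypothetical helper written for this example.
    """
    params = {
        "db": "protein",    # Entrez Protein database
        "id": accession,    # accession.version of the protein record
        "rettype": "gp",    # GenPept flat-file format
        "retmode": "text",  # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


genpept_url = build_genpept_url("NP_000537.3")
print(genpept_url)
```

Changing `rettype` to `fasta` in the same request would return only the sequence, without the annotation that the GenPept view carries.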

Protein clusters
 The Protein Clusters database contains sets of almost identical RefSeq proteins
encoded by complete genomes from prokaryotes and eukaryotic organelles.
 An accession number is assigned to each nucleotide sequence.
 It remains permanently associated with that nucleotide sequence.
 It allows easy tracking of different versions of sequence information among
the three INSDC organizations (GenBank, ENA and DDBJ).
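Sequence identifiers are commonly written as `accession.version`, where the accession stays fixed and the version number increases when the sequence itself changes. The sketch below separates the two parts so that versions of the same record can be compared. The function name `split_accession_version` is hypothetical, and `U49845.1` is used only as an example identifier.

```python
def split_accession_version(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version).

    Returns version as an int, or None when no version suffix is present.
    split_accession_version is a hypothetical helper for this example.
    """
    accession, dot, version = identifier.strip().partition(".")
    if not accession:
        raise ValueError("identifier has no accession part")
    return accession, (int(version) if dot else None)


def same_record(id_a, id_b):
    """True when two identifiers share an accession, whatever their versions."""
    return split_accession_version(id_a)[0] == split_accession_version(id_b)[0]


print(split_accession_version("U49845.1"))
print(same_record("U49845.1", "U49845.2"))
```

Comparing on the accession alone treats every version as the same record, while the version number distinguishes successive revisions of its sequence.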
References
 Baxevanis, A. D. Bioinformatics: A Practical Guide to the Analysis of
Genes and Proteins, Second Edition. John Wiley & Sons, Inc.
 NCBI Resource Coordinators. Database resources of the National Center for
Biotechnology Information. Nucleic Acids Research, 2016, Vol. 44,
Database issue, D7–D19. doi: 10.1093/nar/gkv129
 Xiong, J. Essential Bioinformatics. Cambridge.
 Selzer, P., Marhofer, R. and Rohwer, A. Applied Bioinformatics.
