
Genomic Databases & Analysis Tools

CONTENTS

Introduction

1. Comparative genomics
ARED Organism - AU-rich element cluster variations between human and mouse
AnimalQTLdb - a livestock QTL database tool set for positional QTL information mining and beyond
BGDB - Bovine Genome Database
COMPARE - a multi-organism system for cross-species data comparison and transfer of information
CVTree - A Phylogenetic Tree Reconstruction Tool Based on Whole Genomes
CleanEST - the cleansed EST libraries database
CoGemiR - a comparative genomics microRNA database
DroSpeGe - rapid access database for new Drosophila species genomes
ECR Browser - A Tool for Visualizing and Accessing Data from Comparisons of Multiple Vertebrate
Genomes
EVOG - evolutionary visualizer for overlapping genes
GenColors - annotation and comparative genomics of prokaryotes made easy
GeneNest gene indices
IMG/M - Integrated Microbial Genomes/Metagenomes
MANTIS - a phylogenetic framework for multi-species genome comparisons
MBGD - Microbial genome database for comparative analysis
MEGA - Molecular Evolutionary Genetics Analysis
MamPol - a database of nucleotide polymorphism in the Mammalia class
MicrobesOnline - Prokaryotic Genome Database
Narcisse -- a mirror view of conserved syntenies
OMA - the Orthologous MAtrix project
OrthoDB - the hierarchical catalog of eukaryotic orthologs
OrthoMaM - orthologous mammalian markers
PReMod - a database of genome-wide mammalian cis-regulatory module predictions
PhenomicDB - Comparison of phenotypes of orthologous genes in human and model organisms
Phylemon - A suite of web tools for molecular evolution, phylogenetics, and phylogenomics
Pristionchus.org - a genome-centric database of the nematode satellite species Pristionchus pacificus
ProtClustDB -- NCBI Protein Clusters Database
Pseudofam - the pseudogene families database
SALAD - Surveyed conserved motif ALignment diagram and the Associating Dendrogram
ShotgunFunctionalizeR - R-package for functional comparison of metagenomes

SnoopCGH - Comparative Genomic Hybridization software
SwissRegulon - a database of genome-wide annotations of regulatory sites
The CGView Server - a comparative genomics tool for circular genomes
The ERGO -- Genome analysis and discovery system
VISTA - Computational Tools for Comparative Genomics
eggNOG -- evolutionary genealogy of genes: Non-supervised Orthologous Groups
metaTIGER - a metabolic gene evolution resource

2. General genomics databases and tools


ABCdb - Archaeal and Bacterial ABC transporter database
CARGO - A web portal to integrate customized biological information
CEBS - Chemical Effects in Biological Systems
DEG - A Database of Essential Genes
EGassembler - online bioinformatics service for large-scale processing, clustering, and assembling
ESTs and genomic DNA fragments
Entrez Genome
GGT - Graphical GenoTypes
GenomeVx - circular chromosome visualization
Genomes OnLine Database (GOLD) - genomic and metagenomic projects and their associated
metadata
Genomes at the EBI
IMG - Integrated microbial genomes system
IMG/M - Integrated Microbial Genomes/Metagenomes
Invertebrate Homologous Genes Database
KEGG -- resource for deciphering the genome
KaryotypeDB - Karyotype and chromosome data for animal and plant species
MBGD - Microbial genome database for comparative analysis
MetaLook - a 3D visualization software for marine ecological genomics
NCBI Genomic Biology
OriDB - a DNA replication origin database
OrthoDB - the hierarchical catalog of eukaryotic orthologs
OrthoMCL-DB - querying a comprehensive multi-species collection of ortholog groups
PHIDIAS - a pathogen-host interaction data integration and analysis system
PSI SGKB - The Protein Structure Initiative Structural Genomics Knowledgebase
PhylomeDB - a database for genome-wide collections of gene phylogenies
Pseudofam - the pseudogene families database
Pseudogene.org - a comprehensive database and comparison platform for pseudogene annotation
Quadbase - a database of quadruplex motifs
THOR - targeted high-throughput ortholog reconstructor
TIGR (The Institute for Genomic Research) Microbial Database

The European Bioinformatics Institute's data resources - towards systems biology
The National Microbial Pathogen Database Resource (NMPDR) - a genomics platform based on
subsystem annotation
The Plant DNA C-values Database
TransportDB - A Relational Database of Cellular Membrane Transport Systems
TreeBASE - A database of phylogenetic knowledge
Visualization for genomics - the Microbial Genome Viewer
dbGaP - a Database of Genome-Wide Association Studies
eggNOG - evolutionary genealogy of genes: Non-supervised Orthologous Groups
euGenes - a Eukaryota genome information system
mGene.web - Web service for accurate computational gene finding
The Fungal Genome Size Database

3. Genome annotation terms, ontologies, nomenclature, and classification


BioPortal - Biology Portal to Biomedical Ontologies
DAVID - A Database for Annotation, Visualization, and Integrated Discovery
DAVID Knowledgebase -- a backend database used for all DAVID bioinformatics tools
Evigan - an automated gene annotation program for eukaryotic genomes
FunSimMat - a comprehensive functional similarity database
GALA - the database of Genome ALignments and Annotations
GO - the Gene Ontology Database
GOA - The Gene Ontology Annotation Database
GOEAST - Gene Ontology Enrichment Analysis Software Toolkit
Gendoo - GENe, Disease features Ontology-based Overview system
HCGene - Hierarchical Classification of Genes
HGNC Database - HUGO Gene Nomenclature
IGRhCELLID - Integrated Genetic Resources of Human CELL lines for IDentification
IUBMB Nomenclature database
IUPAC Nomenclature database
IUPHAR-DB - the IUPHAR database of G protein-coupled receptors and ion channels
OBO-Edit - an ontology editor for biologists
PANTHER - Protein ANalysis THrough Evolutionary Relationships
PLAN2L - PLant ANnotation to Literature
ProfCom - profiling of complex functionality
RDP - The Ribosomal Database Project
RIDOM - Ribosomal Differentiation of Medical Micro-organisms Database
SimCT - SIMilarity Clustering Tree
SuperPred - target-prediction server
The NCBI Taxonomy Homepage
The Ontology Lookup Service - more data and better tools for controlled vocabulary queries

UMLS - The Unified Medical Language System
Tree of Life
WEGO - a web tool for plotting GO annotations

4. Genome browsers, genome annotation, genomic sequence analysis


AMIGene - Annotation of MIcrobial Genes
BABELOMICS - advanced functional profiling of transcriptomics, proteomics, and genomics
experiments
BRIGEP - the BRIDGE-based genome-transcriptome-proteome browser
CNVDetector - locating copy number variations using array CGH data
CREx - Common interval Rearrangement Explorer
CleanEST - the cleansed EST libraries database
Database of Genomic Variants
Dcode.org anthology of comparative genomic tools
DiProGB - Dinucleotide Properties Genome Browser
ECR Browser - A Tool for Visualizing and Accessing Data from Comparisons of Multiple Vertebrate
Genomes
ESTAnnotator - a tool for high throughput EST annotation
Ensembl
G-SESAME - Gene Semantic Similarity Analysis and Measurement Tools
GC-Profile - a web-based tool for visualizing and analyzing the variation of GC content in genomic
sequences
IBRENA - In silico Biochemical Reaction Network Analysis
IGB - The Integrated Genome Browser
Identification of patterns in biological sequences at the ALGGEN server - PROMO and MALGEN
IsoFinder - Computational Prediction of Isochores in Genome Sequences
KAAS - KEGG Automatic Annotation Server
MGAlignIt - a web service for the alignment of mRNA/EST and genomic sequences
MICheck - a web tool for fast checking of syntactic annotations of bacterial genomes
MultiPipMaker and supporting tools - alignments and analysis of multiple genomic DNA sequences
Multiple alignments of genomic sequences using CHAOS, DIALIGN, and ABC
OmicBrowse - Genome Annotation Browser
Proteomes and Genomes Fasta
SPRING - a tool for the analysis of genome rearrangement using reversals and block-interchanges
SRA - Sequence Read Archive
The CHAOS/DIALIGN - Multiple Alignment of Genomic Sequences
The UCSC Archaeal Genome Browser
UCSC Genome Browser
ZOOM! - Zillions Of Oligos Mapped
e2g - An Interactive Web-based Server for Efficiently Mapping Large EST and cDNA Sets to Genomic
Sequences

g:Profiler - a web-based toolset for functional profiling of gene lists from large-scale experiments

5. Human genome databases, maps, and viewers


CREME - Cis-Regulatory Module Explorer for the Human Genome
CTD - Comparative Toxicogenomics Database
Database of Genomic Variants
ENCODE - ENCyclopedia Of DNA Elements
Ensembl
Evola - human orthologs as evolutionary annotation
GDB - the Human Genome DataBase
GenAtlas
GeneCards
GeneLoc - exon-based integration of human genome maps
H-DBAS - Alternative splicing database of completely sequenced and manually annotated full-length
cDNAs based on H-Invitational
H-InvDB - Human Invitational Database
HMDB - the Human Metabolome Database
HOMD - the Human Oral Microbiome Database
HuRef Genome Browser
LIFEdb - A Database for Functional Genomics Experiments
MGC - Mammalian Gene Collection
MotifMap - a human genome-wide map of candidate regulatory motif sites
NCG - Network of Cancer Genes
ORFDB - A High-quality Open Reading Frame Collection
RefSeq - Reference Sequence database
TRbase - A Database Of Tandem Repeats In The Human Genome
The Chromosome 7 Annotation Project - Human chromosome 7 sequence and annotation
The Gene Wiki - Wikipedia Gene Portal
UCSC Genome Browser
VISTA Enhancer Browser - a database of tissue-specific human enhancers
dbGaP - a Database of Genome-Wide Association Studies

Introduction
Genomic databases are online repositories of genomic variants, compiled for a single gene (locus-specific),
for many genes (general), or for a particular population or ethnic group. They are an indispensable part of
human genome informatics, a field that has grown exponentially in the postgenomic era as the genetic etiology
of human disorders has become better understood and ever more genomic variants have been identified. These
resources organize that knowledge so that it is useful for molecular diagnosis as well as for clinicians and
scientists. Genomic databases are linked to supporting databases that help in interpreting the sequence data.
They are built and maintained on one or more computers and rely on a variety of user interfaces, file transfer
protocols, software applications, and operating systems. They hold sequence data produced by geneticists,
molecular biologists, and others who use laboratory techniques to determine whole DNA sequences. Genomic
databases make it possible to store, share, and compare information across research studies, data types,
organisms, and individuals. They are not a recent development: even before the modern internet became
popular, 'online' databases were used to share data on key research organisms. Recent advances in both
information sharing technology and genome sequencing technology have produced a surge of data sources,
organized around specific organisms, as has traditionally been the case, as well as around specific data
types, such as transcriptional data or short-read sequencing data.

Bioinformatic tools are software packages designed to retrieve meaningful information from the mass of
biological and molecular biology databases and to perform structural or sequence analysis. They include
visualization software for examining and extracting information from proteomic databases, as well as
data-mining tools that extract data from genomic sequence databases. They can be grouped into sequence
analysis tools, protein functional analysis tools, similarity & homology tools, and miscellaneous tools.

1. Comparative genomics

1. ARED Organism - AU-rich element cluster variations between human and mouse

● Official URL: https://brp.kfshrc.edu.sa/ARED/

● What you can do: View AREs in the human transcriptome and study the comparative genomics of AREs
in model organisms.

● Highlights:
■ ARED Organism extends the adenylate uridylate (AU)-rich element (ARE)-containing
human mRNA database to the transcriptomes of mouse and rat.
■ ARED-Integrated, another updated and expanded version of ARED, is a compilation of ARED
versions 1.0 to 3.0 and the updated version 4.0, and is devoted to human mRNAs.
■ A quantitative estimate of ARE conservation in human, mouse, and rat transcripts
indicates that a notable proportion (around 25%) of human genes differ in their
ARE patterns from mouse and rat transcripts.

2. AnimalQTLdb - a livestock QTL database tool set for positional QTL information
mining and beyond

● Official URL: https://www.animalgenome.org/cgi-bin/QTLdb/index

● What you can do: Search for publicly available QTL data on livestock animal species.

● Highlights:
■ The Animal Quantitative Trait Loci (QTL) database (AnimalQTLdb) is designed to house all
publicly available QTL data on livestock animal species, so that researchers can readily
locate and compare QTL within species.
■ Database tools have also been added to link the QTL data to other types of genomic data, such as
radiation hybrid (RH) maps, fingerprinted contig (FPC) physical maps, linkage maps, and
comparative maps to the human genome.
■ As of early 2007, the database contained 1675 pig, 846 cattle, and 657 chicken QTL,
which are dynamically linked to the respective RH, FPC, and human comparative maps.

3. BGDB - Bovine Genome Database

● Official URL: https://bovinegenome.elsiklab.missouri.edu/

● What you can do: Find information about bovine genomics data.

● Highlights:

■ BGD aims to improve the annotation of the bovine genome and to integrate the genome
sequence with other genomics data.
■ It comprises GBrowse genome browsers, BLAST databases, a quantitative trait loci
(QTL) viewer, the Apollo Annotation Editor, and gene pages.
■ Genome browsers, available for both scaffold and chromosome coordinate systems,
display the bovine Official Gene Set (OGS), RefSeq and Ensembl gene models,
non-coding RNA, repeats, pseudogenes, single-nucleotide polymorphisms, markers, QTL,
and alignments to complementary DNAs, ESTs, and protein homologs.
■ The Bovine QTL viewer is connected to the BGD Chromosome GBrowse, allowing for the
identification of candidate genes underlying QTL.
■ The Apollo Annotation Editor connects directly to the BGD Chado database to provide
researchers with remote access to gene evidence in a graphical interface that allows
editing and creating new gene models.
■ Researchers may upload their annotations to the BGD server for review and integration
into the subsequent release of the OGS.
■ Gene pages display information for individual OGS gene models, including gene
structure, transcript variants, functional descriptions, gene symbols, Gene Ontology
terms, annotator comments, and links to the National Center for Biotechnology
Information and Ensembl.
■ Each gene page is linked to a wiki page to allow input from the research community.

4. COMPARE - a multi-organism system for cross-species data comparison and
transfer of information

● Official URL: http://compare.ibdml.univ-mrs.fr/

● What you can do: Retrieve, correlate, and interpret data across species in a multi-organism
web-based resource.

● Highlights:
■ The COMPARE interface gives access to a broad array of data, including genomic
structure, expression data, annotations, pathways, and literature links, for human and three
extensively studied animal models (mouse, Drosophila, and zebrafish).
■ A consensus ortholog-finding pipeline combining many ortholog prediction methods
provides accurate comparisons of data across species and has been used to transfer
information from well-studied organisms to more poorly annotated ones.

5. CVTree - A Phylogenetic Tree Reconstruction Tool Based on Whole Genomes

● Official URL: http://tlife.fudan.edu.cn/cvtree/cvtree/

● What you can do: Construct a phylogenetic tree of microorganisms based on the oligopeptide content
of their complete proteomes.

● Highlights:
■ The CVTree web server is a novel implementation of the whole genome-based,
alignment-free composition vector (CV) method for phylogenetic analysis.
■ This implementation is designed to keep pace with the ever-increasing amount of genome
data: its database holds over 850 prokaryotic genomes, updated monthly from NCBI, and
more than 80 fungal genomes collected manually from several sequencing centers.
■ Users can also choose to output trees whose monophyletic branches are collapsed to
various taxonomic levels.
■ The whole genome-based, alignment-free CV method provides an independent
verification of traditional phylogenetic analyses based on a single gene or a few genes.
■ It provides a fast and stable research platform. Users can upload their own sequences
to find their phylogenetic position among genomes selected from the server's built-in
database.
■ All sequence data used in a session may be downloaded as a compressed file.
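
The composition vector idea can be sketched in a few lines of Python. This is a simplified illustration, not the CVTree implementation: it uses raw k-peptide frequencies and omits the background-subtraction step of the published CV method. The function names, the default k=3, and the toy sequences are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def composition_vector(proteome, k=3):
    """Relative frequencies of overlapping k-peptides over all proteins in a proteome."""
    counts = Counter()
    for protein in proteome:
        for i in range(len(protein) - k + 1):
            counts[protein[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cv_distance(vec_a, vec_b):
    """Distance (1 - cosine similarity) / 2 between two sparse composition vectors."""
    dot = sum(v * vec_b.get(kmer, 0.0) for kmer, v in vec_a.items())
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("empty composition vector")
    return (1.0 - dot / (norm_a * norm_b)) / 2.0


# Toy proteomes (hypothetical sequences, for illustration only).
species_a = ["MKVLAAGIVG", "MKTAYIAKQR"]
species_b = ["MKVLAAGLVG", "MKTAYLAKQR"]
d_ab = cv_distance(composition_vector(species_a), composition_vector(species_b))
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method, such as neighbor joining.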

6. CleanEST - the cleansed EST libraries database

● Official URL: https://www.kobic.re.kr/

● What you can do: Browse GenBank's dbEST (database of Expressed Sequence Tags) libraries, classified
and cleansed of contaminants.

● Highlights:
■ All dbEST libraries were classified based on species and sequencing centers.
■ Human EST libraries were classified by anatomical and pathological systems according
to eVOC ontologies.
■ For each dbEST library, two sets of cleansed sequences are provided: 'pre-cleansed'
and 'user-cleansed'.
■ To provide user-cleansed sequences, an automatic user-cleansing pipeline was built, in
which the sequences of a user-selected library are cleansed on the fly according to
user-selected options.

■ To generate pre-cleansed sequences, the EST sequences in dbEST were aligned against
well-known contamination sources: UniVec, E. coli, mitochondria, and chloroplast
(for plants).

7. CoGemiR - a comparative genomics microRNA database

● Official URL: https://cogemir.tigem.it/

● What you can do: Get an overview of the genomic organization of microRNAs and the extent of
their conservation during evolution in different metazoan species.

● Highlights:
■ The database gathers genomic location, conservation, and expression data for
both known and newly predicted microRNAs and presents them from a
comparative point of view.
■ The database also incorporates a microRNA prediction pipeline to annotate microRNAs
in newly sequenced genomes.

8. DroSpeGe - rapid access database for new Drosophila species genomes

● Official URL: http://insects.eugenes.org/DroSpeGe/

● What you can do: Search and compare 12 new and old Drosophila genomes.

● Highlights:
■ This comparative genome database presents genome researchers with rapid, usable
access to 12 new and old Drosophila genomes.
■ New genome assemblies provided by several sequencing centers have been annotated
with known model organism gene homologies and gene predictions to provide basic
comparative data.

■ This genome database includes homologies to Drosophila melanogaster and 8 other
eukaryote model genomes and gene predictions from several groups.
■ BioMart supports data mining of annotations and sequences.

■ BLAST searches of the newest assemblies are integrated with genome maps.
■ GBrowse maps provide detailed views of cross-species-aligned genomes.
■ Summaries of fundamental genome statistics include sizes, genes found and predicted,
homology among genomes, phylogenetic trees of species, and comparisons of various
gene predictions for sensitivity and specificity in finding new and known genes.

9. ECR Browser - A Tool for Visualizing and Accessing Data from Comparisons of
Multiple Vertebrate Genomes

● Official URL: http://ecrbrowser.dcode.org/

● What you can do: Access whole-genome alignments of human, mouse, rat, and fish sequences.

● Highlights:
■ It provides the starting point for the discovery of novel genes, identification of distant
gene regulatory elements, and prediction of transcription factor binding sites.
■ The genome alignment portal of the ECR Browser also allows quick and automated
alignments of any user-submitted sequence to the genome of choice.
■ The interconnection of the ECR Browser with other DNA sequence analysis tools
generates a unique portal for studying and exploring vertebrate genomes.

10. EVOG - evolutionary visualizer for overlapping genes

● Official URL: http://neobio.cs.pusan.ac.kr/evog/


● What you can do: Analyze the evolutionary process of overlapping genes when comparing different
species.

● Highlights:
■ This integrated, manually curated database displays the
evolutionary features of overlapping genes.
■ The EVOG database includes overlapping genes from 13 species (10,074 in human,
10,009 in chimpanzee, 67,039 in orangutan, 51,001 in marmoset, 219 in rhesus, 3,627 in
cow, 209 in dog, 10,700 in mouse, 7,987 in rat, 1,439 in chicken, 597 in Xenopus, 2,457 in
zebrafish, and 4,115 in Drosophila).
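
As a rough illustration of what "overlapping genes" means in coordinate terms, the sketch below finds pairs of genes whose intervals overlap on the same chromosome. It is not EVOG's curation pipeline: the tuple layout, the inclusive coordinates, the function name, and the toy genes are assumptions, and strand is ignored.

```python
def overlapping_pairs(genes):
    """Return sorted name pairs of genes whose inclusive intervals overlap on one chromosome.

    genes: iterable of (name, chromosome, start, end) tuples.
    """
    by_chrom = {}
    for name, chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    pairs = []
    for items in by_chrom.values():
        items.sort()  # sweep genes in order of start coordinate
        active = []   # (end, name) of earlier genes that may still overlap later ones
        for start, end, name in items:
            # Drop genes that end before the current gene starts.
            active = [(e, n) for e, n in active if e >= start]
            pairs.extend(tuple(sorted((n, name))) for _, n in active)
            active.append((end, name))
    return sorted(pairs)


# Hypothetical gene coordinates, for illustration only.
toy_genes = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 150, 300),
    ("geneC", "chr1", 400, 500),
    ("geneD", "chr2", 150, 250),
]
toy_pairs = overlapping_pairs(toy_genes)
```

Running the same check on orthologous regions in several species would show whether a given overlap is shared or lineage-specific, which is the kind of comparison the database presents.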

11. GenColors - annotation and comparative genomics of prokaryotes made easy

● Official URL: http://gencolors.fli-leibniz.de/

● What you can do: Annotate and compare prokaryotic genomes faster with a web-based
software/database system.

● Highlights:
■ GenColors allows seamless integration of data from ongoing sequencing projects and
annotated genomic sequences obtained from GenBank.
■ The genome comparison tools include best bidirectional hits, gene conservation,
syntenies, and gene core sets.
■ The GenColors system can be used both for annotation purposes in ongoing genome
projects and as an analysis tool for finished genomes.
■ A variety of export/import filters manage an effective data flow from sequence assembly
and manipulation programs (e.g., GAP4) to GenColors and back, as well as to standard
GenBank file(s).
■ Precomputed UniProt matches allow annotation and analysis in an effective manner.
■ In addition to these analysis options, base-specific quality data (coverage and confidence)
can also be handled if available.
■ GenColors comes in two forms: dedicated genome browsers and the Jena Prokaryotic
Genome Viewer (JPGV).
■ Dedicated genome browsers contain genomic information on a set of related genomes
and offer a large number of options for genome comparison.
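
Of the comparison tools listed above, best bidirectional hits are the easiest to illustrate. The following is a minimal sketch of the general technique, not GenColors code: it assumes similarity scores (for example, alignment bit scores) have already been computed in both directions, and the function names and toy scores are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict {(query, subject): score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores for genes of genome A searched against genome B and vice versa.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 120, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 198, ("b2", "a2"): 118, ("b2", "a1"): 40}
bbh = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Here a3 finds b1 as its best hit, but b1 prefers a1, so the pair (a3, b1) is not reported.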

12. GeneNest gene indices

● Official URL: http://genenest.molgen.mpg.de/

● What you can do: Visualize gene indices of human, mouse, Arabidopsis, zebrafish, Drosophila, and
sheep.

● Highlights:
■ GeneNest is a comprehensive visualization of gene indices of 6 model organisms.
■ The objective of GeneNest is to represent each gene by a single cluster of ESTs and/or mRNAs.
Further subdivision of a cluster into contigs may be caused by alternative splicing, genomic
sequences, or artifacts like chimeric sequences.
■ Consensus sequences derived from GeneNest contigs are a basis for mapping genes onto the
genome, and for analysis of splice isoforms.

13. IMG/M - Integrated Microbial Genomes/Metagenomes

● Official URL: http://img.jgi.doe.gov/m

● What you can do: Manage and analyze metagenome data.

● Highlights:
■ IMG/M is a data management and analysis system for microbial community genomes
(metagenomes) hosted at the Department of Energy's (DOE) Joint Genome Institute (JGI).
■ IMG/M provides IMG's comparative data analysis tools extended to handle metagenome data,
together with metagenome-specific analysis tools.
■ IMG/M consists of metagenome data integrated with isolated microbial genomes from the
Integrated Microbial Genomes (IMG) system.

14. MANTIS - a phylogenetic framework for multi-species genome comparisons

● Official URL: http://www.mantisdb.org/MANTiS/Welcome.html

● What you can do: Link multi-species whole-genome comparisons to functional
analysis.

● Highlights:
■ MANTIS is a relational database for the analysis of
➢ Gains and losses of genes on specific branches of the metazoan phylogeny.
➢ Reconstructed genome content of ancestral species.
➢ Over- or under-representation of functions/processes and tissue specificity of gained,
duplicated, and lost genes.
■ MANTIS estimates the most likely positions of gene losses on the true phylogeny using a
maximum-likelihood function.
■ A user-friendly interface and an extensive query system allow investigating questions pertaining
to gene identity, phylogenetic mapping, and function/expression parameters.

15. MBGD - Microbial genome database for comparative analysis

● Official URL: http://mbgd.genome.ad.jp/

● What you can do: Conduct a comparative analysis of completely sequenced microbial genomes.

● Highlights:
■ The microbial genome database (MBGD) for comparative analysis is a platform for microbial
comparative genomics based on automated ortholog group identification.
■ The database contains approximately 1000 genomes.
■ An outstanding feature of MBGD is that it enables users to create ortholog groups using a
specified subgroup of organisms.
■ To utilize the MBGD database as a comprehensive resource for investigating microbial genome
diversity, the following advanced functionalities have been developed:
➢ Enhanced assignment of functional annotation, including external database links to each
orthologous group.
➢ Interface for choosing a set of genomes to compare based on phenotypic properties.
➢ The addition of more eukaryotic microbial genomes (fungi and protists) and some higher
eukaryotes as references.
➢ Enhancement of the MyMBGD mode, which allows users to add their own genomes to MBGD
and now accepts raw genomic sequences without any annotation (in such a case, it runs a
gene-finding procedure before identifying the orthologs).
■ Some analysis functions, such as the function to find orthologs with similar phylogenetic patterns,
have also been improved.

16. MEGA - Molecular Evolutionary Genetics Analysis

● Official URL: http://www.megasoftware.net/

● What you can do: Run evolutionary analyses of DNA and protein sequences with biologist-centric software.

● Highlights:
■ MEGA software is a desktop application devised for comparative analysis of homologous gene
sequences either from multigene families or from different species with a special emphasis on
inferring evolutionary relationships and patterns of DNA and protein evolution.
■ In addition to the tools for statistical analysis of data, MEGA provides various convenient facilities
for the assembly of sequence data sets from files or web-based repositories, and it includes tools
for visual presentation of the results obtained in the form of interactive phylogenetic trees and
evolutionary distance matrices.

17. MamPol - a database of nucleotide polymorphism in the Mammalia class

● Official URL: http://mampol.uab.es/mampol/

● What you can do: Measure single nucleotide polymorphism diversity among
homologous sequences from the Mammalia class.

● Highlights:
■ MamPol, 'Mammalia Polymorphism Database', is a website comprising all the well-annotated
polymorphic sequences available in GenBank for the Mammalia class grouped by name of
organism and gene.
■ It also contains a set of tools for simple re-analysis of the available data and a statistics section
that is updated daily and summarizes the contents of the database.
■ Data gathering, calculation of diversity measures, and daily updates are automatically performed
using PDA software.
■ Diversity measures of single nucleotide polymorphisms are provided for each set of haplotypic
homologous sequences, including polymorphism at synonymous and non-synonymous sites,
linkage disequilibrium, and codon bias.
■ The MamPol website includes several interfaces for browsing the contents of the database and
making customizable comparative searches of different species or taxonomic groups.
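
One standard diversity measure for a set of aligned homologous sequences is nucleotide diversity, the average number of pairwise differences per site. The sketch below shows that calculation only as an example of such a measure; it is not the PDA software, and the function name and toy alignment are assumptions.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average number of pairwise differences per site among aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")

    total_differences = 0
    n_pairs = 0
    for seq_a, seq_b in combinations(sequences, 2):
        total_differences += sum(x != y for x, y in zip(seq_a, seq_b))
        n_pairs += 1
    return total_differences / (n_pairs * length)


# Hypothetical aligned haplotypes, for illustration only.
haplotypes = ["ACGT", "ACGA", "ACGT"]
pi = nucleotide_diversity(haplotypes)  # 2 differences over 3 pairs x 4 sites
```

Restricting the compared columns to synonymous or non-synonymous sites would give the site-class-specific estimates mentioned above, although that requires codon-aware site counting not shown here.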

18. MicrobesOnline - Prokaryotic Genome Database

● Official URL: http://www.microbesonline.org/

● What you can do: Find information about more than 1000 microbial genomes.

● Highlights:
■ MicrobesOnline is a resource for comparative and functional genome analysis.
■ The portal includes more than 1000 complete genomes of bacteria, archaea, and fungi and
thousands of expression microarrays from diverse organisms, ranging from model organisms
such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as
Desulfovibrio vulgaris and Shewanella oneidensis.
■ To identify co-regulated genes, MicrobesOnline can search for genes based on their expression
profile and provides tools for identifying regulatory motifs and seeing if they are conserved.
■ To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline
includes a comparative genome browser based on phylogenetic trees for every gene family as
well as a species tree.
■ It also includes fast phylogenetic profile searches, comparative views of metabolic pathways,
operon predictions, a workbench for sequence analysis, and integration with RegTransBase and
other microbial genome resources.
■ The next update of MicrobesOnline will contain significant new functionality, including a
comparative analysis of metagenomic sequence data.

19. Narcisse -- a mirror view of conserved syntenies

● Official URL: http://narcisse.toulouse.inra.fr/

● What you can do: Study genome conservation, from sequence similarities to conserved syntenies.

● Highlights:
■ The Narcisse database is dedicated to the study of genome conservation, from sequence
similarities to conserved chromosomal segments or conserved syntenies, for a large number of
completely sequenced animal, plant, and bacterial genomes.
■ The query interface, a comparative genome browser, enables navigation between genome dot
plots, comparative maps, and sequence alignments.

20. OMA - the Orthologous MAtrix project

● Official URL: http://omabrowser.org/

● What you can do: Explore orthologous relations across 352 complete genomes.

● Highlights:
■ A web-based tool allowing the exploration of orthologous relations over 352 complete genomes.
■ Orthologs can be viewed as groups across species, but also at the level of sequence pairs,
allowing the distinction among one-to-one, one-to-many and many-to-many orthologs.
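
The distinction among relation types follows directly from how many partners each gene has in the other species. The sketch below classifies pairwise orthologs on that basis. It illustrates the idea only and is not OMA's algorithm or API: the function name and the toy pairs are hypothetical, and one-to-many and many-to-one are reported under a single label.

```python
from collections import defaultdict


def classify_ortholog_pairs(pairs):
    """Label each (gene_in_species_a, gene_in_species_b) pair by its relation type."""
    pairs = list(pairs)
    partners_of_a = defaultdict(set)
    partners_of_b = defaultdict(set)
    for gene_a, gene_b in pairs:
        partners_of_a[gene_a].add(gene_b)
        partners_of_b[gene_b].add(gene_a)

    relations = {}
    for gene_a, gene_b in pairs:
        a_has_many = len(partners_of_a[gene_a]) > 1  # gene_a has several orthologs in B
        b_has_many = len(partners_of_b[gene_b]) > 1  # gene_b has several orthologs in A
        if a_has_many and b_has_many:
            relations[(gene_a, gene_b)] = "many-to-many"
        elif a_has_many or b_has_many:
            relations[(gene_a, gene_b)] = "one-to-many"
        else:
            relations[(gene_a, gene_b)] = "one-to-one"
    return relations


# Hypothetical pairwise orthologs between two species.
toy_orthologs = [("a1", "b1"), ("a2", "b2"), ("a2", "b3"),
                 ("a3", "b4"), ("a4", "b4"), ("a3", "b5")]
toy_relations = classify_ortholog_pairs(toy_orthologs)
```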

21. OrthoDB - the hierarchical catalog of eukaryotic orthologs

● Official URL: http://cegg.unige.ch/orthodb

● What you can do: Find groups of orthologous genes.

● Highlights:
■ Catalogs groups of orthologous genes in a hierarchical manner, at each radiation of the species
phylogeny, from more general groups to more fine-grained delineations between closely related
species.
■ A COG-like and Inparanoid-like ortholog delineation procedure was used on the basis of
all-against-all Smith-Waterman sequence comparisons to analyze 58 eukaryotic genomes,
focusing on vertebrates, insects, and fungi to facilitate further comparative studies.

22. OrthoMaM - orthologous mammalian markers

● Official URL: http://www.orthomam.univ-montp2.fr/orthomam/html/

● What you can do: Select orthologous genomic markers for placental mammal phylogenetics.

● Highlights:
■ The Ensembl database was used to determine a set of orthologous genes from 12 available
complete mammalian genomes.
■ As targets for possible amplification and sequencing in additional taxa, more than 3,000 exons of
length > 400 bp have been selected, among which 118, 368, 608, and 674 are respectively
retrieved for 12, 11, 10, and 9 species.
■ A bioinformatic pipeline has been developed to provide evolutionary descriptors for these
candidate markers in order to assess their potential phylogenetic utility.
■ The database, centered on complete genome information, makes it possible to select promising
markers for a given phylogenetic question or systematic framework by querying a number of
evolutionary descriptors.

23. PReMod - a database of genome-wide mammalian cis-regulatory module predictions

● Official URL: http://genomequebec.mcgill.ca/PReMod

● What you can do: Conduct genome-wide cis-regulatory module (CRM) predictions for both the human
and the mouse genomes.

● Highlights:
■ This database is based on a prediction algorithm that exploits the fact that many known CRMs are
made of clusters of phylogenetically conserved and repeated transcription factor (TF) binding
sites.
■ Contrary to other existing databases, PReMod is not restricted to modules located proximal to
genes, but in fact, mostly contains distal predicted CRMs (pCRMs).
■ The output includes information about the binding sites predicted within the selected pCRMs, and
a graphical display of their distribution within the pCRMs.
■ It also provides a visual depiction of the chromosomal context of the selected pCRMs in terms of
neighboring pCRMs and genes, all of which are linked to the UCSC Genome Browser and the
NCBI.
■ It allows users to:
➢ Identify pCRMs around a gene of interest
➢ Identify pCRMs that have binding sites for a given TF (or a set of TFs)
➢ Download the entire dataset for local analyses

24. PhenomicDB - Comparison of phenotypes of orthologous genes in human
and model organisms

● Official URL: http://www.phenomicdb.de/

● What you can do: Compare phenotypes of a given gene or gene set in different model organisms.

● Highlights:
■ PhenomicDB is a multi-organism phenotype-genotype database including human, mouse, fruit fly,
C. elegans, and other model organisms.
■ The inclusion of gene indices (NCBI Gene) and orthologues (same gene in different organisms)
from HomoloGene allows comparing phenotypes of a given gene over many organisms
simultaneously.
■ 2007 update: PhenomicDB was enhanced by incorporating quantitative and descriptive RNA
interference (RNAi) screening data, by enabling the use of phenotype ontology terms, and by
providing information on assays and cell lines.

25. Phylemon - A suite of web tools for molecular evolution, phylogenetics, and
phylogenomics

● Official URL: http://phylemon.bioinfo.cipf.es/

● What you can do: Phylemon is a web server that integrates a selected suite of more than 20 different
tools from the most popular stand-alone programs of phylogenetic and evolutionary analysis.

● Highlights:
■ Phylemon is an online platform for phylogenetic and evolutionary analyses of molecular
sequence data.
■ It was conceived in response to growing demand for data analysis from experimental scientists
who wish to add molecular evolution and phylogenetic insight to their research.
■ Tools included in Phylemon cover a wide yet selected range of programs: from the most basic for
multiple sequence alignment to elaborate statistical methods of phylogenetic reconstruction
including methods for evolutionary rates analyses and molecular adaptation.
■ Phylemon has several features that differentiate it from other resources:
➢ It offers an integrated environment that enables the direct concatenation of evolutionary
analyses, stores results, and handles the required data format conversions.
➢ Once an outfile is produced, Phylemon suggests the next possible analyses, thus guiding
the user and facilitating the integration of multi-step analyses.
➢ Users can define and save complete pipelines for specific phylogenetic analysis to be
automatically used on many genes in subsequent sessions or multiple genes in a single
session (phylogenomics).

26. Pristionchus.org - a genome-centric database of the nematode satellite
species Pristionchus pacificus

● Official URL: http://www.pristionchus.org/

● What you can do: Search for genomic information on the nematode satellite species Pristionchus
pacificus.

● Highlights:
■ This Pristionchus pacificus web resource offers diverse content covering genome browsing,
genetic and physical maps, similarity searches, a community platform, and assembly details.

27. ProtClustDB -- NCBI Protein Clusters Database

● Official URL: http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters

● What you can do: Find information about related protein sequences.

● Highlights:
■ The NCBI Protein Clusters Database (ProtClustDB) was created to efficiently maintain, and keep
up to date, the deluge of data generated from prokaryotic genomic studies.
■ It contains both curated and uncurated clusters of proteins grouped by sequence similarity.
■ The May 2008 release contains a total of 285,386 clusters derived from over 1.7 million proteins
encoded by 3806 nt sequences from the RefSeq collection of complete chromosomes and
plasmids from four major groups: prokaryotes, bacteriophages, and the mitochondrial and
chloroplast organelles.
■ There are 7180 clusters containing 376,513 proteins with curated gene and protein functional
annotation.
■ PubMed identifiers and external cross-references are collected for all clusters and provide
additional information resources.
■ A suite of web tools is available to explore more detailed information, such as multiple
alignments, phylogenetic trees, and genomic neighborhoods. ProtClustDB provides an efficient
method to aggregate gene and protein annotation for researchers.

28. Pseudofam - the pseudogene families database

● Official URL: http://pseudofam.pseudogene.org/

● What you can do: A database of pseudogene families based on the protein families from the Pfam
database.

● Highlights:
■ Pseudofam provides resources for analyzing the family structure of pseudogenes including query
tools, statistical summaries, and sequence alignments.
■ Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an
extension of the PseudoPipe pipeline) to identify pseudogenes.
■ It contains over 125,000 pseudogenes identified from 10 eukaryotic genomes and aligned within
nearly 3000 families (approximately one-third of the total families in PfamA).
■ Each identified pseudogene is assigned to its parent protein family, and the pseudogenes in a
family are then aligned to one another by transferring the parent domain alignments from the Pfam
family.
■ Pseudogenes are also given additional annotations based on an ontology, reflecting their mode of
creation and subsequent history.
■ The annotation highlights the association of pseudogene families with genomic features, such as
segmental duplications.
■ In addition, pseudogene families are associated with key statistics, which identify outlier families
with an unusual degree of pseudogenization.
■ The statistics also show how the number of genes and pseudogenes in families correlates across
various species.

29. SALAD - Surveyed contained motif ALignment diagram and the Associating
Dendrogram

● Official URL: http://salad.dna.affrc.go.jp/salad/en/

● What you can do: Perform a systematic comparison of proteome data among species.

● Highlights:
■ SALAD is a comparative genomics database from plant-genome-based proteome data sets.
■ 'SALAD on ARRAYs' is a viewer that displays arbitrary microarray data sets of paralogous genes
linked to the same dendrogram in a single window.
■ Evolutionarily conserved motifs were extracted by MEME software from 209,529 protein-sequence
annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum,
Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast.
■ Similarity clustering of each protein group was performed by pairwise scoring of the motif
patterns of the sequences. The SALAD database gives a graphical viewer that displays a motif
pattern diagram linked to the resulting bootstrapped dendrogram for each protein group.
■ Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif
combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain
pattern diagram are also available.

30. ShotgunFunctionalizeR - R-package for functional comparison of
metagenomes

● Official URL: http://shotgun.math.chalmers.se/

● What you can do: Analyze data from functional analysis on fragmented microbial genetic material.

● Highlights:
■ ShotgunFunctionalizeR contains tools for importing, annotating and visualizing metagenomic
data generated by shotgun high-throughput sequencing.
■ It includes many statistical procedures for assessing functional differences between samples,
both for individual genes and for entire pathways.
■ The tool is based on a Poisson model, which is highly flexible and hence applicable to a broad
range of different experimental designs.
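
To make the Poisson idea concrete, here is a minimal sketch of a textbook two-sample comparison. It is written in Python rather than R and is not taken from the package itself. It assumes that the number of reads assigned to one gene family in each sample is Poisson distributed with a rate proportional to the sample's total read count; conditional on the combined count, the count in the first sample is then binomial. The function name and the example numbers are hypothetical.

```python
from math import comb


def poisson_two_sample_pvalue(x1, n1, x2, n2):
    """Two-sided exact p-value for equal per-read rates in two samples.

    x1, x2: reads assigned to the gene family in sample 1 and sample 2.
    n1, n2: total number of reads in sample 1 and sample 2.
    """
    total = x1 + x2
    p = n1 / (n1 + n2)  # expected share of sample 1 under the null hypothesis
    # Binomial probabilities of every possible split of the combined count.
    pmf = [comb(total, k) * p**k * (1 - p) ** (total - k) for k in range(total + 1)]
    observed = pmf[x1]
    # Sum the probabilities of all splits that are at most as likely as the observed one.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical counts: 30 versus 5 reads in two samples of equal size.
p_value = poisson_two_sample_pvalue(30, 1000, 5, 1000)
```

The sketch handles only one gene family and two samples, and it is intended for small counts; it does not attempt to cover the broader experimental designs or the pathway-level analyses that the package description mentions.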

31. SnoopCGH - Comparative Genomic Hybridization software

● Official URL: http://snoopcgh.sourceforge.net/

● What you can do: Visualize and explore comparative genomic hybridization data sets.

● Highlights:
■ SnoopCGH is a software tool for structural variant (SV) analysis that uses data from array CGH
technologies and is also amenable to short-read sequence data.
■ Array-based comparative genomic hybridization (CGH) technology is used to discover and
validate genomic structural variation, including copy number variants, insertions, deletions, and
other structural variants (SVs).
■ The visualization and summarization of the array CGH data outputs, potentially across many
samples, is an important process in the identification and analysis of SVs.

32. SwissRegulon - a database of genome-wide annotations of regulatory sites

● Official URL: http://www.swissregulon.unibas.ch/

● What you can do: Search for genome-wide annotations of regulatory sites in yeast and prokaryotic
genomes.

● Highlights:
■ This database contains genome-wide annotations of regulatory sites in the intergenic regions of
genomes.
■ The annotations are produced using a number of recently developed PhyloGibbs algorithms that
operate on multiple alignments of orthologous intergenic regions from related genomes in
combination with, whenever available, known sites from the literature, and ChIP-on-chip binding
data.
■ It provides information about the sequence, location, orientation, posterior probability, and,
whenever available, a binding factor of each annotated site.
■ This database can be queried based on any annotated genomic feature and for regulons.

33. The CGView Server - a comparative genomics tool for circular genomes

● Official URL: http://stothard.afns.ualberta.ca/cgview_server/

● What you can do: Generate graphical maps of circular genomes that show sequence features, base
composition plots, analysis results, and sequence similarity plots.

● Highlights:
■ Sequences can be supplied in raw, FASTA, GenBank or EMBL format.
■ The server uses BLAST to compare the primary sequence to up to three comparison genomes or
sequence sets.
■ The BLAST results and feature information are converted to a graphical map showing the entire
sequence, or an expanded and more detailed view of a region of interest.
■ Additional feature or analysis information can be submitted in the form of GFF (General Feature
Format) files.
■ Several options are included to control which types of features are displayed and how the features
are drawn.
■ The CGView Server can be used to visualize features associated with any bacterial, plasmid,
chloroplast, or mitochondrial genome, and can aid in the identification of conserved genome
segments, instances of horizontal gene transfer, and differences in gene copy number.
■ Because a collection of sequences can be used in place of a comparison genome, maps can also
be used to visualize regions of a known genome covered by newly obtained sequence reads.
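
As an illustration of the kind of feature information a GFF file carries, the following Python sketch parses one line of the standard nine-column, tab-separated GFF layout. The exact columns and conventions expected by the CGView Server are not stated here and should be checked against its documentation; the `GffFeature` type, the `parse_gff_line` helper, and the example record are hypothetical.

```python
from typing import NamedTuple


class GffFeature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: str


def parse_gff_line(line):
    """Parse one tab-separated GFF line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    # Coordinates are 1-based and inclusive in GFF.
    return GffFeature(seqname, source, feature, int(start), int(end),
                      score, strand, frame, attributes)


# Hypothetical feature record, for illustration only.
example = "plasmid1\tmanual\tCDS\t1200\t2450\t.\t+\t0\tgene=hypothetical_1"
feat = parse_gff_line(example)
feature_length = feat.end - feat.start + 1
```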

34. The ERGO -- Genome analysis and discovery system

● Official URL: http://ergo.integratedgenomics.com/ERGO/

● What you can do: Conduct a comprehensive analysis of genes and genomes.

● Highlights:
■ This genome analysis and discovery suite is an integration of biological data from genomics,
biochemistry, high-throughput expression profiling, genetics, and peer-reviewed journals.
■ ERGO combines pattern-based analysis with comparative genomics by visualizing genes within
the context of regulation, expression profiling, phylogenetic clusters, fusion events, networked
cellular pathways, and chromosomal neighborhoods of other functionally related genes.
■ The outcome of this multifaceted approach is an extensively curated database offering the
largest available integration of genomes, with a vast collection of reconstructed cellular pathways
spanning all domains of life.
■ The current version (2004) of the ERGO database contains 618 complete or nearly complete
genomes, of which 319 are Bacteria, 116 Eukarya, 34 Archaea, and 149 Viruses.
■ ERGO is not available to the general public; access is provided only by subscription.

35. VISTA - Computational Tools for Comparative Genomics

● Official URL: http://www-gsd.lbl.gov/vista/

● What you can do: Comprehensive suite of programs and databases for comparative analysis of
genomic sequences.

● Highlights:
■ The VISTA portal for comparative genomics provides biomedical scientists with a unified set of
tools that lead from raw DNA sequences through alignment and annotation to visualization of the
results.
■ The VISTA portal also hosts the alignments of a number of genomes computed by the VISTA
developers, enabling users to study regions of interest without having to manually download the
individual sequences.

36. eggNOG -- evolutionary genealogy of genes: Non-supervised Orthologous
Groups

● Official URL: http://eggnog.embl.de/

● What you can do: Discover orthologous groups of genes.

● Highlights:
■ Contains orthologous groups constructed from Smith-Waterman alignments through the
identification of reciprocal best matches and triangular linkage clustering.
■ eggNOG covers 2,242,035 proteins (built from 2,590,259 proteins) and provides a broad functional
description for at least 1,966,709 (88%) of them.
■ Applying this procedure to 630 complete genomes (529 bacteria, 46 archaea, and 55 eukaryotes),
which is a 2-fold increase relative to the previous version, yielded 224,847 OGs, including 9724
extended versions of the original COG and KOG.
■ OGs were computed for different levels of the tree of life; in addition to the species groups
included in the first release (i.e. fungi, metazoa, insects, vertebrates, and mammals), OGs have
been constructed for archaea, fishes, rodents, and primates.
■ The non-supervised orthologous groups (NOGs) were automatically annotated with functional
descriptions, protein domains, and functional categories as defined initially for the COG/KOG
database.
■ In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and
maximum-likelihood trees for each of the available OGs.
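
The reciprocal-best-match step mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not eggNOG's pipeline: it assumes alignment scores are already available as dictionaries keyed by (query, subject), it covers only two genomes, and it omits the triangular linkage clustering that joins matches into groups. All names and scores are hypothetical.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring subject protein."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical alignment scores between proteins of genome A and genome B.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 180, ("a3", "b2"): 200}
b_vs_a = {("b1", "a1"): 245, ("b2", "a3"): 205, ("b2", "a2"): 175}
rbm = reciprocal_best_matches(a_vs_b, b_vs_a)
```

Here `a2` is excluded because its best hit `b2` prefers `a3`, which is the situation a reciprocal criterion is meant to filter out.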

37. metaTIGER - a metabolic gene evolution resource

● Official URL: http://www.bioinformatics.leeds.ac.uk/metatiger/

● What you can do: Find metabolic networks and phylogenomic information on a taxonomically diverse
range of eukaryotes.

● Highlights:
■ metaTIGER uses genomic information from 121 eukaryotes and 404 prokaryotes and sensitive
sequence search techniques to predict the presence of metabolic enzymes.
■ These enzyme sequences were used to create a comprehensive database of 2257
maximum-likelihood phylogenetic trees, some containing over 500 organisms.
■ The trees can be viewed using iTOL, an advanced interactive tree viewer, enabling straightforward
interpretation of large trees.
■ metaTIGER is demonstrated through evolutionary analysis of Plasmodium, including identification
of genes horizontally transferred from chlamydia.
■ Complex high-throughput tree analysis is also available through user-defined queries, allowing the
rapid identification of trees of interest, e.g. containing putative HGT events.
■ metaTIGER also provides novel and easy-to-use facilities for viewing and comparing the
metabolic networks in different organisms via highlighted pathway images and tables.

2. General genomics databases and tools


1. ABCdb - Archaeal and Bacterial ABC transporter database

● Official URL: http://www-abcdb.biotoul.fr/

● What can you do: Search for comprehensive information on ATP-binding cassette (ABC) transporters in
archaeal and bacterial genomes.

● Highlights:
■ Additional query tools have been developed for the analysis of the ABC family from both
functional and evolutionary perspectives.
■ High quality of annotation is achieved by manual verification of the predictions.
■ Cross-reference to the transport classification system is used to predict the type of compound
transported.
■ ABCdb is an online resource for ABC transporter repertoires from sequenced archaeal and
bacterial genomes.

2. CARGO - A web portal to integrate customized biological information

● Official URL: http://cargo.bioinfo.cnio.es

● What can you do: An open platform system that aims to facilitate the analysis of biological data,
including visualization, mapping, and literature retrieval.

● Highlights:
■ The tool is designed to be used by experimental biologists with no training in bioinformatics. In
its current state, the system presents a list of human cancer genes.
■ Through the use of small agents, called widgets, supported by a Rich Internet Application (RIA)
paradigm based on AJAX, CARGO provides pieces of minimal, relevant, and descriptive biological
information.
■ CARGO (Cancer And Related Genes Online) is a configurable biological web portal designed as a
tool to facilitate, integrate and visualize results from Internet resources, independently of their
native format or access method.
■ There is a massive quantity of information generated in the life sciences, and it is spread across
several databases and repositories.
■ Despite the broad availability of the information, there is a great demand for methods that are able
to look for, gather and display distributed data in a standardized and friendly way.

3. CEBS - Chemical Effects in Biological Systems

● Official URL: http://cebs.niehs.nih.gov

● What can you do: Public repository for toxicogenomics data.

● Highlights:
■ CEBS comprises data derived from studies of genetic alterations and of chemicals and is
compatible with environmental and clinical studies.
■ CEBS is designed to let users query the data by study conditions and subject responses and then,
having identified an appropriate set of subjects, move to the microarray module of CEBS to carry
out gene signature and pathway analysis.
■ Scope of CEBS: CEBS currently holds 22 studies of rats, 4 studies of mice, and 1 study of
Caenorhabditis elegans.
■ CEBS can additionally accommodate data from studies of human subjects.
■ Toxicogenomics studies currently in CEBS contain more than 4000 microarray hybridizations, and
75 2D gel images annotated with protein identification performed by MS/MS and MALDI.
■ CEBS comprises raw microarray data collected in accordance with MIAME guidelines and
provides tools for data selection, pre-processing, and analysis resulting in annotated lists of genes
of interest.
■ Also, histopathology and clinical chemistry findings from over 1500 animals are included in CEBS.
■ CEBS/BID: The BID (Biomedical Investigation Database) is another part of the CEBS system.
■ BID is a relational database used to load and curate study data before exporting to CEBS, in
addition to capturing and displaying novel data types such as PCR data, or additional fields of
interest, including those defined by the HESI Toxicogenomics Committee (in preparation).
■ BID can be accessed through the user interface from https://dir-apps.niehs.nih.gov/arc/.
■ Requests for a copy of BID and for depositing data into CEBS or BID are accessible at
http://www.niehs.nih.gov/cebs-df/.

4. DEG - A Database of Essential Genes

● Official URL: http://tubic.tju.edu.cn/deg/

● What can you do: Find information about genes essential to life in prokaryotes and eukaryotes.

● Highlights:
■ The number of eukaryotic essential genes has increased more than 5-fold: DEG 1.0 contained only
yeast genes, whereas DEG 5.0 also covers humans, mice, worms, fruit flies, zebrafish, and the
plant Arabidopsis thaliana.
■ In the most recent release, the number of prokaryotic essential genes in DEG has increased about
10-fold, mainly owing to genome-wide gene essentiality screens performed in a wide range of
bacteria.

5. EGassembler - online bioinformatics service for large-scale processing,
clustering, and assembling ESTs and genomic DNA fragments

● Official URL: http://egassembler.hgc.jp/

● What can you do: Align and merge sequence fragments resulting from shotgun sequencing or gene
transcript (EST) fragments in order to reconstruct the original segment or gene.

● Highlights:
■ It is a unique all-in-one online web service for large-scale clustering and assembly of ESTs and
genomic DNA.
■ EGassembler provides an automated as well as a user-customized analysis tool for cleaning,
repeat masking, vector trimming, organelle masking, clustering, and assembling of ESTs and
genomic fragments.

6. Entrez Genome

● Official URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome

● What can you do: Search for genomic sequences from completely sequenced organisms and those for
which sequencing is in progress.

● Highlights:
■ Acquire genomic sequences from sequenced organisms and organisms for which sequencing is
ongoing.

7. GGT - Graphical GenoTypes

● Official URL: http://www.plantbreeding.wur.nl

● What can you do: Software for visualization and analysis of genetic data.

● Highlights:
■ The current version has many options for genetic analysis of populations including diversity
analyses and simple association studies.
■ The GGT package was developed in a plant-breeding context and thus focuses on plant genetic
data but was not intended to be limited to plants only.

8. GenomeVx - circular chromosome visualization

● Official URL: http://wolfe.gen.tcd.ie/GenomeVx/

● What can you do: Simple web-based creation of editable circular chromosome maps.

● Highlights:
■ GenomeVx is a web-based tool for making editable, publication-quality maps of chloroplast and
mitochondrial genomes and of large plasmids.
■ These maps show the position scales, chromosomal features, and location of genes.
■ The program takes as input either raw feature positions or a GenBank record.
■ In the latter case, features are automatically extracted and colored, an example of which is given.
■ Output is in the Adobe Portable Document Format (PDF) and can be modified by programs such
as Adobe Illustrator.

9. Genomes OnLine Database (GOLD) - genomic and metagenomic projects and
their associated metadata

● Official URL: http://www.genomesonline.org/

● What can you do: Access information regarding complete and ongoing genome and metagenome
projects around the world.

● Highlights:
■ The Genomes On Line Database (GOLD) is an extensive resource for centralized monitoring of
genome and metagenome projects globally.
■ Both ongoing and complete projects, along with their associated metadata, can be accessed in
GOLD via precomputed tables and a search page.
■ As of September 2009, GOLD comprises information for over 5800 sequencing projects, of which
1100 have been finished and their sequence data stored in a public repository.
■ GOLD continues to expand, moving toward the goal of providing the most extensive repository of
metadata information pertaining to the projects and their environments/organisms in accordance
with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

10. Genomes at the EBI

● Official URL: http://www.ebi.ac.uk/genomes/

● What can you do: Search EBI's collection of databases for the analysis of complete and
unfinished viral, prokaryotic, and eukaryotic genomes.

● Highlights:
■ This website permits access to a large number of complete and unfinished genomes, including
primary sequence data and secondary data derived from different analyses.

11. IMG - Integrated microbial genomes system

● Official URL: http://img.jgi.doe.gov/

● What can you do: Search and analyze microbial genomes from a comprehensive database by the
DOE-Joint Genome Institute (JGI).

● Highlights:
■ The integrated microbial genomes (IMG) system functions as a community resource for
comparative analysis of publicly available genomes in an extensive integrated context.
■ IMG contains both complete and draft microbial genomes integrated with other publicly available
genomes from all three domains of life, along with a large number of viruses and plasmids.
■ It offers tools and viewers for reviewing and analyzing the annotations of genes and genomes in a
comparative context.
■ From its first release in 2005, IMG's analytical capabilities and data content have been steadily
expanded through regular releases.
■ Many companion IMG systems have been established in order to serve domain-specific needs,
such as expert review of genome annotations.

12. IMG/M - Integrated Microbial Genomes/Metagenomes

● Official URL: http://img.jgi.doe.gov/m

● What can you do: A data management and analysis system for metagenomes.

● Highlights:
■ IMG/M provides IMG's comparative data analysis tools extended to handle metagenome data,
together with metagenome-specific analysis tools.
■ IMG/M consists of metagenome data integrated with isolated microbial genomes from the
Integrated Microbial Genomes (IMG) system.
■ IMG/M is a data management and analysis system for microbial community genomes
(metagenomes) hosted at the Department of Energy's (DOE) Joint Genome Institute (JGI).

13. Invertebrate Homologous Genes Database

● Official URL: http://pbil.univ-lyon1.fr/databases/hoinvgen/HOINVGEN.html

● What can you do: Search for information on homologous invertebrate genes.

● Highlights:
■ INVertebrate HOmologous GENes (INVHOGEN) is a database that combines the available invertebrate
protein genes from UniProt (consisting of Swiss-Prot and TrEMBL) into gene families.
■ For each family, INVHOGEN offers several protein alignments, a maximum likelihood-based
phylogenetic tree, and taxonomic information regarding the sequences. It is possible to download
the corresponding GenBank flat files, the alignment, and the tree in Newick format.
■ Sequences and pertaining information have been structured in an ACNUC database under a
client/server architecture. Hence, complex selections can be carried out.
■ An external graphical tool (FamFetch) permits access to the data to assess homology
relationships between genes and differentiate orthologous from paralogous sequences.

14. KEGG - resource for deciphering the genome

● Official URL: http://www.genome.jp/kegg

● What can you do: Search a database of biological systems that integrates genomic, chemical, and
systemic functional information.

● Highlights:
■ KEGG offers a reference knowledge base for linking genomes to life via PATHWAY mapping, that
is, mapping the gene content of a genome or transcriptome to KEGG reference pathways to infer
systemic behaviors of the organism or the cell.
■ Apart from this, KEGG offers a reference knowledge base for bridging genomes to the
environment, such as for the evaluation of drug-target relationships, via the process of BRITE
mapping.
■ KEGG BRITE is an ontology database representing functional hierarchies of different biological
objects, such as drugs, diseases, organisms, cells, and molecules, and the relationships between them.
■ KEGG PATHWAY is currently supplemented with a new global map of metabolic pathways, which
is essentially a blended map of about 120 existing pathway maps.
■ Besides this, smaller pathway modules are defined and stored in KEGG MODULE, which also
contains other complexes and functional units.
■ The KEGG resource is being expanded to meet the needs for practical applications.
■ KEGG DRUG comprises all approved drugs in Japan and the US, and KEGG DISEASE is a new
database bridging diagnostic markers, drugs, pathways, and disease genes.

15. KaryotypeDB - Karyotype and chromosome data for animal and plant species

● Official URL: http://www.nenno.it/karyotypedb/

● What can you do: Search for karyotype and chromosome data for animal and plant species.

● Highlights:
■ The Karyotype database (KaryotypeDB) comprises chromosome and karyotype information such
as chromosome number, length, karyotype features, idiograms, physical localization of DNA
sequences by fluorescence in situ hybridization (FISH), and cell material for metaphase
chromosomes and polytene chromosomes from different animal and plant species, together with
literature references and links.

16. MBGD - Microbial genome database for comparative analysis

● Official URL: http://mbgd.genome.ad.jp/

● What can you do: Conduct a comparative analysis of completely sequenced microbial genomes.

● Highlights:
■ Some analysis functions, such as the function to find orthologs with similar phylogenetic patterns,
have also been improved.
■ To utilize the MBGD database as a comprehensive resource for investigating microbial genome
diversity, the following advanced functionalities have been developed: (i) enhanced assignment of
functional annotation, including external database links to each orthologous group, (ii) interface
for choosing a set of genomes to compare based on phenotypic properties, (iii) the addition of
more eukaryotic microbial genomes (fungi and protists) and some higher eukaryotes as
references and (iv) enhancement of the MyMBGD mode, which allows users to add their own
genomes to MBGD and now accepts raw genomic sequences without any annotation (in such a
case, it runs a gene-finding procedure before identifying the orthologs).
■ The database contains almost 1000 genomes.
■ A prominent feature of MBGD is that it allows users to create ortholog groups using a specified
subgroup of organisms.
■ The microbial genome database (MBGD) for comparative analysis is a platform for microbial
comparative genomics based on automated ortholog group identification.
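
The idea of finding orthologs with similar phylogenetic patterns, mentioned in the highlights above, can be sketched as follows. A phylogenetic pattern is treated here as the set of genomes in which an ortholog group is present, and similarity is measured with the Jaccard index. Both choices are assumptions made for illustration, since the measure MBGD itself uses is not stated here, and the group names and genome codes are hypothetical.

```python
def jaccard(pattern_a, pattern_b):
    """Jaccard similarity of two presence patterns (sets of genome codes)."""
    a, b = set(pattern_a), set(pattern_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similar_groups(query, patterns, threshold=0.75):
    """Return (group, similarity) for groups whose pattern resembles the query's."""
    query_pattern = patterns[query]
    hits = [
        (group, jaccard(query_pattern, pattern))
        for group, pattern in patterns.items()
        if group != query
    ]
    # Keep hits at or above the threshold, most similar first, ties by name.
    return sorted(
        [(group, score) for group, score in hits if score >= threshold],
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical ortholog groups and the genomes in which each is present.
patterns = {
    "OG1": {"eco", "bsu", "sau", "mtu"},
    "OG2": {"eco", "bsu", "sau", "mtu"},
    "OG3": {"eco", "bsu", "sau"},
    "OG4": {"hpy"},
}
hits = similar_groups("OG1", patterns)
```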

17. MetaLook - a 3D visualization software for marine ecological genomics

● Official URL: http://www.megx.net/metalook/

● What can you do: For visualization and analysis of marine ecological genomic and metagenomic data
with respect to habitat parameters.

● Highlights:
■ MetaLook provides a 3-D user interface to interactively visualize DNA sequences on a world map,
based on a centralized georeferenced database.
■ The user can define environmental containers to organize the sequences according to different
habitat criteria.
■ To find similar sequences, the containers can be queried with either genes from the georeferenced
database or user-imported sequences, using the BLAST algorithm.
■ This permits an interactive analysis of the distribution of gene functions in the environment.

18. NCBI Genomic Biology

● Official URL: http://www.ncbi.nlm.nih.gov/Genomes/

● What can you do: Provides several genomic biology tools and resources, including organism-specific
pages that include links to many websites and databases relevant to that species.

● Highlights:
■ Organisms: Arabidopsis, Aspergillus, Bee, Beetle, Cat, Chicken, Chimpanzee, Cow, Dictyostelium,
Dog, Frog, Fruit Fly, Horse, Human, Malaria, Mosquito, Mouse, Nematode, Opossum, Pig, Rabbit,
Rat, Rhesus macaque, Sea Urchin, Sheep, Yeast (Saccharomyces), Zebra Finch, Zebrafish.

19. OriDB - a DNA replication origin database

● Official URL: http://www.oridb.org

● What can you do: Search for both confirmed and predicted S. cerevisiae DNA replication origin sites.

● Highlights:
■ OriDB offers a web-based catalog of confirmed and predicted S. cerevisiae DNA replication
origin sites.
■ The record for each site comprises the following details: genomic location and chromosome
context of the origin site; time of origin replication; DNA sequence of proposed or experimentally
confirmed origin elements; free energy required to open the DNA duplex (stress-induced DNA
duplex destabilization or SIDD); and phylogenetic conservation of sequence elements.
■ Origin sites are linked to several external resources, like the Saccharomyces Genome Database
(SGD) and relevant publications at PubMed.
■ Finally, a Chromosome Viewer utility permits users to interactively generate graphical
representations of DNA replication data genome-wide.
■ As of 2006, the sites are confined to budding yeast (S. cerevisiae).

20. OrthoDB - the hierarchical catalog of eukaryotic orthologs

● Official URL: http://cegg.unige.ch/orthodb

● What can you do: Find groups of orthologous genes.
● Highlights:
■ A COG-like and Inparanoid-like ortholog delineation procedure was used on the basis of
all-against-all Smith-Waterman sequence comparisons to analyze 58 eukaryotic genomes,
focusing on vertebrates, insects and fungi to facilitate further comparative studies.
■ Catalogs groups of orthologous genes in a hierarchical manner, at each radiation of the species
phylogeny, from more general groups to more fine-grained delineations between closely related
species.

21. OrthoMCL-DB - querying a comprehensive multi-species collection of ortholog groups

● Official URL: http://orthomcl.org/orthomcl/

● What can you do: Search for predicted ortholog groups from 55 species.
● Highlights:
■ OrthoMCL-DB provides a centralized warehouse for orthology prediction among multiple species.
■ OrthoMCL software, the entire FASTA dataset employed and clustering results are available for
download.
■ Information for ortholog groups includes the phyletic profile, the list of member proteins and a
multiple sequence alignment, a statistical summary and graphical view of similarities, and a
graphical representation of domain architecture.
■ The ortholog database may be queried based on protein or group accession numbers, keyword
descriptions or BLAST similarity.
■ OrthoMCL software was used to cluster proteins based on sequence similarity, using an
all-against-all BLAST search of each species' proteome, followed by normalization of inter-species
differences, and Markov clustering.
■ A total of 511,797 proteins (81.6% of the total dataset) were clustered into 70,388 ortholog
groups.
■ The OrthoMCL database houses ortholog group predictions for 55 species, including 16 bacterial
and 4 archaeal genomes representing phylogenetically diverse lineages, and most currently
available complete eukaryotic genomes: 24 unikonts (12 animals, 9 fungi, microsporidium,
Dictyostelium, Entamoeba), 4 plants or algae, and 7 apicomplexan parasites.
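
The Markov clustering step mentioned in the highlights above can be illustrated on a toy example. The following is a minimal pure-Python sketch of the general Markov cluster (MCL) iteration, alternating expansion and inflation. It is not the OrthoMCL code: the similarity matrix, the `inflation` value, the iteration count, and all function names are illustrative assumptions, and real input would be normalized all-against-all BLAST scores.

```python
def normalise_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def markov_cluster(similarity, inflation=2.0, iterations=30):
    """Cluster items from a symmetric similarity matrix with a basic MCL loop."""
    n = len(similarity)
    # Add self-loops so that the random walk does not oscillate.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalise the columns.
        m = normalise_columns([[v ** inflation for v in row] for row in m])
    # Each non-empty row lists the members of one cluster.
    clusters = {frozenset(j for j, v in enumerate(row) if v > 1e-6)
                for row in m}
    return sorted(sorted(c) for c in clusters if c)


# Toy similarity matrix: proteins 0-2 resemble each other, as do proteins 3-4.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(markov_cluster(toy))
```

Higher inflation values generally give finer-grained clusters, which is why this parameter is usually tuned in practice.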

22. PHIDIAS - a pathogen-host interaction data integration and analysis system

● Official URL: http://www.phidias.us/

● What can you do: Information related to pathogen-host interactions.

● Highlights:
■ PHIDIAS is a web-based database system that operates as a centralized source to search,
compare, and analyze integrated gene expression data, conserved domain, and genome
sequences related to pathogen-host interactions (PHIs) for pathogen species designated as high
priority agents for biological security and public health.
■ Moreover, PHIDIAS permits the submission, search, and analysis of PHI genes and molecular
networks curated from peer-reviewed literature.

23. PSI SGKB - The Protein Structure Initiative Structural Genomics Knowledgebase

● Official URL: http://kb.psi-structuralgenomics.org/

● What can you do: Designed to turn the products of the Protein Structure Initiative into knowledge that is
important for understanding living systems and disease.

● Highlights:
■ In collaboration with the Nature Publishing Group, the PSI SGKB provides a research library,
editorials about new research advances, news, and an events calendar to present a broader view
of structural biology and structural genomics.
■ It also offers the ability to search all of the structural and methodological publications and the
innovative technologies that were catalyzed by the PSI's high-throughput research efforts.
■ This resource provides central access to structures in the Protein Data Bank (PDB), along with
functional annotations, associated homology models, worldwide protein target tracking
information, available protocols, and the potential to obtain DNA materials for many of the
targets.

24. PhylomeDB - a database for genome-wide collections of gene phylogenies

● Official URL: http://phylomedb.org/

● What can you do: Database of complete phylomes derived for different genomes within a specific
taxonomic range.

● Highlights:

■ The current version of PhylomeDB includes the phylomes of humans, the yeast Saccharomyces
cerevisiae, and the bacterium Escherichia coli, comprising a total of 32 289 seed sequences with
their corresponding alignments and 172 324 phylogenetic trees.
■ For each genome, PhylomeDB provides the alignments, phylogenetic trees, and tree-based
orthology predictions for every single encoded protein.
■ All phylomes in the database are built using a high-quality phylogenetic pipeline that includes
evolutionary model testing and alignment trimming phases.

25. Pseudofam - the pseudogene families database

● Official URL: http://pseudofam.pseudogene.org/

● What can you do: A database of pseudogene families based on the protein families from the Pfam
database.

● Highlights:
■ Pseudofam offers resources for evaluating the family structure of pseudogenes like sequence
alignments, statistical summaries, and query tools.
■ It contains over 125,000 pseudogenes identified from 10 eukaryotic genomes and aligned within
nearly 3000 families (approximately one-third of the total families in PfamA).
■ Pseudofam utilizes a large-scale parallelized homology search algorithm (implemented as an
extension of the PseudoPipe pipeline) to identify pseudogenes.
■ Each identified pseudogene is assigned to its parent protein family and subsequently aligned to
each other by transferring the parent domain alignments from the Pfam family.
■ Pseudogenes are also given additional annotations depending on an ontology, reflecting their
mode of creation and subsequent history.
■ The annotation emphasizes the relation of pseudogene families with genomic features, such as
segmental duplications.
■ Besides this, pseudogene families are associated with key statistics, which identify outlier
families with an unusual degree of pseudogenization.
■ The statistics additionally indicate how the number of genes and pseudogenes in families
correlates across various species.

26. Pseudogene.org - a comprehensive database and comparison platform for pseudogene annotation

● Official URL: http://pseudogene.org/

● What can you do: Search for annotated information on pseudogenes from a comprehensive collection.

● Highlights:
■ The Pseudogene.org knowledgebase functions as a comprehensive repository for pseudogene
annotation.
■ It combines a variety of heterogeneous resources and backs a subset structure that emphasizes
certain groups of pseudogenes.
■ Tools are offered for the comparison of sets and the creation of layered set unions, enabling
researchers to derive a current 'consensus' set of pseudogenes.
■ As of 2006, the database comprises over 100,000 pseudogenes spanning 64 prokaryote and 11
eukaryote genomes, as well as a collection of human annotations compiled from 16 sources.

27. Quadbase - a database of quadruplex motifs

● Official URL: http://quadbase.igib.res.in/

● What can you do: Genome-wide database of G4 DNA--occurrence and conservation in human,
chimpanzee, mouse, and rat promoters and 146 microbes.

● Highlights:
■ QuadBase is a compendium of quadruplex motifs, with a specific focus on their occurrence and
conservation in promoters.
■ It is composed of two components (EuQuad and ProQuad).
■ EuQuad provides information on quadruplex motifs present within 10 kb of transcription start
sites in 99 980 human, chimpanzee, rat, and mouse genes.
■ ProQuad comprises quadruplex details of 146 prokaryotes.
■ Besides gene-specific searches for quadruplex motifs, QuadBase has a number of other modules.
■ 'Orthologs Analysis' queries for conserved motifs across species depending on a chosen
reference organism; 'Pattern Search' can be employed to get specific motifs of interest from a
selected organism with user-defined criteria for quadruplex motifs, i.e. stem, loop size, etc.;
and the 'Pattern Finder' tool can look for motifs in any given sequence.
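
The idea of searching a sequence for quadruplex motifs with user-defined stem and loop sizes can be sketched with a regular expression. This is an illustration only, not QuadBase's implementation: the default ranges (stems of 3 to 5 guanines, loops of 1 to 7 bases) and the function name are assumptions, and the sketch scans only the given strand.

```python
import re


def find_quadruplex_motifs(sequence, stem=(3, 5), loop=(1, 7)):
    """Return (start, end, motif) for non-overlapping G4-like motifs.

    A motif is four G-runs (stems) separated by three loops; `stem` and
    `loop` are (min, max) lengths chosen by the user.
    """
    stem_re = "G{%d,%d}" % stem
    loop_re = "[ACGT]{%d,%d}" % loop
    pattern = re.compile("%s(?:%s%s){3}" % (stem_re, loop_re, stem_re))
    return [(m.start(), m.end(), m.group())
            for m in pattern.finditer(sequence.upper())]


# Example: four GGG runs separated by TTA loops.
print(find_quadruplex_motifs("TTAGGGTTAGGGTTAGGGTTAGGGTT"))
```

To scan the complementary strand as well, the same function could be applied to the reverse complement of the input sequence.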

28. THOR - targeted high-throughput ortholog reconstructor

● Official URL: http://www.bcgsc.ca/platform/bioinfo/software/thor

● What can you do: Designed to assemble target genomic sequence orthologs in low-coverage genomes.

● Highlights:

■ Low-coverage genomes (LCGs) are becoming an increasingly pertinent source of data for
phylogenetic research.
■ Nonetheless, the assembly of these genomes is difficult, time-consuming, and lags behind
sequence generation.
■ THOR is a rapid, stringent application for targeted reconstruction of sequence orthologs in
unassembled LCGs.
■ With a 4x coverage set of mouse whole-genome sequence reads, THOR could partially or
completely reconstruct 416/1000 human promoter ortholog regions in around 7.3 min/promoter.
■ THOR's reconstruction rate improves markedly with both higher coverage and less divergent
target species.

29. TIGR (The Institute for Genomic Research) Microbial Database

● Official URL: http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi

● What can you do: Retrieve genetic information of published microbial genomes and chromosomes and
those in progress.

● Highlights:
■ The CMR (Comprehensive Microbial Resource) contains analysis on completed microbial genome
sequencing.
■ It offers a set of curated databases containing DNA and protein sequence, gene expression,
cellular role, protein family, and taxonomic data for microbes, plants, and humans.

30. The European Bioinformatics Institute's data resources - towards systems biology

● Official URL: http://www.ebi.ac.uk/

● What can you do: Search and use an extensive collection of molecular biology databases and tools.

● Highlights:
■ The European Bioinformatics Institute's (EBI's) tools and databases have transformed to fulfill the
changing demands of molecular biologists: new databases covering protein-protein
interactions (IntAct), pathways (Reactome), and small molecules (ChEBI) have been introduced.
■ Existing core databases have continued to evolve to meet the changing needs of biomedical
researchers, and have developed new data-access tools that assist biologists to move intuitively
through the various data types, thereby assisting them to put the parts together to perceive
biology at the systems level.

■ In the most recent update, a description of how the EMBL-EBI's biomolecular databases are
evolving to cope with increasing levels of submission, a growing and diversifying user base, and
the demand for new types of data is provided.

31. The National Microbial Pathogen Database Resource (NMPDR) - a genomics platform based on subsystem annotation

● Official URL: http://www.nmpdr.org/

● What can you do: Search for annotated genomic information on pathogenic bacteria.

● Highlights:
■ The National Microbial Pathogen Data Resource (NMPDR) is a National Institute of Allergy and
Infectious Disease (NIAID)-funded Bioinformatics Resource Center that supports research in
certain Category B pathogens.
■ NMPDR comprises the entire genomes of around 50 strains of pathogenic bacteria as well as
>400 other genomes.
■ It combines whole, public genomes with expertly curated biological subsystems to provide the
most consistent genome annotations.
■ NMPDR offers an extensive bioinformatics platform, with tools and viewers for genome analysis.
■ NMPDR tools include Signature Genes, which identifies the set of genes that two groups of
organisms have in common or that differentiates them.
■ Drug target identification and high-throughput, in silico, compound screening are in progress.

32. The Plant DNA C-values Database

● Official URL: http://data.kew.org/cvalues/homepage.html

● What can you do: Search for information on plant DNA C-values and genome sizes.

● Highlights:
■ It combines data from the Angiosperm DNA C-values Database (release 6.0, Oct. 2005), the
Gymnosperm DNA C-values Database (release 3.0, Dec. 2004), the Pteridophyte DNA C-values
Database (release 3.0, Dec. 2004), and the Bryophyte DNA C-values Database (release 2.0, Dec. 2004),
together with the addition of the Algae DNA C-values Database (release 1.0, Dec. 2004).
■ The Plant DNA C-values Database currently contains data for 5150 different plant species.

33. TransportDB - A Relational Database of Cellular Membrane Transport
Systems

● Official URL: http://www.membranetransport.org/

● What can you do: Predict cellular membrane transport proteins in organisms whose complete genome
sequences are available.

● Highlights:
■ It permits BLAST search against known transporter protein sequences, comparison of transport
systems from different organisms, and phylogenetic trees of individual transporter families.
■ For each organism, the complete set of membrane transport systems was found and categorized
into several types and families based on putative membrane topology, protein family,
bioenergetics, and substrate specificities.

34. TreeBASE - A database of phylogenetic knowledge

● Official URL: http://treebase.org/treebase-web/home.html

● What can you do: Search for phylogenetic trees and the data matrices used to generate them from
published research papers.

● Highlights:
■ TreeBASE is a relational database devised to manage and explore information on phylogenetic
relationships.
■ Its main role is to record published phylogenetic trees and data matrices. It additionally includes
bibliographic information on phylogenetic studies, and some information on taxa, characters,
algorithms used, and analyses performed.
■ The database is devised to permit the retrieval and recombination of trees and data from multiple
studies, and it can be studied interactively with trees provided in the database.

35. Visualization for genomics - the Microbial Genome Viewer

● Official URL: http://mgv2.cmbi.ru.nl/genome/index.html

● What can you do: Interactively visualize microbial genomes and generate high-quality scalable images.

● Highlights:

■ Microbial Genome Viewer is a web-based visualization tool that permits the user to integrate
intricate genomic data in a highly interactive manner.
■ It allows the interactive production of chromosome wheels and linear genome maps from
genome annotation data stored in a MySQL database.
■ The produced images are in scalable vector graphics (SVG) format, which is suitable for making
dynamic Web representations and high-quality scalable images.
■ Gene-related data like transcriptome and time-course microarray experiments can be
superimposed on the maps for visual inspection.
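
As a rough illustration of how a chromosome wheel can be rendered as SVG from annotation records, the sketch below maps gene coordinates on a circular genome to arc paths. It is not the Microbial Genome Viewer code: the function names, the `(name, start, end)` record layout, and the styling are assumptions, and the records are passed in directly rather than read from a MySQL database. Genes are assumed to have start < end and not to span the origin.

```python
import math
from xml.sax.saxutils import escape


def point_on_circle(position, genome_length, cx, cy, radius):
    """Map a genome coordinate to (x, y); position 0 sits at 12 o'clock."""
    angle = 2 * math.pi * position / genome_length
    return cx + radius * math.sin(angle), cy - radius * math.cos(angle)


def chromosome_wheel(genes, genome_length, size=400):
    """Return an SVG document with one clockwise arc per gene."""
    cx = cy = size / 2
    radius = size * 0.4
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">'
        % (size, size),
        '<circle cx="%g" cy="%g" r="%g" fill="none" stroke="#cccccc"/>'
        % (cx, cy, radius),
    ]
    for name, start, end in genes:
        x1, y1 = point_on_circle(start, genome_length, cx, cy, radius)
        x2, y2 = point_on_circle(end, genome_length, cx, cy, radius)
        # SVG needs to know whether the arc covers more than half the circle.
        large_arc = 1 if (end - start) > genome_length / 2 else 0
        parts.append(
            '<path d="M %.2f %.2f A %g %g 0 %d 1 %.2f %.2f" fill="none" '
            'stroke="steelblue" stroke-width="8"><title>%s</title></path>'
            % (x1, y1, radius, radius, large_arc, x2, y2, escape(name))
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical annotation records: (name, start, end) on a 5,000 bp genome.
svg = chromosome_wheel([("geneA", 0, 1000), ("geneB", 2500, 3500)], 5000)
print(svg)
```

Because the output is plain SVG text, it scales without loss of quality and can be embedded directly in a web page.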

36. dbGaP - a Database of Genome-Wide Association Studies

● Official URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gap

● What can you do: Search for data from genome-wide association (GWA) studies.

● Highlights:
■ dbGaP, the database of Genotypes and Phenotypes, offers for the first time a central place for
interested parties to view all study documentation and to see summaries of the measured
variables in a structured and searchable web format.
■ The database additionally offers pre-computed analyses of the level of the statistical association
between genes and selected phenotypes.
■ Genotype data are procured by employing high-throughput genotyping arrays to check subjects'
DNA for single nucleotide polymorphisms (SNPs), areas of the genome that have been discovered
to differ among humans.
■ The first release of dbGaP (Dec. 2006) comprises data on two studies: the Age-Related Eye
Diseases Study (AREDS), and the National Institute of Neurological Disorders and Stroke
Parkinsonism Study, a case-controlled study that collected detailed phenotypic data, cell line
samples, and DNA of 2,573 subjects.

37. eggNOG - evolutionary genealogy of genes: Non-supervised Orthologous Groups

● Official URL: http://eggnog.embl.de

● What can you do: Discover orthologous groups of genes.

● Highlights:
■ Contains orthologous groups constructed from Smith-Waterman alignments through the
detection of reciprocal best matches and triangular linkage clustering.

■ Applying this protocol to 630 complete genomes (529 bacteria, 46 archaea, and 55
eukaryotes), which is a 2-fold increase compared to the previous version, resulted in 224,847 OGs,
including 9724 extended versions of the original COG and KOG.
■ OGs were computed for different levels of the tree of life; in addition to the species groups
included in the first release (i.e. fungi, metazoa, insects, vertebrates, and mammals), OGs have
now been constructed for archaea, fishes, rodents, and primates.
■ The non-supervised orthologous groups (NOGs) were automatically annotated with functional
descriptions, protein domains, and functional categories as defined at the beginning for the
COG/KOG database.
■ In-depth analysis is mediated by precomputed high-quality multiple sequence alignments and
maximum-likelihood trees for each of the available OGs.
■ eggNOG contains 2,242,035 proteins (built from 2,590,259 proteins) and offers a broad functional
description for at least 1,966,709 (88%) of them.
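
The "reciprocal best matches" step named in the highlights above can be sketched as follows. This is a toy illustration, not the eggNOG pipeline: the score dictionaries stand in for Smith-Waterman alignment scores, the gene names are hypothetical, and the subsequent triangular linkage clustering is not implemented here.

```python
def best_hits(scores):
    """For each query gene, keep the target with the highest alignment score."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical alignment scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 90,
          ("a2", "b2"): 150, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 200, ("b1", "a3"): 120,
          ("b2", "a2"): 150, ("b2", "a1"): 90}
print(reciprocal_best_matches(a_vs_b, b_vs_a))
```

Here `a3` is left out: its best match is `b1`, but the best match of `b1` is `a1`, so the pair is not reciprocal.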

38. euGenes - a Eukaryota genome information system

● Official URL: http://eugenes.org/

● What can you do: Search for a common summary of eukaryotic genes and genomes.

● Highlights:
■ euGenes is a genome information database and system that offers a common summary of
eukaryotic genomes and genes.
■ The genomes added are zebrafish, mosquito, rat, Arabidopsis thaliana, Saccharomyces cerevisiae,
Caenorhabditis elegans, fruit fly, mouse, and human.
■ The summary includes the following details: Gene product information (homologies, structure,
and function), molecular & genetic map information, full name, and gene symbol.
■ Links to extended gene information.

39. mGene.web - Web service for accurate computational gene finding

● Official URL: http://www.mgene.org/web

● What can you do: Predict protein-coding genes from DNA sequences.

● Highlights:
■ mGene.web is a web service for the genome-wide prediction of protein-coding genes from
eukaryotic DNA sequences.

■ It provides pre-trained models for the identification of gene structures including untranslated
regions in an increasing number of organisms.
■ Users also have the option to train the system with their own data for other organisms.
■ The system is established in a highly modular way, such that individual parts of the framework,
such as the promoter prediction tool or the splice site predictor, can be employed autonomously.
■ The underlying gene finding system, mGene, is based on discriminative machine learning
techniques, and its high accuracy has been demonstrated in an international competition on
nematode genomes.
■ mGene.web is openly accessible and can be applied for eukaryotic genomes of small to moderate
size (several hundred Mbp).

40. The Fungal Genome Size Database

● Official URL: http://www.zbi.ee/fungal-genomesize/

● What can you do: Search for information on fungal haploid DNA C-values and genome sizes.

● Highlights:
■ This database is an extensive catalog of fungal genome size and haploid DNA content data,
derived from published studies.

3. Genome annotation terms, ontologies, nomenclature, and classification

1. BioPortal - Biology Portal to Biomedical Ontologies

● Official URL: http://bioportal.bioontology.org/

● What you can do: Find information about biomedical ontologies to drive data integration, information
retrieval, data annotation, natural-language processing, and decision support.

● Highlights:
■ BioPortal is an open repository of biomedical ontologies that provides access via Web services
and Web browsers to ontologies developed in OWL, RDF, OBO format, and Protégé frames.
■ Its functionality includes the ability to browse, search and visualize ontologies.

■ The Web interface also facilitates community-based participation in the evaluation and evolution
of ontology content by providing features to add notes to ontology terms, mappings between
terms, and ontology reviews based on criteria such as usability, domain coverage, quality of
content, and documentation and support.
■ It enables the integrated search of biomedical data resources such as the Gene Expression
Omnibus (GEO), ClinicalTrials.gov, and ArrayExpress, through the annotation and indexing of these
resources with ontologies in BioPortal.

2. DAVID - A Database for Annotation, Visualization, and Integrated Discovery

● Official URL: http://david.abcc.ncifcrf.gov/

● What you can do: Conduct comprehensive gene annotation, expression data analysis, biological
pathway mapping, and other functional genomics tasks.

● Highlights:
■ All tools in the DAVID Bioinformatics Resources aim to provide a functional interpretation of large
lists of genes derived from genomic studies.
■ The updated DAVID Bioinformatics Resources consists of the DAVID Knowledgebase and five
integrated, web-based functional annotation tool suites: the DAVID Gene Functional Classification
Tool, the DAVID Functional Annotation Tool, the DAVID Gene ID Conversion Tool, the DAVID Gene
Name Viewer, and the DAVID NIAID Pathogen Genome Browser.
■ The expanded DAVID Knowledgebase now integrates almost all major and well-known public
bioinformatics resources centralized by the DAVID Gene Concept, a single-linkage method to
agglomerate tens of millions of diverse gene/protein identifiers and annotation terms from a
variety of public bioinformatics databases.
■ For any uploaded gene list, the DAVID Resources now provides not only the typical gene-term
enrichment analysis but also new tools and functions that allow users to condense large gene
lists into gene functional groups, convert between gene/protein identifiers, visualize
many-genes-to-many-terms relationships, cluster redundant and heterogeneous terms into
groups, search for interesting and related genes or terms, dynamically view genes from their lists
on bio-pathways and more.

3. DAVID Knowledgebase - a backend database used for all DAVID bioinformatics tools

● Official URL:
http://david.abcc.ncifcrf.gov/content.jsp?file=/knowledgebase/DAVID_knowledgebase.html

● What you can do: A gene-centered database integrating heterogeneous gene annotation resources to
facilitate high-throughput gene functional analysis.

● Highlights:
■ The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method to
agglomerate tens of millions of gene/protein identifiers from various public genomic resources
into DAVID gene clusters.
■ This grouping of identifiers improves the cross-reference capability, particularly across NCBI and
UniProt systems, enabling more than 40 publicly available functional annotation sources to be
comprehensively integrated and centralized by the DAVID gene clusters.
■ The simple, pair-wise, text format files which make up the DAVID Knowledgebase are freely
downloadable for various data analysis uses.
■ Also, a well-organized web interface allows users to query different types of heterogeneous
annotations in a high-throughput manner.
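
Single-linkage agglomeration of identifiers, as described for the DAVID Gene Concept above, amounts to merging any two identifiers that are cross-referenced to each other, directly or through a chain of links. A minimal sketch using a union-find structure is shown below. It is not DAVID's implementation: the function name and the placeholder identifiers are assumptions for illustration.

```python
def cluster_identifiers(pairs):
    """Group identifiers connected by any chain of pairwise cross-references."""
    parent = {}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(members) for members in clusters.values())


# Placeholder cross-references; these are not real database accessions.
links = [
    ("refseq:X1", "uniprot:X1"),
    ("uniprot:X1", "ensembl:X1"),
    ("refseq:Y1", "uniprot:Y1"),
]
print(cluster_identifiers(links))
```

One known trade-off of single linkage is that a single erroneous cross-reference is enough to merge two otherwise unrelated clusters.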

4. Evigan - an automated gene annotation program for eukaryotic genomes

● Official URL: http://www.seas.upenn.edu/~strctlrn/evigan/evigan.html

● What you can do: A hidden variable model for integrating gene evidence for eukaryotic gene prediction.

● Highlights:
■ Evigan can accommodate a variety of evidence types, including (but not limited to) gene models
computed by diverse gene finders, BLAST hits, EST matches, and splice site predictions; learned
parameters encode the relative quality of evidence sources.
■ Since separate training data are not required (apart from the training sets used by individual gene
finders), Evigan is well suited to newly sequenced genomes where little or no reliable manually
curated annotation is available.
■ The ability to produce a ranked list of alternative gene models may facilitate the identification of
alternatively spliced transcripts.

5. FunSimMat - a comprehensive functional similarity database

● Official URL: http://funsimmat.bioinf.mpi-inf.mpg.de/

● What you can do: Database that provides several different semantic similarity measures for GO terms.

● Highlights:

■ The Functional Similarity Matrix is a comprehensive database providing various precomputed
functional similarity values for proteins in UniProtKB and protein families in Pfam and SMART.
■ Data from the Gene Ontology Annotation project has been added, as well as new functional
similarity measures.
■ The applicability of the database is greatly extended by implementing a new Gene Ontology-based
method for disease gene prioritization.
■ Two new visualization tools allow interactive analysis of the functional relationships between
proteins or protein families.
■ This is enhanced further by the introduction of an automatically derived hierarchy of annotation
classes.
■ Additional changes include a revised user front-end and a new RESTlike interface for improving
the user-friendliness and online accessibility of FunSimMat.

6. GALA - the database of Genome ALignments and Annotations

● Official URL: http://www.bx.psu.edu/

● What you can do: Search annotated sequence alignment from interlinked relational databases for five
vertebrate species, human, chimpanzee, mouse, rat, and chicken.

● Highlights:
■ The GALA database(s) incorporate genomic annotation information with multi-species
alignments to allow complex querying on publicly available sequence information.
■ The annotation information includes genes, SNPs, alignments, disease association, and gene
expression levels from multiple sources.
■ Users may formulate simple or complex queries by querying any field in the database individually
or by combining queries to refine and narrow the inquiry’s scope, respectively.

7. GO - the Gene Ontology Database

● Official URL: http://www.geneontology.org/

● What you can do: Search structured, controlled vocabularies for community use in annotating genes,
gene products, and sequences.

● Highlights:
■ The Gene Ontology (GO) Consortium (GOC) continues to develop, maintain and use a set of
structured, controlled vocabularies for the annotation of genes, gene products, and sequences.
■ The GO ontologies are expanding both in content and in structure.

■ Several new relationship types have been introduced and used, along with existing relationships,
to create links between and within the GO domains.
■ These improve the representation of biology, facilitate querying, and allow GO developers to
systematically check for and correct inconsistencies within the GO.
■ Gene product annotation using GO continues to increase both in the number of total annotations
and species coverage.
■ GO tools, such as OBO-Edit, an ontology-editing tool, and AmiGO, the GOC ontology browser, have
seen major improvements in functionality, speed, and ease of use.

8. GOA - The Gene Ontology Annotation Database

● Official URL: http://www.ebi.ac.uk/GOA

● What you can do: Search gene information annotated based on the standardized vocabularies of Gene
Ontology.

● Highlights:
■ The Gene Ontology Annotation project provides high-quality electronic and manual associations
(annotations) of Gene Ontology (GO) terms to UniProt Knowledgebase (UniProtKB) entries.
■ The project’s annotations are collated with annotations from external databases to provide an
extensive, publicly available GO annotation resource.
■ Covering over 160 000 taxa with more than 32 million annotations, GOA remains the largest and
most comprehensive open-source contributor to the GO Consortium (GOC) project.
■ The group has augmented the number and coverage of their electronic pipelines, and several new
manual annotation projects and collaborations now further enhance this resource.
■ A range of files facilitates the download of annotations for particular species. GO term
information and associated annotations can also be viewed and downloaded from the newly
developed GOA QuickGO tool, which allows users to precisely tailor their annotation set.

9. GOEAST - Gene Ontology Enrichment Analysis Software Toolkit

● Official URL: http://omicslab.genetics.ac.cn/GOEAST/

● What you can do: A web-based software toolkit for Gene Ontology enrichment analysis.

● Highlights:
■ Compared with available GO analysis tools, GOEAST has the following improved features:
➢ GOEAST displays enriched GO terms in a graphical format according to their relationships
in the hierarchical tree of each GO category (biological process, molecular function, and
cellular component), and therefore provides a better understanding of the correlations among
enriched GO terms.
➢ GOEAST supports analysis for data from various sources (probe or probe set IDs of
Affymetrix, Illumina, Agilent, or customized microarrays, as well as different gene
identifiers) and multiple species (about 60 prokaryotic and eukaryotic species).
➢ One unique feature of GOEAST is to allow cross-comparison of the GO enrichment status
of multiple experiments to identify functional correlations among them.
■ GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results.
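
A common statistical test for GO term enrichment is the one-sided hypergeometric test, sketched below. Which tests GOEAST applies is not stated here, so this is shown only as an assumed, typical example; the function and parameter names are illustrative, and multiple-testing correction is omitted.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    sample:     number of genes in the submitted list
    hits:       submitted genes annotated with the GO term
    """
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, min(annotated, sample) + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 carry the term, 4 submitted, 2 hits.
print(enrichment_p_value(10, 3, 4, 2))
```

In a real analysis the p-values for all tested terms would then be adjusted for multiple testing before any term is reported as enriched.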

10. Gendoo - GENe, Disease features Ontology-based Overview system

● Official URL: http://gendoo.dbcls.jp/

● What you can do: Find information about diseases, genes, and drugs related to MeSH vocabulary.

● Highlights:
■ Gendoo is a web server that profiles gene and disease features using MeSH vocabulary.
■ Diseases and genes were characterized by generating feature profiles of associated drugs,
biological phenomena, and anatomy with the MeSH (Medical Subject Headings) vocabulary.
■ 1,760,054 pairs of OMIM entries and MeSH terms were obtained using the full set of MEDLINE
articles.
■ Gendoo's web application visualizes these profiles.
■ Gendoo and the developed feature profiles are useful for omics analysis from molecular and
clinical viewpoints.

11. HCGene - Hierarchical Classification of Genes

● Official URL: http://homes.dsi.unimi.it/~valenti/SW/hcgene/

● What you can do: A software tool to support the hierarchical classification of genes.

● Highlights:
■ HCGene implements methods to process and analyze the Gene Ontology and the FunCat
taxonomy to support the functional classification of genes.
■ HCGene allows the extraction of subgraphs and subtrees related to specific biological problems,
labeling genes and gene products with multiple and hierarchical functional classes, and the
association of different types of bio-molecular data to genes for learning to predict their
functions.
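
Labeling genes with hierarchical functional classes usually means that a gene annotated with a specific class is also labeled with every ancestor of that class. The sketch below shows this upward propagation on a toy hierarchy. It is not HCGene's code: the function names, the toy class names, and the `parents` mapping are assumptions for illustration.

```python
def ancestors(term, parents):
    """Collect every ancestor of a class in a hierarchy (tree or DAG)."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def hierarchical_labels(direct_annotations, parents):
    """Extend each gene's direct classes with all of their ancestors."""
    labels = {}
    for gene, terms in direct_annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        labels[gene] = sorted(full)
    return labels


# Toy hierarchy (child -> parents) and hypothetical direct annotations.
parents = {
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "dna binding": ["binding"],
    "binding": ["molecular function"],
}
direct = {"geneA": ["kinase activity"], "geneB": ["dna binding"]}
print(hierarchical_labels(direct, parents))
```

The resulting multi-label sets are the kind of target that a hierarchical classifier can be trained to predict from other bio-molecular data.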

12. HGNC Database - HUGO Gene Nomenclature

● Official URL: http://www.genenames.org/

● What you can do: Search approved symbols for human genes.

● Highlights:
■ The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique and ideally
meaningful name and symbol to every human gene.
■ The HGNC database currently comprises over 24 000 public records containing approved human
gene nomenclature and associated gene information.
■ Following its recent relocation to the European Bioinformatics Institute, the HGNC homepage
provides direct links to the searchable HGNC database and other related database resources,
such as the HCOP orthology search tool and manually curated gene family webpages.

13. IGRhCELLID - Integrated Genetic Resources of Human CELL lines for IDentification

● Official URL: http://igrcid.ibms.sinica.edu.tw/cgi-bin/index.cgi

● What you can do: Find information about common human cell lines.

● Highlights:
■ IGRhCellID is a database designed to integrate eight cell identification methods, including seven
methods (STR profile, gender, immunotypes, karyotype, isoenzyme profile, TP53 mutation, and
mutations of cancer genes) available in various public databases and the method of profiling
genome alterations of human cell lines.
■ With data validation of 11 small deleted genes in human cancer cell lines, profiles of genomic
alterations further allow users to search for human cell lines with deleted gene to serve as
indigenous knock-out cell model (such as SMAD4 in gene view), with amplified gene to be the cell
models for testing therapeutic efficacy (such as ERBB2 in gene view) and with overlapped
aberrant chromosomal loci for revealing common cancer genes (such as 9p21.3 homozygous
deletion with co-deleted CDKN2A, CDKN2B and MTAP in chromosome view).
■ IGRhCellID provides not only available methods for cell identification to help eradicate concerns of
using misidentified cells but also designated genetic features of human cell lines for experiments.

14. IUBMB Nomenclature database

● Official URL: http://www.chem.qmul.ac.uk/iubmb/

● What you can do: Search for the nomenclature of enzymes, membrane transporters, electron transport
proteins, and other proteins.

● Highlights:
■ Enzyme Nomenclature
■ Protein Nomenclature
■ Electron transport protein Nomenclature

15. IUPAC Nomenclature database

● Official URL: https://www.npu-terminology.org/npu-database/

● What you can do: Search for Nomenclature of biochemical and organic compounds approved by the
IUBMB-IUPAC Joint Commission.

● Highlights:
■ Biochemical compound Nomenclature
■ Organic compound Nomenclature

16. IUPHAR-DB - the IUPHAR database of G protein-coupled receptors and ion channels

● Official URL: https://www.guidetopharmacology.org/

● What you can do: Search for International Union of Pharmacology recommendations on receptor
nomenclature and drug classification.

● Highlights:
■ The IUPHAR database (IUPHAR-DB) integrates peer-reviewed pharmacological, chemical, genetic,
functional, and anatomical information on the 354 nonsensory G protein-coupled receptors
(GPCRs), 71 ligand-gated ion channel subunits, and 141 voltage-gated-like ion channel subunits
encoded by the human, rat, and mouse genomes.
■ These genes represent the targets of approximately one-third of currently approved drugs and are
a major focus of drug discovery and development programs in the pharmaceutical industry.
■ IUPHAR-DB provides a comprehensive description of the genes and their functions, with
information on protein structure and interactions, ligands, expression patterns, signaling
mechanisms, functional assays, and biologically important receptor variants (e.g., single
nucleotide polymorphisms and splice variants).
■ Also, the phenotypes resulting from altered gene expression (e.g., in genetically altered animals or
human genetic disorders) are described.
■ The database’s content is peer-reviewed by members of the International Union of Basic and
Clinical Pharmacology Committee on Receptor Nomenclature and Drug Classification
(NC-IUPHAR); the data are provided through manual curation of the primary literature by a network
of over 60 subcommittees of NC-IUPHAR.
■ Links to other bioinformatics resources, such as NCBI, Uniprot, HGNC, and the rat and mouse
genome databases are provided.

17. OBO-Edit - an ontology editor for biologists

● Official URL: https://sourceforge.net/project/showfiles.php?group_id=36855

● What you can do: An open-source, platform-independent ontology editor.

● Highlights:
■ Developed and maintained by the Gene Ontology Consortium.
■ Implemented in Java, OBO-Edit uses a graph-oriented approach to display and edit ontologies.
■ OBO-Edit is particularly valuable for viewing and editing biomedical ontologies.

18. PANTHER - Protein Analysis THrough Evolutionary Relationships

● Official URL: http://www.pantherdb.org/

● What you can do: Browse and search proteins based on their biological functions.

● Highlights:
■ Protein Analysis THrough Evolutionary Relationships (PANTHER) is a comprehensive software
system for inferring genes’ functions based on their evolutionary relationships.
■ Phylogenetic trees of gene families form the basis for PANTHER, and these trees are annotated
with ontology terms describing the evolution of gene function from ancestral to modern-day
genes.
■ One of the main applications of PANTHER is accurate prediction of the functions of
uncharacterized genes, based on their evolutionary relationships to genes with functions known
from experiments.
■ It also includes software tools for analyzing genomic data relative to known and inferred gene
functions.
■ Since 2007, there have been several new developments to PANTHER: (i) improved phylogenetic
trees, explicitly representing speciation and gene duplication events, (ii) identification of gene
orthologs, including least diverged orthologs (best one-to-one pairs), (iii) coverage of more
genomes (48 genomes), (iv) improved support for alternative database identifiers for genes,
proteins and microarray probes and (v) adoption of the SBGN standard for the display of
biological pathways.
■ PANTHER trees are being annotated with gene function as part of the Gene Ontology Reference
Genome project, resulting in an increasing number of curated functional annotations.

19. PLAN2L - PLant ANnotation to Literature

● Official URL: http://zope.bioinfo.cnio.es/plan2l/plan2l.html

● What you can do: Find information about Arabidopsis thaliana.

● Highlights:
■ PLAN2L is a web-based online search system that integrates text mining and information
extraction techniques to systematically access information useful for analyzing genetic, cellular,
and molecular aspects of the plant model organism Arabidopsis thaliana.
■ It facilitates a more efficient retrieval of information relevant to heterogeneous biological topics,
from implications in biological relationships at the level of protein interactions and gene
regulation to subcellular locations of gene products and associations to cellular and
developmental processes, i.e., cell cycle, flowering, root, leaf and seed development.
■ Predefined pairs of entities can be provided as queries for which literature-derived relations
together with textual evidence are returned.

20. ProfCom - profiling of complex functionality

● Official URL: http://webclu.bio.wzw.tum.de/profcom/

● What you can do: A web tool for profiling the complex functionality of gene groups identified from
high-throughput data.

● Highlights:
■ ProfCom is a web-based tool for the functional interpretation of a list of genes found to be
related by experiments.
■ What makes ProfCom unique is its ability to profile enrichments not only of available
Gene Ontology (GO) terms but also of 'complex functions'.
■ A 'complex function' is constructed as a Boolean combination of available GO terms.
■ The complex functions inferred by ProfCom are more specific than single terms and
describe the functional role of genes more accurately.
■ ProfCom provides a user-friendly, dialog-driven web submission page available for several model
organisms and supports most available gene identifiers.
■ In addition, the web service interface allows the submission of any kind of annotation data.
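
The idea of a 'complex function' as a Boolean combination of GO terms can be illustrated with plain set operations. The sketch below is not ProfCom's implementation; the term identifiers, gene names, and helper functions are all hypothetical and serve only to show why a combined function can be more specific than a single term.

```python
# Toy annotation table: GO term -> set of annotated genes (all names hypothetical).
annotations = {
    "GO:A": {"g1", "g2", "g3", "g4"},
    "GO:B": {"g3", "g4", "g5"},
    "GO:C": {"g4", "g6"},
}

def complex_and(*terms):
    """Genes annotated with every listed term (Boolean AND)."""
    return set.intersection(*(annotations[t] for t in terms))

def complex_and_not(include, exclude):
    """Genes annotated with `include` but not with `exclude` (AND NOT)."""
    return annotations[include] - annotations[exclude]

def coverage(gene_list, function_genes):
    """Fraction of the input gene list that carries a (complex) function."""
    gene_list = set(gene_list)
    return len(gene_list & function_genes) / len(gene_list)

query = ["g3", "g4"]
# The single term GO:A covers the query but also annotates g1 and g2.
print(coverage(query, annotations["GO:A"]), len(annotations["GO:A"]))
# "GO:A AND GO:B" covers the same query with a smaller, more specific gene set.
combined = complex_and("GO:A", "GO:B")
print(coverage(query, combined), len(combined))
```

Here both functions cover the whole query list, but the combined function annotates fewer genes overall, which is the sense in which it is more specific.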

21. RDP - The Ribosomal Database Project

● Official URL: http://rdp.cme.msu.edu/

● What you can do: Provides researchers with quality-controlled bacterial and archaeal small subunit
rRNA alignments and analysis tools.

● Highlights:
■ An improved alignment strategy uses the Infernal secondary structure-aware aligner to provide a
more consistent, higher-quality alignment and faster processing of user sequences.
■ Substantial new analysis features include a new Pyrosequencing Pipeline that provides tools to
support analysis of ultra-high-throughput rRNA sequencing data.
■ This pipeline offers a collection of tools that automate the data processing and simplify the
computationally intensive analysis of large sequencing libraries.
■ In addition, a new taxonomy visualization tool allows rapid visualization of taxonomic
inconsistencies and suggests corrections, and a new class Assignment Generator provides
instructors with a lesson plan and individualized teaching materials.

22. RIDOM - Ribosomal Differentiation of Medical Micro-organisms Database

● Official URL: http://www.ridom-rdna.de/

● What you can do: Differentiate medical microorganisms based on partial small subunit ribosomal
DNA (16S rDNA) sequences.

● Highlights:
■ This web server is an evolving electronic resource designed to provide micro-organism
differentiation services for medical identification needs.
■ The diagnostic procedure begins with a partial small subunit ribosomal DNA (16S rDNA)
sequence from the specimen.
■ A similarity search returns a species or genus name for the specimen in question.
■ Where the first results are ambiguous or do not resolve to species level, hints for further
molecular (i.e., internal transcribed spacer) and conventional phenotypic differentiation are
offered ('sequential and polyphasic approach').
■ Additionally, each entry in RIDOM contains detailed medical and taxonomic information linked,
context-sensitive, to external World Wide Web services.
■ Nearly all sequences are newly determined, and the sequence chromatograms are available for
intersubjective quality control.

23. SimCT - SIMilarity Clustering Tree

● Official URL: http://tagc.univ-mrs.fr/SimCT/

● What you can do: Find information about relationships between biological objects.

● Highlights:
■ SimCT is a web-based application that graphically displays the relationships between biological
objects (e.g., genes or proteins) based on their annotations to a biomedical ontology.
■ The result is presented as a tree of these objects, which can be viewed and explored through a
dedicated Java applet to highlight relevant features.
■ SimCT draws a simplified representation of biological terms present in the set of objects and can
be applied to any ontology for which annotation data is available.

24. SuperPred - target-prediction server

● Official URL: https://prediction.charite.de/

● What you can do: Drug classification and target prediction.

● Highlights:
■ The web-server translates a user-defined molecule into a structural fingerprint that is compared
to about 6300 drugs, which are enriched by 7300 links to molecular targets of the drugs, derived
through text mining followed by manual curation.
■ Links to the affected pathways are provided.
■ The similarity to the medical compounds is expressed by the Tanimoto coefficient that gives the
structural similarity of the two compounds.
■ A similarity score higher than 0.85 results in correct ATC prediction for 81% of all cases.
■ As the biological effect is well predictable, the web-server allows prognoses about the medical
indication area of novel compounds and finds new leads for known targets if the structural
similarity is sufficient.
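
The Tanimoto coefficient mentioned above has a compact definition: the number of fingerprint features two molecules share, divided by the number of features present in either. The sketch below assumes fingerprints are represented as sets of "on" bit positions; the fingerprint values are invented, and the way SuperPred itself derives fingerprints is not reproduced here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as collections of set bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

# Hypothetical fingerprints: positions of the bits that are switched on.
query_fp = {1, 4, 7, 9, 12, 15, 20}
drug_fp = {1, 4, 7, 9, 12, 15, 21}

score = tanimoto(query_fp, drug_fp)
print(round(score, 3))
# 0.85 is the similarity threshold quoted in the text above.
print("above 0.85 threshold:", score > 0.85)
```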

25. The NCBI Taxonomy Homepage

● Official URL: http://www.ncbi.nlm.nih.gov/Taxonomy/

● What you can do: Search for comprehensive taxonomy information of all organisms represented in
GenBank.

● Highlights:
■ The NCBI taxonomy database is not a primary source for taxonomic or phylogenetic information.
■ It attempts to incorporate phylogenetic and taxonomic knowledge from various sources,
including the published literature, web-based databases, and the advice of sequence submitters
and outside taxonomy experts.

26. The Ontology Lookup Service - more data and better tools for controlled
vocabulary queries

● Official URL: http://www.ebi.ac.uk/ols

● What you can do: Provides interactive and programmatic interfaces to query, browse and navigate an
ever-increasing number of biomedical ontologies and controlled vocabularies.

● Highlights:
■ The volume of data available for querying has more than quadrupled since it went into
production, and OLS functionality has been integrated into several high-usage databases and
data entry tools.
■ Improvements have been made to both OLS query interfaces, based on user feedback and
requirements, to improve usability and service interoperability and provide novel ways to perform
queries.

27. UMLS - The Unified Medical Language System

● Official URL: http://umlsks.nlm.nih.gov/

● What you can do: Search for standardized biomedical vocabularies, concepts, and their relationships.

● Highlights:
■ It is a repository of biomedical vocabularies developed by the US National Library of Medicine,
integrating over 2 million names for some 900 000 concepts from more than 60 families of
biomedical vocabularies, as well as 12 million relations among these concepts.
■ Vocabularies integrated into the UMLS Metathesaurus include the NCBI taxonomy, Gene
Ontology, the Medical Subject Headings (MeSH), OMIM, and the Digital Anatomist Symbolic
Knowledge Base.

28. Tree of Life

● Official URL: http://tolweb.org/tree/phylogeny.html

● What you can do: Search for comprehensive information on phylogeny and biodiversity.

● Highlights:
■ The Tree of Life Web Project (ToL) is a collaborative effort of biologists worldwide. On more than
4000 World Wide Web pages, the project provides information about the diversity of organisms
on Earth, their evolutionary history (phylogeny), and their characteristics.
■ Each page contains information about a particular group of organisms (e.g., echinoderms,
tyrannosaurs, phlox flowers, cephalopods, club fungi, or the salamanderfish of Western Australia).
ToL pages are linked one to another hierarchically, in the form of the evolutionary tree of life.
■ Starting with the root of all Life on Earth and moving out along diverging branches to individual
species, the ToL project’s structure thus illustrates the genetic connections between all living
things.

29. WEGO - a web tool for plotting GO annotations

● Official URL: http://wego.genomics.org.cn/

● What you can do: Plot gene ontology annotations onto a histogram graph.

● Highlights:
■ WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing,
and plotting GO annotation results.
■ WEGO is designed to deal with the directed acyclic graph structure of GO to facilitate the
histogram creation of GO annotation results.
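
Because GO is a directed acyclic graph, a term can be reached from a gene's annotation along more than one path, so histogram counts need each gene counted once per term. The sketch below shows one way to do that on a toy graph; the term names and the `parents` table are hypothetical, and this is not WEGO's own code.

```python
# Toy ontology: term -> list of parent terms (hypothetical names).
parents = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "kinase": ["catalysis", "binding"],  # two parents: a DAG, not a tree
}

def ancestors(term):
    """Return the term itself plus every term reachable by following parents."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def term_counts(gene_to_terms):
    """Count genes per term, counting each gene at most once for any term."""
    counts = {}
    for terms in gene_to_terms.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] = counts.get(term, 0) + 1
    return counts

counts = term_counts({"g1": ["kinase"], "g2": ["binding"]})
for term in sorted(counts):
    print(f"{term:10s} {'#' * counts[term]}")
```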

30. agriGO - Agriculture Gene Ontology

● Official URL: http://bioinfo.cau.edu.cn/agriGO/

● What you can do: Perform gene ontology analysis of agricultural species.

● Highlights:
■ agriGO is an integrated web-based GO analysis toolkit for the agricultural community to meet
analysis demands from new technologies and research objectives.
■ The supported organisms and gene identifiers include 38 agricultural species composed of 274
data types.
■ A new analysis approach using Gene Set Enrichment Analysis strategy and customizable
features is provided.
■ Four tools, SEA (Singular enrichment analysis), PAGE (Parametric Analysis of Gene set
Enrichment), BLAST4ID (Transfer IDs by BLAST), and SEACOMPARE (Cross comparison of SEA),
are integrated as a toolkit to meet different demands.
■ A cross-comparison service is provided so that different data sets can be compared and
explored in a visualized way.
■ agriGO functions as a GO data repository with search and download functions.
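
Enrichment analyses of the SEA kind ask whether a GO term appears in a gene list more often than expected by chance. A common choice for that question is the hypergeometric test, sketched below with the standard library only. Which statistical tests agriGO actually offers is not stated here, so treat this as an illustration of the general technique; the function name and the example numbers are made up.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    when K of the N background genes carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical numbers: 1000 background genes, 40 carry the term,
# and 8 of the 50 genes in the query list carry it.
p = hypergeom_pvalue(k=8, n=50, K=40, N=1000)
print(f"p = {p:.3g}")
```

In practice a p-value like this is computed for every term and then corrected for multiple testing.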

4. Genome browsers, genome annotation, genomic sequence analysis

1. AMIGene - Annotation of MIcrobial Genes

● Official URL: http://www.genoscope.cns.fr/agc/tools/amigene/index.html

● What you can do: Automatically identify the most likely coding sequences (CDSs) in a large contig or a
complete bacterial genome sequence.

● Highlights:
■ The first step in AMIGene constructs Markov models that fit the input genomic data (i.e., the
gene model); well-known gene-finding methods are then combined with a heuristic approach to
select the most likely CDSs.
■ The web interface allows the user to select one or several gene models to apply to the analysis
of the input sequence by the AMIGene program, and to view the list of predicted CDSs graphically
and in a downloadable text format.

2. BABELOMICS - advanced functional profiling of transcriptomics, proteomics, and genomics experiments

● Official URL: http://www.babelomics.org/

● What you can do: Suite of web tools for the functional profiling of genome-scale experiments.

● Highlights:
■ Babelomics includes different flavors of conventional functional enrichment methods and more
advanced gene set analysis methods that make it a unique tool among the similar resources
available.
■ In addition to the well-known functional definitions (GO, KEGG), Babelomics includes new ones
such as Biocarta pathways or text mining-derived functional terms.
■ Regulatory modules implemented include transcriptional control (Transfac, CisRed) and other
regulation levels such as miRNA-mediated interference.
■ Moreover, Babelomics allows for sub-selection of terms to test a more focused hypothesis.
■ Also, gene annotation correspondence tables can be imported, which allows testing with
user-defined functional modules.
■ Finally, a tool for the 'de novo' functional annotation of sequences has been included in the
system. This allows using yet unannotated organisms in the program.
■ Babelomics has been extensively re-engineered. It includes web services and Web 2.0 technology
features, a new user interface with persistent sessions, and a new extended database of gene
identifiers.

3. BRIGEP - the BRIDGE-based genome-transcriptome-proteome browser

● Official URL: https://www.cebitec.uni-bielefeld.de/groups/brf/software/brigep/

● What you can do: Process and analyze bacterial genome, transcriptome, and proteome data.

● Highlights:
■ The BRIGEP bioinformatics software system consists of three web-based applications: GenDB,
EMMA, and ProDB.
■ These applications facilitate the processing and analysis of bacterial genome, transcriptome and
proteome data and are actively used by numerous international groups.
■ Code bundles for these and other tools are accessible on an FTP server.

4. CNVDetector - locating copy number variations using array CGH data

● Official URL: http://www.csie.ntu.edu.tw/~kmchao/tools/CNVDetector/

● What you can do: A program for locating copy number variations in a single genome.

● Highlights:
■ CNVDetector has several merits:
➢ It can deal with the array CGH data even if the noise is not normally distributed.
➢ It has a linear time kernel.
➢ Its parameters can be easily selected.
➢ It evaluates the statistical significance for each CNV calling.

5. CREx - Common interval Rearrangement Explorer

● Official URL: http://pacosy.informatik.uni-leipzig.de/crex

● What you can do: Infer genomic rearrangements based on common intervals.

● Highlights:
■ CREx heuristically determines pairwise rearrangement events in unichromosomal genomes.
■ CREx considers transpositions, reverse transpositions, reversals, and
tandem-duplication-random-loss (TDRL) events.
■ It supports the user in finding parsimonious rearrangement scenarios given a phylogenetic
hypothesis.
■ CREx is based on common intervals, which reflect genes that appear consecutively in several of
the input gene orders.
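
A common interval, in the sense used above, is a set of genes that sit next to each other in both gene orders, whatever their internal arrangement. The brute-force sketch below finds such sets for two unsigned permutations of the same genes. It ignores gene orientation and is only meant to illustrate the concept; CREx's own algorithm and data structures are not described here.

```python
def common_intervals(order_a, order_b):
    """Return gene sets (size >= 2) that are contiguous in both gene orders."""
    position_b = {gene: i for i, gene in enumerate(order_b)}
    found = []
    for i in range(len(order_a)):
        lo = hi = position_b[order_a[i]]
        for j in range(i + 1, len(order_a)):
            p = position_b[order_a[j]]
            lo, hi = min(lo, p), max(hi, p)
            # Contiguous in order_b exactly when the span equals the gene count.
            if hi - lo == j - i:
                found.append(frozenset(order_a[i:j + 1]))
    return found

# Hypothetical gene orders over the same five genes.
intervals = common_intervals([1, 2, 3, 4, 5], [3, 1, 2, 5, 4])
for interval in intervals:
    print(sorted(interval))
```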

6. CleanEST - the cleansed EST libraries database

● Official URL: http://cleanest.kobic.re.kr/

● What you can do: Use a database server that classifies GenBank's dbEST (database of expressed
sequence tags) libraries and removes contaminants.

● Highlights:
■ All dbEST libraries were classified according to species and sequencing centers.
■ Human EST libraries were further classified by anatomical and pathological systems according
to eVOC ontologies.
■ For each dbEST library, two different sets of cleansed sequences are provided: 'pre-cleansed' and
'user-cleansed'.
■ To generate pre-cleansed sequences, dbEST sequences were cleaned by aligning them
against well-known contamination sources: UniVec, Escherichia coli, mitochondria,
and chloroplast (for plants).
■ To provide user-cleansed sequences, an automatic user-cleansing pipeline was built, in which
sequences of a user-selected library are cleansed on-the-fly according to user-selected options.

7. Database of Genomic Variants

● Official URL: http://projects.tcag.ca/variation/project.html

● What you can do: Find a comprehensive summary of structural variation in the human genome.

● Highlights:
■ The objective of the Database of Genomic Variants is to provide a comprehensive summary of
structural variation in the human genome. Structural variation is defined as genomic alterations
that involve segments of DNA larger than 1 kb. InDels in the 100 bp-1 kb range are now also
annotated. The database contains only structural variation identified in healthy control
samples.
■ The Database of Genomic Variants provides a useful catalog of control data for studies aiming to
correlate genomic variation with phenotypic data. The database is continuously updated with
new data from peer-reviewed research studies.

8. Dcode.org anthology of comparative genomic tools

● Official URL: http://www.dcode.org/

● What you can do: Demarcate functional regions in genomic DNA sequences using a list of tools.

● Highlights:
■ This website provides several analytical and visualization tools for the analysis of arbitrary
sequences and whole genomes.
■ These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool,
eShadow, for identifying lineage- and species-specific functional elements; two evolutionarily
conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting
cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a
dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser.

9. DiProGB - Dinucleotide Properties Genome Browser

● Official URL: http://diprogb.fli-leibniz.de/

● What you can do: Visualize genomes of interest.

● Highlights:
■ DiProGB is a genome browser that encodes the primary nucleotide sequence by
thermodynamical and geometrical dinucleotide properties.
■ The nucleotide sequence is thus converted into a sequence graph.
■ This visualization, supported by different graph manipulation options, facilitates genome
analyses.
■ It can identify genomic regions where certain physical properties are more conserved than the
nucleotide sequence itself.
■ Most of the DiProGB tools can be applied to both the primary nucleotide sequence and the
sequence graph.
■ They include motif and repeat searches as well as statistical analyses.
■ DiProGB adds a new dimension to the common genome analysis approaches by considering the
physical properties of DNA and RNA.

10. ECR Browser - A Tool for Visualizing and Accessing Data from Comparisons
of Multiple Vertebrate Genomes

● Official URL: http://ecrbrowser.dcode.org/

● What you can do: Access whole-genome alignments of human, mouse, rat, and fish sequences.

● Highlights:
■ It provides the starting point for discovering novel genes, identifying distant gene regulatory
elements, and predicting transcription factor binding sites.
■ The genome alignment portal of the ECR Browser also permits fast and automated alignments of
any user-submitted sequence to the genome of choice.
■ The ECR Browser’s interconnection with other DNA sequence analysis tools creates a unique
portal for studying and exploring vertebrate genomes.

11. ESTAnnotator - a tool for high throughput EST annotation

● Official URL: http://genome.dkfz-heidelberg.de/

● What you can do: Perform high throughput annotation of expressed sequence tags (ESTs).

● Highlights:
■ ESTAnnotator is a tool for the high throughput annotation of expressed sequence tags (ESTs) by
automatically running a collection of bioinformatics applications.
■ A quality check is performed in the first step, and repeats, vector parts, and low-quality
sequences are masked.
■ Then successive steps of database searching and EST clustering are performed. Already known
transcripts present within mRNA and genomic DNA reference databases are identified.
■ Subsequently, tools for the clustering of anonymous ESTs and further database searches at the
protein level are applied.
■ Finally, each tool’s outputs are gathered, and the relevant results are presented in a descriptive
summary.

12. Ensembl

● Official URL: http://www.ensembl.org/

● What you can do: Find genome annotation, databases, and other information for chordate and selected
model organism and disease vector genomes.

● Highlights:
■ Ensembl integrates genomic information for a comprehensive set of chordate genomes with a
particular focus on resources for human, mouse, rat, zebrafish, and other high-value sequenced
genomes.
■ Ensembl data is accessible in a variety of formats, including via the Ensembl genome browser,
API, and BioMart.
■ It provides complete gene annotations for all supported species in addition to specific resources
that target genome variation, function, and evolution.
■ Over its first ten years, the project has grown with advances in genome technology. As of
release 56 (September 2009), Ensembl supports 51 species, including marmoset, pig, zebra finch,
lizard, gorilla, and wallaby, which were added in the preceding year.
■ Significant additions and improvements reported at that time include the
incorporation of the human GRCh37 assembly, enhanced visualization and data-mining options
for the Ensembl regulatory features, and continued development of the software infrastructure.

13. G-SESAME - Gene Semantic Similarity Analysis and Measurement Tools

● Official URL: http://bioinformatics.clemson.edu/G-SESAME/

● What you can do: Find information about the semantic similarity between Gene Ontology terms.

● Highlights:
■ G-SESAME is a set of online tools for measuring the semantic similarities of Gene Ontology (GO)
terms and the functional similarities of gene products, and for further discovering biomedical
knowledge from the GO database.
■ The tools have been used around 6.9 million times by 417 institutions from 43 countries since
October 2006.

14. GC-Profile - a web-based tool for visualizing and analyzing the variation of
GC content in genomic sequences

● Official URL: http://tubic.tju.edu.cn/GC-Profile/

● What you can do: Analyze the variation of GC content in genomic sequences.

● Highlights:
■ GC-Profile can be utilized to segment prokaryotic and eukaryotic genomes.
■ It gives a quantitative and qualitative view of genome organization and the relationships between
the G+C content and other genomic features, like distributions of genes and CpG islands.
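
To make the notion of GC-content variation concrete, the sketch below computes the G+C fraction in fixed windows along a sequence. This is a deliberately simple stand-in: GC-Profile's own segmentation method is not described in this entry and is not reproduced here. The function names, window size, and toy sequence are assumptions for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window, step):
    """List of (start, GC fraction) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

# Toy sequence with an AT-rich half followed by a GC-rich half.
toy = "ATATATTAATGCGCGGCCGC"
for start, gc in windowed_gc(toy, window=10, step=5):
    print(start, f"{gc:.2f}")
```

A sharp change between neighboring windows is the kind of signal a segmentation tool would turn into a domain boundary.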

15. IBRENA - In silico Biochemical Reaction Network Analysis

● Official URL: http://www.eng.buffalo.edu/~neel/ibrena/

● What you can do: Find information about enzyme reaction pathways.

● Highlights:
■ In silico Biochemical Reaction Network Analysis (IBRENA) is a software package that facilitates
many functions including cellular reaction network simulation and sensitivity analysis (both
forward and adjoint methods), coupled with principal component analysis, singular value
decomposition, and model reduction.
■ The software features a graphical user interface that aids simulation and plotting in silico results.
■ While the primary focus is to aid formulation, testing, and reduction of theoretical biochemical
reaction networks, the program can also be used for the analysis of high-throughput genomic
and proteomic data.

16. IGB - The Integrated Genome Browser

● Official URL: https://sourceforge.net/projects/genoviz/

● What you can do: Find genomic information.

● Highlights:
■ The Integrated Genome Browser (IGB) is an open-source, desktop graphical display tool
implemented in Java that supports real-time zooming and panning through a genome; layout of
genomic features and datasets in moveable, adjustable tiers; incremental or genome-scale data
loading from remote web servers or local files; and dynamic manipulation of quantitative data via
genome graphs.
■ It is a flexible, highly interactive visualization tool that can display new data alongside foundation
datasets, such as reference gene annotations.

17. Identification of patterns in biological sequences at the ALGGEN server - PROMO and MALGEN

● Official URL: http://alggen.lsi.upc.es/

● What you can do: Predict transcription factor binding sites or visualize sequence correspondences
among long DNA sequences.

● Highlights:
■ PROMO is a virtual lab for the identification of putative transcription factor binding sites (TFBS)
in DNA sequences from a species or groups of species of interest. TFBS described in the
TRANSFAC database are used to construct specific binding site weight matrices for TFBS
prediction.
■ MALGEN is a tool that draws a graphic showing the skeleton of a multi-alignment. It is based
on a two-genome comparison algorithm whose performance depends mainly on the first
genome. For this reason, the shortest sequence should be submitted first.
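
Weight-matrix prediction of binding sites, as mentioned for PROMO, generally works by scoring every window of a sequence against a position-specific matrix. The sketch below builds a log-odds matrix from a made-up count matrix and reports the best-scoring window. The counts, pseudocount, and uniform background are assumptions for illustration; they are not TRANSFAC data and this is not PROMO's scoring scheme.

```python
from math import log2

# Hypothetical base counts for a 3-position motif resembling "TGA".
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 9},
    {"A": 0, "C": 0, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 0},
]

def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def best_site(seq, pwm):
    """Return (score, start) of the highest-scoring window; seq must use only ACGT."""
    width = len(pwm)
    scores = [
        (sum(pwm[i][seq[start + i]] for i in range(width)), start)
        for start in range(len(seq) - width + 1)
    ]
    return max(scores)

toy_pwm = to_log_odds(toy_counts)
score, start = best_site("CCTGACC", toy_pwm)
print(f"best window starts at {start} with score {score:.2f}")
```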

18. IsoFinder - Computational Prediction of Isochores in Genome Sequences

● Official URL: http://bioinfo2.ugr.es/IsoF/isofinder.html

● What you can do: Predict isochores at the genomic sequence level.

● Highlights:
■ Using a recursive algorithm, the program decomposes a chromosome
sequence into long homogeneous genome regions (LHGRs) with well-defined mean G+C
contents, each significantly different from the G+C contents of the adjacent LHGRs.
■ Most LHGRs can be identified with Bernardi's isochores, given their correlation with biological
features such as gene density, SINE and LINE (short and long interspersed repetitive elements)
densities, recombination rate, or single nucleotide polymorphism variability.

19. KAAS - KEGG Automatic Annotation Server

● Official URL: http://www.genome.jp/kegg/kaas/

● What you can do: Provides functional annotation of genes by BLAST comparisons against the manually
curated KEGG GENES database.

● Highlights:
■ In the KEGG database, genes in complete genomes are annotated with the KEGG orthology (KO)
identifiers, or K numbers, based on best hit information using Smith-Waterman scores as
well as manual curation.
■ Each K number represents an ortholog group of genes, and it is directly linked to an object in the
KEGG pathway map or the BRITE functional hierarchy.
■ KAAS is a web-based server implementing a rapid method that automatically
assigns K numbers to genes in a genome, enabling reconstruction of KEGG
pathways and BRITE hierarchies.
■ The method is based on sequence similarities, bi-directional best hit information, and some
heuristics, and has achieved a high degree of accuracy when compared with the manually
curated KEGG GENES database.
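
The bi-directional best hit criterion named above can be stated simply: a query gene and a reference gene are paired only if each is the other's top-scoring match. The sketch below applies that rule to two small score tables. The gene identifiers and scores are invented, and KAAS's additional heuristics and thresholds are not reproduced.

```python
def best_hits(scores):
    """Map each query to its top-scoring target, given {(query, target): score}."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}

def bidirectional_best_hits(forward, reverse):
    """Pairs (query, target) that are each other's best hit in both directions."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {(q, t) for q, t in fwd.items() if rev.get(t) == q}

# Hypothetical similarity scores: query genome vs. reference, and the reverse search.
forward = {("a1", "ref1"): 90, ("a1", "ref2"): 50, ("a2", "ref2"): 40}
reverse = {("ref1", "a1"): 88, ("ref2", "a1"): 60, ("ref2", "a2"): 30}

pairs = bidirectional_best_hits(forward, reverse)
print(pairs)
```

In this toy case `a2` gets no pairing, because its best hit `ref2` prefers `a1` in the reverse search.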

20. MGAlignIt - a web service for the alignment of mRNA/EST and genomic
sequences

● Official URL: http://proline.bic.nus.edu.sg/mgalign/mgalignit.html

● What you can do: Align mRNA or expressed sequence tags (EST) and genome sequences.

● Highlights:
■ MGAlign is a novel, rapid, memory efficient, and practical method for aligning mRNA/EST and
genome sequences.
■ Additionally, the web service allows users to efficiently visualize the alignment in a graphical
manner and to perform limited analysis on the alignment output.
■ The server also allows the alignment to be saved in several forms, both graphical and text,
suitable for further processing and analysis by other programs.

21. MICheck - a web tool for fast checking of syntactic annotations of bacterial
genomes

● Official URL: http://www.genoscope.cns.fr/agc/tools/micheck

● What you can do: Perform rapid verification of sets of annotated genes and frameshifts in previously
published bacterial genomes.

● Highlights:
■ MICheck (MIcrobial genome Checker) enables rapid verification of sets of annotated genes and
frameshifts in previously published bacterial genomes.
■ The web interface allows users to easily investigate the MICheck results, i.e., inaccurate or
missed gene annotations. A graphical representation shows the genomic context of a
unique coding DNA sequence annotation or a predicted frameshift, using the information
on the coding potential (curves) and the annotation of the neighboring genes.
■ This tool can be seen as a preliminary step before the functional re-annotation step to check
quickly for missing or wrongly annotated genes.

22. MultiPipMaker and supporting tools - alignments and analysis of multiple
genomic DNA sequences

● Official URL: http://www.bx.psu.edu/miller_lab/

● What can you do: Align multiple, long genomic DNA sequences quickly and with good sensitivity.

● Highlights:
■ New tools are provided to search MultiPipMaker output for conserved matches to a
user-specified pattern and for conserved matches to position weight matrices that describe
transcription factor binding sites (singly and in clusters).
■ The outputs include a stacked set of percent identity plots, called a MultiPip, comparing the
reference sequence with subsequent sequences and nucleotide-level multiple alignments.
■ Alignments are computed between a contiguous reference sequence and one or more secondary
sequences, which can be finished or draft sequence.
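
As a rough illustration of what a percent identity plot summarizes, the sketch below splits one pairwise alignment into gap-free segments and reports the percent identity of each segment against its start position in the reference. The function name `gap_free_segments` and the toy alignment are assumptions for illustration; this is not MultiPipMaker's own code.

```python
def gap_free_segments(ref_aln, other_aln):
    """Return (ref_start, length, percent_identity) for each gap-free block.

    ref_aln and other_aln are two rows of one alignment ('-' marks a gap).
    ref_start is a 0-based position in the ungapped reference sequence.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned rows must have equal length")
    segments = []
    ref_pos = 0        # position in the ungapped reference
    start = None       # reference start of the current gap-free block
    matches = length = 0
    for a, b in zip(ref_aln, other_aln):
        if a != "-" and b != "-":
            if start is None:
                start, matches, length = ref_pos, 0, 0
            length += 1
            matches += (a == b)
        elif start is not None:
            # A gap in either row closes the current block.
            segments.append((start, length, 100.0 * matches / length))
            start = None
        if a != "-":
            ref_pos += 1
    if start is not None:
        segments.append((start, length, 100.0 * matches / length))
    return segments


segments = gap_free_segments("ACGT--ACGTAC", "ACGAGGAC--AC")
for ref_start, length, pid in segments:
    print(ref_start, length, pid)
```

Stacking one such set of segments per secondary sequence, all plotted against the same reference coordinates, gives the general shape of a MultiPip.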

23. Multiple alignments of genomic sequences using CHAOS, DIALIGN, and ABC

● Official URL: http://dialign.gobics.de/chaos-dialign-submission

● What can you do: Perform multiple alignments of genomic sequences.

● Highlights:
■ This WWW-based software system was devised for multiple alignments of genomic sequences.
■ It employs the local alignment tool CHAOS to quickly detect chains of pairwise similarities.
■ These similarities are employed as anchor points to speed up the DIALIGN multiple-alignment
program.
■ Finally, the visualization tool ABC is used for interactive graphical representation of the
resulting multiple alignments.

24. OmicBrowse - Genome Annotation Browser

● Official URL: http://omicspace.riken.jp/db/genome.html

● What can you do: Manage multiple -omic datasets simultaneously.

● Highlights:
■ OmicBrowse is a genome browser designed as a scalable system for managing numerous
genome annotation datasets.
■ It is a freely accessible tool that controls per-user access to each dataset, so that
multiple users can each have their own integrative view of both their unpublished and published
datasets. This significantly lowers the maintenance cost of supplying each collaborator
with their own private data.
■ OmicBrowse supports DAS1 imports and exports of annotations to Internet servers worldwide.
■ A data-download server named OmicDownload lets users interactively select datasets and
filter the data in the selected datasets.

25. Proteomes and Genomes Fasta

● Official URL: http://www.ebi.ac.uk/fasta33/genomes.html

● What can you do: Perform sequence similarity searching against complete genomes databases using
the Fasta programs.

● Highlights:
■ Choose among the programs fasta3, fastx3, fasty3, fastf3, and fasts3.
■ Run sequence similarity and homology searches against complete proteome or genome databases.

26. SPRING - a tool for the analysis of genome rearrangement using reversals
and block-interchanges

● Official URL: http://algorithm.cs.nthu.edu.tw/tools/SPRING/

● What can you do: Analyze genome rearrangement between two chromosomal genomes.

● Highlights:
■ SPRING is a bioinformatic tool for the analysis of genome rearrangement between two
chromosomal genomes with reversals and/or block-interchanges.
■ It accepts two or more chromosomes as its input and then computes a minimum series of
reversals and/or block-interchanges between any two input chromosomes for transforming one
chromosome into another.
■ The input of SPRING can be either gene/landmark orders or bacterial-size sequences.
■ If the input is a set of chromosomal sequences, SPRING automatically looks for
identical landmarks, which are homologous/conserved regions shared by all input sequences.
■ SPRING also calculates the breakpoint distance between any pair of chromosomes.
■ Moreover, SPRING displays phylogenetic trees reconstructed from the
rearrangement and breakpoint distance matrices.
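
The notion of breakpoint distance can be sketched as follows. This is a simplified illustration that assumes unsigned, linear gene orders over the same set of unique landmarks; the function name `breakpoint_distance` is hypothetical and SPRING's own definition (for example for signed or circular chromosomes) may differ.

```python
def breakpoint_distance(order_a, order_b):
    """Count adjacencies in order_a that do not occur in order_b.

    Both orders are lists of the same unique landmarks. Orientation is
    ignored (unsigned model) and the chromosome is treated as linear,
    framed by artificial start and end markers.
    """
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orders must contain the same landmarks")
    start, end = object(), object()   # sentinels that cannot clash with landmarks

    def adjacencies(order):
        framed = [start, *order, end]
        return {frozenset(pair) for pair in zip(framed, framed[1:])}

    return len(adjacencies(order_a) - adjacencies(order_b))


original = [1, 2, 3, 4, 5]
reversed_block = [1, 4, 3, 2, 5]      # landmarks 2..4 reversed
distance = breakpoint_distance(original, reversed_block)
print(distance)
```

A single reversal breaks at most two adjacencies, which is why the toy example yields two breakpoints; a matrix of such pairwise distances is the kind of input a distance-based tree reconstruction would use.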

27. SRA - Sequence Read Archive

● Official URL: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?

● What can you do: Find raw sequencing data from the "next" generation of sequencing platforms
including Roche 454 GS System®, Illumina Genome Analyzer®, Applied Biosystems SOLiD®
System, Helicos Heliscope®, Complete Genomics®, and others.

● Highlights:
■ The Sequence Read Archive (SRA) was established to offer the scientific community an archival
destination for next-generation data sets.
■ Next-generation sequencing platforms are generating biological sequencing data in unprecedented
quantity.
■ Users of these resources can acquire data sets stored in any of the three SRA instances.

28. The CHAOS/DIALIGN - Multiple Alignment of Genomic Sequences

● Official URL: http://dialign.gobics.de/chaos-dialign-submission

● What can you do: Perform multiple alignments of large genomic sequences.

● Highlights:
■ CHAOS is a fast database search tool that creates a list of local sequence similarities. These are
used by DIALIGN as anchor points to speed up the final alignment procedure. The resulting
alignment is returned to the user in different formats together with a list of anchor points found
by CHAOS.
■ The web server utilizes the combination of CHAOS and DIALIGN to achieve both speed and
alignment accuracy.

29. The UCSC Archaeal Genome Browser

● Official URL: http://archaea.ucsc.edu/

● What can you do: Browse archaeal genomes.

● Highlights:
■ UCSC Archaeal Genome Browser presently comprises 26 archaeal genomes.
■ It shows protein conservation across habitat and phylogenetic categories, multi-genome alignments,
microarray data, sequence motifs (promoters and Shine-Dalgarno), operon and gene annotation
from multiple sources, and G/C content.

30. UCSC Genome Browser

● Official URL: http://genome.ucsc.edu

● What can you do: Get a rapid and reliable display of any requested portion of genomes at any scale,
together with dozens of aligned annotation tracks.

● Highlights:
■ The University of California, Santa Cruz (UCSC) Genome Browser Database (GBD) offers integrated
sequence and annotation data for a large collection of vertebrate and model organism genomes.
■ As of 2009, genomic sequence and a basic set of annotation 'tracks' were offered for 47 organisms,
including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6
worms, and a yeast.
■ New data highlights from that year include an updated human genome browser, a 44-species
multiple sequence alignment track, improved variation and phenotype tracks, and 16 new
genome-wide ENCODE tracks.
■ New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new
custom track formats for large datasets (bigBed and bigWig), new multiple alignment output
tools, links to variation and protein structure tools, in silico PCR utility enhancements, and
improved track configuration tools.

31. ZOOM! - Zillions Of Oligos Mapped

● Official URL: http://www.bioinfor.com/zoom

● What can you do: Map millions of short reads produced by next-generation sequencing
technology back to reference genomes and carry out post-analysis.

● Highlights:
■ ZOOM is built on a framework that uses spaced seeds to perform full-sensitivity mapping
as efficiently as possible.
■ Using this framework, ZOOM can map the Illumina/Solexa reads of 15X coverage of a human
genome to the reference human genome in one CPU-day, allowing two mismatches, at full
sensitivity.
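
The idea of a spaced seed can be shown with a small sketch. The seed pattern, the toy sequences, and the helper names below are assumptions for illustration; ZOOM uses carefully designed sets of seeds to guarantee full sensitivity, whereas this sketch tries a single seed at the start of the read and therefore offers no such guarantee.

```python
from collections import defaultdict

SEED = "110101"  # hypothetical pattern: '1' = position must match, '0' = ignored


def seed_key(text, start, seed=SEED):
    """Concatenate the characters of text under the '1' positions of the seed."""
    return "".join(text[start + i] for i, c in enumerate(seed) if c == "1")


def build_index(reference, seed=SEED):
    """Index every reference position by its spaced-seed key."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(seed) + 1):
        index[seed_key(reference, pos, seed)].append(pos)
    return index


def map_read(read, reference, index, max_mismatches=2, seed=SEED):
    """Return (position, mismatches) for candidate hits that pass verification."""
    if len(read) < len(seed):
        raise ValueError("read is shorter than the seed")
    hits = []
    for pos in index.get(seed_key(read, 0, seed), []):
        window = reference[pos:pos + len(read)]
        if len(window) == len(read):
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.append((pos, mismatches))
    return hits


reference = "TTACGTACGGATCCA"
index = build_index(reference)
# The read differs from reference[2:10] at its third base, under a '0' of the seed.
hits = map_read("ACTTACGG", reference, index)
print(hits)
```

Because the mismatch falls on a "don't care" position, the seed lookup still finds the true location, which a contiguous seed of the same span would miss.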

32. e2g - An Interactive Web-based Server for Efficiently Mapping Large EST and
cDNA Sets to Genomic Sequences

● Official URL: http://bibiserv.techfak.uni-bielefeld.de/e2g/

● What can you do: Map large expressed sequence tag (EST) and cDNA datasets to genomic DNA.

● Highlights:
■ The webserver houses massive sets of EST sequences (for instance, 4.1 million mouse ESTs of
1.87 Gb as of 2003) in precomputed indexed data structures for efficient sequence analysis.
■ Users can upload a genomic DNA sequence of interest and quickly compare this to the entire set
of ESTs on the server.
■ It provides a mapping of the ESTs on the genomic DNA. The e2g web interface offers a graphical
overview of the mapping. Alignments of the mapped EST regions with parts of the
genomic sequence are displayed. Zooming functions enable the user to interactively study the
results.
■ Mapped sequences can be downloaded for additional analysis.

33. g:Profiler - a web-based toolset for functional profiling of gene lists from
large-scale experiments

● Official URL: http://biit.cs.ut.ee/gprofiler/

● What can you do: Characterize and manipulate gene lists resulting from
mining high-throughput genomic data on a public web server.

● Highlights:
■ g:Profiler has a simple, user-friendly web interface with visualization for capturing
Gene Ontology (GO), pathway, or transcription factor binding site enrichments down to the level
of individual genes.
■ In addition to standard multiple testing corrections, it introduces an improved method for
estimating the true effect of multiple testing over complex structures like GO.
■ Interpretation of ranked gene lists is supported from the same interface with very efficient
algorithms.
■ Such ordered lists may arise when studying the most strongly affected genes from high-throughput
data or genes co-expressed with a query gene.
■ Other important aspects of practical data analysis are supported by modules tightly integrated with
g:Profiler.
■ These are g:Sorter for searching a large body of public gene expression data for co-expression,
g:Orth for finding orthologous genes from other species, and g:Convert for converting between
different database identifiers.
■ g:Profiler supports 31 different species, and the underlying data is updated regularly from
sources like the Ensembl database.
■ Bioinformatics communities wishing to integrate with g:Profiler can use alternative simple
textual outputs.
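
Enrichment of a functional term in a gene list is commonly assessed with a hypergeometric test followed by a multiple testing correction. The sketch below shows that general technique with a plain Bonferroni correction; it is not g:Profiler's own correction method, and the function names and numbers are illustrative assumptions.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(pvalue, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, pvalue * n_tests)


# Toy numbers: 20 background genes, 5 annotated, query list of 4 with 3 annotated.
p = hypergeom_pvalue(k=3, n=4, K=5, N=20)
p_adj = bonferroni(p, n_tests=3)   # assume three terms were tested
print(p, p_adj)
```

Bonferroni treats all tested terms as independent, which is conservative for hierarchies such as GO where terms overlap heavily; that limitation is the motivation for tailored corrections.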

5. Human genome databases, maps, and viewers

1. CREME - Cis-Regulatory Module Explorer for the Human Genome


● Official URL: http://creme.dcode.org/

● What you can do: Identify and visualize cis-regulatory modules in the promoter regions of a given set of
potentially co-regulated genes in the human genome.
● Highlights:
■ CREME relies on a database of putative transcription factor binding sites that have been
annotated across the human genome using a library of position weight matrices and evolutionary
conservation with the mouse and rat genomes.
■ A search algorithm is applied to this data set to identify combinations of transcription factors
whose binding sites tend to co-occur in close proximity in the promoter regions of the input gene set.
■ The identified cis-regulatory modules are statistically scored, and significant combinations are
reported and graphically visualized.

2. CTD - Comparative Toxicogenomics Database


● Official URL: http://ctd.mdibl.org/

● What you can do: Explore a curated database that promotes understanding of the effects of
environmental chemicals on human health.

● Highlights:
■ Chemical-gene interactions, chemical-disease relationships, and gene-disease relationships are
manually curated from the literature, allowing data to be integrated to construct
chemical-gene-disease networks.
■ Curation focuses on environmental chemicals.
■ Interactions are manually curated.
■ Interactions are constructed using controlled vocabularies and hierarchies.
■ Additional gene attributes (such as Gene Ontology, taxonomy, and KEGG pathways) are
integrated.
■ Data can be viewed from the perspective of a chemical, gene, or disease.
■ Results and batch queries can be downloaded and saved.
■ CTD acts as both a knowledgebase (by reporting data) and a discovery tool (generating novel
inferences).

■ Over 116,000 interactions between 3900 chemicals and 13,300 genes have been curated from
270 species, and 5900 gene-disease and 2500 chemical-disease direct relationships have been
captured.
■ By integrating these data, 350,000 gene-disease relationships and 77,000 chemical-disease
relationships can be inferred.

3. Database of Genomic Variants

● Official URL: http://projects.tcag.ca/variation/project.html

● What you can do: Find a comprehensive summary of structural variation in the human genome.
● Highlights:
■ The objective of the Database of Genomic Variants is to provide a comprehensive summary of
structural variation in the human genome. Structural variation is defined as genomic alterations
that involve segments of DNA larger than 1 kb. InDels in the 100 bp-1 kb range are now
also annotated. The database contains only structural variation identified
in healthy control samples.
■ The Database of Genomic Variants provides a useful catalog of control data for studies aiming to
correlate genomic variation with phenotypic data. The database is continuously updated with
new data from peer-reviewed research studies.

4. ENCODE - ENCyclopedia Of DNA Elements

● Official URL: http://genome.cse.ucsc.edu/ENCODE/

● What you can do: Search for information on identified functional elements in the human genome.

● Highlights:
■ The Encyclopedia of DNA Elements (ENCODE) project is an international consortium of
investigators funded to analyze the human genome to produce a comprehensive catalog of
functional elements.
■ The ENCODE Data Coordination Center at The University of California, Santa Cruz (UCSC) is the
primary repository for ENCODE investigators’ experimental results.
■ These results are captured in the UCSC Genome Bioinformatics database and download server
for visualization and data mining via the UCSC Genome Browser and companion tools.

5. Ensembl
● Official URL: http://www.ensembl.org/

● What you can do: Find genome annotation, databases, and other information for chordate and selected
model organism and disease vector genomes.

● Highlights:
■ Ensembl integrates genomic information for a comprehensive set of chordate genomes with a
particular focus on resources for human, mouse, rat, zebrafish, and other high-value sequenced
genomes.
■ It provides complete gene annotations for all supported species in addition to specific resources
that target genome variation, function, and evolution.
■ Ensembl data is accessible in various formats, including via the Ensembl genome browser, API, and
BioMart.
■ By its tenth anniversary, the project had grown alongside
advances in genome technology.
■ As of release 56 (September 2009), Ensembl supports 51 species, including marmoset, pig, zebra
finch, lizard, gorilla, and wallaby, which were added in the preceding year.
■ Major additions and improvements reported at that release include incorporation of
the human GRCh37 assembly, enhanced visualization and data-mining options for the Ensembl
regulatory features, and continued development of the software infrastructure.

6. Evola - human orthologs as evolutionary annotation

● Official URL: http://www.h-invitational.jp/evola/

● What you can do: Explore a database of evolutionary features of human genes.


● Highlights:
■ A sub-database of H-InvDB, an integrated database of annotated human genes
(http://h-invitational.jp/).
■ In the process of ortholog detection, computational analysis based on conserved genome
synteny and transcript sequence similarity was followed by manual curation by researchers
examining phylogenetic trees.
■ In total, 18,968 human genes have orthologs among 11 vertebrates (chimpanzee, mouse, cow,
chicken, zebrafish, etc.), either computationally detected or manually curated.
■ Evola provides amino acid sequence alignments and phylogenetic trees of orthologs and
homologs.
■ In the 'd(N)/d(S) view', natural selection on genes can be analyzed between human and other
species.
■ In 'Locus maps', all transcript variants and their exon/intron structures can be compared among
orthologous gene loci.
■ Evola serves as a comprehensive and reliable database to utilize comparative analyses for
obtaining new knowledge about human genes.

7. GDB - the Human Genome DataBase

● Official URL: http://www.gdb.org/


● What you can do: Search the encyclopedia of the human genome that is being constantly revised and
updated to reflect the current state of scientific knowledge.
● Highlights:
■ GDB is a public repository of data on human genes, clones, STSs, polymorphisms, and maps.
■ GDB entries are highly cross-linked to each other, literature citations, and entries in other
databases, including the sequence databases, OMIM, and the Mouse Genome Database.
■ Mapping data from large genome centers and smaller mapping efforts are added to GDB on an
ongoing basis.
■ The database can be searched by a variety of methods, ranging from keyword searches to
complex queries.

8. GenAtlas

● Official URL: http://www.dsi.univ-paris5.fr/genatlas/

● What you can do: Retrieve comprehensive genetic, phenotypic, and pathological information about the
human genome and proteome.

● Highlights:
■ GENATLAS compiles the information relevant to the Human Genome Project’s mapping
efforts by manual annotation using published literature.
■ GENATLAS contains three databases: Genes database (18771 genes), Phenotypes database
(3372 entries), References database (55554 entries), all as of Feb. 2005.
■ The three databases may be queried in full-text mode.
■ Obtain DNA, RNA, protein, expression/subcellular localization, and pathological characteristics
for any gene.
■ Obtain imprinting information for a gene.
■ Obtain phenotype information for a gene.

9. GeneCards

● Official URL: http://www.genecards.org/

● What you can do: Find concise information about the functions of all human genes.

● Highlights:
■ The information in GeneCards is automatically extracted from various resources.
GeneCards™ is particularly useful for people who wish to find information about genes of
interest in the context of functional genomics and proteomics.

10. GeneLoc - exon-based integration of human genome maps

● Official URL: http://genecards.weizmann.ac.il/geneloc/index.shtml

● What you can do: Look up the meaningful location-based identifier assigned to each gene in the
human genome.

● Highlights:
■ GeneLoc unifies gene collections, eliminates redundancies, and assigns each gene a meaningful
location-based identifier, which also serves as its GeneCards ID.
■ It integrates gene lists by comparing genomic coordinates at the exon level.
■ GeneLoc presents an integrated map for each human chromosome based on data integrated by
the GeneLoc algorithm.
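
Comparing gene records at the exon level can be sketched as an interval-overlap check. The record layout, the function names, and the rule "same chromosome, same strand, and at least one overlapping exon" are assumptions made for illustration; the actual criteria of the GeneLoc algorithm are not described here.

```python
def exons_overlap(exons_a, exons_b):
    """True if any exon in exons_a overlaps any exon in exons_b.

    Exons are (start, end) tuples with inclusive coordinates.
    """
    return any(a_start <= b_end and b_start <= a_end
               for a_start, a_end in exons_a
               for b_start, b_end in exons_b)


def same_gene(rec_a, rec_b):
    """Treat two records as the same gene if they share chromosome, strand, and an exon overlap."""
    return (rec_a["chrom"] == rec_b["chrom"]
            and rec_a["strand"] == rec_b["strand"]
            and exons_overlap(rec_a["exons"], rec_b["exons"]))


# Hypothetical records from two gene collections.
source_a = {"chrom": "7", "strand": "+", "exons": [(100, 200), (400, 520)]}
source_b = {"chrom": "7", "strand": "+", "exons": [(150, 200), (400, 500), (800, 900)]}
antisense = {"chrom": "7", "strand": "-", "exons": [(150, 200)]}

merged = same_gene(source_a, source_b)
kept_apart = not same_gene(source_a, antisense)
print(merged, kept_apart)
```

Working at the exon level rather than on whole-gene spans avoids merging a gene with another one that merely sits inside one of its introns.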

11. H-DBAS - Alternative splicing database of completely sequenced and
manually annotated full-length cDNAs based on H-Invitational

● Official URL: http://h-invitational.jp/h-dbas/

● What you can do: Search for annotated information on alternatively spliced human transcripts.

● Highlights:
■ H-DBAS is a specialized database for human alternative splicing (AS) based on H-Invitational
full-length cDNAs.
■ For better annotations of AS events, RNA-Seq tag information was correlated to the AS exons
and splice junctions.
■ 148,376,598 RNA-Seq tags were generated from RNAs extracted from cytoplasmic, nuclear, and
polysome fractions.
■ Analysis of the RNA-Seq tags identified 90,900 exons that are very likely to be used for protein
synthesis.
■ 254 AS junctions of human RefSeq transcripts are unique to nuclear RNA and may not have any
translational consequences.
■ A new comparative genomics viewer is available so users can empirically understand the
evolutionary turnover of AS.

12. H-InvDB - Human Invitational Database

● Official URL: http://www.h-invitational.jp/

● What you can do: Search for annotated information on human genes and transcripts.

● Highlights:
■ H-InvDB is a comprehensive annotation resource of human genes and transcripts and consists
of two main views and six sub-databases.
■ The newest release gives the annotation for 219,765 human transcripts in 43,159 human gene
clusters based on human full-length cDNAs and mRNAs.
■ It now provides several new annotation features, such as mapping of microarray probes, new
gene models, relation to known ncRNAs, and information from the Glycogene database.
■ H-InvDB also provides useful data mining resources: 'Navigation search', the 'H-InvDB Enrichment
Analysis Tool (HEAT)', and web service APIs.
■ 'Navigation search' is an extended search system that enables complicated searches by
combining 16 different search options.
■ HEAT is a data mining tool for automatically identifying features specific to a given human gene
set.
■ HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set,
as compared with the entire H-InvDB representative transcripts.
■ H-InvDB now has web service APIs of SOAP and REST to allow the use of H-InvDB data in
programs, providing the users extended data accessibility.

13. HMDB - the Human Metabolome Database

● Official URL: http://www.hmdb.ca/

● What you can do: Search for comprehensive information on human metabolism and metabolites.

● Highlights:
■ The most recent release of HMDB has been significantly expanded and enhanced over the
previous release.
■ The number of purified compounds with reference NMR, LC-MS and GC-MS spectra has more
than doubled (from 380 to more than 790 compounds).
■ The number of fully annotated metabolite entries has grown from 2180 to more than 6800
(roughly a threefold increase), while the number of metabolites with biofluid or tissue
concentration data has grown by a factor of five (from 883 to 4413).
■ Many new database searching tools and new data content have been added or enhanced. These
include better algorithms for spectral searching and matching, more powerful chemical
substructure searches, faster text searching software, as well as dedicated pathway searching
tools, and customized, clickable metabolic maps.

14. HOMD - the Human Oral Microbiome Database

● Official URL: http://www.homd.org/

● What you can do: Search detailed biological entries on oral microorganisms and the genes they
express.

● Highlights:
■ The goal of the HOMD is to provide the scientific community with comprehensive information on
approximately 600 prokaryote species that are present in the human oral cavity.
■ The majority of these species are uncultivated and unnamed, recognized primarily by their 16S
rRNA sequences.
■ The HOMD presents a provisional naming scheme for the currently unnamed species so that
strain, clone, and probe data from any laboratory can be directly linked to a stably named
reference entity.
■ The HOMD links sequence data with phenotypic, phylogenetic, clinical, and bibliographic
information.
■ Full and partial oral bacterial genome sequences determined as part of this project and the
Human Microbiome Project are being added to the HOMD as they become available.
■ HOMD offers easy-to-use tools for viewing all publicly available oral bacterial genomes.

15. HuRef Genome Browser

● Official URL: http://huref.jcvi.org/

● What you can do: Browse a web resource for individual human genomics.

● Highlights:

■ The browser provides a comparative view between the NCBI human reference sequence and the
HuRef assembly, and it enables the navigation of the HuRef genome in the context of HuRef,
NCBI, and Ensembl annotations.
■ Single nucleotide polymorphisms, indels, inversions, structural and copy-number variations are
shown in the context of existing functional annotations on either genome in the comparative
view.
■ The browser provides full access to the underlying reads with sequence and quality information,
the genome assembly, and the evidence supporting the identification of DNA polymorphisms.

16. LIFEdb - A Database for Functional Genomics Experiments

● Official URL: http://www.lifedb.de/

● What you can do: Explore and visualize features of the annotated human cDNAs and ORFs combined
with experimental results.

● Highlights:
■ LIFEdb integrates data from large-scale functional genomics assays and manual cDNA
annotation with bioinformatics gene expression and protein analysis.
■ It links information regarding novel human full-length cDNAs with functional information on the
encoded proteins produced in functional genomics and proteomics approaches.
■ The database also serves as a sample-tracking system to manage the process from cDNA to
experimental read-out and data interpretation.
■ New features of the 2006 release of LIFEdb include:
➢ An updated user interface with enhanced query capabilities.
➢ A configurable output table and the option to download search results in XML.
➢ The integration of data from cell-based screening assays addressing the influence of
protein overexpression on cell proliferation.
➢ The display of the relative expression ('Electronic Northern') of the genes under
investigation using curated gene expression ontology information.

17. MGC - Mammalian Gene Collection

● Official URL: http://mgc.nci.nih.gov/

● What you can do: Search for sequences of publicly accessible cDNA resources containing a complete
open reading frame (ORF) for human, mouse, and other model organism genes.

● Highlights:

■ The goal of the Mammalian Gene Collection (MGC) is to provide full-length open reading frame
(FL-ORF) clones for human, mouse, and rat genes.
■ All MGC sequences are deposited in GenBank and the clones can be purchased from distributors
of the IMAGE consortium.

18. MotifMap - a human genome-wide map of candidate regulatory motif sites

● Official URL: http://motifmap.ics.uci.edu/

● What you can do: Explore a comprehensive map of regulatory elements in the human genome.

● Highlights:
■ A procedure was developed for identifying regulatory sites, with high levels of conservation
across different species, using a new scoring scheme, the Bayesian branch length score (BBLS).
■ Using BBLS, 1.5 million regulatory sites were predicted, corresponding to 380 known regulatory
motifs, with an estimated false discovery rate (FDR) of less than 50%.
■ It is demonstrated that the method is particularly effective for 155 motifs, for which 121,056
sites can be mapped with an estimated FDR of less than 10%.
■ Over 28K SNPs are located in regions overlapping the 1.5 million predicted motif sites,
suggesting potential functional implications for these SNPs.

■ These elements are deposited in a database, and a user-friendly web server was created for the
retrieval, analysis, and visualization of these elements.
■ The initial map provides a systematic view of gene regulation in the genome, which will be
refined as additional motifs become available.

19. NCG - Network of Cancer Genes


● Official URL: http://ncg.kcl.ac.uk/

● What you can do: Find information about the properties of cancer genes.

● Highlights:
■ The Network of Cancer Genes (NCG) collects and integrates data on 736 human genes that are
mutated in various types of cancer.
■ For each gene, NCG provides information on duplicability, orthology, evolutionary appearance,
and topological properties of the encoded protein in a comprehensive version of the human
protein-protein interaction network.

■ NCG also stores information on all primary interactors of cancer proteins, thus providing a
complete overview of 5357 proteins that constitute direct and indirect determinants of human
cancer.
■ With the constant delivery of results from the mutational screenings of cancer genomes, NCG
represents a versatile resource for retrieving detailed information on particular cancer genes, as
well as for identifying common properties of pre-compiled lists of cancer genes.

20. ORFDB - A High-quality Open Reading Frame Collection

● Official URL: http://orf.invitrogen.com/

● What can you do: Search an evolving collection of human and mouse Open Reading Frame (ORF)
clones (Ultimate™ ORF Clones).

● Highlights:
■ As of October 2003, the ORFDB comprises 6200 human and 2870 mouse Ultimate™ ORF
clones. All ORF clones have been completely sequenced with high quality and are matched to
public reference protein sequences.
■ Ultimate™ ORF clones can be searched by LocusLink ID, UniGene ID, clone ID, gene symbol,
GenBank accession, keyword, or BLAST, or via functional relationships by searching the collection
through the Gene Ontology (GO) Browser.
■ Furthermore, the cloned ORFs have been comprehensively annotated across six classes
(Genomic links, SNP, Protein, Clone Format, ORF, and Gene), with the details assembled in a format
termed the ORFCard.

21. RefSeq - Reference Sequence database

● Official URL: http://www.ncbi.nlm.nih.gov/RefSeq/

● What can you do: Find sequences representing genomes, transcripts, and proteins.

● Highlights:
■ The database includes data from more than 5300 organisms spanning prokaryotes, eukaryotes, and
viruses, with records for over 5.5 million proteins.
■ Protein and nucleotide sequences are explicitly linked, and the sequences are linked to other
resources like the NCBI Map Viewer and Gene.
■ Sequences are annotated with features such as database cross-references, names,
references, variation, conserved domains, coding regions, and others.

22. TRbase - A Database Of Tandem Repeats In The Human Genome

● Official URL: http://trbase.ex.ac.uk/

● What can you do: Search for tandem repeats in the human genome.

● Highlights:
■ TRbase is a web-accessible relational tandem repeats database that links tandem repeats to
gene locations and disease genes of the human genome.
■ The database covers both perfect and imperfect repeats with unit lengths of 1-2000 bp.
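
To make the term "perfect tandem repeat" concrete, the sketch below finds runs of an exactly repeated unit with a regular expression. The function name, the default thresholds, and the toy sequence are assumptions; this is not how TRbase was built, and imperfect repeats would need an approximate-matching method instead.

```python
import re


def perfect_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for non-overlapping perfect tandem repeats.

    The unit is matched lazily, so the shortest repeating unit is reported
    (e.g. 'AC' rather than 'ACAC').
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    repeats = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        repeats.append((match.start(), unit, copies))
    return repeats


repeats = perfect_tandem_repeats("GGACACACTTTT")
print(repeats)
```

The small `max_unit` default keeps the example fast; scanning for units up to 2000 bp on a whole genome would call for a dedicated repeat finder rather than a backtracking regular expression.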

23. The Chromosome 7 Annotation Project - Human chromosome 7 sequence
and annotation

● Official URL: http://www.chr7.org/

● What can you do: Search for comprehensive annotated genomic information on human Chromosome
7.

● Highlights:
■ This database comprises annotation and DNA sequence of the complete human chromosome 7,
encompassing almost 158 million nucleotides of DNA and 1917 gene structures.
■ Additional structural features like segmental duplications, fragile sites, and imprinted genes were
combined at the level of the DNA sequence with medical genetic data, including 440
chromosome rearrangement breakpoints associated with disease.

24. The Gene Wiki - Wikipedia Gene Portal

● Official URL: http://en.wikipedia.org/wiki/Portal:Gene_Wiki

● What can you do: Find information about annotated genes and protein functions.

● Highlights:
■ The Gene Wiki is a user-modified resource that uses a complementary and alternative model
based on the principle of community intelligence in cellular and molecular biology.
■ It is an informal set of pages on human proteins and genes, and the effort to create these
pages is closely coordinated with the Molecular and Cellular Biology WikiProject.
■ Directly integrated within the online encyclopedia Wikipedia, the goal of this initiative is to
establish a gene-specific review article for every gene in the human genome, where each article
is collaboratively drafted, continuously updated, and community reviewed.
■ Gene Wiki articles are openly accessible within the Wikipedia website.

25. UCSC Genome Browser

● Official URL: http://genome.ucsc.edu

● What can you do: Get a rapid and reliable display of any requested portion of genomes at any scale,
together with dozens of aligned annotation tracks.

● Highlights:
■ The University of California, Santa Cruz, Genome Browser Database (GBD) offers integrated
sequence and annotation data for a large set of vertebrate and model organism genomes.
■ As of 2009, genomic sequence and a basic set of annotation 'tracks' were offered for 47 organisms,
including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6
worms, and a yeast.
■ New data highlights from 2009 include an updated human genome browser, a 44-species
multiple sequence alignment track, improved variation and phenotype tracks, and 16 new
genome-wide ENCODE tracks.
■ New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new
custom track formats for large datasets (bigBed and bigWig), new multiple alignment output
tools, links to variation and protein structure tools, improvements to the in silico PCR utility, and
enhanced track configuration tools.
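
As an illustration of requesting a specific portion of a genome, the sketch below builds a link to the browser's hgTracks page using its `db` (assembly) and `position` (region) URL parameters. The function name `browser_url` and the example region on chromosome 7 are hypothetical and chosen only for this example; hg19 is the human assembly released in February 2009.

```python
from urllib.parse import urlencode

# Base address of the Genome Browser track display.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db: str, chrom: str, start: int, end: int) -> str:
    """Return a Genome Browser link for a 1-based, inclusive region.

    `browser_url` is a hypothetical helper written for this example;
    it only assembles a URL and does not contact the server.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    position = f"{chrom}:{start}-{end}"
    # Keep ':' unescaped so the link stays human-readable.
    query = urlencode({"db": db, "position": position}, safe=":")
    return f"{UCSC_HGTRACKS}?{query}"


# Example: a 100 kb window on human chromosome 7 (hg19 assembly).
print(browser_url("hg19", "chr7", 100000, 200000))
```

Opening the printed link in a web browser should show the chosen window with the default annotation tracks; changing `start` and `end` changes the displayed scale.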

26. VISTA Enhancer Browser - a database of tissue-specific human enhancers

● Official URL: http://enhancer.lbl.gov/

● What can you do: Search for information on experimentally validated human noncoding fragments with
gene enhancer activity as assessed in transgenic mice.

● Highlights:
■ This growing database (as of Feb. 2007) comprises more than 301 experimentally tested
DNA fragments, of which over 115 have been validated as tissue-specific enhancers.
■ For each positive enhancer, digital images of whole-mount embryo staining at embryonic day
11.5 and anatomical details of the reporter gene expression pattern are offered.

■ Users can retrieve elements near individual genes of interest, browse for enhancers that drive
reporter gene expression in a particular tissue, or download the whole set of enhancers with a
defined conservation depth or tissue specificity.

27. dbGaP - a Database of Genome-Wide Association Studies

● Official URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gap

● What can you do: Search for data from genome-wide association (GWA) studies.

● Highlights:
■ dbGaP, the database of Genotypes and Phenotypes, offers for the first time a central location for
interested parties to view all study documentation and to see summaries of the measured
variables in a structured and searchable web format.
■ The database additionally offers pre-computed analyses of the level of the statistical association
between genes and selected phenotypes.
■ Genotype data are procured by employing high-throughput genotyping arrays to check subjects'
DNA for single nucleotide polymorphisms (SNPs), areas of the genome that have been discovered
to differ among humans.
■ The first release of dbGaP (Dec. 2006) comprises data on two studies: the Age-Related Eye
Disease Study (AREDS), and the National Institute of Neurological Disorders and Stroke
Parkinsonism Study, a case-control study that collected detailed phenotypic data, cell line
samples, and DNA of 2,573 subjects.
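
To make the idea of a statistical association between a SNP and a phenotype concrete, the sketch below computes a Pearson chi-square statistic on a 2x2 table of allele counts in cases and controls. It is a generic textbook test, not necessarily the analysis dbGaP pre-computes; the function name and the allele counts are hypothetical.

```python
def allelic_chi_square(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int) -> float:
    """Pearson chi-square (1 degree of freedom) for a 2x2 allele-count table.

    Rows are cases and controls; columns are counts of allele A and allele B.
    Hypothetical helper for illustration only.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_cases, row_ctrls = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    denom = row_cases * row_ctrls * col_a * col_b
    if denom == 0:
        raise ValueError("every row and column must have a non-zero total")
    # Shortcut formula for a 2x2 table: N * (ad - bc)^2 / (product of margins).
    return n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denom


# Made-up counts: allele A is more common in cases than in controls.
chi2 = allelic_chi_square(case_a=60, case_b=40, ctrl_a=40, ctrl_b=60)
print(f"chi-square = {chi2:.2f}")
```

A larger statistic indicates a stronger departure from equal allele frequencies in the two groups. In a genome-wide study the test is repeated for every SNP, so significance thresholds must be corrected for multiple testing.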

www.biotecnika.org
