
GeneID

GeneID is a program to predict genes in anonymous genomic sequences,
designed with a hierarchical structure.

In the first step, splice sites, start and stop codons are predicted and scored
along the sequence using Position Weight Arrays (PWAs).
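
The site-scoring step can be illustrated with a minimal sketch. It uses a plain position weight matrix, which treats positions as independent, rather than a full Position Weight Array; the training sites, the uniform background and the function names are made up for illustration and are not taken from geneid.

```python
import math

def build_log_odds(sites, background=0.25, pseudocount=1.0):
    """Build a per-position log-odds table from aligned example sites."""
    table = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        table.append({b: math.log((counts[b] / total) / background) for b in "ACGT"})
    return table

def score_sites(sequence, table):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(table)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        scores.append((start, sum(table[i][base] for i, base in enumerate(window))))
    return scores

# Toy donor-like sites (hypothetical training data)
training_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
table = build_log_odds(training_sites)
scores = score_sites("CCGTAAGTCC", table)
best_start, best_score = max(scores, key=lambda item: item[1])
```

Windows that resemble the training sites get positive scores, so the highest-scoring window marks the most likely site.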

In the second step, exons are built from the sites. Exons are scored as the sum of
the scores of the defining sites, plus the log-likelihood ratio of a Markov
Model for coding DNA.
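
As a rough sketch of this scoring rule, the exon score below adds the site scores to a log-likelihood ratio between a coding and a noncoding model. A first-order Markov model and the toy probability tables are assumptions made to keep the example short; the text does not state the model order geneid uses.

```python
import math

def markov_log_likelihood_ratio(sequence, coding, noncoding):
    """Sum log(P_coding(next | prev) / P_noncoding(next | prev)) over adjacent bases."""
    total = 0.0
    for prev, nxt in zip(sequence, sequence[1:]):
        total += math.log(coding[prev][nxt] / noncoding[prev][nxt])
    return total

def exon_score(site_scores, sequence, coding, noncoding):
    """Exon score = sum of defining site scores + coding log-likelihood ratio."""
    return sum(site_scores) + markov_log_likelihood_ratio(sequence, coding, noncoding)

# Hypothetical transition tables: the coding model favours G and C,
# the noncoding model is uniform.
coding = {prev: {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2} for prev in "ACGT"}
noncoding = {prev: {base: 0.25 for base in "ACGT"} for prev in "ACGT"}

gc_rich = exon_score([1.0, 2.0], "GCGC", coding, noncoding)
at_rich = exon_score([1.0, 2.0], "ATAT", coding, noncoding)
```

With these toy tables the GC-rich candidate scores above the sum of its site scores and the AT-rich candidate scores below it.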

Finally, from the set of predicted exons, the gene structure is assembled,
maximizing the sum of the scores of the assembled exons.
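
The assembly step can be sketched as a dynamic program that picks the set of non-overlapping exons with the highest total score. This is a simplification: it ignores reading-frame compatibility and the other gene-model rules a real gene finder enforces, and the exon coordinates and scores are invented for the example.

```python
from bisect import bisect_right

def assemble(exons):
    """exons: list of (start, end, score) with inclusive integer coordinates.

    Returns (best_total, chosen_exons) for the best set of non-overlapping exons.
    """
    exons = sorted(exons, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    n = len(exons)
    best = [0.0] * (n + 1)   # best[i]: best total using the first i exons
    take = [False] * (n + 1)
    prev = [0] * (n + 1)
    for i, (start, end, score) in enumerate(exons, start=1):
        # Number of earlier exons that end strictly before this one starts
        p = bisect_right(ends, start - 1, 0, i - 1)
        prev[i] = p
        if best[p] + score > best[i - 1]:
            best[i] = best[p] + score
            take[i] = True
        else:
            best[i] = best[i - 1]
    # Walk back through the table to recover the chosen exons
    chosen = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(exons[i - 1])
            i = prev[i]
        else:
            i -= 1
    return best[n], chosen[::-1]

candidates = [(10, 50, 3.0), (40, 90, 4.5), (60, 120, 2.5),
              (100, 150, -1.0), (130, 200, 1.5)]
total, structure = assemble(candidates)
# total == 7.0, structure == [(10, 50, 3.0), (60, 120, 2.5), (130, 200, 1.5)]
```

The highest-scoring single exon (40-90) is left out here because three compatible exons together score more.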

geneid offers support for integrating predictions from multiple sources
via external GFF files, and the general gene structure or model can also be
redefined. The accuracy of geneid compares favorably
with that of other existing tools, and geneid is likely more efficient in terms of
speed and memory usage.
Currently, geneid v1.2 analyzes the whole human genome in 3 hours (about 1 Gbp
per hour) on an Intel(R) Xeon 2.80 GHz CPU.
Characteristics of GENE ID
geneid accuracy compares favorably with that of other existing "ab initio" gene prediction tools.

geneid is very efficient in terms of speed and memory usage. In practice, geneid can
analyze chromosome-size sequences at a rate of about 1 Gbp per hour on an Intel(R) Xeon
2.80 GHz CPU. For the largest human chromosome (chr1), it requires 0.5 GB of RAM
plus the size of the FASTA sequence.

geneid supports integrating predictions from multiple sources (ESTs, BLAST HSPs)
and reannotating genomic sequences via external GFF files, together with
redefinition of the "gene model".

geneid output can be customized to different levels of detail, including exhaustive listing
of potential signals and exons. Furthermore, several output formats, such as GFF or XML, are
available.

Parameter files are available in geneid v1.2 for Drosophila melanogaster,
human (which can also be used for vertebrate genomes), Dictyostelium
discoideum and Tetraodon nigroviridis (which can be used for Fugu rubripes), among many
others for species spanning the four "classical" kingdoms. The additional currently
available parameter files can be found under the section "geneid parameter files".
Glimmer
Glimmer is a system for finding genes in microbial DNA, especially the genomes of
bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER)
uses interpolated Markov models (IMMs) to identify the coding regions and distinguish
them from noncoding DNA. The IMM approach, described in our Nucleic Acids
Research paper on Glimmer 1.0 and in our subsequent paper on Glimmer 2.0, uses a
combination of Markov models from 1st through 8th order, weighting each model
according to its predictive power. Glimmer 1.0 and 2.0 use 3-periodic nonhomogeneous
Markov models in their IMMs.
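
The interpolation idea can be sketched as follows. This toy model blends estimates from successively longer contexts and trusts a context more the more often it was seen in training. The count-based weight, the `threshold` parameter and the class name are assumptions for illustration; this is not Glimmer's actual weighting scheme, and it omits the 3-periodic structure.

```python
from collections import defaultdict

class ToyIMM:
    def __init__(self, max_order=3, threshold=20):
        self.max_order = max_order
        self.threshold = threshold  # hypothetical count needed for full trust
        self.counts = defaultdict(lambda: defaultdict(int))  # counts[context][base]

    def train(self, sequences):
        for seq in sequences:
            for i, base in enumerate(seq):
                for order in range(self.max_order + 1):
                    if i - order >= 0:
                        self.counts[seq[i - order:i]][base] += 1

    def prob(self, context, base):
        """Interpolated P(base | context), refined from order 0 upwards."""
        p = 0.25  # uniform starting estimate
        for order in range(min(self.max_order, len(context)) + 1):
            ctx = context[len(context) - order:]
            seen = self.counts.get(ctx)
            if not seen:
                break  # longer contexts were never observed either
            total = sum(seen.values())
            weight = min(1.0, total / self.threshold)
            p = weight * (seen.get(base, 0) / total) + (1 - weight) * p
        return p

imm = ToyIMM(max_order=2, threshold=20)
imm.train(["ACG" * 20])
p_g = imm.prob("AC", "G")
p_a = imm.prob("AC", "A")
```

Because each step is a weighted average of two probability distributions, the interpolated values over A, C, G and T still sum to 1.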

Glimmer is the primary microbial gene finder at TIGR, and has been used to annotate the
complete genomes of over 80 bacterial species at TIGR and elsewhere. Its analyses of
some of these genomes and others are available at the Comprehensive Microbial
Resource site.

For our eukaryotic gene finders, go to the GlimmerHMM site.


AUGUSTUS
AUGUSTUS is a program for gene prediction in eukaryotic genomic sequences. It is based on
a generalized hidden Markov model, a probabilistic model of a sequence and its gene
structure. The web server allows the user to impose constraints on the predicted gene
structure. A constraint can specify the position of a splice site, a translation initiation site or a
stop codon. Furthermore, it is possible to specify the position of known exons and intervals
that are known to be exonic or intronic sequence. The number of constraints is arbitrary and
constraints can be combined in order to pin down larger parts of the predicted gene
structure. The result then is the most likely gene structure that complies with all given user
constraints, if such a gene structure exists. The specification of constraints is useful when part
of the gene structure is known, e.g. by expressed sequence tag or protein sequence
alignments, or if the user wants to change the default prediction. The web interface and the
downloadable stand-alone program are available free of charge.
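
The effect of user constraints can be illustrated with a toy example. The sketch below runs the Viterbi algorithm on an ordinary two-state hidden Markov model (exon/intron), not a generalized HMM, and lets the caller fix the state at chosen positions. All probabilities, state names and the `constraints` argument are assumptions for illustration and do not reflect the AUGUSTUS interface.

```python
import math

STATES = ("exon", "intron")

def constrained_viterbi(sequence, start, trans, emit, constraints=None):
    """Most likely state path; constraints maps position -> required state."""
    constraints = constraints or {}
    neg_inf = float("-inf")

    def allowed(pos, state):
        return constraints.get(pos, state) == state

    # scores[pos][state]: best log-probability of a path ending in state at pos
    first = {}
    for s in STATES:
        if allowed(0, s):
            first[s] = math.log(start[s]) + math.log(emit[s][sequence[0]])
        else:
            first[s] = neg_inf
    scores, back = [first], []
    for pos in range(1, len(sequence)):
        row, ptr = {}, {}
        for s in STATES:
            if not allowed(pos, s):
                row[s], ptr[s] = neg_inf, None
                continue
            best_prev = max(STATES, key=lambda r: scores[-1][r] + math.log(trans[r][s]))
            row[s] = (scores[-1][best_prev] + math.log(trans[best_prev][s])
                      + math.log(emit[s][sequence[pos]]))
            ptr[s] = best_prev
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical parameters: exons are GC-rich, introns AT-rich
start = {"exon": 0.5, "intron": 0.5}
trans = {"exon": {"exon": 0.9, "intron": 0.1},
         "intron": {"exon": 0.1, "intron": 0.9}}
emit = {"exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

path = constrained_viterbi("GCGCATAT", start, trans, emit,
                           constraints={0: "exon", 7: "intron"})
```

Fixing the first position as exonic and the last as intronic forces one state change, and the algorithm places it where the sequence composition shifts.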

Also study NCBI ORF Finder and Genscan.
