In the first step, splice sites, start and stop codons are predicted and scored
along the sequence using Position Weight Arrays (PWAs).
In the second step, exons are built from the sites. Exons are scored as the sum of
the scores of the defining sites, plus the log-likelihood ratio of a Markov
Model for coding DNA.
Finally, from the set of predicted exons, the gene structure is assembled,
maximizing the sum of the scores of the assembled exons.
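
The second and third steps can be illustrated with a minimal Python sketch. It is not geneid's actual implementation: the first-order Markov model, the `Exon` type, the probability tables and all function names are assumptions made for illustration, and the assembly step ignores reading-frame compatibility and gene-model rules, keeping only the "maximize the sum of exon scores" idea for non-overlapping exons.

```python
import bisect
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int    # 0-based, inclusive
    end: int      # exclusive
    score: float


def markov_coding_llr(seq, coding, noncoding):
    """Log-likelihood ratio of coding vs. non-coding first-order Markov models.

    `coding` and `noncoding` map dinucleotides (e.g. "AT") to hypothetical
    transition probabilities; a real model would use a higher order.
    """
    return sum(
        math.log(coding[a + b] / noncoding[a + b])
        for a, b in zip(seq, seq[1:])
    )


def exon_score(site_scores, coding_llr):
    """Exon score = sum of the defining site scores + coding log-likelihood ratio."""
    return sum(site_scores) + coding_llr


def assemble(exons):
    """Pick non-overlapping exons with maximal total score (dynamic programming)."""
    exons = sorted(exons, key=lambda e: e.end)
    ends = [e.end for e in exons]
    best = [0.0] * (len(exons) + 1)     # best[i]: best total using the first i exons
    prev = [None] * (len(exons) + 1)    # prev[i]: predecessor index if exon i-1 is taken
    for i, exon in enumerate(exons, start=1):
        # Number of earlier exons that end at or before this exon's start.
        j = bisect.bisect_right(ends, exon.start, 0, i - 1)
        take = best[j] + exon.score
        if take > best[i - 1]:
            best[i], prev[i] = take, j
        else:
            best[i] = best[i - 1]
    # Trace back the chosen exons.
    chain, i = [], len(exons)
    while i > 0:
        if prev[i] is None:
            i -= 1
        else:
            chain.append(exons[i - 1])
            i = prev[i]
    return best[-1], chain[::-1]


# Hypothetical candidate exons (coordinates and scores are made up).
candidates = [
    Exon(0, 100, 5.0),
    Exon(50, 150, 6.0),
    Exon(120, 200, 4.0),
    Exon(210, 300, -1.0),
]
total, structure = assemble(candidates)
```

In this toy input the assembly keeps the first and third exons: together they score higher than the overlapping second exon alone, and the negatively scored exon is left out.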
geneid supports the integration of predictions from multiple sources
via external gff files, and the general gene structure or model
can also be redefined. The accuracy of geneid compares favorably
to that of other existing tools, and geneid is likely more efficient in terms of
speed and memory usage.
Currently, geneid v1.2 analyzes the whole human genome in 3 hours (about 1 Gbp /
hour) on an Intel(R) Xeon 2.80 GHz processor.
Characteristics of geneid
geneid accuracy is comparable to that of other existing "ab initio" gene prediction tools.
geneid is very efficient in terms of speed and memory usage. In practice, geneid can
analyze chromosome-size sequences at a rate of about 1 Gbp per hour on an Intel(R) Xeon
2.80 GHz CPU. For the largest human chromosome (chr1), it requires 1/2 Gbyte of RAM
plus the size of the FASTA sequence.
geneid offers support to integrate predictions from multiple sources (ESTs, BLAST HSPs)
and to reannotate genomic sequences, via external gff files together with the
redefinition of the "gene model".
geneid output can be customized to different levels of detail, including exhaustive listing
of potential signals and exons. Furthermore, several output formats, such as gff or XML, are
available.
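
As an illustration of working with gff output, the following sketch parses tab-separated records with the standard gff columns (sequence name, source, feature, start, end, score, strand, frame, group). The sample lines, including the source tag and the feature names, are hypothetical and may not match what geneid actually emits; `GffRecord` and `parse_gff` are names introduced here only for the example.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class GffRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int               # 1-based, inclusive (gff convention)
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the score column is "."
    strand: str
    frame: str
    group: str


def parse_gff(lines: Iterable[str]) -> Iterator[GffRecord]:
    """Yield one record per data line, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
        score = None if fields[5] == "." else float(fields[5])
        group = fields[8] if len(fields) > 8 else ""
        yield GffRecord(fields[0], fields[1], fields[2], int(fields[3]),
                        int(fields[4]), score, fields[6], fields[7], group)


# Hypothetical sample records, not real geneid output.
sample = [
    "# example predictions",
    "chr1\tgeneid_v1.2\tInternal\t1200\t1350\t4.21\t+\t0\tgene_1",
    "chr1\tgeneid_v1.2\tTerminal\t1900\t2050\t.\t+\t2\tgene_1",
]
records = list(parse_gff(sample))
```

Keeping the score as `None` when the column holds "." lets downstream code distinguish an unscored feature from one that scored zero.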
Glimmer is the primary microbial gene finder at TIGR, and has been used to annotate the
complete genomes of over 80 bacterial species at TIGR and elsewhere. Its analyses of
some of these genomes and others are available at the Comprehensive Microbial
Resource site.