
Terms and ontology

From a practical view, an ontology is a representation of something we know about. "Ontologies"
consist of representations of things that are detectable or directly observable, and the relationships
between those things. There is no universal standard terminology in biology and related domains,
and term usages may be specific to a species, a research area or even a particular research group.
This makes communication and sharing of data more difficult. The Gene Ontology project provides
an ontology of defined terms representing gene product properties. The ontology covers three
domains:

- cellular component, the parts of a cell or its extracellular environment;
- molecular function, the elemental activities of a gene product at the molecular level, such
  as binding or catalysis;
- biological process, operations or sets of molecular events with a defined beginning and
  end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

Each GO term within the ontology has a term name, which may be a word or string of words; a
unique alphanumeric identifier; a definition with cited sources; and an ontology indicating the domain
to which it belongs. Terms may also have synonyms, which are classed as being exactly equivalent
to the term name, broader, narrower, or related; references to equivalent concepts in other
databases; and comments on term meaning or usage. GO is structured as a directed acyclic graph
(DAG): each term has defined relationships to one or more other terms in the same domain, and
sometimes to terms in other domains. The GO vocabulary is designed to be species-neutral, and
includes terms applicable to prokaryotes and eukaryotes, and to single-celled and multicellular organisms.

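
Because GO is a DAG rather than a tree, a term can have more than one parent, so finding all of a term's more general terms is a graph traversal rather than a walk up a single chain. The Python sketch below is a minimal illustration under stated assumptions: it uses a toy graph with hypothetical term IDs ("root", "a", "b", "c", "d" are not real GO terms) and treats every relationship as a plain child-to-parent edge, ignoring the relationship types (such as is_a) that real GO edges carry. It collects a term's ancestors with a breadth-first search and checks that the graph is acyclic with Kahn's algorithm.

```python
from collections import deque

# Toy child -> parents edges. These IDs are hypothetical, not real GO terms.
# Term "d" has two parents, so the graph is a DAG but not a tree.
PARENTS: dict[str, list[str]] = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a", "b"],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following parent edges (BFS)."""
    seen: set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(parents.get(parent, []))
    return seen


def is_acyclic(parents: dict[str, list[str]]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be peeled off."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    # In-degree counted along child -> parent edges (number of children per node).
    indegree = {node: 0 for node in nodes}
    for ps in parents.values():
        for p in ps:
            indegree[p] += 1
    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for p in parents.get(node, []):
            indegree[p] -= 1
            if indegree[p] == 0:
                queue.append(p)
    return removed == len(nodes)


print(sorted(ancestors("d", PARENTS)))
print(is_acyclic(PARENTS))
```

Here ancestors("d", PARENTS) yields a, b and root, because d inherits from both branches; a tree-walking approach that follows only the first parent would miss b.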
GO is not static, and additions, corrections and alterations are suggested by, and solicited from,
members of the research and annotation communities, as well as by those directly involved in the
GO project.[5] For example, an annotator may request a specific term to represent a metabolic
pathway, or a section of the ontology may be revised with the help of community experts (e.g. [6]).
Suggested edits are reviewed by the ontology editors, and implemented where appropriate.

The GO ontology and annotation files are freely available from the GO website [7] in a number of
formats, or can be accessed online using the GO browser AmiGO. The Gene Ontology project also
provides downloadable mappings of its terms to other classification systems.

Example term
id: GO:0000016
name: lactase activity
ontology: molecular_function
def: "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
xref: Reactome:20536
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
Data source:[8]
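
The term record above is a line-oriented list of "tag: value" pairs, so it can be read with a few lines of code. The Python sketch below is a simplified reader written for this one example, not a full parser for GO's file formats: it assumes one tag per line, keeps repeated tags (such as synonym and xref) as lists, drops a trailing "! comment" from values, and does not handle escaped characters or files with several terms. The helper names parse_stanza and synonyms are hypothetical.

```python
import re
from collections import defaultdict

# The example term record from the text.
STANZA = """\
id: GO:0000016
name: lactase activity
ontology: molecular_function
def: "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
xref: Reactome:20536
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
"""

# Quoted synonym text followed by one of the four synonym scopes.
SYNONYM_RE = re.compile(r'^"(?P<text>[^"]*)"\s+(?P<scope>EXACT|BROAD|NARROW|RELATED)\b')


def parse_stanza(text: str) -> dict[str, list[str]]:
    """Map each tag to the list of its values, in the order they appear."""
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        tag, sep, value = line.partition(": ")
        if not sep:
            continue  # not a "tag: value" line; skipped in this sketch
        # Drop a trailing human-readable comment, e.g. "! hydrolase activity, ..."
        value = value.split(" ! ", 1)[0].strip()
        fields[tag].append(value)
    return dict(fields)


def synonyms(fields: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (synonym text, scope) pairs for every well-formed synonym value."""
    result = []
    for value in fields.get("synonym", []):
        match = SYNONYM_RE.match(value)
        if match:
            result.append((match.group("text"), match.group("scope")))
    return result


term = parse_stanza(STANZA)
print(term["name"][0], term["is_a"])
print(synonyms(term))
```

Splitting only at the first ": " matters here: the def value itself contains "reaction: lactose", which must stay part of the value rather than start a new tag.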

Annotation
Genome annotation encompasses the practice of capturing data about a gene product, and GO
annotations use terms from the GO to do so. Annotations from GO curators are integrated and
disseminated on the GO website, where they can be downloaded directly or viewed online using
AmiGO.[9] In addition to the gene product identifier and the relevant GO term, GO annotations
have at least the following data:

- the reference used to make the annotation (e.g. a journal article);
- an evidence code denoting the type of evidence upon which the annotation is based;
- the date and the creator of the annotation.

Supporting information, depending on the GO term and evidence used, and supplementary
information, such as the conditions under which the function is observed, may also be included
in a GO annotation.
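
The minimum contents of an annotation listed above map naturally onto a small record type. The Python sketch below assumes hypothetical field names chosen for readability; they are not the column layout of the GO Consortium's downloadable annotation files. It enforces the "at least the following data" rule at construction time and keeps supporting information optional. The sample values are those of the example annotation given later in this article.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GOAnnotation:
    """Minimal annotation record; field names are illustrative, not a GO file format."""

    gene_product: str   # gene product identifier, e.g. a UniProtKB accession
    go_term: str        # the GO term being assigned, e.g. "GO:0060047"
    reference: str      # source of the claim, e.g. a PubMed ID
    evidence_code: str  # type of evidence, e.g. "IMP"
    assigned_by: str    # creator of the annotation
    assigned_on: date   # date the annotation was made
    # Optional supporting information, e.g. conditions under which the function is observed.
    supporting_info: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject records missing any of the required data.
        required = {
            "gene_product": self.gene_product,
            "go_term": self.go_term,
            "reference": self.reference,
            "evidence_code": self.evidence_code,
            "assigned_by": self.assigned_by,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")
        if not self.go_term.startswith("GO:"):
            raise ValueError(f"not a GO identifier: {self.go_term!r}")


# The example annotation from this article, expressed as a record.
actin_annotation = GOAnnotation(
    gene_product="UniProtKB:P68032",
    go_term="GO:0060047",
    reference="PMID:17611253",
    evidence_code="IMP",
    assigned_by="UniProtKB",
    assigned_on=date(2008, 6, 6),
)
print(actin_annotation)
```

Validating in __post_init__ means an incomplete annotation fails when it is created rather than later, when it is queried or merged with other data.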
The evidence code comes from a controlled vocabulary of codes, the Evidence Code Ontology,
covering both manual and automated annotation methods.[10] For example, Traceable Author
Statement (TAS) means a curator has read a published scientific paper and the metadata for that
annotation bears a citation to that paper; Inferred from Sequence Similarity (ISS) means a human
curator has reviewed the output from a sequence similarity search and verified that it is
biologically meaningful. Annotations from automated processes (for example, remapping
annotations created using another annotation vocabulary) are given the code Inferred from
Electronic Annotation (IEA). In 2010, over 98% of all GO annotations were inferred computationally
rather than by curators, but as of July 2, 2019, only about 30% of all GO annotations were inferred
computationally.[11][12] Because these annotations are not checked by a human, the GO
Consortium considers them marginally less reliable, and they are commonly made to higher-level,
less detailed terms. Full annotation data sets can be downloaded from the GO website. To support
the development of annotation, the GO Consortium provides workshops and mentors new groups
of curators and developers.

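
Because every annotation carries an evidence code, a data consumer can separate curator-reviewed annotations from electronic (IEA) ones, for instance to measure how much of a data set is electronic or to keep only manually reviewed entries. The Python sketch below works on (gene product, GO term, evidence code) triples; the gene products "geneA" and "geneB" and their pairings with GO terms are hypothetical, and only the four evidence codes named in this article are listed.

```python
from collections import Counter
from typing import NamedTuple

# Evidence codes named in this article (the full vocabulary is larger).
EVIDENCE_CODES = {
    "TAS": "Traceable Author Statement",
    "ISS": "Inferred from Sequence Similarity",
    "IMP": "Inferred from Mutant Phenotype",
    "IEA": "Inferred from Electronic Annotation",
}


class Annotation(NamedTuple):
    gene_product: str
    go_term: str
    evidence_code: str


def electronic_fraction(annotations: list[Annotation]) -> float:
    """Share of annotations carrying the IEA code (0.0 for an empty list)."""
    if not annotations:
        return 0.0
    counts = Counter(a.evidence_code for a in annotations)
    return counts["IEA"] / len(annotations)


def curated_only(annotations: list[Annotation]) -> list[Annotation]:
    """Drop annotations made by automated processes (IEA) and keep the rest."""
    return [a for a in annotations if a.evidence_code != "IEA"]


# Hypothetical sample: gene product names and term pairings are made up.
sample = [
    Annotation("geneA", "GO:0060047", "IMP"),
    Annotation("geneA", "GO:0000016", "IEA"),
    Annotation("geneB", "GO:0000016", "TAS"),
    Annotation("geneB", "GO:0060047", "IEA"),
]

print(electronic_fraction(sample))
print([EVIDENCE_CODES[a.evidence_code] for a in curated_only(sample)])
```

In this sample half of the annotations are electronic, and curated_only keeps the IMP and TAS entries. A real workflow would read the evidence codes from downloaded annotation files instead of a hard-coded list.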
Many machine learning algorithms have been designed and implemented to predict Gene Ontology
annotations.[13][14]

Example annotation
Gene product: Actin, alpha cardiac muscle 1, UniProtKB:P68032
GO term: heart contraction; GO:0060047 (biological process)
Evidence code: Inferred from Mutant Phenotype (IMP)
Reference: PMID 17611253
Assigned by: UniProtKB, June 6, 2008
Data source:[15]
