Automatic annotation of concepts contained in biomedical ontologies is a crucial step for enabling data interoperability and translational discoveries. While a variety of tools have been proposed and used for annotation purposes, there is no unified framework that allows automatic generation and integration of annotations in OWL files. Here we present a prototype framework that uses the National Library of Medicine MetaMap program to generate a fully annotated bleeding phenotype OWL ontology file.
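To make the integration step concrete, the following is a minimal sketch of how a UMLS CUI might be written back into an OWL file as an annotation on a class. It is not the framework's actual implementation: the class IRI, the `ann:umlsCUI` annotation property, and the helper `add_cui_annotations` are hypothetical names chosen for illustration, and only the Python standard library is used. The term and CUI ("blood type AB", C0427624) are taken from the Discussion.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical namespace for the annotation property (an assumption).
ANN = "http://example.org/annotation#"

# Keep readable prefixes when the tree is serialized again.
for prefix, uri in (("rdf", RDF), ("owl", OWL), ("rdfs", RDFS), ("ann", ANN)):
    ET.register_namespace(prefix, uri)

# A tiny stand-in for an ontology file; the class IRI is hypothetical.
OWL_SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}" xmlns:rdfs="{RDFS}">
  <owl:Class rdf:about="http://example.org/bpo#blood_type_AB">
    <rdfs:label>blood type AB</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def add_cui_annotations(owl_xml, cuis_by_label):
    """Attach one annotation element per CUI to each class whose label matches."""
    root = ET.fromstring(owl_xml)
    for cls in root.iter(f"{{{OWL}}}Class"):
        label = cls.findtext(f"{{{RDFS}}}label")
        for cui in cuis_by_label.get(label, []):
            ET.SubElement(cls, f"{{{ANN}}}umlsCUI").text = cui
    return ET.tostring(root, encoding="unicode")


annotated = add_cui_annotations(OWL_SNIPPET, {"blood type AB": ["C0427624"]})
print(annotated)
```

Matching on `rdfs:label` keeps the sketch short. A real pipeline would more likely key on the class IRI and use an annotation property agreed on by the ontology's maintainers.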
INTRODUCTION
Annotation of biomedical ontologies with controlled vocabularies is vital for maximizing ontology-based data sharing and analysis. Methods to automate annotation would greatly advance the usefulness of biomedical ontologies. Currently the annotation process must be done manually by expert curators, which is time consuming, labor intensive, and error prone. While methodologies and discussions on the development of automatic annotation for free text have been proposed [1], there is no easy way to integrate annotations generated by those tools with the original ontology files. To offer a solution to this problem, we present a unified framework whereby annotations are automatically generated from and integrated into a Web Ontology Language (OWL) ontology. We test our framework using our Bleeding Phenotype Ontology (BPO), a domain ontology that represents a coherent body of explicit declarative knowledge about bleeding phenomena [2], and the National Library of Medicine’s MetaMap program [3], a natural language program that maps text to concepts in the Unified Medical Language System (UMLS).

METHODS

RESULTS
Our results showed that the framework successfully extracted the BPO terms and that the UMLS annotations were successfully integrated into the BPO. Among the 543 concepts in the BPO, 100% were found in the UMLS. 804 unique CUIs were tagged for annotation. These 804 CUIs in turn generated a total of 1246 controlled vocabulary terms. The results of the experiment were analyzed based on precision and recall of the yielded terms, as determined by two domain experts. The results are shown in Table 1.

Table 1. Evaluation of the automated OWL annotation framework.

# of UMLS Terms    MetaMap    MetaMap + Post processor
Total              1374       1246
TP                 598        598
FP                 431        303
FN                 31         31
Precision          58%        66%
Recall             95%        95%

Out of the original 543 concepts in the BPO, 46 concepts were excluded from the annotation process by the post processor, thereby contributing to the lower number of false positive (FP) terms and the increase in precision from 58% to 66%. The other 497 concepts were annotated with the 1246 controlled vocabulary terms generated by 804 unique CUIs, of which 34 were tagged as OMIM, 63 as ICD9CM, 1140 as SNOMEDCT, and 9 as GO terms. The recall rates for both MetaMap and the annotator (MetaMap + post processor) were 95% because both produced the same number of false negative (FN) terms, i.e., terms that did not have coverage in the UMLS knowledge source.
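The precision and recall figures in Table 1 follow directly from the TP, FP, and FN counts, using the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The short sketch below recomputes them as a check. The function names and the dictionary layout are illustrative choices, not part of the framework.

```python
def precision(tp, fp):
    """Fraction of returned terms that the experts judged correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of expected terms that were actually returned."""
    return tp / (tp + fn)


# TP, FP, and FN counts copied from Table 1.
COUNTS = {
    "MetaMap": {"tp": 598, "fp": 431, "fn": 31},
    "MetaMap + Post processor": {"tp": 598, "fp": 303, "fn": 31},
}

# Rounded percentages for each configuration.
METRICS = {
    name: {
        "precision_pct": round(100 * precision(c["tp"], c["fp"])),
        "recall_pct": round(100 * recall(c["tp"], c["fn"])),
    }
    for name, c in COUNTS.items()
}

for name, m in METRICS.items():
    print(f"{name}: precision {m['precision_pct']}%, recall {m['recall_pct']}%")
```

Because the post processor removes only false positives (431 down to 303) and leaves TP and FN unchanged, precision rises while recall stays the same.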
DISCUSSION
The choice of the set of vocabularies in UMLS used to create the mappings depends on the type of biomedical data the ontology is attempting to annotate. Because the BPO contains terms describing anatomic abnormalities, malignancies, and gene products, various vocabularies from UMLS were used, including GO, OMIM, SNOMEDCT, and ICD9CM. Concepts of semantic type T061 (therapeutic or preventive procedure, e.g., C1533734:Treatment) and errors caused by terms that map to multiple semantic types (e.g., in “coagulation evaluation”, “evaluation” yielded: C0220825:Evaluation [Functional Concept], C1261322:Evaluation [Health Care Activity]) presented the greatest challenge to the system’s performance (high FP rate). Other factors that accounted for the FP and FN terms include: i) lack of coverage of narrower terms in the Metathesaurus, for example, the term “uterotonic_medication” was mapped to the more generic term “Medication, C0013227”; ii) spelling errors causing concepts to be missed; and iii) incorrect usage of concept names (e.g., we found false matches for “type AB”, whose correct usage yielded “blood type AB, C0427624”). This annotation process helped us further refine our BPO ontology by adhering to a more standardized naming convention and allowed us to achieve a significant improvement in precision.
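The two main sources of false positives described above suggest a simple review step, sketched below. This is an illustration and not the post processor used in this work: it assumes candidate mappings are available as (phrase, CUI, concept name, semantic type) tuples. It drops candidates of a blocked semantic type and flags phrases whose remaining candidates span more than one semantic type. The function name, the tuple layout, and the "treatment" source phrase are hypothetical. The CUIs and semantic types are those quoted in the text.

```python
from collections import defaultdict

# (source phrase, CUI, concept name, semantic type); the tuple layout is assumed.
CANDIDATES = [
    ("treatment", "C1533734", "Treatment", "Therapeutic or Preventive Procedure"),
    ("evaluation", "C0220825", "Evaluation", "Functional Concept"),
    ("evaluation", "C1261322", "Evaluation", "Health Care Activity"),
]

# T061 in the UMLS Semantic Network.
BLOCKED_TYPES = {"Therapeutic or Preventive Procedure"}


def review_candidates(candidates, blocked_types):
    """Drop blocked semantic types, then flag phrases with mixed semantic types."""
    kept = [c for c in candidates if c[3] not in blocked_types]

    # Group the surviving semantic types by source phrase.
    types_by_phrase = defaultdict(set)
    for phrase, _cui, _name, semantic_type in kept:
        types_by_phrase[phrase].add(semantic_type)

    # A phrase is ambiguous when it maps to more than one semantic type.
    ambiguous = {p for p, types in types_by_phrase.items() if len(types) > 1}
    return kept, ambiguous


kept, ambiguous = review_candidates(CANDIDATES, BLOCKED_TYPES)
print([c[1] for c in kept])  # CUIs that survive the filter
print(sorted(ambiguous))     # phrases needing curator review
```

Flagging ambiguous phrases for curator review, rather than discarding them automatically, avoids trading false positives for new false negatives.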