
A Practical Approach to Automated OWL Annotation

Shamim A. Mollah, Andreas C. Mauer, Matthew Wrobel, Edward Barbour


Center for Clinical and Translational Science
The Rockefeller University, New York, NY

Automatic annotation of concepts contained in biomedical ontologies is a crucial step for enabling data interoperability and translational discoveries. While a variety of tools have been proposed and used for annotation, there is no unified framework that allows automatic generation and integration of annotations in OWL files. Here we present a prototype framework that uses the National Library of Medicine MetaMap program to generate a fully annotated bleeding phenotype OWL ontology file.

INTRODUCTION
Annotation of biomedical ontologies with controlled vocabularies is vital for maximizing ontology-based data sharing and analysis. Methods to automate annotation would greatly advance the usefulness of biomedical ontologies. Currently the annotation process must be done manually by expert curators, which is time-consuming, labor-intensive, and error-prone. While methodologies and discussions on the development of automatic annotation for free text have been proposed [1], there is no easy way to integrate annotations generated by those tools with the original ontology files. To offer a solution to this problem, we present a unified framework whereby annotations are automatically generated from and integrated into a Web Ontology Language (OWL) ontology. We test our framework using our Bleeding Phenotype Ontology (BPO), a domain ontology that represents a coherent body of explicit declarative knowledge about bleeding phenomena [2], and the National Library of Medicine’s MetaMap program [3], a natural language processing program that maps text to concepts in the Unified Medical Language System (UMLS).

RESULTS
Our results showed that the framework successfully extracted the BPO terms and that the UMLS annotations were successfully integrated into the BPO. Among the 543 concepts in the BPO, 100% were found in the UMLS. 804 unique CUIs were tagged for annotation. These 804 CUIs in turn generated a total of 1246 controlled vocabulary terms. The results of the experiment were analyzed based on precision and recall of the yielded terms, as determined by two domain experts. The results are tabulated in Table 1.

Out of the original 543 concepts in the BPO, 46 concepts were excluded from the annotation process by the post processor, thereby contributing to the lower number of false positive (FP) terms and the increase in the precision rate from 58% to 66%. The other 497 concepts were annotated with the 1246 controlled vocabulary terms generated by 804 unique CUIs, of which 34 were tagged as OMIM, 63 as ICD9CM, 1140 as SNOMEDCT, and 9 as GO terms. The recall rates for both MetaMap and the annotator (MetaMap + post processor) were 95% because both produced the same number of false negative (FN) terms, i.e., terms that did not have coverage in the UMLS knowledge source.

Table 1. Evaluation of the automated OWL annotation framework.

# of UMLS Terms    MetaMap    MetaMap + Post processor
Total              1374       1246
TP                 598        598
FP                 431        303
FN                 31         31
Precision          58%        66%
Recall             95%        95%
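
The precision and recall values in Table 1 are consistent with the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The minimal sketch below recomputes them from the tabulated counts; the function and variable names are illustrative and are not part of the framework.

```python
# Counts copied from Table 1. All names below are illustrative only.
TABLE_1 = {
    "MetaMap": {"TP": 598, "FP": 431, "FN": 31},
    "MetaMap + post processor": {"TP": 598, "FP": 303, "FN": 31},
}


def precision(tp, fp):
    """Fraction of returned terms that the experts judged correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of expected terms that were actually returned."""
    return tp / (tp + fn)


def summarize(table):
    """Return {system: (precision %, recall %)} rounded to whole percents."""
    return {
        name: (
            round(100 * precision(c["TP"], c["FP"])),
            round(100 * recall(c["TP"], c["FN"])),
        )
        for name, c in table.items()
    }


if __name__ == "__main__":
    for name, (p, r) in summarize(TABLE_1).items():
        print(f"{name}: precision {p}%, recall {r}%")
```

The drop of 128 FP terms (431 to 303) equals the drop in total terms (1374 to 1246), while TP and FN are unchanged, which is why precision rises and recall stays the same.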

DISCUSSION
The choice of the set of vocabularies in UMLS used to create the mappings depends on the type of biomedical data the ontology is attempting to annotate. Because the BPO contains terms describing anatomic abnormalities, malignancies, and gene products, various vocabularies from UMLS were used, including GO, OMIM, SNOMEDCT, and ICD9CM. Concepts of semantic type T061 (Therapeutic or Preventive Procedure, e.g., C1533734:Treatment) and errors caused by a term matching concepts of multiple semantic types (e.g., in “coagulation evaluation”, “evaluation” yielded C0220825:Evaluation [Functional Concept] and C1261322:Evaluation [Health Care Activity]) presented the greatest challenge to the system’s performance, producing a high FP rate. Other factors that accounted for the FP and FN terms include: i) lack of coverage of narrower terms in the Metathesaurus (for example, the term “uterotonic_medication” was mapped to the more generic term “Medication, C0013227”); ii) spelling errors causing concepts to be missed; and iii) incorrect usage of concept names (e.g., we found false matches for “type AB”, whose correct form mapped to “blood type AB, C0427624”). This annotation process helped us further refine the BPO by adhering to a more standardized naming convention and allowed us to achieve a significant improvement in the precision rate.

Figure 1. Schematic overview of the automated OWL Annotation framework.
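
The first stage of the framework extracts the unique terms from the OWL file. The authors used Jena, a Java API, for this step; the Python sketch below is only an illustration of the idea using the standard library. It assumes an RDF/XML rendition in which each named class carries an `rdf:ID` or `rdf:about` attribute, and it assumes that underscores in class names (as in “uterotonic_medication”) stand for spaces. The function name and sample input are hypothetical.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def extract_class_terms(owl_xml):
    """Return the sorted unique names of named owl:Class elements."""
    root = ET.fromstring(owl_xml)
    terms = set()
    for cls in root.iter(f"{{{OWL_NS}}}Class"):
        ident = cls.get(f"{{{RDF_NS}}}ID") or cls.get(f"{{{RDF_NS}}}about")
        if not ident:
            continue  # anonymous class expression, nothing to annotate
        local_name = ident.rsplit("#", 1)[-1]  # drop any namespace prefix
        # Assumption: underscores in class names represent spaces.
        terms.add(local_name.replace("_", " "))
    return sorted(terms)


# Hypothetical miniature ontology, not an excerpt from the BPO file.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="epistaxis"/>
  <owl:Class rdf:ID="uterotonic_medication"/>
  <owl:Class rdf:about="#epistaxis"/>
</rdf:RDF>"""

if __name__ == "__main__":
    print(extract_class_terms(SAMPLE))
```

Collecting names into a set removes repeated references to the same class, so each term would be submitted for mapping only once.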


METHODS
Terms extraction:
We used an XML/OWL rendition of our BPO ontology as the input file. We then utilized Jena, a Java API for Semantic Web applications, to extract all the unique terms from the input file.

CUI generation:
The extracted terms were used as inputs into MetaMap’s MMTx program using the UMLS 2009AB knowledge source. MMTx was set to the strict model to ensure high precision. Concepts were identified by UMLS Concept Unique Identifier (CUI). We allowed all synonyms or labels that syntactically identified each term (e.g., epistaxis, nosebleed, etc.).

Post processing and integration of annotations into an OWL file:
We then developed a Perl program to 1) post process MetaMap output; 2) capture annotations for OMIM, SNOMEDCT, ICD9CM, and GO codes using UMLS’ preferred term (PT) type; and 3) store the annotations in a table indexed by CUIs. Finally, we used the original OWL file to integrate the annotations into their proper locations in the ontology.

FUTURE WORK
Using our prototype as an integration framework, we have demonstrated a simple, robust mechanism for automatically annotating concepts within an OWL file. Future work includes refinement and evaluation of the framework using other ontologies, porting of modules to the Java platform, and construction of a Java-based web service that supports automated annotation of both XML and RDF OWL renditions. We will submit the annotation module as an informatics tool to the National Center for Biomedical Ontology (NCBO) BioPortal so that it can be used by others in the biomedical community.

REFERENCES
1. Jonquet C, Shah NH, Musen MA. The Open Biomedical Annotator. AMIA Summit on Translational Bioinformatics. 2009:56-60.
2. Mauer AC, Barbour EM, Khazanov NA et al. Creating an ontology-based human phenotyping system: the Rockefeller University bleeding history experience. Clinical and Translational Science. 2009 Oct;2(5):382-5.
3. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp. 2001:17-21.

Acknowledgements
This work is supported in part by CTSA grant UL1RR024143 from the National Institutes of Health.
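
The post-processing step under Methods captures OMIM, SNOMEDCT, ICD9CM, and GO codes with the preferred term (PT) type and stores them in a table indexed by CUI. The authors implemented this in Perl; the Python sketch below is a stand-in that shows one way such a table could be built. The `Candidate` record is a hypothetical, simplified structure, not the actual MMTx output format, and the sample rows are placeholders rather than real UMLS data.

```python
from collections import defaultdict
from typing import NamedTuple

# Vocabularies named in the Methods section.
TARGET_VOCABULARIES = {"OMIM", "SNOMEDCT", "ICD9CM", "GO"}


class Candidate(NamedTuple):
    """Hypothetical simplified mapping record (not the MMTx format)."""
    term: str        # ontology term that was submitted for mapping
    cui: str         # UMLS Concept Unique Identifier
    vocabulary: str  # source vocabulary abbreviation
    code: str        # code within that vocabulary
    term_type: str   # UMLS term type, e.g. "PT" for preferred term


def build_annotation_table(candidates):
    """Index (vocabulary, code) pairs by CUI, keeping only PT entries
    from the target vocabularies."""
    table = defaultdict(set)
    for c in candidates:
        if c.vocabulary in TARGET_VOCABULARIES and c.term_type == "PT":
            table[c.cui].add((c.vocabulary, c.code))
    return {cui: sorted(pairs) for cui, pairs in table.items()}


# Placeholder rows for illustration only; CUIs and codes are made up.
SAMPLE_ROWS = [
    Candidate("term_a", "CUI_1", "SNOMEDCT", "code-1", "PT"),
    Candidate("term_a", "CUI_1", "ICD9CM", "code-2", "PT"),
    Candidate("term_a", "CUI_1", "SNOMEDCT", "code-3", "SY"),  # not a PT
    Candidate("term_b", "CUI_2", "MSH", "code-4", "PT"),  # other vocabulary
]

if __name__ == "__main__":
    for cui, pairs in build_annotation_table(SAMPLE_ROWS).items():
        print(cui, pairs)
```

Indexing by CUI means that every term mapped to the same concept shares one set of annotations, which can then be looked up when writing them back into the OWL file.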
