
Evaluation of Disease-Associated Text-Mining Databases

Myeong-Sang Yu
Chung-Ang University
School of Integrative Engineering
Seoul, Korea
+82-2-820-5690
msirang@ssbio.cau.ac.kr

Dokyun Na
Chung-Ang University
School of Integrative Engineering
Seoul, Korea
+82-2-820-5690
blisszen@cau.ac.kr

ABSTRACT
PubMed contains about 20 million scientific articles, making it a
rich source of knowledge. Extracting information from these articles
is one of the challenges in biology, and many text-mining approaches
have been developed to address it. However, the accuracy of
text-mined results is still in question. Here we evaluated three
text-mining databases using genes associated with Alzheimer’s disease.
Their per-gene accuracy is high (57-100%), but their per-abstract
accuracy is relatively low (33-64%). This indicates that a
gene-disease association is well identified when abundant articles
are available, whereas genes with fewer articles may be wrongly
identified as disease-associated. Consequently, human curation
remains a necessary complement to current text-mining approaches, and
future text-mining methods should improve their accuracy for genes
with few articles or little information.
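
The distinction between the two accuracy figures can be illustrated with a minimal sketch. The abstract does not define either metric, so the definitions here are assumptions: per-abstract accuracy is taken as the fraction of text-mined (gene, abstract) pairs that a curator confirms, and a gene is counted as correct if at least one of its abstracts is confirmed. The gene and abstract identifiers are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def per_abstract_accuracy(records):
    """Fraction of (gene, abstract) pairs confirmed by a curator.

    `records` is a list of (gene, abstract_id, confirmed) tuples.
    """
    if not records:
        raise ValueError("records must not be empty")
    return sum(1 for _, _, ok in records if ok) / len(records)


def per_gene_accuracy(records):
    """Fraction of genes with at least one confirmed abstract.

    Assumption: one confirmed abstract is enough to count a gene
    as correctly associated with the disease.
    """
    if not records:
        raise ValueError("records must not be empty")
    confirmed = defaultdict(bool)
    for gene, _, ok in records:
        confirmed[gene] = confirmed[gene] or ok
    return sum(confirmed.values()) / len(confirmed)


# Hypothetical curation results (placeholder names, not study data).
records = [
    ("GENE_A", "abs1", True),
    ("GENE_A", "abs2", False),
    ("GENE_A", "abs3", True),
    ("GENE_B", "abs4", False),
    ("GENE_B", "abs5", True),
    ("GENE_C", "abs6", False),
]

abstract_acc = per_abstract_accuracy(records)  # 3 of 6 pairs confirmed
gene_acc = per_gene_accuracy(records)          # 2 of 3 genes confirmed
```

Under these assumed definitions, a gene with many abstracts can be counted as correct even when a large share of its individual abstracts are false hits, which is one way per-gene accuracy could exceed per-abstract accuracy. A gene with a single unconfirmed abstract, such as GENE_C here, has no such margin.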

Categories and Subject Descriptors
J.3 [Life and Medical Sciences]: Medical information systems

Keywords
Text mining; Database; Alzheimer’s disease

Permission to make digital or hard copies of part or all of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for third-party
components of this work must be honored. For all other uses, contact the
Owner/Author.
Copyright is held by the owner/author(s).
DTMBIO'15, October 23, 2015, Melbourne, VIC, Australia.
ACM 978-1-4503-3787-8/15/10.
DOI: http://dx.doi.org/10.1145/2811163.2811169
