All content following this page was uploaded by R.Menaha Senthilkumar on 09 April 2019.
Abstract An acronym is a textual form used to refer to an entity and to stress important
concepts. Over the last two decades, many researchers have worked on mining
acronym-expansion pairs from plain text and the Web. Such pairs are mainly used in language
processing, information retrieval, Web search, ontology mapping, question answering,
SMS, and social media posting. Acronyms grow in number day by day, and discovering
their definitions/expansions is becoming a challenging task because of their diverse
characteristics. Manually edited online repositories hold acronym-definition pairs, but
updating all possible definitions systematically is an overwhelming task. To extend this
support, different approaches have been employed for the automatic detection of acronym
definitions from text and Web documents. This paper presents those approaches and also
reviews the Web-based methods used for disambiguating and ranking expansions and for
finding their popularity scores and context words. The scope for future work in this
research area is also discussed.
1 Introduction
An acronym is a kind of abbreviation formed from the first letter, or the first few letters,
of the words in a phrase; it is also called a short descriptor of the phrase. Acronyms were
quietly added to the English language as a new linguistic feature in the 1950s. Since an
acronym is treated as a word, its definition is called its expansion. According to their
formation, acronyms are characterized into three types: (i) character-based acronyms,
generally formed from the initial letter(s) of each word in the phrase [e.g., ISRO is an
acronym for the phrase “Indian Space Research Organization”]; (ii) syllable-based acronyms,
formed from the syllables of a word [e.g., Kg is an acronym for “Kilogram”]; and
(iii) combinations of characters and syllables [e.g., RADAR (“RAdio Detection And Ranging”)
combines the syllable “Ra” with the initial characters of the remaining words].
R. Menaha (✉)
Department of Information Technology, Dr. Mahalingam College of Engineering
& Technology, Pollachi, Tamil Nadu, India
e-mail: rmenahasenthil@gmail.com
VE. Jayanthi
Department of Electronics and Communication Engineering, PSNA College of Engineering
& Technology, Dindigul, Tamil Nadu, India
e-mail: Jayanthi.ramu@gmail.com
sureshsatapathy@ieee.org
122 R. Menaha and VE. Jayanthi
Acronyms are widely used in biomedical documents because the names of many
diseases, terminologies, and procedures can easily be represented by acronyms.
Recognizing the expansions/definitions coupled with an acronym is a significant task
in natural language processing (NLP) and information retrieval. Acronyms are also
very common in Web searches; for example, a user enters NBA as a search query
instead of its full form, National Board of Accreditation, to reduce access time. On
social media such as Twitter and Facebook, users write their comments with acronyms
to minimize typing, and acronyms are also typical in online chat. Their usage is even
more common on mobile devices, because acronyms make typing easier on such devices
and express information concisely.
To support the acronym-expansion (AE) extraction process, manually collected AE lists
have been compiled, and many are available on the Internet as online corpora/repositories.
The number of expansions these corpora return for a given acronym query varies widely.
For example, for the acronym query CAS, the corpus [1] returns 284 different expansions,
more than the other repositories. For any acronym query, the repositories [1–5] return
more expansions than the remaining four corpora [6–9]. However, these repositories are
restricted to specific domains or organizations, and maintaining every acronym with its
list of all possible definitions is a big problem because acronyms grow rapidly.
Because of these limitations of online repositories, the automatic detection of the
expansions of an acronym from free text and the Web has been studied over the last two
decades. This paper presents the approaches followed for detecting AE pairs from text
and the Web. It also reviews the Web-based methods for expansion disambiguation,
ranking, popularity score detection, and context word identification.
The paper is structured as follows. Section 2 presents the acronym expansion mining
approaches. The Web-based methods for disambiguation, ranking, and finding the
popularity score and context words of expansions are tabulated in Sect. 3. The scope for
future research in this area is discussed in Sect. 4. Finally, Sect. 5 concludes the article.
Based on the survey analysis, the approaches adopted for recognizing acronym
expansions in text fall into two categories: (i) heuristic approaches and (ii) machine
learning approaches. The first approach is presented in Sect. 2.1,
A Survey on Acronym–Expansion … 123
[Figure: knowledge base, Web, and text documents as sources of acronym-expansion pairs]
Table 1 (continued)
Approach and area | Author, year, and performance | Explanation
(continuation of the previous row) | Precision: 98%; Recall: 72%; F-factor: 83% | • Depends on the part-of-speech (POS) tagging results • Complex patterns are hard to find
Approach: NLP-based; Area: biomedical | Zahariev M (2004); F-factor: 99.6% | • Recognizes an acronym only if its letters occur sequentially in the expansion • Does not account for acronyms with digits and symbolic characters
Approach: hybrid [NLP + pattern] | Park and Byrd [16]; Precision: 97%; Recall: 94%; F-factor: 95.9% | • The first character of the first word must match the first acronym letter • The expression must not contain text markers or stop words at the beginning or at the end
acronym and definition, and SVM to validate AE pairs is presented in [23]. In [24],
the authors used logistic regression on a set of features to score the possible
alignments between acronyms and expansions. The authors of [25–28] presented an
SVM model that uses AE information as features (e.g., length, presence of special
symbols, and context) to recognize acronyms and their expansions in text.
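The kind of feature vector described for the SVM-based approaches can be sketched as follows. This is a minimal sketch, assuming a candidate pair has already been found; the function name `ae_features` and every feature in it are hypothetical illustrations, not the exact features of [25–28]. The resulting dictionaries could be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM such as scikit-learn's `LinearSVC`.

```python
import re
from typing import Dict, List


def ae_features(acronym: str, expansion: str, context: List[str]) -> Dict[str, float]:
    """Hypothetical feature vector for one candidate acronym-expansion (AE) pair."""
    words = expansion.split()
    initials = "".join(w[0] for w in words).lower()
    acr = acronym.lower()
    return {
        # Length features
        "acronym_len": float(len(acronym)),
        "expansion_words": float(len(words)),
        "len_match": float(len(acronym) == len(words)),
        # Letter-alignment features
        "first_letter_match": float(bool(initials) and bool(acr) and initials[0] == acr[0]),
        "initials_match": float(initials == acr),
        # Special symbols (digits, hyphens, '&', ...) inside the acronym
        "has_special": float(re.search(r"[^A-Za-z]", acronym) is not None),
        # Context feature: a parenthesis among the surrounding tokens
        "paren_in_context": float("(" in context or ")" in context),
    }
```

For example, `ae_features("B2B", "business to business", [])` sets `has_special` to 1.0 and `initials_match` to 0.0, which is the kind of signal a classifier can weigh against a clean match such as "SVM" for "support vector machine".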
A hidden Markov model (HMM) is a statistical model defined by a set of states and
transitions among these states, forming a hidden chain. Each state emits observable
outputs, but the states themselves are not directly observed. An HMM [29] has been used
for acronym expansion detection. The model uses the sequential structure of sentences
to find AE pairs, but the solution is restricted because the expansion and the acronym
must appear close to each other. In [30], the authors used an HMM to recognize AE
pairs in biomedical text; their model is built on the alignment between characters, or
sequences of characters, in the acronym and the expansion.
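To make the HMM formulation concrete, the sketch below decodes the most likely hidden label sequence with the Viterbi algorithm. It is a toy two-state model (`O` for words outside the expansion, `E` for words inside it) in which each word is observed only as `m` (its first letter occurs in the acronym) or `n` (it does not); the observation scheme and all probabilities are hypothetical values chosen for illustration, not parameters from [29] or [30].

```python
import math
from typing import Dict, List

STATES = ("O", "E")  # O: outside the expansion, E: inside the expansion
# Hypothetical parameters, for illustration only
START = {"O": 0.6, "E": 0.4}
TRANS = {"O": {"O": 0.7, "E": 0.3}, "E": {"O": 0.2, "E": 0.8}}
EMIT = {"O": {"m": 0.2, "n": 0.8}, "E": {"m": 0.9, "n": 0.1}}


def observe(words: List[str], acronym: str) -> List[str]:
    """Map each word to 'm' if its first letter occurs in the acronym, else 'n'."""
    letters = set(acronym.lower())
    return ["m" if w[0].lower() in letters else "n" for w in words]


def viterbi(obs: List[str]) -> List[str]:
    """Return the most probable hidden state sequence for obs (log-space Viterbi)."""
    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev_best, best, ptr = best, {}, {}
        for s in STATES:
            # Choose the predecessor that maximizes the path score into s
            p = max(STATES, key=lambda q: prev_best[q] + math.log(TRANS[q][s]))
            best[s] = prev_best[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][o])
            ptr[s] = p
        back.append(ptr)
    # Backtrack from the best final state
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With these parameters, `viterbi(observe("we use hidden Markov model".split(), "HMM"))` yields `["O", "O", "E", "E", "E"]`, so the words tagged `E` form "hidden Markov model". Note that this toy model, like the approach in [29], relies on the expansion being adjacent to the acronym.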
Conditional random fields (CRFs) are a class of statistical modeling methods used
for labeling and segmenting structured data in the form of sequences, trees, etc.
A CRF takes context into account when predicting the labels of the input samples, and it
is an alternative to the HMM. The authors of [31, 32] proposed CRF-based approaches
with more effective AE features that operate on groups of neighboring tokens together
with the features of individual tokens. They used nonlinear hidden layers for a better
representation of the input data.
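CRF-based systems describe each token by features of the token itself and of its neighbors. The sketch below builds such window features in the list-of-dictionaries form accepted by common CRF toolkits (for example, the sklearn-crfsuite package); the feature names and the window of one token on each side are assumptions, not the feature templates of [31] or [32].

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def token_features(tokens: List[str], i: int, acronym: str) -> Dict[str, Feature]:
    """Features for token i plus its immediate neighbors (a window of +/-1)."""
    letters = set(acronym.lower())

    def basic(tok: str, prefix: str) -> Dict[str, Feature]:
        return {
            f"{prefix}lower": tok.lower(),
            f"{prefix}is_title": tok.istitle(),
            f"{prefix}initial_in_acronym": tok[0].lower() in letters,
        }

    feats = basic(tokens[i], "")
    if i > 0:
        feats.update(basic(tokens[i - 1], "-1:"))
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(tokens) - 1:
        feats.update(basic(tokens[i + 1], "+1:"))
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def sequence_features(tokens: List[str], acronym: str) -> List[Dict[str, Feature]]:
    """One feature dictionary per token, as a linear-chain CRF toolkit expects."""
    return [token_features(tokens, i, acronym) for i in range(len(tokens))]
```

Because every token also sees its neighbors' features, a CRF trained on these dictionaries can learn, for instance, that a token whose initial matches the acronym is more likely to be inside the expansion when the previous token was as well.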
In [33], the authors treated acronym expansion detection as a sequence labeling task
and used a neural network hidden layer to model the feature selection process; this is
commonly known as neural conditional random fields (NCRF). However, this model
ignores the fine-grained information in the label substructure. A hierarchical
latent-structure neural structured prediction model [34] has also been used for
expansion identification. Its authors introduced latent-state neural conditional random
fields (LNCRF) to solve the expansion sequence labeling problem with nonlinear input
features and label substructures.
In recent years, machine learning approaches have been used increasingly for the
acronym expansion detection task. The inferences drawn for both heuristic and machine
learning approaches are presented in Table 2.
This section discusses research related to mining acronym definitions from Web
resources, disambiguating and ranking the expansions, and finding the popularity
score and context words of the expansions.
A method [35] recommends the most appropriate expansion of an acronym, using a
dataset built from Wikipedia disambiguation pages [36] with simple syntactic patterns.
In [37], the authors used a decision tree learning program as the classifier for
disambiguating expansions. An unsupervised method [38] was employed to extract
acronyms and their definitions from the Web; its authors used patterns and constraints
to make the model domain and language independent. In [39–41], the authors used
paralinguistic features and statistical measures to extract acronym expansions from
Web documents.
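Many Web-based disambiguation methods compare the context of an acronym occurrence with context words collected for each candidate expansion. The sketch below shows this idea with bag-of-words cosine similarity; it is a simplified stand-in, not the method of [35] or [37], and the context-word lists are hypothetical, standing in for words that would be gathered from Web pages or Wikipedia.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(context: List[str], candidates: Dict[str, List[str]]) -> str:
    """Return the candidate expansion whose context words best match the occurrence."""
    ctx = Counter(w.lower() for w in context)
    return max(candidates, key=lambda e: cosine(ctx, Counter(w.lower() for w in candidates[e])))


# Hypothetical context words for two expansions of NBA
candidates = {
    "National Basketball Association": ["game", "team", "player", "season"],
    "National Board of Accreditation": ["college", "programme", "engineering", "accreditation"],
}
```

For the sentence "the engineering college applied for accreditation", `disambiguate` selects "National Board of Accreditation", since three of its context words appear and none of the basketball words do.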
A heuristic approach is used to check for uppercase letters and parentheses in text,
and linguistic features are employed to identify phrase structures and select
candidate sentences [42]. The system [43] discovers the expansions of an acronym from
the query-click log files of a search engine; it also employs methods for finding the
popularity score and context words of each expansion. A pattern-based approach
called AcroMiner [44] is used to extract acronym expansions from Web documents and
compute the rank of each expansion.
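The check for uppercase letters and parentheses is typically paired with a letter-matching step that aligns the acronym with the words in front of it. The sketch below follows the well-known algorithm of Schwartz and Hearst [11] for the pattern "long form (SHORT FORM)". It is simplified (for instance, it ignores the reversed pattern "short form (long form)"), the filter on acceptable short forms is an assumption, and it is not the exact procedure of [42].

```python
import re
from typing import List, Optional, Tuple


def best_long_form(short: str, long_form: str) -> Optional[str]:
    """Match short-form characters right to left against the candidate long form."""
    s_idx, l_idx = len(short) - 1, len(long_form) - 1
    while s_idx >= 0:
        ch = short[s_idx].lower()
        if not ch.isalnum():  # skip punctuation inside the short form
            s_idx -= 1
            continue
        # Move left until ch is found; the first short-form character
        # must additionally sit at the start of a word.
        while (l_idx >= 0 and long_form[l_idx].lower() != ch) or (
            s_idx == 0 and l_idx > 0 and long_form[l_idx - 1].isalnum()
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # Extend back to the beginning of the word holding the first match
    start = long_form.rfind(" ", 0, l_idx + 1) + 1
    return long_form[start:]


def extract_pairs(text: str) -> List[Tuple[str, str]]:
    """Find 'long form (SHORT)' pairs in text."""
    pairs = []
    for m in re.finditer(r"\(([^()]+)\)", text):
        short = m.group(1).strip()
        # Assumed filter: 2-10 characters, one token, at least one uppercase letter
        if not (2 <= len(short) <= 10) or " " in short or not any(c.isupper() for c in short):
            continue
        words = text[: m.start()].split()
        window = min(len(short) + 5, 2 * len(short))  # max long-form length in words
        long_form = best_long_form(short, " ".join(words[-window:]))
        if long_form and len(long_form) > len(short):
            pairs.append((short, long_form))
    return pairs
```

On "Acronyms are tagged with a hidden Markov model (HMM) or conditional random fields (CRF).", `extract_pairs` returns `[("HMM", "hidden Markov model"), ("CRF", "conditional random fields")]`; the right-to-left matching is what lets it skip words such as "with a" that precede the expansion inside the candidate window.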
The authors of [45–47] focused on generating a knowledge map by recognizing acronym
expansions in two large-scale unstructured data sets: (i) Wikipedia (collective
intelligence) and (ii) NDSL (a scholarly database). From Wikipedia, AE pairs are
extracted from disambiguation pages, and synonyms of expansions are identified via
URI redirection pages. From NDSL, acronym expansions are extracted from free text.
NN-type and NP-type features are used to train a naïve Bayesian classifier that
recognizes correct AE pairs. A summary of the above approaches is presented in Table 3.
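The classification step can be pictured with a generic Bernoulli naïve Bayes model. The sketch below uses Laplace smoothing over binary features of a candidate pair; the feature names (`expansion_is_NP`, `head_is_NN`, `initials_match`) and the tiny training set are hypothetical and only stand in for the NN-type and NP-type features described above, not the actual setup of [45–47].

```python
import math
from collections import defaultdict
from typing import List, Set, Tuple


class BernoulliNB:
    """Bernoulli naive Bayes over binary (present/absent) features, with Laplace smoothing."""

    def fit(self, data: List[Tuple[Set[str], bool]]) -> "BernoulliNB":
        self.vocab = set().union(*(f for f, _ in data))
        self.n = {True: 0, False: 0}
        self.counts = {True: defaultdict(int), False: defaultdict(int)}
        for feats, label in data:
            self.n[label] += 1
            for f in feats:
                self.counts[label][f] += 1
        return self

    def log_score(self, feats: Set[str], label: bool) -> float:
        total = self.n[True] + self.n[False]
        score = math.log((self.n[label] + 1) / (total + 2))  # smoothed class prior
        for f in self.vocab:  # features unseen in training are ignored
            p = (self.counts[label][f] + 1) / (self.n[label] + 2)  # P(f present | label)
            score += math.log(p if f in feats else 1 - p)
        return score

    def predict(self, feats: Set[str]) -> bool:
        """True if the candidate is classified as a correct AE pair."""
        return self.log_score(feats, True) > self.log_score(feats, False)


# Hypothetical labeled candidates: (features present, is a correct AE pair)
train = [
    ({"expansion_is_NP", "initials_match"}, True),
    ({"expansion_is_NP", "initials_match", "head_is_NN"}, True),
    ({"head_is_NN"}, False),
    (set(), False),
]
```

After `model = BernoulliNB().fit(train)`, a candidate with `{"expansion_is_NP", "initials_match"}` is accepted and one with no features is rejected; with real data, the feature set would come from the linguistic analysis of each candidate.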
The survey results suggest that: (i) to extract AE pairs from text documents such as
biomedical documents, researchers can use either heuristics or one of the machine
learning approaches; and (ii) to disambiguate expansions, compute the ranking score
of an expansion, and identify the context words related to an expansion, researchers
can employ Web resources such as the Search Engine Results Page (SERP), search
engine log files, and Web-crawled documents.
As an extension of this survey, we plan to detect the list of definitions of an acronym
from the SERP of Google using a machine learning approach. A few Web-based acronym
expansion approaches use the snippets of the SERP as the resource for detecting
expansions. We instead plan to use the titles of the SERP, because expansions are
available more often in the title part than in the snippet part. Moreover, under this
proposed idea, less text has to be mined to detect expansions.
Table 3 (continued)
Author and year | Work and corpus | Implementation
Alpa Jain et al. [42] | Work: Web-based AE extraction and ranking of the expansions for an acronym. Evaluation corpus: Web search engine results (live search) | • AE pairs are extracted from three sources: (i) crawled Web documents, (ii) search engine logs, and (iii) Web search results • The performance of each resource is compared in terms of precision and recall
Xiaonan Ji et al. [44] (2008) | Work: mining, ranking, and using acronym patterns. Evaluation corpus: V.E.R.A acronym dictionary [49] | • The system is known as AcroMiner • Two strategies, namely lower level and upper level, are used for recognizing AE patterns • The rank score is controlled by three factors: (i) pattern popularity, (ii) the gap between the acronym and its expansion, and (iii) the mapping score of the AE pair
Bilyana Taneva et al. [43] | Work: mining acronym expansions, and finding the popularity score and context words for each acronym. Data source: query click logs of Bing (2010 and 2011). Evaluation corpus: Wikipedia disambiguation pages [36] | The system performs four important tasks: • candidate expansion identification • acronym expansion clustering • enhancement for tail meanings • canonical expansion, popularity score computation, and context word identification
Sequence labeling is well suited to this acronym expansion detection process, because
writing rules or regular expressions for every kind of expansion is a huge task, and
machine learning approaches suit such sequence labeling tasks well. In recent years, a
few authors [31, 32, 34] have used sequence labeling to detect expansions in text
documents. Statistical models such as HMMs, CRFs, and artificial neural networks
(ANNs) are appropriate for this acronym expansion identification task.
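Framed as sequence labeling, each token near an acronym receives a tag marking whether it begins (`B`), continues (`I`), or lies outside (`O`) the expansion. The hypothetical helpers below convert a token list and a known expansion into such tags and recover expansions from predicted tags; this is a common way of preparing training data for HMM, CRF, or neural taggers, but the BIO scheme is an assumed convention, not necessarily the one used in [31, 32, 34].

```python
from typing import List


def bio_tags(tokens: List[str], expansion: List[str]) -> List[str]:
    """Tag the first occurrence of `expansion` in `tokens` with B/I; everything else is O."""
    tags = ["O"] * len(tokens)
    n = len(expansion)
    for i in range(len(tokens) - n + 1):
        if n and tokens[i : i + n] == expansion:
            tags[i] = "B"
            for j in range(i + 1, i + n):
                tags[j] = "I"
            break
    return tags


def decode(tokens: List[str], tags: List[str]) -> List[str]:
    """Recover expansion strings from a (predicted) tag sequence."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:  # "O", or a stray "I" without a preceding "B"
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

For the tokens of "the Indian Space Research Organization ( ISRO ) launched", `bio_tags` with the expansion "Indian Space Research Organization" gives `O B I I I O O O O`, and `decode` turns those tags back into the single expansion string.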
An acronym is inherently ambiguous; i.e., it can have multiple definitions. To retrieve
as many of an acronym's definitions as possible, researchers can use Web search engine
result pages. To rank the expansions, researchers can use fuzzy systems, framing the
rules so that the rank of each expansion falls into the desired category.
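One way to realize the suggested fuzzy ranking is a zero-order Sugeno-style rule base, chosen here for brevity over a Mamdani system. In this sketch, the two inputs (the fraction of SERP titles containing an expansion and a letter-alignment score, both in [0, 1]), the linear membership functions, the four rules, their output levels, and the category thresholds are all hypothetical.

```python
from typing import Tuple


def low(x: float) -> float:
    """Membership in 'low' (linear, 1 at x = 0 and 0 at x = 1)."""
    return max(0.0, min(1.0, 1.0 - x))


def high(x: float) -> float:
    """Membership in 'high' (linear, 0 at x = 0 and 1 at x = 1)."""
    return max(0.0, min(1.0, x))


# Rules: (membership for title frequency, membership for match score, output level)
RULES = [
    (high, high, 1.0),
    (high, low, 0.6),
    (low, high, 0.4),
    (low, low, 0.0),
]


def fuzzy_rank(title_freq: float, match: float) -> Tuple[float, str]:
    """Fire the rules (AND = min), take the weighted average, then map to a category."""
    weights = [min(mf(title_freq), mm(match)) for mf, mm, _ in RULES]
    total = sum(weights)
    score = sum(w * z for w, (_, _, z) in zip(weights, RULES)) / total if total else 0.0
    if score >= 2 / 3:
        return score, "high"
    if score >= 1 / 3:
        return score, "medium"
    return score, "low"
```

An expansion found in every title with a perfect alignment scores 1.0 ("high"), while one at 0.5 on both inputs fires all four rules equally and lands at about 0.5 ("medium"); tuning the rule outputs or thresholds moves expansions between the desired categories.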
5 Conclusion
This survey reviews the approaches followed for mining acronym expansions from
free-form text and the Web. To recognize acronyms and their expansions in text and
Web documents, different authors have mainly employed heuristic and machine learning
approaches over the past two decades. Heuristic approaches use NLP and pattern
matching concepts, while machine learning approaches use statistical models such as
SVMs, CRFs, HMMs, and neural networks. Each approach has its own merits and demerits.
To assess the performance of these approaches, researchers predominantly used three
metrics, namely precision, recall, and F-score, and it is observed that heuristic
approaches achieve lower recall than machine learning approaches. Among Web-based
acronym expansion detection approaches, a few authors have focused on disambiguating
acronym expansions, generating knowledge maps for AE pairs, and finding the popularity
score and context words of an expansion. The Web resources they use most often are
Wikipedia, Web-crawled documents, log files, and search engine results.
References
1. http://www.acronymfinder.com
2. http://www.abbreviations.com
3. http://www.acronymslist.com
4. https://acronyms.thefreedictionary.com/
5. http://www.special-dictionary.com/acronyms/
6. http://www.acronymsearch.com
7. https://www.allacronyms.com
8. http://acronyms.silmaril.ie
9. http://acronym24.com
10. Pustejovsky, J., Castano, J., Cochran, B., Kotecki, M., Morrell, M.: Extraction and
disambiguation of acronym-meaning pairs in MEDLINE. In: Proceedings of 10th Triennial
Congress of the International Medical Informatics Association, pp. 371–375. MEDINFO, IOS
Press, London (2001)
11. Schwartz, A., Hearst, M.: A simple algorithm for identifying abbreviation definitions in
biomedical text. In: Pacific Symposium on Biocomputing, vol. 8, pp. 451–462 (2003)
12. Zahariev, M.: Efficient acronym–expansion matching for automatic acronym acquisition. In:
International Conference on Information and Knowledge Engineering, pp. 32–37 (2003)
13. Taghva, K., Gilberth, J.: Recognizing acronyms and definitions. Information Science
Research Institute, University of Nevada, Technical Report TR, pp. 191–198 (1999)
14. Yeates, S.: Automatic extraction of acronyms from text. In: Proceedings of Third New Zealand
Computer Science Research Students' Conference, pp. 117–124, University of Waikato, New
Zealand (1999)
15. Larkey, L.S., Ogilvie, P., Price, M.A., Tamilio, B.: Acrophile: an automated acronym
extractor and server. In: Proceedings of 5th ACM Conference on Digital Libraries.
Association for Computing Machinery, pp. 205–214 (2000)
16. Park, Y., Byrd, R.J.: Hybrid text mining for finding abbreviations and their definitions. In:
Proceedings of Conference on Empirical Methods in Natural Language Processing, EMNLP,
pp. 126–133. Intelligent Information System Institute, Pittsburgh (2001)
17. Adar, E.: S-RAD: a simple and robust abbreviation dictionary. HP Lab. Bioinform. 20(4),
527–533 (2004)
18. Rafeeque, P.C., Abdul Nazeer, K.A.: Text mining for acronym-definition pairs from
biomedical text using pattern matching method with space reduction heuristics. In:
Proceedings of 15th International Conference on Advanced Computing and Communications,
pp. 295–300. IEEE, IIT Guwahati, India (2009)
19. Saneesh Mohammed, N., Abdul Nazeer, K.A.: An improved method for extracting acronym–
definition pairs from biomedical literature. In: International Conference on Control
Communication and Computing (ICCC), pp. 194–197. IEEE (2013)
20. Liu, H., Friedman, C.: Mining terminological knowledge in large biomedical corpora. In:
Proceedings of 8th Pacific Symposium on Biocomputing, PSB Association, Lihue, pp. 415–426
(2003)
21. Okazaki, N., Ananiadou, S.: A term recognition approach to acronym recognition. In:
Proceedings of the COLING-ACL’06, pp. 643–650. ACM, Sydney (2006)
22. Yarygina, A., Vassilieva, N.: High-recall extraction of acronym-definition pairs with
relevance feedback. In: BEWEB, pp. 21–26, ACM, Berlin (2012)
23. Nadeau, D., Turney, P.: A supervised learning approach to acronym identification. In:
Proceedings of 18th Conference of the Canadian Society for Computational Studies of
Intelligence, pp. 319–329. Springer, Berlin (2005)
24. Chang, J.T., Schutze, H., Altman, R.B.: Creating an online dictionary of abbreviations from
MEDLINE. J. Am. Med. Inform. Assoc. 9(6), 612–620 (2002)
25. Xu, J., Huang, Y.L.: A machine learning approach to recognizing acronyms and their
expansion. In: International Conference on Machine Learning and Cybernetics, IEEE, China
(2005)
26. Xu, J., Huang, Y.L.: Using SVM to extract acronyms from text. Soft Computing, pp. 369–373.
Springer, Berlin (2006)
27. Ni, W., Xu, J., Huang, Y., Liu, T., Ge, J.: Acronym extraction using SVM with uneven
margins. In: Proceedings of the 2nd IEEE Symposium on Web Society, pp. 132–138. IEEE,
Beijing (2010)
28. Gao, Y.M., Huang, Y.L.: Using SVM with uneven margins to extract acronym expansion. In:
Proceedings of the 8th International Conference on Machine Learning and Cybernetics,
pp. 1286–1292, IEEE, Baoding (2009)
29. Taghva, K., Vyas, L.: Acronym expansion via hidden Markov models. In: Proceedings of
International Conference on Systems Engineering, IEEE, pp. 120–125 (2011)
30. Osiek, B.A., Xexeo, G., de Carvalho, L.A.V.: A language-independent acronym extraction
from biomedical texts with hidden Markov models. IEEE Trans. Biomed. Eng. 57(11),
2677–2688 (2010)
31. Nautial, A., Sristy, N.B., Somayajulu, D.V.L.N.: Finding acronym expansion using
semi-Markov conditional random fields. In: Compute 2014, India, pp. 16:1–16:6. ACM (2014)
32. Liu, J., Chen, J., Liu, T., Huang, Y.: Expansion finding for given acronyms using conditional
random fields. In: WAIM, pp. 191–200 (2011)
33. Liu, J., Liu, C., Hu, Q., Huang, Y.: Fine-grained acronym expansion identification using
latent-state neural structured prediction model. In: Proceedings of International Conference on
Machine Learning and Cybernetics, pp. 259–264. IEEE, Guangzhou (2015)
34. Liu, J., Liu, C., Huang, Y.: Multi-granularity sequence labeling model for acronym expansion
identification. Inf. Sci. 38, 462–474 (2017)
35. Choi, D., Shin, J., Lee, E., Kim, P.: A method for recommending the most appropriate
expansion of acronyms using wikipedia. In: Seventh International Conference on Innovative
Mobile and Internet Services in Ubiquitous Computing, IEEE, pp. 217–220 (2013)
36. https://en.wikipedia.org/wiki
37. Sumita, E., Sugaya, F.: Using the web to disambiguate acronyms. In: Association for
Computational Linguistics (ACL), pp. 161–164. New York (2006)
38. Sanchez, D., Isern, D.: Automatic extraction of acronym definitions from the Web. J. Appl.
Intell. 34(2), 311–327 (2011)
39. Roche, M., Prince, V.: A web-mining approach to disambiguate biomedical acronym
expansions. Informatica 34, 243–253 (2010)
40. Roche, M., Prince, V.: Managing the acronym/expansion identification process for
text-mining applications. Int. J. Softw. Inf. 2(2), 163–179 (2008)
41. Roche, M.: How to exploit paralinguistic features to identify acronyms in text. In: International
Conference on Language Resources and Evaluation, Reykjavik, Iceland, pp. 69–72 (2014)
42. Jain, A., Cucerzan, S., Azzam, S.: Acronym-expansion recognition and ranking on the web.
In: Proceedings of the IEEE International Conference on Information Reuse and Integration
(IRI 2007), pp. 209–214 (2007)
43. Taneva, B., Cheng, T., Chakrabarti, K., He, Y.: Mining Acronym Expansions and their
Meanings Using Query Log. WWW 2013, pp. 1261–1271. ACM, Brazil (2013)
44. Ji, X., Xu, G., Bailey, J., Li, H.: Mining, ranking, and using acronym patterns. In: Proceedings
of the 10th Asia-Pacific Web Conference on Progress in WWW Research and Development,
pp. 371–382 (2008)
45. Jeong, D.H., Gim, J., Jung, H.: Incremental discriminating method for acronyms in
heterogeneous resources. Int. J. Adv. Soft Comput. Appl. 7(1), 59–67 (2015)
46. Jeong, D.H., Hwang, M.G., Kim, J., Jung, H., Sung, W.K.: Acronym-expansion recognition
based on knowledge map system. Int. Inf. Inst. (Tokyo). Inf. Koganei 16(12), 8403–8408
(2013)
47. Jeong, D.H., Hwang, M.G., Sung, W.K.: Generating knowledge map for acronym–expansion
recognition. In: International Conference on U- and E-Service, Science and Technology
(UNESST), pp. 287–293 (2011)
48. http://www.ncbi.nil.nih.gov [MEDLINE Abstracts]
49. http://www.delorie.com/gnu/docs/vera/vera_toc.html [V.E.R.A]