A Survey on Acronym–Expansion Mining Approaches from Text and Web:

Proceedings of the Second International Conference on SCI 2018, Volume 1
Chapter · January 2019

DOI: 10.1007/978-981-13-1921-1_12
2 authors:
R.Menaha Senthilkumar Jayanthi VE

Dr. Mahalingam College of Engineering and Technology PSNACET

A Survey on Acronym–Expansion
Mining Approaches from Text and Web
R. Menaha and VE. Jayanthi
Abstract An acronym is a textual form used to refer to an entity and to stress
important concepts. Over the last two decades, many researchers have worked on mining
acronym-expansion pairs from plain text and the Web. Such pairs are used in language
processing, information retrieval, Web search, ontology mapping, question answering,
SMS, and social media posting. New acronyms appear every day, and discovering their
definitions/expansions is a challenging task because of their diverse characteristics.
Manually edited online repositories hold acronym-definition pairs, but updating all
possible definitions systematically is an overwhelming task. To fill this gap, different
approaches are employed for the automatic detection of acronym definitions from text
and Web documents. This paper presents those approaches and also reviews the Web-based
methods used for disambiguating and ranking expansions and for finding their popularity
scores and context words. The scope for future work in this research area is also
discussed.
1 Introduction
An acronym is a kind of abbreviation composed of the first letter or first few letters
of the words in a phrase. It is also called a short descriptor of a phrase. It was quietly
added to the English language as a new linguistic feature during the 1950s. Since
an acronym is treated as a word, its definition is called its expansion. According to
their formation, acronyms fall into three types: (i) Character-based acronyms are
generally formed from the initial letter(s) of each word in the phrase
[e.g., ISRO is an acronym for the phrase “Indian Space Research Organization”].
(ii) Syllable-based acronyms are formed from the syllables in the
word [e.g., Kg is an acronym for the word “Kilogram”]. (iii) Combinations of
character and syllable [e.g., RADAR combines initial characters and syllables].

R. Menaha (✉)
Department of Information Technology, Dr. Mahalingam College of Engineering
& Technology, Pollachi, Tamil Nadu, India
e-mail: rmenahasenthil@gmail.com
VE. Jayanthi
Department of Electronics and Communication Engineering, PSNA College of Engineering
& Technology, Dindigul, Tamil Nadu, India
e-mail: Jayanthi.ramu@gmail.com
© Springer Nature Singapore Pte Ltd. 2019
S. C. Satapathy et al. (eds.), Smart Intelligent Computing and Applications,
Smart Innovation, Systems and Technologies 104,
https://doi.org/10.1007/978-981-13-1921-1_12

Acronyms are widely used in biomedical documents because the names of many
diseases, terms, and procedures can be represented compactly as acronyms.
Recognizing the expansions/definitions associated with an acronym is a significant
task in natural language processing (NLP) and information retrieval. Acronyms are
also very common in Web searches; for example, a user enters NBA as the search
query instead of its full form, National Board of Accreditation, to save time. On
social media such as Twitter and Facebook, users write their comments with acronyms
to minimize typing, and acronyms are typical of online chat. Acronyms are even more
common on mobile devices because they make typing easier and express information
concisely.
To support Acronym-Expansion (AE) extraction, manually collected AE lists have been
compiled, and many are available on the Internet as online corpora/repositories.
For a given acronym query, the number of expansions returned by these repositories
varies widely. For example, for the acronym query CAS, the repository [1] returns 284
different expansions, more than any of the other repositories. For any kind of acronym
query, the repositories [1–5] return more expansions than the remaining four
repositories [6–9]. However, these repositories are restricted to specific domains or
organizations, and maintaining each acronym with a list of all its possible definitions
is a major problem because the number of acronyms grows so rapidly.
Because of these limitations of online repositories, the automatic detection of the
expansions of an acronym from free text and the Web has evolved over the last two
decades. This paper presents the approaches followed for detecting AE pairs from
text and the Web. It also reviews the Web-based methods for expansion
disambiguation, ranking, popularity score detection, and context word identification.
The paper is structured as follows. Section 2 presents the acronym expansion mining
approaches. The Web-based methods for disambiguation, ranking, and finding the
popularity score and context words of an expansion are tabulated in Sect. 3. The
scope for future work in this research area is given in Sect. 4. Finally, Sect. 5
concludes the article.
2 Acronym-Expansion Mining Approaches
Based on the survey analysis, the approaches adopted for recognizing acronym
expansions from text fall into two categories: (i) the heuristics approach
and (ii) the machine learning approach. The first is presented in Sect. 2.1,
and the second in Sect. 2.2. An illustration of the acronym expansion
mining approaches is given in Fig. 1.

Fig. 1 Acronym-Expansion mining approaches [diagram labels: Text Documents, Web
Documents, Knowledge Base; Heuristics Approach: NLP, Patterns; Machine Learning
Approach: SVM, HMM, CRF, ANN; Acronym - Expansions]
2.1 Heuristic Approach
The heuristics-based approach includes NLP-based and pattern-based approaches. The
NLP-based approach uses techniques such as part-of-speech (POS) tagging,
chunking, and relation extraction to discover the expansion associated with an acronym.
In the pattern-based approach, hand-built rules or regular expressions are applied to
the AE extraction process. The rules are written manually by treating characteristics
of acronyms, such as ambiguity, nesting, uppercase letters, length, and the
paralinguistic markers surrounding the acronym, as features in the text.
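The kind of hand-built rule described above can be illustrated with a minimal Python
sketch. It assumes a single, deliberately narrow pattern, "expansion (ACRONYM)", and
a limit of 2 to 10 uppercase letters; the names PAREN_ACRONYM and candidate_pairs are
hypothetical and are not taken from any of the surveyed systems.

```python
import re

# Assumed pattern: an all-uppercase token of 2-10 letters inside parentheses.
PAREN_ACRONYM = re.compile(r"\(([A-Z]{2,10})\)")


def candidate_pairs(sentence):
    """Return (acronym, expansion) pairs found as 'expansion (ACRONYM)'."""
    pairs = []
    for match in PAREN_ACRONYM.finditer(sentence):
        acronym = match.group(1)
        # Words that precede the opening parenthesis.
        preceding = sentence[:match.start()].split()
        # Candidate window: as many preceding words as the acronym has letters.
        window = preceding[-len(acronym):]
        # Rule: the word initials must spell the acronym (case-insensitive).
        initials = "".join(word[0] for word in window).upper()
        if initials == acronym:
            pairs.append((acronym, " ".join(window)))
    return pairs
```

For the sentence "The Indian Space Research Organization (ISRO) launched a satellite."
the sketch returns the pair (ISRO, Indian Space Research Organization). A rule this
strict misses expansions that contain stop words or interior letters, which is the
kind of trade-off between precision and recall that pattern-based systems face.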
A method called AcroMed [10] uses a regular expression algorithm and a syntactically
constrained algorithm to extract AE pairs from biomedical documents.
A pattern matching algorithm is used for extracting AE pairs from MEDLINE
biomedical abstracts [11]. Dynamic programming is used in [12] to match an
acronym with its expansion by maximizing a linguistic plausibility score. The
Acronym Finding Program (AFP) is one of the earliest systems for identifying acronym
expansions in free text [13]. This system applies an inexact pattern matching
algorithm for mining AE pairs.
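Inexact matching between an acronym and a window of words can be sketched with the
classic longest common subsequence (LCS) dynamic program. The sketch below is an
illustration only: it aligns acronym letters with word initials, and the function
names and the scoring ratio are assumptions rather than the procedure of [12] or [13].

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def match_score(acronym, window_words):
    """Fraction of acronym letters aligned, in order, with word initials."""
    initials = [word[0].lower() for word in window_words if word]
    return lcs_length(acronym.lower(), initials) / len(acronym)
```

Because a subsequence may skip elements, the window "United States of America"
still aligns fully with USA even though "of" contributes no letter.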
A method [14] called three letter acronym (TLA) employs paralinguistic markers
such as parentheses, commas, and periods to extract acronym definitions from
technical and governmental documents. Acrophile [15] is a system that builds a
database of acronym expansions by applying a large number of patterns and rules to a
massive collection of Web pages. An algorithm [16] uses pattern-based contraction
rules, text markers (parentheses), and cue words (such as, for example, etc.) for AE
pair extraction. In [17], the author applied four scoring rules to a set of paths to
locate the most probable definition in a window of text. Pattern matching and five
different space reduction heuristics are used for the selection of AE pairs from
biomedical text [18].
An approach proposed in [19] addresses a limitation of [18] by allowing AE pairs to
be extracted from textual data even if the acronym and its expansion appear on
different lines. In [20], the authors used collocation measures and parenthetical
expressions to associate expansions with an acronym.
In [21], the authors applied the C-value measure, which combines linguistic and
statistical information for automatic term recognition (ATR). This measure
favors nested terms that appear frequently in the text over specific
long terms. In [22], the authors build a classifier on heuristics-based features,
and user feedback is used to train the classifier to recognize acronym expansions.
The summary of the heuristics approaches is presented in Table 1.
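As a rough illustration of the C-value measure mentioned above, the following Python
sketch implements the standard C-value formulation from the term recognition
literature. The exact variant used in [21] may differ, and the data layout (terms as
word tuples, a frequency dictionary, and a map from each term to the longer
candidates that contain it) is an assumption made for this example.

```python
import math


def c_value(term, freq, nested_in):
    """Standard C-value of a candidate term.

    term:      tuple of words, e.g. ("contact", "lens")
    freq:      dict mapping each candidate term to its corpus frequency
    nested_in: dict mapping a term to the longer candidates containing it
    """
    length_weight = math.log2(len(term))  # single-word terms get weight 0
    longer_terms = nested_in.get(term, [])
    if not longer_terms:
        # The term never appears inside a longer candidate.
        return length_weight * freq[term]
    # Discount the average frequency of the longer terms that contain it.
    mean_longer = sum(freq[t] for t in longer_terms) / len(longer_terms)
    return length_weight * (freq[term] - mean_longer)
```

With hypothetical counts of 10 for "contact lens", 4 for "soft contact lens", and 2
for "hard contact lens", the nested term "contact lens" scores 1 × (10 − 3) = 7.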
2.2 Machine Learning Approach
The machine learning approach can overcome the difficulties of the heuristics
approach. A machine learning model is built from labeled examples, and
labeling data is simpler than writing rules or regular expressions. Besides that,
a machine learning model can easily combine different kinds of evidence. In this
approach, acronym definitions and the contexts in which they occur are represented
by features. Some features are strong evidence and some are weak evidence, but a
machine learning approach can exploit both easily.
Table 1 Summary of heuristics-based approaches

Approach: Pattern/rule-based
Area: Biomedical
Evaluation corpus: MEDLINE abstracts [48]
Implementation language: Java

Author and year: Schwartz and Hearst [11]
Performance metrics: Precision 96%, Recall 82%, F-factor 88.4%
Explanation:
• Acronym length should be 2–10 characters
• The algorithm fails to extract AE pairs if there is no exact character matching between them
• Definitions that contain a comma character, or one or two words inside parentheses, are not recognized by the algorithm

Author and year: Mohammed and Abdul Nazeer [19]
Performance metrics: Precision 98.6%, Recall 98.6%
Explanation:
• The method can extract AE pairs that contain digits
• It can detect AE pairs even if they span multiple lines
• AE pairs must appear in the text either as acronym (definition) or as definition (acronym)

Author and year: Rafeeque and Abdul Nazeer [18]
Performance metrics: Precision 97.2%, Recall 92%
Explanation:
• It uses five different space reduction heuristics for detecting the candidate acronyms and definitions
• The system does not recognize acronyms with digits and punctuation marks
• The acronym and its candidate expansion must be in the same sentence

Approach: Pattern/rule-based
Area: Text mining

Author and year: Yeates S (2000)
Performance metrics: Precision 68%, Recall 91%, F-factor 77.8%
Explanation:
• The system is called three letter acronym (TLA)
• TLA attempts to find the candidate acronym in each chunk and the candidate expansions in the preceding and following chunks
• Abbreviated expressions containing more than one uppercase letter from the acronym are not recognized by TLA

Author and year: Larkey et al. [15]
Performance metrics: Precision 87%, Recall 88%, F-factor 87.5%
Explanation:
• The system is called Acrophile
• A massive number of patterns and rules are used to identify AE pairs in Web documents
• The precision of Acrophile is low because of the complex relationships between AE pairs in Web documents

Approach: NLP-based
Area: Text mining
Evaluation corpus: MEDLINE abstracts [48]

Author and year: Taghva K and Gilberth J (1999)
Performance metrics: Precision 98%, Recall 93%, F-factor 95.4%
Explanation:
• The system is called acronym finder program (AFP)
• Longest common subsequence (LCS) is used to extract the expansion
• AFP does not support two-letter acronyms (e.g., CA, IP)
• It does not allow acronym letters to match interior letters of the expansion

Author and year: Pustejovsky et al. [10]
Performance metrics: Precision 98%, Recall 72%, F-factor 83%
Explanation:
• The system is called AcroMed
• It uses regular expression and syntactically constrained algorithms for AE detection
• It depends on the results of POS tagging and has difficulty finding complex patterns

Approach: NLP-based
Area: Biomedical

Author and year: Zahariev M (2004)
Performance metrics: F-factor 99.6%
Explanation:
• Acronyms are recognized only if their letters occur sequentially in the expansion
• The system does not account for acronyms with digits and symbolic characters

Approach: Hybrid [NLP + pattern]

Author and year: Park and Byrd [16]
Performance metrics: Precision 97%, Recall 94%, F-factor 95.9%
Explanation:
• The first character of the first word must match the first acronym letter
• The expression must not contain text markers or stop words at the beginning or at the end

2.2.1 SVM-Based Approach
The support vector machine (SVM)-based approach casts acronym expansion
finding as a classification problem. To generate a proper acronym-expansion
pair, the SVM is trained with features that relate the acronym to its
corresponding definition. In [23], the authors apply space reduction heuristics to
both the acronym and the definition and then use an SVM to validate AE pairs.
In [24], the authors applied linear regression to a set of features to find the
possible alignments between an acronym and its expansions. The authors of [25–28]
presented SVM models that use AE information as features (e.g., length, existence
of special symbols, and context) for recognizing acronyms and their expansions in
text.
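The kinds of features listed above (length, special symbols, and context) can be
pictured as a small feature dictionary for one candidate pair. This is a sketch under
assumptions: the feature names, the CUE_WORDS set, and the function pair_features are
hypothetical, and training the SVM itself (which needs a machine learning library) is
not shown.

```python
# Hypothetical cue words that often introduce a definition.
CUE_WORDS = {"stands", "short", "called", "abbreviated", "known"}


def pair_features(acronym, expansion, context_words):
    """Describe one candidate acronym-expansion pair as a feature dictionary."""
    words = expansion.split()
    initials = "".join(word[0] for word in words).lower()
    return {
        "acronym_length": len(acronym),
        "expansion_word_count": len(words),
        "has_digit": any(ch.isdigit() for ch in acronym),
        "has_special_symbol": any(not ch.isalnum() for ch in acronym),
        "initials_match": initials == acronym.lower(),
        "context_has_cue": any(w.lower() in CUE_WORDS for w in context_words),
    }
```

A classifier would receive one such vector per candidate pair, labeled as a correct or
an incorrect pair, and learn how much weight each kind of evidence deserves.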
2.2.2 HMM-Based Approach
A hidden Markov model (HMM) is a statistical model defined by a set of states and
the transitions among them, which form a hidden chain. Each state emits
observable outputs, but the states themselves are hidden, i.e., not directly
observed. An HMM [29] is used for acronym expansion detection. The model
uses the sequential structure of sentences to find acronym-expansion pairs, but
the solution is restricted because the expansion and the acronym must appear
in close vicinity. In [30], the authors used an HMM to recognize AE
pairs in biomedical text. Here, the model is built by considering the alignment
between characters, or sequences of characters, in the acronym and the expansion.
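How hidden states are recovered from observations can be sketched with the Viterbi
algorithm, the usual decoding procedure for HMMs. The state names, observation
symbols, and probabilities used with it below are toy assumptions for illustration;
they are not the models of [29] or [30].

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Toy model: a token is either inside an expansion (EXP) or not (OTHER).
# "init" means the token's initial matches an acronym letter.
STATES = ("EXP", "OTHER")
START = {"EXP": 0.3, "OTHER": 0.7}
TRANS = {"EXP": {"EXP": 0.7, "OTHER": 0.3}, "OTHER": {"EXP": 0.3, "OTHER": 0.7}}
EMIT = {"EXP": {"init": 0.9, "none": 0.1}, "OTHER": {"init": 0.2, "none": 0.8}}
```

With these toy values, the observation sequence none, init, init, init, none decodes
to OTHER, EXP, EXP, EXP, OTHER, i.e., the three middle tokens form the expansion.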
2.2.3 CRF-Based Approach
Conditional random fields (CRFs) are a class of statistical modeling methods used
for labeling and segmenting structured data in the form of sequences, trees, etc.
A CRF takes the context into account when predicting labels for the given input
samples. It is an alternative to the HMM. CRF-based approaches are proposed in
[31, 32] so that more effective features can be written for AE: the features work on
a group of neighboring tokens together with the features of individual tokens. They
have used nonlinear hidden layers for better representation of the input data.
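The idea of combining features of an individual token with features of its neighbors
can be sketched as a feature function of the kind typically fed to a CRF. The
specific features, the window size, and the name token_features are assumptions for
illustration and are not the feature sets of [31] or [32].

```python
def token_features(tokens, i, window=1):
    """Features of tokens[i] plus the same features of nearby tokens."""

    def describe(token):
        return {
            "lower": token.lower(),
            "is_upper": token.isupper(),
            "is_title": token.istitle(),
            "in_parens": token.startswith("(") and token.endswith(")"),
        }

    # Features of the token itself are prefixed with offset 0.
    features = {f"0:{name}": value for name, value in describe(tokens[i]).items()}
    # Features of neighbors are prefixed with their signed offset, e.g. "-1:".
    for offset in range(-window, window + 1):
        j = i + offset
        if offset == 0 or not 0 <= j < len(tokens):
            continue
        for name, value in describe(tokens[j]).items():
            features[f"{offset:+d}:{name}"] = value
    return features
```

For the tokens conditional, random, fields, (CRF), the features of the last token
include both that it is parenthesized and that its left neighbor is "fields".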
2.2.4 Neural Network-Based Approach
In [33], the authors viewed acronym expansion detection as a sequence
labeling task and used a neural network hidden layer to model the
feature selection process. This model is commonly known as neural conditional random
fields (NCRF), but it ignores the fine-grained information carried by the
label substructure. A hierarchical latent-structure neural structured prediction
model [34] is used for expansion identification. Its authors introduced latent-state
neural conditional random fields (LNCRF) to solve the problem of expansion
sequence labeling with nonlinear input features and label substructures.
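Viewing expansion detection as sequence labeling means assigning one label to every
token. The sketch below uses the common B/I/O convention (begin, inside, outside) as
an assumption; the label schemes actually used in [33] and [34] are not specified
here, and the function names are hypothetical.

```python
def bio_labels(tokens, expansion_tokens):
    """Label the first occurrence of the expansion with B/I, all else with O."""
    labels = ["O"] * len(tokens)
    n = len(expansion_tokens)
    lowered = [t.lower() for t in tokens]
    target = [t.lower() for t in expansion_tokens]
    for start in range(len(tokens) - n + 1):
        if n and lowered[start:start + n] == target:
            labels[start] = "B"
            for k in range(start + 1, start + n):
                labels[k] = "I"
            break
    return labels


def decode(tokens, labels):
    """Recover the labeled spans from a B/I/O label sequence."""
    spans, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

A trained sequence model predicts the label sequence; decode then turns the predicted
labels back into expansion strings.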
In recent years, the use of machine learning approaches for the acronym
expansion detection task has been increasing. The inferences drawn for both the
heuristics and machine learning approaches are presented in Table 2.
Table 2 Annotations of both heuristics and machine learning approaches

Approach: Heuristics
Inferences:
• Creating and fine-tuning rules for all kinds of acronym expansions is a time-consuming process
• Developing patterns manually can limit the information that is used, because only strong evidence can be included in patterns
• Manually created features are often noisy and unnecessary
• Heuristics-based techniques mostly select acronyms with a small number of false positives, so the precision is high, but the recall is low
• Some regular expressions are too narrow: they select acronyms with high precision but miss a lot of known patterns. Others are too wide and select too many false-positive acronyms

Approach: Machine learning
Inferences:
• In the SVM-based approach, each token is classified as either a positive or a negative sample without considering the features of neighboring samples
• The HMM-based approach requires expert prior knowledge
• The performance of a CRF depends heavily on the quality of the input features, which are difficult for humans to build. Each class label has a complex substructure, but the CRF ignores the intermediate substructure information, which leads to the loss of important information
• In the neural network-based approach, latent state variables are used to capture the granular structure of each class and to learn a high-level representation of complex input features
3 Web-Based Acronym Expansion Approaches
This section discusses research work on mining acronym definitions from Web
resources, disambiguating and ranking the expansions, and finding the popularity
score and context words of the expansions.
A method [35] recommends the most appropriate expansion of an acronym. This is
done through a dataset built from Wikipedia disambiguation pages [36] by using
simple syntactic patterns. In [37], the authors used a decision tree learning program
as the classifier for disambiguating expansions. An unsupervised method [38] is
employed to extract acronyms and their definitions from the Web; its authors used
patterns and constraints to make the model domain and language
independent. In [39–41], the authors used paralinguistic features and statistical
measures for extracting acronym expansions from Web documents.
A heuristic approach is used for checking uppercase letters and parentheses in text,
and linguistic features are employed for identifying phrase structures in order to
identify candidate sentences [42]. The system in [43] discovers the expansions of
acronyms from the query-click log files of a search engine. It also employs methods
for finding the popularity score and context words of each expansion. A pattern-based
approach called AcroMiner [44] extracts acronym expansions from Web documents and
computes the rank of each expansion.
The authors of [45–47] focused on generating a knowledge map by recognizing
acronym expansions from two large-scale unstructured data sets:
(i) Wikipedia (collective intelligence) and (ii) NDSL (scholarly database). From
Wikipedia, acronym-expansion pairs are extracted from disambiguation pages, and
synonyms of expansions are identified via URI redirection pages. From NDSL,
acronym expansions are extracted from free text. NN-type and NP-type features
are used to train a naïve Bayesian classifier to recognize correct acronym-expansion
pairs. The summary of the above approaches is presented in Table 3.
4 Scope for the Future Work
The survey suggests the following: (i) To extract AE pairs from text documents such
as biomedical documents, researchers can use either the heuristics approach or any
one of the machine learning approaches. (ii) To disambiguate expansions, find the
ranking score of an expansion, and identify the context words related to an
expansion, researchers can employ Web resources such as the Search Engine Results
Page (SERP), search engine log files, and Web-crawled documents.
As an extension of this survey, we plan to detect the list of definitions of
an acronym from the Google SERP by using a machine learning approach. A few of the
Web-based acronym expansion approaches use snippets from the SERP as the resource
for detecting expansions. We plan instead to use titles from the SERP, because
expansions are more often available in the title part than in the snippet part of
the SERP. Moreover, the amount of text that must be mined to detect expansions is
smaller under this proposed idea.

Table 3 Web-based methods for exploring acronym expansions

Author and year: Do-Heon Jeong et al. (2014), (2015), and (2016)
Work: Generating a knowledge map for acronym expansion
Evaluation corpus:
• Wikipedia and NDSL abstracts
Implementation:
• Constructed a gold standard source from NDSL for AE disambiguation
• For disambiguation, the expansion with the highest frequency is selected as the right definition for the acronym

Author and year: Sumita and Sugaya [37]
Work: Disambiguation of expansions
Evaluation corpus:
• Wikipedia
• www.acronymsearch.com
Implementation:
• Focused on disambiguating acronyms by employing a decision tree learning program as the classifier
• To train the classifier, the top N frequent words obtained from snippets of the “acronym AND definition” query are used as features

Author and year: Donjin Choi et al. [35]
Work: Recommending the most appropriate expansion using Wikipedia
Evaluation corpus:
• Wikipedia extended abstracts
Implementation:
• Recommends the most relevant expansion words for an acronym
• A linguistic approach is used to detect AE pairs from Wikipedia
• The WUP similarity measure is used to discover the most appropriate expansion
• It cannot detect an expansion if the order of the capital letters of successive words is not the same as in the given acronym

Author and year: Sanchez D, Isren D (2011)
Work: Mining acronym expansions from the Web
Evaluation corpus:
• Web search engine results
Implementation:
• An iterative query expansion algorithm is used in Web search to retrieve a larger number of definitions
• Domain- and language-independent approach

Author and year: Mathieu Roche, Violin Prince (2010)
Work: Disambiguating biomedical acronym expansions
Evaluation corpus:
• Biomedical documents provided by AcroMiner
• 102 AE pairs identified
Implementation:
• Provided nine quality measures for predicting the appropriate definition; the measures are based on mutual information, cubic MI, and Dice’s coefficient
• Building a context from the Web page is difficult, and context extraction relies on word frequency

Author and year: Mathieu Roche, Violin Prince (2008), (2014)
Work: Extraction of acronyms/expansions and disambiguation of these definitions
Evaluation corpus:
• www.sigles.net
Implementation:
• Paralinguistic markers and statistical measures are used for AE extraction
• The AltaVista search engine and Java are used for implementation
• Since more global patterns are used, a lot of noisy data are returned

Author and year: Alpa Jain et al. [42]
Work: Web-based AE extraction and ranking of the expansions for an acronym
Evaluation corpus:
• Web search engine results (live search)
Implementation:
• AE pairs are extracted from three sources: (i) crawled Web documents, (ii) search engine logs, and (iii) Web search results
• The performance of each resource is compared in terms of precision and recall

Author and year: Xiaonan J (2008)
Work: Mining, ranking, and using acronym patterns
Evaluation corpus:
• V.E.R.A acronym dictionary [49]
Implementation:
• The system is known as AcroMiner
• Two strategies, lower level and upper level, are used for recognizing AE patterns
• The rank score is controlled by three factors: (i) pattern popularity, (ii) gap between the acronym, and (iii) mapping score of AE

Author and year: Bilyana Taneva et al. [43]
Work: Mining acronym expansions and finding the popularity score and context words for each acronym
Data source: query click logs of Bing, 2010 and 2011
Evaluation corpus:
• Wikipedia disambiguation pages [36]
Implementation: The system performs four important tasks with acronym expansions:
• Candidate expansion identification
• Acronym expansion clustering
• Enhancement for tail meanings
• Canonical expansion, popularity score computation, and context word identification

Sequence labeling is an ideal method for acronym expansion detection
because writing rules or regular expressions for all kinds of expansions is a
huge task. The machine learning approach is well suited to this sequence labeling
task. In recent years, a few authors [31, 32, 34] have used sequence labeling in
their work for detecting expansions in text documents. Statistical models
such as HMM, CRF, and ANN are appropriate for the acronym expansion
identification task.
An acronym is inherently ambiguous; i.e., it can have multiple definitions. To
retrieve as many definitions of an acronym as possible, researchers
can use Web search engine result pages. To rank the expansions, researchers can
use fuzzy systems, framing the rules so that the rank of each expansion falls
into a desired category.
5 Conclusion
This survey reviews the approaches followed for mining acronym expansions from
free-format text and the Web. To recognize acronyms and their expansions in text and
Web documents, different authors have mainly employed heuristics and machine
learning approaches over the past two decades. Heuristics approaches use NLP and
pattern matching concepts. Statistical models such as SVM, CRF, HMM, and neural
networks are used in machine learning approaches. Each approach has its own
merits and demerits. To assess the performance of these approaches, researchers
predominantly used three metrics, namely precision, recall, and F-score. It is
observed that recall is lower for the heuristics approach than for the machine
learning approach. Among the Web-based acronym expansion detection approaches, a few
authors have focused on disambiguating acronym expansions, generating a
knowledge map for AE pairs, and finding the popularity score and context words of
an expansion. The Web resources they used most are Wikipedia, Web-crawled
documents, log files, and search engine results.
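For reference, the relation between the three metrics named above can be written as a
short Python sketch. It assumes that the F-score is the balanced harmonic mean of
precision and recall (often written F1); the function name f_score is hypothetical.

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this assumption, the precision of 96% and recall of 82% listed for [11] in
Table 1 give an F-score of about 88.4%, which matches the F-factor reported there.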
References
1. http://www.acronymfinder.com
2. http://www.abbreviations.com
3. http://www.acronymslist.com
4. https://acronyms.thefreedictionary.com/
5. http://www.special-dictionary.com/acronyms/
6. http://www.acronymsearch.com
7. https://www.allacronyms.com
8. http://acronyms.silmaril.ie
9. http://acronym24.com
10. Pustejovsky, J., Castano, J., Cochran, B., Kotecki, M., Morrell, M.: Extraction and
disambiguation of acronym-meaning pairs in MEDLINE. In: Proceedings of 10th Triennial
Congress of the International Medical Informatics Association, pp. 371–375. MEDINFO, IOS
Press, London (2001)
11. Schwartz, A., Hearst, M.: A simple algorithm for identifying abbreviation definitions in
biomedical text. In: Pacific Symposium on Biocomputing, vol. 8, pp. 451–462 (2003)
12. Zahariev, M.: Efficient acronym-expansion matching for automatic acronym acquisition. In:
International Conference on Information and Knowledge Engineering, pp. 32–37 (2003)
13. Taghva, K., Gilberth, J.: Recognizing acronyms and definitions. Information Science
Research Institute, University of Nevada, Technical Report TR, pp. 191–198 (1999)
14. Yeates, S.: Automatic extraction of acronyms from text. In: Proceedings of Third New Zealand
Computer Science Research Student’s Conference, pp. 117–124. University of Waikato, New
Zealand (1999)
15. Larkey, L.S., Ogilvie, P., Price, M.A., Tamilio, B.: Acrophile: an automated acronym
extractor and server. In: Proceedings of 5th ACM Conference on Digital Libraries.
Association for Computing Machinery, pp. 205–214 (2000)
16. Park, Y., Byrd, R.J.: Hybrid text mining for finding abbreviations and their definitions. In:
Proceedings of Conference on Empirical Methods in Natural Language Processing, EMNLP,
pp. 126–133. Intelligent Information System Institute, Pittsburgh (2001)
17. Adar, E.: S-RAD: a simple and robust abbreviation dictionary. HP Lab. Bioinform. 20(4),
527–533 (2004)
18. Rafeeque, P.C., Abdul Nazeer, K.A.: Text mining for acronym-definition pairs from
biomedical text using pattern matching method with space reduction heuristics. In:
Proceedings of 15th International Conference on Advanced Computing and Communications,
pp. 295–300. IEEE, IIT Guwahati, India (2009)
19. Saneesh Mohammed, N., Abdul Nazeer, K.A.: An improved method for extracting acronym–
definition pairs from biomedical literature. In: International Conference on Control
Communication and Computing (ICCC), pp. 194–197. IEEE (2013)
20. Liu, H., Friedman, C.: Mining terminological knowledge in large biomedical corpora. In:
Proceedings of 8th Pacific Symposium on Biocomputing, PSB Association, Lihue, pp. 415–426
(2003)
21. Okazaki, N., Ananiadou, S.: A term recognition approach to acronym recognition. In:
Proceedings of the COLING-ACL’06, pp. 643–650. ACM, Sydney (2006)
22. Yarygina, A., Vassilieva, N.: High-recall extraction of acronym-definition pairs with
relevance feedback. In: BEWEB, pp. 21–26. ACM, Berlin (2012)
23. Nadeau, D., Turney, P.: A supervised learning approach to acronym identification. In:
Proceedings of 18th Conference of the Canadian Society for Computational Studies of
Intelligence, pp 319–329. Springer, Berlin (2005)
24. Chang, J.T., Schutze, H., Altman, R.B.: Creating an online dictionary of abbreviations from
MEDLINE. J. Am. Med. Inform. Assoc. 9(6), 612–620 (2002)
25. Xu, J., Huang, Y.L.: A machine learning approach to recognizing acronyms and their
expansion. In: International Conference on Machine Learning and Cybernetics, IEEE, China
(2005)
26. Xu, J., Huang, Y.L.: Using SVM to extract acronyms from text. Soft Computing, pp. 369–373.
Springer, Berlin (2006)
27. Ni, W., Xu, J., Huang, Y., Liu, T., Ge, J.: Acronym extraction using SVM with uneven
margins. In: Proceedings of the 2nd IEEE Symposium on Web Society, pp. 132–138. IEEE,
Beijing (2010)
28. Gao, Y.M., Huang, Y.L.: Using SVM with uneven margins to extract acronym expansion. In:
Proceedings of the 8th International Conference on Machine Learning and Cybernetics,
pp. 1286–1292, IEEE, Baoding (2009)
29. Taghva, K., Vyas, L.: Acronym expansion via hidden Markov models. In: Proceedings of
International Conference on Systems Engineering, IEEE, pp. 120–125 (2011)
30. Osiek, B.A., Xexeo, G., de Carvalho, L.A.V.: A language-independent acronym extraction
from biomedical texts with hidden Markov models. IEEE Trans. Biomed. Eng. 57(11),
2677–2688 (2010)
31. Nautial, A., Sristy, N.B., Somayajulu, D.V.L.N.: Finding acronym expansion using
semi-Markov conditional random fields. In: Compute 2014, India, pp. 16:1–16:6. ACM (2014)
32. Liu, J., Chen, J., Liu, T., Huang, Y.: Expansion finding for given acronyms using conditional
random fields. In: WAIM, pp. 191–200 (2011)
33. Liu, J., Liu, C., Hu, Q., Huang, Y.: Fine-grained acronym expansion identification using
latent-state neural structured prediction model. In: Proceedings of International Conference on
Machine Learning and Cybernetics, pp. 259–264. IEEE, Guangzhou (2015)
34. Liu, J., Liu, C., Huang, Y.: Multi-granularity sequence labeling model for acronym expansion
identification. Inf. Sci. 38, 462–474 (2017)
35. Choi, D., Shin, J., Lee, E., Kim, P.: A method for recommending the most appropriate
expansion of acronyms using wikipedia. In: Seventh International Conference on Innovative
Mobile and Internet Services in Ubiquitous Computing, IEEE, pp. 217–220 (2013)
36. https://en.wikipedia.org/wiki
37. Sumita, E., Sugaya, F.: Using the web to disambiguate acronyms. In: Association for
Computational Linguistics (ACL), pp. 161–164. New York (2006)
38. Sanchez, D., Isren, D.: Automatic extraction of acronym definitions from the Web. J. Appl.
Intell. 34(2), 311–327 (2011)
39. Roche, M., Prince, V.: A web-mining approach to disambiguate biomedical acronym
expansions. Informatica 34, 243–253 (2010)
40. Roche, M., Prince, V.: Managing the acronym/expansion identification process for
text-mining applications. Int. J. Softw. Inf. 2(2), 163–179 (2008)
41. Roche, M.: How to exploit paralinguistic features to identify acronyms in text. In: International
Conference on Language Resources and Evaluation, Reykjavik, Iceland, pp. 69–72 (2014)
42. Jain, A., Cucerzan, S., Azzam, S.: Acronym-expansion recognition and ranking on the web.
In: Proceedings of the IEEE International Conference on Information Reuse and Integration
(IRI 2007), pp. 209–214 (2007)
43. Taneva, B., Cheng, T., Chakrabarthi, K., He, Y.: Mining acronym expansions and their
meanings using query log. In: WWW 2013, pp. 1261–1271. ACM, Brazil (2013)
44. Ji, X., Xu, G., Bailey, J., Li, H.: Mining, ranking, and using acronym patterns. In: Proceedings
of the 10th Asia-Pacific Web Conference on Progress in WWW Research and Development,
pp. 371–382 (2008)
45. Jeong, D.H., Gim, J., Jung, H.: Incremental discriminating method for acronyms in
heterogeneous resources. Int. J. Adv. Soft Comput. Appl. 7(1), 59–67 (2015)
46. Jeong, D.H., Hwang, M.G., Kim, J., Jung, H., Sung, W.K.: Acronym-expansion recognition
based on knowledge map system. Int. Inf. Inst. (Tokyo). Inf. Koganei 16(12), 8403–8408
(2013)
47. Jeong, D.H., Hwang, M.G., Sung, W.K.: Generating knowledge map for acronym-expansion
recognition. In: International Conference on U-and E-Service, Science and Technology
(UNESST), pp. 287–293 (2011)
48. http://www.ncbi.nil.nih.gov [MEDLINE Abstracts]
49. http://www.delorie.com/gnu/docs/vera/vera_toc.html [V.E.R.A]