Wubetubarud@gmail.com or wubetubarud@yahoo.com
Abstract
A multilingual spelling checker is a tool that needs to be developed for all users, and a spelling checker
is a prerequisite for a language to be digitized. It is one of the applications of natural language
processing that detects and corrects errors in natural-language text. Spelling checkers that have been
developed can be integrated with other natural language processing applications. In this paper, I propose
a model of a multilingual spelling checker that is based on a dictionary-based technique and apply it to
error detection and correction for five selected Ethiopian languages: Amharic, Afan Oromo, Tigrinya,
Hadiyyisa and Awngi. The model provides corrections by selecting the most suitable entry from a list of
suggestions, based on lexical resources and dictionary-based statistics, and it depends on the lexicons of
the five selected languages.
The model is evaluated using Amharic, Afan Oromo, Tigrinya, Hadiyyisa and Awngi words in dictionary
form for each language. Spelling errors in all five languages are detected automatically against the list of
words prepared in the dictionary and are marked with a red zigzag underline. This approach detects errors
efficiently and effectively, in minimal time. After evaluating the model on the selected languages,
precision, recall and F-measure were calculated.
Keywords: Error Correction, Error Detection, Multilingual, Suggestion, Spelling Checker, Types of
Errors
I. INTRODUCTION
A multilingual spelling checker directly identifies which natural language is being dealt with and shifts
to the proper spelling checker for the language that the user is interested in.
Language is a medium of communication that helps human beings exchange ideas and information.
A spelling checker system is used to check text for any kind of spelling error.
Spelling error detection and correction tools work at the word level and use a dictionary-based technique.
Every word in the text is looked up in the speller lexicon. When a word is not in the dictionary, it is
flagged as an error. To correct the error, the spelling checker searches the dictionary for words that
resemble the erroneous word. These words are then suggested to the user, who chooses the word that was
intended. Spelling checker systems are used in various Natural Language Processing Applications (NLPA),
including part-of-speech taggers [1] [2] and grammar checkers for natural languages [3].
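The lookup-and-suggest procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used in this paper: the class name DictionarySpeller, the toy lexicon and the choice of Levenshtein edit distance as the measure of resemblance are all assumptions made for the example, since the text does not name a specific similarity measure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Illustrative dictionary-based speller (hypothetical; not the paper's code). */
final class DictionarySpeller {
    private final Set<String> lexicon;

    DictionarySpeller(Set<String> lexicon) {
        this.lexicon = lexicon;
    }

    /** A word is flagged as an error when it is absent from the lexicon. */
    boolean isMisspelled(String word) {
        return !lexicon.contains(word);
    }

    /** Returns lexicon entries within maxDistance edits of word, closest first. */
    List<String> suggest(String word, int maxDistance) {
        List<String> candidates = new ArrayList<>();
        for (String entry : lexicon) {
            if (editDistance(word, entry) <= maxDistance) {
                candidates.add(entry);
            }
        }
        // Sort by distance, then alphabetically so the order is deterministic.
        candidates.sort(Comparator
                .comparingInt((String entry) -> editDistance(word, entry))
                .thenComparing(Comparator.naturalOrder()));
        return candidates;
    }

    /** Levenshtein distance (assumed similarity measure), using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int substitution = previous[j - 1] + cost;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        // Toy lexicon for illustration only.
        DictionarySpeller speller =
                new DictionarySpeller(Set.of("spelling", "checker", "language"));
        System.out.println(speller.isMisspelled("cheker"));
        System.out.println(speller.suggest("cheker", 2));
    }
}
```

Scanning the whole lexicon for every misspelled word is the simplest approach; for dictionaries with hundreds of thousands of entries, an index over the lexicon would likely be needed to keep suggestion time low.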
There are two main issues related to spelling checking: error detection and error correction. The errors
themselves are of two types: non-word errors and real-word errors. There are many techniques available
for detection and correction. In this paper, I have designed, implemented and evaluated an end-to-end
system that performs spelling checking and automatic correction for multiple Ethiopian languages.
in the dictionary file list. Fifthly, I made an evaluation based on Awngi language test data, which are in the
dictionary file list. To evaluate the spelling error detection capability of the selected approach for all
five selected Ethiopian languages, precision, recall, and F1 measure were used as metrics. The relative
positions of the correct spellings in the suggestion lists were used to evaluate spelling error
correction.
IX.I. Test Data
I used manually prepared spelling error test corpora to evaluate performance. For each of the selected
Ethiopian languages, I used a balanced test corpus collected from different sources.
I prepared word dictionaries for all languages as follows.
Table 1: Word dictionaries for all languages
No.  Language     Number of words (dictionary file)
1    Amharic      993,072
2    Afan Oromo   866,328
3    Tigrinya     966,328
4    Hadiyyisa    987,176
5    Awngi        678,534
To compute the actual scores, I used the manually compiled, balanced test data as the gold standard for
the evaluation.
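\[ \text{Precision} = \frac{TP}{TP + FP} \tag{1} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{2} \]
\[ F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3} \]
Here TP is the number of misspelled words that the system flags as errors, FP is the number of correctly
spelled words that it flags as errors, and FN is the number of misspelled words that it fails to flag.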
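As a small worked illustration of the detection metrics, the sketch below computes precision, recall and F1 (the harmonic mean of precision and recall) from raw counts. The class name DetectionMetrics and the counts in main are hypothetical and chosen only to exercise the formulas; they are not results from this study.

```java
/** Illustrative detection metrics computed from raw counts (hypothetical helper). */
final class DetectionMetrics {

    /** Share of flagged words that are real errors: TP / (TP + FP). */
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Share of real errors that were flagged: TP / (TP + FN). */
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall: 2PR / (P + R). */
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only.
        int tp = 90;
        int fp = 10;
        int fn = 30;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", p, r, f1(p, r));
    }
}
```

The zero checks return 0.0 instead of dividing by zero when nothing was flagged or when the test set contains no errors; this is one common convention, assumed here.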
The quality of the suggestions offered by a spelling corrector is measured by the relative positions of the
correct spellings in the suggestion lists generated from the dictionary. In the best case, the right
correction always appears at the top of the list.
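One way to quantify the position of the correct spelling is sketched below: it returns the 1-based rank of the intended word in a suggestion list and the fraction of test cases whose intended word appears within the top k suggestions. The class SuggestionRanking, its method names and the sample words are assumptions for illustration; the paper does not specify the exact scoring procedure.

```java
import java.util.List;

/** Illustrative ranking-based scoring of suggestion lists (hypothetical helper). */
final class SuggestionRanking {

    /** 1-based position of the intended word, or 0 when it is not suggested. */
    static int rankOf(String intended, List<String> suggestions) {
        return suggestions.indexOf(intended) + 1;
    }

    /** Fraction of test cases whose intended word is within the top k suggestions. */
    static double topKRate(List<String> intendedWords,
                           List<List<String>> suggestionLists,
                           int k) {
        if (intendedWords.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < intendedWords.size(); i++) {
            int rank = rankOf(intendedWords.get(i), suggestionLists.get(i));
            if (rank >= 1 && rank <= k) {
                hits++;
            }
        }
        return (double) hits / intendedWords.size();
    }

    public static void main(String[] args) {
        // Made-up test cases: the first is ranked first, the second is ranked second.
        List<String> intended = List.of("checker", "language");
        List<List<String>> suggestions = List.of(
                List.of("checker", "checked"),
                List.of("languid", "language"));
        System.out.println(topKRate(intended, suggestions, 1));
        System.out.println(topKRate(intended, suggestions, 2));
    }
}
```

With k = 1 this measures how often automatic correction using the first suggestion would pick the intended word.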
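import java.awt.Font;

import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JTextPane;
import javax.swing.SwingUtilities;

import com.inet.jortho.FileUserDictionary;
import com.inet.jortho.SpellChecker;

public class SampleApplication extends JFrame {

    public static void main(String[] args) {
        // Build and show the window on the Swing event dispatch thread.
        SwingUtilities.invokeLater(() -> new SampleApplication().setVisible(true));
    }

    private SampleApplication() {
        super("Multilingual Spelling Checker/የአምስት ቋንቋዎች ቃላት አፃፃፍ ስርዓት");

        // Text component that will be spell checked.
        JEditorPane text = new JTextPane();
        // An empty font name lets Java fall back to its default font.
        text.setFont(new Font("", Font.BOLD, 22));
        text.setText("Multilingual Spelling Checker "
                + "የአምስት ቋንቋዎች ቃላት አፃፃፍ ስርዓት");
        add(text);

        setSize(200, 160);
        setDefaultCloseOperation(EXIT_ON_CLOSE);
        setLocationRelativeTo(null);

        // Keep words added by the user in a file-based user dictionary.
        SpellChecker.setUserDictionaryProvider(new FileUserDictionary());
        // Load the dictionaries named in the JOrtho configuration; passing null
        // for both arguments uses the default location and default language.
        SpellChecker.registerDictionaries(null, null);
        // Enable spell checking (red zigzag underline, F7 dialog) on the text component.
        SpellChecker.register(text);
    }
}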
For example, when the above Java source code is run, the following GUI is displayed:
As the GUI shows, the words are underlined with a red zigzag line, which indicates that they are not in
the dictionary file lists. The user should click on an underlined word (or press "F7" on the keyboard) to
display the list of alternatives. After this, the user will get the following GUI.
I tested the proposed system on commonly made spelling errors in each of the selected languages.
Since the dictionary files collected for each language come from different genres, the system detects
errors easily and suggests the best alternative from the list of words provided in the dictionary. The
user must first select the language. The proposed model then suggests the appropriate word from the
dictionary based on the user's query.
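The language selection step can be pictured as keeping one lexicon per language and checking text only against the lexicon the user selected. The sketch below is an illustration under stated assumptions, not the system's actual code: the class MultilingualChecker, the language code "om" used as a key, the toy Afan Oromo lexicon and the whitespace tokenization are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative per-language lookup (hypothetical; not the paper's code). */
final class MultilingualChecker {
    // One lexicon per language, keyed by a language code chosen by the caller.
    private final Map<String, Set<String>> lexicons;

    MultilingualChecker(Map<String, Set<String>> lexicons) {
        this.lexicons = lexicons;
    }

    /** Returns the tokens of text that are missing from the selected language's lexicon. */
    List<String> findErrors(String languageCode, String text) {
        Set<String> lexicon = lexicons.get(languageCode);
        if (lexicon == null) {
            throw new IllegalArgumentException("No dictionary loaded for: " + languageCode);
        }
        List<String> errors = new ArrayList<>();
        // Assumption: words are separated by whitespace.
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !lexicon.contains(token)) {
                errors.add(token);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Toy lexicon for illustration only.
        MultilingualChecker checker = new MultilingualChecker(
                Map.of("om", Set.of("afaan", "oromoo")));
        System.out.println(checker.findErrors("om", "afaan oromo"));
    }
}
```

Keeping the lexicons separate means a word that is valid in one language is still flagged when a different language is selected, which matches the requirement that the language be chosen first.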
Similarly, if the system's dictionary file contains more related words, it suggests the full list of
possible words. For example, consider the sentence " የኢንፎርሜሽን ቴክኖሎጂ ትምህርትን ክፍል" and its list of
suggestions.
XI. CONCLUSION
Spelling checkers depend heavily on the words in the lexicon. Some words have very few similarly spelled
neighbours, so the correct word can be recovered even when it contains several errors. Other words have
many similarly spelled neighbours, so a single error may make correction difficult or impossible. This
paper proposes a multilingual spelling checker for selected Ethiopian languages that is based on a
dictionary-based method. It is used to detect and correct different classes of spelling errors. The main
features of the proposed model can be summarized as providing suggestions for detected errors and applying
the correction automatically using the first suggestion. Furthermore, the proposed model is evaluated
using dictionary-based data sets for all of the languages selected for the study.
REFERENCES
1. W. B. Demilie, “Parts of Speech Tagger for Awngi Language,” vol. 9, no. 9, 2019.
2. K. Desta, “Part of Speech Tagger for Hadiyyisa Language.”
3. D. Tesfaye, “A rule-based Afan Oromo Grammar Checker,” vol. 2, no. 8, pp. 126–130, 2011.
4. “Importance of Spelling.”
5. A. M. Gezmu, A. Nürnberger, and B. E. Seyoum, “Portable Spelling Corrector for a Less-Resourced
Language: Amharic,” pp. 4127–4132, 2014.
6. G. O. Ganfure and D. Midekso, “Design And Implementation Of Morphology Based Spell Checker,”
vol. 3, no. 12, pp. 118–125, 2014.
7. M. D. Jeldu and R. Mehta, “Rule Based Afan Oromo Analyzer for Spell Checker,” no. 7, pp. 36–
39, 2018.
8. P. dr. B. Y. (Addis A. University), A Typology of Verbal Derivation in Ethiopian Afro-Asiatic
Languages.
9. W. T. A. D. Tamirat, “Afan Oromo Sentence Structure.”
10. V. J. Hodge and J. Austin, “A Comparison of Standard Spell Checking Algorithms and a Novel
Binary Neural Approach,” vol. 15, no. 5, pp. 1073–1081, 2003.
11. B. O. Connor, “Edit Distance, Spelling Correction, and the Noisy Channel,” 2015.
12. F. Ahmed, E. W. De Luca, and A. Nürnberger, “Revised N-Gram based Automatic Spelling
Correction Tool to Improve Retrieval Effectiveness,” 2009.
13. S. M. El Atawy, “Automatic Spelling Correction based on n-Gram Model,” vol. 182, no. 11, pp. 5–
9, 2018.
14. H. L. Liang, “Spell Checkers and Correctors,” no. November, 2008.
15. A. Samuelsson, “Weighting Edit Distance to Improve Spelling Correction in Music Entity Search,”
2017.
16. D. Sundby, “Spelling correction using N-grams.”
17. R. Kumar, M. Bala, and K. Sourabh, “A study of spell checking techniques for Indian Languages,”
no. March, pp. 105–113, 2018.
18. A. A. Patil and P. R. Sharma, “Study and Review of Selective Spell Checking,” pp. 1049–1056, 2019,
doi: 10.15680/IJIRSET.2019.0802064.
19. O. Wilde, “Spelling Correction and the Noisy Channel,” 2019.
20. T. A. Pirinen and M. Silfverberg, “Improving Finite-State Spell-Checker Suggestions with Part of
Speech N-Grams,” vol. 3, no. 2, pp. 153–166, 2012.
21. T. A. Pirinen, Weighted Finite-State Methods for Spell-Checking and Correction. 2014.