Mahesh P. K.
Emerging Technologies in
Engineering
Editors-in-Chief
Mahesh P K
Don Bosco Institute of Technology, India
Su-Qun Cao
Huaiyin Institute of Technology, China
Ghanshyam Singh
Malaviya National Institute of Technology, India
Information contained in this work has been obtained by McGraw Hill Education (India), from sources believed to be reliable. However, neither
McGraw Hill Education (India) nor its authors guarantee the accuracy or completeness of any information published herein, and neither McGraw
Hill Education (India) nor its authors shall be responsible for any errors, omissions, or damages arising out of use of this information. This work
is published with the understanding that McGraw Hill Education (India) and its authors are supplying information but are not attempting to render
engineering or other professional services. If such services are required, the assistance of an appropriate professional should be sought.
Typeset at Text-o-Graphics, B-1/56, Aravali Apartment, Sector-34, Noida 201 301, and printed at
Cover Printer:
Committees
Honorary Chair
Dr. Shuvra Das (University of Detroit Mercy, USA)
Dr. Jiguo Yu (Qufu Normal University, China)
Dr. Pawel Hitczenko (Drexel University, USA)
Dr. Lisa Osadciw (Syracuse University, USA)
Prof. Harry E. Ruda (University of Toronto, Canada)
Technical Chair
Dr. Anooj P K (Al Musanna College of Technology, Sultanate of Oman)
Dr. Urmila Shrawankar (G H Raisoni College of Engineering, India)
Dr. Ching-Chih Tsai (IEEE SMCS TC on Intelligent Learning on Control Systems)
Dr. A G Hessami (IEEE SMCS TC on Systems Safety & Security)
Dr. Xuelong Li (Birkbeck College, University of London, U.K.)
Dr. R Vijayakumar (MG University, India)
Dr. N.Nagarajan (Anna University, Coimbatore)
Chief Editors
Dr. Mahesh P.K (Don Bosco Institute of Technology, India)
Dr. Su-Qun Cao (Huaiyin Institute of Technology, China)
Dr. Ghanshyam Singh (Malaviya National Institute of Technology, India)
Technical Co-Chair
Dr. Natarajan Meghanathan (Jackson State University, USA)
Dr. Hicham Elzabadani (American University in Dubai)
Dr. Pingkun Yan (Philips Research North America)
General Chair
Dr. Janahanlal Stephen (Matha College of Technology, India)
Dr. Yogesh Chaba (Guru Jambeswara University, India)
General Co-Chair
Prof. K. U Abraham (Holykings College of Engineering, India)
Finance Chair
Dr. Gylson Thomas (Thejus Engineering College, India)
Dr. Zhenyu Y Angz (Florida International University, USA)
Publicity Chair
Dr. Amit Manocha (Maharaja Agrasen Institute of Technology, India)
Publicity Co-Chair
Dr. Ford Lumban Gaol (University of Indonesia)
Dr. Amlan Chakrabarti (University of Calcutta, India)
Publication Chair
Dr. Vijayakumar (NSS Engg. College, India)
Dr. T.S.B.Sudarshan (BITS Pilani, India)
Dr. KP Soman (Amrita University, India)
It is a great privilege and pleasure for me to serve as the Editor-in-Chief for the
IDES Joint International Conferences. Innovative ideas and research in two streams
are extremely important for the current electronics industry in supporting
“Digital India” and “Make in India”, initiatives of the Government of India. This Joint
International Conference provides a platform for researchers from academia
and industry all around the world to share their research results, novel ideas and
improvements over existing methodologies.
This conference covers a wide variety of topics in Control Systems and Power
Electronics, including Mobile Communication Technology, Natural
Language Processing, Algorithm/Protocol Design and Analysis, VLSI Systems,
Intelligent Systems and Approach, Data Communication, Embedded System, Digital
Security, Data Compression, Data Mining, Databases, Digital Signal Processing,
Telecommunication Technologies, Control Theory and Application, Computational
Intelligence, Robotics, HVDC and MEMS-Related Technology. The response from researchers in submitting research
papers has been staggering.
I would like to convey my heartfelt gratitude and appreciation to the members of the following committees:
Honorary Chair, Technical Chair, Technical Co-Chair, General Chair, General Co-Chair, Publicity Chair, Publicity
Co-Chair, Publication Chair, Finance Chair, National Advisory Committees, Program Committee Chair, International
Advisory Committee, Review Committee and Program Committee Members, for contributing either their precious time
in reviewing the papers or their effort in monitoring and making the conference a grand success. I would also like
to acknowledge the support of IDES, Matha College of Technology, Association of Computer Electrical Electronics
and Communication Engineers (ACEECom), ACEE and AMAE for organising such a platform to welcome
future technology. I also wish to convey my gratitude to McGraw-Hill Education for publishing the registered
papers.
Mahesh P K
Don Bosco Institute of Technology, India
Preface
The goal of the joint conference is to promote research, developmental activities and
scientific information interchange between researchers, developers, engineers, students,
and practitioners working in India and around the world in the fields of Computer
Science, Information Technology, Computational Engineering, Communication,
Electrical Measurements, Instrumentation Engineering, Electronic Devices, Digital
Electronics, Circuits, Control and Instrumentation, Communication system, Robotics,
Power Electronics, Civil Engineering and Power Engineering.
The conference is jointly organised by the ECE Department of Matha College of Technology, the IDES,
ACEECom, ACEE and AMAE. I thank the members of the Organizing Committee and the Programming Committee
for their hard work over the past several months. I wish to express my heartfelt appreciation to the keynote
speakers, session chairs, reviewers and student helpers. Finally, I thank all the authors and participants for their
great contributions and for sharing their experiences.
Su-Qun Cao
Huaiyin Institute of Technology, China
Contents
Committees
Foreword
Preface
3. Performance Analysis of AODV+ Routing Protocol for Wireless Ad-hoc Networks
Sanjeev Kumar Srivastava, Ranjana D Raut and P T Karule
4. Comparative Study of Various File Systems for Data Storage in Virtualized Cloud Environment
Rashmi Jogdand, Mohan A Gholap and D R Gangodkar
9. Optimal PI Tuning with GA and PSO for Shunt Active Power Filter
N Gowtham and Shobha Shankar
16. Pedestrian and Vehicle Detection for Advanced Driver Assistance Systems
P Lavanya, G Harshith, Chiraag and S S Shylaja
17. Artificial Neural Network for Detection and Location of Faults in Mixed Underground Cable and Overhead Transmission Line
Ankita Nag and Anamika Yadav
20. An EOQ Model Dealing with Weibull Deterioration with Shortages, Ramp Type Demand Rate and Unit Production Cost Incorporating the Effect of Inflation
Jayshree and Shalini Jain
21. Parameter Centric XML Parsing Techniques for Embedded Systems
Rashmi Sonar, Sadique Ali and Amol Bhagat
24. Regression Analysis for Stock Market Prediction using Weka Tool Without Sentiment Analysis
Sudip Padhye and Karuna Gull
26. Framework for Surplus Food Management using Data Analytics
M Sridevi and B R Arunkumar
27. Group Key Management Protocol: Secured Transmission in Compliant Groups
Amol Bhagat and Lovely Mutneja
30. Handling Sink and Object Mobility in Wireless Sensor Networks
Kulwardhan Singh and T P Sharma
31. Instance based Multi Criteria Decision Model for Cloud Service Selection using TOPSIS and VIKOR
Deepti Rai and V Pavan Kumar
33. Aadhaar based Secure E-Voting System using Cortex-A15 Processor
Prathiba Jonnala, Joshua Reginald Pullagura and Ashok Kumar Reddy K
34. An Secured and Energy Conserved Utilization Path Algorithm using Secret Key and Adaptive Partition Controller in WSN
K Ramanan and E Baburaj
35. Smart Watchmen with Home Automation System based on Raspberry Pi
Vanita Jain, Ashwani Sinhal and Saksham Jain
36. Big Data Analytics using Hadoop Collaborative Approach on Android
Altaf Shah, Amol Bhagat and Sadique Ali
38. Five Level Inverter Fed Squirrel Cage Induction Motor Drive with Reduced Number of Power Elements
R Harikrishnan, C N Ravi and S Vimala
39. Intelligent Web based Home Automation and Security System using Raspberry PI
R Harikrishnan, C N Ravi, Liji S Job and Rudra J Kumar
40. Enhancement of Smart Grid Performance through Logic based Fault Tolerant MPSoC
D Vijayakumar and V Malathi
43. A Comprehensive Measurement Placement Method for Power System State Estimation
Rakesh J Motiyani and Ajitsinh R Chudasama
45. Placement of Synchronized Measurements in Power Networks for Redundant Observability
Satyendra Pratap Singh and S P Singh
47. Behaviour of Square Model Footing on Sand Reinforced with Woven Coir Geotextiles
Dharmesh Lal, N Sankar and S Chandrakaran
48. Upgradation of a Building to Higher Certification Levels as per LEEDv4 – Case Study
N Amrutha Sudhakar and S Shrihari
49. Advanced Oxidation Process (AOP) for Removing High Concentration of Iron in Drinking Water Sources
Lakshmy A Kumar and V Meera
53. Prediction of Compressive Strength for Different Curing Stages using Steepest Descent ANNs
K Rajasekhar and Gottapu Santosh Kumar
56. Biometric Identification using Lip Imprint with Hybrid Feature Extraction Techniques
Semonti Chowdhury, Joydev Hazra and Satarupa Bagchi Biswas
59. A Model to Enhance the Performance of Distributed File System for Cloud Computing
Pradheep Manisekaran and M R Ashwin Dhivakar
60. Fuzzy Logic Classification based Approach for Linear Time Series Analysis in Medical Data Set
Manish Pandey, Meenu Talwar, Sachin Chauhan and Gurinderjit Kaur
61. Design and Implementation of Cost Effective Controller for Solar PV Application
Pulkit Singh and D K Palwalia
62. A New Approach of Offline Parameters Estimation for Vector Controlled Induction Motor Drive
Krishna R More, P N Kapil and Hormaz Amrolia
63. A Comparative Study of Switching Strategies for Single Phase Matrix Converter
Mohammadamin Yusufji Khatri and Hormaz Amrolia
Non-Word Error Detection for Luganda
Robert Ssali Balagadde * and Parvataneni Premchand **
* ** Department of Computer Science & Engineering, University College of Engineering, Osmania University
Hyderabad, 500007, TS, India
baross.kla@gmail.com, Profpremchand.p@gmail.com
Abstract: Editing or word processing Luganda text has been an uphill task, mainly because of the lack of a system
that can detect spelling errors in Luganda. In this context, this research paper presents a model for non-word
error detection for Luganda (LugDetect), which addresses this gap and consequently provides a more user-friendly
environment for editing Luganda text. To the best of our knowledge, LugDetect is the first system of this kind
developed for Luganda. Experimental results show that LugDetect detects non-word errors with an accuracy (AP) of 100%
for all five categories of Luganda words, at an average speed of 1471 Hz (number of words per second), so long as the
erroneous word is not a real word. Experimentation on the Luganda corpus used in this research work shows that
19% of Luganda text is composed of Clitic Host Word Combinations (CHWC), while the remaining 81% consists of Real Luganda
Words (RLW).
Keywords: non-word error detection, Luganda error detector, dictionary look-up technique, clitic-host word combination
(CHWC), spelling detector.
Introduction
Editing or word processing Luganda text has been an uphill task, mainly because of the lack of a system that
can detect spelling errors in Luganda. In this context, this research work presents a model for Non-Word Error Detection
for Luganda (NWEDL, or LugDetect), which addresses this gap and, consequently, provides a more user-friendly
environment for editing Luganda text. To the best of our knowledge, LugDetect is the first system of this kind
developed for Luganda.
One challenge encountered while developing a model for Luganda spell checking is dealing with the infinite number of
clitic-host word combinations (CHWCs), which distinguishes Luganda from other languages, especially non-Bantu
languages. In this research work, three types of CHWC are identified with respect to the use of the inter-word apostrophe (IWA),
defined in the subsection on "The Error Detection Mechanism".
Type I (CHWC_I), shown in Example 1, is created by compounding a modified monosyllabic word
(MMW) or a modified disyllabic word with an initial vowel (MDWIV) with a succeeding word which begins with a
vowel. Compounding in Luganda may involve two or three words. The MMW or MDWIV, referred to as a clitic, is formed by
dropping the final vowel of the monosyllabic word (MW) or disyllabic word with an initial vowel (DWIV) and replacing
it with an apostrophe, resulting in a long sound that is not represented by a double vowel.
Example 1
• omwenge n'ennyama (alcohol and meat) [conjunctive form]
• n'otema omuti (and you cut the tree) [narrative form]
• ew'omuyizzi (at the hunter's place) [locative form]
• Minisita omubeezi ow'ebyobusuubuzi ne tekinologiya (Minister of State for Trade and Technology)
• n'obw'embwa sibwagala (even that for the dog, I don't like)
• n'olw'ensonga eyo, sijja kujja (and for that reason, I will not come)
Example 2 shows some MWs and DWIVs used in compounding to form CHWC_Is. Not all MWs and DWIVs are
used in compounding, and Example 3 shows some of those that are not.
Example 2
• ne, na, nga [conjunction]
• ne, nga [narratives]
• be, ze, ge, gwe, bwe, bye, lwe, lye, kye, bye [object relatives]
• kya, ya, za, lya, ba, bya, ga, gwa, ka, lwa, wa [possessives]
• ekya, eya, eza, erya, owa, aba [possessives with initial vowel]
• kye, ye, be, ze, twe [copulatives]
• e [locatives]
2 Sixth International Conference on Advances in Information Technology and Mobile Communication – AIM 2016
Example 3
• era, ate, nti, so [conjunction]
• atya, oti [adverbs]
• ggwe, nze, ye, yo, bo, zo, bwo [emphatic pronouns]
• bba [noun]
• si [negation]
Type II (CHWC_II), shown in Example 4, is formed by using the clitics ng' and nng' to represent the "ŋ" character and
the double "ŋŋ" characters respectively in words which are not initially CHWC_I.
Example 4
• ng'ang'ala (to whimper like a dog)
• bbiring'anya (egg plant)
• agakung'anyizzaamu (to collect in it something)
• enng'anda (relatives)
• nng'oma (drum)
Type III (CHWC_III), shown in Example 5, is formed by using the clitics ng' and nng' to represent the "ŋ" character and
the double "ŋŋ" characters respectively in words which are initially CHWC_I.
Example 5
• n'agakung'anyizzaamu (and to collect in it something)
• ng'eng'anda (like relatives)
• ng'enng'oma (like a drum)
In view of this, the first task is disambiguating the three types of CHWCs. The approach adopted to disambiguate these words
in LugDetect is discussed in the subsection on "Disambiguating the CHWCs".
The second task is detecting errors in each type of CHWC. The approach adopted to provide this functionality in
LugDetect is discussed in the subsection on "Spell Checking Luganda Clitic Host Word Combination Type I (CHWC_I)".
CHWC_IIs, which can easily be converted into real Luganda words (RLWs) by substitution, are dealt with in the subsection on
"Spell Checking RLW".
It is worth noting that CHWC_Is akin to those of Luganda are also found in other Bantu languages such as
Runyankore-Rukiga and Kinyarwanda, among others. Example 6 and Example 7 show samples of CHWC_I extracted from
Runyankore-Rukiga and Kinyarwanda text respectively.
Example 6
• ky’okutunga
• nk’eihanga
• n’ekya
• g’ebyemikono
Example 7
• n’uw’Umukuru (and His Excellence)
• Nk’uko (like that)
• y’Umukuru (His Excellence)
• w’Igihugu (of a country)
The third challenge is deciding when to invoke LugDetect, or the Error Detection Mechanism (EDM), during interactive
processing and during batch processing. For interactive processing, should LugDetect be invoked when the space bar (SB) is
struck or at every keystroke? The former approach has a shortfall in that a modified word can escape spell checking
through the use of insertion bar (IB) movement keys, a list of which is shown in Table 1, which move the IB
from the modified word to some other word without the SB being struck. The latter approach would eliminate this problem, but
it is strenuous on the system due to the numerous EDM invocations. A solution to this is the
Algorithm for Interactive Invocation of EDM (IIEDM), discussed in the subsection on "Interactive invocation
of EDM".
For batch processing, how should EDM be invoked? A solution to this is discussed in the form of an algorithm. A summary
of the modules making up LugDetect is presented in the subsection on "The Error Detection Mechanism".
LugDetect can detect numerical errors and word-level punctuation errors; however, in this paper, we only mention them.
Details of the modules dealing with these errors can be obtained from "unpublished" [2].
The scope of LugDetect is limited to Luganda non-word detection. Named entities, abbreviations, e-mail addresses, uniform
resource locator (URL) strings are not handled in LugDetect, and neither are the real word errors.
Literature Survey
Errors
According to Peterson [15], spelling errors are basically of two types. The first type is cognitive errors, which are due to a lack of
knowledge of the language and are often ignored on the assertion that they are infrequent. Their
frequency is evaluated to be between 10% and 15% [17]. The second type is typographical errors ("typos"), 80% of which are
single-character errors of one of the following four types: one exceeding character, one missing character, a mistaken
character, or the transposition of two consecutive characters [4]. This means that 80% of errors are within an edit distance of one,
and almost all errors are within an edit distance of two.
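The four single-character error types above each correspond to one edit operation. As an illustration only, the following Python sketch computes an edit distance that counts insertion, deletion, substitution and adjacent transposition as one step each (the optimal string alignment variant). The function name `osa_distance` and the misspellings of "omuti" are assumptions made for this example and are not taken from the cited works.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where insertion, deletion, substitution and
    transposition of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # missing character
                d[i][j - 1] + 1,         # exceeding character
                d[i - 1][j - 1] + cost,  # mistaken character
            )
            # transposition of two consecutive characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical misspellings of "omuti", one per error type.
for typo in ("omutti", "omti", "omuli", "omuit"):
    print(typo, osa_distance("omuti", typo))
```

Each of the four hypothetical misspellings is at distance one from the intended word, which is the sense in which single-character errors fall within an edit distance of one.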
Error Detection
There are two techniques used in detecting non-word errors. The first is the Dictionary Lookup Technique (DLT), in which each
word in the input text is looked up in the dictionary (lexicon). If a word is not found in the lexicon, that word is a non-word, or
incorrect, and is therefore flagged or entered into the list of erroneous words. The larger the lexicon, the better the results.
Shinghal and Toussaint [18] noted that DLT has low error rates, but is disadvantaged by large storage demands and high
computational complexity. Most current spelling correction techniques rely on DLT for non-word error detection.
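A minimal sketch of the dictionary lookup idea is given below, assuming the lexicon fits in memory as a Python set. The function name `find_non_words` and the three-word sample lexicon are hypothetical and serve only to illustrate the technique.

```python
def find_non_words(tokens, lexicon):
    """Return the tokens that are absent from the lexicon (the non-words)."""
    known = {word.lower() for word in lexicon}   # set gives fast membership tests
    return [tok for tok in tokens if tok.lower() not in known]


# Hypothetical sample lexicon; a real lexicon would be far larger.
sample_lexicon = ["omwenge", "ennyama", "omuti"]
print(find_non_words(["omwenge", "enyama", "omuti"], sample_lexicon))
```

Here only "enyama" is flagged, because it is the only token missing from the sample lexicon.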
The second is a technique that uses a non-positional 26 by 26 bi-gram matrix which captures information on the existence of
bi-grams. A bi-gram is assigned a value of one if it exists in the corpus of the language under consideration; otherwise it is
assigned zero. All the bi-grams making up a word in the input text are checked for their existence in the matrix. If any of the bi-grams
does not exist, the word is flagged as a non-word. This technique is appropriate for Optical Character Recognition
(OCR) errors, and it has proven to be less accurate for detecting human-generated errors.
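The bi-gram technique can be sketched as follows, assuming the 26 basic Latin letters as the alphabet (which is what a 26 by 26 matrix implies) and a tiny hypothetical corpus. The function names are invented for this illustration, and treating characters outside the alphabet as unseen is a choice made here, not something the surveyed technique specifies.

```python
import string

ALPHABET = string.ascii_lowercase
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def build_bigram_matrix(corpus_words):
    """Set matrix[x][y] to 1 for every letter pair xy seen in the corpus."""
    matrix = [[0] * 26 for _ in range(26)]
    for word in corpus_words:
        w = word.lower()
        for x, y in zip(w, w[1:]):
            if x in INDEX and y in INDEX:
                matrix[INDEX[x]][INDEX[y]] = 1
    return matrix


def has_unseen_bigram(word, matrix):
    """Flag a word if any of its bi-grams has value 0 in the matrix.
    Characters outside the 26-letter alphabet are treated as unseen."""
    w = word.lower()
    for x, y in zip(w, w[1:]):
        if x not in INDEX or y not in INDEX:
            return True
        if matrix[INDEX[x]][INDEX[y]] == 0:
            return True
    return False


matrix = build_bigram_matrix(["omuti", "omwenge"])   # hypothetical corpus
print(has_unseen_bigram("omuit", matrix))   # "ui" never occurs in the corpus
print(has_unseen_bigram("muti", matrix))    # accepted although not in the corpus
```

The second call illustrates the weakness of the technique: a string is accepted whenever all its bi-grams have been seen, even if the string itself is not a word.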
Lexicon
User-lexicon can be interactively enriched with new entries enabling the checker to recognize all the possible inflexions
derived from them. A lexicon for a spelling correction or text recognition application must be carefully tuned to its intended
domain of discourse. Too small a lexicon can burden the user with too many false rejections of valid terms; too large a
lexicon can result in an unacceptably high number of false acceptances. The relationship between misspellings and word
frequencies is not straightforward.
Peterson [16] recommends that the lexicon for spelling correction be kept relatively small, based on the fact that
approximately half a percent of all single-error transformations of each of the words on a 350,000-item word list result in
other valid words on the list. However, Damerau and Mays [5] challenge this recommendation. Using a corpus of over 22
million words of text from various genres, they found that by increasing the size of their frequency rank-ordered word list
from 50,000 to 60,000 words, they were able to eliminate 1,348 false rejections while incurring only 23 additional false
acceptances. Since this 50-to-1 differential error rate represents a significant improvement in correction accuracy,
they recommend the use of larger lexicons.
Dictionaries alone are often insufficient sources for lexicon construction. Walker and Amsler [19] observed that nearly
two-thirds (61%) of the words in the Merriam-Webster Seventh Collegiate Dictionary did not appear in an eight million word
corpus of New York Times news wire text, and conversely, almost two-thirds (64.9%) of the words in the text were not in the
dictionary.
On the topic of construction of the lexicon for the spell program, McIlroy [13] provides some helpful insights into
appropriate sources that may be drawn upon for general lexicon construction.
An article by Damerau [6] provides some insights into and guidelines for the automatic construction and customization of
domain oriented vocabularies for specialized NLP (Natural Language Processing) applications.
LugDetect
LugDetect, developed in the Python programming language, identifies the type of Luganda word or token which needs to be
spell checked and, consequently, invokes the appropriate module for detecting the error. LugDetect works in tandem with
LugCorrect to provide detection and correction for Luganda words. In other words, LugDetect is required to detect the error
before LugCorrect can be invoked for correction purposes. The mechanism for LugCorrect, which provides a correction
candidate list (CCL) or an explanation or hint on the type of error committed, is shown in Algorithm 1, and more details are
provided in [3].
GCCL
SubDic = Get all words beginning with 1st character of erroneous word from lexicon
L = Number of words in SubDic
IF {L > 10}
    Get 1st 10 words from SubDic, load in Dictionary with their corresponding Jaccard coefficient (JC)
ELSE
    Get all words in SubDic, load in Dictionary with their corresponding JC
ENDIF
REPEAT
    swap if JC of word in SubDic is greater than minimum JC of word in Dictionary
UNTIL all words in SubDic have been checked
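The effect of Algorithm 1 is to keep the ten lexicon words with the highest Jaccard coefficient among those sharing the first character of the erroneous word. A hedged Python sketch of that outcome follows. The paper does not say here which sets the Jaccard coefficient is computed over, so character bi-gram sets are an assumption, as are the function names and the sample lexicon; the swap loop is replaced by `heapq.nlargest`, which yields the same top-ranked words.

```python
import heapq


def char_bigrams(word):
    """Set of adjacent character pairs in a word (assumed JC feature set)."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def jaccard(a, b):
    """Jaccard coefficient: size of intersection over size of union."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        return 1.0 if a == b else 0.0
    return len(x & y) / len(x | y)


def generate_ccl(erroneous, lexicon, size=10):
    """Return up to `size` candidates sharing the first character of the
    erroneous word, ordered by decreasing Jaccard coefficient."""
    if not erroneous:
        return []
    sub_dic = [w for w in lexicon if w.startswith(erroneous[0])]
    return heapq.nlargest(size, sub_dic, key=lambda w: jaccard(erroneous, w))


# Hypothetical lexicon for illustration.
lexicon = ["omuti", "omwenge", "omuyizzi", "ennyama"]
print(generate_ccl("omuit", lexicon, size=2))
```

Restricting candidates to the same first character, as Algorithm 1 does, means that an error in the first character itself cannot be corrected by this list.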
The two models form LugSpell, a model for an interactive spell checker providing spelling feedback while the end user is
word processing their document in the editor. However, in this article, more emphasis is directed towards LugDetect.
The Luganda corpus which was experimented upon in the development of LugDetect was developed from Luganda text
obtained from an online local newspaper, Bukedde.
presentation of MMW which is formed from the "nga" MW during Luganda compounding, a process which results in the
formation of the "nga" CHWC_I shown in Example 8. In other words, the task is how to disambiguate between the use of
the clitic ng' in Example 4 and in Example 8.
Example 8
• ng'amagi (like eggs)
• ng'oggyeko (after removing)
• ng'onoobufunamu (if you will benefit)
• ng'era (and also)
In a bid to address this ambiguity, two groups of words were compiled from the Luganda corpus as well as a Luganda
dictionary: Group I, which contained the "nga" CHWC_I, and Group II, which contained CHWC_II, that is, words
with the character ng' (ŋ). The two groups were studied for common feature extraction. The result of the study showed that:
• Words in Group I always began with the proclitic ng', and there is no situation in which a word in Group II began
with the same proclitic, that is to say, with a single 'ŋ' character, except with the verb "ng'ang'ala" (ŋaŋala),
meaning "to whimper like a dog".
• Words in Group II, if they begin with the character 'ŋ' [...]
in one case, "ng'ang'ala" [ŋaŋala].
• Clitic nng' is purely a Group II characteristic, and so is clitic ng' after the second position.
The approach adopted was, firstly, to extract [...]
observations into consideration and implement them in LugDetect to disambiguate these two groups by substitution. It is
worth noting that during the process of substitution, the CHWC_IIIs are automatically converted to CHWC_Is, while
the CHWC_IIs are converted to Real Luganda Words (RLWs). The algorithm developed is shown in Algorithm 2.
IF{Token = CHWC}
    Check whether the word is "ng'ang'ala" or "ng'aŋala"
    IF{Token != "ng'ang'ala" and Token != "ng'aŋala"}
        Substitute ng' with ŋ in word at 2nd position upward if exists
    ELSE
        Substitute "ng'ang'ala" or "ng'aŋala" with "ŋaŋala"
    ENDIF
ENDIF
IF{Token = CHWC}
    Token is CHWC_I
    Invoke CHWC_I Module
ELSE
    Token is CHWC_II and in this case, Token is RLW
    Invoke RLW Module
ENDIF
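The disambiguation by substitution in Algorithm 2 can be sketched in Python as shown below. This is an interpretation under stated assumptions rather than the LugDetect source: "2nd position upward" is read as "any occurrence of ng' that is not at the very start of the token", nng' is replaced wherever it occurs, tokens are assumed to use the ASCII apostrophe and lower case, and the names `substitute_eng` and `classify_chwc` are invented for this example.

```python
ENG = "\u014b"  # the Luganda character written ng' (small letter eng)

# The single exceptional verb, in its original and partly substituted forms.
EXCEPTIONS = {"ng'ang'ala", "ng'a" + ENG + "ala"}


def substitute_eng(token):
    """Replace the clitics ng' and nng' that stand for eng characters."""
    if token in EXCEPTIONS:
        return ENG + "a" + ENG + "ala"
    # nng' is purely a Group II feature, so it is replaced everywhere.
    token = token.replace("nng'", ENG * 2)
    # A word-initial ng' is the "nga" proclitic (Group I) and is kept;
    # any later ng' stands for a single eng character.
    head, tail = token[:1], token[1:]
    return head + tail.replace("ng'", ENG)


def classify_chwc(token):
    """Return (type, converted token): a token that still contains an
    apostrophe after substitution is CHWC_I, otherwise it is an RLW."""
    converted = substitute_eng(token)
    kind = "CHWC_I" if "'" in converted else "RLW"
    return kind, converted


for word in ("ng'amagi", "bbiring'anya", "ng'enng'oma", "ng'ang'ala"):
    print(word, classify_chwc(word))
```

With this reading, a CHWC_III such as "ng'enng'oma" keeps its leading proclitic and is passed on as a CHWC_I, while a CHWC_II such as "bbiring'anya" loses its apostrophe and is treated as an RLW.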
• In CHWC_I_2, the first component (C11) is an MMW, an MDWIV or the locative 'e', a form of MW; and the second
component (C21) is a real Luganda word with an initial vowel (RLWIV).
• In CHWC_I_3, the first component (C12) is either an MMW, an MDWIV or 'e'; the second component (C22) is either
an MDWIV or 'e'; and the third component (C32) is an RLWIV.
In view of these observations, a decomposer was developed to break the CHWC_I into its constituent components. For
detection purposes, DLT was applied on each component, but using three different lexicons: the first lexicon (L1) contains
MMWs, MDWIVs and 'e'; the second lexicon (L2) contains MDWIVs and 'e'; and the third lexicon (L3) contains
RLWIVs. The good news is that the number of MWs and DWIVs which are used in Luganda compounding is finite,
which simplifies the work of developing the corresponding lexicons. The three lexicons were kept separate because
each has its own unique characteristics.
For CHWC_I_2, C11 is checked with L1, and C21 with L3. For CHWC_I_3, C12 is checked with L1, C22 with L2, and
C32 with L3. Figure 1 shows the flowchart for the detection and correction process for Luganda CHWC_Is. Note that the
methodology begins with the checking of word-level punctuation, the details and algorithm of which are discussed in
"unpublished" [2]. The correction process, whose algorithm is shown in Algorithm 1, involves generation of the Candidate
Correction List (CCL).
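The decomposition and per-component lookup described above might look like the following Python sketch. The lexicon contents are small hypothetical samples built from the words in Example 1, clitics are assumed to be stored with their trailing apostrophe, and the function names are invented for this illustration.

```python
# Hypothetical sample lexicons; the real L1, L2 and L3 are not reproduced here.
L1 = {"n'", "ng'", "ew'", "ow'", "olw'", "obw'"}    # MMWs and MDWIVs
L2 = {"ew'", "ow'", "olw'", "obw'"}                 # MDWIVs
L3 = {"ennyama", "omuyizzi", "ensonga", "embwa", "amagi"}  # RLWIVs


def decompose(token):
    """Split a CHWC_I at its apostrophes; each clitic keeps its apostrophe."""
    parts = token.split("'")
    return [p + "'" for p in parts[:-1]] + [parts[-1]]


def check_chwc_i(token):
    """Return the components that fail dictionary lookup (empty if none)."""
    components = decompose(token)
    if len(components) == 2:        # CHWC_I_2: C11 with L1, C21 with L3
        lexicons = (L1, L3)
    elif len(components) == 3:      # CHWC_I_3: C12 with L1, C22 with L2, C32 with L3
        lexicons = (L1, L2, L3)
    else:                           # not a two- or three-part combination
        return [token]
    return [c for c, lex in zip(components, lexicons) if c not in lex]


print(check_chwc_i("n'ennyama"))       # both components found
print(check_chwc_i("n'olw'ensonga"))   # all three components found
print(check_chwc_i("n'ennyamq"))       # host word not in L3
```

Checking each component against its own lexicon also localises the error, so that a later correction step only needs to work on the component that failed.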
[Flow chart: Start; Check Word Punctuation (P); P Error?; Mark Word for P Error; D4; CHWC_I_3?; Decomposer; L1; S Error?; Mark Word for Error; Correction Needed?; More Components?; End]
Figure 1 Flow Chart for Error Detection and Correction Algorithm for CHWC_I
LUGANDA ALPHABET
26 CHARACTERS
{A a, B b, C c, D d, E e, F f, G g, H h, I i, J j, K k, L l, M m, N n, Ny ny, Ŋ ŋ
(Ng' ng'), O o, P p, R r, S s, T t, U u, V v, W w, Y y, Z z}
Figure 2 Categorisation of Luganda Alphabet composed of 26 characters sorted in the prescribed order
Table 1 Insertion bar (IB) movement keys with their respective description
[Flow chart: Start; RLW; Check Word Punctuation; Punctuation (P) Error?; Mark Word for P Error; Spelling (S) Error?; Mark Word for S Error; Correction Needed?; End]
Figure 3 Flow Chart for Detection and Correction Algorithm for RLW
The CHWC Disambiguation Module (D3) and RLW Module, the two modules invoked by D2, are elucidated in subsection on
"Disambiguating the CHWCs" and subsection on "Spell Checking RLW" respectively.
D3 determines the type of CHWC which has been passed to it, and then passes it on to the appropriate module as
stipulated by the following rules:
i. If token is CHWC_I then pass it on to CHWC_I Module, discussed in subsection on "Spell Checking Luganda
Clitic Host Word Combination Type I (CHWC_I)"
ii. If token is CHWC_II then convert it to RLW and pass it on to RLW Module aforementioned.
IF{char = white space}    \\ This results into splitting the word into two words,
                          \\ that is, left word and right word
    For left word
        Invoke EDM
    For right word
        IF{insertion bar leaves word}
            Invoke EDM
        ELSE
            IF{Word = marked}
                Word != marked
            ENDIF
        ENDIF
ENDIF

Get Token
REPEAT
    Invoke EDM
UNTIL no more tokens to check
The Component Detection Module (D4), which determines the number of components in the CHWC_I, and the Decomposer are
sub-modules in the CHWC_I Module.
[Diagram: module hierarchy from the Luganda Token through D1, D2, D3 and D4 to the Decomposers, components (C11, C21; C12, C22, C32) and lexicons]
KEY
L1 - Lexicon: MMWs, MDWIVs and 'e'
L2 - Lexicon: MDWIVs and 'e'
L3 - Lexicon: RLWIV
L4 - Lexicon: RLW
DA - Data Storage: Decimal Alphabet
FS - Data Storage: Free Single P
FD - Data Storage: Free Double P
FT - Data Storage: Free Triple P
D1: Type Detection Module
D2: RLW Detection Module
D3: CHWC Disambiguation Module
D4: Component Detection Module
Figure 4 Summary of modules making up the Error Detection Mechanism for LugDetect
Get Token
IF{Token = Word}
    Pass Token to RLW Detection Module (D2)
ELSE IF{Token = Number}
    Pass Token to Number Module
ELSE
    Pass Token to Free Punctuation Module
ENDIF
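The type detection step above, combined with a simple batch loop over tokens, can be sketched in Python as follows. How LugDetect actually recognises words and numbers is not described in this section, so the tests used here (any alphabetic character means a word, a digit pattern means a number) are assumptions, as are the names `token_type` and `run_batch`, the whitespace tokenisation and the placeholder handlers.

```python
import re

NUMBER = re.compile(r"\d+([.,]\d+)*")   # assumed shape of a number token


def token_type(token):
    """Classify a token as 'word', 'number' or 'punctuation'."""
    if any(ch.isalpha() for ch in token):
        return "word"
    if NUMBER.fullmatch(token):
        return "number"
    return "punctuation"


def run_batch(text, handlers):
    """Batch processing: pass every token to the handler for its type."""
    results = []
    for token in text.split():
        kind = token_type(token)
        results.append((token, kind, handlers[kind](token)))
    return results


# Placeholder handlers standing in for the RLW Detection Module (D2),
# the Number Module and the Free Punctuation Module.
handlers = {
    "word": lambda t: "to D2",
    "number": lambda t: "to Number Module",
    "punctuation": lambda t: "to Free Punctuation Module",
}
print(run_batch("omwenge n'ennyama 2016 --", handlers))
```

A dictionary of handlers keeps the dispatch rule in one place, so a module can be swapped without touching the loop.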
Get Word
Evaluation
Experimental Setup
Determining the Distribution of the Various Types of Luganda Words in the Corpus
The Luganda corpus which was used in the experimentation has 20,000 tokens collected from 50 articles from different
subsections of an online local newspaper, Bukedde. These articles were cleaned of any misspellings, and their vocabulary was
extracted and statistically analysed. The lexical density (LD), which defines the lexical richness of the text, was determined
using Equation 1. The results of the analysis are presented in Table 2 and shown diagrammatically in Figure
5, Figure 6, and Figure 7.
LD = TT / TV (1)
Where:
• TT represents the total number of tokens in the text
• TV denotes the total number of tokens making up the vocabulary of the text
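As a small worked illustration of Equation 1, the Python sketch below computes LD for a token list, assuming that the vocabulary consists of the distinct tokens of the text. The function name and the four-token sample are hypothetical and are not drawn from the corpus.

```python
def lexical_density(tokens):
    """LD = TT / TV, with TT the number of tokens and TV the number of
    distinct tokens (the vocabulary), following Equation 1."""
    vocabulary = set(tokens)
    if not vocabulary:
        raise ValueError("cannot compute LD for an empty text")
    return len(tokens) / len(vocabulary)


# Four tokens, three of them distinct: LD = 4 / 3.
print(lexical_density(["omuti", "n'ennyama", "omuti", "omwenge"]))
```

Under this definition, LD equals 1 when no token is repeated and grows as tokens recur.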
[Figure: Luganda vocabulary. RLW 80.78%, CHWC 19.22%; CHWC_I_2 97.49%, CHWC_I_3 0.43%]
[Figure: Distribution of Luganda words. RLW 81%, CHWC 19%]
[Figure: Distribution of CHWC. CHWC_I_2 98%; Other 2%, made up of CHWC_I_3 0%, CHWC_II 1% and CHWC_III 1%]