
Natural Language Generation for Bangla Grammatical

Error Detection and Correction

A THESIS

submitted by

BIBEKANANDA KUNDU

for the award of the degree

of

MASTER OF SCIENCE
(by Research)

DEPARTMENT OF COMPUTER SCIENCE AND


ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY, MADRAS.
April 2014
THESIS CERTIFICATE

This is to certify that the thesis titled Natural Language Generation for Bangla

Grammatical Error Detection and Correction, submitted by Bibekananda Kundu,

to the Indian Institute of Technology, Madras, for the award of the degree of Master

of Science (by Research), is a bona fide record of the research work done by him

under our supervision. The contents of this thesis, in full or in parts, have not

been submitted to any other Institute or University for the award of any degree or

diploma.

Dr. Sutanu Chakraborti Mr. Sanjay Kumar Choudhury


Research Guide Research Co-Guide
Assistant Professor Principal Engineer
Dept. of Computer Science and Engineering Language Technology
IIT-Madras, 600036 CDAC Kolkata, 700091
Place: Chennai Place: Kolkata

Date: Date:
ACKNOWLEDGEMENTS

I would like to take this opportunity to thank the many people who deserve to be mentioned on this page but are not named here individually.

Their various sobering and heartening contributions are unforgettable. I wish to

express my deepest gratitude to my supervisors Dr. Sutanu Chakraborti and Mr.

Sanjay Kumar Choudhury for introducing me to this research topic and provid-

ing their valuable guidance, inspiring discussions and unfailing encouragement

throughout the course of the work. I have enjoyed considerable freedom under

their guidance. They alerted me whenever I was on the wrong track during my

research. They shaped how I think, write and do research at a very deep level

and also taught me how to see where an approach will fail even before I try it. I

have learned from them how to think like a linguist and a computer scientist at the

same time. Their enthusiasm made my research a lot of fun. I marvel at their patience with my writing, struggling to understand every bit of it, always

raising questions and providing new exciting ideas. Their advice, encouragement,

constructive criticism helped me a lot in this research. They always had faith in

my capabilities and made sure that I expressed myself more clearly. They pushed me to explore newer avenues of research. Whenever I felt that I could

not get any further with grammar checking they came up with new ideas which

motivated me to work hard. I could not imagine having better advisers and men-

tors for my MS research. I consider myself fortunate to be their advisee and am honoured to have been their student. Work on this dissertation would not

have been possible without encouragement, contributions and constant support

from them. During my research work I was part of a friendly and stimulating research group that gave me much pleasure. Many thanks to all my colleagues of

Language Technology section of CDAC Kolkata and my friends of IIT Chennai for

creating an inspiring research environment with interesting projects, seminars and

conferences. I especially want to mention Sudipta Debnath for being the helping

hand next door whenever I needed one. His inspiration and useful suggestions moti-

vated me in this work. Many of the ideas embodied in this study were crystallized

with the help of his support. The programming libraries and subroutines he provided were the pillars on which the prototype of the system was built. I don't have

enough words to express my feelings, respect and thanks to him. I would also

like to thank Abhijit Chatterjee, Debarun Kar, K.V.S Dileep and Mridusmita Mitra

for helping me a lot, in particular by collecting valuable resources for my study and carefully proofreading my write-ups. They were the first reviewers of some of

the chapters of this thesis. They noticed many typos, errors, and strange sentence

structures. Their work was much more valuable than that of any grammar checker!

Any errors that still remain in this dissertation are my sole responsibility, but I can

assure you that there are far fewer now than there used to be. I would also

like to thank Pampa Bhattarchayya and Mridusmita Mitra for always being helpful

with necessary linguistic information whenever I needed it. I would like to

thank Subash Chandra, the first user of my system as a second language learner of Bangla, Hindi being his mother tongue. He provided a good deal of non-native data which helped me a lot in this research work. I would also like to

thank Pradeep Raychoudhury for helping me to prepare the materials for presen-

tations and posters. Some persons deserve special mention for discussions that

contributed quite directly to this research: Barnali Pal, Sita Rajmohan, Tulika Basu,

Joyanta Basu and Rajib Roy. I am very appreciative of my classmates and friends

at IIT Chennai who participated in this study. These include Debarun, Dileep,

Sourav, Prateek, Smith and many more. They created a nice research environment

where we argued, discussed and nurtured our ideas during my time at IIT, or sometimes over telephone conversations. Without their support, this

study would not have been possible. I am grateful to my organization CDAC

Kolkata for providing me with the necessary leave and infrastructure for continuing my research as an external candidate. I would like to express my heartfelt

gratitude to Executive Director of CDAC Kolkata Col. A. K. Nath and Ex-executive

directors Sri A.B. Saha and Sri R. Rabindra Kumar for their inspiration, motivation

and support for this study. I am very much thankful to all the faculty members,

staff members and research scholars of the Department of Computer Science and

Engineering of IIT Madras for their direct or indirect help in various forms during

my course work and research work. The NLP community has provided excel-

lent feedback and essential criticism on my work through anonymous reviews.

Many useful comments have been provided by conference attendees. I am espe-

cially thankful to Prof. Robert Dale (Macquarie University), Prof. Sudeshna Sarkar

(IIT Kharagpur) and Prof. Pushpak Bhattacharyya (IIT Bombay) for their valuable

suggestions and feedback during my presentation at ICON 2011. I am also thank-

ful to Dr. Michael Gamon (Microsoft Research) and Prof. Kevin Knight (USC,

Information Sciences Institute) for providing invaluable feedback and necessary

references. My deepest gratitude goes to my parents who have encouraged me

to pursue my studies, for being there for me and for always believing in me. My

parents have always encouraged my education right from my childhood, which became an eternal driving force for me to pursue a higher degree. Their

prayers are always a great source of strength for me. Their support has brought

me to where I am now. I cannot find appropriate words to thank my wife Soma for

her steady support, encouragement and love throughout the difficult times in my

career. She must also be thanked for her care throughout my research work.

Many a time she helped me decide the titles of the papers I wrote for

conferences. She carefully read my write-ups and motivated me to think from a reader's point of view. Along with everything else, I am grateful for her constant

support. She did everything she could to make sure I had enough time to finish my

work. I also could never have completed this study without all the encouragement

and support which I have received from my elder brother, parents-in-law and my

sister-in-law. Thank you for always being there. I am indebted to you all a lot

and cannot thank you enough. I owe all of my success to the essential things that

my family has given me over the years. I am dedicating this thesis to my family.

Finally, I thank all my well-wishers who directly or indirectly contributed to the

completion of this thesis.

ABSTRACT

KEYWORDS: Automatic Grammar Correction, Natural Language Generation, Automatic Error Corpora Creation, Active Learning Based Complexity Estimation.

Learning a new language is an integral part of human life. Even after years of

learning, a person remains prone to mistakes. These errors are due to a lack of knowledge of the target language and the influence of previously learnt languages [Leacock et al., 2010]. As a consequence, it has been felt that automatic

detection and correction of grammatical errors will be of immense help as an aid

for language learning.

Automatic detection and correction of grammatical errors in a morphologi-

cally rich and free word order language like Bangla is a non-trivial task. Little

research has been done on detection and correction of grammatical errors in such

languages. For Bangla, this work needs to be done de novo. The problem

is to automatically detect and correct an ungrammatical Bangla sentence having

postpositional and nominal inflectional errors. A methodology needs to be devised

for correcting the mistakes committed by users and also to provide relevant ex-

amples supporting the suggested correction. To have an idea of how strongly

we can rely on such a correction, it will be useful to devise a measure of sentence

complexity with respect to the grammar correction task. If a sentence is complex,

the user should not be overly reliant on the correction suggested by the system.

Conversely, if the complexity measure is low, the user can confidently choose the

suggestion.

A sufficiently large error corpus is essential for training and testing of grammar

correction methodology. Manual collection of huge error corpora is a tedious and

time-consuming task. There is a dearth of error corpora for the Bangla language.

Therefore, a synthetic error corpora creation methodology has been proposed.

Divergence between two languages influences second language learners to commit

grammatical mistakes. It has been widely studied that the divergence between a

pair of languages has a profound effect on various fields of NLP [Dorr et al.,

2002; Dave et al., 2001; Goyal and Sinha, 2009]. The effect of divergence becomes

more pronounced and acute for widely varying language pairs like English and Bangla

[Bhattacharyya et al., 2011; Dave et al., 2001; Goyal and Sinha, 2009]. Bangla is a

morphologically rich language [Bhattacharya et al., 2005; Dandapat et al., 2004] and

has a free word order. Therefore, state-of-the-art Context Free Grammar (CFG) is not applicable here [Begum et al., 2008; Shieber, 1985; Bharati et al., 2010]. In

addition to this, lack of robust parsers, insufficient linguistic rules and dearth of

error annotated parallel corpora make this grammar correction task much more

challenging. To address these issues, a novel approach has been proposed for

automatic detection and correction of Bangla grammatical errors using a Natural

Language Generation (NLG) technique.

Evaluation of grammar correction systems is one of the challenges in this area of

research. Performance of most of the available grammar checkers cannot be com-

pared as different systems address different types of errors. Moreover, testing on a

common dataset is particularly problematic when different grammar checkers are

designed for different languages. To circumvent these problems, a Methodology

for Evaluation of Grammar Assessment (MEGA) combining a Graded Acceptabil-

ity Assessment Metric (GAAM) and a Complexity Measurement Metric (CMM) has

been introduced. Initially, MEGA has been applied on our Natural Language Gen-

eration (NLG) based Bangla grammar checker. Since direct comparison between

available English grammar checker and the NLG based Bangla grammar checker

is not possible, the NLG based system has been compared against a prototype

Bangla grammar checker based on standard Naïve Bayes classification. Results

show that the NLG based approach for Bangla grammatical error detection and

correction system outperforms the Naïve Bayes classifier system.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS i

ABSTRACT v

LIST OF TABLES xii

LIST OF FIGURES xiv

ABBREVIATIONS xv

NOTATION xviii

1 INTRODUCTION 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Divergence Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 LITERATURE SURVEY 10
2.1 Spell Checker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Grammar Checker . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Automatic Grammar Correction Approaches . . . . . . . . . . . . 18
2.3.1 Rule-based Approach . . . . . . . . . . . . . . . . . . . . . 19
2.3.2 Machine Learning Approach . . . . . . . . . . . . . . . . . 27
2.3.3 Statistical Machine Translation Approach . . . . . . . . . . 39
2.4 Comparison between existing approaches . . . . . . . . . . . . . . 41
2.5 Open Problems and Future Directions . . . . . . . . . . . . . . . . 44

3 AUTOMATIC CREATION OF BANGLA ERROR CORPUS 46
3.1 Errors in Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.1.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 Experimental Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.1 Bangla POS Tagger . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.2 Confidence Score and Mal-rule Filters . . . . . . . . . . . . 63
3.4 Result and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 66

4 BANGLA GRAMMATICAL ERROR DETECTION AND CORRECTION 71
4.1 Pruning of the Search Space . . . . . . . . . . . . . . . . . . . . . . 75
4.2 Selection of the Best Correction . . . . . . . . . . . . . . . . . . . . 75

5 EVALUATION 80
5.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.2 Standard Evaluation Metric . . . . . . . . . . . . . . . . . . . . . . 82
5.3 Test Suite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.4 Evaluation Methodology . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4.1 Evaluation using Standard Metrics . . . . . . . . . . . . . . 84
5.4.2 Graded Acceptability Assessment Metric: . . . . . . . . . . 86
5.4.3 Complexity Estimation of Grammar Correction . . . . . . 89

6 CONCLUSIONS AND FUTURE WORK 107


6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

A Examples of some interesting erroneous sentences corrected by the system. 111

B Examples of incorrect prediction by the system. 112

C Examples of sentences having different complexity 113

D Examples of sentences collected from literature domain 114
LIST OF TABLES

1.1 Examples of single preposition “with” having different types of realization in Bangla . . . . . . 4

2.1 Percentage of various types of errors in Bangla . . . . . . . . . . . 13


2.2 Research on grammar checking in different languages . . . . . . . 17
2.3 A brief road map of grammatical error detection and correction
approaches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Syntax based grammatical error detection and correction approaches. 24
2.5 Effectiveness of Individual Features . . . . . . . . . . . . . . . . . 36

3.1 Examples of errors committed by a Bangla Second Language Learner 49


3.2 Examples of Transposition Operation. . . . . . . . . . . . . . . . . 56
3.3 Examples of Deletion Operation. . . . . . . . . . . . . . . . . . . . 58
3.4 Examples of Addition Operation. . . . . . . . . . . . . . . . . . . . 58
3.5 POS tags used in our tagger . . . . . . . . . . . . . . . . . . . . . . 61
3.6 POS tag distribution in our training and test corpus . . . . . . . . 61
3.7 Accuracy of individual POS tag using HMM . . . . . . . . . . . . 62
3.8 Three most common types of errors . . . . . . . . . . . . . . . . . 62
3.9 Experiment with confidence thresholds for generating erroneous
sentences generated by substitution operation . . . . . . . . . . . 66
3.10 Experiment with confidence thresholds for generating erroneous
sentences generated by transposition operation . . . . . . . . . . 67
3.11 Erroneous sentences generated from a single sentence and selected
according to the confidence score. . . . . . . . . . . . . . . . . . . . 69
3.12 Bangla Echo words and Hyphenated words. . . . . . . . . . . . . 70
3.13 Automatically collected collocated and co-occurred word sequences. 70

4.1 Example of Nominal Morphological Synthesis . . . . . . . . . . . 72


4.2 Example of Nominal Morphological Analysis . . . . . . . . . . . . 73

5.1 Evaluation Measure Formulae . . . . . . . . . . . . . . . . . . . . . 82
5.2 True Positive, False Positive, False Negative and True Negative with
respect to grammatical error detection task. . . . . . . . . . . . . . 83
5.3 Performance evaluation of NLG based system on individual errors
as well as combined errors in five text genres. P indicates Precision
and R indicates Recall. . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.4 Grading Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.5 Features for estimation of grammar correction complexity . . . . 96
5.6 Complexity Score in different complexity level . . . . . . . . . . . 97
5.7 Correlation of complexity score with grammar checkers accuracy 103

LIST OF FIGURES

2.1 Error localization by conventional and reverse dictionary [Chaudhuri, 2002, 2001] . . . . . . 14
2.2 The weighted SpellNet for 6 words . . . . . . . . . . . . . . . . . . 15
2.3 Structure of SpellNet for θ=1 . . . . . . . . . . . . . . . . . . . . . 15
2.4 Simplified functional diagram of grammatical error detection and
correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Simplified functional diagram of a rule-based grammar checker . 20
2.6 Syntax tree generated by MS-NLP System . . . . . . . . . . . . . . 23
2.7 Examples of trigram sequences. . . . . . . . . . . . . . . . . . . . . 28
2.8 Basic architecture of post editing after machine translation. . . . . 34

3.1 Proportion of Errors in Native Speakers and Second Language Learners Corpus. . . . . . 48
3.2 Taxonomy of errors found in Bangla text of second language learners. 50
3.3 Bangla Sentence Length Distribution. . . . . . . . . . . . . . . . . 55
3.4 POS tag association matrix. . . . . . . . . . . . . . . . . . . . . . . 59
3.5 Simplified functional diagram of automatic error corpora creation. 68

4.1 Generative model for well-formed and ill-formed sentence detection. . . . . . 74
4.2 Example of Linguistic function . . . . . . . . . . . . . . . . . . . . 75
4.3 Pruned trellis after applying Linguistic Hard Constraints . . . . . 76
4.4 N-gram matching score between ungrammatical and correct sen-
tences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

5.1 Performance of error detection . . . . . . . . . . . . . . . . . . . . 85


5.2 Performance of error correction . . . . . . . . . . . . . . . . . . . . 86
5.3 Grades given by tester-1 and tester-2 in blind testing . . . . . . . . 88
5.4 Agreement between two testers during manual evaluation . . . . 89

5.5 Screenshot of active learning framework for estimation of text com-
plexity. The explanation of the feature names are available at
http://nlp.cdackolkata.in/testComplexity/FeatDtl.spy . . 104
5.6 Complexity values across different datasets . . . . . . . . . . . . . 105
5.7 POS Tag distributions in different domains. . . . . . . . . . . . . . 105
5.8 Frequency of word distribution across different domains. . . . . . 106
5.9 Complexity measure and Precision score obtained by NLG based
grammar checker and Naïve Bayes classifier systems. . . . . . . . 106

ABBREVIATIONS

ALEK Assessing Lexical Knowledge

ALEP Advanced Language Engineering Platform

AMT Amazon Mechanical Turk

APSG Augmented Phrase Structure Grammar

ASL American Sign Language

BLEU BiLingual Evaluation Understudy

CALI Computer Assisted Language Instruction

CALL Computer Assisted Language Learning

CFG Context Free Grammar

CLEC Chinese Learners of English Corpus

CLEF Cross Language Evaluation Forum

CMM Complexity Measurement Metric

CSP Constraint Satisfaction Problem

EP Example Provider

ERG English Resource Grammar

ESL English Second Language

FLAG Flexible Language and Grammar Checking

FN False Negative

FP False Positive

GAAM Graded Acceptability Assessment Metric

GRADES GRAmmar Diagnostic Expert System

HMM Hidden Markov Model

HOO Helping Our Own

HPSG Head Driven Phrase Structure Grammar

ICICLE Interactive Computer Identification and Correction of Language Errors

LFG Lexical Functional Grammar

LM Language Model

MAT Machine Aided Translation

MCQ Multiple Choice Questions

MI Mutual Information

ML Machine Learning

NGC Norwegian Grammar Checker

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OCR Optical Character Recognition

OT Optimality Theory

PCFG Probabilistic Context Free Grammar

PCSP Partial Constraint Satisfaction Problem

POS Parts-of-Speech

SMT Statistical Machine Translation

S-O-V Subject-Object-Verb

SP Suggestion Provider

SST Standard Speaking Test

SVM Support Vector Machine

S-V-O Subject-Verb-Object

TN True Negative

TP True Positive

WER Word Error Rate

NOTATION

σ2 Variance
γ Brevity Penalty
µ Mean
Ω Complexity Score

TRANSLITERATION KEY USED IN THE DISSERTATION

CHAPTER 1

INTRODUCTION

“Grammar has sometimes been described as the Art of speaking and writing correctly.

But people may possess the Art of correctly using their own language without having any

knowledge of grammar. We define it therefore as the Science which treats of words and

their correct use”. – Alfred [1894]

Most people are fluent in speaking a language, but their writing often suffers because of gaps in grammatical knowledge and oversights at the time of writing. From naïve users to professional writers, most people are vulnerable

to the curse of grammatical mistakes [Leacock et al., 2010]. Casual spoken communication thus differs from formal written text. Written language

has become more or less a prerequisite for daily communication. Moreover, written

communication leaves a deep impact on education. In the context of our everyday use of text editing environments, the need for automatic grammatical error detection and correction cannot be overlooked. We can recall Socrates in this

context, “Correct language is the prerequisite for correct living”.

With the advancement of computational algorithms, people's expectations are increasing day by day. Rather than depending only on mechanical assistance,

we are now seeking intellectual support as well. Vast numbers of people throughout the world deal with texts without proper knowledge of the language, and many of them are not native speakers. Most of them use spelling

correction tools when writing documents on a computer. These tools provide a first step towards writing correct text by reducing the need for human intervention. The second

step of high quality text generation is grammar checking. Grammar checking

is essential for several reasons. It improves the quality of text, saves time, and

supports learning of the language. This tool not only helps native speakers but

also helps second language learners to communicate in other languages. As a

whole, the system plays a pivotal role in Computer Assisted Language Learning

(CALL). It can also serve as a post-processing component of Machine Translation (MT) and Optical Character Recognition (OCR) systems.

1.1 Motivation

A lot of work has been done in grammatical error detection and correction, mainly for the English language. Very little work has been done for Indian languages. The Punjabi grammar checker [Gill and Lehal, 2008] is probably the first and only system developed for an Indian language. Our interest is in developing a grammar checker for the Bangla language. Bangla is the sixth most widely spoken language in the

world [Lewis and Paul, 2009] and the second in India. It is the national language

of Bangladesh. This language belongs to the Indo-Aryan family and originated

from Prakrit which is a sister language of Sanskrit. Sister languages of Bangla are

Oriya, Magahi and Maithili in the west and Assamese in the north east of India

[Chatterjee, 1926]. Bangla, Oriya and Assamese are the eastern most languages

of the Indo-European family of languages. Compared to languages like English,

Bangla is a morphologically rich [Bhattacharya et al., 2005; Dandapat et al., 2004]

language and has relatively free word order. It follows a Subject-Object-Verb (S-O-V) pattern, but the ordering of these three units is flexible; for example, S-V-O is allowed though not commonly used. Till now no significant research and development has been

done on grammatical error detection and correction of morphologically rich and

free word order languages like Bangla. To the best of our knowledge, ours is the

first work in India relating to Bangla grammar correction.

1.2 Divergence Issues

Though previous studies have revealed commonalities in types of errors commit-

ted by second language learners of different languages, some novelties are found

with respect to error production in individual languages. It has been widely studied

that the divergence between a pair of languages has a profound effect on various

fields of NLP [Dorr et al., 2002; Dave et al., 2001; Goyal and Sinha, 2009]. The diver-

gence between the two languages influences the kind of mistakes second language

learners typically commit. Previous studies have revealed that second language

learners of English having mother tongue as Japanese, Chinese, Korean or Russian

produce article errors due to divergence between English and those languages that

do not have any article [Leacock et al., 2010]. Therefore article selection is specif-

ically problematic for speakers of those languages. Similarly, divergence issues

between Bangla and other languages also influence the kind of mistakes Bangla

second language learners typically commit in their text. Bangla does not have prepositions as English does; prepositional meanings are expressed either as postpositions after the noun or as nominal inflections [Bhattacharyya et al., 2011]. The Bangla language may

have different representations of postpositions for the same preposition used in

the English language. Table 1.1 shows how the single preposition “with” is realized in several different ways in Bangla. These differences pose challenges to second language learners of Bangla when using postpositions and nominal inflections.

Table 1.1: Examples of single preposition “with” having different types of realiza-
tion in Bangla

English Bangla
A girl with beautiful eyes. sundara chokhera ekaTi meYe.
A boy with high fever. prachaNDa jbare AkrAnta ekaTi chhele
He wrote with a pen. se pena diYe likhechhila
Milkman mixes water with milk. dudhaoYAlA dudhera sAthe jala meshAna

1.3 Challenges

Grammatical error detection and correction of natural language is a difficult task,

as it deals with the full complexity of language at the time of identification of a

syntactic and semantic structure from input text [Dale, 2011].

Initially most grammar checkers were available as part of word processors, but nowadays they are re-emerging as standalone tools. Though grammar

checker tools are already available for English and for other European languages,

they have not matured enough to guarantee correct results most of the time for

every error. They do not satisfactorily account for the complexities of individ-

ual languages. Moreover, these systems have several limitations. One significant

issue is false alarms, where correct constituents are flagged as incorrect, which badly affects the learners' language acquisition process [Leacock et al., 2010]. Many a time, even clearly ill-formed constructions go undetected. Thus, in spite of automated

grammatical error detection and correction, manual reviewing is indispensable in

order to achieve high quality text. Therefore, there is a need for an efficient and

reliable grammar checker which can alleviate the potential problems of existing

systems.

Available grammar checkers for other languages are developed based on either

rule-based or Machine Learning (ML) based approaches. A rule-based grammar

checker checks the grammatical structure of sentences depending on morpholog-

ical and syntactic analysis. In rule-based approach, rules are manually designed

by linguists to recognize and rectify specific grammatical errors from parse tree

patterns. In Bangla, linguistically rich error correction rules and robust parsers

are not available till date. As Bangla allows free word order, state-of-the-art Con-

text Free Grammar (CFG) is not applicable [Shieber, 1985; Begum et al., 2008;

Bharati et al., 2010] here. Context Free Grammar (CFG) is basically a positional

grammar. It is true that Bangla has a dominant word order, which is SOV (i.e.

Subject-Object-Verb). However, alternative orderings of words are used not only in literature and poetry but are also found in day-to-day news articles. It has been seen that news reporters often use this alternative ordering for emphasis. Evidence of free word order is very frequent in Bangla news cor-

pora and Bangla blogs. Thus, a parser that follows a positional grammar is unable to generate a correct parse tree for a Bangla sentence. Free word order also leads

to structural ambiguity and increases the computational cost of the parser. Dis-

continuities (words that belong together but are not placed in the same phrase) and

long distance dependencies also pose problems for positional grammars [Bartha

et al., 2006; Covington, 1990]. These linguistic phenomena are quite common for

Indian languages. Due to these challenges, the current trend in parsing relatively free word order languages such as the Indian languages is based on the Paninian framework

and Dependency Parsing techniques [Garain and De, 2013; Ghosh. A. and S., 2009;

Zhang and Nivre, 2011; Nivre, 2008]. It has already been reported in the literature that parsers following the Paninian framework (designed for free word order languages) compare well in asymptotic time complexity with parsers for context free grammars (CFGs), which are essentially designed for positional languages [Bharati et al., 2010]. Parsing in the Paninian model is based on karaka relations between verbs

and nouns in a sentence. It does not consider the position of constituents while parsing a sentence. Thus, Paninian grammar enables a parser to handle sentences even when related words are discontinuous or separated over long distances.

In contrast to the rule-based approach, ML based approaches do not need such

handcrafted linguistic rules. This can alleviate the potential problem of rule-based

approaches. ML based grammar checkers rely on sufficiently large annotated

learners' error corpora. There is a dearth of annotated learner error corpora for Bangla text. One of the major problems of building such an error corpus from learn-

ers' data is that the process is very time-consuming. It also requires linguistic

knowledge to examine each sentence of learners’ text to determine the nature and

frequency of errors.

To measure improvement in grammar checker performance, we need to devise

a method that can evaluate the functionality and usability of existing grammar

checkers. Over the last few years, most studies regarding grammatical error de-

tection and correction have focused on the design and development aspects.

Very little attention has been paid to evaluation. Evaluation of such a system is

essential to validate whether the grammar correction methodology adopted by

the system is in the right direction. Performance of most of the existing grammar

checkers cannot be compared as different systems address different types of er-

rors. Moreover, these systems are not tested on a common dataset. Testing on a

common dataset is particularly problematic when different grammar checkers are

designed for different languages. Direct comparison is not possible since different

evaluation metrics are used by different researchers to report performance.

These challenges have motivated us to examine new directions in evaluation.

1.4 Research Objectives

The primary goal of the thesis is to develop a grammar checker for the Bangla language with reasonably good accuracy. Our aim is to contribute a novel grammatical

error detection and correction methodology for morphologically rich and free

word order languages. In this thesis, our focus is to correct postpositional and

nominal inflection errors which are the most frequent mistakes committed by

second language learners of Bangla. We also plan to provide

relevant examples for supporting the suggested correction. Though the thesis

deals with grammatical error detection and correction of Bangla language, our

proposed methodology can easily be re-engineered for other similar languages. To

address the broad objective we have identified the following goals:

• We investigate the different types of errors committed by native speakers and second language learners of Bangla and provide a taxonomy of errors.

• Due to the unavailability of annotated learners' error corpora, we propose an approach for automatic creation of a synthetic error corpus that mimics real world errors. This automatically generated corpus supports training and evaluation of the system's performance.

• We propose a new methodology for correcting the mistakes committed by


users and to provide relevant examples supporting the suggested correction. To have an idea of how strongly we can rely on such a correction, it will be useful to devise a measure of sentence complexity with respect to the grammar correction task. If a sentence is complex, the user should not be overly reliant on the correction suggested by the system. Conversely, if the complexity measure is low, the user can confidently choose the suggestion.

• Finally, we plan to propose a novel evaluation strategy to assess the per-


formance of grammar correction systems even when these systems are not
tested on the same test set. Our aim is to formulate an innovative complexity measurement metric for test data and then to examine the correlation between the complexity value and the accuracy of a system tested on that data.

1.5 Thesis Outline

In this chapter, we have given a brief description of the grammatical error detection problem. Challenges in this research problem are highlighted here. We have

looked at the motivation behind the research and identified the initial research

objectives that have directed the research. The rest of the thesis is organized into

chapters as follows:

Chapter 2 reviews recent prior work in grammatical error detection and correc-

tion. We do not aim to give a comprehensive review of the related work. Such

an attempt is extremely difficult due to the large number of publications in

this area and the diverse language dependent works based on several theo-

ries and techniques used by researchers over the years. Instead, we briefly

review the work based on different techniques used for grammatical error

detection and correction.

Chapter 3 describes our novel approach for automatic creation of Bangla error

corpus for training and evaluation of grammar checkers. Though the present

work focuses on the most frequent grammatical errors in Bangla written text,

a detailed taxonomy of grammatical errors in Bangla is also presented here,

with an aim to increase the coverage of the error corpus in future. Mistakes

committed by native speakers and second language learners are compared

here and reasons behind such errors in their text are also investigated.

Chapter 4 describes our procedure for automatic detection and correction of Bangla

grammatical errors using a Natural Language Generation based approach.

Practical issues pertaining to automatic detection and correction of grammat-

ical errors using this approach are discussed here. In this chapter, we also

discuss the scope and limitations of the proposed approach.

Chapter 5 deals with a novel evaluation methodology of grammatical error detec-

tion and correction to assess its performance. A comparative study between

our grammar checker based on NLG and a baseline system using Naïve

Bayes classifier is carried out to show the performance of our system. An

innovative complexity measurement metric is introduced here to alleviate

the need for a standard test corpus for evaluation of the system. Performance

of each grammar checker is assessed on texts having different complex-

ity and severity of errors. To evaluate the performance, correlation between

complexity of texts and accuracy of the system is also estimated.

Chapter 6 summarises the contributions of the research and concludes with future

directions for possible extensions of the current work.

Appendices. Some appendices have been added in order to cover complementary details. More precisely, the included materials are:

Appendix A: Examples of some interesting erroneous sentences corrected

by the system.

Appendix B: Examples of incorrect prediction by the system.

Appendix C: Examples of some interesting erroneous sentences having dif-

ferent complexity and severity of errors, including those that are difficult

to correct even for human annotators.

Appendix D: Examples of complex sentences collected from Literature Do-

main.

CHAPTER 2

LITERATURE SURVEY

The main focus of the dissertation is on the grammar correction task, specifically for the Bangla language. It has been assumed that the input to our grammar correction

system is free from spelling errors. Thus in the literature survey, we have largely

focused on grammar checking techniques. However, misspelling also contributes

to the faulty construction of a sentence. Therefore, it has been felt that a brief

discussion about different aspects of spell checking is necessary in the context of grammar checking. Interested readers can go through the cited references for more

details on spell checking.

2.1 Spell Checker

The main task of a spell checker is to find the appropriate word the author intended

to type given a misspelled word. A spell checker is used to correct spelling er-

rors in text, to fix the output of Optical Character Recognition (OCR) (typical OCR-generated errors include “rn” for ‘m’ and ‘e’ for ‘c’ [Lopresti and Zhou, 1997]) and Online Handwriting Recognition (OHR). Often it also appears as a preprocessing component of grammar checkers. Spell checkers provide the initial step towards effective writing. Spelling errors in human writing are committed due to homophones (e.g. “it’s” for “its”, “dessert” for “desert”, “piece” for “peace”) and when there is no neat mapping between the structure and pronunciation of words. According to Heift and Schulze [2007], language learners can commit misspellings
due to either a misapplication of morphological rules or other influences from their

native languages or for incomplete knowledge of the morphology and phonology

of the language they are learning. For example, a learner may write “goed” or “writed” in English owing to incomplete knowledge of English irregular verbs [Leacock

et al., 2010]. However, one can argue that this problem is beyond the scope of spell checking and needs to be addressed as part of grammar checking. Spelling er-

rors are classified into two types, namely, Non-Word error and Real-Word error

[Kukich, 1992]. A Non-Word error occurs when a misspelled word is not a valid

dictionary word. Conversely, a Real-Word error occurs when the user writes a valid

dictionary word but it is not suitable in the context of the sentence. Examples of

a Non-Word error and a Real-Word error for a correct sentence “The boys ate their

toast” have been shown below:

Example of a Non-Word error: “The boys ate *thier toast”

Example of a Real-Word error: “The boys ate *there toast”

Here ‘*’ marks the erroneous words. Detection of Real-Word errors is comparatively more complex than detection of Non-Word errors. Often the context words provide clues for detecting Real-Word errors. An alternative classification of

spelling errors is (1) Orthographic error (also known as ‘Cognitive error’) and (2) Typographic error [Kukich, 1992]. An Orthographic error occurs when the author either simply does not know the correct spelling or forgets it while typing. Orthographic errors mostly generate strings that are phonologically identical or very similar to the correct word (e.g. “indicies” for “indices”). As a result, these errors

depend on the spelling and pronunciation conventions of a particular language. Typographic errors occur due to wrongly hit key sequences. Thus the characteristics of this type of error depend on the keyboard layout rather than on the language in which the word has been written.

Spelling correction happens in three stages viz. “Error Detection”, “Candi-

date Corrections Generation” and “Ranking of Candidates”. Structural similarity,

pronunciation similarity, syntactic and semantic context, exploiting knowledge of

sources (like keyboard, OCR, Speech-to-Text etc.) can be used to detect and cor-

rect spelling errors. Structural similarity between a misspelled word and candidate

corrections is estimated using edit distance [Levenshtein, 1966]. To select the best

candidate correction, the minimum edit distance [Wagner and Fischer, 1974] is

preferred. A dynamic programming algorithm [Bellman, 1957] is used to calculate the minimum edit distance. For a good overview of edit distance, please refer to

[Dasgupta et al., 2008; Jurafsky and Martin, 2009]. Pronunciation similarity is usu-

ally measured by Russell's Soundex [Knuth, 1998] and Metaphone [Philips, 1990].

The noisy channel model, a probabilistic approach, is also used for spelling correction. The intuition of the noisy channel model is to treat a misspelled word as an instance of a correct word that has been passed through a noisy channel [Jurafsky and

Martin, 2009]. A comprehensive study of earlier spell checking techniques has been

discussed in [Kukich, 1992] and [Peterson, 1980]. There are various approaches

for Non-Word spelling corrections like Trigram Analysis [Angell et al., 1983], Error

Patterns [Yannakoudakis and Fawthrop, 1983], Triphone Analysis [van Berkel and

De Smedt, 1988], Noisy Channel Model [Kernighan et al., 1990], Using Context

[Agirre et al., 1998], String-to-String Edits [Brill and Moore, 2000] and Pronuncia-

tion Model [Toutanova and Moore, 2002]. To correct Real-Word errors approaches

like Trigram based [Mays et al., 1991], Noisy Channel Model [Mays et al., 1991],

Lexical Cohesion [Hirst and Budanitsky, 2005], Web as a Source of Information

[Whitelaw et al., 2009] and Using Confusion Sets [Golding, 1995; Golding and

Schabes, 1996; Mangu and Brill, 1997] have been discussed in the literature.
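To make the edit distance and noisy channel ideas above concrete, here is a minimal Python sketch: a dynamic programming edit distance that also counts an adjacent transposition as a single operation (matching the substitution, deletion, insertion and transposition error types analysed later in Table 2.1), followed by a toy noisy-channel-style ranking. The lexicon, its probabilities and the per-edit channel penalty are illustrative assumptions, not values from any system surveyed here.

def edit_distance(s, t):
    """Damerau-Levenshtein distance via dynamic programming
    [Levenshtein, 1966; Wagner and Fischer, 1974]."""
    m, n = len(s), len(t)
    # dp[i][j] = minimum edits turning s[:i] into t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                                # i deletions
    for j in range(n + 1):
        dp[0][j] = j                                # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
            # an adjacent transposition counts as a single edit
            if i > 1 and j > 1 and s[i-1] == t[j-2] and s[i-2] == t[j-1]:
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)
    return dp[m][n]

# Toy lexicon with made-up unigram probabilities P(c).
LEXICON = {"their": 0.006, "there": 0.005, "the": 0.040}

def rank_candidates(w, max_dist=2):
    """Noisy channel intuition: choose c maximizing P(c) * P(w | c),
    crudely approximating the channel model P(w | c) by 0.01 per edit."""
    scored = [(c, p * (0.01 ** edit_distance(w, c)))
              for c, p in LEXICON.items() if edit_distance(w, c) <= max_dist]
    return sorted(scored, key=lambda x: -x[1])

print(edit_distance("thier", "their"))  # 1 (a single transposition)
print(rank_candidates("thier"))         # "their" ranks first

On this toy input, “thier” is one transposition away from “their”, so “their” outranks “there” and even “the”, despite the latter's higher unigram probability.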

Works on spell checker development in Indian languages like Bangla [Chaud-

Table 2.1: Percentage of various types of errors in Bangla

Type of error Percentage


Substitution error 66.32
Deletion error 21.88
Insertion error 6.53
Transposition error 5.27

huri, 2002, 2001; Choudhury et al., 2007; UzZaman, 2005; Haque and Kaykobad,

2002; Uzzaman, 2004; Bhatt et al., 2005; Bansal et al., 2004], Assamese [Das et al.,

2002], Punjabi [Lehal, 2007], Marathi [Dixit et al., 2006] etc. are worth mentioning.

Here we discuss two major works on Bangla spelling correction.

Chaudhuri [2002, 2001] has analysed Non-Word patterns in Bangla hand-

written text. These patterns have been collected from samples of answer scripts of

students at various levels of studies like Secondary, Higher Secondary and Under-

graduate. For studying phonetic spelling errors, they have also collected samples

of dictated notes. These notes have been dictated from various topics chosen from

story, novels, books of science, geography, history etc. They have manually col-

lected the misspelled words from these texts. Illegible words and words of length

greater than four but having more than three errors have been rejected. They have

analysed the different types of spelling errors (substitution, deletion, insertion and

transposition) found in Bangla text. The percentages of such errors are shown

in Table 2.1. They have seen that most misspellings take place through omission of the mAtrA (the mAtrA, or shirorekhA, is the horizontal line present at the upper part of many Bangla characters) and of vowel diacritical markers. Mistakes committed in Bangla

compound consonants (called yuktAkShara) due to ignorance are also observed. In

Bangla, dental and cerebral nasal consonants are phonetically very similar. As a

result there is a chance of misspelling when proper spelling rules are not remem-

bered. Details of the error pattern analysis have been reported in [Chaudhuri and Kundu, 2000]. They have proposed a two-stage technique to detect and correct

Non-Word errors in Bangla text. The first stage takes care of phonetic similarity

error and the second stage takes care of errors other than the phonetic similarity.

The phonetically similar characters are mapped into single units of character code.

A new dictionary Dc is constructed over this reduced alphabet. They have also constructed a reverse order dictionary Dr . In Dr , the characters of each word

are kept in reverse order. A phonetically similar but wrongly spelt word can be

easily corrected using Dc . Phonetically non-similar misspelled words are searched

in both the dictionaries. If the word of length n is not found in Dc , then its first k1

characters are matched with words in this dictionary. The last k2 characters of the

same word are searched in Dr . A misspelled word with a single error is located in

the intersection region of the first k1+1 and last k2+1 characters. Figure 2.1 shows the

error localization by conventional and reverse dictionary. Candidate corrections

are suggested by searching in the conventional dictionary for those words start-

ing with the first k1 characters and ending with the last k2 characters.

Figure 2.1: Error localization by conventional and reverse dictionary [Chaudhuri, 2002, 2001]

They have tested their approach on 250k words and reported that all Non-Word errors are correctly

detected, but the false error detection rate of the system is 5%.
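The localization step lends itself to a compact sketch. The following Python fragment mimics the idea with a toy English lexicon standing in for the Bangla dictionaries Dc and Dr (and omitting the phonetic folding): the longest valid prefix gives k1, the reverse dictionary turns the suffix search into a prefix search giving k2, and a single error must lie in the overlap of the first k1+1 and last k2+1 characters. All names below are illustrative assumptions.

# Toy stand-in for the conventional dictionary Dc; the reverse dictionary Dr
# is simulated by reversing every word, so a suffix search becomes a prefix search.
LEXICON = {"grammar", "grammatical", "checker", "correction", "corpus"}

def longest_shared_prefix(word, lexicon):
    """Length k1 of the longest prefix of `word` shared with any lexicon word."""
    best = 0
    for w in lexicon:
        k = 0
        while k < min(len(word), len(w)) and word[k] == w[k]:
            k += 1
        best = max(best, k)
    return best

def localize_and_suggest(word, lexicon):
    k1 = longest_shared_prefix(word, lexicon)
    # Dr role: longest shared suffix = longest shared prefix of reversed strings.
    k2 = longest_shared_prefix(word[::-1], {w[::-1] for w in lexicon})
    # A single error lies in the overlap of the first k1+1 and last k2+1 characters.
    region = (max(0, len(word) - k2 - 1), min(len(word), k1 + 1))
    suggestions = [w for w in lexicon
                   if w.startswith(word[:k1]) and w.endswith(word[len(word) - k2:])]
    return region, suggestions

# "gramnar" has a substitution error at index 4; the region pinpoints it.
print(localize_and_suggest("gramnar", LEXICON))  # ((4, 5), ['grammar'])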

Choudhury et al. [2007] have investigated the difficulties involved in spelling

error detection and correction in Bangla, Hindi and English through the concep-

tualization of a Spelling Network (named SpellNet). This work is inspired by similar work on the complex network approach [Albert and Barabási, 2002; Newman,

2003] and work on phonological neighbours’ network of words [Kapatsinski, 2006;

Vitevitch, 2005]. The SpellNet is a weighted network of words, where the nodes

represent the words and the weights of the edges indicate the orthographic simi-

larity between the pair of words they connect. The structure of a SpellNet is shown in Figure 2.2.

Figure 2.2: The weighted SpellNet for 6 words

Figure 2.3: Structure of SpellNet for θ=1

They have focused on the networks at three different thresholds (θ) of edge weights, that is for θ = 1, 3 and 5. They have studied the properties of the
Complex Network at these θ values for the three languages. They do not consider

higher thresholds as the networks become completely connected at θ = 5. Thresh-

olded counterpart of Figure 2.2, for θ = 1, is shown in Figure 2.3. It has been seen that the orthography of the two Indian languages, Bangla and Hindi, is

highly phonemic in nature, in contrast to the orthography of English. They have

seen that the probability of making a Real-Word error in a language is proportional to

the average weighted degree of SpellNet. They have reported that the probability

of Real-Word error is highest in Hindi followed by Bangla and English.
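One simple reading of the SpellNet construction can be sketched in a few lines of Python: words are nodes, an edge of weight d = edit distance connects any pair with d ≤ θ, and the average weighted degree is the statistic the authors relate to Real-Word error probability. The six-word English vocabulary below is a toy stand-in for the actual Bangla, Hindi and English lexica, so the printed numbers are purely illustrative.

from itertools import combinations

def lev(s, t):
    """Plain Levenshtein distance (row-by-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]

def spellnet(words, theta):
    """Thresholded SpellNet: an edge (u, v) with weight d whenever
    the orthographic distance d = lev(u, v) is at most theta."""
    edges = {}
    for u, v in combinations(words, 2):
        d = lev(u, v)
        if d <= theta:
            edges[(u, v)] = d
    return edges

def avg_weighted_degree(words, edges):
    """Average weighted degree over all nodes of the thresholded network."""
    deg = dict.fromkeys(words, 0)
    for (u, v), d in edges.items():
        deg[u] += d
        deg[v] += d
    return sum(deg.values()) / len(words)

WORDS = ["cat", "bat", "rat", "cart", "card", "dog"]  # toy vocabulary
for theta in (1, 3, 5):
    e = spellnet(WORDS, theta)
    print(theta, len(e), avg_weighted_degree(WORDS, e))

As θ grows the network densifies, and on a real lexicon a denser neighbourhood means more valid words within a few edits, hence more opportunities for Real-Word errors.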

2.2 Grammar Checker

Grammar checking is a process that verifies morphology, syntax and semantics of

an input text. This verification process is executed by the Grammar Checker. The

grammar correction problem belongs to the field of Natural Language Process-

ing (NLP) which is a branch of Artificial Intelligence. Generation of text that is

syntactically and semantically correct is an important aspect in NLP and Natural

Language Generation (NLG). An automatic Grammar Checker is a computerized

writing aid that examines written text to detect and correct grammatical mistakes

and provides necessary feedback to the user. Figure 2.4 shows a basic functional

diagram of grammatical error detection and correction. In some of the reviewed

papers, depending on the technique used, “correction” can be performed without any explicit “detection”. In this figure, the dotted line indicates

this situation. Grammar checkers are one of the most widely used tools in the

area of language processing. Though most of the existing grammar checkers are in

English, grammar checkers for other languages are also available. Table 2.2 shows

research work carried out in languages other than English.

Figure 2.4: Simplified functional diagram of grammatical error detection and cor-
rection

Table 2.2: Research on grammar checking in different languages

Languages Authors
Afan Oromo Tesfaye [2011]
Basque Uria et al. [2009]
Chinese Liu et al. [2008]
French Hermet et al. [2008]
German Schmidt-Wigger and Anje [1998]
Japanese Izumi et al. [2003]
Korean Young-Soog and Chae [1998]
Norwegian Bondi et al. [2002]
Punjabi Gill and Lehal [2008]
Spanish Lozano and Melero [2001]
Swedish Birn [2000]

In this chapter, relevant research work in relation to grammatical error detection and correction is

surveyed. The aim of the chapter is to provide a brief overview of existing grammar checking techniques.

2.3 Automatic Grammar Correction Approaches

Uszkoreit [1996] (quoted in [Hein, 1998]) suggested a four-level scheme for grammar correction approaches, viz.

i. Detection: deals with identification of possible ungrammatical segments.

ii. Recognition: deals with localization and identification of the probable vio-
lated constructions.

iii. Diagnosis: deals with identification of the possible sources of errors.

iv. Correction: deals with construction and ordering of the correct alternatives.
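Read as a software architecture, the four levels suggest a simple pipeline. The Python skeleton below is only an organizational sketch with placeholder logic (the toy return values are our own assumptions, not part of Uszkoreit's scheme): each stage consumes the previous stage's output, ending with ranked corrections.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Diagnosis:
    span: tuple            # (start, end) token offsets of the suspect segment
    violation: str = ""    # which construction is violated (recognition)
    source: str = ""       # probable source of the error (diagnosis)
    corrections: List[str] = field(default_factory=list)  # ordered alternatives

def detect(tokens):
    """Level i: flag possibly ungrammatical segments (placeholder)."""
    return [(0, len(tokens))]          # toy: suspect the whole sentence

def recognize(tokens, span):
    """Level ii: localize and identify the violated construction (placeholder)."""
    return "postposition selection"

def diagnose(tokens, span, violation):
    """Level iii: identify the probable source of the error (placeholder)."""
    return "L1 interference"

def correct(tokens, span, violation):
    """Level iv: construct and rank correct alternatives (placeholder)."""
    return [" ".join(tokens)]          # toy: echo the input

def check(sentence):
    tokens = sentence.split()
    results = []
    for span in detect(tokens):
        v = recognize(tokens, span)
        results.append(Diagnosis(span, v, diagnose(tokens, span, v),
                                 correct(tokens, span, v)))
    return results

for d in check("se pena diYe likhechhila"):   # example sentence from Table 1.1
    print(d)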

Different approaches have been taken for grammatical error detection and

correction by different researchers. Some researchers [Lozano and Melero, 2001;

Bredenkamp et al., 2000; Jensen et al., 1983] follow Rule-based or Parser based ap-

proach, some [Fujishima and Ishizaki, 2011; Izumi et al., 2003; Bigert and Knutsson,

2002; Knight and Chander, 1994] follow purely Machine Learning (ML) based em-

pirical approach, while others [Hermet and Désilets, 2009; Liu et al., 2008] prefer

Statistical Machine Translation (SMT) based approach. A brief road map of gram-

matical error detection and correction approaches is shown in Table 2.3. Research

in the field of grammatical error detection and correction started as early as 1978 [Weischedel et al., 1978]. From 1978 to 2002, most grammar checkers followed the

rule-based approach. Since 2002, ML and SMT approaches have dominated over

Table 2.3: A brief road map of grammatical error detection and correction ap-
proaches.

Rule-based Approach
Nina H. MacDonald and Keenan [1982]: String matching.
Jensen et al. [1983]: Parse fitting.
Douglas and Dale [1992]: Constraint Relaxation.
Bredenkamp et al. [2000]: Syntax-based.
Lozano and Melero [2001]: Syntactic and Semantic analysis.
Machine Learning based Approach
Knight and Chander [1994]: Decision tree classifier.
Scheler and Munchen [1996]: Neural Network.
Bigert and Knutsson [2002]: n-gram Language Model.
Izumi et al. [2003]: Maximum entropy classifier.
Yi et al. [2008]: Web counting.
Fujishima and Ishizaki [2011]: Support Vector Machine.
Statistical Machine Translation based Approach
Liu et al. [2008]: Noisy channel Model.
Hermet and Désilets [2009]: Round trip SMT technique.

the rule-based approach. Based on these approaches, research prototypes have been

built for different languages.

2.3.1 Rule-based Approach

In the rule-based approach, rules are manually designed by linguists to recognize

and rectify specific grammatical errors from parse tree patterns. A rule-based

grammar checking tool checks the grammatical structure of sentences depending

on morphological and syntactic analysis. At the time of morphological analysis,

individual words are mapped to their lexical components and necessary informa-

tion related to their lexical structures is returned. In syntactic analysis, a parser

is used to analyse sentence structure and build its structural representation. This

structural representation maintains grammatical relationship between the words

in a sentence [Rich and Knight, 1991]. The primary goal of a rule-based system

is to parse ill-formed sentences in order to detect and correct the errors in a sen-

tence. Instead of using parsers, some rule-based diagnostic approaches consult

a list of errors. Related error rules are grouped together for the identification of such errors. Figure 2.5 shows a simplified functional diagram of a rule-based grammatical error detection and correction system.

Figure 2.5: Simplified functional diagram of a rule-based grammar checker

Early works in grammatical

error detection and correction were based on pattern matching or rule-based tech-

niques. At that time, the rule-based approach depended only on hand-crafted heuristic rules. Later, improvements were made using computational gram-

mar like Precision Grammar [Bender et al., 2004], Lexical Functional Grammar

[Dalrymple, 2001], Constraint Grammar [Karlsson, 1990a], Head Driven Phrase

Structure Grammar [Proudian and Pollard, 1985], Tree Adjoining Grammar [Joshi

et al., 1975], Augmented Phrase Structure Grammar (APSG) [Heidorn, 1975], etc.

Besides this, smart parsing techniques like Constraint Relaxation [Fouvry, 2003;

Vogel and Cooper, 1995; Bolioli et al., 1992] and Parse Fitting [Jensen et al., 1983]

are also employed for efficient grammar correction. Unix Writer’s workbench

[Nina H. MacDonald and Keenan, 1982] was one of the oldest and widely used

grammar checkers which was based on a string matching algorithm rather than

grammatical processing. But later, CorrectText (Houghton Mifflin Company)

and Grammatik (Aspen Software) introduced some amount of linguistic analy-

sis, while FLAG [Bredenkamp et al., 2000], VIRKKU, MS-NLP [Heidorn, 2000;

Lozano and Melero, 2001], EasyEnglish [Bernth, 1997], NGC [Bondi et al., 2002],

Grammatifix [Arppe, 2000; Birn, 2000] and other systems carried out detailed linguis-

tic analysis. EPISTLE [Heidorn et al., 1982], Critique, GramCheck [Bustamante

and León, 1996], SCRIPSI [Catt and Hirst, 1990] etc. are examples of some exist-

ing grammar checkers that follow the constraint relaxation technique. Today's open

source systems like AbiWord (http://www.abisource.com) use linguistic grammar rules for grammar checking. VP2 [Schuster, 1986], the Intelligent Language

Tutor [Schwind, 1988] and Automated German Tutor [Weischedel et al., 1978] use

linguistic tools like POS taggers and rule-based parsers that depend on relatively small grammars targeting specific errors.

We now briefly discuss the different rule-based strategies adopted by various systems.

Syntax-Based

MS-NLP [Heidorn, 2000; Lozano and Melero, 2001] is a rule-based system that

analyses the English language. It is used as a grammar checker component in Mi-

crosoft Word. The main focus of this system is to detect and correct the specific

types of errors made by native speakers such as subject verb disagreement, num-

ber disagreement, etc. The grammatical error detection process of this system

consists of four stages. In the first stage, the input text is tokenised into individual

words. Then these tokens are passed to the morphological analyser for analysis of

individual components. The second stage of processing is known as “sketch” since

it provides the basic syntactic parsing of the input sentence. The system uses Aug-

mented Phrase Structure grammar [Heidorn, 1975], consisting of a set of binary

phrase structure rules. The third stage is known as “portrait”, informally known

as reattachment. The parsed tree produced by “sketch” is refined in this stage

to produce a more accurate tree attachment of constituents such as prepositional

phrases, relative clauses, or infinitive clauses. A syntax tree generated by the MS-NLP system may look as shown in Figure 2.6. The fourth stage is known as “logical

form”. A semantic graph is produced in this stage to display the basic semantic

relations underlying the syntactic tree. The MS-NLP system consists of a seman-

tic analyser known as MindNet which is used for word sense disambiguation.

Michaud and Mccoy [2001, 2000] proposed an Interactive Computer Identification

and Correction of Language Errors (ICICLE) system to improve the literacy of

American Sign Language(ASL) signers. This system analyzes the grammatical

errors of a text written by deaf students and enables them to generate appropri-

ate text by a tutorial dialog. Feedback is given instead of providing corrections.

The system uses Context Free Grammar (CFG) augmented with error-production

Figure 2.6: Syntax tree generated by MS-NLP System

rules known as mal-rules. Mal-rules precisely describe expected error forms and

focus on previously known error patterns. Mal-rules parse ungrammatical sentences, trigger an error flag on successful parsing of an erroneous sentence,

and use annotation to indicate the types of errors in the sentence. An example

of a mal-rule for detecting the “be” verb deletion error in the English sentence “The

boy honest” may look like VP(error+) → AdjP where the conventional context free

rule is written as VP → V AdjP. Similarly, to handle a missing subject in the English

sentence “The honest”, a mal-rule can be formulated as S(error+) → VP [Schneider

and McCoy, 1998].
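As a concrete illustration of this idea, the following is a minimal Python sketch (ours, not the ICICLE implementation) that encodes the toy grammar above with the NLTK toolkit; the nonterminal name VP_ERR stands in for the error+ annotation:

    import nltk  # assumes the NLTK toolkit is installed

    # Toy grammar: the conventional rule VP -> V AdjP plus the
    # mal-rule VP_ERR -> AdjP that accepts a missing "be" verb.
    grammar = nltk.CFG.fromstring("""
    S -> NP VP | NP VP_ERR
    NP -> Det N
    VP -> V AdjP
    VP_ERR -> AdjP
    AdjP -> Adj
    Det -> 'the'
    N -> 'boy'
    V -> 'is'
    Adj -> 'honest'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the boy honest".split()):
        # a parse that uses the mal-rule flags the error type
        if any(sub.label() == "VP_ERR" for sub in tree.subtrees()):
            print("Error detected: missing 'be' verb")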

Norwegian Grammar Checker (NGC) [Bondi et al., 2002] uses Constraint

Grammar [Karlsson et al., 1995] to detect a wide range of grammatical errors.

The system contains a preprocessor which performs spell checking on the input

text and has three other major components. These components are Morphological

Analyser, Constraint Grammar Disambiguator and Error Detector. Initially, an in-

Table 2.4: Syntax based grammatical error detection and correction approaches.

Authors: Grammar/Rules and Techniques
Lin and Su [1995]: Phrase Level Building (PLB) parsing
Bernth [1997]: English Slot Grammar [McCord, 1980]
Park et al. [1997]: Combinatory Categorial Grammar [Steedman and Baldridge, 2005]
Hein [1998]: Phrase constituent rules and local error rules; chart parser and chart scanner [Jurafsky and Martin, 2009]
Frank et al. [1998]: English Resource Grammar (http://www.delph-in.net/erg/) and XLE (http://www2.parc.com/isl/groups/nltt/xle/)
Bredenkamp et al. [2000]: Trigger and Confirmation rules
Bender et al. [2004]: Precision grammar and English Resource Grammar; Linguistic Knowledge Builder (LKB) (http://moin.delph-in.net/LkbTop)
Khader et al. [2004]: ParGram English Lexical Functional Grammar (http://pargram.b.uib.no/)
Uria et al. [2009]: Constraint Grammar [Karlsson, 1990b]

put sentence is tokenized and spell checked. Then a morphological analyzer and a

POS tagger are used to provide necessary lexical attributes and POS tags. Then the

“Constraint Grammar Disambiguator” filters out the improper tags depending on

the grammatical context. Finally, the Error Detector module detects grammatical

errors depending on the linguistic inputs provided by the previous two modules.

Table 2.4 shows that other researchers follow similar rule-based techniques for grammatical error detection and correction, but their methodologies differ depending on the grammar and language processing tools they have used.

Constraint Relaxation

In Constraint Relaxation technique, broad classes of errors are detected by relaxing

constraints in a unification framework [Fouvry, 2003, 2000; Lascarides et al., 1996].

In this approach, the constraints such as subject-verb agreement are gradually

relaxed until the sentence can be parsed completely. Corrections are suggested after

examining the violated constraints. Degree of error in a sentence is determined by

the order of relaxation.
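The following minimal Python sketch (our illustration, not any of the cited systems) shows the idea on a single subject-verb agreement constraint: a strict check fails, constraints are relaxed one at a time, and the relaxed constraint that makes the check succeed localises the error:

    # Feature structures for a subject NP and a VP; the values are illustrative.
    np = {"number": "sing", "person": 3}
    vp = {"number": "plur", "person": 3}

    def violated(np, vp, relaxed=()):
        # return the agreement constraints that fail, ignoring relaxed ones
        return [f for f in ("number", "person")
                if f not in relaxed and np.get(f) != vp.get(f)]

    if violated(np, vp):                        # strict unification fails
        for c in violated(np, vp):
            if not violated(np, vp, relaxed=(c,)):
                # the sentence unifies once this constraint is relaxed,
                # so the relaxed constraint pinpoints the agreement error
                print("agreement error located at constraint:", c)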

IBM’s EPISTLE [Heidorn et al., 1982] system performs complete linguistic anal-

ysis using rule-based grammar and parser built on that grammar. This system

checks both the grammar and style of English written texts. Grammar checking

module takes care of the improper agreement between subject and verb whereas

the style checking module points out problems regarding excessively complex sen-

tences. A constraint in an EPISTLE rule may looks like:

NP VP (NUMB.AGREE.NUMB(NP)) → VP(SUBJECT=NP).

For analysing a text, this system follows three levels: Word processing, Grammar checking and Style checking. At the Word processing level, the system performs efficient dictionary lookup and also deals with suffixes and prefixes. This dictionary lookup procedure returns necessary attributes of words along with the POS tag for

further processing. According to the English grammar rules, a general language

processing system attempts to parse each sentence in order to check the grammat-

ical construction of the sentence. Sentences that follow the specified grammar rules (constituent class patterns) along with the imposed constraints (restrictive conditions on those patterns) on clauses are parsed successfully. On the other

hand, unsuccessful sentences are parsed again by relaxing some of the conditions

and with some additional rules. The relaxed conditions and the corresponding

problematic constituents of the sentence are noted to provide the indication and

information of grammatical errors. The parse trees developed during grammar

checking are utilized later by Style processing module to detect probable stylistic

problems in the sentence.

CRITIQUE is a text processing system which checks grammar as well as style

using a broad-coverage PLNLP English Grammar [Jensen et al., 1993]. The system

also follows constraint relaxation technique.

Fliedner [2002] proposed a rule-based system to detect NP agreement errors

in German. He has used a shallow parsing based on finite state automata in

combination with constraint relaxation and a method for parse ranking based on

Optimality Theory [Smolensky and Legendre, 2006]. Parse ranking is used to

select a best parse or a small number of best parses from a ‘parse forest’, which is

especially important when grammatical constraints are relaxed, as the number of

possible parses may become quite large.

Other researchers like Douglas and Dale [1992],Dini and Malnati [1993] and

Schwind [1990] also follow Constraint Relaxation technique for grammatical error

detection and correction.

Parse Fitting

A parse fitting procedure was proposed by Jensen et al. [1983] to “fit” together pieces

of a parse-tree when the parser fails to generate the complete parse tree for a

given input sentence. This technique generates a reasonable approximate parse

tree when the rules of a conventional syntactic grammar fail to parse an input

string. The approximate parse tree can serve as input to the remaining stages of

processing. When bottom-up parsing fails to produce a start (S) node covering the string, the fitting procedure begins. The by-product of this unsuccessful

bottom-up parsing is recorded for inspection of various segments of the input

string from error detection and correction perspectives.

Mellish [1989] has presented a generalized parsing strategy based on an active

chart which can diagnose errors in sentences. His proposed technique applies a

top down parser when the bottom up parser fails to produce the complete parse

tree. This is done so that the top down parser can examine the pieces of parse

constituents of the bottom-up parser and provide a suggestion about where the bottom-

up parser might have failed to parse the sentence.

2.3.2 Machine Learning Approach

ML is a field of study of algorithms which predict unknowns from observed

data using inductive inference. Inductive inference provides information about

statistical phenomena and generalizes conclusion from specific examples. These

generalized models help to predict the future data. According to Mitchell [1997],

“A computer program is said to learn from experience E with respect to some class of

tasks T and performance measure P, if its performance at tasks in T, as measured by P,

improves with experience E”. In the field of ML, for grammatical error detection and

correction, some researchers use a Language Modelling approach [Hermet and Désilets, 2009; Bigert and Knutsson, 2002; Chodorow and Leacock, 2000], some

of them [Knight and Chander, 1994; Gamon et al., 2008; Izumi et al., 2003] prefer

a classification-based approach and the rest prefer the web counting method. Now we will

discuss these aforementioned three approaches in brief.

Language Modeling

A Language Model (LM) is basically a probability distribution over all possible

word sequences of a language. LM is used to predict the next word depending

on the previous history [Jurafsky and Martin, 2009]. Probability assigned to a

word sequence of a particular language is indicative of likelihood of this sequence

being uttered by a speaker of that language. From the training corpus, LM gathers

statistical knowledge which is used to estimate the probability of a sentence. The

probability of a sentence containing word sequences w1 , w2 , w3 , · · · , wn can be esti-

mated by decomposing it into a series of product of conditional probability using

chain rule as follows:

P(w1, w2, w3, ..., wn) = P(w1^n)
= P(w1)·P(w2|w1)·P(w3|w1^2) ··· P(wn|w1^(n-1))
= Π_(i=1..n) P(wi|w1^(i-1))    (2.1)

But P(wn|w1^(n-1)) is difficult to compute due to sparseness of data in the training

corpus. To resolve this problem, the probability of a word wn given all the previous

words can be approximated by the probability given only the previous N words.

This N-gram approximation to the conditional probability of the next word in a

sequence is written as: P(wn|w1^(n-1)) ≈ P(wn|w_(n-N+1)^(n-1)).

Figure 2.7 shows examples of trigrams for English sentence “Ram is a good boy”.

Figure 2.7: Examples of trigram sequences.

LM can be used to differentiate ill-formed sentences from well-formed sen-

tences depending on the probability scores of the sentences. If the probability

score is below some predefined threshold value then the sentence is considered

as an ill-formed sentence; otherwise, the sentence is considered grammatically correct. Many

researchers prefer POS tag sequences rather than word sequences. N-grams of

POS tags have many useful properties. Some of the features of the language it-

self are captured by n-grams as they are extracted from a corpus representing the

language. The extracted features contain only local information due to the limited

scope of an n-gram. As each of these n-grams describes an allowable sequence of

n POS tags, they represent a small acceptance grammar of the language.
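As a concrete illustration of the ideas above, the following minimal Python sketch (ours; the corpus, smoothing and threshold are toy choices) scores sentences with an add-one-smoothed bigram model and flags those whose log-probability falls below a threshold:

    import math
    from collections import Counter

    def train(sentences):
        uni, bi = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            uni.update(toks)
            bi.update(zip(toks, toks[1:]))
        return uni, bi

    def log_prob(sentence, uni, bi, vocab_size):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        lp = 0.0
        for w1, w2 in zip(toks, toks[1:]):
            # add-one smoothing keeps unseen bigrams from zeroing the product
            lp += math.log((bi[(w1, w2)] + 1) / (uni[w1] + vocab_size))
        return lp

    uni, bi = train(["Ram is a good boy", "Ram is a boy"])
    THRESHOLD = -10.0  # illustrative; in practice estimated empirically
    score = log_prob("boy good a is Ram", uni, bi, len(uni))
    print("ill-formed" if score < THRESHOLD else "well-formed")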

Bigert and Knutsson [2002] have proposed a robust probabilistic method for

detection of context-sensitive errors. Initially, input sentence is tagged using a POS

tagger. N-gram constituents are collected from resulting tag sequences and then

the occurrence frequency of each n-gram is fetched from the n-gram frequency

table. If the frequency is greater than a predefined threshold value then this con-

struction is considered as grammatically correct. Otherwise, it is considered as

grammatically incorrect because rare constructs are relatively improbable. How-

ever, due to the sparseness of the tags participating in the n-gram, sometimes it

may happen that an n-gram representing an acceptable grammatical construct may

not be encountered in the training data. To mitigate this problem, they have built

a confusion matrix which is a matrix of syntactic distance between POS tags. This

matrix contains information about how suitable one tag is in the context of another.

This information is utilized at the time of replacing one tag with the other. A rare

tag is substituted with a tag of higher frequency suitable in the same context. If tag

t1 is substituted with tag t2 , then t2 is called a representative for t1 . Though a list

of feasible representatives can be easily produced, the problem lies in ordering these representatives. Furthermore, not all representatives are equally appropriate in the

given context. For these reasons they have introduced a weight. Representative

list is built using distance between two tags and to measure this distance, L1-norm

and POS tag n-grams are used. The process of distance calculation is explained

below:

Initially, to obtain a fair comparison between tags of different frequency, normal-

ization of the trigram (tL , t, tR ) is calculated as follows:

n(tL, t, tR) = freq(tL, t, tR) / freq(t)    (2.2)

If t′ is the replacement tag for the tag t, and tL and tR are the two context tags surrounding t, then the distance is calculated by measuring the difference between the normalized frequencies as:

dist_tL,tR(t, t′) = |n(tL, t, tR) − n(tL, t′, tR)|    (2.3)

Finally, all POS tag contexts are considered and the generalized equation becomes:


dist(t, t′) = Σ_(tL,tR) dist_tL,tR(t, t′)    (2.4)

Distance dist(t, t′) calculated using this formula ranges from 0 to 2. When the contexts are identical the value is 0, and when the uses of t and t′ are completely disjoint the value is 2. The probability p(t, t′) of replacing t with t′ is calculated depending

on this distance. When the probability is less than 1, a penalty is introduced as

the tag t′ is less appropriate than tag t in this context. By substituting the tag with

its representative tag and maintaining the similar syntactic structure, their algo-

rithm detects less-frequent grammatical constructions and attempts to transform

them into more-frequent constructions. Even after this transformation, if mod-

ified construction is also a low-frequency construction then the text is expected

to contain an error. Robust rule-based phrase and clause detection modules are used to avoid false alarms generated by the system. Their algorithm utilizes the

information of clause boundaries where clauses are used as the unit for error de-

tection algorithm to operate on it. For the detection of clause boundaries they have

implemented Ejerhed’s algorithm for Swedish [Ejerhed, 1999]. This algorithm is

based on context-sensitive rules operating on POS tags.
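A minimal Python sketch of the distance computation in equations 2.2 to 2.4 (our illustration; the counts below are invented):

    from collections import Counter

    # toy trigram and unigram tag counts harvested from a tagged corpus
    trigram_freq = Counter({("DT", "NN", "VB"): 30, ("DT", "NNS", "VB"): 20,
                            ("IN", "NN", "DT"): 10, ("IN", "NNS", "DT"): 15})
    tag_freq = Counter({"NN": 40, "NNS": 35})

    def norm(tl, t, tr):
        # equation 2.2: trigram frequency normalised by the middle tag's frequency
        return trigram_freq[(tl, t, tr)] / tag_freq[t]

    def dist(t, t_prime):
        # equations 2.3 and 2.4: L1 distance summed over all surrounding contexts
        contexts = {(tl, tr) for (tl, _, tr) in trigram_freq}
        return sum(abs(norm(tl, t, tr) - norm(tl, t_prime, tr))
                   for tl, tr in contexts)

    # a small distance means NNS is a plausible representative for NN
    print(dist("NN", "NNS"))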

Chodorow and Leacock [2000] proposed an unsupervised method to detect gram-

matical errors by inferring negative evidence from the edited text corpus. They

have developed a statistical system known as Assessing Lexical Knowledge (ALEK).

ALEK was trained on a general purpose corpus of English edited text containing

examples of sentences of the target word. Depending on differences between

word’s local context cues, the system identifies inappropriate usage. ALEK infers

negative evidence from the contextual cues that do not co-occur with the target

word. The system collects contextual cues in a ±2 word window around the target

word. Function words (closed-class items) and POS tags are the two kinds of

contextual cues used by the system. Initially, sentences have been tagged using

POS tagger and then the frequency of sequences of adjacent POS tags and function

words are counted. For example, in the sentence “a/AT tall/JJ man/NN”, the occur-

rence frequency of the bi-gram sequences AT+JJ, JJ+NN, a+JJ, and unigram count

of individual POS tags and functional words are calculated. These frequencies

are the basis of their error detection measure. To determine the unusual and rare

combination of POS tags and functional words, ALEK computes Mutual Infor-

mation (MI) based measure. MI based measure is used to find combinations that

occur less often than expected. Usually, n-gram probabilities of ungrammatical se-

quences are much smaller than the product of the unigram probabilities. Then the

value of MI becomes negative. Thus a negative value of MI often indicates that a

syntactic rule is violated. The experimental result shows that ALEK performs with

80% Precision and 20% Recall.
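The MI cue that drives this detection can be sketched in a few lines of Python (ours; the probabilities below are invented for illustration):

    import math

    def mutual_information(p_xy, p_x, p_y):
        # negative MI: the pair co-occurs less often than chance would predict,
        # which ALEK treats as evidence of a violated syntactic rule
        return math.log2(p_xy / (p_x * p_y))

    print(mutual_information(p_xy=0.004, p_x=0.05, p_y=0.04))    # positive: common pair
    print(mutual_information(p_xy=0.00001, p_x=0.05, p_y=0.04))  # negative: suspicious pair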
Powers [1997] explored the concept of Differential Grammar and applied bigram frequency of POS tag sequences to discriminate be-

tween ill-formed and well-formed sentences. An empirically established threshold

value is used to decide whether a bi-gram indicates an error. A Differential Grammar

can be defined as a small set of environments that helps to distinguish between a

pair of confused words in all contexts. According to Powers [1997] the definition of

Differential Grammar is “A minimal set of syntactically significant environments that

differentiate amongst a set of possible targets.” A Differential Grammar is not actually

a linguistic grammar; it is basically designed to discriminate a token from a set

of confusable alternatives based on most likely occurrence in a given context. It

does not have a concept of rule like traditional rule oriented grammar but rather

it is very simple, more specialized and lexically-focussed. In order to differenti-

ate between correct target word and one or more incorrect confused words, this

grammar utilizes high-order N-gram statistics. The n-gram contexts are reduced

based on high frequency important tokens like words, numbers, punctuation and

affixes.

Henrich and Reuter [2009] used an n-gram based statistical approach for language-independent grammar checking. For checking a sentence, their extraction of n-grams starts with all pentagrams of tokens for the whole sentence. Then the

process continues with the corresponding quadrigrams of tokens, going on with

the trigrams of tokens and so on. If an n-gram is not found in the database, it is as-

sumed that this n-gram is wrong. An error level is calculated corresponding to the

number of n-grams which are not found in the database. The smallest erroneous

n-gram finally points to the error in the input text. All these errors are summed

up and the result is compared to an overall error threshold. If it is higher than the

threshold, then the sentence is marked as wrong. They used wildcard (*) in the

erroneous n-gram for finding most probable n-gram sequence from the training

database. They also stored temporal adverb-verb and adjective-noun agreement

for statistical analysis of the agreement of a temporal adverb with the tense of the

corresponding verb and the agreement of an adjective to its corresponding noun.
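The descending n-gram lookup can be sketched as follows (a minimal Python illustration of the scheme, assuming a set of n-grams harvested from a reference corpus; the wildcard search and agreement tables are omitted):

    def error_level(tokens, known, max_n=5):
        errors = 0
        # start with the longest n-grams and fall back to shorter ones
        for n in range(max_n, 1, -1):
            for i in range(len(tokens) - n + 1):
                if tuple(tokens[i:i + n]) not in known:
                    errors += 1  # every unseen n-gram raises the error level
        return errors

    known = {("is", "a", "good"), ("a", "good", "boy"),
             ("is", "a"), ("a", "good"), ("good", "boy")}
    print(error_level("is a good boy".split(), known, max_n=3))  # 0: below threshold
    print(error_level("a boy good".split(), known, max_n=3))     # 3: likely erroneous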

Nagata et al. [2004] proposed a simple statistical model based on conditional

probabilities of articles for detecting article errors committed by Japanese learners

in English text. Their model detects article errors based on three head words:

head verb (v), preposition (prep) and head noun (n). Initially, from the input

sentence three head words are extracted. Then all the head words are reduced

to their stem/root form and also converted to lower case. Then a quadruple like

(Ii, v, prep, n) is prepared. Now the probability of a particular article Ii given v, prep and

n is calculated as follows:

P(Ii|v, prep, n) = f(Ii, v, prep, n) / Σ_(m=1..k) f(Im, v, prep, n)    (2.5)

where symbol f denotes the frequency of occurrence of a particular tuple and k is

the total number of articles. To estimate whether a particular article class is relatively improbable in a given tuple, they formulated the following equation:

S(Ii, v, prep, n) = P(Ii|v, prep, n) / max_(1≤m≤k) P(Im|v, prep, n)    (2.6)

When S(Ii, v, prep, n) is less than some predefined threshold θ (0 < θ < 1), then

an article error is detected. To avoid the sparseness problem during probability

estimation they used backed-off smoothing [Jurafsky and Martin, 2009] technique.

Their system achieved 77% Precision, 64% Recall and F-measure of 0.70 when they

set the threshold θ to 0.334.
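Equations 2.5 and 2.6 translate directly into the following minimal Python sketch (ours; the tuple counts are invented and "null" stands for the zero article):

    # toy counts f(I, v, prep, n) for the head-word tuple (go, to, station)
    freq = {("the", "go", "to", "station"): 50,
            ("a", "go", "to", "station"): 3,
            ("null", "go", "to", "station"): 12}
    ARTICLES = ("the", "a", "null")

    def p(article, v, prep, n):
        # equation 2.5: relative frequency of the article in this tuple
        total = sum(freq.get((m, v, prep, n), 0) for m in ARTICLES)
        return freq.get((article, v, prep, n), 0) / total

    def s(article, v, prep, n):
        # equation 2.6: ratio against the most probable article
        return p(article, v, prep, n) / max(p(m, v, prep, n) for m in ARTICLES)

    THETA = 0.334
    if s("a", "go", "to", "station") < THETA:
        print("article error detected")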

Classification

In the classification-based approach, individual sentences are classified as being either

correct or incorrect using features extracted from training data. Classification ap-

proaches of different researchers differ in their use of features and classifiers

like Naïve Bayes [Mitchell, 1997], Balanced Winnow [Littlestone, 1988], Support

Vector Machine [Campbell and Ying, 2011], Voted Perceptron [Freund and Schapire, 1999], Maximum Entropy [Berger et al., 1996], Decision Tree [Mitchell, 1997; Quin-

lan, 1986] etc.

Scheler and Munchen [1996] used a feature model of the semantics of plural de-

terminers to detect and correct grammatical errors of definiteness. They used

an Artificial Neural Network to learn a function that maps the semantic feature

representation to category of indefinite/definite article. They have also provided

surface-oriented textual encoding of their text corpus to reduce the informational

content of the text without losing its essential components.

Knight and Chander [1994] used decision tree classifier over lexical features

for detection and correction of article errors in the Japanese to English machine

translation outputs. Figure 2.8 shows basic architecture of their post editing task.

A set of binary features was developed by them to characterize noun phrases.

Figure 2.8: Basic architecture of post editing after machine translation.

These binary features are either lexical or abstract which includes POS tags, plural

markers, tense and subcategories like superlative adjectives, mass nouns etc. To

build the decision tree, each feature maintains three types of measures. These

three types of measures are frequency of occurrence, distribution of a/an for noun

phrases in which the features are present and distribution for those without the

features. To choose the best feature an information-theoretic approach has been

taken. The decision tree is built depending on the datasets and the feature-based

split. Their post editing algorithm achieved an overall accuracy rate of 78% on

financial text.
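In the same spirit, a decision-tree classifier over binary noun-phrase features can be sketched with scikit-learn (a toy illustration, not Knight and Chander's feature set or data):

    from sklearn.tree import DecisionTreeClassifier

    # toy binary NP features: [has_plural_marker, has_superlative_adj, is_mass_noun]
    X = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 1]]
    y = ["the", "null", "null", "the", "null"]  # article observed in edited text

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[0, 1, 0]]))  # -> ['the'] for a singular superlative NP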

Gamon et al. [2008] used a decision tree classifier along with a language model

for determining article and prepositional errors in a sentence. They used a language

model which was trained on the Gigaword corpus. The language model was used

to provide additional information to filter out invalid suggestions. Their system

has three main components: Suggestion Provider (SP), Language Model (LM)

and Example Provider (EP). Initially, an input sentence is tokenized and POS

tagged. Then these tokens are sent to the SP module which employs decision

tree classifier for providing suggestions. All suggestions from the SP module are

collected and sent to the LM. Here the suggestions are ranked based on probability

score assigned by the LM. Finally, the EP returns example sentences containing

suggested correction by using query in the web. This information is provided to

the user to choose the suggestion and to make an informed decision about the

correction. They achieved 55% accuracy for article error detection tested on 6K

CLEC and 46% accuracy for prepositional error detection tested on 8K CLEC test

corpus.

Table 2.5: Effectiveness of Individual Features

Feature %Correct
Word/POS of all words in NP 80.41
Word/POS of w(NP-1) + Head/POS 77.98
Head/POS 77.30
POS of all words in NP 73.96
Word/POS of w(NP+1) 72.97
Word/POS of w(NP[1]) 72.53
POS of w(NP[1]) 72.52
Word/POS of w(NP-1) 72.30
POS of Head 71.98
Head’s Countability 71.85

Izumi et al. [2003] used Maximum Entropy classifier for handling insertion,

omission and replacement errors. They have tested their model on 1915 sentences

collected from Standard Speaking Test (SST) corpus and have achieved approxi-

mately 50% Recall and 60% Precision using this approach. Han et al. [2004, 2006]

have trained a Maximum Entropy classifier to select among a/an, the, or zero ar-

ticle for noun phrases, based on a set of features extracted from the local context

of each. The system was trained on 6 million noun phrases from the MetaMetric

Lexical corpus. On an average, there were about 390,000 features in their Maxi-

mum Entropy model. The system was tested on 668 TOEFL essays and achieved

90% Precision and 40% Recall. Table 2.5 shows effectiveness of individual features

used by them. De Felice and Pulman [2007] proposed a machine-learning based

approach to detect prepositional errors depending on a syntactic and semantic

context. They used a richer set of syntactic and semantic features. Their approach

suggests a preposition which is most likely to occur in that context. The context of

the prepositions which are found in an English corpus is represented by a vector

containing 307 features. They assumed that a set of 307 features may capture all the

latent elements of a sentence that may help to recognize a preposition accurately.

They selected these features based on a study of most frequent errors generated by

English learners. Head Noun, Number, Noun Type, WordNet information, Named

Entity information and ± 2 POS tag window are some examples of features they

have used. They also used additional features like whether the noun is modified

by a predeterminer, possessive, numeral and/or a relative clause or whether it is

part of a ‘there is · · · ’ phrase. To learn associations between contexts and prepo-

sitions these vectors are processed by a voted perceptron algorithm. Artificially

created test set containing preposition errors were used to test their system. They

found that their system can successfully detect between 76% and 81% of errors. Later, De Felice and Pulman [2009] used a Maximum Entropy classifier for the correction of prepositional errors in the English writing of second language learners. The feature

set used by them contains a wider range of syntactic and semantic elements, includ-

ing a full syntactic analysis of the data. Their system achieved average Precision

of 42% and Recall of 35%. Fujishima and Ishizaki [2011] proposed a method to

identify inappropriate word combinations in a raw English corpus using an unsu-

pervised algorithm based on One-Class Support Vector Machines (OC-SVMs).

Combined with n-gram language models and document categorization technique,

their OC-SVM classifier classifies a sentence into ill-formed or well-formed class.

Oyama and Matsumoto [2008] also proposed a similar approach. They combined

n-gram features and supervised document categorization technique based on the

hard margin SVM to find the learners’ error in Japanese text. But Fujishima and

Ishizaki [2011] followed an unsupervised technique to reduce the required com-

putational cost. To compare their approach, they built a supervised SVM classifier

following the approach taken by Oyama and Matsumoto [2008] and reported that

using their unsupervised algorithm they achieved almost the same prediction ac-

curacy as the supervised learning algorithm. They tested their system on 3155

selected erroneous sentences and achieved accuracy 79.30% with bigram model,

86.63% with the trigram model and 34.34% with the quadrigram model. They found that the classification accuracy of the quadrigram model is lower than that of the trigram model.

Web Counting

Empirical NLP systems rely on a large sized corpus of text in order to resolve

ambiguity. The corpus helps to determine which candidate is more frequent than

others in similar contexts. The accuracy of the disambiguation process improves with

the increase in the size of the corpus [Banko and Brill, 2001]. As the World Wide

Web (WWW) is the largest corpus till now, many researchers incorporate web

frequency counts to identify and correct writing errors made by non-native writers

of English. To correct collocation and determiner error Yi et al. [2008] proposed a

web-based proof reading methodology. Initially, an input sentence is preprocessed

using POS tagger and chunker to identify the check points. These check points

depict the context around the determiner and collocation. To find the appropriate

examples from the web, queries are generated in three granularity levels (viz.

reduced sentence level, chunk level and word level) according to the syntax of a

sentence. Generally, number of queries depends on the number of target solution

set. To find the appropriate examples of determiner from the web, a query may

look like { Wi−2 Wi−1 null Wi+1 Wi+2 } or { Wi−2 Wi−1 a Wi+1 Wi+2 } or { Wi−2 Wi−1 an

Wi+1 Wi+2 } or { Wi−2 Wi−1 the Wi+1 Wi+2 }. Since long queries have fewer web counts

than short queries, each count is multiplied with the number of words in the

query. If the weighted count is very low then the web is unable to provide enough

support to determine the existence of an error. Otherwise, ratio between weighted

count for query containing writer’s determiner and maximum weighted count for

query containing other determiner is calculated and compared to a predefined

threshold. If the ratio is smaller than the threshold then an error is flagged.

Evaluation of the system on a real-world ESL corpus reported 62% Precision and 41% Recall.
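The weighted-count comparison can be sketched as follows (a minimal Python illustration; hit_count is a placeholder for a search-engine count API, and the threshold and support values are invented):

    def hit_count(query):
        # placeholder: wrap a real search-engine count API here
        raise NotImplementedError

    def weighted_count(words):
        # long queries get fewer hits, so each count is scaled by query length
        return hit_count(" ".join(words)) * len(words)

    def check_determiner(left, writer_det, right,
                         candidates=("a", "an", "the", ""),
                         threshold=0.1, min_support=100):
        counts = {d: weighted_count(left + ([d] if d else []) + right)
                  for d in candidates}
        if max(counts.values()) < min_support:
            return None  # too little web evidence to decide
        ratio = counts[writer_det] / max(counts.values())
        return ratio < threshold  # True means an error is flagged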

Hermet et al. [2008] described a web-based frequency count algorithm to detect and correct prepositional errors in French. They use a two-

phase hybrid approach combining rule-based and statistical approaches. In the

first phase, a short expression is generated using a rule-based method in order to

capture the context around the preposition in the input sentence. In the second

phase, web searching technique is used to evaluate the frequency of this expression

by considering alternative prepositions instead of the original one. They tested

this algorithm on a corpus of 133 French sentences written by intermediate second

language learners and they achieved 69.9% accuracy. They have also reported that

when a corpus of frequent n-grams is used instead of the web, the performance of

their system degrades.

2.3.3 Statistical Machine Translation Approach

Some researchers have used monolingual Statistical Machine Translation (SMT)

paradigm to detect and correct grammatical errors in text. They have trained

parallel corpora of grammatical and ungrammatical sentences and translate from

ungrammatical to grammatical sentences using their SMT system. Brockett et al.

[2006] showed that a noisy channel model (instantiated within the paradigm of

SMT) can successfully provide editorial assistance for non-native writers. SMT

technique provides a natural mechanism for suggesting a correction, rather than

simply indicating an error flag. Their system is able to correct 61.81% of mistakes in

a set of naturally occurring examples of mass noun errors found on the World Wide

Web. Liu et al. [2008] proposed a noisy channel model along with a novel relative

position language model for correcting word order errors in sentences produced

by second language learners of Chinese. To detect word order errors, they used

SVM classifier whereas for correcting those detected errors they followed a noise

channel model. For a given erroneous sentence E having word order errors, their

model tries to find out the most probable corrected sentence using equation 2.7.

Ĉ = argmax_C P(C|E) = argmax_C P(E|C)·P(C)    (2.7)

Here, C represents a corrected sentence, P(C|E) is the reordering model and P(C) is

the probability of corrected sentence. The probability of C is estimated using a lan-

guage model derived from a large corpus of correct sentences. A weighted relative

position score is used as a language model P(C) to circumvent the limitation of cap-

turing long distance lexical relationships by a usual n-gram language model. The

reordering model estimates the transformation probability of a reordered sentence

for a given input sentence. For this model, they used probability of C generated by

PCFG (Probabilistic Context Free Grammar) as a structural transformation proba-

bility. Experimental result shows that the overall accuracy of their error detection

module is 96.7%. They used BLEU([Papineni et al., 2002]) score as the evaluation

metric to evaluate the performance of their word order error correction module.

Their result shows that their error correction methodology outperforms the usual

n-gram based approach. They also found that the proposed system’s performance,

in terms of BLEU score, can be improved by 20.3% and 26.5% when compared to n-gram and SMT-based baseline systems, respectively. Hermet and Désilets [2009]

proposed a “round-trip” bilingual SMT technique to correct preposition errors in

French learner writing. A writer’s L2 language is translated to the writer’s L1

language and then back to L2. When the round-trip MT system encounters an ill-

formed chunk in L2 language, it makes a word-by-word translation of that chunk.

Afterwards, when the system tries to translate L1 to L2, it produces a better L2

translation of that chunk than original L2 sentence. Thus using round-trip trans-

lation, errors present in the L2 sentence have been repaired. They tested their

methodology on 133 French sentences containing prepositional error and reported

66.4% accuracy using their round-trip SMT method. The performance of their

Round Trip SMT method was slightly worse than their web-count [Hermet et al.,

2008] method. Later, they proposed a hybrid method combining the round-trip

SMT and web-count. In this hybrid model, round-trip SMT works as a back-up

when their generated query using web-count method got almost zero hit. Using

this hybrid model they achieved 82.1% accuracy.

2.4 Comparison between existing approaches

A rule-based system is based on core linguistic knowledge. It depends on hand-

crafted rules generated by language experts. In this approach, it is easy to incorpo-

rate domain knowledge into linguistic knowledge which provides highly accurate

results. Other than the above mentioned advantages, a rule-based system is easy

to understand. Thus, the user can easily extend the rules for handling new error

types. Rules can be built incrementally by starting with just one rule and then

extending it. Each rule of a rule-based system can be easily configured. A rule-

based system provides detailed analysis of the learner’s writing using linguistic

knowledge and provides reasonable feedback. Such feedback helps learners to im-

prove their writing skill. Furthermore, the linguistic knowledge acquired for one

natural language processing system may be reused to build knowledge required

for a similar task in another system. Both grammatical and ungrammatical sen-

tences can be parsed using constraint relaxation. The errors in an ungrammatical

sentence can be easily identified based on the constraints which are relaxed during

parsing of the sentence. The main advantage of using mal-rules is the simplicity

with which they can generate feedback. High precision can be achieved by ap-

plying properly created constraints and mal-rules. The main disadvantage of the

rule-based approach is that the complexity of the grammar increases exponentially

as we try to solve different types of errors. Rule-based approaches need a lot of

manual effort. This increases cognitive load on the human analyst and also in-

creases the degree of ambiguity in the grammar. Moreover, constraint relaxation

technique is not well suited for parsing sentences with missing or extra words.

Constraints and mal-rules have to be pre-defined. Improperly designed mal-rules

due to casual observations of domain experts can also pose a problem. Shallow

parsing is preferable to parsing with Precision grammar when there is a dearth

of sufficient linguistic rules. It is difficult to detect the potential erroneous words

within an input sentence without using an explicit error model. Failure of parsing

does not always reliably ensure that the input sentence is ungrammatical because

the insufficient coverage of grammatical rules may also be a cause of unsuccessful

parsing. The effort required for grammatical error detection and correction varies

depending on the involvement of the error types and the grammatical context in

which the errors occur. However, one of the main disadvantages of rule-based

approach is that it requires complete grammar rules to cover all types of sentence

constructions. Though a variety of grammar formalisms is available, robust parsers with sufficient linguistic rules are still not available. Moreover, existing

rule-based parsers suffer from the curse of natural language ambiguities which un-

necessarily produce more than one parse tree even for the correct input sentence.

These are the limitations of parsing strategy.

On the other hand, Machine Learning (ML) based approaches usually rely on

large sized training data and parallel texts. When the training set and the test

set are similar, then ML approach provides good results. Data sparseness poses a

problem for ML. Due to data sparseness, many grammatical constructs may never

have been encountered. As, most of the time, an ML based system does not provide necessary comments on errors, users are usually surprised when the system predicts a correct sentence as wrong. Results of ML based systems are difficult to interpret.

Sometimes debugging the reasons of system’s failure becomes very complicated

because the results are generated by aggregating probabilities and frequencies.

Another problem is that some ML based systems rely on threshold values which

are usually estimated heuristically. Threshold may vary depending on the domain

of text where the system is trained or tested. If an erroneous word interacts with

other erroneous words then the correction of either error cannot be done indepen-

dently. Moreover, if other errors lie within the context window of an erroneous

word, then the extracted features depending on that context window may also

contain some of these errors leading to unreliable classification. Unfortunately,

corpora that are used are not large enough to cover the full range of lexical patterns

of a given language. That implies some lexical occurrences are left unexamined.

One solution is to use the World Wide Web as a linguistic corpus. An advantage

of web based grammar correction is that it is dynamic in nature. The web search

hits change with the change of language and also reflect the current state of the

language. Moreover, most of the contents of web are freely accessible. Inspite of

several advantages, it has lots of limitations. Kilgarriff [2007] correctly pointed

out several limitations of the Web Count approach for grammar correction. Firstly,

commercial search engines do not provide the root/stem or POS tag of the given

input sentence. Secondly, there are constraints on numbers of queries and numbers

of hits per query. Thirdly, search hits are for pages, not for instances. Last but not

the least, web count results vary for different search engines.

In contrast to Rule-based approach, SMT approach does not rely on handcrafted

complex linguistic rules or regular expressions. Therefore, little or no linguistic

expertise is required for developing and maintaining applications. SMT approach

heavily relies on the availability of large amount of parallel training sentences. The

expense and difficulty of collecting large quantities of raw and annotated learners’

parallel corpora pose an obstacle to this approach.

2.5 Open Problems and Future Directions

Despite existing approaches, reliable grammatical error detection and correction is

still a very difficult task. We cannot simply apply the existing approaches for our

Bangla grammatical error detection and correction task. As discussed in Chapter

1, Bangla is a morphologically rich language [Bhattacharya et al., 2005; Dandapat

et al., 2004] and has free word order. State-of-the-art CFG is not applicable [Shieber,

1985; Begum et al., 2008; Bharati et al., 2010] here. In addition to this, lack of robust

parsers, insufficient linguistic rules and dearth of error annotated parallel corpora

make this grammar correction task much more challenging.

We prefer Natural Language Generation (NLG) [Dale et al., 1990; Reiter and

Dale, 2000; Hovy, 1991; Dale et al., 1998] approach instead of Natural Language

Understanding (NLU) [Allen, 1987]. The main reason behind preferring this ap-

proach is that we need not model the ungrammatical sentences as has been done

in classification based or statistical machine translation based approach. Broad

coverage linguistic rules are also not required, unlike in a rule-based system. This sys-

tem is suitable where robust parsers and linguistic rules are not available. The

NLG based approach maps non-linguistic representations to natural language ex-

pressions. Any system based on this approach identifies the main keywords in a

sentence and then reconstructs the sentence from these keywords. This technique

is suitable for erroneous sentences where major corrections are required. The as-

sumption behind this approach is that the user can supply the important key words

of the sentence, even if the user is unable to write a grammatically correct sentence.

It consists of two main steps. Initially, without considering grammatical errors and

other noise, the NLG based system extracts a meaning representation from the input sentence; then, from the meaning representation, it generates a grammatical

sentence. Baptist and Seneff [2000] followed NLG approach for their conversa-

tional system named GENESIS. We have applied NLG approach for our Bangla

grammatical error detection and correction which will be discussed in Chapter 4.

CHAPTER 3

AUTOMATIC CREATION OF BANGLA ERROR

CORPUS

“A collection of texts assumed to be representative of a given language, or other subset of

a language, to be used for linguistic analysis.” – Francis [1982]

A sufficiently large error corpus is essential for training and testing of any

grammar correction methodology. There is a dearth of error-annotated learner

corpus of Bangla text. One of the major problems of building error corpus from

learners’ data is that the process is very time consuming. It also requires linguistic

knowledge to examine each sentence of learners’ text to determine nature and

frequency of errors. To overcome this problem, a corpus of ungrammatical Bangla

sentences has been created automatically considering performance errors and lan-

guage learning errors that occur frequently. This chapter is more closely aligned

to the task of automatic error corpora creation. Before starting our discussion on

automated error corpus creation methodology, we illustrate types of text errors

committed by Bangla Second Language Learners at the time of writing text.

3.1 Errors in Text

Bangla Second Language Learners often commit grammatical mistakes while writ-

ing text because of their lack of language knowledge (Language Learning Error)
and due to oversight, carelessness or tiredness (performance error). Performance

errors can occur mainly due to four operations: insertion, deletion, transposition

and substitution. When an error involves more than one operation, it is known as

Composite Error. There are two primary concerns at the time of automatic error

corpus creation: the first is to be linguistically realistic and the second is to mimic the error scenarios that occur naturally. To analyse the kind of naturally

produced error scenario we have collected 1500 sentences from 10 standard native

students’ exam papers of Bangla and also have collected second language learn-

ers’ data from students whose first language is either Hindi or Oriya or Telegu.

Performance errors and language learning errors that occurred in their text are then

carefully analysed. Exam papers are collected with the assumption that students

make more mistakes at the time of examination as they are usually in a hurry to

complete their answers within the limited time period. In the course of studying

Second Language Learners' text, it has been found that the proportion of errors caused by the substitution operation is much higher than that of any other operation. We have

seen that substitution errors and deletion errors committed by second language

learners are 14% and 18% higher than native speakers. However, an interesting

observation was that insertion errors committed by native speakers are much higher

(21%) than second language learners. The proportion of transposition errors com-

mitted by second language learners and native speakers is much less than for any other operation. Transposition errors committed by native speakers are slightly higher (4%) than those by second language learners.

lected from native speakers' and second language learners' real data, we found that native speakers committed 13.5% errors whereas the percentage of errors committed by second language learners is 34%.

Figure 3.1 shows the proportion of performance errors caused by each of the

four operations.

Figure 3.1: Proportion of Errors in Native Speakers and Second Language Learners Corpus.

The Native Speakers and the Second Language Learners make the same kinds of mistakes, such as misuse of punctuation and cohorts/homophones.

But study [Leacock et al., 2010] shows that Second Language Learners make many

more mistakes than native speakers. Most frequent error types produced by native

speakers may not be produced by second language learners. For example, errors

generated while writing complex sentences are infrequent for language learners,

as most of the time language learners avoid writing complex sentences. They

write complex sentences only when they have enough confidence in their ability

to construct them correctly. Second Language Learners can be of two types viz.

L1 and L2. The kinds of errors produced by L1 Language Learners are influenced

by their native language. When native languages are similar but not identical,

L1 produces errors due to negative transfers. They fail to find exact equivalence

between these two languages. On the other hand, L2 Language Learners produce

errors because of their incomplete knowledge of syntactic and/or morphological

irregularities. They face trouble due to the novelty of the new language [Leacock

et al., 2010]. After analyzing the collected Bangla second language learners’ data

we came to know that the above statements (quoted in [Leacock et al., 2010]) are

also true for Bangla language. Therefore, learners who learn Bangla language

Table 3.1: Examples of errors committed by a Bangla Second Language Learner

Example 1 (Operation: Substitution)
Bangla: aami baajaare eka iimaanadaara puruSha dekhalaama
English: I saw an honest man in the market
Comment: The user did not find a suitable Bangla word for "honest".

Example 2 (Operation: Substitution)
Bangla: aami ka.Daka chaa khaaba
English: I will drink strong tea
Comment: The user did not find a suitable Bangla word for "strong".

Example 3 (Operation: Substitution)
Bangla: aapani ki sochachhena ?
English: What are you thinking ?
Comment: The Bangla root verb is replaced by a Hindi root verb.

Example 4 (Operations: Substitution, Transposition and Deletion)
Bangla: duudhabaalaa jala sa.nge duudha milaaYa
English: Milkman mixes water with milk
Comment: The nominal inflection "er" of "jaler" is deleted; "milaaYa" is substituted in place of "mishaYa"; "jalera sa.nge duudha" is transposed in place of "duudher sa.nge jala".

having the background of Oriya, Assamese or Hindi as native language produce

different kinds of errors than learners having native languages like Malayalam,

Tamil, Telegu or English. We have classified the types of errors according to

the operations involved in performance error and also depending on language

learning errors. Table 3.1 shows examples of errors committed by a Bangla Second

Language Learner having mother tongue Hindi. Figure 3.2 shows taxonomy of

errors found in Bangla text of second language learners. We shall now elaborate below the different kinds of errors made by second language learners.

Figure 3.2: Taxonomy of errors found in Bangla text of second language learners. The taxonomy distinguishes Operational Errors (Transposition; Addition: Repeated Word, Unnecessary Word; Deletion: Implicit Subject, Implicit Object; Substitution: Cohort Replacement) from Grammatical Errors (Tense Error, Person Error, Case Error, Adjectival Suffix Error, Pronoun Error, Sentence Fragment Error, Subject-Verb Agreement Error and Count Error).

1. Transposition Operation:
• Incorrect Sentence:

Bangla: theke gaachha phala pa.De


English: from tree fruit falls.
Here the postposition theke (from) is placed before the noun gaachha (tree).
• Correct Sentence:
Bangla: gaachha theke phala pa.De
English: Fruit falls from tree.

2. Addition Operations:
(a) Repeated words:
Bangla: aami ekati *bhaala bhaala Chele
English: I am a *good good boy.
(b) Unnecessary words:
Bangla: paramaaNu anu apekShaa *adhika kShudratara
English: atom is *more smaller than molecule.

3. Deletion Operations:
(a) Implicit Subject:
Bangla: *[ ] tomaara maÑgala karuna (Subject:iishbara is missing here)
English: May *[ ] bless you. (Subject: God is missing here)
(b) Implicit Verb:
Bangla: tumi ki maadhyamika pariikShaa *[ ] ? (Verb: debe is missing here)
English: Will you *[ ] matriculation exam? (Verb: give is missing here)

4. Substitution Operations:
(a) Similar word or Cohort replacement:
(* indicates an error word in the sentence)

• Incorrect Sentence:
Bangla: *bale baagha thaake
English: *tell tiger lives
• Correct Sentence:
Bangla: bane baagha thaake
English: Tiger lives in forest
Here bale and bane are cohorts of each other, but bale is a verb and bane is a noun. In the literature this type of error is also known as a real-word spelling

Types of Grammatical Errors

1. Tense Error:
• Example 1:
Bangla: aami prashnapatra pa.Daba o uttara diYechhilaama
English: I will read the question paper and I gave the answer.

• Example 2:
Bangla: gatakaala aami sinemaa Jaaba
English: Yesterday I will go to Cinema.

• Example 3:
Bangla: Jakhana aami darajaa khulachhilaama takhana se ghare Dhuke pa.Dechhila
English: When I was opening the door then he entered the room.

2. Person Error:

• Example:
Bangla: chhaatraraa nishchaYa bidyaalaYa Jaabe Jadi *se pariikShaa dite
chaaYa
English: students will definitely go to school if *he wants to appear in the exam.
The plural sense of 'students' has been lost by the singular representation 'he'.

3. Case Error:

• Example:
Bangla: eTaa *kaakaaraa bai
English: This is uncle’s book
The genitive case suffix ra of the noun kaakaa (uncle) has been wrongly changed to raa.

4. Adjectival Suffix Error:

• Example:
Bangla: *daYaamaYii shikShaka aasachhena
English: The kind-hearted teacher is coming
The feminine suffix maYii of the word daYaa (kindness) has been used in place of the masculine suffix maYa, which goes with shikShaka (male teacher).

5. Improper use of punctuation:

• Example 1:
Bangla: tomaara naama ki |
English: What is your name .
Here the punctuation | is used instead of the ‘?’ symbol.

• Example 2:
Bangla: aami*, dekhalaama se aasachhe |
English: I, saw he is coming.

6. Sentence Fragment:

• Example:
Bangla: aami gaana gaa_iba *| jadi tumi naacha |
English: I will sing. if you dance.

7. Invalid Subject-Verb agreement:

• Example:
Bangla: aami bhaata *khaabena
English: I eat rice
Here the subject aami (I) is the first person non honorific but the person
information of the verb khaabena (eat) is third person honorific.

8. Count Error:

• Example:
Bangla: aamaara tinajana bandhu aachhe : jaYanta, raajiiba, debaaruna o
saurabha |
English: I have three friends: Joyanta, Rajib, Debarun and Saurabh.

3.1.1 Previous Work

Stemberger [1982] investigates the performance errors of native speakers' spoken language and reports the proportion of the four types of errors as follows: substitution

(48%) > insertion (24%) > deletion (17%) > combination (11%). Foster [2005] has

manually created an error corpus for English and has classified missing word

errors based on the Part of Speech tag of this missing word. According to her “98%

of the missing POS come from the following list (the frequency distribution in the

error corpus is given in brackets): determiner (28%) > verb (23%) > preposition

(21%) > pronoun (10%) > noun (7%) > to (7%) > conjunct (2%)”. But manual creation of such a corpus is a very time-consuming and non-trivial task. Brockett

et al. [2006] created an artificial error corpus by introducing mass/count noun

errors. They treated the error correction task in the machine translation point of

view. Their aim was to apply Statistical Machine Translation (SMT) technique

for converting ungrammatical sentences containing mass/count noun errors to

grammatical sentences. Wagner et al. [2007] have suggested a novel approach

of automated error corpus creation. They have carried out a detailed analysis of

Missing Word Errors, Extra Word Errors, Agreement Errors and Covert Errors. Lee

and Seneff [2008] created artificial error corpora by introducing verb form errors.

To mimic the real life errors, Foster and Andersen [2009] designed the GenERRate

tool. Their algorithm generates error corpus by introducing error along the line of

the previously specified real life error templates.

3.2 Experimental Dataset

Various online resources are available nowadays from which Bangla Unicode

sentences can be collected. These include -

1. Bangla online news papers like “Ananda Bazar Patrika” (http://www.anandabazar.


com/)
2. Online version of Bangla literatures written by Rabindranath Tagore, Sarat
Chandra Chattapadhyay and Bankim Chandra Chattapadhyay (http://

www.nltr.org/) published by Society for Natural Language Technology Re-
search (SNLTR).

3. Bangla blogs (http://www.amarblog.com/) etc.

Special care needs to be taken at the time of selecting well-formed sentences

due to different reasons. In Bangla literature, diglossic variations are found in

the form of “Sadhu” and “Chalit”. Sentences written in “Sadhu” are mostly found

in the writings of Bankim Chandra Chattapadhyay and in the writings of Rabindranath Tagore

and Sarat Chandra Chattapadhyay. Sentences written in “Sadhu” are not used in

day-to-day communication. On the other hand, most recent works follow “Chalit”

form as it is used in daily communications. Due to informal communication,

Bangla blogs contain sentences in “Benglish” (a mixed language of Bangla and English) [Kundu and Chandra, 2012]. Sentences written in “Sadhu” and

“Benglish” are not important in our case, as our focus is to detect and correct grammatical errors in sentences used in day-to-day written communication. Therefore, at

the time of sentence selection from online websites (like http://www.nltr.org/,

http://www.amarblog.com/ etc.), we have manually filtered out the sentences

written in “Sadhu” form and “Benglish” language. In addition to that, we have

collected sentences from a detective novel, namely “Feluda Samagra” written by

Satyajit Ray. The reason behind selecting the novel is that sentences are written

in “Chalit” and most of the sentences are simple and representative of those that

are used in day-to-day communication. We have also collected sentences from

“Jekhane Dactar Nei” a Bengali book translated from English work “Where There

is no Doctor”. Thus, we have collected Unicode sentences from various domains

including Literature, Sports, Health, Politics and Business (2005-2012).

We assumed that the syntax and semantics of the collected sentences are correct

as they are mostly collected from different newswires which are normally edited

and proof-read. Corpora from multiple domains have been collected to avoid the

skewed distribution of data. From this set of collected Bangla sentences (approx

4,68,582), the sentence length distribution has been measured. It is found that sen-

tences containing 9 words are the most frequent in this corpus. Figure 3.3 shows

the Bangla Sentence length distribution.

Figure 3.3: Bangla Sentence Length Distribution.

3.3 Methodology

Now we will discuss our novel approach for error corpus generation. The proce-

dure is as follows:

Step-1 If a grammatical sentence contains n words, then transposition between two consecutive words can generate (n-1) sentences, with the assumption that only one transposition is done in each sentence. Table 3.2 shows 3 sentences generated from a sentence containing 4 words. Though the last two examples

Table 3.2: Examples of Transposition Operation.

Operation Example
Source gaachha theke phala pa.De2
Transposition-1 theke gaachha phala pa.De
Transposition-2 gaachha phala theke pa.De
Transposition-3 gaachha theke pa.De phala

in the table are grammatically correct, transposition-2 is semantically

weird and transposition-3 is relatively uncommon.

Step-2 Transposition of highly collocated sequences surely induces noise in a

grammatical sentence. Erroneous sentences have been automatically gener-

ated by changing the word order of different types of Bangla collocated word sequences collected from the corpus. We distinguish between the following

three categories: echo words (if w1 w2 is a word sequence, w2 has no meaning of its own), hyphenated words (w1 and w2 are connected by a hyphen) and highly collocated words. Extraction of echo words and hyphenated words is simple. One can use a simple regular expression [a-zA-Z]+-[a-zA-Z]+ for collecting hyphenated words from the corpus and [\s]([a-z]([a-z]+)\s+[a-z]\2)[\s]3

for collecting echo words. For collecting collocated and co-occurring word sequences from the corpus, a statistical approach [Manning and Schütze, 1999] has been used. The variance (σ2) of the number of words separating word w2 from word w1 has been estimated, and low-variance word sequences have been filtered using a statistical significance test (t-test) at the 99.5% confidence level.

The null hypothesis H0 is that the word sequences (w1 w2 ) appear indepen-

dently in the corpus. These filtered word sequences are cross verified with
2 Bangla Sentence: gaachha theke phala pa.De; English Word Meaning: Tree from fruit fall; English Translation: Fruit falls from tree.
3 Python regex notation has been used here. The pattern will match word pairs where the first characters of the two words differ and the remaining character sequences are identical. For example, in the word pair “nardamaa Tardamaa”, only the first character differs and the rest of the character sequence is the same for both words.

Mutual Information (MI) values between w1 and w2. The word sequences having higher Mutual Information, lower variance and a t-value

greater than 2.57 (considering α=0.005) have been considered as collocated

words. MI between words w1 and w2 has been estimated as follows:

\[ MI(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\,p(w_2)} \tag{3.1} \]

where \( p(w_1, w_2) = \frac{Count(w_1, w_2)}{N} \), \( Count(w_1, w_2) \) is the number of sentences in which w1 and w2 co-occur, and N is the number of sentences in the training corpus. The probabilities in the denominator of equation 3.1 are calculated accordingly.
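To make this step concrete, the following is a minimal Python sketch (the thesis already uses Python regex notation) of the extraction and filtering described above. The function names and default thresholds are illustrative, and adjacent-bigram counts stand in for the sentence-level co-occurrence counts actually used here.

import math
import re
from collections import Counter

def hyphenated_and_echo(text):
    """Collect hyphenated and echo word pairs with the regexes of Step-2."""
    hyphenated = re.findall(r'[a-zA-Z]+-[a-zA-Z]+', text)
    echo = re.findall(r'\s([a-z]([a-z]+)\s+[a-z]\2)\s', text)
    return hyphenated, [pair for pair, _tail in echo]

def collocations(sentences, t_threshold=2.57, mi_threshold=0.0):
    """t-test plus Mutual Information filter for candidate collocations."""
    unigrams, bigrams, n = Counter(), Counter(), 0
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        n += len(tokens)
    kept = []
    for (w1, w2), c12 in bigrams.items():
        x_bar = c12 / n                                # observed bigram probability
        mu = (unigrams[w1] / n) * (unigrams[w2] / n)   # expected under H0 (independence)
        t = (x_bar - mu) / math.sqrt(x_bar / n)        # s^2 approximated by x_bar
        mi = math.log2(x_bar / mu)
        if t > t_threshold and mi > mi_threshold:      # t > 2.57 for alpha = 0.005
            kept.append((w1, w2, round(t, 2), round(mi, 4)))
    return kept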

Step-3 Another way of generating erroneous sentences is by replacing a word with

its cohorts and homophones. Cohorts are generated using regular expression

by adding, deleting or substituting a single character or moving character

sequences in a word. These generated words are then verified with spelling

dictionary to ensure that the generated words are correctly spelled. In this

process, if we assume that, on average, k words/cohorts can be generated from a single word, then k x n sentences can be generated from a

sentence containing n words. The Levenshtein [1966] distance can also be used to prune the over-generated cohort words. Words having minimum edit distance with

the original word are selected for the cohort list.
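A minimal sketch of this cohort generation is given below, assuming a set-based spelling dictionary; the dictionary contents and the transliterated alphabet are hypothetical placeholders.

def cohorts(word, dictionary, alphabet):
    """Generate spelling-valid cohorts of `word` that are one edit away."""
    candidates = set()
    for i in range(len(word) + 1):
        head, tail = word[:i], word[i:]
        if tail:
            candidates.add(head + tail[1:])                      # deletion
        for ch in alphabet:
            candidates.add(head + ch + tail)                     # insertion
            if tail:
                candidates.add(head + ch + tail[1:])             # substitution
        if len(tail) > 1:
            candidates.add(head + tail[1] + tail[0] + tail[2:])  # move characters
    candidates.discard(word)
    return candidates & dictionary   # keep only correctly spelled cohorts

# Example with a hypothetical dictionary; yields {'maachha', 'gaachhe'}:
print(cohorts("gaachha", {"maachha", "gaachhe", "phala"},
              "abcdefghijklmnopqrstuvwxyz"))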

Step-4 By deleting a particular word from a sentence containing n words, we can generate n sentences, each containing (n-1) words. Table 3.3 shows 4 sentences generated from a sentence containing 4 words, where each generated sentence contains 3 words.

Table 3.3: Examples of Deletion Operation.

Operation Example
Source gaachha theke phala pa.De
Deletion - 1 theke phala pa.De
Deletion - 2 gaachha phala pa.De
Deletion - 3 gaachha theke pa.De
Deletion - 4 gaachha theke phala

Table 3.4: Examples of Addition Operation.

Operation Example
Source gaachha theke phala pa.De
Addition - 1 W gaachha theke phala pa.De
Addition - 2 gaachha W theke phala pa.De
Addition - 3 gaachha theke W phala pa.De
Addition - 4 gaachha theke phala W pa.De
Addition - 5 gaachha theke phala pa.De W

Step-5 By adding a word from a vector

\[ W = \begin{pmatrix} w_1 \\ w_2 \\ w_3 \\ \vdots \\ w_V \end{pmatrix} \]

in the (n+1) possible positions of a sentence containing n words, we can generate V x (n+1) sentences, where V is the length of the vector. Here we consider that only one word is inserted at a time. Table 3.4 shows the sentences generated by the addition operation. Thus, applying step-1 to step-5, we can generate approximately (n-1) + k x n + n + V x (n+1) sentences from a sentence containing n words.
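The transposition, deletion and addition operations of steps 1, 4 and 5 amount to the following sketch (substitution, step-3, was sketched above); the single-word vocabulary is only for illustrating the counts.

def transpositions(tokens):
    """Step-1: swap each pair of consecutive words -> (n-1) variants."""
    return [tokens[:i] + [tokens[i + 1], tokens[i]] + tokens[i + 2:]
            for i in range(len(tokens) - 1)]

def deletions(tokens):
    """Step-4: drop each word in turn -> n variants of (n-1) words."""
    return [tokens[:i] + tokens[i + 1:] for i in range(len(tokens))]

def additions(tokens, vocabulary):
    """Step-5: insert each vocabulary word at each of the n+1 positions."""
    return [tokens[:i] + [w] + tokens[i:]
            for i in range(len(tokens) + 1) for w in vocabulary]

sentence = "gaachha theke phala pa.De".split()
variants = (transpositions(sentence) + deletions(sentence)
            + additions(sentence, ["aama"]))
print(len(variants))   # (n-1) + n + V*(n+1) = 3 + 4 + 5 = 12 for n=4, V=1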

Step-6 Figure 3.4 shows an N x N tag association matrix which is generated after analyzing 5000 manually POS tagged Bangla sentences having different syntactic categories. Every possible sequence of two POS tags is searched for programmatically in this tagged corpus. On a successful match, the cell of the matrix corresponding to the tag sequence is filled with 1; otherwise the cell contains 0. A cell with zero value indicates an invalid relationship, i.e. the POS tag of column Ni cannot occur after the tag of row Nj; in other words, the POS tag Ni does not follow tag Nj. For example, a post position (PPS) cannot appear after an intensifier (INT). Consulting this matrix, mal-rules can be generated which can be used for transposition of the word sequence of a sentence after it has been annotated by an automatic POS tagger. A description of our POS tagger is given in subsection 3.3.1.

Figure 3.4: POS tag association matrix.
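A sketch of how such a matrix and the resulting mal-rules might be derived is shown below; the function names and data layout are illustrative.

def tag_association_matrix(tagged_sentences, tagset):
    """Build the binary tag association matrix of Step-6.

    `tagged_sentences` is a list of POS-tag sequences, e.g. [["CN", "PPS", ...], ...].
    matrix[a][b] == 1 means tag b has been observed immediately after tag a.
    """
    matrix = {a: {b: 0 for b in tagset} for a in tagset}
    for tags in tagged_sentences:
        for prev, nxt in zip(tags, tags[1:]):
            matrix[prev][nxt] = 1
    return matrix

def mal_rules(matrix):
    """Tag pairs never observed in the corpus: candidate invalid sequences."""
    return [(a, b) for a, row in matrix.items() for b, v in row.items() if v == 0]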

3.3.1 Bangla POS Tagger

In this research, we have used a HMM based POS tagger [Dandapat and Sarkar,

2006; Rabiner and Juang, 1993; Van Gael et al., 2009; Cutting et al., 1992] which has

been developed in our lab (http://nlp.cdackolkata.in/nlpcdack/POSTAG/index.spy).

The POS tagger has been trained on 5345 annotated sentences having 13215 unique

words. When only a small number of annotated sentences is available, a smaller tagset is preferred [Bharati et al., 2006]. It has been seen that sentences annotated with fewer tags lead to more efficient machine learning. Moreover, when the number of tags is smaller, the accuracy of manual tagging is higher due to less disagreement among annotators [Bharati et al., 2006]. However, the tagset should

not be so coarse that important lexical and grammatical information encoded in the sentence is missed out. Standardization of tagsets for Indian languages is a very challenging task. Studies related to these issues have already been reported

in [Bharati et al., 2006; Sankaran et al., 2008] and tagsets have been designed for

Bangla language based on these studies. We have collected 5345 raw sentences

from MIT Bangla corpora4. Initially we decided to have a finer tagset (containing 90 tags), but later we came up with a comparatively coarse tagset having only 14 basic tags (see Table 3.5). Our final tagset has been prepared after con-

sulting and comparing with available tagsets like the Penn tagset5 [Santorini, 1990],

tagset designed by IIIT Hyderabad6 [Bharati et al., 2006], the BIS POS tagset [Dash,

2013] and tagset reported in [Sankaran et al., 2008]. The sentences which had been

previously annotated with finer categories and other tagsets are automatically an-

notated with these 14 tags. Then, errors induced during such automatic mapping

are manually verified and corrected. Thus, 5345 sentences are manually annotated

using 14 basic tags.

Our test set contains 500 Bangla sentences having 3228 unique

words. The number of unknown words7 in our test set is 1392. Table 3.6 shows

the POS tag distribution in our training and test corpus. Table 3.7 shows the accuracy of each individual POS tag on our training and test sentences. Table 3.8 shows the top three
4 http://tdil.mit.gov.in/
5 http://www.cst.dk/mulinco/filer/PennTreebankTS.html
6 http://ltrc.iiit.ac.in/nlptools2010/files/documents/POS-Tag-List.pdf
7 The term “Unknown words” means the number of unique words that are not found in the training corpus.

Table 3.5: POS tags used in our tagger

POS Tag Description


PN Proper Noun
CN Common Noun
JJ Adjective
RB Adverb
PR Pronoun
VBF Finite Verb
VNF Non Finite Verb
VBN Verbal Noun
INT Intensifier
PPS Post Position
CC Conjunct
IND Indeclinable
DGT Digit
PUNC Punctuation

Table 3.6: POS tag distribution in our training and test corpus

POS Tag Distribution in Training set (%) Distribution in Test set (%)
CN 20.79 29.72
PUNC 14.20 14.97
JJ 12.81 7.19
VBF 11.73 13.38
PN 8.89 7.37
PR 5.65 7.5
CC 5.33 4.6
VBN 4.61 1.65
RB 3.34 4.25
VNF 2.68 0.08
IND 2.59 2.22
PPS 0.82 2.25
DGT 0.77 0.03
INT 0.60 4.79

Table 3.7: Accuracy of individual POS tag using HMM

POS Tag Accuracy in Train set (%) Accuracy in Test set (%)
PUNC 99.96 96.13
CN 99.35 95.24
PR 96.97 91.43
JJ 96.35 89.28
VBF 95.98 85.93
PPS 95.96 83.40
CC 95.58 81.29
VNF 94.45 79.00
VBN 93.95 82.24
RB 90.66 81.06
IND 90.56 76.67
INT 89.42 73.05
DGT 80.43 66.30
PN 3.93 11.30

Table 3.8: Three most common types of errors

Actual Tag Predicted Tag Prediction Error (%)


PN CN 88.7
DGT PN 26.95
INT JJ 20.07

wrong predictions by our POS tagger. The reason behind such wrong predictions

is the small number of occurrences of these tags in our training corpus. The lexical gap [Manning, 2011] in the training corpus and the number of unknown words in the test corpus are other reasons for wrong predictions. From Table 3.8 we can see that often

PN tag is predicted as CN. Both PN and CN are nouns in the broader category.

Words tagged as PN or CN are agglutinated with similar morphological suffixes.

Therefore, wrong prediction of PN as CN does not hamper the performance of our

work. However, prediction of INT as JJ and DGT as PN are serious issues that need

to be considered. To disambiguate INT from JJ we have searched the words in the

Bangla word-tag dictionary. If the words belong to a closed set of INT containing

words like “ati”, “bhIShana”, “khuba” etc., then we simply tag the words as INT. Similarly, if a word follows a digit pattern like [0-9]+, [0-9]+/[0-9]+/[0-9]+ or [0-9]+[a-zA-Z]+, the word is tagged as DGT. Thus, applying linguistic and pattern matching rules after our tagging module, we reduce the errors of our POS tagging.
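These post-processing rules might look as follows; the intensifier list shown is only a partial illustration of the closed set.

import re

INTENSIFIERS = {"ati", "bhIShana", "khuba"}   # partial closed set of intensifiers
DIGIT_PATTERNS = re.compile(r"^(?:[0-9]+|[0-9]+/[0-9]+/[0-9]+|[0-9]+[a-zA-Z]+)$")

def postprocess(word, hmm_tag):
    """Rule-based correction applied after the HMM tagger (a sketch)."""
    if word in INTENSIFIERS:
        return "INT"                 # INT wrongly tagged as JJ
    if DIGIT_PATTERNS.match(word):
        return "DGT"                 # digits wrongly tagged as PN
    return hmm_tag

print(postprocess("khuba", "JJ"))        # -> INT
print(postprocess("12/05/2013", "PN"))   # -> DGT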

3.3.2 Confidence Score and Mal-rule Filters

Following the above mentioned procedure, we can generate erroneous sentences

from a corpus of grammatical sentences. Our procedure generates approximately

(n-1)+ k x n + n + V x (n+1) sentences from a sentence containing n words.

Therefore, the number of generated sentences using this method increases with

the number of words in a grammatical sentence. We have seen that the mode of

the sentence length distribution of our collected Bangla corpora is 9. This implies

that the number of sentences generated by our procedure is 8 + k x 9 + 9 + V x 10; that many sentences can be generated from a single sentence having 9 words. If we have 33513 9-word sentences in our corpus of approximately 4,68,582 grammatical sentences, then 33513 x (8 + k x 9 + 9 + V x 10) sentences can be generated

using our method. Some Bangla sentences may have as many as 57 words but we

are not considering such cases as such sentences are very infrequent (see Figure

3.3). Moreover, as Indian languages are relatively free word order, some valid

well-formed sentences also get generated after this noise induction procedure.

Therefore manual filtering of ungrammatical sentences from this set of (n-1)+ k x

n + n + V x (n+1) sentences is a tedious task. Proper sampling is required so that

sentences indicative of more frequently made errors have higher probability of

getting selected. Therefore we have applied both rule-based and statistical based

approach for collecting significant sample from this population. Initially we pass

the sentences though our HMM based POS tagger (see subsection 3.3.1)and then

generated tag sequences are passed through mal-rule detector which collect the

sentences containing improper POS tag sequences. We also have calculated the

confidence score of each sentence by calculating bigram, Mutual Information (MI)

and Relative Position Score [Liu et al., 2008]. A numeric score is assigned to deter-

mine the quality of the sentence. The sentence-level confidence measure is based

on the score of each and every individual word in the sentence. Confidence score

estimation using N-gram, measures the grammatical soundness of the sentence

and MI based confidence score, measures the lexical consistency [Raybaud et al.,

2009]. MI is used to detect whether the presence of one word reduces the uncertainty about the appearance of another word in the same sentence. The confidence score of a sentence

using MI has been calculated as follows:

\[ Score(S) = Score(w_1, w_2, w_3, \cdots, w_n) = \frac{1}{n}\sum_{i=1}^{n} Score(w_i) = \frac{1}{n}\sum_{i=1}^{n}\frac{\sum_{j=1,\, j \neq i}^{n} MI(w_j, w_i)}{n-1} \tag{3.2} \]

Here MI(w_j, w_i) is calculated using equation 3.1. MI based confidence measures

do not take word order into account and instead focus on long range lexical

relationships. For this reason, we have also estimated the relative position based

confidence score. Confidence score of a sentence using Relative Position Score [Liu

et al., 2008] has been calculated as follows:

\[ RP_{score}(S) = RP_{score}(w_1, w_2, w_3, \cdots, w_n) = \frac{\displaystyle\sum_{j=2}^{n} \frac{1}{j-1} \sum_{i=1}^{j-1} \frac{freq_{Dep}(w_i, w_j)}{freq_{Ind}(w_i, w_j)}}{n-1} \tag{3.3} \]

where freq_Dep(w_i, w_j) is the number of sentences in which w_i and w_j co-occur with the constraint that w_j appears after w_i in the sentence, and freq_Ind(w_i, w_j) is the number of sentences in which w_i and w_j co-occur without any positional constraint.
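The two estimators of equations 3.2 and 3.3 can be sketched as below; mi, freq_dep and freq_ind are assumed to be lookups into precomputed corpus statistics rather than part of the thesis's own interface.

def mi_score(tokens, mi):
    """Sentence-level MI confidence (equation 3.2); `mi(a, b)` is MI of eq. 3.1."""
    n = len(tokens)
    word_scores = [
        sum(mi(tokens[j], tokens[i]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return sum(word_scores) / n

def rp_score(tokens, freq_dep, freq_ind):
    """Relative Position confidence (equation 3.3)."""
    n = len(tokens)
    total = 0.0
    for j in range(1, n):              # 0-based j corresponds to j = 2..n
        inner = sum(freq_dep(tokens[i], tokens[j]) / freq_ind(tokens[i], tokens[j])
                    for i in range(j))
        total += inner / j             # 0-based j equals (j-1) in 1-based notation
    return total / (n - 1)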

Information has been used for proper selection of the erroneous sentences generated by the substitution operation. Low Mutual Information ensures that a word in the

sentence is wrongly placed in the context of the other words. Bigram and Rel-

ative position scores have been used to select the erroneous sentences generated

by transposition operations. We have three confidence thresholds θbigram , θMI and

θRS for each of the three metrics. The range of the estimated scores varies with

the number of words in a sentence and selection of the confidence score estimator.

Therefore, we have normalized the scores generated by each estimator so that the confidence values lie between 0 and 1. The normalization has been done using

the following formula:

\[ \text{Normalized Score} = \frac{\text{Actual Score} - \text{Minimum Score}}{\text{Maximum Score} - \text{Minimum Score}} \tag{3.4} \]

Erroneous sentences generated by substitution operation are selected if their nor-

malized MI score is less than θMI . Similarly, erroneous sentences generated by

transposition operations are selected if their normalized bigram score is less than

θbigram and normalized RS is less than θRS . It implies that bigram and RS are

combined by a logical AND operation. These confidence thresholds are selected

Table 3.9: Experiment with confidence thresholds for generating erroneous sen-
tences generated by substitution operation

θMI Precision Recall F-Score
0.1 0.9 0.36 0.514286
0.2 0.9 0.367347 0.521739
0.3 0.9 0.367347 0.521739
0.4 0.9 0.367347 0.521739
0.5 0.88 0.458333 0.60274
0.6 0.911765 0.632653 0.746988
0.7 0.925 0.787234 0.850575
0.8 0.82 0.87234 0.845361
0.9 0.84 0.823529 0.831683

experimentally. Table 3.9 and Table 3.10 show the change of Precision and Recall of automatic error corpus creation with the confidence thresholds. We have seen that the automatic error corpus creation methodology achieved its highest F-Score when θMI

= 0.7 (for sentences generated by substitution operation), θbigram = 0.5, and θRS = 0.9

(for sentences generated by transposition operation). The error corpora creation

procedure with an English example is shown in Figure 3.5.
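The normalization and threshold-based selection just described amount to the following sketch; the candidate representation is an assumption, and the thresholds are the best values reported in Tables 3.9 and 3.10.

def normalize(scores):
    """Min-max normalization of equation 3.4, mapping scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def select_errors(candidates, theta_mi=0.7, theta_bigram=0.5, theta_rs=0.9):
    """Keep candidate error sentences whose normalized scores fall below the thresholds.

    `candidates` is a list of dicts with keys 'op', 'mi', 'bigram', 'rs'
    holding already-normalized scores.
    """
    kept = []
    for c in candidates:
        if c["op"] == "substitution" and c["mi"] < theta_mi:
            kept.append(c)
        elif c["op"] == "transposition" and (
                c["bigram"] < theta_bigram and c["rs"] < theta_rs):  # logical AND
            kept.append(c)
    return kept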

3.4 Result and Discussion

Following the experimental procedure described in section 3.3 we have gener-

ated erroneous sentences from randomly selected 1000 sentences from a corpus

of grammatical sentences. Then these generated ill-formed sentences are filtered

using mal-rule detector and depending on the confidence score (see subsection

3.3.2). After manually analysing the random sample of generated ill-formed sen-

tences, we found that 87% of generated sentences are really ungrammatical. Most

of these generated sentences have invalid POS tag sequences. Though some of

the generated sentences have valid POS tag sequences, the word sequences in

Table 3.10: Experiment with confidence thresholds for generating erroneous sen-
tences generated by transposition operation

Erroneous sentences generated by transposition operation


θbigram θRS Precision Recall F-Score
0.2 0.5 0.769231 0.285714 0.416667
0.2 0.6 0.8125 0.787879 0.8
0.2 0.7 0.875 0.8 0.835821
0.2 0.8 0.875 0.8 0.835821
0.2 0.9 0.875 0.8 0.835821
0.3 0.5 0.769231 0.285714 0.416667
0.3 0.6 0.848485 0.8 0.823529
0.3 0.7 0.848485 0.8 0.823529
0.3 0.8 0.848485 0.8 0.823529
0.3 0.9 0.878788 0.828571 0.852941
0.4 0.5 0.846154 0.314286 0.458333
0.4 0.7 0.878788 0.805556 0.84058
0.4 0.8 0.878788 0.805556 0.84058
0.4 0.9 0.878788 0.805556 0.84058
0.5 0.5 0.769231 0.30303 0.434783
0.5 0.6 0.848485 0.823529 0.835821
0.5 0.7 0.757576 0.862069 0.806452
0.5 0.8 0.818182 0.84375 0.830769
0.5 0.9 0.818182 0.9 0.857143
0.6 0.5 0.846154 0.407407 0.55
0.6 0.6 0.848485 0.823529 0.835821
0.6 0.7 0.878788 0.805556 0.84058
0.6 0.8 0.878788 0.805556 0.84058

Figure 3.5: Simplified functional diagram of automatic error corpora creation.

Table 3.11: Erroneous sentences generated from a single sentence and selected
according to the confidence score.

Bangla Sentence Bigram MI R_S


Correct Sentence
gaachha theke phala pa.De 7.40E-026 0.6502461741 0.4810439560
Error sentences generated by Transposition operation
theke gaachha phala pa.De 3.02E-033 0.6502461741 0.4334249084
gaachha theke pa.De phala 1.85E-025 0.6502461741 0.43477564103
gaachha phala theke pa.De 2.64E-029 0.6502461741 0.4641941392
Error sentences generated by Addition operation
gaachha theke phala phala pa.De 6.59E-033 0.8127406288 0.5180275743
gaachha gaachha theke phala pa.De 6.65E-033 1.05182834701 0.49725020350
gaachha theke theke phala pa.De 7.50E-029 0.7025908508583 0.5030321530
Error sentences generated by Substitution operation
gaachha Theke phala pa.De 6.61E-033 -5.5447936457 0.3600000002
gaana theke phala pa.De 7.53E-030 -1.74079366467 0.39056776562
gaachha theke kala pa.De 3.76E-029 -3.3069949612 0.3964285715
maachha theke phala pa.De 7.58E-030 0.55386208974 0.40056776557
Error sentences generated by Deletion operation
gaachha phala pa.De 7.30E-026 0.59991544233 0.4375
theke phala pa.De 6.71E-023 0.23883813519 0.4367845696
gaachha theke pa.De 2.08E-018 0.64066086710 0.408854166667

these sentences are infrequent. Experimental results also show that 13% of the generated sentences are grammatical, because the insertion, deletion and substitution operations sometimes generate another grammatical construction. Table 3.11

shows sample of Bangla erroneous sentences generated by our method from a

grammatical sentence with their aforementioned confidence score. In this Table,

the first sentence is a correct sentence and the remaining erroneous sentences are

generated automatically. In this Table, R_S indicates the relative position score of a sentence. Using the echo word, hyphenated word and collocation collection methodology discussed in step 2 of section 3.3, we have obtained the desired results. Table 3.12 shows Bangla echo words and hyphenated words collected

from the corpus. Transposition between them might cause an error to be induced in a sentence. Transpositions of echo words are not allowable, but transpositions of

Table 3.12: Bangla Echo words and Hyphenated words.

Echo Words Hyphenated Words


oShudha TaShudha aNu-paramaaNu
kha_i Ta_i adala-badal
goYendaa ToYendaa anumata-abhimata
chaakara baakara asukha-bisukha
chaNDaala phaNDaala aaina-aadaalata
jaata paata kaapa.Da-chopa.Da
nardamaa Tardamaa kaamanaa-baasanaa

Table 3.13: Automatically collected collocated and co-occurred word sequences.

W1 W2 Relative Positions MEAN SD TVAL MI


jij∼njaasaa karala 1 1 0 5.99 0.02028
chautrisha nambara 1 1 0 4 0.0106
ghaad.a naad.ala 1 1 0 3.16 0.008667
kamyunisTa paaTira 1 1 0 2.65 0.005921
chamake uThala 1 1 0 2.64 0.003883
satyi kathaa 1 1 0 2.7 0.002006
khrii puu 1,8,10 1.83 2.56 5.4 0.0295

hyphenated words are sometimes allowed. For example, we may sometimes use

“baasanaa- kaamanaa” in place of “kaamanaa-baasanaa”, though these appearances

are very infrequent. Table 3.13 shows some automatically collected collocated and co-occurring word sequences along with their relative positions, the mean and standard deviation of the relative positions, the t-value and the Mutual Information between these word sequences. Transposition of automatically collected echo words, hyphenated words and collocated words induces noise in a grammatical sentence, and this procedure of automatic noise induction gives very good results.

CHAPTER 4

BANGLA GRAMMATICAL ERROR DETECTION

AND CORRECTION

“The principal design of a Grammar of any language is to teach us to express ourselves with

propriety in that language, and to be able to judge of every phrase and form of construction,

whether it be right or not.” – Lowth [1762]

The NLG based approach has been used for grammatical error detection and

correction of Bangla language. There are two levels of operations in an NLG based

approach. In the first level, the input word sequence (w1, w2, w3, · · · , wn) of a sentence is transformed into over-generated word vectors (w⃗1, w⃗2, w⃗3, · · · , w⃗n), which form a trellis of all possible sentences. In the second level, a Language Model

with optimal search algorithm is used for selecting the best path from this search

space. The Language Model is used specifically for scoring the various paths of the

trellis. The best path indicates the grammatically well-formed sentence whereas

the worst path indicates an ill-formed sentence. To create the trellis, the input word sequences are first passed through the HMM based POS tagger (see subsection

3.3.1) and a rule based morphological analyser that reduces each word to its root

form. Then using a morphological synthesizer, each root is over-generated by

including all possible suffixes with the root. In this phase, proper care is necessary

for selection and ordering of the suffixes. The most common ordering of suffixes

is a classifier followed by a case marker followed by an emphasizer. At the time of

morphological analysis and synthesis we have considered 14 classifiers (edera, eraa,


Table 4.1: Example of Nominal Morphological Synthesis

Root Classifier Case Marker Emphasizer Generated Word


chhele raa chheleraa
chhele gulo chhelegulo
chhele Taa chheleTaa
chhele Ti chheleTi
chhele dera chheledera
chhele gulo ke chheleguloke
chhele Taa ke chheleTaake
chhele Ti ke chheleTike
chhele dera ke chhelederake
chhele gulo ke i chhelegulokei
chhele Taa ke i chheleTaakei
chhele Ti ke i chheleTikei
chhele dera ke i chhelederakei
chhele gulo ke o chhelegulokeo
chhele Taa ke o chheleTaakeo
chhele Ti ke o chheleTikeo
chhele dera ke o chhelederakeo
chhele gulo i chheleguloi
chhele Taa i chheleTaai
chhele Ti i chheleTii
chhele dera i chhelederai
chhele gulo o chheleguloo
chhele Taa o chheleTaao
chhele Ti o chheleTio
chhele dera o chhelederao

khaanaa,khaani, guli, gulo, Tuku, Taa, Ti, Te, To, dera, bRinda, raa), 10 case markers (e,

ete, era, ere, ke, te, Ya, Ye, ra, re) and 2 emphasizers (i, o). Morphological constituents

of a Bangla Noun (W) can be represented as W = R + CL? + CA? + EM? Here R,

CL, CA and EM denote Root, Classifier, Case Marker and Emphasizer respectively.

The symbol ‘?’ indicates CL, CA and EM can occur zero or one time, i.e. CL, CA

and EM can be implicit for a given word. Thus following this rule, for a given

root word ‘chhele’ we can generate inflected words as shown in Table 4.1.
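Following the W = R + CL? + CA? + EM? rule, the nominal synthesis can be sketched as below; over-generation is intentional, since the language model later selects among the candidates.

from itertools import product

# Suffix inventories from the thesis (a subset appears in Table 4.1).
CLASSIFIERS = ["edera", "eraa", "khaanaa", "khaani", "guli", "gulo",
               "Tuku", "Taa", "Ti", "Te", "To", "dera", "bRinda", "raa"]
CASE_MARKERS = ["e", "ete", "era", "ere", "ke", "te", "Ya", "Ye", "ra", "re"]
EMPHASIZERS = ["i", "o"]

def synthesize(root):
    """Over-generate nominal forms; each of CL, CA, EM may be absent ("")."""
    forms = set()
    for cl, ca, em in product([""] + CLASSIFIERS, [""] + CASE_MARKERS,
                              [""] + EMPHASIZERS):
        forms.add(root + cl + ca + em)
    return forms

words = synthesize("chhele")
print("chhelegulokei" in words)   # True: chhele + gulo + ke + i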

Table 4.2: Example of Nominal Morphological Analysis

Iteration Word Stripped Word Suffixes


1 chhelegulokei chheleguloke i
2 chheleguloke chhelegulo ke
3 chhelegulo chhele gulo

We have used a rule-based morphological analyser for Bangla. Initially, Part of Speech (POS) wise suffix lists have been prepared. In our NLG based grammar

correction system we have used our noun morphological analyser during correc-

tion of nominal inflectional errors. We have used a simple suffix striping algorithm

for noun morphology. The suffix stripping algorithm simply checks if the word

has any suffixes from the previously collected suffix list. This checking is done by

using regular expression. Then the suffix is stripped from the word. The same

procedure iterates on the remaining string (after stripping the suffixes from the

word). The number of iterations depends on the rules. For example, the stripping procedure iterates three times for nouns. Finally, the remaining string is searched

in the root word dictionary for verifying its existence. If the root word is a proper

noun, then it will not be found in the root word dictionary. Table 4.2 shows the stripping steps in each iteration during the analysis of the noun ‘chhelegulokei’.
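A sketch of this iterative stripping is given below, reusing the suffix inventories listed above; a real analyser would also keep the intermediate analyses to handle the ambiguous forms discussed next.

import re

# Iteration order mirrors Table 4.2: emphasizer, then case marker, then classifier.
SUFFIX_ROUNDS = [
    re.compile(r"(i|o)$"),
    re.compile(r"(ete|era|ere|ke|te|Ya|Ye|ra|re|e)$"),
    re.compile(r"(khaanaa|khaani|guli|gulo|Tuku|Taa|Ti|Te|To|edera|dera|bRinda|eraa|raa)$"),
]

def strip_noun(word, root_dictionary):
    """Iterative suffix stripping for nouns (a sketch of the analyser)."""
    stem, suffixes = word, []
    for pattern in SUFFIX_ROUNDS:
        m = pattern.search(stem)
        if m:
            suffixes.insert(0, m.group(1))
            stem = stem[:m.start()]
    if stem in root_dictionary:
        return stem, suffixes
    return word, []   # fall back: treat the whole word as an (unverified) root

print(strip_noun("chhelegulokei", {"chhele"}))  # ('chhele', ['gulo', 'ke', 'i'])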

There are some Bangla nouns that appear both in root form and in inflected form.

Examples of such Bangla nouns are “jaamaai”, “maalaai” etc. The word “jaamaai”

appears as a whole root word, meaning ‘son-in-law’ in English, or in an inflected form like ‘jaamaa’ + ‘i’. Here ‘jaamaa’ is the root, whose meaning in

English is ‘shirt’ and suffix ‘i’ is agglutinated with it to intensify the meaning (i.e.

‘only shirt’). Similarly, the word ‘maalaai’ is a root word which means “a special

type of sweet”; alternatively it also means a “garland” if the word is analyzed

as ‘maalaa’ + ‘i’. In such scenarios our noun morphological analyser returns the

whole word (considering it as a root with no inflection) and also the morphed form (i.e.

root + suffixes). The noun morphological analyser has been tested on 300 Bangla

inflected Common Nouns and 300 Bangla Proper Nouns. The morphological anal-

yser yields an accuracy of 98.4% on Common Noun and 91.3% on Proper Noun

data. In case of post positions, the whole post position list is simply over gen-

erated, instead of using a morphological synthesizer. The Bigram model is used

here for calculating the scores of each node in the trellis. To avoid the sparseness

problem of data, Jelinek Mercer Smoothing [Jelinek and Mercer, 1980] is applied.

The Viterbi [1967] algorithm is used for selecting the optimal path from the trellis

depending on the scores generated by the language model. Figure 4.1 shows the selection of the most probable well-formed sentence from the trellis with a bold line; ill-formed sentences are marked with dotted lines.

Figure 4.1: Generative model for well-formed and ill-formed sentence detection.
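The second-level search can be sketched as Viterbi decoding over the trellis with a bigram language model; bigram_prob is assumed to be a smoothed (e.g. Jelinek-Mercer) model returning nonzero probabilities, and "<s>" is an assumed start symbol.

import math

def viterbi_best_sentence(trellis, bigram_prob):
    """Select the highest-scoring path through a word trellis (a sketch).

    `trellis` is a list of candidate-word lists, one per position
    (the over-generated word vectors).
    """
    # best[word] = (log-score of best path ending in `word`, backpointer)
    best = {w: (math.log(bigram_prob("<s>", w)), None) for w in trellis[0]}
    history = [best]
    for column in trellis[1:]:
        best = {}
        for w in column:
            prev_scores = ((history[-1][p][0] + math.log(bigram_prob(p, w)), p)
                           for p in history[-1])
            best[w] = max(prev_scores)
        history.append(best)
    # Backtrack from the best final word.
    word, _ = max(history[-1].items(), key=lambda kv: kv[1][0])
    path = [word]
    for column in reversed(history[1:]):
        word = column[path[0]][1]
        path.insert(0, word)
    return path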

4.1 Pruning of the Search Space

The availability of a good POS tagger and selectional restriction rules helps in avoiding certain paths in the trellis. A rule-based function can be used to prune the search

space. Our linguistic function is defined by a set of hard constraints which are ba-

sically a knowledge base of linguistic selectional rules. For example, our linguistic

function can be defined as shown in Figure 4.2.

The function returns 1 when a certain condition is satisfied and 0 otherwise. Ap-

Figure 4.2: Example of Linguistic function

plying Linguistic Hard Constraints on a trellis shown in figure 4.1, we can get a

pruned trellis (relatively smaller search space than the previous one) as shown in

figure 4.3.
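In the spirit of Figure 4.2, a linguistic hard-constraint function and the pruning it licenses might be sketched as follows; the single rule shown is only the PPS-after-INT example mentioned earlier, not the thesis's full rule base.

INVALID_TAG_PAIRS = {("INT", "PPS")}   # e.g. a post position cannot follow an intensifier

def linguistic_function(prev_tag, next_tag):
    """Return 1 if the tag transition is allowed, 0 otherwise."""
    return 0 if (prev_tag, next_tag) in INVALID_TAG_PAIRS else 1

def prune(trellis_tags):
    """Keep only trellis edges whose tag transition satisfies the hard constraints."""
    edges = []
    for level in range(len(trellis_tags) - 1):
        for a in trellis_tags[level]:
            for b in trellis_tags[level + 1]:
                if linguistic_function(a, b):
                    edges.append((level, a, b))
    return edges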

4.2 Selection of the Best Correction

It is also important to ensure that the corrected sentence is not too far away from the

ungrammatical one. To ensure this, initially k-best correct sentences are selected

from the trellis and then modified BLEU [Papineni et al., 2002] Score and Word

Error Rate (WER) are applied. BLEU is the geometric mean of n-gram match of

words with a brevity penalty and WER is calculated using Levenshtein Distance

(Edit Distance) between the ungrammatical sentence and the correct sentence. WER

Figure 4.3: Pruned trellis after applying Linguistic Hard Constraints

is calculated as follows:

\[ WER(W, C) = \frac{Insertion + Deletion + Substitution}{N_r} \tag{4.1} \]

Here W is the ungrammatical sentence, C is the correct sentence and N_r is the number of words in the ungrammatical sentence. The higher the value of WER, the lower the similarity between the two strings. The value of WER usually varies from 0 to 1, though it sometimes exceeds 1 when the correct sentence is longer than the ungrammatical one due to insertion

operation. Our aim is to select the correct sentence from a set of correct sentences

having a minimum WER rate. The BLEU score is calculated as follows:

\[ BLEU(W, C) = \gamma \cdot \exp\left(\sum_{n=1}^{N} \lambda_n \log\big(Prec(W, C)\big)\right) \tag{4.2} \]

Here \( \exp\left(\sum_{n=1}^{N} \lambda_n \log\big(Prec(W, C)\big)\right) \) is the weighted geometric mean of the modified n-gram Precision Prec(W, C), using n-grams up to length N, where the Precision is calculated as:

\[ Prec(W, C) = \frac{Count_{match\ n\text{-}gram}(W, C)}{Count_{n\text{-}gram}(C)} \tag{4.3} \]

and the positive weight \( \lambda_n \) is calculated as \( \lambda_n = \frac{1}{(N+1)-n} \), i.e. when N = 3, the weight for unigram matching is \( \lambda_1 = 0.33 \), the weight for bigram matching is \( \lambda_2 = 0.5 \), and for trigram matching \( \lambda_3 = 1 \). Figure 4.4 shows the n-gram matching scores between

ungrammatical and grammatical sentences. BLEU is a Precision based measure.

The brevity penalty is introduced to compensate for the possibility of a hypothesis correction achieving high Precision by having fewer words than the input ungrammatical sentence. Consider sentence-2 and sentence-3 in figure 4.4. Grammatical sentence-2 has the unigram Precision value

Figure 4.4: N-gram matching score between ungrammatical and correct sentences

\[ Prec(W, C_2) = \frac{Count_{match\ unigram}(W, C_2)}{Count_{unigram}(C_2)} = \frac{4}{7} \tag{4.4} \]

And grammatical sentence-3 has the unigram Precision value

\[ Prec(W, C_3) = \frac{Count_{match\ unigram}(W, C_3)}{Count_{unigram}(C_3)} = \frac{4}{6} \tag{4.5} \]

As the calculation above shows, grammatical sentence 3 has a higher Precision value than sentence 2, though sentence 2 is closer to the input ungrammatical sentence

than sentence 3. The brevity penalty is calculated as follows:





\[ \gamma = \begin{cases} 1 & \text{if } c > r \\ e^{\left(1 - \frac{r}{c}\right)} & \text{if } c \leq r \end{cases} \tag{4.6} \]

Here c is the length of the correct sentence and r is the length of the ungrammatical sentence. A higher BLEU score indicates that the correct sentence is closer to the input sentence. Thus, using a high BLEU score and a low WER, we can select a suggested representative from the set of candidate correct sentences, ensuring that the correct sentence is not too far from the ungrammatical one.
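The two selection measures can be sketched as below; combining them by subtracting WER from BLEU in best_correction is an illustrative choice, as the thesis only requires a high BLEU and a low WER.

import math
from collections import Counter

def wer(ungrammatical, candidate):
    """Word Error Rate via token-level Levenshtein distance (equation 4.1)."""
    w, c = ungrammatical.split(), candidate.split()
    d = [[0] * (len(c) + 1) for _ in range(len(w) + 1)]
    for i in range(len(w) + 1):
        for j in range(len(c) + 1):
            if i == 0 or j == 0:
                d[i][j] = i + j
            else:
                d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                              d[i][j - 1] + 1,                           # insertion
                              d[i - 1][j - 1] + (w[i - 1] != c[j - 1]))  # substitution
    return d[-1][-1] / len(w)   # N_r = words in the ungrammatical sentence

def bleu(ungrammatical, candidate, max_n=3):
    """Modified BLEU of equations 4.2-4.6, with lambda_n = 1/((N+1)-n)."""
    w, c = ungrammatical.split(), candidate.split()
    log_mean = 0.0
    for n in range(1, max_n + 1):
        w_grams = Counter(tuple(w[i:i + n]) for i in range(len(w) - n + 1))
        c_grams = Counter(tuple(c[i:i + n]) for i in range(len(c) - n + 1))
        matched = sum(min(w_grams[g], k) for g, k in c_grams.items())
        prec = matched / sum(c_grams.values()) if c_grams else 0.0
        log_mean += (1.0 / ((max_n + 1) - n)) * math.log(prec or 1e-9)
    gamma = 1.0 if len(c) > len(w) else math.exp(1 - len(w) / len(c))
    return gamma * math.exp(log_mean)

def best_correction(ungrammatical, k_best):
    """Pick the candidate with high BLEU and low WER (a simple combination)."""
    return max(k_best, key=lambda c: bleu(ungrammatical, c) - wer(ungrammatical, c))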

Our NLG based approach can also be used to correct performance errors. The only modification needed is in the trellis generation process. To generate the trellis,

each word of the sentence is replaced with its cohort or homophones. Cohorts are

generated using regular expression by adding, deleting or substituting a single

character or moving character sequences in a word. These generated words are

then verified with a Bangla spelling dictionary to ensure that the generated words are correctly spelled. In this process, if we assume that, on average, k words/cohorts can be generated from a single word, then k x n sentences can be gen-

erated from a sentence containing n words. Levenshtein distance can also be used to prune the over-generated cohort words. Words having minimum edit distance

with the original word are selected for the cohort list. Following this procedure

we can generate “mAchha” for the given word “gAchha”, and “khAtA”, “chhAtA” for

a given word “pAtA” and so on.

In our approach detection and correction is done in a single phase. It may

happen that the user has provided a grammatical sentence and the correction pro-

vided by the system is another grammatical sentence. In this scenario, we consider the system-provided correction more natural than the input sentence, because the system chooses its correction, based on the language model, from all possible sentences generated from the input sentence.

CHAPTER 5

EVALUATION

“The most serious mistakes are not being made as a result of wrong answers. The truly

dangerous thing is asking the wrong question.” – Drucker [2010]

Nowadays, grammar checkers are widely available as part of word processors

or as standalone components. But there is still a considerable room for improve-

ment in their error handling abilities. In order to quantify any improvement, we

need to devise a methodology for evaluating the effectiveness and acceptability

of a grammar checker. Over the last few years, most of the studies regarding

grammatical error detection and correction have been focused towards the design

and development aspects but relatively less attention has been directed towards

evaluation issues [Leacock et al., 2010; Chodorow et al., 2012]. The current work

attempts to address this gap by taking a fresh look at standard measures and moti-

vating the need for a finer grained evaluation based on a detailed characterization

of the complexity of sentences.

5.1 Challenges

An obvious way to compare two or more grammar checkers is to test them on

same test set and compare the results. But due to the lack of substantially large

standard test corpora, comparison among existing grammatical error detection and correction approaches is presently hindered. Moreover, direct comparison of


existing approaches is not possible since most of the available approaches mainly

focus on some specific types of grammatical errors [Leacock et al., 2010]. These approaches are also tested on different test sets, which vary in size and error density. Furthermore, different researchers use different evaluation metrics for the same types of errors [Tetreault et al., 2010; Dickinson et al., 2011; Rozovskaya and Roth, 2011]. Sometimes, different metrics have been used for different aspects of the same task. For example, in [Han et al., 2010] the performance of omission-type preposition error correction was reported in terms of accuracy, whereas the performance of extraneous and replacement-type preposition error correction was reported using Precision and Recall. Some researchers [Park and Levy, 2011]

preferred BLEU [Papineni et al., 2002] and METEOR [Lavie and Agarwal, 2007]

as evaluation metrics for their grammar correction task. However, Chodorow et al. [2012] recommended reporting True Positive (TP), False Positive (FP), False

Negative (FN) and True Negative (TN) in addition to any metrics derived from

them so that any reader can calculate other measures that the authors of a particular

paper did not choose to include. It would thus be a worthwhile enterprise to look

into new possibilities in the evaluation process. In this chapter, we have introduced

a novel methodology for evaluation of grammar assessment (MEGA) to measure

the acceptability of the grammatical error detection and correction system and to

circumvent the need of gold standard test corpora during comparison among the

systems targeting different types of errors. MEGA has been applied on our Bangla

grammar checker based on NLG approach. Since direct comparison between

existing English grammar checkers and the NLG based Bangla grammar checker

is not possible, the NLG based system has been compared against a prototype

Bangla grammar checker based on standard Naïve Bayes classification.

In the next section we will discuss different evaluation metrics.

Table 5.1: Evaluation Measure Formulae

Metrics Formulae
Precision TP / (TP + FP)
Recall TP / (TP + FN)
F1-Score 2 * (Precision * Recall) / (Precision + Recall)
Accuracy (TP + TN) / (TP + TN + FP + FN)

5.2 Standard Evaluation Metric

The efficiency of grammatical error detection and correction systems is usually mea-

sured by metrics like Precision, Recall, F-Score and Accuracy. These measures

generally indicate how often grammatical incorrectness is rejected and how often

grammatical correctness is accepted. Table 5.1 shows the definition of various

evaluation measures. A True Positive (TP) occurs whenever the machine’s judgement corresponds to the manual judgement. In this context, a TP occurs if an error exists

in the text and the grammar checker rightly detects that error. False Positive (FP)

occurs when the system identifies existence of an error even when there is no such

error in the text. False Negative (FN) occurs when the system misses an error in

the text. True Negative (TN) occurs when the system correctly identifies absence

of errors in the text. Table 5.2 shows these relationships with respect to the grammatical error detection task. Now we shall discuss some important aspects regarding the preparation of the test suite, which is central to the evaluation process.

Table 5.2: True Positive, False Positive, False Negative and True Negative with respect to the grammatical error detection task.

                          Grammatical Errors (Condition)
Error Detection (Test)    Present              Absent
Found                     True Positive        False Positive
Not Found                 False Negative       True Negative

5.3 Test Suite

Most often, test suites for grammar checkers are prepared by using a set of well-

formed sentences and a set of ill-formed sentences. A test suite of well-formed

sentences is prepared from a collection of proof read and edited sentences which are

easily available from online newswire. The reason behind preferring newswire is to avoid a skewed distribution of data, since newswire has a good representation

of data from diverse domains. A test suite of ill-formed sentences should ideally cover a wide range of sentences, some having single errors and many others having multiple types of errors. However, sufficient numbers of ill-formed

sentences are not easily available. Manual creation of fully annotated learners’ error corpora is an expensive, time-consuming and non-trivial task. To avoid the problem of creating a corpus of manual errors, one can synthetically generate error corpora to simulate real errors (as discussed in Chapter 3). There have been previous works [Foster and Andersen, 2009; Lee et al., 2011] on using synthesized error data

which indicate that artificial error corpora can be a valid source of evaluation. Due

to unavailability of standard test corpora, one solution is to test the system with

different domains having different structural complexity and hardness of errors1 .

For this reason, we have categorized our test corpora across axes like domains

(examples include Business, Politics, Sports, Literature and Health), structural

complexity (like simple sentence, complex sentence and compound sentence) and

types of errors and their proportions. Our error corpus contains 66% post position errors, 29% noun inflectional errors, 3% determiner errors and 2% combined errors.

The proportions are selected depending on the probabilities of actual errors found in

learners’ writing.

5.4 Evaluation Methodology

Evaluating a grammatical error detection and correction system requires various

criteria, such as output quality, maintainability and user satisfaction. However,

satisfying all of them at the same time is quite difficult. We present evaluation of

Bangla grammatical error detection and correction systems using standard metrics like Precision, Recall and F-Score. We also propose two new metrics, namely the

Graded Acceptability Assessment Metric (GAAM) and Complexity Measurement

Metric (CMM). GAAM measures the acceptability of the system whereas CMM

circumvents the need of gold standard test corpora during comparison among

different grammar checkers targeting different types of errors.

5.4.1 Evaluation using Standard Metrics

We have compared our NLG based system with another grammatical error detec-

tion system that uses a Naïve Bayes classifier. The Naïve Bayes classifier follows the
1 The term “hardness of errors” indicates the complexity of grammar correction due to the presence of errors in the sentence.

method reported in Golding [1995]. Four features, namely, word-word, word-tag,

tag-word and tag-tag sequences have been used in this classification algorithm.

The classifier has been trained on 4,68,582 well-formed Bangla Unicode sentences and the same number of ill-formed sentences. The well-formed sentence collection

procedure has been elaborated in section 3.2. Ill-formed sentences are generated

by inserting errors into a corpus of correct text using a combination of the confidence

score estimator and mal-rule filter (see Chapter 3).

Error detection performance of the NLG based grammar checker has been eval-

uated on a predefined set of 1500 well-formed sentences and 1500 automatically

generated ill-formed sentences. The Naïve Bayes classifier is tested using the same test sentences that were used for the NLG based system, and the true acceptance

and false rejection rates are also verified. Figure 5.1 shows the performance of

these two error detection approaches. A comparison of the two error correction models is shown in figure 5.2. The synthetically generated ill-formed sentences

Figure 5.1: Performance of error detection

Figure 5.2: Performance of error correction

have been divided into some subcategories, so that each subcategory contains spe-

cific types of errors like post positional errors, determiner errors and case marker

errors. We have also tested the NLG based system on individual subsets as well

as on the total set. Details of the proportion of errors were mentioned in section

5.3. Table 5.3 shows the performance of the NLG based system on different types

of domains as well as different types of errors.

5.4.2 Graded Acceptability Assessment Metric:

We have introduced a novel Graded Acceptability Assessment Metric (GAAM)

for evaluating the acceptability of grammar checkers. Here, we have performed

a blind testing to evaluate the acceptability of the system’s outputs. In blind

testing, we only provide the system’s output (suggestion) to two testers having

Table 5.3: Performance evaluation of NLG based system on individual errors as
well as combined errors in five text genres. P indicates Precision and R
indicates Recall.

Error Types Business Health Sports Literature Politics


Post Position P= 84.83 % P= 86.13% P= 81.82% P= 66.19% P= 84.36%
R=83.64% R=85.3% R=81.04% R=64.65% R=82.40%
Determiner P= 66.67% P= 75% P= 42.86% P= 50% P= 57.14%
R=44.44% R=42.86% R=27.27% R=33.33% R=33.33%
Case marker P= 87.14% P= 82.86% P= 88.06% P= 79.71% P= 86.96%
R=82.43% R=81.69% R=85.51% R=76.39% R=81.01%
Combined P= 70% P= 72.73% P= 72.73% P= 63.63% P= 72.72%
R=53.85% R=61.54% R=47.06% R=46.67% R=53.33%

Table 5.4: Grading Scale

0 Not acceptable
1 Acceptable with difficulty
2 Fully acceptable.

no knowledge of the input, whereas in the open testing the input sentences are

provided. We have consciously avoided open testing and subsequent comparison

between blind and open testing, as our intention was not to investigate the bias of

the testers in the presence of the inputs. In blind testing, testers are requested to grade the output sentences on a three-level grading scale (0, 1 and 2) depending on their acceptability. The grading scale is defined in Table 5.4. Depending on the users’ grades, the

acceptability of the system is calculated by the following formula.


\[ \text{Acceptability of the system } \phi = \frac{\sum_{s=1}^{N} \mu_s}{N} \times 50\% \tag{5.1} \]

where N is the number of test sentences and µ_s is the mean acceptability grade for each sentence, calculated as



\[ \mu_s = \frac{\sum_{e=1}^{n} G_e}{n} \tag{5.2} \]

Here G_e is the grade (0, 1 or 2) given by evaluator e, and n is the number of evaluators. Using this formula, we found that the GAAM score of our system’s output is 80.26%, tested on 1000 synthetically generated ill-formed sentences. Figure 5.3 shows the result of blind testing.

Figure 5.3: Grades given by tester-1 and tester-2 in blind testing
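Equations 5.1 and 5.2 amount to the following computation; the grades shown in the example are hypothetical.

def gaam(grades):
    """Graded Acceptability Assessment Metric of equations 5.1-5.2.

    `grades` is a list of per-sentence grade lists, one grade in {0, 1, 2}
    per evaluator.
    """
    mu = [sum(g) / len(g) for g in grades]   # mean grade per sentence
    return sum(mu) / len(mu) * 50            # x 50% maps [0, 2] onto [0, 100]

print(gaam([[2, 2], [1, 2], [0, 1]]))  # 66.66...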

Now the question is: does the score remain the same after testing on a larger number of ill-formed sentences? To answer this

question, we have done a statistical significance test. Using a t-test we found that

the acceptability of the system’s output lies within the confidence interval [80.21 ±

1.17] with 95% confidence. We have also calculated the inter-annotator agreement

by the kappa statistic [Cohen, 1960; Fleiss, 1981]. Using Cohen’s kappa we get the

kappa score between two testers as k=0.34. Agreement between two testers is

shown as a radar graph in figure 5.4. Each axis in the graph represents a system’s

suggestion corresponding to an erroneous sentence chosen randomly from a set

of 38 sentences. The plot is guided by the acceptability scores for those system suggestions provided by each individual tester.

Figure 5.4: Agreement between two testers during manual evaluation

5.4.3 Complexity Estimation of Grammar Correction

Very often, comparison between available grammar checkers is not possible due

to unavailability of a common test set. Moreover, if two grammar checkers are

developed for two different languages, then the performance of these systems cannot be compared, as there is no possibility of a common test set. To get rid of this

problem, we are proposing a novel Complexity Measurement Metric (CMM) by

which it will be possible to compare two grammar checkers developed for different

languages without the need of a common test set. Our approach is to estimate the complexity of grammar correction for a given input test set and then find the correlation between the performance of the grammar checker and the complexity

value of the test data. Our hypothesis is that these correlation values will indicate how well a grammar checker performs on test data of a given complexity. Thus, even if two test sets are not similar but have the same complexity value, we can compare the performance of the two systems depending on the complexity of the

grammar correction problem. A significant research challenge is to estimate the

complexity of a grammar correction problem in the context of a given erroneous

test sentence. A first step would be to identify the important features that increase

complexity in the text. On the surface, this problem has some resemblances to the

problem of estimating readability of text [McCallum and Peterson, 1982; Kim et al.,

2012; Collins-Thompson et al., 2011; Heilman et al., 2008; Collins-Thompson and

Callan, 2004, 2005; Collins-Thompson, 2011]. In the context of Bangla sentences,

readability estimation has been explored by [Sinha et al., 2012]. Some features

proposed for assessing readability may be utilized in complexity estimation of

grammar correction under the basic premise that text which is harder to read will

be harder to correct. This makes sense when we observe that in the process of

manually correcting a piece of erroneous text, we first try to understand roughly

the meaning of that text depicted by the word sequence and then attempt to

place words in particular positions of the text so that the meaning of the text

can be properly conveyed. Sentences that are complex to read are often hard to

understand and are more complex to correct. We thus surveyed readability and

lexical richness estimation metrics proposed till date, like Flesch Kincaid Reading

Ease, Gunning Fog index [McCallum and Peterson, 1982], Smog [McLaughlin,

1969], Lix, Rix, Yule’s characteristic [Yule, 1944], Simpson’s Index, Guiraud Index

[Daller, 2010] and Uber Index etc. However, not all features used in the problem of

estimating readability of text [Sinha et al., 2012; McCallum and Peterson, 1982; Kim

et al., 2012; Collins-Thompson et al., 2011; Heilman et al., 2008; Collins-Thompson

and Callan, 2004, 2005; Collins-Thompson, 2011] are directly applicable in our

case, as we are dealing with erroneous text. As a result, we have introduced new features, which are explained in the next subsection.

Feature Set of Complexity Estimation

Complexity of text occurs mainly for two reasons. Firstly, a sentence might not

contain enough information to convey the concept behind the sentence. Secondly,

a sentence might contain lots of information that increases the cognitive load to

decode the intended meaning. Complexity of text can be classified as syntac-

tic complexity and cognitive complexity. Syntactic complexity reflects elements

such as sentence length, amount of embedding, and range and sophistication of

structures [Lourdes, 2003; Bachman, 1976]. Here we will concentrate only on the

syntactic complexity and will define syntactic/lexical features to measure cognitive

load indirectly. Consider the following features responsible for text and grammar

correction complexity.

Presence of Comma: The presence of commas contributes to the overall readability of a sentence [Hill and Murray, 1998]. A comma in the proper place in a sentence can lead to faster reading times and reduces the need to re-read the

entire sentence. Commas also help to reduce problems arising from ambi-

guities; the “garden path effect” [Ferreira et al., 2001] can be greatly reduced

if commas are correctly present after introductory phrases and reduced rela-

tive clauses [Israel et al., 2012]. Several studies have provided evidence that

readers experience difficulty when they read “garden path sentences” like

“The old man the boat”. A “garden path sentence” [Pazzani, 1984] is one that is exceptionally hard for the reader to parse. The presence of commas in the proper places can decrease the complexity of text.

Multiple Parts of Speech of a Single Word: In most of the languages a particular

word can have different POS. Generally, when a person reads a sentence, the reader builds up a likely meaning for each word and a meaning for the whole sentence, word by word. During sentence processing, if a word appears that changes the meaning of the sentence, the reader switches to the new meaning and continues. If a word has multiple POS tags and a tag which is infrequent is used in the sentence, then it increases the complexity of the sentence. For example, the sentence “The complex houses married and single soldiers and their families” is complex to understand for second language

learners of English. This is because the word ’houses’ is used as a verb here,

which is infrequent as opposed to its use as a noun.

Syntactic Structure: If the thematic roles in a sentence deviate from the usual agent (do-er) before patient (do-ee) order, then the sentence increases cognitive load and thus sentence complexity. For example:

Simpler: The man who killed the Tiger · · ·

Complex: The Tiger whom the man killed · · ·

A reversible passive sentence like “The little rat is chased by the big cat.” is more complex than “The big cat chases the little rat”. Sentence complexity using

syntactic pattern can be defined as :

\[ \text{Sentence Complexity}(s) = \frac{\#LV(s) + \#LN(s)}{\#Clauses(s)} \]

where #LV(s) and #LN(s) are the numbers of verbal and non-verbal links (i.e. Verb Phrases and Noun Phrases) and #Clauses(s) is the number of clauses in

the sentence [Basili and Zanzotto, 2002]. Coordinating conjuncts increase

the complexity because relationships between clauses are not always used

explicitly in the sentence.

Metaphor: Metaphor is an important feature that is responsible for text complexity

found mostly in the literature domain. One can detect metaphor by bigram

analysis of noun-verb agreement. If P(Common Noun | Verb) is less than

some predefined threshold then it can be considered as a metaphor. For

example, “He planted good ideas in their minds.” Here the verb ‘planted’

acts on the noun ‘ideas’ and makes the sentence metaphoric. Generally in

corpus the object that occurs more frequently with verb ‘planted’ are ‘trees’,

‘bomb’ and ’wheat’ etc [Krishnakumaran and Zhu, 2007].

Lexical Density: Psycholinguistic studies have long shown that less densely packed

texts are more easily comprehended, particularly among non-proficient read-

ers. Lexical density is a measure of the ratio of different words to the total

number of words in a text [McCarthy, 1986]. In earlier work [Bradac et al.,

1977] it has been seen that there is a correlation between low lexical density

and comprehension test scores.

T-Unit: T-Unit is an important feature responsible for text complexity. T-Unit is

the “shortest grammatically allowable sentence into which writing can be segmented, or minimally terminable unit” [Hunt, 1965; Sachs and Polio, 2007]. T-Units

which are longer (number of words) and have more subordinate clauses are

more complex [Robb et al., 1986]. A simple sentence or a complex sentence consists of one T-Unit, while a compound sentence consists of more than one

T-Unit [Gaies, 1980]. For example: The Sun rose. The fog dispersed. The general

determined to delay no longer. He gave the order to advance. Here the number of T-Units is 4 and the mean T-Unit length = (Number of Words / Number of T-Units) = 19/4 = 4.75.

When the above passage is rewritten as At Sunrise, the fog having dispersed, the general, determined to delay no longer, gave the order to advance, the number of T-Units is 1 and the mean T-Unit length is 18/1 = 18.00. It is quite

obvious that the second sentence having greater mean T-Unit length is more

complex than the first sentence.

Abstractness: Less frequent (i.e. unfamiliar) words and words that represent

abstract ideas increase text complexity, because the presence of such words requires a greater level of interpretation to understand the intended meaning.

Pronominal Reference: A pronoun always points to a noun or a clause in the

sentence to indicate a reference. As pronouns are used as references and in

many cases the references of the pronouns cross sentence boundaries, if the

sentence starts with a particular pronoun, then it is often difficult to identify the context in which that pronoun is used.

Confusion Set: The confusion set is an important feature for estimating grammar correction complexity. The set of possible corrections for an erroneous word, considered within its surrounding context window, will be referred to as the confusion set henceforth. Consider a sentence S = w_1 w_2 · · · w_i X C Y w_j · · · w_n, where C is a confusion set C = {c_1, c_2, · · · , c_n} from which a particular word c_i needs to be placed to make the sentence correct, and X and Y are the left and right context windows of C. We will say that the given sentence is complex if

\[ count(X c_i Y) = count(X c_j Y) + \theta \]

where c_i ≠ c_j, c_i, c_j ∈ C and 0 ≤ θ ≤ 100. The complexity of the sentence

increases as the value of θ decreases and as the sizes of the context windows X and Y and of the confusion set C increase. For example,

the English sentence “Ram is going C market” is not complex when C = {to, at}, X = {going} and Y = {market}, because θ is very large, as frequency(going to market) ≫ frequency(going at market) in a general English corpus. Manual investigation also shows that the complexity of grammar correction increases with fewer words (especially nouns) on the left side of C and a large number of words on the right side of C. It has been seen that if we have a sentence like S = {X = pronoun} C Y w_j · · · w_n where n > 10, then it is very difficult to find the proper c_i. We also need to know the previous context of the sentence in order to find the pronominal reference that the pronoun is pointing to. The presence of multiple errors that influence each other reduces the comprehensibility of the sentence and in turn creates difficulty at the time of grammar correction.
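A sketch of this feature follows; the count function and the frequencies used in the example are hypothetical stand-ins for corpus statistics.

def confusion_theta(left, right, confusion_set, count):
    """Estimate theta for a confusion set in context.

    `count(x, c, y)` returns the corpus frequency of the pattern X c Y; a
    small theta (similar counts for competing corrections) indicates a
    complex correction decision.
    """
    freqs = sorted((count(left, c, right) for c in confusion_set), reverse=True)
    return freqs[0] - freqs[1]   # gap between the two best candidates

# Hypothetical counts for "Ram is going C market":
counts = {("going", "to", "market"): 954, ("going", "at", "market"): 3}
theta = confusion_theta("going", "market", ["to", "at"],
                        lambda x, c, y: counts.get((x, c, y), 0))
print(theta)   # a large gap -> the sentence is easy to correct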

Other factors can also increase the complexity of sentences, like sentence length, presence of idiomatic expressions, figurative use of words and assimilation of foreign words and phrases in the source text. A hyperbole2 and an understatement3 also increase the complexity. The variations of such features create different levels of complexity in different domains. Thus we have collected a set of features F = {f_i, f_{i+1}, · · · , f_{i+n}, f_j, f_{j+1}, · · · , f_{j+m}}, where the f_i are the features responsible for

the readability of text and the f_j features are responsible for the severity of errors in text.

Table 5.5 shows features used for grammar correction complexity estimation.

Using these features we have design a multiple linear regression model as shown

below:

α_0 + (α_i f_i + α_{i+1} f_{i+1} + · · · + α_{i+n} f_{i+n}) + (β_j f_j + β_{j+1} f_{j+1} + · · · + β_{j+m} f_{j+m}) = Ω

where {α_0, α_i, α_{i+1}, · · · , α_{i+n}} and {β_j, β_{j+1}, · · · , β_{j+m}} are the parameters of the multiple linear regression that need to be learned during the training process and Ω is the complexity score.
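The parameters can be estimated by ordinary least squares. The following is a minimal sketch under the assumption that the features of table 5.5 have already been extracted into a numeric matrix; the data here is random and purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
F = rng.random((1000, 14))         # one row of feature values per sentence
omega = rng.uniform(0, 100, 1000)  # human-assigned complexity scores

# Least-squares fit of omega = alpha_0 + weighted sum of the features.
X = np.hstack([np.ones((F.shape[0], 1)), F])  # prepend intercept column
params, *_ = np.linalg.lstsq(X, omega, rcond=None)

alpha_0, coeffs = params[0], params[1:]  # intercept and feature weights
predicted = X @ params                   # model's complexity scores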
Table 5.5: Features for estimation of grammar correction complexity

Readability of text
Number of words per sentence
Number of punctuation marks per sentence
Number of conjunctions per sentence
Number of discourse markers (e.g. "like", "how", "as") indicating a reason, confirmative or concessive subordinate, per sentence
Number of words having 7 or more letters per sentence
Number of pronouns per sentence
Number of coordinating conjuncts per sentence
Number of infrequent words (unigram count in corpora less than 50) per sentence

Severity of error
Number of errors per sentence
Number of mutually influencing errors per sentence
Length of confusion set C
Value of θ when count(Xc_iY) = count(Xc_jY) + θ and 0 ≤ θ ≤ 100
Number of words on the left side of C
Number of words on the right side of C
Table 5.6: Complexity score ranges for the different complexity levels

Level of Complexity    Numerical Value
Very easy              0-25
Easy                   26-50
Complex                51-75
Very Complex           76-100
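The mapping from a numeric score to one of these levels is a simple range lookup, sketched below:

def complexity_class(score):
    # Map a 0-100 complexity score to the levels of table 5.6.
    if score <= 25:
        return "Very easy"
    elif score <= 50:
        return "Easy"
    elif score <= 75:
        return "Complex"
    return "Very Complex"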


To build the training data we collected 1000 sentences from different domains, with different levels of readability complexity. Then we synthetically induced errors in those sentences; the resulting erroneous sentences had different levels of error density. Furthermore, we also tried to ensure that the sentences contained the features described in table 5.5. Then we defined the complexity score for four levels: "Very easy", "Easy", "Complex" and "Very complex". Thereafter, these erroneous sentences were given to two language experts and two native speakers for correction. We also requested them to enter a complexity score (see table 5.6) according to the difficulty that they faced while correcting those sentences. The proposed multiple linear regression model was then trained on this dataset, and the values of the parameters α_0, α_i, α_{i+1}, · · · , α_{i+n}, β_j, β_{j+1}, · · · , β_{j+m} were estimated. After learning the parameters of the multiple linear regression, we estimated the complexity scores of five text domains (business, health, sports, literature and politics), each containing 500 erroneous sentences. We observed that the relative error of the multiple linear regression model is 0.39. The relative error is calculated as follows:

RelativeError = (1/|N|) ∑_{i=1}^{|N|} |(ActualScore_i − PredictedScore_i) / ActualScore_i|

where |N| is the number of test sentences, and ActualScore_i and PredictedScore_i are the actual complexity score given by the user and the complexity score predicted by the model, respectively.
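For instance, the relative error over a test set could be computed as in the following minimal sketch (the scores are illustrative):

import numpy as np

def relative_error(actual, predicted):
    # Mean absolute relative deviation between user-assigned and
    # model-predicted scores; assumes no actual score is zero.
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual))

print(relative_error([80, 40, 20], [60, 44, 25]))  # 0.2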

Feature Selection

While trying to analyze the cause of this poor performance, we found that some features were irrelevant or redundant. High dimensional feature sets increase the computational cost, and irrelevant features hamper the accurate prediction of complexity. So there is a need to reduce the dimensionality by filtering out the irrelevant and redundant features. But manual identification of the important features from a large number of features is practically not feasible. Correlation analysis was therefore performed on the training data with two objectives: first, to identify the set of redundant attributes by finding features with a high correlation between them; secondly, to find the features that are more relevant for a particular complexity value by looking at the correlation between the target variable and the features. Features having a low correlation (−0.1 to +0.1) with the complexity score have been removed, on the assumption that such features will not contribute to the model for estimating grammar correction complexity.
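A minimal sketch of this filtering step is shown below, reusing the illustrative feature matrix and scores from the regression sketch above:

import numpy as np

rng = np.random.default_rng(0)
F = rng.random((1000, 14))         # illustrative feature matrix
omega = rng.uniform(0, 100, 1000)  # illustrative complexity scores

def select_relevant(F, omega, low=-0.1, high=0.1):
    # Keep only feature columns whose Pearson correlation with the
    # complexity score falls outside the (low, high) band.
    keep = [k for k in range(F.shape[1])
            if not (low < np.corrcoef(F[:, k], omega)[0, 1] < high)]
    return F[:, keep], keep

F_reduced, kept = select_relevant(F, omega)
print(len(kept), "of", F.shape[1], "features retained")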

Following this feature selection procedure, the relative error of our multiple linear regression model improves only to 0.36. The reasons behind this are inadequate training data and the lack of more refined linguistic features. Moreover, the multiple linear regression model only provides a complexity score; it is unable to comprehensively explain which features contribute to the text complexity. To address this issue, a framework based on the idea of active learning has been employed to improve our estimate of the complexity of the text.
There are active learning frameworks already in place, like PROTOS [Bareiss et al., 1990; Clark, 1987], which has been used in the field of medical diagnostics to learn interactively from a domain expert to classify events. The system retains the guided learning cases as well as the causes of failures and the associated explanations for those specific cases. We have followed the PROTOS architecture for active learning of grammar correction complexity, for better generalization, because of the need to elicit knowledge from an expert user and because language specific features may benefit from guided explanations from linguists. We have used the k-Nearest Neighbour algorithm (k-NN) [Mitchell, 1997] within this PROTOS-style framework to estimate the grammar correction complexity of a given input text. Initially the example-base contains examples of the form [<f_1 : v_1>, <f_2 : v_2>, · · · , <f_n : v_n>, c_i], where f_i is a feature name, v_i is its value and c_i is the complexity score of a sentence involving these features. For the English sentence "Ram *go to market", an example may look like [<Num_of_words : 4>, <Num_of_preposition : 1>, <Num_of_Error : 1>, <Num_of_infrequent_words : 1>, 10], where * indicates the erroneous word and the number 10 indicates the complexity of the sentence. At the time of training of the system, sentences of different complexity are presented to the user in a multiple choice question (MCQ) format. The user then provides his correction and a complexity value for the sentence, consulting table 5.6. Based on the feature values extracted from the sentence, the system estimates the complexity score of the same sentence using the k-NN algorithm. Given this setting, the following situations are possible.

Situation 1: The user's selected option is correct, and the user's complexity score and the system's estimated complexity score are not the same.

Situation 2: The user's selected option is incorrect, and the user's complexity score and the system's estimated complexity score are not the same.

If the user's selected option is incorrect, the complexity score provided by the user is very low, and this score does not match the system-generated score, then the system will not ask the user to supply an explanation of the complexity of the given input sentence. In such a situation, it is assumed that the user is not confident enough to guide the system towards a better inference through his interaction. Otherwise, whenever the complexity score provided by the user and the one estimated by the system differ, the interaction based active learning procedure starts. In this case, the system provides explanations of its decision in the form of the common features between the given input sentence and the nearest example from the example-base selected using the k-NN algorithm. It also presents the extra features that are present in the input sentence but not in the nearest matched example, and vice versa. The user then selects, or adds, the features that contribute to the complexity of the given sentence. After receiving the user's feedback, the system generates a new example with the selected and the new features. The new example is inserted into the example-base if it is not already present there. The system also remembers a link between the nearest example provided by the k-NN and the new example generated from the user's feedback, so that whenever this nearest example is selected in the future, the system will automatically map it to the newly generated example. The proposed active learning procedure is shown in Algorithm 1.

A screenshot of the active learning application prototype is shown in figure 5.5. The user can provide a class name (like "Very Easy", "Easy", "Complex" or "Very Complex") instead of entering a specific complexity score; the system's generated complexity score is then mapped to one of these complexity classes.
Algorithm 1 Algorithm for estimation of grammar correction complexity using Active Learning
Require: UsrComScore, UsrSel, MCQ, Example_Base
  {MCQ : Multiple Choice Question}
  {UsrComScore : Complexity score provided by the user}
  {UsrSel : User's selection among the available MCQ options}

  Query_Example ← extract features from MCQ
  Best_Match_Example ← k-NN(Query_Example, Example_Base)
  SysComScore ← Complexity score of the best match example
  if UsrComScore < 50 and UsrSel is incorrect then
    User_Confidence ← low
  else
    User_Confidence ← high
  end if
  if User_Confidence = high and UsrComScore ≠ SysComScore then
    Present the common and extra features between Query_Example and Best_Match_Example
    Ask the user to select/add those features that contribute to the complexity of the sentence
    UsrSelFeatures ← Features given by the user
    Create New_Example using UsrSelFeatures and UsrComScore
    New_Example ← <UsrSelFeatures, UsrComScore>
    if New_Example ∉ Example_Base then
      Example_Base ← Example_Base ∪ New_Example
    end if
  end if
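A minimal sketch of the retrieval step (the k-NN call in Algorithm 1) over such feature-value examples is given below; the Euclidean distance and k = 1 are illustrative choices, not necessarily those of the deployed system:

import math

# Example-base entries: (feature dictionary, complexity score), as in
# [<Num_of_words : 4>, <Num_of_preposition : 1>, ..., 10].
example_base = [
    ({"Num_of_words": 4, "Num_of_preposition": 1,
      "Num_of_Error": 1, "Num_of_infrequent_words": 1}, 10),
    ({"Num_of_words": 18, "Num_of_preposition": 3,
      "Num_of_Error": 2, "Num_of_infrequent_words": 4}, 70),
]

def distance(a, b):
    # Euclidean distance over the union of feature names; a missing
    # feature is treated as zero.
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

def knn_estimate(query, base, k=1):
    # Return the nearest example and the mean score of the k nearest.
    ranked = sorted(base, key=lambda ex: distance(query, ex[0]))
    sys_score = sum(score for _, score in ranked[:k]) / k
    return ranked[0], sys_score

query = {"Num_of_words": 5, "Num_of_preposition": 1, "Num_of_Error": 1}
best_match, sys_com_score = knn_estimate(query, example_base)
print(sys_com_score)  # 10, i.e. "Very easy" in table 5.6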

It is seen that the relative error of the proposed active learning model is 0.16, which is much less than that obtained using multiple regression when tested on the same dataset.

Figure 5.6 shows the complexity scores obtained over 10 trials for each of the five domains. In each trial, we randomly selected 50 sentences from the 500 erroneous sentences of each domain and computed the average complexity score using our active learning based model. From the complexity scores shown in figure 5.6, it is apparent that the complexity of the literature domain is higher than that of any other domain considered here. This is expected, since figurative uses of words are common in this domain, and nouns are ornamented with adjectives and intensifiers. Rhetorical structures are usually found in sentences of the literature domain, and idiomatic and colloquial patterns are used more than in any other domain considered here. Sometimes phrases of a foreign language are present as part of the source text, either in their original orthographic representation or transliterated into the orthography of the source language; such patterns are most common in the dialogue sentences of the literature domain. Unfamiliarity with the foreign language increases the complexity of the text. The use of an informal style sometimes involves region dependent slang that is not found in the other domains considered; unfamiliarity with such slang, due to the sociocultural variation of readers, likewise increases the complexity of the text. Appendix D shows a variety of sentences from the Bangla literature domain, in MCQ format, that appear complex to Bangla second language learners and to Bangla native speakers as well. In figure 5.7, we show the POS tag distributions of the five domains (business, health, sports, literature and politics). It is apparent that punctuation (RD_PUNC), quotatives (CC_CCS_UT), subordinates (CC_CCS), personal pronouns (PR_PRP) and wh-pronouns (PR_PRQ) appear with higher frequency in the literature domain than in any other domain

Table 5.7: Correlation of complexity score with grammar checker accuracy

                                           Naïve Bayes binary classifier   NLG based Grammar Checker
Pearson's coefficient of correlation (r)              -0.91                          -0.87

considered. Punctuation marks such as multiple commas (,) and semicolons (;) in a sentence indicate that the sentence is syntactically complex. Quotative sentences contain multiple phrases in a combination of direct and indirect speech. In most cases, a wh-pronoun points to a noun or clause beyond the boundary of that particular sentence. Figure 5.8 shows the distribution of infrequent words in the five domains (business, health, sports, literature and politics); infrequent words are more common in the literature domain than in any other domain considered.

Figure 5.9 shows the complexity scores obtained from 500 erroneous sentences

of each domain and the respective accuracies obtained by our NLG based grammar

checker.

The Pearson’s correlation [Mangal, 2012] coefficient (r) of the complexity score

against grammar correction accuracies obtained by Naïve Bayes classifier and

the NLG based grammar checker is shown in Table 5.7, which shows a strong

negative linear correlation of complexity scores with accuracies achieved by the

two systems. Thus both classifiers have low accuracies when the complexity is

high, and vice versa. This strengthens the case for the robustness of the proposed

complexity measure.
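For reference, Pearson's r between per-domain complexity scores and system accuracies can be computed as in the following small sketch (the numbers are illustrative, not the values behind Table 5.7):

import numpy as np

# Illustrative per-domain averages for the five domains.
complexity = np.array([62.0, 48.0, 45.0, 78.0, 55.0])
accuracy = np.array([0.71, 0.80, 0.82, 0.55, 0.74])

r = np.corrcoef(complexity, accuracy)[0, 1]
print(round(r, 2))  # strongly negative, as in Table 5.7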

Figure 5.5: Screenshot of the active learning framework for estimation of text complexity. An explanation of the feature names is available at http://nlp.cdackolkata.in/testComplexity/FeatDtl.spy

Figure 5.6: Complexity values across different datasets

Figure 5.7: POS Tag distributions in different domains.

Figure 5.8: Frequency of word distribution across different domains.

Figure 5.9: Complexity measure and precision score obtained by the NLG based grammar checker and the Naïve Bayes classifier systems.

CHAPTER 6

CONCLUSIONS AND FUTURE WORK

This final chapter summarises our contributions and the scope of future work. The aim of the thesis was to present a technique to detect and correct grammatical errors in a morphologically rich and free word order language like Bangla. At the outset, we faced challenges regarding the unavailability of a large error corpus, robust parsers and sufficient linguistic rules, as well as the lack of a standard evaluation metric for grammatical error detection and correction. We have proposed a methodology for the automatic creation of synthetic error corpora by combining statistical and linguistic knowledge. This has been done using a combination of a confidence score estimator and a mal-rule filter to introduce errors into a corpus of correct text. These synthetic corpora have been utilized during evaluation. A similar approach can also be adopted for the generation of synthetic error corpora in other Indian languages where such resources are not yet available. A novel NLG based approach has been proposed for the automatic detection and correction of Bangla grammatical errors. The NLG based approach has been used instead of an NLU based approach to avoid the complexity and ambiguity of the grammar for parsing, and also to circumvent the need for modeling ungrammatical sentences. The proposed approach not only corrects the mistakes committed by the users but also provides relevant examples supporting its corrections. It also estimates the complexity of the grammar correction task, so that the user can be informed about the system's confidence. An active learning based complexity estimator has been used to estimate the complexity of grammar correction. The NLG based approach reported here can also be applied to other Indian languages to build robust grammar checkers.

We have also proposed a Methodology for Evaluation of Grammar Assessment (MEGA), combining a Graded Acceptability Assessment Metric (GAAM) and a Complexity Measurement Metric (CMM). GAAM employs a three level grading scale to calculate an acceptability score based on the judgment of human evaluators, who have three options, viz. a) not acceptable, b) acceptable with difficulty, and c) fully acceptable. CMM is introduced to estimate the complexity of the test data and to find the correlation between the complexity value and the accuracy of the system tested on that data. To provide better generalization and explanations of language specific features, CMM follows an active learning methodology with expert user interaction.

6.1 Contributions

The major contributions of this research are as follows:

• Automatic creation of Bangla error corpora that mimic real world errors.

• An NLG based grammar correction methodology for the Bangla language.

• An active learning based complexity estimator for reliable grammar correction.

• A Graded Acceptability Assessment Metric (GAAM) and an active learning based Complexity Measurement Metric (CMM) for evaluation.

Other relevant contributions of the thesis are as follows:

• A web based grammatical error detection and correction system for the Bangla language, available at http://nlp.cdackolkata.in/nlpcdack/GrammarChecker.

• A detailed survey of grammatical error detection and correction.

• A taxonomy of Bangla grammatical errors.

• An N-gram based Bangla language model.

• A broad compilation of references on work in English and other languages on the grammatical error detection and correction task.

• A resource of well-formed and ill-formed sentences from different text genres, and a test suite of sentences of different complexity levels for testing the Bangla grammar checker.

• Other resources, such as echo words and hyphenated words, collected automatically during synthetic error corpus creation, which may help other research on Bangla language analysis.

6.2 Future Work

As part of future work, a more principled approach needs to be devised to correct grammatical errors with better selectional restrictions. A closer investigation of data from second language learners of Bangla is required to find the patterns of their errors, in order to improve the performance of the system. Instead of the bigram language model, a weighted higher order n-gram based linear learning model with a ±k context window could be designed to provide a better solution. The improvement of our NLG based system relies on some basic NLP components, such as the POS tagger, morphological analyser and morphological synthesizer; these components need to be improved in the future. Finer linguistic constraints need to be defined to prune the search space and to improve the processing speed of the system. As Bangla is a free word order language [Bhattacharya et al., 2005; Dandapat et al., 2004], a CFG based parsing framework has limitations [Shieber, 1985; Begum et al., 2008; Bharati et al., 2010] for analysing Bangla sentences. Hence, a dependency based parsing framework [Nivre, 2005; Popel et al., 2011; Nivre, 2008; Zhang and Nivre, 2011; Chen et al., 2012, 2010, 2011] can be used with our proposed NLG based system for better analysis of input sentences, especially for checking subject-verb agreement. Moreover, a principled approach needs to be devised for sampling the auto-generated error corpus in the boundary cases, and for ensuring that automatically generated error sentences mimic naturally occurring learners' errors. A statistical classifier can make use of active learning to bootstrap the corpus creation process. The parameters of our regression model need to be examined more closely to gain insights into which features are most central in determining complexity. At a later stage, we may also need to study the interactions between the features more closely.
APPENDIX A

Examples of some interesting erroneous sentences

corrected by the system.

Sl No Incorrect Sentence Predicted Correction


1 িনেচর আেলাচনা সে ব াপারটা জানা যােব ৷ িনেচর আেলাচনা থেক ব াপারটা জানা যােব ৷
nichera aalochanaa sa Nge byaapaaraTaa Nichera aalochanaa theke byaapaaraTaa
jaanaa Jaabe — jaanaa Jaabe —
2 এই অসুিবধা দূর করার জন পণ িবিনমেয় ধাতুখ চলন হয় ৷ এই অসুিবধা দূর করার জন পণ িবিনমেয় ধাতুখে র চলন হয় ৷
ei asubidhaa duura karaara janya paNya bini- ei asubidhaa duura karaara janya paNya bin-
maYe dhaatukhaNDa prachalana haYa — imaYe dhaatukhaNDera prachalana haYa

3 খুব একজন হইচই করেত চাইেছ না পুিলশ ৷ খুব একটা হইচই করেত চাইেছ না পুিলশ ৷
khuba ekajana haichai karate chaaichhe naa khuba ekaTaa haichai karate chaaichhe naa
pulisha — pulisha —
4 জনসংখ ার অিধকাংশই আবার াম বাস কের ৷ জনসংখ ার অিধকাংশই আবার ােম বাস কের ৷
janasa.nkhyaara adhikaa.nshai aabaara janasa.nkhyaara adhikaa.nshai aabaara
graama baasa kare — graame baasa kare —
5 তঁােদর জন এক জন ছুিটরা িছেলন । তঁােদর মেধ এক জন ছুিটেত িছেলন ।
taa.Ndera janya eka jana chhuTiraa chhilena taa.Ndera madhJe eka jana chhuTite
chhilena
6 পুরসভা েলাই জন তা পেত বশ কেয়কবার সদরখানােক আসেত পুরসভা থেক তা পেত বশ কেয়কবার সদের আসেত হয় ।
হয় ।
purasabhaaguloi janya taa pete besha purasabhaa theke taa pete besha kaYek-
kaYekabaara sadarakhaanaake aasate haYa abaara sadare aasate haYa
7 বৃি টােক হার ৫০ বিসস পেয় জন ৭৫ বিসস পেয় । বৃি র হার ৫০ বিসস পেয় থেক ৭৫ বিসস পেয় ।
bRRiddhiTaake haara 50 besisa paYenTa bRRiddhira haara 50 besisa paYenTa theke
janya 75 besisa paYenTa 75 besisa paYenTa
8 ওই সারিণ অনুযায়ী নিট রােত হাওড়া নীেচ ছেড় মালদহ শেন ওই সারিণ অনুযায়ী নিট রােত হাওড়া থেক ছেড় মালদহ শেন
পঁৗছােব পর িদন ৮টা ১০ িমিনেট ৷ পঁৗছােব পর িদন ৮টা ১০ িমিনেট ৷
oi saaraNi anuJaaYii TrenaTi raate haao.Daa oi saaraNi anuJaaYii TrenaTi raate
niiche chhe.De maaladaha sTeshane haao.Daa theke chhe.De maaladaha sTeshane
pau.Nchhaabe para dina 8Taa 10 miniTe pau.Nchhaabe para dina 8Taa 10 miniTe —

APPENDIX B

Examples of incorrect prediction by the system.

Sl No Original Sentence Predicted Correction


1 বুধবােরর ঘটনার িববরণ িদেয় ওই ৬ জেনর এক বুধবােরর ঘটনার িববরণ থেক ওই ৬ জেনর এক
জন বেলন াম পাহারা িদি লাম ৷ জন বেলন াম পাহারা িদি লাম ৷
budhabaarera ghaTanaara budhabaarera ghaTanaara
bibaraNa diYe oi 6 janera eka bibaraNa theke oi 6 janera eka
jana balena graama paahaaraa jana balena graama paahaaraa
dichchhilaama — dichchhilaama —
2 সই জন পাথর ভাঙেত হে ৷ সই সে পাথর ভাঙেত হে ৷
sei janya paathara bhaa Nate sei sa Nge paathara bhaa Nate
hachchhe — hachchhe —
3 এ িনেয় মে া কতৃপে র স েক আেলাচনাও এ পয মে া কতৃপে র সে আেলাচনা
কেরেছ ডুক াব ৷ কেরেছ ডুক াব ৷
e niYe meTro kartRRipakShera e paryanta meTro kartRRipak-
samparke aalochanaao shuru Shera sa Nge aalochanaa shuru
karechhe Dukyaaba — karechhe Dukyaaba —
4 নমুনা সং হ কের বােয়াপিসর পে পাঠান তঁারা নমুনা সং হ কের বােয়াপিসর চেয় পাঠান তঁারা
। ।
namunaa sa.ngraha kare baaY- namunaa sa.ngraha kare baaY-
opasira pakShe paaThaana opasira cheYe paaThaana
taa.Nraa taa.Nraa
5 ১১ জুন রামািনয়ার কন ানটায় ওই জাহােজ ১১ জুন রামািনয়ার কন ানটায় ওই জাহােজ
কলকাতার সােথ লাহার পাইপ তালা হয় ৷ কলকাতার সে লাহার পাইপ তালা হয় ৷
11 juna romaaniYaara kanas- 11 juna romaaniYaara kanas-
TaanaTaaYa oi jaahaaje TaanaTaaYa oi jaahaaje
kalakaataara saathe lohaara kalakaataara sa Nge lo-
paaipa tolaa haYa — haara paaipa tolaa haYa —
6 ওই হাসপাতােল দেরাগীেদর ারা ২িট শয া ওই হাসপাতােল দেরাগীেদর মেধ ২িট শয া
রেয়েছ । রেয়েছ ।
oi haasapaataale hRRidarogi- oi haasapaataale hRRidarogi-
idera dbaaraa 2Ti shaJyaa idera madhJe 2Ti shaJyaa
raYechhe raYechhe
7 িক মন িদেয় ৃিত মুেছ যায়িন ৷ িক মেনর মেধ ৃিত মুেছ যায়িন ৷
kintu mana diYe smRRiti kintu manera madhJe smRRiti
muchhe JaaYani — muchhe JaaYani —
8 অি েজন দওয়ার পয িসিল ার এেন িত অি েজন দওয়ার থেক িসিল ার এেন িত
নওয়া হি ল । নওয়া হি ল ।
aksijena deoYaara paryanta aksijena deoYaara theke silin-
silinDaara ene prastuti neoYaa Daara ene prastuti neoYaa
hachchhila hachchhila
APPENDIX C

Examples of sentences having different complexity


APPENDIX D

Examples of sentences collected from literature

domain
REFERENCES
Agirre, E., K. Gojenola, K. Sarasola, and A. Voutilainen, Towards a Single Pro-
posal in Spelling Correction. In Proceedings of the 36th Annual Meeting of the Asso-
ciation for Computational Linguistics and 17th International Conference on Computa-
tional Linguistics, volume 1 of ACL ’98. Association for Computational Linguis-
tics, Stroudsburg, PA, USA, 1998. URL http://dx.doi.org/10.3115/980845.
980850.

Albert, R. and A.-L. Barabási (2002). Statistical Mechanics of Complex Net-


works. Rev. Mod. Phys., 74(1), 47–97. URL http://link.aps.org/doi/10.1103/
RevModPhys.74.47.

Alfred, W., The Elements of English Grammar. 2nd. Pitt Press, Cambridge, 1894.

Allen, J., Natural Language Understanding. Menlo Park, Benjamin/Cummings, 1987.

Angell, R. C., G. E. Freund, and P. Willett (1983). Automatic Spelling Correction


Using a Trigram Similarity Measure. Information Processing and Management,
19(4), 255–261. URL http://www.sciencedirect.com/science/article/pii/
0306457383900225.

Arppe, A., Developing a grammar checker for swedish. In Proceedings of the Twelfth
Nordic Conference in Computational Linguistics. 2000.

Bachman, L. F., Fundamental Considerations in Language Testing. Oxford University


Press, Oxford, 1976.

Banko, M. and E. Brill, Scaling to very large corpora for natural language disambiguation. In ACL. 2001.

Bansal, B., M. Choudhury, P. R. Ray, S. Sarkar, and A. Basu, Isolated-word Error


Correction for Partially Phonemic Languages using Phonetic Cues. In Proceedings
of the International conference on Knowledge based Computer Systems. 2004.

Baptist, L. and S. Seneff, Genesis-ii: A versatile system for language generation


in conversational system applications. In Proceeding ICSLP. Beijing, China, 2000.

Bareiss, E. R., B. E. Porter, and C. C. Wier, Machine Learning and Uncertain


Reasoning. chapter PROTOS: an exemplar-based learning apprentice. Academic
Press Ltd., London, UK, UK, 1990. ISBN 0-12-273252-9, 1–13. URL http://dl.
acm.org/citation.cfm?id=92900.92906.

Bartha, C., T. Spiegelhauer, R. Dormeyer, and I. Fischer (2006). Word order and
discontinuities in dependency grammar. Acta Cybern., 17(3), 617–632. URL http:
//dblp.uni-trier.de/db/journals/actaC/actaC17.html#BarthaSDF06.

Basili, R. and F. M. Zanzotto (2002). Parsing Engineering and Empirical Robust-
ness. Natural Language Engineering, 8(2-3), 97–120.

Begum, R., S. Husain, A. Dhwaj, D. M. Sharma, L. Bai, and R. Sangal, De-


pendency Annotation Scheme for Indian Languages. In Proceedings of IJCNLP.
2008.

Bellman, R., Dynamic Programming. Princeton University Press, Princeton, NJ,


USA, 1957, 1 edition.

Bender, E. M., D. Flickinger, S. Oepen, A. Walsh, and T. Baldwin, Arboretum:


Using a precision grammar for grammar checking in call. In In Proceedings of
the InSTIL/ICALL Symposium: NLP and Speech Technologies in Advanced Language
Learning Systems. 2004.

Berger, A. L., S. A. Della Pietra, and V. J. Della Pietra (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39–71. ISSN 0891-2017. URL http://dl.acm.org/citation.cfm?id=234285.234289.

Bernth, A., Easyenglish: a tool for improving document quality. In Proceedings of


the fifth conference on Applied natural language processing (ANLC’97). Stroudsburg,
PA, USA, 1997.

Bharati, A., V. Chaitanya, and R. Sangal, Natural Language Processing: A Paninian


Perspective. PHI, 2010.

Bharati, A., D. M. Sharma, L. Bai, and R. Sangal (2006). AnnCorra: Annotat-


ing Corpora Guidelines for POS and Chunk Annotation for Indian Languages.
Technical report, Language Technologies Research Centre, IIIT-Hyderabad. URL
http://ltrc.iiit.ac.in/tr031/posguidelines.pdf.

Bhatt, A., M. Choudhury, S. Sarkar, and A. Basu, Exploring the Limits of


Spellcheckers: A comparative Study in Bengali and English. In Proceedings of
the Second Symposium on Indian Morphology, Phonology and Language Engineering.
CIIL Mysore, Kharagpur, INDIA, 2005.

Bhattacharya, S., M. Choudhury, S. Sarkar, and A. Basu, Inflectional Morphology


Synthesis for Bengali Noun, Pronoun and Verb Systems. In Proceedings of the
National Conference on Computer Processing of Bangla. Independent University,
Bangladesh, 2005.

Bhattacharyya, P., M. Mitra, and S. Choudhury, Divergence Patterns between


English and Bangla: Machine Translation Perspective. In Proceedings of the 9th
International Conference of Natural Language Processing. AUKBC, Chennai, India,
2011.

Bigert, J. and O. Knutsson, Robust error detection: a hybrid approach com-


bining unsupervised error detection and linguistic knowledge. In Proceedings
ROMAND-02. Frascati, Italy, 2002.

Birn, J., Detecting grammar errors with lingsoft’s swedish grammar checker. In In
Proceedings of the Twelfth Nordic Conference in Computational Linguistics. 2000.

Bolioli, A., L. Dini, and G. Malnati, JDII: Parsing Italian with a robust constraint grammar. In Proceedings of COLING. 1992.

Bondi, J., Johannessen, K. Hagen, and P. Lane, The performance of a grammar


checker with deviant language input. Mounton de Gruyter, Berlin and New
York, 2002.

Bradac, J. J., R. A. Davies, and J. A. Courtright (1977). The Role of Prior Mes-
sage Context in Evaluative Judgments of High- and Low-Diversity Messages.
Language and Speech, 20(4), 295–307.

Bredenkamp, Crysmann, and Petrea, Looking for Errors: A Declarative Formal-


ism for Resource-Adaptive Language Checking. In Proceedings of the 2nd Inter-
national Conference on Language Resources and Evaluation (LREC-2000). Athens,
Greece, 2000.

Brill, E. and R. C. Moore, An Improved Error Model for Noisy Channel Spelling
Correction. In Proceedings of the 38th Annual Meeting on Association for Computa-
tional Linguistics, ACL ’00. Association for Computational Linguistics, Strouds-
burg, PA, USA, 2000. URL http://dx.doi.org/10.3115/1075218.1075255.

Brockett, C., W. B. Dolan, and M. Gamon, Correcting ESL errors using phrasal
SMT techniques. In Proceedings of the 21st International Conference on Computa-
tional Linguistics and the 44th annual meeting of the Association for Computational
Linguistics, ACL-44. Association for Computational Linguistics, Stroudsburg,
PA, USA, 2006. URL http://dx.doi.org/10.3115/1220175.1220207.

Bustamante, F. R. and F. S. León, Gramcheck: A grammar and style checker. In


Proceedings of COLING-96. 1996.

Campbell, C. and Y. Ying (2011). Learning with support vector machines. Synthesis
Lectures on Artificial Intelligence and Machine Learning, 5, 1–95.

Catt, M. and G. Hirst (1990). An intelligent cali system for grammatical error
diagnosis. Computer Assisted Language Learning, 3, 3–26.

Chatterjee, S. K., The Origin and Development of the Bengali Language. Rupa co.,
New Delhi, 1926.

Chaudhuri, B. and P. Kundu (2000). Error Pattern in Bangla Text. International


Journal of Dravidian Linguistics, (2), 48–88.

Chaudhuri, B. B., Reversed Word Dictionary and Phonetically Similar Word


Grouping based Spell-Checker to Bengali Text. In Proceedings of LESAL Workshop.
2001. URL http://www.emille.lancs.ac.uk/lesal/bangla.pdf.

Chaudhuri, B. B., Towards Indian Language Spell-checker Design. In Proceedings of
the Language Engineering Conference, LEC ’02. IEEE Computer Society, Washing-
ton, DC, USA, 2002. ISBN 0-7695-1885-0. URL http://dl.acm.org/citation.
cfm?id=788016.788703.
Chen, W., J. Kazama, Y. Tsuruoka, and K. Torisawa, Improving Graph-based
Dependency Parsing with Decision History. In COLING (Posters). 2010.
Chen, W., J. Kazama, M. Zhang, Y. Tsuruoka, Y. Zhang, Y. Wang, K. Torisawa,
and H. Li, SMT Helps Bitext Dependency Parsing. In EMNLP. 2011.
Chen, W., J. Kazama, M. Zhang, Y. Tsuruoka, Y. Zhang, Y. Wang, K. Torisawa,
and H. Li (2012). Bitext Dependency Parsing With Auto-Generated Bilingual
Treebank. IEEE Transactions on Audio, Speech & Language Processing, 20(5), 1461–
1472.
Chodorow, M., M. Dickinson, R. Israel, and J. R. Tetreault, Problems in Evaluating
Grammatical Error Detection Systems. In COLING. 2012.
Chodorow, M. and C. Leacock, An unsupervised method for detecting grammat-
ical errors. In Proceedings of the 1st North American chapter of the Association for
Computational Linguistics conference (NAACL 2000). San Francisco,CA, 2000.
Choudhury, M., M. Thomas, A. Mukherjee, A. Basu, and N. Ganguly, How
Difficult is it to Develop a Perfect Spell-checker? A Cross-Linguistic Analysis
through Complex Network Approach. In Proceedings of the Second Workshop on
TextGraphs: Graph-Based Algorithms for Natural Language Processing. Association
for Computational Linguistics, Rochester, NY, USA, 2007. URL http://aclweb.
org/anthology//W/W07/W07-0212.pdf.
Clark, P. (1987). PROTOS - A Rational Reconstruction. Technical report, Turing
Institute, Glasgow.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and
Psychological Measurement, 20(1), 37–46. URL http://epm.sagepub.com/cgi/
doi/10.1177/001316446002000104.
Collins-Thompson, K., Enriching Information Retrieval with Reading Level Pre-
diction. In Proceedings of SIGIR 2011 Workshop on Enriching Information Retrieval.
Beijing, China, 2011.
Collins-Thompson, K., P. N. Bennett, R. W. White, S. de la Chica, and D. Sontag,
Personalizing Web Search Results by Reading Level. In Proceedings of the 20th
ACM international conference on Information and knowledge management, CIKM
’11. ACM, New York, NY, USA, 2011. ISBN 978-1-4503-0717-8. URL http:
//doi.acm.org/10.1145/2063576.2063639.
Collins-Thompson, K. and J. Callan (2005). Predicting Reading Difficulty with
Statistical Language Models. Journal of the American Society for Information Science
and Technology, 56(13), 1448–1462. ISSN 1532-2882. URL http://dx.doi.org/
10.1002/asi.20243.

Collins-Thompson, K. and J. P. Callan, A Language Modeling Approach to Pre-
dicting Reading Difficulty. In Proceedings of HLT-NAACL. 2004.
Covington, M. A. (1990). Parsing discontinuous constituents in dependency
grammar. Comput. Linguist., 16(4), 234–236. ISSN 0891-2017. URL http:
//dl.acm.org/citation.cfm?id=124992.124997.
Cutting, D., J. Kupiec, J. Pedersen, and P. Sibun, A Practical Part-of-Speech
Tagger. In Proceedings of the third conference on Applied natural language processing,
ANLC ’92. Association for Computational Linguistics, Stroudsburg, PA, USA,
1992. URL http://dx.doi.org/10.3115/974499.974523.
Dale, R., Helping People Write: Grammar Checking and Beyond. In Tutorial in 9th
International Conference of Natural Language Processing. AUKBC, Chennai, India,
2011.
Dale, R., C. Mellish, and M. Zock, Current Research in Natural Language Generation.
Academic Press, London, 1990.
Dale, R., D. Scott, and B. D. Eugenio (1998). Introduction to the Special Issue on
Natural Language Generation. Computational Linguistic, 24(3), 346–353. ISSN
0891-2017. URL http://dl.acm.org/citation.cfm?id=972749.972751.
Daller, M., Guiraud’s index of lexical richness. In British Association of Applied
Linguistics. 2010.
Dalrymple, M., Lexical Functional Grammar. Syntax and Semantics Series, Xerox
Palo Alto Research Center, 2001.
Damerau, F. J. (1964). A Technique for Computer Detection and Correction of
Spelling Errors. Communications of the ACM, 7(3), 171–176. ISSN 0001-0782. URL
http://doi.acm.org/10.1145/363958.363994.
Dandapat, S. and S. Sarkar, Part of Speech Tagging for Bengali with Hidden
Markov Model. In Proceeding of the NLPAI Machine Learning Competition. 2006.
URL http://ltrc.iiit.ac.in/nlpai_contest06/papers/mla.pdf.
Dandapat, S., S. Sarkar, and A. Basu, A Hybrid Model for Part-of-Speech Tag-
ging and Its Application to Bengali. In Proceedings of International Conference on
Computational Intelligence. 2004.
Das, M., S. Borgohain, J. Gogoi, and S. B. Nair, Design and Implementation of a
Spell Checker for Assamese. In Language Engineering Conference. IEEE Computer
Society, 2002. ISBN 0-7695-1885-0. URL http://dblp.uni-trier.de/db/conf/
lec/lec2002.html#DasBGN02.
Dasgupta, S., C. Papadimitriou, and U. Vazirani, Algorithm. Mc. Graw Hill, 2008.
URL http://www.cs.berkeley.edu/~vazirani/algorithms.html.
Dash, N. S. (2013). Part-of-Speech (POS) Tagging of Bengali Written Text Cor-
pus. Bhasha Bijnan o Prayukti, 1(1). URL http://www.academia.edu/3931246/
Part-of-Speech_POS_Tagging_of_Bengali_Written_Text_Corpus.

Dave, S., J. Parikh, and P. Bhattacharyya (2001). Interlingua-Based English-Hindi
Machine Translation and Language Divergence. Machine Translation, 16(4), 251–
304.

De Felice, R. and S. Pulman, Automatic detection of preposition errors in learner


writing. volume 26. 2009. URL https://www.calico.org.

De Felice, R. and S. G. Pulman, Automatically acquiring models of preposition


use. In Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions, SigSem
’07. Association for Computational Linguistics, Stroudsburg, PA, USA, 2007.
URL http://portal.acm.org/citation.cfm?id=1654629.1654639.

Dickinson, M., R. Israel, and S.-H. Lee, Developing Methodology for Korean
Particle Error Detection. In Proceedings of the 6th Workshop on Innovative Use of
NLP for Building Educational Applications, IUNLPBEA ’11. Association for Com-
putational Linguistics, Stroudsburg, PA, USA, 2011. ISBN 9781937284039. URL
http://dl.acm.org/citation.cfm?id=2043132.2043142.

Dini, L. and G. Malnati, Weak constraints and preference rules. In Prefernce in


Eurotra, volume 6. 1993.

Dixit, V., S. Dethe, and R. K. Joshi (2006). Design and Implementation of a


Morphology-based Spell-Checker for Marathi, an Indian Language. Special issue
on Human Language Technologies as a Challenge for Computer Science and Linguistics,
309–316.

Dorr, B., L. Pearl, R. Hwa, and N. Habash, DUSTer: A Method for Unraveling
Cross-Language Divergences for Statistical Word-level Alignment. In Proceedings
of AMTA-02. Springer, 2002.

Douglas, S. and R. Dale, Towards robust patr. In Proceedings of the 14th confer-
ence on Computational linguistics, COLING ’92. Association for Computational
Linguistics, Stroudsburg, PA, USA, 1992. URL http://dx.doi.org/10.3115/
992133.992143.

Drucker, P., Men, Ideas, and Politics. Harvard Business School Publishing, 2010.

Ejerhed, E., Finite state segmentation of discourse into clauses. Cambridge University
Press, 1999.

Ferreira, F., K. Christianson, and A. Hollingworth (2001). Misinterpre-


tations of Garden-Path Sentences: Implications for Models of Sentence
Processing and Reanalysis. Journal of Psycholinguistic Research, 30(1), 3–
20. URL http://scholar.google.de/scholar.bib?q=info:Q9cl72BuXYIJ:
scholar.google.com/&output=citation&hl=en&as_sdt=2005&sciodt=0,
5&ct=citation&cd=9.

Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. 2nd edition. John Wiley & Sons, New York.
Fliedner, G., A system for checking np agreement in german texts. In In Proceedings
of the Student Research Workshop at the 40th Annual Meeting of the Association for
Computational Lingustics(ACL). 2002.

Foster, J. (2005). Good Reasons for Noting Bad Grammar: Empirical Investigations into
the Parsing of Ungrammatical Written English. Ph.D. thesis, University of Dublin,
Trinity College, Dublin, Ireland.

Foster, J. and Ø. E. Andersen (2009). GenERRate: Generating errors for use in grammatical error detection. In Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications, 82–90. URL http://www.aclweb.org/anthology/W/W09/W09-2112.

Fouvry, F., Robust Unification for Linguistics. In Proceedings of ROMAND. Lau-


sanne, 2000.

Fouvry, F., Constraint relaxation with weighted feature structures. In Proceedings


of the Eighth International Workshop on Parsing Technologies (IWPT-03), volume 6.
2003.

Francis, N. W., Problems of Assembling and Computerizing Large Corpora. In


S. Johansson (ed.), Computer corpora in English language research. Norwegian
Computing Centre for the Humanities, 1982. ISBN 9788272830273, 7–24.

Frank, A., J. K. Tracy Holloway King, and J. Maxwell, Optimality theory style
constraint ranking in large-scale lfg grammars. In Proceedings of LFG-98. Bris-
bane,Australia, 1998.

Freund, Y. and R. E. Schapire (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296.

Fujishima, S. and S. Ishizaki, Automated detection of usage errors in non-native


english writing. In IES 2011-Emerging Technology for Better Human Life. 2011.

Gaies, S. J. (1980). T-Unit Analysis in Second Language Research: Applications,


Problems and Limitations. TESOL Quarterly, 14(1), 53–60.

Gamon, M., J. Gao, C. Brockett, A. Klementiev, W. B. Dolan, D. Belenko, and


L. Vanderwende, Using contextual speller techniques and language modeling
for esl error correction. 2008.

Garain, U. and S. De, Dependency parsing in bangla. In Technical Challenges and


Design Issues in Bangla Language Processing. USA, 2013.

Ghosh, A., A. Das, P. Bhaskar, and S. Bandyopadhyay, Dependency parser for Bengali: the JU system at ICON 2009. In NLP Tool Contest at the International Conference on Natural Language Processing (ICON 2009). 2009.

Gill, M. S. and G. S. Lehal, A grammar checking system for panjabi. In Proceedings


of the 22nd International Conference on Computational Linguistics. 2008.

Golding, A. R., A Bayesian Hybrid Method for Context-Sensitive Spelling Cor-
rection. In Proceedings of the Third Workshop on Very Large Corpora. 1995. URL
http://arxiv.org/pdf/cmp-lg/9606001.pdf.

Golding, A. R. and Y. Schabes, Combining Trigram-based and Feature-based


Methods for Context-Sensitive Spelling Correction. In Proceedings of the 34th
annual meeting on Association for Computational Linguistics, ACL ’96. Association
for Computational Linguistics, Stroudsburg, PA, USA, 1996. URL http://dx.
doi.org/10.3115/981863.981873.

Goyal, P. and R. M. K. Sinha, Translation Divergence in English-Sanskrit-Hindi


Language Pairs. In Sanskrit Computational Linguistics. 2009.

Han, N.-R., M. Chodorow, and C. Leacock, Detecting errors in english article


usage with a maximum entropy classifier trained on large, diverse corpus. 2004.

Han, N.-R., M. Chodorow, and C. Leacock, Detecting errors in english article


usage by non-native speakers. volume 12. Cambridge University Press, New
York, NY, USA, 2006. ISSN 1351-3249. URL http://portal.acm.org/citation.
cfm?id=1133917.1133922.

Han, N.-R., J. R. Tetreault, S.-H. Lee, and J.-Y. Ha, Using an Error-Annotated
Learner Corpus to Develop an ESL/EFL Error Correction System. In LREC. 2010.

Haque, M. T. and M. Kaykobad, Use of Phonetic Similarity for Bangla Spell


Checker. In Proceedings of 5th International Conference on Computer and Information
Technology. 2002. URL http://research.banglacomputing.net/iccit/ICCIT_
pdf/5th%20ICCIT-2002_p182-p185.pdf.

Heidorn, G. E., Augmented phrase structure grammar. In Theoretical Issues in


Natural Lunguage Processing. 1975.

Heidorn, G. E., Intelligence writing assistance. In A Handbook of Natural Language


Processing: Techniques and Applications for the Processing of Language as Text. Marcel
Dekker, New York,USA, 2000.

Heidorn, G. E., K. Jensen, L. A. Miller, R. J.Byrd, and M. Chodorow., The epistle


text-critiquing system. In IBM Systems Journal, volume 21. 1982.

Heift, T. and M. Schulze, Errors and Intelligence in Computer-Assisted Language


Learning: Parsers and Pedagogues. Routledge Taylor and Francis Group, New
York and London, 2007.

Heilman, M., K. Collins-Thompson, and M. Eskenazi, An Analysis of Statistical


Models and Features for Reading Difficulty Prediction. In Proceedings of the
Third Workshop on Innovative Use of NLP for Building Educational Applications,
EANL ’08. Association for Computational Linguistics, Stroudsburg, PA, USA,
2008. ISBN 978-1-932432-08-4. URL http://dl.acm.org/citation.cfm?id=
1631836.1631845.

Hein, S., A chart-based framework for grammar checking: initial studies. In 11th Nordic Conference in Computational Linguistics. 1998.

Henrich, V. and T. Reuter (2009). LISGrammarChecker:Language Independent Statis-


tical Grammar Checking. Master’s thesis, Hochschule Darmstadt University.

Hermet, M. and A. Désilets, Using first and second language models to correct
preposition errors in second language authoring. In Proceedings of the Fourth
Workshop on Innovative Use of NLP for Building Educational Applications, EdApp-
sNLP ’09. Association for Computational Linguistics, Stroudsburg, PA, USA,
2009. ISBN 978-1-932432-37-4. URL http://dl.acm.org/citation.cfm?id=
1609843.1609853.

Hermet, M., A. Désilets, and S. Szpakowicz, Using the web as a linguistic resource
to automatically correct lexico-syntactic errors. In In Proceedings of the Sixth
International Conference on Language Resources and Evaluation. 2008.

Hill, L. R. and S. W. Murray, Comma and Spaces: The Point of Punctuation. In


Proceedings of the 11th Annual CUNY Conference on Human Sentence Processing.
1998.

Hirst, G. and A. Budanitsky (2005). Correcting Real-Word Spelling Errors by


Restoring Lexical Cohesion. Natural Language Engineering, 11(1), 87–111. ISSN
1351-3249. URL http://dx.doi.org/10.1017/S1351324904003560.

Hovy, E., Approaches to the Planning of Coherent Text. In W. R. S. Cecile L. Paris


and W. C. Mann (eds.), Natural Language Generation in Artificial Intelligence and
Computational Linguistics. Kluwer, Boston, 1991, 83–102.

Hunt, K. W., Grammatical Structures Written at Three Grade Levels. NCTE Research
report, USA, 1965.

Israel, R., J. R. Tetreault, and M. Chodorow, Correcting Comma Errors in Learner


Essays, and Restoring Commas in Newswire Text. In HLT-NAACL. 2012.

Izumi, E., K. Uchimoto, T. Saiga, T. Supnithi, and H. Isahara, Automatic error


detection in the japanese learners’ english spoken data. In Proceedings of the 41st
Annual Meeting on Association for Computational Linguistics - Volume 2, ACL ’03.
Association for Computational Linguistics, Stroudsburg, PA, USA, 2003. ISBN
0-111-456789. URL http://dx.doi.org/10.3115/1075178.1075202.

Jelinek, F. and R. L. Mercer, Interpolated Estimation of Markov Source Parameters


from Sparse Data. In Proceedings of the Workshop on Pattern Recognition in Practice.
Amsterdam, The Netherlands: North-Holland, 1980.

Jensen, K., G. E. Heidorn, L. A. Miller, and Y. Ravin, Parse fitting and prose
fixing: Getting a hold on ill-formedness. In American Journal of Computational
Linguistics. 1983.

Jensen, K., G. E. Heidorn, and S. D. Richardson, Natural language processing: the


PLNLP approach. Kluwer Academic Publishers, 1993.

Joshi, A. K., L. S. Levy, and M. Takahashi (1975). Tree adjunct grammars. Journal of Computer and System Sciences, 10(1), 136–163.

Jurafsky, D. and J. H. Martin, Speech and Language Processing: An Introduction


to Natural Language Processing, Speech Recognition, and Computational Linguistics.
Prentice-Hall, 2009.

Kapatsinski, V. (2006). Sound Similarity Relations in the Mental Lexicon: Model-


ing the Lexicon as a Complex Network. Speech research Lab Progress Report, 27,
133–152.

Karlsson, F., Constraint grammar as a framework for parsing running text. In


Proceedings of the 13th Conference on Computational Linguistics, volume 3. Helsinki,
Finland, 1990a.

Karlsson, F., Constraint Grammar as a Framework for Parsing Running Text. In


Proceedings of the 13th conference on Computational linguistics - Volume 3, COLING
’90. Association for Computational Linguistics, Stroudsburg, PA, USA, 1990b.
ISBN 952-90-2028-7. URL http://dx.doi.org/10.3115/991146.991176.

Karlsson, F., A. Voutilainen, J. Heikkil, and A. Anttila., Constraint grammar: A


language independent system for parsing unrestricted text. In Proceedings of the
19th International Conference on Computational Linguistic. 1995.

Kernighan, M. D., K. W. Church, and W. A. Gale, A Spelling Correction Program


based on a Noisy Channel Model. In Proceedings of the 13th International Confer-
ence on Computational Linguistics, COLING ’90. Association for Computational
Linguistics, Stroudsburg, PA, USA, 1990. URL http://dx.doi.org/10.3115/
997939.997975.

Khader, R. A., T. H. King, and M. Butt, Deep call grammars: The lfg-ot experiment.
2004.

Kilgarriff, A. (2007). Googleology is bad science. Comput. Linguist., 33(1), 147–151.


ISSN 0891-2017. URL http://dx.doi.org/10.1162/coli.2007.33.1.147.

Kim, J. Y., K. Collins-Thompson, P. N. Bennett, and S. T. Dumais, Characterizing


Web Content, User Interests, and Search Behavior by Reading Level and Topic. In
Proceedings of the fifth ACM international conference on Web search and data mining,
WSDM ’12. ACM, New York, NY, USA, 2012. ISBN 978-1-4503-0747-5. URL
http://doi.acm.org/10.1145/2124295.2124323.

Knight, K. and I. Chander, Automated postediting of documents. In In Proceedings


of the Twelfth National Conference on Artificial Intelligent (AAAI). 1994.

Knuth, D. E., The Art of Computer Programming, volume 3. Addison Wesley Long-
man Publishing Co., Inc., Redwood City, CA, USA, 1998, 2 edition. ISBN 0-201-
89685-0.

Krishnakumaran, S. and X. Zhu, Hunting Elusive Metaphors Using Lexical Re-
sources. In Proceedings of the Workshop on Computational Approaches to Figura-
tive Language, FigLanguages ’07. Association for Computational Linguistics,
Stroudsburg, PA, USA, 2007. URL http://dl.acm.org/citation.cfm?id=
1611528.1611531.

Kukich, K. (1992). Techniques for Automatically Correcting Words in Text. ACM


Computing Survey, 24(4), 377–439. ISSN 0360-0300. URL http://doi.acm.org/
10.1145/146370.146380.

Kundu, B. and S. Chandra, Automatic Detection of English Words in Benglish


Text. In Proceedings of 4th International Conference on Intelligent Human Computer
Interaction. 2012.

Lascarides, A., T. Briscoe, P. Street, N. Asher, and A. Copestake (1996). Order


Independent and Persistent Typed Default Unification. Linguistic and Philosophy,
19(1), 1–90.

Lavie, A. and A. Agarwal, Meteor: an Automatic Metric for MT Evalua-


tion with High Levels of Correlation with Human Judgments. In Proceed-
ings of the Second Workshop on Statistical Machine Translation, StatMT ’07. As-
sociation for Computational Linguistics, Stroudsburg, PA, USA, 2007. URL
http://dl.acm.org/citation.cfm?id=1626355.1626389.

Leacock, C., M. Chodorow, M. Gamon, and J. Tetreault, Automatic Grammati-


cal Error Detection for Language Learners. Synthesis Lectures on Human Language
Technologies. Morgan Claypool, 2010.

Lee, J. and S. Seneff, Correcting Misuse of Verb Forms. June. Association for Compu-
tational Linguistics, 2008, 174–182. URL http://www.aclweb.org/anthology/
P/P08/P08-1021.

Lee, S., J. Lee, H. Noh, K. Lee, and G. G. Lee (2011). Grammatical Error Simulation
for Computer-Assisted Language Learning. Knowledge Based System, 24(6), 868–
876.

Lehal, G. S., Design and Implementation of Punjabi Spell Checker. In Interna-


tional Journal of Systemics. 2007. URL http://learnpunjabi.org/pdf/Punjabi%
20Spell%20Checker%20%282%29.pdf.

Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), 707–710.

Lewis, M. P. (ed.) (2009). Ethnologue: Languages of the World. 16th edition. SIL International, Dallas. URL http://www.ethnologue.com/.

Lin, Y. C. and K. Y. Su (1995). A level synchronous approach to ill-formed sentence


parsing and error recovery. Technical report.

Littlestone, N., Learning quickly when irrelevant attributes abound: A new linear-
threshold algorithm. In Machine Learning. 1988.

Liu, C.-H., C.-H. Wu, and M. Harris, Word Order Correction for Language Transfer
Using Relative Position Language Modeling. In 6th International Symposium on
Chinese Spoken Language Processing. 2008.

Lopresti, D. and J. Zhou (1997). Using Consensus Sequence Voting to Correct OCR
Errors. Computer Vision and Image Understanding, 67(1), 39–47. ISSN 1077-3142.
URL http://dx.doi.org/10.1006/cviu.1996.0502.

Lourdes, O. (2003). Syntactic Complexity Measures and Their Relationship to L2


Proficiency A Research Synthesis of College Level L2 Writing. Applied Linguistics,
24(4), 492–518.

Lowth, R., A Short Introduction to English Grammar: With Critical Notes. Millar and
Dodsley, London, 1762.

Lozano and Melero, Spanish nlp projects at microsoft research. In Proceedings


of the 2nd International Workshop of Spanish Language Processing and Language
Technologies. Jaén, Spain, 2001.

Mangal, S. K., Statistic in psychology and education. PHI, India, 2012.

Mangu, L. and E. Brill, Automatic Rule Acquisition for Spelling Correction. In


Proceedings of the Fourteenth International Conference on Machine Learning, ICML
’97. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997. ISBN
1-55860-486-3. URL http://dl.acm.org/citation.cfm?id=645526.657126.

Manning, C. D., Part-of-Speech Tagging from 97% to 100%: Is It Time for Some
Linguistics? In Proceedings of the 12th international conference on Computational
linguistics and intelligent text processing - Volume Part I, CICLing’11. Springer-
Verlag, Berlin, Heidelberg, 2011. ISBN 978-3-642-19399-6. URL http://dl.acm.
org/citation.cfm?id=1964799.1964816.

Manning, C. D. and H. Schütze, Foundations of Statistical Natural Language Process-


ing. MIT Press, 1999.

Mays, E., F. J. Damerau, and R. L. Mercer (1991). Context based Spelling Cor-
rection. Information Processing and Management, 27(5), 517–522. ISSN 0306-4573.
URL http://dx.doi.org/10.1016/0306-4573(91)90066-U.

McCallum, D. R. and J. L. Peterson, Computer-based readability indexes. In


Proceedings of the ACM ’82 conference, ACM ’82. ACM, New York, NY, USA, 1982.
ISBN 0-89791-085-0. URL http://doi.acm.org/10.1145/800174.809754.

McCarthy, J. (1986). Applications of Circumscription to Formalizing Common-


Sense Knowledge. Artificial Intelligent, 28(1), 89–116.

McCord, M. C., Slot grammars. In Computational Linguistics, volume 6. 1980.

McLaughlin, H. G. (1969). SMOG grading - a new readability formula. Journal of


Reading, 639–646.

Mellish, C. S., Some chart-based techniques for parsing ill-formed input. In Pro-
ceedings of the 27th annual meeting on Association for Computational Linguistics,
ACL ’89. Association for Computational Linguistics, Stroudsburg, PA, USA,
1989. URL http://dx.doi.org/10.3115/981623.981636.
Michaud, L. N. and K. F. Mccoy, An intelligent tutoring system for deaf learners
of written english. In In Proceedings of the Fourth International ACM SIGCAPH
Conference on Assistive Technologies (ASSETS 2000). SIGCAPH, 2000.
Michaud, L. N. and K. F. Mccoy, Error profiling: Toward a model of english
acquisition for deaf learners. In In Proc. of the 39th Annual Meeting and the
10th Conference of the European Chapter of Association for Computational Linguistics
(EACL). 2001.
Mitchell, T., Machine Learning. McGraw-Hill, New York, 1997.
Nagata, R., F. Masui, A. Kawai, and N. Isu, Recognizing Article Errors Based on
the Three Head Words. In CELDA. 2004.
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM
Review, 45, 167–256.
MacDonald, N. H., L. T. Frase, P. S. Gingrich, and S. A. Keenan, The Writer's Workbench: Computer aids for text analysis. In IEEE Transactions on Communications, volume 30. 1982.
Nivre, J. (2005). Dependency Grammar and Dependency Parsing. Technical report,
Växjö University: School of Mathematics and Systems Engineering.
Nivre, J. (2008). Algorithms for Deterministic Incremental Dependency Parsing.
Computational Linguistic, 34(4), 513–553. ISSN 0891-2017. URL http://dx.doi.
org/10.1162/coli.07-056-R1-07-027.
Oyama, H. and Y. Matsumoto, A machine learning approach for error identifi-
cation for learners of japanese. In The society for Teaching Japanese as a Foreign
Language Spring Meeting. 2008.
Papineni, K., S. Roukos, T. Ward, and W.-J. Zhu, Bleu: a method for automatic
evaluation of machine translation. In Proceedings of the 40th Annual Meeting on
Association for Computational Linguistics, ACL ’02. Association for Computational
Linguistics, Stroudsburg, PA, USA, 2002. URL http://dx.doi.org/10.3115/
1073083.1073135.
Park, J. C., M. Palmer, and G. Washburn, An english grammar checker as a writing
aid for students of english as a second language. 1997.
Park, Y. A. and R. Levy, Automated Whole Sentence Grammar Correction Us-
ing a Noisy Channel Model. In Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies - Volume
1, HLT ’11. Association for Computational Linguistics, Stroudsburg, PA, USA,
2011. ISBN 978-1-932432-87-9. URL http://dl.acm.org/citation.cfm?id=
2002472.2002590.

Pazzani, M. J., Conceptual Analysis of Garden-Path Sentences. In Proceedings of the
10th international conference on Computational linguistics, COLING ’84. Association
for Computational Linguistics, Stroudsburg, PA, USA, 1984. URL http://dx.
doi.org/10.3115/980431.980595.

Peterson, J. L. (1980). Computer Programs for Detecting and Correcting Spelling


Errors. Communications of the ACM, 23(12), 676–687. ISSN 0001-0782. URL
http://doi.acm.org/10.1145/359038.359041.

Philips, L. (1990). Hanging on the Metaphone. Computer Language Maga-


zine, 7(12), 39–44. Accessible at http://www.cuj.com/documents/s=8038/
cuj0006philips/.

Popel, M., D. Mareček, N. Green, and Z. Žabokrtský, Influence of Parser Choice on Dependency-Based MT. In Proceedings of the Sixth Workshop on Statistical Machine Translation, WMT '11. Association for Computational Linguistics, Stroudsburg, PA, USA, 2011. ISBN 978-1-937284-12-1. URL http://dl.acm.org/citation.cfm?id=2132960.2133019.

Powers, D. M. W., Learning and application of differential grammars. In Proc. of the Meeting of the ACL Special Interest Group in Natural Language Learning. 1997.

Proudian, D. and C. Pollard, Parsing Head-driven Phrase Structure Grammar. In Proceedings of the 23rd Annual Meeting on Association for Computational Linguistics. Stroudsburg, PA, USA, 1985.

Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1(1), 81–106.

Rabiner, L. and B.-H. Juang, Fundamentals of Speech Recognition. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993. ISBN 0-13-015157-2.

Raybaud, S., D. Langlois, and K. Smaïli, Efficient Combination of Confidence Measures for Machine Translation. In Proceedings of INTERSPEECH. 2009.

Reiter, E. and R. Dale, Building Natural Language Generation Systems. Cambridge University Press, New York, NY, USA, 2000. ISBN 0-521-62036-8.

Rich, E. and K. Knight, Artificial Intelligence, 2nd edition. McGraw-Hill, New York, 1991.

Robb, T., S. Ross, and I. Shortreed (1986). Salience of Feedback on Error and Its
Effect on EFL Writing Quality. TESOL Quarterly, 20, 83–93.

Rozovskaya, A. and D. Roth, Algorithm Selection and Model Adaptation for ESL
Correction Tasks. In ACL. 2011.

Sachs, R. and C. Polio (2007). Learners’ Uses of Two Types of Written Feedback
on a L2 Writing Revision Task. Studies in Second Language Acquisition, 29, 67–100.

Sankaran, B., K. Bali, M. Choudhury, T. Bhattacharya, P. Bhattacharyya, G. N. Jha, S. Rajendran, K. Saravanan, L. Sobha, and K. V. Subbarao, A Common Parts-of-Speech Tagset Framework for Indian Languages. In LREC. European Language Resources Association, 2008. URL http://dblp.uni-trier.de/db/conf/lrec/lrec2008.html#SankaranBCBBJRSSS08.

Santorini, B. (1990). Part-of-Speech Tagging Guidelines for the Penn Treebank Project (3rd revision, 2nd printing). Technical report, Department of Linguistics, University of Pennsylvania, Philadelphia, PA, USA.

Scheler, G., With raised eyebrows or the eyebrows raised? A neural network approach to grammar checking for definiteness. In Proceedings of the Second International Conference on New Methods in Language Processing (NeMLaP-2). Bilkent University, Ankara, 1996.

Schmidt-Wigger, A., Grammar and Style Checking for German. In Proceedings of the Second International Workshop on Controlled Language Applications. Pittsburgh, PA, 1998.

Schneider, D. A. and K. F. McCoy, Recognizing syntactic errors in the writing of second language learners. In Proceedings of the 17th International Conference on Computational Linguistics (COLING '98), volume 2. 1998.

Schuster, E. (1986). The role of native grammars in correcting errors in second language learning. Computational Intelligence, 2, 93–98.

Schwind, C., Sensitive parsing: error analysis and explanation in an intelligent language tutoring system. In COLING '88. 1988.

Schwind, C. B., Feature grammar for semantic analysis. In Computational Intelligence. 1990.

Shieber, S. (1985). Evidence Against the Context-Freeness of Natural Language. Linguistics and Philosophy, 8(3), 333–343. URL http://www.eecs.harvard.edu/shieber/Biblio/Papers/shieber85.pdf.

Sinha, M., S. Sharma, T. Dasgupta, and A. Basu, New Readability Measures for
Bangla and Hindi Texts. In COLING (Posters). 2012.

Smolensky, P. and G. Legendre, The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar. MIT Press, Cambridge, MA, 2006.

Steedman, M. and J. Baldridge, Combinatory categorial grammar. In Non-Transformational Syntax. Blackwell, 2005.

Stemberger, J. P. (1982). Syntactic errors in speech. Journal of Psycholinguistic Research.

Tesfaye, D. (2011). A rule-based Afan Oromo grammar checker. International Journal of Advanced Computer Science and Applications, 2(8).

Tetreault, J. R., J. Foster, and M. Chodorow, Using Parse Features for Preposition
Selection and Error Detection. In ACL (Short Papers). 2010.

Toutanova, K. and R. C. Moore, Pronunciation Modeling for Improved Spelling
Correction. In Proceedings of the 40th Annual Meeting on Association for Computa-
tional Linguistics, ACL ’02. Association for Computational Linguistics, Strouds-
burg, PA, USA, 2002. URL http://dx.doi.org/10.3115/1073083.1073109.
Uria, L., B. Arrieta, A. D. de Ilarraza, M. Maritxalar, and M. Oronoz, Determiner errors in Basque: Analysis and automatic detection. In Procesamiento del Lenguaje Natural. 2009.
Uszkoreit, H., Grammar Checking: Theory, Practice and Lessons learned in LATESLAV.
Prague, 1996.
UzZaman, N., A Bangla Phonetic Encoding for Better Spelling Suggestion. In
Proceedings of 7th International Conference on Computer and Information Technology.
2004.
UzZaman, N. (2005). Phonetic Encoding for Bangla and its Application to
Spelling checker, Transliteration, Cross Language Information Retrieval and
Name Searching. BRAC University. URL http://citeseerx.ist.psu.edu/
viewdoc/download?doi=10.1.1.173.1756&rep=rep1&type=pdf. Undergradu-
ate Thesis.
van Berkel, B. and K. De Smedt, Triphone Analysis: A Combined Method for
the Correction of Orthographical and Typographical Errors. In Proceedings of the
Second Conference on Applied Natural Language Processing, ANLC ’88. Association
for Computational Linguistics, Stroudsburg, PA, USA, 1988. URL http://dx.
doi.org/10.3115/974235.974250.
Van Gael, J., A. Vlachos, and Z. Ghahramani, The Infinite HMM for Unsupervised
PoS Tagging. In Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing: Volume 2 - Volume 2, EMNLP ’09. Association for Compu-
tational Linguistics, Stroudsburg, PA, USA, 2009. ISBN 978-1-932432-62-6. URL
http://dl.acm.org/citation.cfm?id=1699571.1699601.
Viterbi, A. (1967). Error Bounds for Convolutional Codes and an Asymptotically
Optimum Decoding Algorithm. IEEE Transactions on Information Theory, 13(2),
260–269. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?
arnumber=1054010.
Vitevitch, M. S., Phonological neighbors in a small world: What can graph theory tell us about word learning? In Spring 2005 Talk Series on Networks and Complex Systems. Indiana University, 2005.
Vogel, C. and R. Cooper, Robust chart parsing with mildly inconsistent feature
structures. In Nonclassical Feature Systems, volume 10. 1995.
Wagner, J., J. Foster, and J. van Genabith, A Comparative Evaluation of Deep and
Shallow Approaches to the Automatic Detection of Common Grammatical Er-
rors. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Lan-
guage Processing and Computational Natural Language Learning (EMNLP-CoNLL).
Association for Computational Linguistics, Prague, Czech Republic, 2007.

Wagner, R. A. and M. J. Fischer (1974). The String-to-String Correction Problem.
J. ACM, 21(1), 168–173. ISSN 0004-5411. URL http://doi.acm.org/10.1145/
321796.321811.

Weischedel, R. M., W. M. Voge, and M. James (1978). An artificial intelligence approach to language instruction. Artificial Intelligence, 10, 225–240.

Whitelaw, C., B. Hutchinson, G. Y. Chung, and G. Ellis, Using the Web for Lan-
guage Independent Spellchecking and Autocorrection. In Proceedings of the 2009
Conference on Empirical Methods in Natural Language Processing, EMNLP ’09. As-
sociation for Computational Linguistics, Stroudsburg, PA, USA, 2009. ISBN 978-
1-932432-62-6. URL http://dl.acm.org/citation.cfm?id=1699571.1699629.

Yannakoudakis, E. J. and D. Fawthrop (1983). The Rules of Spelling Errors. Information Processing and Management, 19(2), 87–99.

Yi, X., J. Gao, and W. B. Dolan, A web-based English proofing system for English as a second language users. In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP). 2008.

Young-Soog Chae, Improvement of Korean proofreading system using corpus and collocation rules. In Proceedings of the 12th Pacific Asia Conference on Language, Information and Computation. Singapore, 1998.

Yule, G. U., The Statistical Study of Literary Vocabulary. Cambridge University Press,
1944.

Zhang, Y. and J. Nivre, Transition-Based Dependency Parsing with Rich Non-Local Features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2, HLT '11. Association for Computational Linguistics, Stroudsburg, PA, USA, 2011. ISBN 978-1-932432-88-6. URL http://dl.acm.org/citation.cfm?id=2002736.2002777.

LIST OF PAPERS BASED ON THESIS

1. Bibekananda Kundu, Sutanu Chakraborti and Sanjay Kumar Choudhury. NLG Approach for Bangla Grammatical Error Correction. In Proceedings of the 9th International Conference on Natural Language Processing, Macmillan Publishers, pp. 225–230, (2011).

2. Bibekananda Kundu, Sutanu Chakraborti and Sanjay Kumar Choudhury. Combining Confidence Score and Mal-rule Filters for Automatic Creation of Bangla Error Corpus: Grammar Checker Perspective. In Proceedings of the 13th International Conference on Intelligent Text Processing and Computational Linguistics, Springer LNCS, 7182, pp. 462–477, (2012).

3. Bibekananda Kundu, Sutanu Chakraborti and Sanjay Kumar Choudhury. Complexity Guided Active Learning for Bangla Grammar Correction. In Proceedings of the 10th International Conference on Natural Language Processing, pp. 40–49, (2013).

