NATURAL LANGUAGE GENERATION FOR BANGLA

A THESIS

submitted by

BIBEKANANDA KUNDU

for the award of the degree

of

MASTER OF SCIENCE
(by Research)
This is to certify that the thesis titled Natural Language Generation for Bangla, submitted by Bibekananda Kundu to the Indian Institute of Technology, Madras, for the award of the degree of Master of Science (by Research), is a bona fide record of the research work done by him
under our supervision. The contents of this thesis, in full or in parts, have not
been submitted to any other Institute or University for the award of any degree or
diploma.
Date: Date:
ACKNOWLEDGEMENTS
I would like to take this opportunity to thank the many people who deserve to be mentioned on this page, even if not all of them are named here. I am grateful to Sanjay Kumar Choudhury for introducing me to this research topic and providing guidance throughout the course of the work. I have enjoyed considerable freedom under
their guidance. They alerted me whenever I was on the wrong track during my research. They shaped how I think, write and do research at a very deep level, and also taught me how to see where an approach will fail even before I try it. I have learned from them how to think like a linguist and a computer scientist at the same time. Their enthusiasm brought a lot of fun into my research, as did their way of raising questions and providing new exciting ideas. Their advice, encouragement and constructive criticism helped me a lot in this research. They always had faith in
my capabilities and made sure that I expressed myself more clearly. They pushed
me to think and to explore newer avenues of research. Whenever I felt that I could not get any further with grammar checking, they came up with new ideas which motivated me to work hard. I could not imagine having better advisers and mentors, and I am honoured to be their student in the first place. Work on this dissertation would not have been possible without the support I received from them. During my research work I have worked in a nice and stimulating
research group, where I got much pleasure. Many thanks to all my colleagues of the Language Technology section of CDAC Kolkata and my friends at IIT Chennai for their support and company, at work as well as at conferences. I especially want to mention Sudipta Debnath for being the helping hand next door whenever I needed one. His inspiration and useful suggestions motivated me in this work. Many of the ideas embodied in this study were crystallized in discussions with him, and they appeared as the pillars for building the prototype of the system. I do not have enough words to express my feelings, respect and thanks to him. I would also
like to thank Abhijit Chatterjee, Debarun Kar, K.V.S Dileep and Mridusmita Mitra
for helping me a lot, in particular in collecting valuable resources for my study and carefully proofreading my write-ups. They were the first reviewers of some of
the chapters of this thesis. They noticed many typos, errors, and strange sentence
structures. Their work was much more valuable than that of any grammar checker!
Any errors that still remain in this dissertation are my sole responsibility, but I can assure you that there are far fewer now than there used to be. I would also
like to thank Pampa Bhattarchayya and Mridusmita Mitra for always being helpful. I also thank Subash Chandra, who is the first user of my system as a second language learner of Bangla, Hindi being his mother tongue. He provided a good deal of non-native data, which helped me a lot in this research work. I would also like to
thank Pradeep Raychoudhury for helping me to prepare the materials for presentations and posters. Some persons deserve special mention for discussions that contributed quite directly to this research: Barnali Pal, Sita Rajmohan, Tulika Basu,
Joyanta Basu and Rajib Roy. I am very appreciative of my classmates and friends
at IIT Chennai who participated in this study. These include Debarun, Dileep,
Sourav, Prateek, Smith and many more. They created a nice research environment
where we argued, discussed and nurtured our ideas during my stay at IIT, and at times even over telephone conversations. Without their support, this work would not have taken its present shape. I am thankful to CDAC Kolkata for providing me the necessary leave and infrastructure facilities for conducting this research. I am also grateful to the directors Sri A.B. Saha and Sri R. Rabindra Kumar for their inspiration, motivation
and support for this study. I am very much thankful to all the faculty members,
staff members and research scholars of the Department of Computer Science and
Engineering of IIT Madras for their direct or indirect help in various forms during
my course work and research work. The NLP community has provided excellent feedback on this work. I am especially thankful to Prof. Robert Dale (Macquarie University), Prof. Sudeshna Sarkar (IIT Kharagpur) and Prof. Pushpak Bhattacharyya (IIT Bombay) for their valuable suggestions. I am also grateful to Dr. Michael Gamon (Microsoft Research) and Prof. Kevin Knight (USC, Information Sciences Institute) for their encouragement. I thank my family for giving me the opportunity to pursue my studies, for being there for me and for always believing in me. My
parents have always inspired me toward education right from my childhood, and this has been a constant driving force for me to pursue a higher degree. Their prayers are always a great source of strength for me. Their support has brought
me to where I am now. I cannot find appropriate words to thank my wife Soma for
her steady support, encouragement and love throughout the difficult times in my
career. She must also be thanked for her caring during my entire research work.
Many a time she helped me decide the titles of the papers I wrote for conferences. She carefully read my write-ups and motivated me to think from a reader's point of view. Along with everything else, I am grateful for her constant
support. She did everything she could to make sure I had enough time to finish my
work. I also could never have completed this study without all the encouragement
and support which I have received from my elder brother, parents-in-law and my
sister-in-law. Thank you for always being there. I am indebted to you all a lot
and cannot thank you enough. I owe all of my success to the essential things that
my family has given me over the years. I am dedicating this thesis to my family.
Finally, I thank all my well-wishers who directly or indirectly contributed to the completion of this work.
ABSTRACT
Learning a new language is an integral part of human life. Even after years of learning, a person is prone to commit mistakes. These errors are due to a lack of knowledge of the target language and the influence of previously learnt languages [Leacock et al., 2010]. As a consequence, it has been felt that automatic grammatical error detection and correction tools are needed. Building such a tool for a morphologically rich and free word order language like Bangla is a non-trivial task. Little research has been done on detection and correction of grammatical errors in such languages. For the Bangla language, this work needs to be done de novo. The problem addressed in this thesis is to develop a methodology for correcting the mistakes committed by users and also to provide relevant examples for supporting the suggested correction. To have an idea of how strongly the suggested correction can be trusted, a complexity measure is provided: if the complexity measure is high, the user should not be overtly reliant on the correction suggested by the system.
Conversely if the complexity measure is low, the user can confidently choose the
suggestion.
A sufficiently large error corpus is essential for training and testing of grammar checkers, but manual creation of such a corpus is a time-consuming task. There is a dearth of error corpora for the Bangla language. This thesis therefore presents an approach for automatic creation of a Bangla error corpus covering the most frequent types of grammatical mistake. It has been widely studied that the divergence between a
pair of languages has a profound effect on various fields of NLP [Dorr et al.,
2002; Dave et al., 2001; Goyal and Sinha, 2009]. The effect of divergence becomes
more pronounced and acute for widely varying languages like English and Bangla
[Bhattacharyya et al., 2011; Dave et al., 2001; Goyal and Sinha, 2009]. Bangla is a
morphologically rich language [Bhattacharya et al., 2005; Dandapat et al., 2004] and
has a free word order. Therefore, state-of-the-art Context Free Grammar (CFG) based approaches are not applicable here [Begum et al., 2008; Shieber, 1985; Bharati et al., 2010]. In
addition to this, lack of robust parsers, insufficient linguistic rules and dearth of
error annotated parallel corpora make this grammar correction task much more
challenging. To address these issues, a novel approach has been proposed for grammatical error detection and correction in Bangla. For evaluation, a Metric for Evaluation of Grammar Assessment (MEGA), combining a Graded Acceptability Assessment Metric (GAAM) and a Complexity Measurement Metric (CMM), has been introduced. Initially, MEGA has been applied on our Natural Language Generation (NLG) based Bangla grammar checker. Since direct comparison between an available English grammar checker and the NLG based Bangla grammar checker is not possible, the NLG based system has been compared against a prototype baseline system. The results show that the NLG based approach for Bangla grammatical error detection and correction performs well.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS i
ABSTRACT v
ABBREVIATIONS xv
NOTATION xviii
1 INTRODUCTION 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Divergence Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 LITERATURE SURVEY 10
2.1 Spell Checker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Grammar Checker . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Automatic Grammar Correction Approaches . . . . . . . . . . . . 18
2.3.1 Rule-based Approach . . . . . . . . . . . . . . . . . . . . . 19
2.3.2 Machine Learning Approach . . . . . . . . . . . . . . . . . 27
2.3.3 Statistical Machine Translation Approach . . . . . . . . . . 39
2.4 Comparison between existing approaches . . . . . . . . . . . . . . 41
2.5 Open Problems and Future Directions . . . . . . . . . . . . . . . . 44
3 AUTOMATIC CREATION OF BANGLA ERROR CORPUS 46
3.1 Errors in Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.1.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 Experimental Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.1 Bangla POS Tagger . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.2 Confidence Score and Mal-rule Filters . . . . . . . . . . . . 63
3.4 Result and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 66
5 EVALUATION 80
5.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.2 Standard Evaluation Metric . . . . . . . . . . . . . . . . . . . . . . 82
5.3 Test Suite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.4 Evaluation Methodology . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4.1 Evaluation using Standard Metrics . . . . . . . . . . . . . . 84
5.4.2 Graded Acceptability Assessment Metric: . . . . . . . . . . 86
5.4.3 Complexity Estimation of Grammar Correction . . . . . . 89
D Examples of sentences collected from literature domain 114
LIST OF TABLES
5.1 Evaluation Measure Formulae . . . . . . . . . . . . . . . . . . . . . 82
5.2 True Positive, False Positive, False Negative and True Negative with
respect to grammatical error detection task. . . . . . . . . . . . . . 83
5.3 Performance evaluation of NLG based system on individual errors
as well as combined errors in five text genres. P indicates Precision
and R indicates Recall. . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.4 Grading Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.5 Features for estimation of grammar correction complexity . . . . 96
5.6 Complexity Score in different complexity level . . . . . . . . . . . 97
5.7 Correlation of complexity score with grammar checkers' accuracy 103
LIST OF FIGURES
5.5 Screenshot of active learning framework for estimation of text com-
plexity. The explanation of the feature names is available at
http://nlp.cdackolkata.in/testComplexity/FeatDtl.spy . . 104
5.6 Complexity values across different datasets . . . . . . . . . . . . . 105
5.7 POS Tag distributions in different domains. . . . . . . . . . . . . . 105
5.8 Frequency of word distribution across different domains. . . . . . 106
5.9 Complexity measure and Precision score obtained by NLG based
grammar checker and Naïve Bayes classifier systems. . . . . . . . 106
ABBREVIATIONS
EP Example Provider
FN False Negative
FP False Positive
HMM Hidden Markov Model
LM Language Model
MI Mutual Information
ML Machine Learning
OT Optimality Theory
POS Parts-of-Speech
S-O-V Subject-Object-Verb
SP Suggestion Provider
S-V-O Subject-Verb-Object
TN True Negative
TP True Positive
NOTATION
σ2 Variance
γ Brevity Penalty
µ Mean
Ω Complexity Score
TRANSLITERATION KEY USED IN THE DISSERTATION
CHAPTER 1
INTRODUCTION
“Grammar has sometimes been described as the Art of speaking and writing correctly.
But people may possess the Art of correctly using their own language without having any
knowledge of grammar. We define it therefore as the Science which treats of words and
Most people are fluent in speaking a language, but producing correct formal writing is much harder. From naïve users to professional writers, most people are vulnerable to the curse of grammatical mistakes [Leacock et al., 2010]. Thus casual spoken
language for communication differs from formal written text. Written language
has become more or less a prerequisite for daily communication. Moreover, with the increasing use of electronic editing environments, the need for automatic grammatical error detection and correction is also increasing day by day. Rather than depending only on mechanical assistance,
we are now seeking intellectual support as well. A vast number of people throughout the world deal with texts without having proper knowledge of the language, and many of them are not native speakers. Most of them use spelling
correction tools when writing documents on a computer. These tools provide a first step towards writing correct text by reducing the need for human intervention. The second step, grammar checking, is essential for several reasons. It improves the quality of text, saves time, and supports learning of the language. Such a tool not only helps native speakers but also second language learners. On the whole, the system plays a pivotal role in Computer Assisted Language Learning (CALL).
1.1 Motivation
A lot of work has been done in grammatical error detection and correction, mainly for the English language. Very little work has been done for Indian languages; probably, the Punjabi grammar checker [Gill and Lehal, 2008] is the first and only such system for an Indian language, and no comparable system exists for the Bangla language. Bangla is the sixth most widely spoken language in the world [Lewis and Paul, 2009] and the second in India. It is the national language of Bangladesh. Bangla is derived from Prakrit, which is a sister language of Sanskrit. Sister languages of Bangla are
Oriya, Magahi and Maithili in the west and Assamese in the north east of India
[Chatterjee, 1926]. Bangla, Oriya and Assamese are the easternmost languages of the Indo-Aryan family. Bangla is a morphologically rich language and has a relatively free word order. It follows a Subject-Object-Verb (S-O-V) pattern, but the orientation of these three units is flexible, i.e. S-V-O is allowable but not popularly used. Till now no significant research and development has been
done on grammatical error detection and correction of morphologically rich and
free word order languages like Bangla. To the best of our knowledge, ours is the first such attempt for Bangla. Among the errors committed by second language learners of different languages, some novelties are found with respect to error production in each individual language. It has been widely studied
that the divergence between a pair of languages has a profound effect on various
fields of NLP [Dorr et al., 2002; Dave et al., 2001; Goyal and Sinha, 2009]. The diver-
gence between the two languages influences the kind of mistakes second language
learners typically commit. Previous studies have revealed that second language learners of English produce article errors due to the divergence between English and those native languages that do not have any article [Leacock et al., 2010]. Therefore article selection is specifically difficult for such learners. Similarly, the divergence between Bangla and other languages also influences the kind of mistakes Bangla second language learners typically commit in their text. Bangla does not have any prepositions, unlike the English language. Table 1.1 shows how a single preposition, "with", has different realizations in Bangla. Such divergences lead to frequent mistakes by second language learners of Bangla when using postpositions and nominal inflections.
Table 1.1: Examples of single preposition “with” having different types of realiza-
tion in Bangla
English Bangla
A girl with beautiful eyes. sundara chokhera ekaTi meYe.
A boy with high fever. prachaNDa jbare AkrAnta ekaTi chhele
He wrote with a pen. se pena diYe likhechhila
Milkman mixes water with milk. dudhaoYAlA dudhera sAthe jala meshAna
1.3 Challenges
Initially most grammar checkers were available as part of word processors, but standalone tools have appeared since. Although grammar checker tools are already available for English and for other European languages, they have not matured enough to guarantee correct results most of the time for every error. They do not satisfactorily account for the complexities of individual languages. Moreover, these systems have several limitations. One significant issue is false alarms, where correct constituents are indicted as incorrect, which badly affects the learners' language acquisition process [Leacock et al., 2010]. Many a time even ill-formed constructions go undetected. Thus, in spite of automated assistance, considerable human effort is still required in order to achieve high quality text. Therefore, there is a need for an efficient and reliable grammar checker which can alleviate the potential problems of existing systems.
Available grammar checkers for other languages are developed based on either rule-based or statistical approaches. A rule-based grammar checker checks the grammatical structure of sentences depending on morphological and syntactic analysis. In the rule-based approach, rules are manually designed
by linguists to recognize and rectify specific grammatical errors from parse tree
patterns. In Bangla, linguistically rich error correction rules and robust parsers
are not available till date. As Bangla allows free word order, state-of-the-art Con-
text Free Grammar (CFG) is not applicable [Shieber, 1985; Begum et al., 2008;
Bharati et al., 2010] here. Context Free Grammar (CFG) is basically a positional
grammar. It is true that Bangla has a dominant word order, which is SOV (i.e. Subject-Object-Verb), but alternative orderings are found not only in literature or poetry but also in day-to-day news articles. It has been seen that news reporters often use this alternative ordering to emphasize the reported event. The evidence of free word order is very frequent in Bangla news corpora and Bangla blogs. Thus, a parser that follows positional grammar is unable
to generate a correct parse tree for a Bangla sentence. Free word order also leads to structural ambiguity and increases the computational cost of the parser. Discontinuities (words that belong together but are not placed in the same phrase) and
long distance dependencies also pose problems for positional grammars [Bartha
et al., 2006; Covington, 1990]. These linguistic phenomena are quite common in free word order languages. Parsing research for free word order languages, such as the Indian languages, is based on the Paninian framework and Dependency Parsing techniques [Garain and De, 2013; Ghosh. A. and S., 2009;
Zhang and Nivre, 2011; Nivre, 2008]. It has already been reported in the literature that parsers that follow the Paninian framework (designed for free word order languages) compare well in asymptotic time complexity with parsers for context free grammars (CFG), which are basically designed for positional grammars [Bharati et al., 2010]. Parsing in the Paninian model is based on karaka relations between verbs
and nouns in a sentence. It does not consider the position of the constituents during parsing, so relations can be established even when words are discontinuous and related to each other over a long distance. A machine learning approach, in contrast, learns error patterns from an error corpus rather than from handcrafted linguistic rules. This can alleviate the potential problems of the rule-based approach, provided a sufficiently large error corpus of Bangla text is available. One of the major problems of building such an error corpus from learners' data is that the process is very time consuming. It also requires linguistic knowledge to examine each sentence of learners' text to determine the nature and frequency of errors.
Another requirement is a method that can evaluate the functionality and usability of existing grammar checkers. Over the last few years, most studies regarding grammatical error detection and correction have focused on the design and development aspects. Very little attention has been paid to evaluation. Evaluation of such a system is needed to verify that the development of the system is in the right direction. Performance of most of the existing grammar checkers has been reported only for specific types of errors. Moreover, these systems are not tested on a common dataset. Testing on a common dataset is difficult because the systems are designed for different languages. Direct comparison is not possible since different systems address different languages and error types.
1.4 Research Objectives
The primary goal of the thesis is to develop a grammar checker for the Bangla language and, more generally, a grammatical error detection and correction methodology for morphologically rich and free word order languages. In this thesis, our focus is to correct postpositional and nominal inflection errors, which are the most frequent mistakes committed by second language learners of Bangla, and to provide relevant examples for supporting the suggested correction. Though the thesis deals with grammatical error detection and correction of the Bangla language, our approach is designed to be applicable to other morphologically rich and free word order languages as well.
1.5 Thesis Outline
This chapter has looked at the motivation behind the research and identified the initial research objectives that have directed it. The rest of the thesis is organized into chapters as follows:
Chapter 2 reviews recent prior work in grammatical error detection and correction. We do not aim to give a comprehensive review of the related work; such a review would have to cover the long history of this area and the diverse language dependent works based on several theories and techniques used by researchers over the years. Instead, we briefly review the work based on different techniques used for grammatical error detection and correction.
Chapter 3 describes our novel approach for automatic creation of a Bangla error corpus for training and evaluation of grammar checkers. Though the present work focuses on the most frequent grammatical errors in Bangla written text, the methodology has been designed with an aim to increase the coverage of the error corpus in future. Mistakes commonly committed by learners are analysed here, and the reasons behind such errors in their text are also investigated.
Chapter 4 describes our procedure for automatic detection and correction of Bangla grammatical errors. The different types of grammatical errors handled using this approach are discussed here. In this chapter, we also discuss the scope and limitations of the proposed approach.
Chapter 5 presents the evaluation of our grammar checker based on NLG and of a baseline system using a Naïve Bayes classifier, and discusses the need of a standard test corpus for evaluation of the system. Performance is reported using standard metrics as well as the proposed MEGA metric.
Chapter 6 summarises the contributions of the research and concludes with future directions.
Appendixes. Some appendixes have been added in order to cover complementary material, including examples of the corrections suggested by the system and sentences of different complexity and severity of errors, including those that are difficult, collected from the literature domain.
CHAPTER 2
LITERATURE SURVEY
The main focus of the dissertation is on the grammar correction task, specifically for the Bangla language. It has been assumed that the input to our grammar correction system is free from spelling errors. Thus in the literature survey, we have largely concentrated on grammatical errors, i.e. errors related to the faulty construction of a sentence. Nevertheless, it has been felt that a brief overview of spell checking is relevant, since a spell checker usually forms the first component of a grammar checker. Interested readers can go through the cited references for more details.

2.1 Spell Checker
The main task of a spell checker is to find the appropriate word the author intended to type given a misspelled word. A spell checker is used to correct spelling errors in text, to fix the output of Optical Character Recognition (OCR)[1] and online handwriting recognition, and it is an integral component of grammar checkers. Spell checkers ensure an initial step towards effective writing. Spelling errors in human writing are committed due to homophones[2] and other causes. According to Heift and Schulze [2007], language learners can commit misspellings
[1] Examples of OCR generated errors are "rn" for "m", "e" for "c" etc. [Lopresti and Zhou, 1997]
[2] Examples of errors due to homophones are "it's" for "its", "dessert" for "desert", "piece" for "peace" etc.
due to either a misapplication of morphological rules or other influences from their first language, such as overgeneralizing the rules of the language they have learned. For example, a learner may write "goed" instead of "went" by overgeneralizing the regular past tense rule [Leacock et al., 2010]. However, one can argue that this problem is beyond the scope of spell checking. Spelling errors are classified into two types, namely, Non-Word error and Real-Word error
[Kukich, 1992]. A Non-Word error occurs when a misspelled word is not a valid
dictionary word. Conversely, a Real-Word error occurs when user writes a valid
dictionary word but it is not suitable in the context of the sentence. Examples of a Non-Word error and a Real-Word error for the correct sentence "The boys ate their lunch" are "The boys ate *thier lunch" and "The boys ate *there lunch" respectively, where '*' indicates the error word in the sentences. Detection of Real-Word errors is comparatively more complex than detection of Non-Word errors. Many a time context information is needed to detect them. Another classification of spelling errors is (1) Orthographical error (also known as 'Cognitive error') and (2)
Typographical error [Kukich, 1992]. The Orthographical error occurs when the author either simply does not know the correct spelling or forgets it during typing. Orthographical misspellings are generally phonetically similar to the correct word (e.g. "indicies" and "indices"). As a result, these errors depend on the language itself. Typographical errors occur due to a wrong hit of key sequences. Thus the characteristics of this type of error depend on the keyboard layout rather than the language in which the text is written.
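To make the Non-Word/Real-Word distinction concrete, a toy detector can be sketched as follows. The mini-dictionary is invented purely for illustration; a real spell checker would load a full lexicon:

```python
# Hypothetical mini-dictionary; a real system would use a complete lexicon.
DICTIONARY = {"the", "boys", "ate", "their", "there", "lunch"}

def non_word_errors(sentence: str) -> list[str]:
    """Return tokens that are not valid dictionary words (Non-Word errors).

    Real-Word errors (e.g. "there" typed for "their") pass this check
    unnoticed, which is why their detection requires context."""
    return [tok for tok in sentence.lower().split() if tok not in DICTIONARY]
```

For the sentence "The boys ate *thier lunch" the token "thier" is flagged, while the Real-Word error in "The boys ate *there lunch" goes undetected by dictionary lookup alone.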
Spelling correction happens in three stages, viz. "Error Detection", "Candidate Generation" and "Candidate Ranking". Knowledge of the error sources (like keyboard, OCR, Speech-to-Text etc.) can be used to detect and correct spelling errors. Structural similarity between the misspelled word and its candidate corrections is estimated using edit distance [Levenshtein, 1966]. To select the best candidate correction, the minimum edit distance [Wagner and Fischer, 1974] is used: the candidate with the minimum edit distance is preferred. For a good overview of edit distance, please refer to [Dasgupta et al., 2008; Jurafsky and Martin, 2009]. Pronunciation similarity is usually measured by Russell's Soundex [Knuth, 1998] and Metaphone [Philips, 1990].
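As an illustration of candidate ranking by edit distance, a minimal sketch (not the implementation used in the cited works) could look like this:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance, computed row by row with the
    dynamic-programming scheme of Wagner and Fischer [1974]."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def rank_candidates(misspelled: str, dictionary: list[str]) -> list[str]:
    """Order candidate corrections by increasing edit distance."""
    return sorted(dictionary, key=lambda w: edit_distance(misspelled, w))
```

For the misspelling "indicies", the dictionary word "indices" is at distance 1 and is therefore ranked first.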
The noisy channel model, a probabilistic approach, is also used for spelling correction. This model treats misspelled words as corrupted forms of the correct words which have been passed through a noisy channel [Jurafsky and Martin, 2009]. A comprehensive study of earlier spell checking techniques has been
discussed in [Kukich, 1992] and [Peterson, 1980]. There are various approaches
for Non-Word spelling corrections like Trigram Analysis [Angell et al., 1983], Error
Patterns [Yannakoudakis and Fawthrop, 1983], Triphone Analysis [van Berkel and
De Smedt, 1988], Noisy Channel Model [Kernighan et al., 1990], Using Context
[Agirre et al., 1998], String-to-String Edits [Brill and Moore, 2000] and Pronuncia-
tion Model [Toutanova and Moore, 2002]. To correct Real-Word errors approaches
like Trigram based [Mays et al., 1991], Noisy Channel Model [Mays et al., 1991],
[Whitelaw et al., 2009] and Using Confusion Sets [Golding, 1995; Golding and
Schabes, 1996; Mangu and Brill, 1997] have been discussed in the literature.
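The noisy channel idea mentioned above can be sketched as follows; the word counts and the exponential error model are invented stand-ins for a real language model and a learned channel model:

```python
import math
from functools import lru_cache

# Toy unigram language model P(c) built from hypothetical counts.
WORD_COUNTS = {"their": 120, "there": 90, "the": 900}
TOTAL = sum(WORD_COUNTS.values())

def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via memoized recursion."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0 or j == 0:
            return i + j
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
    return d(len(a), len(b))

def channel_prob(typo: str, word: str) -> float:
    # Toy error model P(typo | word): likelihood decays with edit distance.
    return math.exp(-levenshtein(typo, word))

def correct(typo: str) -> str:
    # Choose argmax_c P(c) * P(typo | c), the noisy channel decision rule.
    return max(WORD_COUNTS,
               key=lambda c: (WORD_COUNTS[c] / TOTAL) * channel_prob(typo, c))
```

For the typo "ther", all three candidates are one edit away, so the channel term ties and the language model prefers the most frequent word, "the".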
Table 2.1: Percentage of various types of errors in Bangla

Among spell checkers for Indian languages, the works on Bangla [Chaudhuri, 2002, 2001; Choudhury et al., 2007; UzZaman, 2005; Haque and Kaykobad, 2002; Uzzaman, 2004; Bhatt et al., 2005; Bansal et al., 2004], Assamese [Das et al., 2002], Punjabi [Lehal, 2007], Marathi [Dixit et al., 2006] etc. are worth mentioning. Chaudhuri and colleagues have analysed the error patterns in Bangla written text. These patterns have been collected from samples of answer scripts of
students at various levels of studies like Secondary, Higher Secondary and Under-
graduate. For studying phonetic spelling errors, they have also collected samples
of dictated notes. These notes have been dictated from various topics chosen from
story, novels, books of science, geography, history etc. They have manually col-
lected the misspelled words from these texts. Illegible words and words of length
greater than four but having more than three errors have been rejected. They have
analysed the different types of spelling errors (substitution, deletion, insertion and
transposition) found in Bangla text. The percentages of such errors are shown
in Table 2.1. They have seen that most misspellings take place by omission of the mAtrA[3]. In Bangla, dental and cerebral nasal consonants are phonetically very similar. As a
result there is a chance of misspelling when proper spelling rules are not remem-
bered. Details of the error pattern analysis have been reported in [Chaudhuri and
[3] mAtrA or shirorekhA is a horizontal line present at the upper part of many Bangla characters.
Kundu, 2000]. They have proposed two-stage techniques to detect and correct
Non-Word errors in Bangla text. The first stage takes care of phonetic similarity
error and the second stage takes care of errors other than the phonetic similarity.
The phonetically similar characters are mapped into single units of character code.
A new dictionary Dc is constructed with this reduced set of alphabet. They have also constructed another dictionary Dr in which the words are kept in reverse order. A phonetically similar but wrongly spelt word can be
easily corrected using Dc . Phonetically non similar misspelled words are searched
in both the dictionaries. If the word of length n is not found in Dc , then its first k1
characters are matched with words in this dictionary. The last k2 characters of the
same word are searched in Dr . A misspelled word with a single error is located in
the intersection region of the first k1+1 and last k2+1 characters. Figure 2.1 shows this search scheme. Corrections are suggested by searching in the conventional dictionary for those words starting with the first k1 characters and ending with the last k2 characters. They have tested their approach on 250k words and reported that all Non-Word errors are correctly detected.
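The two-stage scheme can be loosely sketched with Latin-letter toy data. The phonetic map and dictionary below are invented for illustration; the actual system operates over Bangla characters with its full dictionaries Dc and Dr:

```python
# Stand-in for folding phonetically similar characters onto one code
# (in the real system, e.g., dental and cerebral nasals fold together).
PHONETIC_MAP = {"f": "ph"}

def phonetic_key(word: str) -> str:
    """Map phonetically similar characters to a single unit of character code."""
    return "".join(PHONETIC_MAP.get(ch, ch) for ch in word)

def suggest(word: str, dictionary: list[str], k1: int = 2, k2: int = 2) -> list[str]:
    # Stage 1: phonetic-similarity errors, resolved via the dictionary Dc
    # built over the reduced alphabet.
    dc: dict[str, list[str]] = {}
    for w in dictionary:
        dc.setdefault(phonetic_key(w), []).append(w)
    stage1 = dc.get(phonetic_key(word))
    if stage1:
        return stage1
    # Stage 2: other errors, located by intersecting the words that share
    # the first k1 characters (forward dictionary) with the words that
    # share the last k2 characters (the role of the reversed dictionary Dr).
    front = {w for w in dictionary if w[:k1] == word[:k1]}
    back = {w for w in dictionary if w[-k2:] == word[-k2:]}
    return sorted(front & back)
```

Here "fone" is corrected to "phone" by the phonetic stage, while "phane" falls through to the prefix/suffix intersection stage.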
Spelling error detection and correction in Bangla, Hindi and English has also been studied through the concept of SpellNet [Choudhury et al., 2007], following similar work on the complex network approach [Albert and Barabási, 2002; Newman, 2003; Vitevitch, 2005]. The SpellNet is a weighted network of words, where the nodes
represent the words and the weights of the edges indicate the orthographic simi-
larity between the pair of words they connect. The structure of a SpellNet is shown
in the Figure 2.2. They have focused on the networks at three different thresholds
(θ) of edge weights, that is for θ = 1, 3 and 5. They have studied the properties of
Complex Network at these θ values for the three languages. They do not consider edges whose weight exceeds the threshold. The thresholded counterpart of Figure 2.2, for θ = 1, has been shown in Figure 2.3. It has been seen that the orthographies of the two Indian languages, Bangla and Hindi, are closer to each other than to English, as reflected in the average weighted degree of SpellNet.
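A toy version of the SpellNet construction, with an invented word list in place of the full lexicons used in the cited study:

```python
from itertools import combinations

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def spellnet_edges(words: list[str], theta: int) -> dict[tuple[str, str], int]:
    """Edges of a toy SpellNet at threshold theta: connect every pair of
    words whose orthographic (edit) distance is at most theta, using the
    distance as the edge weight."""
    edges: dict[tuple[str, str], int] = {}
    for u, v in combinations(words, 2):
        d = edit_distance(u, v)
        if d <= theta:
            edges[(u, v)] = d
    return edges
```

At θ = 1, for instance, "cat", "bat" and "hat" form a fully connected cluster while an orthographically distant word such as "table" stays isolated.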
2.2 Grammar Checker

A basic requirement of a writing aid is to verify the grammatical correctness of an input text. This verification process is executed by the Grammar Checker. A grammar checker is a writing aid that examines written text to detect and correct grammatical mistakes and provides necessary feedback to the user. Figure 2.4 shows a basic functional diagram of such a system. In some reported systems, depending on the technique that is used, "correction" could be performed without any kind of "detection". In this figure, the dotted line indicates
this situation. Grammar checkers are one of the most widely used tools in the
area of language processing. Though most of the existing grammar checkers are in
English, grammar checkers for other languages are also available. Table 2.2 shows research work carried out in languages other than English.

Figure 2.4: Simplified functional diagram of grammatical error detection and correction

Table 2.2: Grammar checking work in languages other than English

Languages Authors
Afan Oromo Tesfaye [2011]
Basque Uria et al. [2009]
Chinese Liu et al. [2008]
French Hermet et al. [2008]
German Schmidt-Wigger and Anje [1998]
Japanese Izumi et al. [2003]
Korean Young-Soog and Chae [1998]
Norwegian Bondi et al. [2002]
Punjabi Gill and Lehal [2008]
Spanish Lozano and Melero [2001]
Swedish Birn [2000]

In this chapter, relevant research work in relation to grammatical error detection and correction is
surveyed. The aim of the chapter is to provide a brief idea regarding the existing
Uszkoreit [1996] (quoted in [Hein, 1998]) suggested a four-level scheme for grammar checking, of which two levels are:

ii. Recognition: deals with localization and identification of the probable violated constructions.

iv. Correction: deals with construction and ordering of the correct alternatives.
Different approaches have been taken for grammatical error detection and correction. Some works [Bredenkamp et al., 2000; Jensen et al., 1983] follow a Rule-based or Parser-based approach, some [Fujishima and Ishizaki, 2011; Izumi et al., 2003; Bigert and Knutsson, 2002; Knight and Chander, 1994] follow a purely Machine Learning (ML) based empirical approach, while others [Hermet and Désilets, 2009; Liu et al., 2008] prefer a
Statistical Machine Translation (SMT) based approach. A brief road map of gram-
matical error detection and correction approaches is shown in Table 2.3. Research
in the field of grammatical error detection and correction dates back to 1978
[Weischedel et al., 1978]. From 1978 to 2002, most grammar checkers followed a
rule-based approach. Since 2002, ML and SMT approaches have dominated.
Table 2.3: A brief road map of grammatical error detection and correction ap-
proaches.
Rule-based Approach
Nina H. MacDonald and Keenan. [1982]: String matching.
Jensen et al. [1983]: Parse fitting.
Douglas and Dale [1992]: Constraint Relaxation.
Bredenkamp et al. [2000]: Syntax-based.
Lozano and Melero [2001]: Syntactic and Semantic analysis.
Machine Learning based Approach
Knight and Chander [1994]: Decision tree classifier.
Scheler and Munchen [1996]: Neural Network.
Bigert and Knutsson [2002]: n-gram Language Model.
Izumi et al. [2003]: Maximum entropy classifier.
Yi et al. [2008]: Web counting.
Fujishima and Ishizaki [2011]: Support Vector Machine.
Statistical Machine Translation based Approach
Liu et al. [2008]: Noisy channel Model.
Hermet and Désilets [2009]: Round trip SMT technique.
and rectify specific grammatical errors from parse tree patterns. A rule-based
individual words are mapped to their lexical components and necessary information
related to their lexical structures is returned. In syntactic analysis, a parser
is used to analyse sentence structure and build its structural representation. This
in a sentence [Rich and Knight, 1991]. The primary goal of a rule-based system
is to parse ill-formed sentences in order to detect and correct the errors in a sentence.
a list of errors. Related rules of errors are grouped together for identification
error detection and correction were based on pattern matching or rule-based techniques
using heuristic rules. But later, improvements were made using computational
grammars like Precision Grammar [Bender et al., 2004], Lexical Functional Grammar
Head-driven Phrase Structure Grammar [Proudian and Pollard, 1985], Tree Adjoining Grammar [Joshi
et al., 1975], Augmented Phrase Structure Grammar (APSG) [Heidorn, 1975], etc.
Besides this, smart parsing techniques like Constraint Relaxation [Fouvry, 2003;
Vogel and Cooper, 1995; Bolioli et al., 1992] and Parse Fitting [Jensen et al., 1983]
are also employed for efficient grammar correction. Unix Writer’s workbench
[Nina H. MacDonald and Keenan., 1982] was one of the oldest and widely used
grammar checkers which was based on a string matching algorithm rather than
linguistic analysis, while FLAG [Bredenkamp et al., 2000], VIRKKU, MS-NLP [Heidorn, 2000;
Lozano and Melero, 2001], EasyEnglish [Bernth, 1997], NGC [Bondi et al., 2002],
Grammatifix [Arppe, 2000; Birn, 2000] etc. systems carried out detailed linguistic analysis.
and León, 1996], SCRIPSI [Catt and Hirst, 1990] etc. are examples of some exist-
ing grammar checkers that follow constraint relaxation technique. Today’s open
grammar rules for grammar checking. VP2 [Schuster, 1986], the Intelligent Language
Tutor [Schwind, 1988] and Automated German Tutor [Weischedel et al., 1978] use
linguistic tools like POS tagger and rule-based parser that depends on relatively
tems in brief.
Syntax-Based
MS-NLP [Heidorn, 2000; Lozano and Melero, 2001] is a rule-based system integrated
into Microsoft Word. The main focus of this system is to detect and correct the specific
types of errors made by native speakers such as subject verb disagreement, num-
ber disagreement, etc. The grammatical error detection process of this system
consists of four stages. In the first stage, the input text is tokenised into individual
words. Then these tokens are passed to the morphological analyser for analysis of
it provides the basic syntactic parsing of the input sentence. The system uses Augmented
phrase structure rules. The third stage is known as “portrait”, informally known
system may look as shown in Figure 2.6. The fourth stage is known as “logical
form”. A semantic graph is produced in this stage to display the basic semantic
relations underlying the syntactic tree. The MS-NLP system consists of a seman-
tic analyser known as MindNet which is used for word sense disambiguation.
errors of a text written by deaf students and enables them to generate appropri-
The system uses Context Free Grammar (CFG) augmented with error-production
Figure 2.6: Syntax tree generated by MS-NLP System
rules known as mal-rules. Mal-rules precisely describe expected error forms and
and use annotation to indicate the types of errors in the sentence. An example
of a mal-rule for detecting the “be”-verb deletion error in the English sentence “The
boy honest” may look like VP(error+) → AdjP, where the conventional context free
The system contains a preprocessor which performs spell checking on the input
text and has three other major components. These components are Morphological
Table 2.4: Syntax based grammatical error detection and correction approaches.
put sentence is tokenized and spell checked. Then a morphological analyzer and a
POS tagger are used to provide necessary lexical attributes and POS tags. Then the
the grammatical context. Finally, the Error Detector module detects grammatical
errors depending on the linguistic inputs provided by the previous two modules.
Table 2.4 shows that other researchers follow similar rule-based techniques
for grammatical error detection and correction but their methodologies differ de-
pending on the grammar and language processing tools they have used.
Constraint Relaxation
relaxed until the sentence can be parsed completely. Corrections are suggested after
IBM’s EPISTLE [Heidorn et al., 1982] system performs complete linguistic anal-
ysis using rule-based grammar and parser built on that grammar. This system
checks both the grammar and style of English written texts. Grammar checking
module takes care of the improper agreement between subject and verb whereas
the style checking module points out problems regarding excessively complex sentences.
NP VP (NUMB.AGREE.NUMB(NP)) → VP(SUBJECT=NP).
For analysing a text, this system follows three levels: word processing, grammar
checking and style checking. At the word processing level, the system performs
efficient dictionary lookup and also deals with suffixes and prefixes. This dictionary
lookup procedure returns necessary attributes of words along with the POS tag for
processing system attempts to parse each sentence in order to check the grammatical
construction of the sentence. Sentences that follow the specified grammar rules
(constituent class patterns) along with the imposed constraints are parsed successfully.
On the other hand, unsuccessful sentences are parsed again by relaxing some of the conditions
and with some additional rules. The relaxed conditions and the corresponding
problematic constituents of the sentence are noted to provide the indication and
checking are utilized later by Style processing module to detect probable stylistic
CRITIQUE is a text processing system which checks grammar as well as style
using a broad-coverage PLNLP English Grammar [Jensen et al., 1993]. The system
combination with constraint relaxation and a method for parse ranking based on
select a best parse or a small number of best parses from a ‘parse forest’, which is
Other researchers like Douglas and Dale [1992],Dini and Malnati [1993] and
Schwind [1990] also follow Constraint Relaxation technique for grammatical error
Parse Fitting
A parse fitting procedure was proposed by Jensen et al. [1983] to “fit” together pieces
of a parse-tree when the parser fails to generate the complete parse tree for a
tree when the rules of a conventional syntactic grammar fail to parse an input
string. The approximate parse tree can serve as input to the remaining stages of
the string then the fitting procedure begins. The by-product of this unsuccessful
Mellish [1989] has presented a generalized parsing strategy based on an active
chart which can diagnose errors in sentences. His proposed technique applies a
top-down parser when the bottom-up parser fails to produce the complete parse
tree. This is done so that the top-down parser can examine the pieces of parse
constituents of the bottom-up parser and provide a suggestion where the bottom
generalized models help to predict the future data. According to Mitchell [1997],
“A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by P,
improves with experience E”. In the field of ML, for grammatical error detection and
correction, some researchers use the Language Modelling approach [Hermet and
Désilets, 2009; Bigert and Knutsson, 2002; Chodorow and Leacock, 2000], some
of them [Knight and Chander, 1994; Gamon et al., 2008; Izumi et al., 2003] prefer a
classification-based approach, and the rest prefer the web counting method. Now we will
Language Modeling
word sequence of a particular language is indicative of likelihood of this sequence
being uttered by a speaker of that language. From the training corpus, LM gathers
P(w1, w2, w3, · · · , wn) = P(w1^n)
corpus. To resolve this problem, the probability of a word wn given all the previous
words can be approximated by the probability given only the previous N words.
Figure 2.7 shows examples of trigrams for English sentence “Ram is a good boy”.
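The trigram check described above can be sketched as follows; the toy corpus, threshold, and function names here are hypothetical illustrations, not the implementation of any cited system:

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus of well-formed sentences (hypothetical).
corpus = ["Ram is a good boy", "Sita is a good girl", "Ram is a boy"]
trigram_counts = Counter()
for sentence in corpus:
    trigram_counts.update(ngrams(sentence.split(), 3))

def sentence_score(sentence, threshold=1):
    """Accept a sentence only if every trigram meets the frequency threshold."""
    return all(trigram_counts[t] >= threshold for t in ngrams(sentence.split(), 3))

print(ngrams("Ram is a good boy".split(), 3))
print(sentence_score("Ram is a good boy"))  # True
print(sentence_score("Ram a is good boy"))  # False: unseen trigrams
```

A real system would smooth the counts and score the whole sentence rather than applying a hard per-trigram cutoff.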
score is below some predefined threshold value then the sentence is considered
ill-formed; otherwise the sentence is grammatically correct. Many
researchers prefer POS tag sequences rather than word sequences. N-grams of
POS tags have many useful properties. Some of the features of the language it-
self are captured by n-grams as they are extracted from a corpus representing the
language. The extracted features contain only local information due to the limited
Bigert and Knutsson [2002] have proposed a robust probabilistic method for
tagger. N-gram constituents are collected from resulting tag sequences and then
the occurrence frequency of each n-gram is fetched from the n-gram frequency
table. If the frequency is greater than a predefined threshold value then this
construction is considered correct. However, due to the sparseness of the tags
participating in the n-gram, sometimes an n-gram may
not be encountered in the training data. To mitigate this problem, they have built
a confusion matrix which is a matrix of syntactic distance between POS tags. This
matrix contains information about how suitable one tag is in the context of another.
This information is utilized at the time of replacing one tag with the other. A rare
tag is substituted with a tag of higher frequency suitable in the same context. If tag
given context. For these reasons they have introduced a weight. Representative
list is built using distance between two tags and to measure this distance, L1-norm
and POS tag n-grams are used. The process of distance calculation is explained
below:
n(tL, t, tR) = freq(tL, t, tR) / freq(t)    (2.2)
If t′ is the replacement tag for the tag t, and tL and tR are the two context tags
surrounding t, then the distance for a single context is the difference between the
two relative frequencies:

disttL,tR(t, t′) = |n(tL, t, tR) − n(tL, t′, tR)|    (2.3)

Finally, all POS tag contexts are considered and the generalized equation becomes:
dist(t, t′) = ∑(tL,tR) disttL,tR(t, t′)    (2.4)
Distance dist(t, t′) calculated using this formula ranges from 0 to 2. When the contexts
are identical the value is 0, and when the uses of t and t′ are disjunct the value is 2,
i.e. the tag t′ is less appropriate than tag t in this context. By substituting the tag with
its representative tag and maintaining the similar syntactic structure, their algo-
used to avoid false alarm generated by the system. Their algorithm utilizes the
information of clause boundaries where clauses are used as the unit on which the
error detection algorithm operates. For the detection of clause boundaries they have
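The tag-distance computation of equations 2.2 to 2.4 can be sketched as follows; the toy tag corpus is hypothetical, and a real system would use a large tagged corpus:

```python
from collections import Counter

# Toy POS-tag corpus (hypothetical).
tag_corpus = "DT NN VB DT JJ NN VB DT NN".split()
unigrams = Counter(tag_corpus)
trigrams = Counter(zip(tag_corpus, tag_corpus[1:], tag_corpus[2:]))

def n_ctx(tl, t, tr):
    """Equation 2.2: relative frequency of tag t in the context (tl, _, tr)."""
    return trigrams[(tl, t, tr)] / unigrams[t]

def dist(t, t2):
    """Equations 2.3-2.4: L1 distance over all observed contexts (range 0 to 2)."""
    contexts = {(tl, tr) for (tl, _, tr) in trigrams}
    return sum(abs(n_ctx(tl, t, tr) - n_ctx(tl, t2, tr)) for tl, tr in contexts)

print(round(dist("NN", "NN"), 2))  # 0.0: identical usage
print(round(dist("NN", "JJ"), 2))  # 1.67: largely disjoint usage
```

A small distance means the two tags occur in similar contexts, so one can stand in for the other when building the representative list.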
matical errors by inferring negative evidence from the edited text corpus. They
ALEK was trained on a general purpose corpus of English edited text containing
word’s local context cues, the system identifies inappropriate usage. ALEK infers
negative evidence from the contextual cues that do not co-occur with the target
word. The system collects contextual cues in a ±2 word window around the target
word. Function words (closed-class items) and POS tags are the two kinds of
contextual cues used by the system. Initially, sentences have been tagged using
POS tagger and then the frequency of sequences of adjacent POS tags and function
words are counted. For example, in the sentence “a/AT tall/JJ man/NN”, the occur-
rence frequency of the bi-gram sequences AT+JJ, JJ+NN, a+JJ, and unigram count
of individual POS tags and functional words are calculated. These frequencies
are the basis of their error detection measure. To determine the unusual and rare
combination of POS tags and functional words, ALEK computes Mutual Infor-
mation (MI) based measure. MI based measure is used to find combinations that
occur less often than expected. Usually n-gram probabilities of ungrammatical
sequences are much smaller than the product of the unigram probabilities. Then the
syntactic rule is violated. The experimental result shows that ALEK performs with
80% Precision and 20% Recall. Powers [1997] explored the concept of Differential
Grammar and applied bigram frequency of POS tag sequences to discriminate be-
pair of confused words in all contexts. According to Powers [1997] the definition of
does not have a concept of rule like traditional rule oriented grammar but rather
ate between correct target word and one or more incorrect confused words, this
grammar utilizes high-order N-gram statistics. The n-gram contexts are reduced
based on high frequency important tokens like words, numbers, punctuation and
affixes.
Henrich and Reuter [2009] used n-gram based statistical approach for lan-
of n-gram starts with all pentagrams of tokens for the whole sentence. Then the
the trigrams of tokens and so on. If an n-gram is not found in the database, it is as-
sumed that this n-gram is wrong. An error level is calculated corresponding to the
number of n-grams which are not found in the database. The smallest erroneous
n-gram finally points to the error in the input text. All these errors are summed
up and the result is compared to an overall error threshold. If it is higher than the
threshold, then the sentence is marked as wrong. They used a wildcard (*) in the
erroneous n-gram for finding most probable n-gram sequence from the training
database. They also stored temporal adverb-verb and adjective-noun agreement
for statistical analysis of the agreement of a temporal adverb with the tense of the
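The shrinking n-gram search described above can be sketched as follows; the database and names here are hypothetical toy choices:

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def error_level(tokens, db, max_n=5):
    """Count n-grams (from max_n down to bigrams) missing from the database;
    the smallest missing n-gram localizes the error."""
    missing = []
    for n in range(max_n, 1, -1):
        missing += [g for g in ngrams(tokens, n) if g not in db]
    return len(missing), min(missing, key=len, default=None)

# Toy database built from one well-formed sentence (hypothetical).
ref = "the boy is honest".split()
db = {g for n in range(2, 6) for g in ngrams(ref, n)}

print(error_level("the boy honest".split(), db))  # (2, ('boy', 'honest'))
```

Here the missing bigram pinpoints the position of the deleted "be" verb; the error count would then be compared to the overall threshold.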
in English text. Their model detects article errors based on three head words:
head verb (v), preposition (prep) and head noun (n). Initially, from the input
sentence three head words are extracted. Then all the head words are reduced
to their stem/root form and also converted to lower case. Then the probability of
an article Ii given the head words v, prep and n is calculated as follows:
P(Ii | v, prep, n) = f(Ii, v, prep, n) / ∑m=1..k f(Im, v, prep, n)    (2.5)
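Equation 2.5 can be illustrated with hypothetical toy counts; the head-word triple and frequencies below are invented for the example:

```python
from collections import Counter

# Hypothetical counts f(I, v, prep, n) for the head-word triple (go, to, store).
counts = Counter({("the", "go", "to", "store"): 8, ("a", "go", "to", "store"): 2})
ARTICLES = ("the", "a", "zero")  # k = 3 article classes

def p_article(article, v, prep, n):
    """Equation 2.5: relative frequency of one article class among all k classes."""
    total = sum(counts[(art, v, prep, n)] for art in ARTICLES)
    return counts[(article, v, prep, n)] / total if total else 0.0

print(p_article("the", "go", "to", "store"))  # 0.8
```

A real system would add backed-off smoothing so that unseen triples do not yield zero probabilities.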
where k is the total number of articles. To estimate which article class is relatively low in a
when a S(Ii, v, prep, n) is less than some predefined threshold θ (0 < θ < 1) then
estimation they used backed-off smoothing [Jurafsky and Martin, 2009] technique.
Their system achieved 77% Precision, 64% Recall and F-measure of 0.70 when they
Classification
correct or incorrect using features extracted from training data. Classification
approaches of different researchers differ in their use of features and classifiers,
such as Naïve Bayes [Mitchell, 1997], Balanced Winnow [Littlestone, 1988], Support
Vector Machine [Campbell and Ying, 2011], Voted Perceptron [Freund and Schapire,
1999], Maximum Entropy [Berger et al., 1996], Decision Tree [Mitchell, 1997; Quin-
Scheler and Munchen [1996] used a feature model of the semantics of plural de-
terminers to detect and correct grammatical errors of definiteness. They had used
an Artificial Neural Network to learn a function that maps the semantic feature
Knight and Chander [1994] used decision tree classifier over lexical features
for detection and correction of article errors in the Japanese to English machine
translation outputs. Figure 2.8 shows basic architecture of their post editing task.
These binary features are either lexical or abstract which includes POS tags, plural
markers, tense and subcategories like superlative adjectives, mass nouns etc. To
build the decision tree, each feature maintains three types of measures. These
three types of measures are frequency of occurrence, distribution of a/an for noun
phrases in which the features are present and distribution for those without the
taken. The decision tree is built depending on the datasets and the feature-based
split. Their post editing algorithm achieved an overall accuracy rate of 78% on
financial text.
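The feature-based split behind such a decision tree can be illustrated with a one-level stump over binary features; the data and feature names below are hypothetical, not the cited system:

```python
import math
from collections import Counter

# Toy training instances: binary features -> chosen article (hypothetical data).
data = [
    ({"plural": 0, "superlative": 1}, "the"),
    ({"plural": 0, "superlative": 1}, "the"),
    ({"plural": 0, "superlative": 0}, "a"),
    ({"plural": 1, "superlative": 0}, "the"),
    ({"plural": 0, "superlative": 0}, "a"),
]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(feature):
    """Entropy reduction obtained by splitting the data on one binary feature."""
    split = {0: [], 1: []}
    for feats, label in data:
        split[feats[feature]].append(label)
    after = sum(len(s) / len(data) * entropy(s) for s in split.values() if s)
    return entropy([label for _, label in data]) - after

# The root split of the tree is the feature with the highest gain.
best = max(("plural", "superlative"), key=info_gain)
print(best)  # superlative
```

A full tree repeats this split selection recursively on each branch until the leaves are pure or the data runs out.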
Gamon et al. [2008] used a decision tree classifier along with a language model
for determining article and prepositional errors in a sentence. They used a language
model which was trained on the Gigaword corpus. The language model was used
has three main components: Suggestion Provider (SP), Language Model (LM)
and Example Provider (EP). Initially, an input sentence is tokenized and POS
tagged. Then these tokens are sent to the SP module which employs decision
tree classifier for providing suggestions. All suggestions from the SP module are
collected and sent to the LM. Here the suggestions are ranked based on probability
score assigned by the LM. Finally, the EP returns example sentences containing
the user to choose the suggestion and to make an informed decision about the
correction. They achieved 55% accuracy for article error detection tested on 6K
CLEC and 46% accuracy for prepositional error detection tested on 8K CLEC test
corpus.
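The three-stage pipeline described above can be sketched with toy stand-ins for each component; every function and scoring rule here is a hypothetical placeholder, not the cited system:

```python
def suggestion_provider(tokens, position):
    """Stand-in for the classifier-based Suggestion Provider (SP)."""
    return ["a", "an", "the"]

def lm_score(tokens):
    """Stand-in Language Model: penalize a/an disagreeing with the next word."""
    for prev, word in zip(tokens, tokens[1:]):
        if prev == "a" and word[0] in "aeiou":
            return 0.1
        if prev == "an" and word[0] not in "aeiou":
            return 0.1
    return 1.0

def correct(tokens, position):
    """Rank every SP suggestion by LM score; max() keeps the first best one."""
    scored = []
    for article in suggestion_provider(tokens, position):
        candidate = tokens[:position] + [article] + tokens[position + 1:]
        scored.append((lm_score(candidate), candidate))
    return max(scored, key=lambda pair: pair[0])[1]

print(" ".join(correct("I ate a apple".split(), 2)))  # I ate an apple
```

In the real system the SP is a trained classifier, the LM is trained on a large corpus, and the Example Provider then retrieves sentences illustrating the chosen suggestion.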
Table 2.5: Effectiveness of Individual Features
Feature %Correct
Word/POS of all words in NP 80.41
Word/POS of w(NP-1) + Head/POS 77.98
Head/POS 77.30
POS of all words in NP 73.96
Word/POS of w(NP+1) 72.97
Word/POS of w(NP[1]) 72.53
POS of w(NP[1]) 72.52
Word/POS of w(NP-1) 72.30
POS of Head 71.98
Head’s Countability 71.85
Izumi et al. [2003] used Maximum Entropy classifier for handling insertion,
omission and replacement errors. They have tested their model on 1915 sentences
collected from Standard Speaking Test (SST) corpus and have achieved approxi-
mately 50% Recall and 60% Precision using this approach. Han et al. [2004, 2006]
have trained a Maximum Entropy classifier to select among a/an, the, or zero ar-
ticle for noun phrases, based on a set of features extracted from the local context
of each. The system was trained on 6 million noun phrases from the MetaMetric
Lexical corpus. On an average, there were about 390,000 features in their Maxi-
mum Entropy model. The system was tested on 668 TOEFL essays and achieved
90% Precision and 40% Recall. Table 2.5 shows effectiveness of individual features
context. They used a richer set of syntactic and semantic features. Their approach
suggests a preposition which is most likely to occur in that context. The context of
containing 307 features. They assumed that a set of 307 features may capture all the
They selected these features based on a study of most frequent errors generated by
English learners. Head Noun, Number, Noun Type, WordNet information, Named
Entity information and ± 2 POS tag window are some examples of features they
have used. They also used additional features like whether the noun is modified
created test set containing preposition errors were used to test their system. They
found that their system can successfully detect between 76% and 81% of errors. Later
De Felice and Pulman [2009] used Maximum Entropy classifier for correction of
set used by them contains a wider range of syntactic and semantic elements, includ-
ing a full syntactic analysis of the data. Their system achieved average Precision
of 42% and Recall of 35%. Fujishima and Ishizaki [2011] proposed a method to
Oyama and Matsumoto [2008] also proposed a similar approach. They combined
hard margin SVM to find the learners’ error in Japanese text. But Fujishima and
putational cost. To compare their approach, they built a supervised SVM classifier
following the approach taken by Oyama and Matsumoto [2008] and reported that
using their unsupervised algorithm they achieved almost the same prediction
accuracy as the supervised learning algorithm. They tested their system on 3155
selected erroneous sentences and achieved an accuracy of 79.30% with the bigram model,
86.63% with the trigram model and 34.34% with the quadrigram model. They found that
the classification accuracy of the quadrigram model is lower than that of the trigram model.
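The same classification setup, labelling sentences as correct or incorrect from n-gram features, can be sketched with a simple perceptron as a dependency-free stand-in for the SVM used in the cited work; the data and features below are toy examples:

```python
# Bigram features of a sentence (toy feature extractor).
def features(sentence):
    tokens = sentence.split()
    return set(zip(tokens, tokens[1:]))

# Toy training data: +1 = grammatical, -1 = ungrammatical.
train = [
    ("the boy is honest", 1),
    ("a girl is kind", 1),
    ("the boy honest", -1),
    ("girl kind a is", -1),
]

weights = {}
for _ in range(10):                        # perceptron training epochs
    for sent, label in train:
        score = sum(weights.get(f, 0) for f in features(sent))
        if label * score <= 0:             # misclassified: nudge the weights
            for f in features(sent):
                weights[f] = weights.get(f, 0) + label

def is_grammatical(sentence):
    return sum(weights.get(f, 0) for f in features(sentence)) > 0

print(is_grammatical("the boy is honest"))  # True
print(is_grammatical("the boy honest"))     # False
```

An SVM replaces the perceptron update with margin maximization, but the feature representation and the binary decision are the same shape.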
Web Counting
Empirical NLP systems rely on a large sized corpus of text in order to resolve
ambiguity. The corpus helps to determine which candidate is more frequent than
the increment of size of the corpus [Banko and Brill, 2001]. As the World Wide
Web (WWW) is the largest corpus available to date, many researchers incorporate web
frequency counts to identify and correct writing errors made by non-native writers
using POS tagger and chunker to identify the check points. These check points
depict the context around the determiner and collocation. To find the appropriate
examples from the web, queries are generated in three granularity levels (viz.
reduced sentence level, chunk level and word level) according to the syntax of a
set. To find the appropriate examples of determiner from the web, a query may
look like { Wi−2 Wi−1 null Wi+1 Wi+2 } or { Wi−2 Wi−1 a Wi+1 Wi+2 } or { Wi−2 Wi−1 an
Wi+1 Wi+2 } or { Wi−2 Wi−1 the Wi+1 Wi+2 }. Since long queries have fewer web counts
than short queries, each count is multiplied with the number of words in the
query. If the weighted count is very low then the web is unable to provide enough
count for query containing writer’s determiner and maximum weighted count for
threshold. If the ratio is smaller than the threshold then an error is flagged.
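The word-level query generation described above can be sketched as follows; this is a hypothetical illustration, with a guard for positions that have fewer than two left-context words:

```python
def determiner_queries(tokens, i):
    """Build the four context queries around position i (the determiner slot),
    each weighted by its length since long queries get fewer web hits."""
    left, right = tokens[max(0, i - 2):i], tokens[i + 1:i + 3]
    queries = []
    for det in ([], ["a"], ["an"], ["the"]):   # null / a / an / the
        q = left + det + right
        queries.append((" ".join(q), len(q)))
    return queries

for query, weight in determiner_queries("he went to the market yesterday".split(), 3):
    print(query, weight)
```

The system would submit each string to a search engine, multiply the hit count by the weight, and compare the writer's determiner against the best-scoring alternative.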
Evaluation of the system on a real-world ESL corpus reported 62% Precision and
41% Recall. Hermet et al. [2008] described a web-based frequency count algorithm
to detect and correct the prepositional errors in French language. They use a two-
capture the context around the preposition in the input sentence. In the second
phase, web searching technique is used to evaluate the frequency of this expression
language learners and they achieved 69.9% accuracy. They have also reported that
when a corpus of frequent n-grams is used instead of the web, the performance of
paradigm to detect and correct grammatical errors in text. They have trained
[2006] showed that a noisy channel model (instantiated within the paradigm of
SMT) can successfully provide editorial assistance for non-native writers. SMT
simply indicating an error flag. Their system is able to correct 61.81% of mistakes in
a set of naturally occurring examples of mass noun errors found on the World Wide
Web. Liu et al. [2008] proposed a noisy channel model along with a novel relative
position language model for correcting word order errors in sentences produced
by second language learners of Chinese. To detect word order errors, they used
SVM classifier whereas for correcting those detected errors they followed a noisy
channel model. For a given erroneous sentence E having word order errors, their
model tries to find out the most probable corrected sentence using equation 2.7.
Ĉ = argmaxC P(C|E) = argmaxC P(E|C) · P(C)    (2.7)
Here, C represents a corrected sentence, P(C|E) is the reordering model and P(C) is
the language model derived from a large corpus of correct sentences. A weighted relative
position score is used as a language model P(C) to circumvent the limitation of
capturing long-distance lexical relationships by a usual n-gram language model. The
for a given input sentence. For this model, they used probability of C generated by
bility. Experimental results show that the overall accuracy of their error detection
module is 96.7%. They used the BLEU [Papineni et al., 2002] score as the evaluation
metric to evaluate the performance of their word order error correction module.
Their result shows that their error correction methodology outperforms the usual
n-gram based approach. They also found that the proposed system’s performance,
in terms of BLEU score, can be improved by 20.3% and 26.5% when compared to
the n-gram and SMT-based baseline systems, respectively. Hermet and Désilets [2009]
language and then back to L2. When the round-trip MT system encounters an ill-
translation of that chunk than original L2 sentence. Thus using round-trip trans-
lation, errors present in the L2 sentence have been repaired. They tested their
66.4% accuracy using their round-trip SMT method. The performance of their
Round Trip SMT method was slightly worse than their web-count [Hermet et al.,
2008] method. Later, they proposed a hybrid method combining the round-trip
SMT and web-count. In this hybrid model, round-trip SMT works as a back-up
when their generated query using web-count method got almost zero hit. Using
rate domain knowledge into linguistic knowledge which provides highly accurate
results. Other than the above mentioned advantages, a rule-based system is easy
to understand. Thus, the user can easily extend the rules for handling new error
types. Rules can be built incrementally by starting with just one rule and then
extending it. Each rule of a rule-based system can be easily configured. A rule-
based system provides detailed analysis of the learner’s writing using linguistic
knowledge and provides reasonable feedback. Such feedback helps learners to
improve their writing skill. Furthermore, the linguistic knowledge acquired for one
for a similar task in another system. Both grammatical and ungrammatical sen-
sentence can be easily identified based on the constraints which are relaxed during
parsing of the sentence. The main advantage of using mal-rules is the simplicity
with which they can generate feedback. High precision can be achieved by ap-
plying properly created constraints and mal-rules. The main disadvantage of the
manual effort. This increases cognitive load on the human analyst and also in-
technique is not well suited for parsing sentences with missing or extra words.
due to casual observations of domain experts can also pose a problem. Shallow
parsing is preferable to parsing with Precision Grammar when there is a dearth
within an input sentence without using an explicit error model. Failure of parsing
does not always reliably ensure that the input sentence is ungrammatical because
parsing. The efforts required for grammatical error detection and correction vary
depending on the involvement of the error types and the grammatical context in
which the errors occur. However, one of the main disadvantages of rule-based
approach is that it requires complete grammar rules to cover all types of sentence
constructions. Though a variety of grammar formalisms is available, till now
robust parsers with sufficient linguistic rules are not available. Moreover, existing
rule-based parsers suffer from the curse of natural language ambiguities which un-
necessarily produce more than one parse tree even for the correct input sentence.
These are the limitations of parsing strategy.
On the other hand, Machine Learning (ML) based approaches usually rely on
large sized training data and parallel texts. When the training set and the test
set are similar, then ML approach provides good results. Data sparseness poses a
problem for ML. Due to data sparseness, many grammatical constructs may never
have been encountered. Since, most of the time, an ML-based system does not provide
necessary comments on errors, users are usually surprised when the system predicts
Another problem is that some ML-based systems rely on threshold values which
are usually estimated heuristically. Threshold may vary depending on the domain
of text where the system is trained or tested. If an erroneous word interacts with
other erroneous words then the correction of either error cannot be done indepen-
dently. Moreover, if other errors lie within the context window of an erroneous
word, then the extracted features depending on that context window may also
corpora that are used are not large enough to cover the full range of lexical patterns
of a given language. That implies some lexical occurrences are left unexamined.
One solution is to use the World Wide Web as a linguistic corpus. An advantage
of web based grammar correction is that it is dynamic in nature. The web search
hits change with the change of language and also reflect the current state of the
language. Moreover, most of the contents of the web are freely accessible. In spite of
these advantages, researchers have pointed out several limitations of the Web Count approach for grammar correction. Firstly,
commercial search engines do not provide the root/stem or POS tag of the given
input sentence. Secondly, there are constraints on numbers of queries and numbers
of hits per query. Thirdly, search hits are for pages, not for instances. Last but not
least, web count results vary for different search engines.
heavily relies on the availability of large amount of parallel training sentences. The
expense and difficulty of collecting large quantities of raw and annotated learners’
still a very difficult task. We cannot simply apply the existing approaches for our
et al., 2004] and has free word order. State-of-the-art CFG is not applicable [Shieber,
1985; Begum et al., 2008; Bharati et al., 2010] here. In addition to this, lack of robust
parsers, insufficient linguistic rules and dearth of error annotated parallel corpora
We prefer Natural Language Generation (NLG) [Dale et al., 1990; Reiter and
Dale, 2000; Hovy, 1991; Dale et al., 1998] approach instead of Natural Language
Understanding (NLU) [Allen, 1987]. The main reason behind preferring this ap-
proach is that we need not model the ungrammatical sentences as has been done
Broad-coverage linguistic rules are also not required, unlike in a rule-based system. This
system is suitable where robust parsers and linguistic rules are not available. The
pressions. Any system based on this approach identifies the main keywords in a
sentence and then reconstructs the sentence from these keywords. This technique
is suitable for erroneous sentences where major corrections are required. The as-
sumption behind this approach is that the user can supply the important key words
of the sentence, even if the user is unable to write a grammatically correct sentence.
It consists of two main steps. Initially, without considering grammatical errors and
other noises, an NLG based system extracts a meaning representation from the input
sentence. Baptist and Seneff [2000] followed the NLG approach for their conversational
system named GENESIS. We have applied the NLG approach for our Bangla
CHAPTER 3
CORPUS
A sufficiently large error corpus is essential for training and testing of any
corpus of Bangla text. One of the major problems of building error corpus from
learners’ data is that the process is very time consuming. It also requires linguistic
sentences has been created automatically considering performance errors and lan-
guage learning errors that occur frequently. This chapter is more closely aligned
to the task of automatic error corpora creation. Before starting our discussion on
Bangla Second Language Learners often commit grammatical mistakes while writ-
ing text because of their lack of language knowledge (Language Learning Error)
and due to oversight, carelessness or tiredness (performance error). Performance
errors can occur mainly due to four operations: insertion, deletion, transposition
and substitution. When an error involves more than one operation, it is known as
Composite Error. There are two primary concerns at the time of automatic error
corpus creation: the first is to be linguistically realistic and the second is to
mimic the error scenarios that occur naturally. To analyse the kind of naturally
produced error scenarios, we have collected 1500 sentences from 10 standard native
students' exam papers of Bangla, and have also collected second language learn-
ers' data from students whose first language is Hindi, Oriya or Telugu.
Performance errors and language learning errors occurring in their texts are then
carefully analysed. Exam papers were collected with the assumption that students
make more mistakes during examinations, as they are usually in a hurry to
complete their answers within the limited time period. In the course of studying
Second Language Learners' text, it has been found that the proportion of errors
caused by the substitution operation is much higher than for any other operation.
We have seen that substitution errors and deletion errors committed by second
language learners are 14% and 18% higher, respectively, than those of native
speakers. However, an interesting observation is that insertion errors committed
by native speakers are much higher
(21%) than second language learners. The proportion of transposition errors com-
mitted by second language learners and native speakers are much less than any
higher (4%) than second language learners. After analysing unique words col-
lected from native speakers and second language learners real data, we found
Figure 3.1 shows the proportion of performance errors caused by each of the
four operations. The Native Speakers and the Second Language Learners make
Figure 3.1: Proportion of Errors in Native Speakers and Second Language Learners
Corpus.
However, a study [Leacock et al., 2010] shows that Second Language Learners make many
more mistakes than native speakers. Most frequent error types produced by native
speakers may not be produced by second language learners. For example, errors
generated while writing complex sentences are infrequent for language learners,
as most of the time language learners avoid writing complex sentences. They
write complex sentences only when they have enough confidence in their ability
to construct them correctly. Second Language Learners can be of two types viz.
by their native language. When native languages are similar but not identical,
L1 produces errors due to negative transfers. They fail to find exact equivalence
between these two languages. On the other hand, L2 Language Learners produce
irregularities. They face trouble due to the novelty of the new language [Leacock
et al., 2010]. After analyzing the collected Bangla second language learners’ data
we came to know that the above statements (quoted in [Leacock et al., 2010]) are
also true for Bangla language. Therefore, learners who learn Bangla language
Table 3.1: Examples of errors committed by a Bangla Second Language Learner
different kinds of errors than learners having native languages like Malayalam,
learning errors. Table 3.1 shows examples of errors committed by a Bangla Second
Language Learner having mother tongue Hindi. Figure 3.2 shows taxonomy of
errors found in Bangla Text of second language learners. We shall now elaborate
1. Transposition Operation:
• Incorrect Sentence:
Figure 3.2: Taxonomy of errors found in Bangla text of second language learners.
2. Addition Operations:
(a) Repeated words:
Bangla: aami ekati *bhaala bhaala Chele
English: I am a *good good boy.
(b) Unnecessary words:
Bangla: paramaaNu anu apekShaa *adhika kShudratara
English: atom is *more smaller than molecule.
3. Deletion Operations:
(a) Implicit Subject:
Bangla: *[ ] tomaara maÑgala karuna (Subject:iishbara is missing here)
English: May *[ ] bless you. (Subject: God is missing here)
(b) Implicit Verb:
Bangla: tumi ki maadhyamika pariikShaa *[ ] ? (Verb: debe is missing here)
English: Will you *[ ] matriculation exam? (Verb: give is missing here)
4. Substitution Operations:
(a) Similar word or Cohort replacement:
* indicates the error word in the sentence.
• Incorrect Sentence:
Bangla: *bale baagha thaake
English: *tell tiger lives
• Correct Sentence:
Bangla: bane baagha thaake
English: Tiger lives in forest
Here bale and bane are cohorts of each other, but bale is a verb and bane is a
noun. In the literature this type of error is also known as a real-word spelling
error.
1. Tense Error:
• Example 1:
Bangla: aami prashnapatra pa.Daba o uttara diYechhilaama
English: I will read the question paper and I gave the answer.
• Example 2:
Bangla: gatakaala aami sinemaa Jaaba
English: Yesterday I will go to Cinema.
• Example 3:
Bangla: Jakhana aami darajaa khulachhilaama takhana se ghare Dhuke pa.Dechhila
English: When I was opening the door then he entered the room.
2. Person Error:
• Example:
Bangla: chhaatraraa nishchaYa bidyaalaYa Jaabe Jadi *se pariikShaa dite
chaaYa
English: student will definitely go to school if *he wants to appear in the exam.
The plural sense of 'students' has been lost through the singular representation of
'he'.
3. Case Error:
• Example:
Bangla: eTaa *kaakaaraa bai
English: This is uncle’s book
The suffix raa has been attached to the noun kaakaa (uncle) in place of the genitive
case marker 'ra'.
• Example:
Bangla: *daYaamaYii shikShaka aasachhena
English: The kind-hearted teacher is coming
The female suffix maYii of the word daYaa (kindness) has been used instead of the
male suffix maYa, which agrees with shikShaka (male teacher).
• Example 1:
Bangla: tomaara naama ki |
English: What is your name .
Here the punctuation | is used instead of the '?' symbol.
• Example 2:
Bangla: aami*, dekhalaama se aasachhe |
English: I, saw he is coming.
6. Sentence Fragment:
• Example:
Bangla: aami gaana gaa_iba *| jadi tumi naacha |
English: I will sing. if you dance.
• Example:
Bangla: aami bhaata *khaabena
English: I eat rice
Here the subject aami (I) is first person non-honorific, but the person
information of the verb khaabena (eat) is third person honorific.
8. Count Error:
• Example:
Bangla: aamaara tinajana bandhu aachhe : jaYanta, raajiiba, debaaruna o
saurabha |
English: I have three friends: Joyanta, Rajib, Debarun and Saurabh.
language and reports the proportion of the four types of errors as follows: substitution
(48%) > insertion (24%) > deletion (17%) > combination (11%). Foster [2005] has
manually created an error corpus for English and has classified missing word
errors based on the Part of Speech tag of this missing word. According to her “98%
of the missing POS come from the following list (the frequency distribution in the
error corpus is given in brackets): determiner (28%) > verb (23%) > preposition
(21%) > pronoun (10%) > noun (7%) > to (7%) > conjunct (2%)”. But manual
creation of such a corpus is a very time-consuming and non-trivial task. Brockett
errors. They treated the error correction task from a machine translation point of
view. Their aim was to apply the Statistical Machine Translation (SMT) technique
of automated error corpus creation. They have carried out a detailed analysis of
Missing Word Errors, Extra Word Errors, Agreement Errors and Covert Errors. Lee
and Seneff [2008] created artificial error corpora by introducing verb form errors.
To mimic the real life errors, Foster and Andersen [2009] designed the GenERRate
tool. Their algorithm generates an error corpus by introducing errors along the lines of
Various online resources are available nowadays, from where Bangla Unicode
www.nltr.org/) published by Society for Natural Language Technology Re-
search (SNLTR).
the form of “Sadhu” and “Chalit”. Sentences written in “Sadhu” are mostly found
and Sarat Chandra Chattapadhyay. Sentences written in “Sadhu” are not used in
day-to-day communication. On the other hand, most recent works follow “Chalit”
English) language [Kundu and Chandra, 2012]. Sentences written in “Sadhu” and
“Benglish” are not important in our case, as our focus is to detect and correct
Satyajit Ray. The reason behind selecting the novel is that sentences are written
in “Chalit” and most of the sentences are simple and representative of those that
“Jekhane Dactar Nei” a Bengali book translated from English work “Where There
We assumed that the syntax and semantics of the collected sentences are correct
as they are mostly collected from different newswires which are normally edited
and proof-read. Corpora from multiple domains have been collected to avoid the
skewed distribution of data. From this set of collected Bangla sentences (approx
4,68,582), sentence length distribution has been measured. It is found that sen-
tences containing 9 words are the most frequent in this corpus. Figure 3.3 shows
3.3 Methodology
Now we will discuss our novel approach to error corpus generation. The proce-
dure is as follows:
two consecutive words can generate (n−1) sentences, with the assumption that
only one transposition is done in each sentence. Table 3.2 shows 3 sentences
generated from a sentence containing 4 words. Though the last two examples
Table 3.2: Examples of Transposition Operation.
Operation Example
Source gaachha theke phala pa.De2
Transposition-1 theke gaachha phala pa.De
Transposition-2 gaachha phala theke pa.De
Transposition-3 gaachha theke pa.De phala
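The transposition step above can be sketched in a few lines of Python; the function name is illustrative, not taken from the thesis:

```python
def transpositions(sentence):
    """Generate (n-1) erroneous variants of a grammatical sentence by
    swapping each pair of adjacent words, one transposition per variant."""
    words = sentence.split()
    variants = []
    for i in range(len(words) - 1):
        swapped = words[:]                       # copy the original word list
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        variants.append(" ".join(swapped))
    return variants

# The 4-word source sentence of Table 3.2 yields its 3 variants:
for v in transpositions("gaachha theke phala pa.De"):
    print(v)
```

Running it reproduces exactly the three rows Transposition-1 to Transposition-3 of Table 3.2.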
ated by changing the word order of different types of Bangla collocated words
three categories: echo words (if w1 w2 is a word sequence and w2 has no mean-
ing), hyphenated words (w1 and w2 are connected by hyphen) and highly
One can use a simple regular expression [a-zA-Z]+-[a-zA-Z]+ for collect-
ing hyphenated words from a corpus and [\s]([a-zA-Z]([a-zA-Z]+))\s+[a-zA-Z]\2[\s]3
for collecting echo words. For collecting collocated and co-occurred word
has been used. The variance (σ2) of the number of words separating word w2 from
word w1 has been estimated, and low-variance word sequences have been
filtered using a statistical significance test (t-test) at a 99.5% confidence level.
The null hypothesis H0 is that the word sequences (w1 w2 ) appear indepen-
dently in the corpus. These filtered word sequences are cross verified with
2 Bangla sentence: gaachha theke phala pa.De (word meanings: tree from fruit fall;
English translation: Fruit falls from the tree).
3 Python regex notation has been used here. The pattern matches word pairs where
the first characters of the two words differ and the remaining character sequence is
the same. For example, in the word pair “nardamaa Tardamaa”, only the first character
differs and the rest of the character sequence is the same for both words.
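The two collection patterns can be tried out directly; the romanized corpus string below is a made-up fragment for illustration only:

```python
import re

# Hypothetical romanized Bangla fragment containing one hyphenated pair and
# one echo pair ("nardamaa Tardamaa").
corpus = "se haasi-khushi chhele nardamaa Tardamaa dekhe"

# Hyphenated word pairs: [a-zA-Z]+-[a-zA-Z]+
hyphenated = re.findall(r"[a-zA-Z]+-[a-zA-Z]+", corpus)

# Echo word pairs: two adjacent words whose character sequences agree from the
# second character onward (the backreference \2 repeats the first word's tail).
echo = re.findall(r"\s([a-zA-Z]([a-zA-Z]+))\s+([a-zA-Z]\2)\s", " " + corpus + " ")

print(hyphenated)                 # ['haasi-khushi']
print(echo[0][0], echo[0][2])     # nardamaa Tardamaa
```

Note that the pattern as given does not itself force the two first characters to differ; identical repetitions would have to be filtered out separately.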
Mutual Information (MI) values between wi and wj. The word sequences
having higher Mutual Information and lower variances and having t-value

    MI(w1, w2) = log2 [ p(w1, w2) / ( p(w1) · p(w2) ) ]        (3.1)

where p(w1, w2) = Count(w1, w2) / N and Count(w1, w2) is the number of sentences in
calculated.
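Equation (3.1) can be computed directly from sentence-level counts; the counts below are hypothetical numbers chosen only to make the arithmetic visible:

```python
import math

def mutual_information(w1, w2, counts, n_sentences):
    """MI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ), equation (3.1),
    where each probability is a sentence-level relative frequency."""
    p12 = counts[(w1, w2)] / n_sentences
    p1 = counts[w1] / n_sentences
    p2 = counts[w2] / n_sentences
    return math.log2(p12 / (p1 * p2))

# Assumed counts over a 1000-sentence corpus: each word occurs in 10
# sentences and the pair co-occurs in 5 of them.
counts = {("bhaata", "khaaYa"): 5, "bhaata": 10, "khaaYa": 10}
mi = mutual_information("bhaata", "khaaYa", counts, 1000)
print(round(mi, 3))   # log2(0.005 / (0.01 * 0.01)) = log2(50)
```

A strongly collocated pair thus gets a large positive MI, while independently occurring words score near zero.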
its cohorts and homophones. Cohorts are generated using regular expression
sequences in a word. These generated words are then verified against a spelling
dictionary to ensure that they are correctly spelled. In this
the over-generated cohort words. Words having minimum edit distance with
generate n sentences where each sentences containing (n-1) words. Table 3.3
Table 3.3: Examples of Deletion Operation.
Operation Example
Source gaachha theke phala pa.De
Deletion - 1 theke phala pa.De
Deletion - 2 gaachha phala pa.De
Deletion - 3 gaachha theke pa.De
Deletion - 4 gaachha theke phala
Operation Example
Source gaachha theke phala pa.De
Addition - 1 W⃗ gaachha theke phala pa.De
Addition - 2 gaachha W⃗ theke phala pa.De
Addition - 3 gaachha theke W⃗ phala pa.De
Addition - 4 gaachha theke phala W⃗ pa.De
Addition - 5 gaachha theke phala pa.De W⃗

where W⃗ = (w1, w2, w3, · · · , wv) is the vector of candidate words to be inserted.
Inserting each of the v words at each position generates
V × (n+1) sentences, where V is the length of the vector. Here we consider
that one word is inserted at a time. Table 3.4 shows the number of sentences
containing n words.
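The deletion and addition operations of Tables 3.3 and the table above can be sketched together; the small insertion vocabulary is an assumed sample:

```python
def deletions(sentence):
    """n variants, each dropping one word, so each has n-1 words (Table 3.3)."""
    w = sentence.split()
    return [" ".join(w[:i] + w[i + 1:]) for i in range(len(w))]

def additions(sentence, vocab):
    """V * (n+1) variants: one vocabulary word inserted at each of the
    n+1 positions, one insertion at a time."""
    w = sentence.split()
    return [" ".join(w[:i] + [v] + w[i:])
            for v in vocab for i in range(len(w) + 1)]

src = "gaachha theke phala pa.De"
print(deletions(src)[0])               # first row of Table 3.3
vocab = ["aara", "khuba"]              # hypothetical insertion vocabulary
print(len(additions(src, vocab)))      # 2 * (4 + 1) = 10 variants
```

With a realistic vocabulary the addition operation over-generates massively, which is why the confidence-based filtering described later in this chapter is needed.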
Step-6 Figure 3.4 shows an N × N tag association matrix which is generated af-
ter analyzing 5000 manually POS-tagged Bangla sentences having different
syntactic categories. Every possible combination of a two-POS-tag sequence is
each cell of the matrix corresponding to the tag sequence is filled with 1,
otherwise the cell contains 0. A cell with zero value indicates an invalid rela-
tionship, i.e., the POS tag of column Ni cannot occur after the tag of row Nj;
in other words, the POS tag Ni does not follow tag Nj. For example, a postposition (PPS)
cannot appear after an intensifier (INT). Consulting this matrix, mal-rules
can be generated, which can be used for transposition of the word sequence
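The matrix construction and mal-rule check of Step-6 can be sketched as follows; the two tagged sentences are a tiny illustrative sample, with tag names taken from Table 3.5:

```python
# Build the binary tag-association matrix from POS-tagged sentences and use
# it as a mal-rule detector.
tagged_sentences = [
    ["PN", "PPS", "CN", "VBF", "PUNC"],
    ["PR", "JJ", "CN", "VBF", "PUNC"],
]

valid = {}   # valid[(t1, t2)] = 1 when tag t2 has been seen right after t1
for tags in tagged_sentences:
    for t1, t2 in zip(tags, tags[1:]):
        valid[(t1, t2)] = 1

def has_mal_rule(tag_sequence):
    """True if the sequence contains an unattested (invalid) tag bigram."""
    return any(valid.get((t1, t2), 0) == 0
               for t1, t2 in zip(tag_sequence, tag_sequence[1:]))

print(has_mal_rule(["PN", "PPS", "CN", "VBF", "PUNC"]))   # False: all bigrams attested
print(has_mal_rule(["INT", "PPS"]))                       # True: PPS cannot follow INT
```

In the thesis the matrix is built from 5000 sentences; with so small a sample the matrix would be far too sparse, so this is only a structural sketch.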
In this research, we have used a HMM based POS tagger [Dandapat and Sarkar,
2006; Rabiner and Juang, 1993; Van Gael et al., 2009; Cutting et al., 1992] which has
The POS tagger has been trained on 5345 annotated sentences having 13215 unique
words. When a small number of annotated sentences is available, fewer
tags are preferred [Bharati et al., 2006]. It has been seen that sentences annotated
with fewer tags lead to efficient machine learning. Moreover, when the
number of tags is small, the accuracy of manual tagging is higher due to less disagree-
ment among annotators [Bharati et al., 2006]. However, the generated tagset should
not be so coarse that important lexical and grammatical information encoded
very challenging task. Studies related to this issues have already been reported
in [Bharati et al., 2006; Sankaran et al., 2008] and tagsets have been designed for
Bangla language based on these studies. We have collected 5345 raw sentences
from MIT Bangla corpora4. Initially we decided to have a finer tagset (con-
taining 90 tags), but later we came up with a comparatively coarse tagset having
only 14 basic tags (see Table 3.5). Our final tagset has been prepared after con-
sulting and comparing with available tagsets like the Penn tagset5 [Santorini, 1990],
tagset designed by IIIT Hyderabad6 [Bharati et al., 2006], the BIS POS tagset [Dash,
2013] and tagset reported in [Sankaran et al., 2008]. The sentences which had been
previously annotated with finer categories and other tagsets, are automatically an-
notated with these 14 tags. Then, errors induced during such automatic mapping
are manually verified and corrected. Thus, 5345 sentences are manually annotated
Our test set contains 500 Bangla sentences with 3228 unique
words. The number of unknown words7 in our test set is 1392. Table 3.6 shows
POS tag distribution in our training and test corpus. Table 3.7 shows accuracy of
individual POS tag on our training and test sentences. Table 3.8 shows top three
4 http://tdil.mit.gov.in/
5 http://www.cst.dk/mulinco/filer/PennTreebankTS.html
6 http://ltrc.iiit.ac.in/nlptools2010/files/documents/POS-Tag-List.pdf
7 The term “unknown words” means the number of unique words that are not found in the
training corpus.
Table 3.5: POS tags used in our tagger
Table 3.6: POS tag distribution in our training and test corpus
POS Tag Distribution in Training set (%) Distribution in Test set (%)
CN 20.79 29.72
PUNC 14.20 14.97
JJ 12.81 7.19
VBF 11.73 13.38
PN 8.89 7.37
PR 5.65 7.5
CC 5.33 4.6
VBN 4.61 1.65
RB 3.34 4.25
VNF 2.68 0.08
IND 2.59 2.22
PPS 0.82 2.25
DGT 0.77 0.03
INT 0.60 4.79
Table 3.7: Accuracy of individual POS tag using HMM
POS Tag Accuracy in Train set (%) Accuracy in Test set (%)
PUNC 99.96 96.13
CN 99.35 95.24
PR 96.97 91.43
JJ 96.35 89.28
VBF 95.98 85.93
PPS 95.96 83.40
CC 95.58 81.29
VNF 94.45 79.00
VBN 93.95 82.24
RB 90.66 81.06
IND 90.56 76.67
INT 89.42 73.05
DGT 80.43 66.30
PN 3.93 11.30
wrong predictions by our POS tagger. The reason behind such wrong predictions
is the small number of occurrences of the tag in our training corpus. The lexical gap
[Manning, 2011] in the training corpus and the number of unknown words in the test
corpus are other reasons for wrong predictions. From Table 3.8 we can see that often the
PN tag is predicted as CN. Both PN and CN are nouns in the broader category.
work. However, prediction of INT as JJ and DGT as PN are serious issues that need
Bangla word-tag dictionary. If the words belong to a closed set of INT containing
9]+[a-zA-Z]+ etc., the word is tagged as DGT. Thus, applying linguistic and pattern
matching rules after our tagging module, we reduce the errors of our POS tagging.
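The rule-based post-correction step can be sketched as follows; the closed INT word set and the word-tag dictionary entries are placeholders, since the actual lists are not reproduced in the text:

```python
import re

INT_WORDS = {"khuba", "besha"}        # assumed closed set of intensifiers
WORD_TAG_DICT = {"raama": "PN"}       # assumed word-tag dictionary sample

def post_correct(word, predicted_tag):
    """Override the HMM tagger's prediction with linguistic and pattern rules."""
    if word in INT_WORDS:                                 # closed-set lookup
        return "INT"
    if re.fullmatch(r"[0-9]+|[0-9]+[a-zA-Z]+", word):     # e.g. "9" or "9th"
        return "DGT"
    return WORD_TAG_DICT.get(word, predicted_tag)         # dictionary override

print(post_correct("khuba", "JJ"))    # INT: fixes the INT-as-JJ confusion
print(post_correct("9th", "PN"))      # DGT: fixes the DGT-as-PN confusion
```

The order of the rules matters: closed-set and pattern rules fire before the dictionary so that the most reliable evidence wins.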
Therefore, the number of generated sentences using this method increases with
the number of words in a grammatical sentence. We have seen that the mode of
the sentence length distribution of our collected Bangla corpora is 9. This implies
Those many sentences can be generated from a single sentence having 9 words. If
using our method. Some Bangla sentences may have as many as 57 words but we
are not considering such cases as such sentences are very infrequent (see Figure
3.3). Moreover, as Indian languages have relatively free word order, some valid
well-formed sentences also get generated after this noise induction procedure.
getting selected. Therefore we have applied both rule-based and statistical
approaches for collecting a significant sample from this population. Initially we pass
the sentences through our HMM-based POS tagger (see subsection 3.3.1) and then the
generated tag sequences are passed through a mal-rule detector, which collects the
sentences containing improper POS tag sequences. We also have calculated the
and Relative Position Score [Liu et al., 2008]. A numeric score is assigned to deter-
mine the quality of the sentence. The sentence-level confidence measure is based
on the score of each and every individual word in the sentence. Confidence score
and MI based confidence score, measures the lexical consistency [Raybaud et al.,
    Score(S) = Score(w1, w2, w3, · · · , wn)
             = (1/n) Σ_{i=1}^{n} Score(wi)        (3.2)
             = (1/n) Σ_{i=1}^{n} [ ( Σ_{j=1, j≠i}^{n} MI(wj, wi) ) / (n − 1) ]
Here MI(wj, wi) is calculated using equation 3.1. MI based confidence measures
do not take word order into account and instead focus on long range lexical
relationships. For this reason, we have also estimated the relative position based
confidence score. Confidence score of a sentence using Relative Position Score [Liu
    RPscore(S) = RPscore(w1, w2, w3, · · · , wn)
               = (1/(n − 1)) Σ_{j=2}^{n} [ ( Σ_{i=1}^{j−1} freqDep(wi, wj) / freqInd(wi, wj) ) / (j − 1) ]        (3.3)
where freqDep(wi, wj) is the number of sentences in which wi and wj co-occur with the
constraint that wj appears after wi in a sentence, and freqInd(wi, wj) is the number of
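Equations (3.2) and (3.3) can be sketched directly; the MI and frequency functions passed in here are toy stand-ins for the corpus statistics:

```python
def mi_confidence(words, mi):
    """Equation (3.2): average over words w_i of the mean MI between w_i
    and every other word of the sentence."""
    n = len(words)
    total = 0.0
    for i in range(n):
        total += sum(mi(words[j], words[i]) for j in range(n) if j != i) / (n - 1)
    return total / n

def rp_score(words, freq_dep, freq_ind):
    """Equation (3.3): relative-position score from ordered (freq_dep) and
    unordered (freq_ind) co-occurrence counts; 0-based index j has j
    preceding words, matching the (j - 1) divisor of the equation."""
    n = len(words)
    total = 0.0
    for j in range(1, n):
        inner = sum(freq_dep(words[i], words[j]) / freq_ind(words[i], words[j])
                    for i in range(j))
        total += inner / j
    return total / (n - 1)
```

With constant MI of 1.0 the sentence score is 1.0, and with every ordered pair occurring in half of its unordered co-occurrences the relative-position score is 0.5, which makes the two formulas easy to sanity-check.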
Information has been used for proper selection of the erroneous sentences gener-
ated by the substitute operation. Low Mutual Information indicates that a word in the
sentence is wrongly placed in the context of the other words. Bigram and Rel-
ative Position scores have been used to select the erroneous sentences generated
θRS for each of the three metrics. The range of the estimated scores varies with
the number of words in a sentence and the choice of the confidence score estimator.
Therefore, we have normalized the scores generated by each estimator so that the
confidence values lie between 0 and 1. The normalization has been done using
transposition operations are selected if their normalized bigram score is less than
θbigram and their normalized RS is less than θRS. This implies that bigram and RS are
combined with a logical AND operation. These confidence thresholds are selected
Table 3.9: Experiment with confidence thresholds for erroneous sentences gener-
ated by the substitution operation
θMI Precision Recall F-Score
0.1 0.9 0.36 0.514286
0.2 0.9 0.367347 0.521739
0.3 0.9 0.367347 0.521739
0.4 0.9 0.367347 0.521739
0.5 0.88 0.458333 0.60274
0.6 0.911765 0.632653 0.746988
0.7 0.925 0.787234 0.850575
0.8 0.82 0.87234 0.845361
0.9 0.84 0.823529 0.831683
experimentally. Table 3.9 and Table 3.10 show the change of precision and recall of
automatic error corpus creation with the confidence thresholds. We have seen that the
automatic error corpus creation methodology achieved the highest F-Score when θMI
= 0.7 (for sentences generated by the substitution operation), θbigram = 0.5, and θRS = 0.9
ated erroneous sentences from randomly selected 1000 sentences from a corpus
using mal-rule detector and depending on the confidence score (see subsection
3.3.2). After manually analysing the random sample of generated ill-formed sen-
tences, we found that 87% of generated sentences are really ungrammatical. Most
of these generated sentences have invalid POS tag sequences. Though some of
the generated sentences have valid POS tag sequences, the word sequences in
Table 3.10: Experiment with confidence thresholds for erroneous sentences gener-
ated by the transposition operation
Figure 3.5: Simplified functional diagram of automatic error corpora creation.
Table 3.11: Erroneous sentences generated from a single sentence and selected
according to the confidence score.
these sentences are infrequent. Experimental results also show that 13% of those
tion operation sometimes generates another grammatical construction. Table 3.11
the first sentence is a correct sentence and the remaining erroneous sentences are
generated automatically. In this table, R_S indicates the relative position score
results. Table 3.12 shows Bangla Echo words and Hyphenated words collected
from the corpus. Transposition between them might cause an error to be induced in
Table 3.12: Bangla Echo words and Hyphenated words.
hyphenated words are sometimes allowed. For example, we may sometimes use
are very infrequent. Table 3.13 shows some automatically collected collocated and
co-occurred word sequences along with their relative positions, mean and variance
of relative positions, t-value and Mutual Information between these word se-
and collocated words induce noise in a grammatical sentence and this procedure
CHAPTER 4
GRAMMATICAL ERROR DETECTION AND CORRECTION
“The principal design of a Grammar of any language is to teach us to express ourselves with
propriety in that language, and to be able to judge of every phrase and form of construction,
The NLG based approach has been used for grammatical error detection and
correction of Bangla language. There are two levels of operations in an NLG based
approach. In the first level, the input word sequence (w1, w2, w3, · · · , wn) of a sen-
tence is transformed into over-generated word vectors (w⃗1, w⃗2, w⃗3, · · · , w⃗n), which
form a trellis of all possible sentences. In the second level, a Language Model
with an optimal search algorithm is used for selecting the best path from this search
space. The Language Model is used specifically for scoring the various paths of the
trellis. The best path indicates the grammatically well-formed sentence, whereas
the worst path indicates the ill-formed sentence. To create the trellis, the input
word sequences are first passed through the HMM-based POS tagger (see subsection
3.3.1) and a rule-based morphological analyser that reduces each word to its root,
including all possible suffixes with the root. In this phase, proper care is necessary
for selection and ordering of the suffixes. The most common ordering of suffixes
khaanaa, khaani, guli, gulo, Tuku, Taa, Ti, Te, To, dera, bRinda, raa), 10 case markers (e,
ete, era, ere, ke, te, Ya, Ye, ra, re) and 2 emphasizers (i, o). Morphological constituents
CL, CA and EM denote Root, Classifier, Case Marker and Emphasizer respectively.
The symbol ‘?’ indicates that CL, CA and EM can occur zero or one time, i.e. CL, CA
and EM can be implicit for a given word. Thus, following this rule, for a given
root word ‘chhele’ we can generate inflected words as shown in Table
4.1. We have used a rule-based morphological analyser for Bangla. Initially, Part
Table 4.2: Example of Nominal Morphological Analysis
of Speech (POS) wise suffix lists have been prepared. In our NLG-based grammar
correction system we have used our noun morphological analyser during correc-
tion of nominal inflectional errors. We have used a simple suffix stripping algorithm
for noun morphology. The suffix stripping algorithm simply checks whether the word
has any suffix from the previously collected suffix list. This checking is done
using regular expressions. Then the suffix is stripped from the word. The same
procedure iterates on the remaining string (after stripping the suffix from the
word). The number of iterations depends on the rules; for example, the stripping
procedure iterates three times for nouns. Finally, the remaining string is searched
in the root word dictionary for verifying its existence. If the root word is a proper
noun then it will not be found in the root word dictionary. Table 4.2 shows strip-
are some Bangla nouns that appear both as a root form as well as inflected form.
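The iterative suffix stripping and root-dictionary check can be sketched as follows; the suffix lists are those quoted earlier in this chapter, while the root dictionary is a tiny assumed sample. For ambiguous words such as “jaamaai”, every analysis whose candidate root is found in the dictionary is returned:

```python
CLASSIFIERS = ["khaanaa", "khaani", "guli", "gulo", "Tuku", "Taa", "Ti", "Te",
               "To", "dera", "bRinda", "raa"]
CASE_MARKERS = ["ete", "era", "ere", "ke", "te", "Ya", "Ye", "ra", "re", "e"]
EMPHASIZERS = ["i", "o"]
ROOTS = {"chhele", "jaamaa", "jaamaai", "gaachha"}   # assumed root dictionary

def analyse_noun(word):
    """Return every analysis (root, [suffixes]) whose stripped candidate
    root exists in the dictionary; ambiguous words yield several."""
    analyses = []
    if word in ROOTS:                    # the whole word may itself be a root
        analyses.append((word, []))
    remaining, stripped = word, []
    # Strip outermost suffix first: word = root + CL? + CA? + EM?
    for suffix_list in (EMPHASIZERS, CASE_MARKERS, CLASSIFIERS):
        for s in sorted(suffix_list, key=len, reverse=True):
            if len(remaining) > len(s) and remaining.endswith(s):
                remaining = remaining[: -len(s)]
                stripped.insert(0, s)
                if remaining in ROOTS:
                    analyses.append((remaining, list(stripped)))
                break
    return analyses

print(analyse_noun("jaamaai"))    # both the bare root and jaamaa + i
```

The loop runs at most three times, matching the three-iteration rule for nouns, and longer suffixes are tried before shorter ones so that e.g. ‘era’ is not mistaken for ‘ra’.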
Examples of such Bangla nouns are “jaamaai”, “maalaai” etc. The word “jaamaai”
appears as a whole root word with the English meaning ‘son-in-law’,
or in inflected form as ‘jaamaa’ + ‘i’. Here ‘jaamaa’ is the root, its meaning in
English is ‘shirt’, and the suffix ‘i’ is agglutinated with it to intensify the meaning (i.e.
‘only shirt’). Similarly, the word ‘maalaai’ is a root word which means “a special
as ‘maalaa’ + ‘i’. In such scenarios our noun morphological analyser returns the
whole word (treating it as a root with no inflection) and also its morphed form (i.e.
root + suffixes). The noun morphological analyser has been tested on 300 Bangla
inflected Common Nouns and 300 Bangla Proper Nouns. The morphological anal-
yser yields an accuracy of 98.4% on Common Noun and 91.3% on Proper Noun
data. In the case of post positions, the whole post position list is simply over gen-
here for calculating the scores of each node in the trellis. To avoid the data sparseness
problem, Jelinek-Mercer smoothing [Jelinek and Mercer, 1980] is applied.
The Viterbi [1967] algorithm is used for selecting the optimal path from the trellis
depending on the scores generated by the language model. Figure 4.1 shows the
selection of best probable well-formed sentence from the trellis by bold line and
Figure 4.1: Generative model for well-formed and ill-formed sentence detection.
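A minimal sketch of the scoring and decoding step follows, assuming a bigram language model with a single Jelinek-Mercer interpolation weight; the counts, the weight value and the tiny trellis are all illustrative, and exhaustive path enumeration stands in for Viterbi decoding (Viterbi returns the same argmax while sharing prefix computations):

```python
from itertools import product

BIGRAMS = {("aami", "bhaata"): 3, ("bhaata", "khaai"): 3}   # assumed counts
UNIGRAMS = {"aami": 5, "bhaata": 4, "khaai": 3, "khaabena": 1}
N = sum(UNIGRAMS.values())
LAMBDA = 0.7                        # assumed interpolation weight

def p_jm(w2, w1):
    """Jelinek-Mercer: P(w2|w1) interpolated with the unigram probability."""
    p_ml = BIGRAMS.get((w1, w2), 0) / max(UNIGRAMS.get(w1, 0), 1)
    return LAMBDA * p_ml + (1 - LAMBDA) * UNIGRAMS.get(w2, 0) / N

def best_sentence(trellis):
    """Score every path through the trellis and return the best one."""
    def score(words):
        s = 1.0
        for w1, w2 in zip(words, words[1:]):
            s *= p_jm(w2, w1)
        return s
    return max((" ".join(p) for p in product(*trellis)),
               key=lambda sent: score(sent.split()))

trellis = [["aami"], ["bhaata"], ["khaai", "khaabena"]]
print(best_sentence(trellis))       # aami bhaata khaai
```

Because the interpolated probability never collapses to zero for a known word, the smoothed model can still rank paths that contain unseen bigrams.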
4.1 Pruning of the Search Space
Availability of a good POS tagger and selectional restriction rules helps avoid
certain paths in the trellis. A rule-based function can be used to prune the search
space. Our linguistic function is defined by a set of hard constraints which are ba-
sically a knowledge base of linguistic selectional rules. For example, our linguistic
The function returns 1 when a certain condition is satisfied and 0 otherwise. Ap-
plying Linguistic Hard Constraints on the trellis shown in Figure 4.1, we get a
pruned trellis (a relatively smaller search space than the previous one) as shown in
Figure 4.3.
It is also important to ensure that the corrected sentence is not too far away from the
ungrammatical one. To ensure this, initially the k-best correct sentences are selected
from the trellis, and then a modified BLEU [Papineni et al., 2002] score and Word
Error Rate (WER) are applied. BLEU is the geometric mean of n-gram matches of
words with a brevity penalty, and WER is calculated using the Levenshtein Distance
(Edit Distance) between the ungrammatical sentence and the correct sentence. WER
Figure 4.3: Pruned trellis after applying Linguistic Hard Constraints
is calculated as follows:

    WER(W, C) = EditDistance(W, C) / |W|

where |W| is the number of words in the ungrammatical sentence. The higher the value of WER,
the lower the similarity between the two strings. The value of WER can vary
from 0 to 1, and sometimes the value becomes more than 1, when the length of
the correct sentence is greater than that of the ungrammatical sentence due to insertion
operations. Our aim is to select the correct sentence from a set of correct sentences
    BLEU(W, C) = γ · exp ( Σ_{n=1}^{N} λn log Prec(W, C) )        (4.2)

here exp ( Σ_{n=1}^{N} λn log Prec(W, C) ) is the geometric mean of the modified n-gram Pre-
cision Prec(W, C) using n-grams up to length N, where Precision is calculated as:
unigram matching is λ1 = 0.33, the weight for bigram matching is λ2 = 0.5, and for tri-
gram matching λ3 = 1. Figure 4.4 shows the n-gram matching score between the following
Precision of the hypothesis correction which has a smaller number of words than the
Figure 4.4: N-gram matching score between ungrammatical and correct sentences
From the calculation shown above, grammatical sentence 3 has a higher Precision value
    γ = 1              if c > r
    γ = e^(1 − r/c)    if c ≤ r        (4.6)
where c is the length of the correct sentence and r is the length of the ungram-
matical sentence. Here a higher BLEU score indicates that the reference sentence is
closer to the candidate sentence. Thus, using a high BLEU score and low WER we can select
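The WER and modified BLEU computations above can be sketched as follows, assuming the n-gram weights quoted in the text and the brevity penalty of equation (4.6); the precision here counts simple n-gram overlaps, a simplification of the clipped counts used by full BLEU:

```python
import math

def wer(ungrammatical, correction):
    """Word Error Rate: word-level Levenshtein distance divided by the
    number of words in the ungrammatical sentence."""
    a, b = ungrammatical.split(), correction.split()
    d = [[max(i, j) if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)] / len(a)

def bleu(ungrammatical, correction, weights=(0.33, 0.5, 1.0)):
    """Modified BLEU of equation (4.2) with the brevity penalty of (4.6)."""
    w, c = ungrammatical.split(), correction.split()
    logsum = 0.0
    for n, lam in enumerate(weights, start=1):
        w_ngrams = [tuple(w[i:i + n]) for i in range(len(w) - n + 1)]
        c_ngrams = [tuple(c[i:i + n]) for i in range(len(c) - n + 1)]
        prec = sum(1 for g in c_ngrams if g in w_ngrams) / max(len(c_ngrams), 1)
        if prec == 0:
            return 0.0
        logsum += lam * math.log(prec)
    gamma = 1.0 if len(c) > len(w) else math.exp(1 - len(w) / len(c))
    return gamma * math.exp(logsum)

print(wer("aami bhaata khaabena", "aami bhaata khaai"))  # 1 substitution / 3 words
```

A candidate correction is then kept when its BLEU against the input is high and its WER low, i.e. when it stays close to what the user actually wrote.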
Our NLG-based approach can also be used to correct performance errors. Only
each word of the sentence is replaced with its cohorts or homophones. Cohorts are
then verified against a Bangla spelling dictionary to ensure that the generated words
can be generated on average from a single word, then k × n sentences can be gen-
erated from a sentence containing n words. Levenshtein distance can also be used
to prune the over-generated cohort words. Words having minimum edit distance
with the original word are selected for the cohort list. Following this procedure
we can generate “mAchha” for given word “gAchha” and “khAtA”, “chhAtA” for
happen that the user has provided a grammatical sentence and the correction pro-
vided by the system is another grammatical sentence. In this scenario, we consider the
system-provided correction more natural than the candidate sentence, because it
chooses the correction, based on the language model, from all possible sentences gen-
CHAPTER 5
EVALUATION
“The most serious mistakes are not being made as a result of wrong answers. The truly
of a grammar checker. Over the last few years, most of the studies regarding
grammatical error detection and correction have focused on design
and development aspects, while relatively less attention has been directed towards
evaluation issues [Leacock et al., 2010; Chodorow et al., 2012]. The current work
attempts to address this gap by taking a fresh look at standard measures and moti-
vating the need for a finer-grained evaluation based on a detailed characterization
5.1 Challenges
same test set and compare the results. But due to the lack of substantially large
focus on some specific types of grammatical errors [Leacock et al., 2010]. Moreover,
these approaches are tested on different test sets which also vary in size and error
same types of errors [Tetreault et al., 2010; Dickinson et al., 2011; Rozovskaya
and Roth, 2011]. Sometimes, different metrics have been used for different aspects
of the same task. For example, in [Han et al., 2010] the performance of omission
were reported using Precision and Recall. Some researchers [Park and Levy, 2011]
preferred BLEU [Papineni et al., 2002] and METEOR [Lavie and Agarwal, 2007].
et al. [2012] recommended reporting True Positive (TP), False Positive (FP), False
Negative (FN) and True Negative (TN) in addition to any metrics derived from
them, so that any reader can calculate other measures that the authors of a particular
paper did not choose to include. It would thus be a worthwhile enterprise to look
into new possibilities in the evaluation process. In this chapter, we have introduced
the acceptability of the grammatical error detection and correction system and to
circumvent the need for gold-standard test corpora when comparing
systems targeting different types of errors. MEGA has been applied to our Bangla
existing English grammar checkers and the NLG-based Bangla grammar checker
is not possible, the NLG-based system has been compared against a prototype
Table 5.1: Evaluation Measure Formulae
Metric Formula
Precision TP / (TP + FP)
Recall TP / (TP + FN)
F1-Score 2 · (Precision · Recall) / (Precision + Recall)
Accuracy (TP + TN) / (TP + TN + FP + FN)
sured by metrics like Precision, Recall, F-Score and Accuracy. These measures
generally indicate how often grammatical incorrectness is rejected and how often
in the text and the grammar checker rightly detects that error. False Positive (FP)
occurs when the system identifies the existence of an error even when there is no such
error in the text. False Negative (FN) occurs when the system misses an error in
the text. True Negative (TN) occurs when the system correctly identifies the absence
of errors in the text. Table 5.2 shows these relationships with respect to the grammati-
cal error detection task. Now we shall discuss some important aspects regarding
Table 5.2: True Positive, False Positive, False Negative and True Negative with
respect to the grammatical error detection task.

                               Grammatical Errors (Condition)
                               Present (Positive)      Absent (Negative)
Error Detection   Found        True Positive (TP)      False Positive (FP)
                  Not Found    False Negative (FN)     True Negative (TN)
Most often, test suites for grammar checkers are prepared by using a set of well-formed sentences and a set of ill-formed sentences. The set of well-formed sentences is prepared from a collection of proofread and edited sentences which are easily available from online newswire. The reason behind preferring newswire is to avoid the skewed distribution of data, since newswire has a good representation of data from diverse domains.
of data from diverse domains. A test suite of ill-formed sentences should ideally
cover a wide range of the sentences some having single errors and many others
sentences are not easily available. Manual creation of fully annotated learners’
error corpora is quite expensive, time consuming and non-trivial task. To avoid the
problem of creating a corpus of manual errors, one can synthetically generate error
corpora to simulate real errors (as discussed in Chapter 3). There has been previous work [Foster and Andersen, 2009; Lee et al., 2011] on using synthesized error data which indicates that artificial error corpora can be a valid source of evaluation. Due to the unavailability of standard test corpora, one solution is to test the system with
different domains having different structural complexity and hardness of errors¹.
For this reason, we have categorized our test corpora across axes like domain, structural complexity (simple, complex and compound sentences) and types of errors and their proportions. Our error corpus contains 66% postposition errors, 29% noun inflectional errors, 3% determiner errors and 2% combined errors.
learners’ writing.
satisfying all of them at the same time is quite difficult. We present evaluation of
Bangla grammatical error detection and correction systems using both standard
metrics like Precision, Recall and F-Score. We propose two new metrics, namely the
Metric (CMM). GAAM measures the acceptability of the system whereas CMM circumvents the need for gold standard test corpora during comparison among
We have compared our NLG based system with another grammatical error detec-
tion system that uses a Naïve Bayes classifier. The Naïve Bayes classifier follows the
¹ The term “hardness of errors” indicates the complexity of grammar correction due to the presence of errors in the sentence.
method reported in Golding [1995]. Four features, namely, word-word, word-tag,
tag-word and tag-tag sequences have been used in this classification algorithm.
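The four feature types of the baseline classifier can be sketched as bigram extractions over a POS-tagged sentence; the tuple encoding and tag set below are illustrative assumptions, not the exact representation used in the system:

```python
def golding_features(tokens, tags):
    """Extract word-word, word-tag, tag-word and tag-tag bigram
    features (the four feature types of a Golding-style classifier)
    from a token sequence and its parallel POS tag sequence."""
    feats = []
    for i in range(len(tokens) - 1):
        feats.append(("ww", tokens[i], tokens[i + 1]))  # word-word
        feats.append(("wt", tokens[i], tags[i + 1]))    # word-tag
        feats.append(("tw", tags[i], tokens[i + 1]))    # tag-word
        feats.append(("tt", tags[i], tags[i + 1]))      # tag-tag
    return feats
```

Each adjacent token pair thus contributes four features to the Naïve Bayes model.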
The classifier has been trained on 468,582 well-formed Bangla Unicode sentences
procedure has been elaborated in section 3.2. Ill-formed sentences are generated
Error detection performance of the NLG based grammar checker has been evaluated on the synthetically generated ill-formed sentences. The Naïve Bayes classifier is tested using the same
test sentences which were used for NLG based system and the true acceptance
and false rejection rates are also verified. Figure 5.1 shows the performance of
these two error detection approaches. A comparison of these two error correction models is shown in figure 5.2. The synthetically generated ill-formed sentences
Figure 5.2: Performance of error correction
have been divided into some subcategories, so that each subcategory contains spe-
cific types of errors like post positional errors, determiner errors and case marker
errors. We have also tested the NLG based system on individual subsets as well
as on the total set. Details of the proportion of errors were mentioned in section
5.3. Table 5.3 shows the performance of the NLG based system on different types
In blind testing, we only provide the system’s output (suggestion) to two testers having
Table 5.3: Performance evaluation of NLG based system on individual errors as
well as combined errors in five text genres. P indicates Precision and R
indicates Recall.
Table 5.4: Grading scale
0  Not acceptable
1  Acceptable with difficulty
2  Fully acceptable
no knowledge of the input, whereas in the open testing the input sentences are
between blind and open testing, as our intention was not to investigate the bias of
the testers in presence of the inputs. In blind testing, testers are requested to grade the output sentences on a three-level grading scale (0, 1 and 2) depending on their acceptability. The grading scale is defined in table 5.4. Depending on the user's grade,
Acceptability of the system: φ = ( (1/N) Σ_{s=1}^{N} µ_s ) × 50%        (5.1)
where N is the number of test sentences and µ_s is the mean acceptability grade for sentence s, computed as µ_s = ( Σ_{e=1}^{n} G_e ) / n. Here G is the grade (within 0, 1 and 2) given by evaluator e and n is the number
of evaluators. Using this formula, we found GAAM score of our system’s output
5.3 shows the result of blind testing. We have found that the acceptability of
the NLG based system's output is 80.26%. Now the question is, does the score remain the same after testing on a large number of ill-formed sentences? To answer this question, we have done a statistical significance test. Using a t-test we found that the acceptability of the system's output lies within the confidence interval [80.21 ± 1.17] with 95% confidence. We have also calculated the inter-annotator agreement
by the kappa statistic [Cohen, 1960; Fleiss, 1981]. Using Cohen's kappa we get the kappa score between the two testers as k=0.34. Agreement between the two testers is
shown as a radar graph in figure 5.4. Each axis in the graph represents a system’s
of 38 sentences. The plot is guided by the acceptability score for those system’s
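Both statistics reported above, the acceptability φ of equation 5.1 and Cohen's kappa, are easy to reproduce; a sketch under the assumption that grades are stored per sentence and per annotator:

```python
def gaam_acceptability(grades):
    """grades[s] lists the 0/1/2 grades all evaluators gave to output
    sentence s; phi = (sum of mean grades / N) * 50%, so a mean grade
    of 2 on every sentence yields 100%."""
    mu = [sum(g) / len(g) for g in grades]   # mean grade per sentence
    return sum(mu) / len(grades) * 50.0

def cohens_kappa(a, b, labels=(0, 1, 2)):
    """Cohen's kappa between two annotators' grade lists a and b."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n              # observed agreement
    p_e = sum((a.count(l) / n) * (b.count(l) / n)            # chance agreement
              for l in labels)
    return (p_o - p_e) / (1 - p_e)
```

A kappa of 0.34 (as found above) indicates only fair agreement between the two testers on this scale.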
Very often, comparison between available grammar checkers is not possible due to the unavailability of a common test set. If two grammar checkers have been developed for two different languages, then the performance of these systems cannot be compared as there is no question of a common test set. To get rid of this problem, we propose a measure by which it will be possible to compare two grammar checkers developed for different languages without the need of a common test set. Our approach is to estimate the
complexity of the grammar correction for a given input test data and then find out the
correlation between the performance of the grammar checker and the complexity
value of the test data. Our hypothesis is that these correlation values will indicate how well a grammar checker performs for a given complex test data. Thus even if two test sets are not similar but have the same complexity value then we can
test sentence. A first step would be to identify the important features that increase
complexity in the text. On the surface, this problem has some resemblances to the
problem of estimating readability of text [McCallum and Peterson, 1982; Kim et al.,
readability estimation has been explored by Sinha et al. [2012]. Some features
grammar correction under the basic premise that text which is harder to read will
be harder to correct. This makes sense when we observe that in the process of
the meaning of that text depicted by the word sequence and then attempt to
place words in particular positions of the text so that the meaning of the text
can be properly conveyed. Sentences that are complex to read are often hard to
understand and are more complex to correct. We thus surveyed readability and lexical richness estimation metrics proposed to date, like Flesch-Kincaid Reading Ease, the Gunning Fog index [McCallum and Peterson, 1982], SMOG [McLaughlin, 1969], Lix, Rix, Yule's characteristic [Yule, 1944], Simpson's Index, the Guiraud Index [Daller, 2010] and the Uber Index. However, not all features used in the problem of
estimating readability of text [Sinha et al., 2012; McCallum and Peterson, 1982; Kim
and Callan, 2004, 2005; Collins-Thompson, 2011] are directly applicable in our
case as we are dealing with erroneous text. As a result, we have introduced new
Complexity of text occurs mainly for two reasons. Firstly, a sentence might not
contain enough information to convey the concept behind the sentence. Secondly,
a sentence might contain lots of information that increases the cognitive load to
structures [Lourdes, 2003; Bachman, 1976]. Here we will concentrate only on the
load indirectly. Consider the following features responsible for text and grammar
correction complexity.
sentence [Hill and Murray, 1998]. The presence of a comma in the proper place in a sentence can lead to faster reading times and reduces the need to re-read the entire sentence. Commas also help to reduce problems arising from ambiguities; the “garden path effect” [Ferreira et al., 2001] can be greatly reduced
if commas are correctly present after introductory phrases and reduced rela-
tive clauses [Israel et al., 2012]. Several studies have provided evidence that
readers experience difficulty when they read “garden path sentences” like
“The old man the boat”. A “garden path sentence” [Pazzani, 1984] is one that
place can decrease the complexity of text.
word can have different POS. Generally, when a person reads a sentence
the user builds up a likely meaning for each word and a meaning for the
appears that changes the meaning of the sentence, the user switches to the
new meaning and continues. If a word has multiple POS tags and a tag
of the sentence. For example, the sentence “The complex houses married and single soldiers and their families” is complex to understand for second language learners of English. This is because the word ‘houses’ is used as a verb here,
Syntactic Structure: If thematic roles in a sentence deviate from the usual agent (do-er) before patient (do-ee) order, then it increases cognitive load and
A reversible passive sentence like “The little rat is chased by the big cat.” is more complex than “The big cat chases the little rat”. Sentence complexity using
Sentence Complexity(s) = ( #LV(s) + #LN(s) ) / #Clauses(s)

where #LV(s) and #LN(s) are the number of verbal and non-verbal links (i.e. verb phrases, noun phrases) and #Clauses(s) is the number of clauses in
the complexity because relationships between clauses are not always used
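The clause-normalised measure above reduces to a single ratio; a minimal sketch, assuming the link and clause counts come from a chunker or shallow parser:

```python
def sentence_complexity(num_verbal_links, num_nominal_links, num_clauses):
    """Sentence Complexity(s) = (#LV(s) + #LN(s)) / #Clauses(s), i.e.
    phrase links per clause; the counts are assumed to be produced by
    a chunker or shallow parser over sentence s."""
    return (num_verbal_links + num_nominal_links) / num_clauses
```

A sentence with many phrase links packed into few clauses thus scores as more complex.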
found mostly in the literature domain. One can detect metaphor by bigram
example, “He planted good ideas in their minds.” Here the verb ‘planted’
acts on the noun ‘ideas’ and makes the sentence metaphoric. Generally, in a corpus the objects that occur more frequently with the verb ‘planted’ are ‘trees’,
Lexical Density: Psycholinguistic studies have long shown that less densely packed
ers. Lexical density is a measure of the ratio of different words to the total
1977] it has been seen that there is a correlation between low lexical density
or minimally terminable unit” [Hunt, 1965; Sachs and Polio, 2007]. T-Units
which are longer (in number of words) and have more subordinate clauses are more complex. A simple sentence consists of one T-Unit, while a compound sentence consists of more than one
T-Unit [Gaies, 1980]. For example: The Sun rose. The fog dispersed. The general
When the above sentence is written as, At Sunrise, the fog having dispersed,
the general, determined to delay no longer, gave the order to advance. Then it is obvious that the second sentence, having greater mean T-Unit length, is more
Abstractness: Less frequent (i.e. unfamiliar) words and words that represent
abstract ideas increase text complexity because the presence of such words
many cases the references of the pronouns cross sentence boundaries, if the
the set of possible corrections will be referred to as the confusion set henceforth. Consider a confusion set C = {c1, c2, · · · , cn} from which a particular word ci needs to be placed to make the sentence correct. X and Y are the left and right contexts of C, with count(X ci Y) = count(X cj Y) + θ. The complexity of correction increases as the value of θ decreases and as the size of the context window (size of X and Y) and the size of the confusion set C increase. For example,
the English sentence “Ram is going C market” is not complex when C= {to,
at}, X={going} and Y={market}, because θ is very large as frequency(going to
with fewer words (especially nouns) on the left side of C and a large number of words on the right side of C. It has been seen that if we have a pronoun in the confusion set, it is difficult to find out the proper ci. We also need to know the previous context
of that sentence in order to find out the pronominal reference that the pro-
noun is pointing to. Presence of multiple errors that have mutual influences
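One illustrative reading of the margin θ in count(X ci Y) = count(X cj Y) + θ is the corpus-frequency gap between the best candidate in the confusion set and its closest competitor; a small gap means a hard correction. A sketch under that assumption:

```python
def theta_margin(pattern_counts):
    """pattern_counts[c] = corpus frequency of the pattern X c Y for
    each candidate c in the confusion set C.  Returns the gap theta
    between the most frequent candidate and the runner-up; correction
    difficulty grows as this gap shrinks."""
    freqs = sorted(pattern_counts.values(), reverse=True)
    return freqs[0] - freqs[1]
```

For “Ram is going C market” with C = {to, at}, the count of “going to market” dwarfs that of “going at market”, so θ is large and the correction is easy.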
Other factors also can increase the complexity of the sentences like sentence length,
F = {fi, fi+1, · · · , fi+n, fj, fj+1, · · · , fj+m}, where the fi are the features responsible for the readability of text and the fj features are responsible for the severity of errors in text.
Table 5.5 shows features used for grammar correction complexity estimation.
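A few of the surface features in Table 5.5 reduce to simple counts; a rough sketch in which the tokenizer and the punctuation inventory are simplifying assumptions:

```python
import re

def surface_features(sentence):
    """Count a handful of the readability features of Table 5.5;
    word splitting and the punctuation set are simplifications."""
    words = re.findall(r"\w+", sentence)
    return {
        "num_words": len(words),
        "num_punct": len(re.findall(r"[,;:.!?]", sentence)),
        "num_long_words": sum(len(w) >= 7 for w in words),  # 7+ letters
    }
```

Features needing POS tags or corpus frequencies (pronouns, infrequent words) would require a tagger and a unigram table on top of this.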
Using these features we have designed a multiple linear regression model as shown below:

Ω = α0 + αi fi + αi+1 fi+1 + · · · + αi+n fi+n + βj fj + βj+1 fj+1 + · · · + βj+m fj+m

where {α0, αi, αi+1, · · · , αi+n} and {βj, βj+1, · · · , βj+m} are the parameters of the multiple linear regression that need to be learned during the training process and Ω is the complexity score.

² A hyperbole is an exaggeration which is used to put emphasis on the statement, and is usually not intended to be taken literally.
³ An understatement is a phrase that minimizes the content of the message and represents much less than it really is.

Table 5.5: Features for estimation of grammar correction complexity

Readability of Text
Number of words per sentence
Number of punctuation marks per sentence
Number of conjunctions per sentence
Number of discourse markers (example: “like”, “how”, “as”) indicating reason, confirmative or concessive subordination per sentence
Number of words having 7 or more letters per sentence
Number of pronouns per sentence
Number of coordinating conjuncts per sentence
Number of infrequent words (unigram count in corpora less than 50) per sentence

Severity of error
Number of errors per sentence
Number of errors per sentence influencing each other
Length of the confusion set C
Value of θ when count(X ci Y) = count(X cj Y) + θ and 0 ≤ θ ≤ 100
Number of words on the left side of C
Number of words on the right side of C

Table 5.6: Complexity scores at different complexity levels
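The parameter estimation for such a model can be sketched with ordinary least squares; the feature matrix mixes the readability (α) and severity (β) features of Table 5.5, and the tiny dataset used below is purely illustrative:

```python
import numpy as np

def fit_complexity_model(features, scores):
    """Least-squares estimate of [alpha_0, alpha_i, ..., beta_j, ...]
    for Omega = alpha_0 + sum alpha_k f_k + sum beta_k f_k; features
    is an (N, d) matrix of per-sentence feature values."""
    X = np.hstack([np.ones((len(features), 1)), np.asarray(features, float)])
    params, *_ = np.linalg.lstsq(X, np.asarray(scores, float), rcond=None)
    return params

def predict_complexity(params, feature_vector):
    """Apply the fitted model to one sentence's feature vector."""
    return params[0] + float(np.dot(params[1:], feature_vector))
```

With enough annotated sentences, `params` holds the learned intercept followed by one weight per feature.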
To build the training data we collected 1000 different sentences from different domains and synthetically induced errors in those sentences. The resultant erroneous sentences had different levels of error density. Furthermore, we also tried to ensure that the sentences contain the features described in table 5.5. Then we defined the complexity score for four levels as “Very easy”, “Easy”, “Complex”, and “Very Complex”. The sentences were given to experts and two native speakers for correction. We also requested them to enter a complexity score (see table 5.6) according to the difficulty that they faced during correction of those sentences. Then the proposed multiple linear regression model was trained on this training dataset and the values of the parameters {α0, αi, αi+1, · · · , αi+n, βj, βj+1, · · · , βj+m} were estimated. After learning the parameters of the multiple linear regression, we estimated the complexity score of five text domains (business, health, sports, literature and politics) each containing 500 erroneous sentences. We observed that the Relative Error of the multiple linear
RelativeError = (1/|N|) Σ_{i=1}^{|N|} | (ActualScore_i − PredictedScore_i) / ActualScore_i |
where |N| is the number of test sentences; ActualScore_i and PredictedScore_i are the actual complexity score given by the user and the complexity score predicted by the model.
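The relative error computation is a one-liner; a sketch with hypothetical score lists:

```python
def relative_error(actual, predicted):
    """Mean of |ActualScore_i - PredictedScore_i| / ActualScore_i
    over the |N| test sentences."""
    return sum(abs(a - p) / a
               for a, p in zip(actual, predicted)) / len(actual)
```

Note that the measure is undefined when any actual score is zero, which the four-level scale above avoids.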
Feature Selection
While trying to analyze the cause of poor performance, we found that there are
some irrelevant and redundant features. High dimensional features increase the
computational cost and irrelevant features hamper the accurate prediction of complexity. Manually selecting the relevant features from a large number of features is practically not feasible. Correlation analysis was
performed on this training data with two objectives. First, to identify the set of
Secondly, to find out features which are more relevant for a particular complexity
value by looking at the correlation values between the target variables and the features. Features having a low correlation (−0.1 to +0.1) with the complexity score have been removed, on the assumption that those features will not contribute to the model for estimating grammar correction complexity. However, following this relevant feature selection procedure, the relative error of our multiple linear regression model becomes 0.36. The reasons
behind this are inadequate training data and lack of more refined linguistic fea-
tures. Moreover, the multiple linear regression model only provides a complexity
score but is unable to comprehensively explain the factors regarding features that determine the complexity. Hence, the idea of active learning has been employed for bettering our estimate of the grammar correction complexity.
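The correlation-based filter can be sketched as follows; the threshold band (−0.1, +0.1) follows the description above, while the column-per-feature data layout is an assumption:

```python
def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def drop_low_correlation(feature_columns, target, band=(-0.1, 0.1)):
    """Keep only features whose correlation with the complexity score
    lies outside the (-0.1, +0.1) band."""
    return {name: col for name, col in feature_columns.items()
            if not (band[0] < pearson_r(col, target) < band[1])}
```

Features surviving the filter are the ones retained for the regression model.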
There are active learning frameworks already in place like PROTOS [Bareiss
et al., 1990; Clark, 1987] which has been used in the field of medical diagnostics
to learn interactively from a domain expert to classify events. The system re-
tains the guided learning cases and also the causes of failures and the associated
explanations for those specific cases. We have followed the PROTOS architec-
ture for active learning of grammar correction complexity for better generaliza-
tion because of the need to elicit knowledge from an expert user and to provide
a language specific feature that may benefit from guided explanations from linguists regarding the complexity of a given input text. Initially the example-base contains examples in the form
[< f1 : v1 >, < f2 : v2 >, · · · , < fn : vn >, ci ], where fi is the attribute of the fea-
ture, vi is its value and ci is the complexity score of a sentence involving these
features. For a given English sentence “Ram *go to market”, an example may look
like [<Num_of_words : 4>, <Num_of_preposition : 1>, <Num_of_Error : 1>, <Num_of_infrequent_words : 1>, 10]. Here * indicates the erroneous word and the
number 10 indicates the complexity of the sentence. At the time of training of the system, candidate corrections are presented to the user in a multiple choice question (MCQ) format. Then the user provides his correction and the complexity value of the sentence consulting table 5.6. Depending on the extracted feature
values from the sentence and using the k-NN algorithm this system estimates the
complexity score of the same sentence. Given this setting, the following situations
are possible.
Situation 1: User’s selected option is correct and user’s complexity score and
Situation 2: User’s selected option is incorrect and user’s complexity score and
If the user’s selected option is incorrect, the complexity score provided by the user is very low, and the score does not match the system generated score, then the system will not ask the user to supply an explanation regarding the complexity of the given input sentence. In such a situation, it is assumed that the user is not confident enough to guide the system to make better inferences as a part of his interaction. Other
than the above mentioned situation, whenever the complexity score provided by the user and the one estimated by the system differ, the interaction based active learning procedure starts. In this case, the system provides explanations of its
decision in the form of common features between the given input sentence and
the nearest example from the example-base selected using the k-NN algorithm. It
also provides the extra features in the input sentence not present in the nearest matched example, or vice versa. The user then selects or adds the features that contribute
to the complexity of the given sentence. After receiving the user’s feedback, the
system generates a new example with the selected and the new features. The new
example is inserted into the example-base if it is not present there. The system
also remembers a link between the nearest example provided by the k-NN and
the new example generated depending on the user’s feedback, so that whenever
in the future this nearest example is selected, then the system will automatically
map it to the newly generated example. The proposed active learning procedure
is shown in Algorithm 1.
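The k-NN step of this loop can be sketched as below; the feature dictionaries mirror the example format given earlier, while the Euclidean distance and score averaging are simplifying assumptions (the actual system also stores explanations and links between examples):

```python
def knn_complexity(example_base, query, k=3):
    """example_base: list of (feature_dict, complexity_score) pairs;
    query: feature_dict of the new sentence.  Returns the mean score
    of the k nearest examples under Euclidean distance over the union
    of feature names (missing features count as 0)."""
    def dist(a, b):
        keys = set(a) | set(b)
        return sum((a.get(f, 0) - b.get(f, 0)) ** 2 for f in keys) ** 0.5
    nearest = sorted(example_base, key=lambda ex: dist(ex[0], query))[:k]
    return sum(score for _, score in nearest) / len(nearest)
```

User feedback then adds new (features, score) pairs to `example_base`, refining subsequent estimates.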
5.5. The user can provide a class name (like “Very Easy”, “Easy”, “Complex” or “Very Complex”) instead of entering a specific complexity score. Then the system’s gen-
Algorithm 1 Algorithm for estimation of grammar correction complexity using
Active Learning
Require: UsrComScore, UsrSel, MCQ, ExampleBase
{MCQ : Multiple Choice Question}
{UsrComScore : Complexity score provided by the users}
{UsrSel : Selection of user for MCQ from available option}
erated complexity score will be mapped to one of these complexity classes. It is seen that the relative error of the proposed active learning model is 0.16, which is much less than that obtained using multiple regression when tested on the same dataset.
Figure 5.6 shows complexity score obtained over 10 trials of each of the five do-
mains. In each trial, we have randomly selected 50 sentences from 500 erroneous
sentences of each domain and computed the average complexity score using our
active learning based model. From the complexity score shown in figure 5.6, it is
apparent that complexity of the literature domain is higher than any other domain
considered here. This is expected, since figurative uses of words are common in
this domain, and nouns are ornamented with adjectives and intensifiers. Rhetorical structures are usually found in sentences of the literature domain. Idiomatic and colloquial patterns are used more than in any other domain considered here. Sometimes phrases of a foreign language are present as a part of the source language in
Use of an informal style in text sometimes involves region dependent slang that is not
due to sociocultural variation of readers increases the complexity of the text. Ap-
format that appears complex to the Bangla second language learners and to the
Bangla native speakers as well. In figure 5.7, we have shown POS tag distribution
of five domains (business, health, sports, literature and politics). It is apparent that
are appearing with higher frequency in literature domain than any other domain
Table 5.7: Correlation of complexity score with grammar checkers accuracy
represent that the sentence is a complex sentence according to the syntactic struc-
or clause beyond the boundary of that particular sentence. Figure 5.8 shows the
ature and politics). In this figure we can see that infrequent words occur more in the literature domain than in any of the other domains considered.
Figure 5.9 shows the complexity scores obtained from 500 erroneous sentences
of each domain and the respective accuracies obtained by our NLG based grammar
checker.
The Pearson’s correlation [Mangal, 2012] coefficient (r) of the complexity score with the accuracy of the NLG based grammar checker is shown in Table 5.7, which shows a strong negative correlation for the two systems. Thus both classifiers have low accuracies when the complexity is
high, and vice versa. This strengthens the case for the robustness of the proposed
complexity measure.
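The correlation check behind Table 5.7 is straightforward to reproduce; the per-domain numbers below are invented for illustration and are not the thesis's measurements:

```python
def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical complexity scores and accuracies for five domains;
# accuracy falling as complexity rises gives a strongly negative r.
complexity = [3.1, 4.5, 6.8, 9.2, 5.0]
accuracy = [0.85, 0.78, 0.60, 0.42, 0.74]
r = pearson_r(complexity, accuracy)
```

A value of r near −1 is what "low accuracies when the complexity is high, and vice versa" amounts to numerically.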
Figure 5.5: Screenshot of active learning framework for estimation of text com-
plexity. The explanation of the feature names are available at http:
//nlp.cdackolkata.in/testComplexity/FeatDtl.spy
Figure 5.6: Complexity values across different datasets
Figure 5.8: Frequency of word distribution across different domains.
Figure 5.9: Complexity measure and Precision score obtained by NLG based gram-
mar checker and Naïve Bayes classifier systems.
CHAPTER 6
As a part of the conclusion, this final chapter summarises our contributions and the scope of future work. The aim of the thesis was to present a technique to detect and correct grammatical errors in a morphologically rich and free word order language like Bangla, despite the unavailability of a large sized error corpus, robust parsers, insufficient linguistic rules and the lack of a standard evaluation metric for grammatical error detection and
has been done using a combination of a confidence score estimator and a mal-rule
filter to introduce errors into a corpus of correct text. These synthetic corpora
have been utilized during evaluation. A similar approach can also be adopted
for generation of synthetic error corpus in other Indian languages where such
resources are not available as of now. A novel NLG based approach has been
The NLG based approach has been used instead of NLU based approach to avoid
the complexity and ambiguity of the grammar for parsing and also to circumvent
the need for modeling ungrammatical sentences. The proposed approach not only
corrects the mistakes committed by the users but also provides relevant examples
for supporting its correction. It also estimates the complexity of the grammar
correction task so that the user can be informed about the system’s confidence.
An active learning based complexity estimation has been used to estimate the
complexity of grammar correction. The NLG based approach reported here can
scale to calculate the acceptability score depending upon the judgment of the
human evaluators having three options viz. a) not acceptable b) acceptable with
of test data and to find out the correlation between complexity value and accuracy
of the system tested on that data. To provide better generalization and explanations
6.1 Contributions
• Taxonomy of Bangla grammatical errors.
• Other resources like echo words and hyphenated words, also collected automatically at the time of synthetic error corpus creation, that may help in other research on Bangla language analysis.
Bangla second language learners' data is required to find the patterns of their errors in order to improve the performance of the system. Instead of the bigram
language model, a weighted higher order n-gram based linear learning model with
of our NLG based system relies on some basic NLP components like POS tagger,
the search space and to improve the processing speed of the system. As Bangla is
a free word order language [Bhattacharya et al., 2005; Dandapat et al., 2004], the CFG based parsing framework has limitations [Shieber, 1985; Begum et al., 2008; Bharati et al., 2010] for analysing Bangla sentences. Hence, a dependency based parsing
[Nivre, 2005; Popel et al., 2011; Nivre, 2008; Zhang and Nivre, 2011; Chen et al.,
2012, 2010, 2011] framework can be used with our proposed NLG based system for
better analysis of input sentences, especially for checking subject-verb agreement.
Moreover, a principled approach needs to be devised for sampling the auto-generated
erated error corpus in the boundary cases and also to ensure that automatically
generated error sentences will mimic the naturally occurring learners’ errors. A
statistical classifier can make use of active learning to bootstrap the corpus creation
process. The parameters in our regression model need to be examined more closely
to have insights into which features are more central in determining complexity.
At a later stage, we may also need to study interactions between the features more
closely.
APPENDIX A
domain
REFERENCES
Agirre, E., K. Gojenola, K. Sarasola, and A. Voutilainen, Towards a Single Pro-
posal in Spelling Correction. In Proceedings of the 36th Annual Meeting of the Asso-
ciation for Computational Linguistics and 17th International Conference on Computa-
tional Linguistics, volume 1 of ACL ’98. Association for Computational Linguis-
tics, Stroudsburg, PA, USA, 1998. URL http://dx.doi.org/10.3115/980845.
980850.
Alfred, W., The Elements of English Grammar, 2nd edition. Pitt Press, Cambridge, 1894.
Arppe, A., Developing a grammar checker for Swedish. In Proceedings of the Twelfth Nordic Conference in Computational Linguistics. 2000.
Banko, M. and E. Brill, Scaling to very large corpora for natural language disambiguation. In ACL. 2001.
Bartha, C., T. Spiegelhauer, R. Dormeyer, and I. Fischer (2006). Word order and
discontinuities in dependency grammar. Acta Cybern., 17(3), 617–632. URL http:
//dblp.uni-trier.de/db/journals/actaC/actaC17.html#BarthaSDF06.
Basili, R. and F. M. Zanzotto (2002). Parsing Engineering and Empirical Robust-
ness. Natural Language Engineering, 8(2-3), 97–120.
Birn, J., Detecting grammar errors with Lingsoft's Swedish grammar checker. In Proceedings of the Twelfth Nordic Conference in Computational Linguistics. 2000.
Bolioli, Dini, and Malnati, JDII: Parsing Italian with a robust constraint grammar. In Proceedings of COLING. 1992.
Bradac, J. J., R. A. Davies, and J. A. Courtright (1977). The Role of Prior Mes-
sage Context in Evaluative Judgments of High- and Low-Diversity Messages.
Language and Speech, 20(4), 295–307.
Brill, E. and R. C. Moore, An Improved Error Model for Noisy Channel Spelling
Correction. In Proceedings of the 38th Annual Meeting on Association for Computa-
tional Linguistics, ACL ’00. Association for Computational Linguistics, Strouds-
burg, PA, USA, 2000. URL http://dx.doi.org/10.3115/1075218.1075255.
Brockett, C., W. B. Dolan, and M. Gamon, Correcting ESL errors using phrasal
SMT techniques. In Proceedings of the 21st International Conference on Computa-
tional Linguistics and the 44th annual meeting of the Association for Computational
Linguistics, ACL-44. Association for Computational Linguistics, Stroudsburg,
PA, USA, 2006. URL http://dx.doi.org/10.3115/1220175.1220207.
Campbell, C. and Y. Ying (2011). Learning with support vector machines. Synthesis
Lectures on Artificial Intelligence and Machine Learning, 5, 1–95.
Catt, M. and G. Hirst (1990). An intelligent cali system for grammatical error
diagnosis. Computer Assisted Language Learning, 3, 3–26.
Chatterjee, S. K., The Origin and Development of the Bengali Language. Rupa co.,
New Delhi, 1926.
Chaudhuri, B. B., Towards Indian Language Spell-checker Design. In Proceedings of
the Language Engineering Conference, LEC ’02. IEEE Computer Society, Washing-
ton, DC, USA, 2002. ISBN 0-7695-1885-0. URL http://dl.acm.org/citation.
cfm?id=788016.788703.
Chen, W., J. Kazama, Y. Tsuruoka, and K. Torisawa, Improving Graph-based
Dependency Parsing with Decision History. In COLING (Posters). 2010.
Chen, W., J. Kazama, M. Zhang, Y. Tsuruoka, Y. Zhang, Y. Wang, K. Torisawa,
and H. Li, SMT Helps Bitext Dependency Parsing. In EMNLP. 2011.
Chen, W., J. Kazama, M. Zhang, Y. Tsuruoka, Y. Zhang, Y. Wang, K. Torisawa,
and H. Li (2012). Bitext Dependency Parsing With Auto-Generated Bilingual
Treebank. IEEE Transactions on Audio, Speech & Language Processing, 20(5), 1461–
1472.
Chodorow, M., M. Dickinson, R. Israel, and J. R. Tetreault, Problems in Evaluating
Grammatical Error Detection Systems. In COLING. 2012.
Chodorow, M. and C. Leacock, An unsupervised method for detecting grammat-
ical errors. In Proceedings of the 1st North American chapter of the Association for
Computational Linguistics conference (NAACL 2000). San Francisco,CA, 2000.
Choudhury, M., M. Thomas, A. Mukherjee, A. Basu, and N. Ganguly, How
Difficult is it to Develop a Perfect Spell-checker? A Cross-Linguistic Analysis
through Complex Network Approach. In Proceedings of the Second Workshop on
TextGraphs: Graph-Based Algorithms for Natural Language Processing. Association
for Computational Linguistics, Rochester, NY, USA, 2007. URL http://aclweb.
org/anthology//W/W07/W07-0212.pdf.
Clark, P. (1987). PROTOS - A Rational Reconstruction. Technical report, Turing
Institute, Glasgow.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and
Psychological Measurement, 20(1), 37–46. URL http://epm.sagepub.com/cgi/
doi/10.1177/001316446002000104.
Collins-Thompson, K., Enriching Information Retrieval with Reading Level Pre-
diction. In Proceedings of SIGIR 2011 Workshop on Enriching Information Retrieval.
Beijing, China, 2011.
Collins-Thompson, K., P. N. Bennett, R. W. White, S. de la Chica, and D. Sontag,
Personalizing Web Search Results by Reading Level. In Proceedings of the 20th
ACM international conference on Information and knowledge management, CIKM
’11. ACM, New York, NY, USA, 2011. ISBN 978-1-4503-0717-8. URL http:
//doi.acm.org/10.1145/2063576.2063639.
Collins-Thompson, K. and J. Callan (2005). Predicting Reading Difficulty with
Statistical Language Models. Journal of the American Society for Information Science
and Technology, 56(13), 1448–1462. ISSN 1532-2882. URL http://dx.doi.org/
10.1002/asi.20243.
Collins-Thompson, K. and J. P. Callan, A Language Modeling Approach to Pre-
dicting Reading Difficulty. In Proceedings of HLT-NAACL. 2004.
Covington, M. A. (1990). Parsing discontinuous constituents in dependency
grammar. Comput. Linguist., 16(4), 234–236. ISSN 0891-2017. URL http:
//dl.acm.org/citation.cfm?id=124992.124997.
Cutting, D., J. Kupiec, J. Pedersen, and P. Sibun, A Practical Part-of-Speech
Tagger. In Proceedings of the third conference on Applied natural language processing,
ANLC ’92. Association for Computational Linguistics, Stroudsburg, PA, USA,
1992. URL http://dx.doi.org/10.3115/974499.974523.
Dale, R., Helping People Write: Grammar Checking and Beyond. In Tutorial in 9th
International Conference of Natural Language Processing. AUKBC, Chennai, India,
2011.
Dale, R., C. Mellish, and M. Zock, Current Research in Natural Language Generation.
Academic Press, London, 1990.
Dale, R., D. Scott, and B. D. Eugenio (1998). Introduction to the Special Issue on
Natural Language Generation. Computational Linguistics, 24(3), 346–353. ISSN
0891-2017. URL http://dl.acm.org/citation.cfm?id=972749.972751.
Daller, M., Guiraud’s index of lexical richness. In British Association of Applied
Linguistics. 2010.
Dalrymple, M., Lexical Functional Grammar. Syntax and Semantics, volume 34.
Academic Press, New York, 2001.
Damerau, F. J. (1964). A Technique for Computer Detection and Correction of
Spelling Errors. Communications of the ACM, 7(3), 171–176. ISSN 0001-0782. URL
http://doi.acm.org/10.1145/363958.363994.
Dandapat, S. and S. Sarkar, Part of Speech Tagging for Bengali with Hidden
Markov Model. In Proceedings of the NLPAI Machine Learning Competition. 2006.
URL http://ltrc.iiit.ac.in/nlpai_contest06/papers/mla.pdf.
Dandapat, S., S. Sarkar, and A. Basu, A Hybrid Model for Part-of-Speech Tag-
ging and Its Application to Bengali. In Proceedings of International Conference on
Computational Intelligence. 2004.
Das, M., S. Borgohain, J. Gogoi, and S. B. Nair, Design and Implementation of a
Spell Checker for Assamese. In Language Engineering Conference. IEEE Computer
Society, 2002. ISBN 0-7695-1885-0. URL http://dblp.uni-trier.de/db/conf/
lec/lec2002.html#DasBGN02.
Dasgupta, S., C. Papadimitriou, and U. Vazirani, Algorithms. McGraw-Hill, 2008.
URL http://www.cs.berkeley.edu/~vazirani/algorithms.html.
Dash, N. S. (2013). Part-of-Speech (POS) Tagging of Bengali Written Text Cor-
pus. Bhasha Bijnan o Prayukti, 1(1). URL http://www.academia.edu/3931246/
Part-of-Speech_POS_Tagging_of_Bengali_Written_Text_Corpus.
Dave, S., J. Parikh, and P. Bhattacharyya (2001). Interlingua-Based English-Hindi
Machine Translation and Language Divergence. Machine Translation, 16(4), 251–
304.
Dickinson, M., R. Israel, and S.-H. Lee, Developing Methodology for Korean
Particle Error Detection. In Proceedings of the 6th Workshop on Innovative Use of
NLP for Building Educational Applications, IUNLPBEA ’11. Association for Com-
putational Linguistics, Stroudsburg, PA, USA, 2011. ISBN 9781937284039. URL
http://dl.acm.org/citation.cfm?id=2043132.2043142.
Dorr, B., L. Pearl, R. Hwa, and N. Habash, DUSTer: A Method for Unraveling
Cross-Language Divergences for Statistical Word-level Alignment. In Proceedings
of AMTA-02. Springer, 2002.
Douglas, S. and R. Dale, Towards Robust PATR. In Proceedings of the 14th conference
on Computational linguistics, COLING ’92. Association for Computational
Linguistics, Stroudsburg, PA, USA, 1992. URL http://dx.doi.org/10.3115/
992133.992143.
Drucker, P., Men, Ideas, and Politics. Harvard Business School Publishing, 2010.
Ejerhed, E., Finite state segmentation of discourse into clauses. Cambridge University
Press, 1999.
Fliedner, G., A System for Checking NP Agreement in German Texts. In Proceedings
of the Student Research Workshop at the 40th Annual Meeting of the Association for
Computational Linguistics (ACL). 2002.
Foster, J. (2005). Good Reasons for Noting Bad Grammar: Empirical Investigations into
the Parsing of Ungrammatical Written English. Ph.D. thesis, University of Dublin,
Trinity College, Dublin, Ireland.
Frank, A., T. H. King, J. Kuhn, and J. Maxwell, Optimality Theory Style
Constraint Ranking in Large-scale LFG Grammars. In Proceedings of LFG-98.
Brisbane, Australia, 1998.
Freund, Y. and R. E. Schapire (1999). Large Margin Classification Using the
Perceptron Algorithm. Machine Learning, 37(3), 277–296.
Ghosh, A., A. Das, P. Bhaskar, and S. Bandyopadhyay, Dependency Parser for
Bengali: the JU System. In NLP Tool Contest at International Conference on Natural
Language Processing (ICON 2009). 2009.
Golding, A. R., A Bayesian Hybrid Method for Context-Sensitive Spelling Cor-
rection. In Proceedings of the Third Workshop on Very Large Corpora. 1995. URL
http://arxiv.org/pdf/cmp-lg/9606001.pdf.
Han, N.-R., J. R. Tetreault, S.-H. Lee, and J.-Y. Ha, Using an Error-Annotated
Learner Corpus to Develop an ESL/EFL Error Correction System. In LREC. 2010.
Hein, S., A Chart-based Framework for Grammar Checking: Initial Studies. In 11th
Nordic Conference in Computational Linguistics. 1998.
Hermet, M. and A. Désilets, Using first and second language models to correct
preposition errors in second language authoring. In Proceedings of the Fourth
Workshop on Innovative Use of NLP for Building Educational Applications, EdApp-
sNLP ’09. Association for Computational Linguistics, Stroudsburg, PA, USA,
2009. ISBN 978-1-932432-37-4. URL http://dl.acm.org/citation.cfm?id=
1609843.1609853.
Hermet, M., A. Désilets, and S. Szpakowicz, Using the Web as a Linguistic
Resource to Automatically Correct Lexico-syntactic Errors. In Proceedings of the
Sixth International Conference on Language Resources and Evaluation. 2008.
Hunt, K. W., Grammatical Structures Written at Three Grade Levels. NCTE Research
report, USA, 1965.
Jensen, K., G. E. Heidorn, L. A. Miller, and Y. Ravin (1983). Parse Fitting and
Prose Fixing: Getting a Hold on Ill-formedness. American Journal of Computational
Linguistics, 9(3–4).
Joshi, A. K., L. S. Levy, and M. Takahashi (1975). Tree Adjunct Grammars. Journal
of Computer and System Sciences, 10(1), 136–163.
Khader, R. A., T. H. King, and M. Butt, Deep CALL Grammars: The LFG-OT
Experiment. 2004.
Knuth, D. E., The Art of Computer Programming, volume 3. Addison Wesley Longman
Publishing Co., Inc., Redwood City, CA, USA, 2nd edition, 1998. ISBN
0-201-89685-0.
Krishnakumaran, S. and X. Zhu, Hunting Elusive Metaphors Using Lexical Re-
sources. In Proceedings of the Workshop on Computational Approaches to Figura-
tive Language, FigLanguages ’07. Association for Computational Linguistics,
Stroudsburg, PA, USA, 2007. URL http://dl.acm.org/citation.cfm?id=
1611528.1611531.
Lee, J. and S. Seneff, Correcting Misuse of Verb Forms. In Proceedings of ACL-08:
HLT. Association for Computational Linguistics, 2008, 174–182. URL http://www.
aclweb.org/anthology/P/P08/P08-1021.
Lee, S., J. Lee, H. Noh, K. Lee, and G. G. Lee (2011). Grammatical Error Simulation
for Computer-Assisted Language Learning. Knowledge-Based Systems, 24(6),
868–876.
Lewis, M. P. (ed.) (2009). Ethnologue: Languages of the World. SIL International,
Dallas, TX. URL http://www.ethnologue.com/.
Littlestone, N. (1988). Learning Quickly When Irrelevant Attributes Abound: A
New Linear-threshold Algorithm. Machine Learning, 2(4), 285–318.
Liu, C.-H., C.-H. Wu, and M. Harris, Word Order Correction for Language Transfer
Using Relative Position Language Modeling. In 6th International Symposium on
Chinese Spoken Language Processing. 2008.
Lopresti, D. and J. Zhou (1997). Using Consensus Sequence Voting to Correct OCR
Errors. Computer Vision and Image Understanding, 67(1), 39–47. ISSN 1077-3142.
URL http://dx.doi.org/10.1006/cviu.1996.0502.
Lowth, R., A Short Introduction to English Grammar: With Critical Notes. Millar and
Dodsley, London, 1762.
Manning, C. D., Part-of-Speech Tagging from 97% to 100%: Is It Time for Some
Linguistics? In Proceedings of the 12th international conference on Computational
linguistics and intelligent text processing - Volume Part I, CICLing’11. Springer-
Verlag, Berlin, Heidelberg, 2011. ISBN 978-3-642-19399-6. URL http://dl.acm.
org/citation.cfm?id=1964799.1964816.
Mays, E., F. J. Damerau, and R. L. Mercer (1991). Context based Spelling Cor-
rection. Information Processing and Management, 27(5), 517–522. ISSN 0306-4573.
URL http://dx.doi.org/10.1016/0306-4573(91)90066-U.
Mellish, C. S., Some chart-based techniques for parsing ill-formed input. In Pro-
ceedings of the 27th annual meeting on Association for Computational Linguistics,
ACL ’89. Association for Computational Linguistics, Stroudsburg, PA, USA,
1989. URL http://dx.doi.org/10.3115/981623.981636.
Michaud, L. N. and K. F. McCoy, An Intelligent Tutoring System for Deaf Learners
of Written English. In Proceedings of the Fourth International ACM SIGCAPH
Conference on Assistive Technologies (ASSETS 2000). SIGCAPH, 2000.
Michaud, L. N. and K. F. McCoy, Error Profiling: Toward a Model of English
Acquisition for Deaf Learners. In Proceedings of the 39th Annual Meeting and the
10th Conference of the European Chapter of the Association for Computational Linguistics
(EACL). 2001.
Mitchell, T., Machine Learning. McGraw-Hill, New York, 1997.
Nagata, R., F. Masui, A. Kawai, and N. Isu, Recognizing Article Errors Based on
the Three Head Words. In CELDA. 2004.
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM
Review, 45, 167–256.
Macdonald, N. H., L. T. Frase, P. S. Gingrich, and S. A. Keenan (1982). The
Writer’s Workbench: Computer Aids for Text Analysis. IEEE Transactions on
Communications, 30(1), 105–110.
Nivre, J. (2005). Dependency Grammar and Dependency Parsing. Technical report,
Växjö University: School of Mathematics and Systems Engineering.
Nivre, J. (2008). Algorithms for Deterministic Incremental Dependency Parsing.
Computational Linguistics, 34(4), 513–553. ISSN 0891-2017. URL http://dx.doi.
org/10.1162/coli.07-056-R1-07-027.
Oyama, H. and Y. Matsumoto, A Machine Learning Approach for Error
Identification for Learners of Japanese. In The Society for Teaching Japanese as a
Foreign Language Spring Meeting. 2008.
Papineni, K., S. Roukos, T. Ward, and W.-J. Zhu, Bleu: a method for automatic
evaluation of machine translation. In Proceedings of the 40th Annual Meeting on
Association for Computational Linguistics, ACL ’02. Association for Computational
Linguistics, Stroudsburg, PA, USA, 2002. URL http://dx.doi.org/10.3115/
1073083.1073135.
Park, J. C., M. Palmer, and G. Washburn, An English Grammar Checker as a
Writing Aid for Students of English as a Second Language. 1997.
Park, Y. A. and R. Levy, Automated Whole Sentence Grammar Correction Us-
ing a Noisy Channel Model. In Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies - Volume
1, HLT ’11. Association for Computational Linguistics, Stroudsburg, PA, USA,
2011. ISBN 978-1-932432-87-9. URL http://dl.acm.org/citation.cfm?id=
2002472.2002590.
Pazzani, M. J., Conceptual Analysis of Garden-Path Sentences. In Proceedings of the
10th international conference on Computational linguistics, COLING ’84. Association
for Computational Linguistics, Stroudsburg, PA, USA, 1984. URL http://dx.
doi.org/10.3115/980431.980595.
Rich, E. and K. Knight, Artificial Intelligence, 2nd edition. McGraw-Hill, New York,
1991.
Robb, T., S. Ross, and I. Shortreed (1986). Salience of Feedback on Error and Its
Effect on EFL Writing Quality. TESOL Quarterly, 20, 83–93.
Rozovskaya, A. and D. Roth, Algorithm Selection and Model Adaptation for ESL
Correction Tasks. In ACL. 2011.
Sachs, R. and C. Polio (2007). Learners’ Uses of Two Types of Written Feedback
on a L2 Writing Revision Task. Studies in Second Language Acquisition, 29, 67–100.
Sankaran, B., K. Bali, M. Choudhury, T. Bhattacharya, P. Bhattacharyya, G. N.
Jha, S. Rajendran, K. Saravanan, L. Sobha, and K. V. Subbarao, A Common
Parts-of-Speech Tagset Framework for Indian Languages. In LREC. European
Language Resources Association, 2008. URL http://dblp.uni-trier.de/db/
conf/lrec/lrec2008.html#SankaranBCBBJRSSS08.
Scheler, G., With Raised Eyebrows or the Eyebrows Raised? A Neural Network
Approach to Grammar Checking for Definiteness. In Proceedings of NeMLaP-2.
Bilkent University, Ankara, 1996.
Schmidt-Wigger, A., Grammar and Style Checking for German. In Proceedings
of the Second International Workshop on Controlled Language Applications.
Pittsburgh, PA, 1998.
Sinha, M., S. Sharma, T. Dasgupta, and A. Basu, New Readability Measures for
Bangla and Hindi Texts. In COLING (Posters). 2012.
Tetreault, J. R., J. Foster, and M. Chodorow, Using Parse Features for Preposition
Selection and Error Detection. In ACL (Short Papers). 2010.
Toutanova, K. and R. C. Moore, Pronunciation Modeling for Improved Spelling
Correction. In Proceedings of the 40th Annual Meeting on Association for Computa-
tional Linguistics, ACL ’02. Association for Computational Linguistics, Strouds-
burg, PA, USA, 2002. URL http://dx.doi.org/10.3115/1073083.1073109.
Uria, L., B. Arrieta, A. D. de Ilarraza, M. Maritxalar, and M. Oronoz, Determiner
Errors in Basque: Analysis and Automatic Detection. In Procesamiento del Lenguaje
Natural. 2009.
Uszkoreit, H., Grammar Checking: Theory, Practice and Lessons Learned in LATESLAV.
Prague, 1996.
Uzzaman, N., A Bangla Phonetic Encoding for Better Spelling Suggestion. In
Proceedings of 7th International Conference on Computer and Information Technology.
2004.
UzZaman, N. (2005). Phonetic Encoding for Bangla and its Application to
Spelling checker, Transliteration, Cross Language Information Retrieval and
Name Searching. BRAC University. URL http://citeseerx.ist.psu.edu/
viewdoc/download?doi=10.1.1.173.1756&rep=rep1&type=pdf. Undergradu-
ate Thesis.
van Berkel, B. and K. De Smedt, Triphone Analysis: A Combined Method for
the Correction of Orthographical and Typographical Errors. In Proceedings of the
Second Conference on Applied Natural Language Processing, ANLC ’88. Association
for Computational Linguistics, Stroudsburg, PA, USA, 1988. URL http://dx.
doi.org/10.3115/974235.974250.
Van Gael, J., A. Vlachos, and Z. Ghahramani, The Infinite HMM for Unsupervised
PoS Tagging. In Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing: Volume 2 - Volume 2, EMNLP ’09. Association for Compu-
tational Linguistics, Stroudsburg, PA, USA, 2009. ISBN 978-1-932432-62-6. URL
http://dl.acm.org/citation.cfm?id=1699571.1699601.
Viterbi, A. (1967). Error Bounds for Convolutional Codes and an Asymptotically
Optimum Decoding Algorithm. IEEE Transactions on Information Theory, 13(2),
260–269. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?
arnumber=1054010.
Vitevitch, M. S., Phonological neighbors in a small world: What can graph theory
tell us about word learning? In Spring 2005 Talk Series on Networks and Complex
Systems. Indiana University, 2005.
Vogel, C. and R. Cooper, Robust chart parsing with mildly inconsistent feature
structures. In Nonclassical Feature Systems, volume 10. 1995.
Wagner, J., J. Foster, and J. van Genabith, A Comparative Evaluation of Deep and
Shallow Approaches to the Automatic Detection of Common Grammatical Er-
rors. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Lan-
guage Processing and Computational Natural Language Learning (EMNLP-CoNLL).
Association for Computational Linguistics, Prague, Czech Republic, 2007.
Wagner, R. A. and M. J. Fischer (1974). The String-to-String Correction Problem.
J. ACM, 21(1), 168–173. ISSN 0004-5411. URL http://doi.acm.org/10.1145/
321796.321811.
Whitelaw, C., B. Hutchinson, G. Y. Chung, and G. Ellis, Using the Web for Lan-
guage Independent Spellchecking and Autocorrection. In Proceedings of the 2009
Conference on Empirical Methods in Natural Language Processing, EMNLP ’09. As-
sociation for Computational Linguistics, Stroudsburg, PA, USA, 2009. ISBN 978-
1-932432-62-6. URL http://dl.acm.org/citation.cfm?id=1699571.1699629.
Yi, X., J. Gao, and W. B. Dolan, A Web-based English Proofing System for English
as a Second Language Users. In Proceedings of the International Joint Conference on
Natural Language Processing (IJCNLP). 2008.
Yule, G. U., The Statistical Study of Literary Vocabulary. Cambridge University Press,
1944.
LIST OF PAPERS BASED ON THESIS