
Natural Language Generation for Bangla Grammatical

Error Detection and Correction

A THESIS

submitted by

BIBEKANANDA KUNDU

for the award of the degree

of

MASTER OF SCIENCE
(by Research)

DEPARTMENT OF COMPUTER SCIENCE AND


ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY, MADRAS.
April 2014
THESIS CERTIFICATE

This is to certify that the thesis titled Natural Language Generation for Bangla

Grammatical Error Detection and Correction, submitted by Bibekananda Kundu,

to the Indian Institute of Technology, Madras, for the award of the degree of Master

of Science (by Research), is a bona fide record of the research work done by him

under our supervision. The contents of this thesis, in full or in parts, have not

been submitted to any other Institute or University for the award of any degree or

diploma.

Dr. Sutanu Chakraborti Mr. Sanjay Kumar Choudhury


Research Guide Research Co-Guide
Assistant Professor Principal Engineer
Dept. of Computer Science and Engineering Language Technology
IIT-Madras, 600036 CDAC Kolkata, 700091
Place: Chennai Place: Kolkata

Date: Date:
ACKNOWLEDGEMENTS

I would like to take this opportunity to thank the many people who deserve to be mentioned on this page but are not named here individually.

Their various sobering and heartening contributions are unforgettable. I wish to

express my deepest gratitude to my supervisors Dr. Sutanu Chakraborti and Mr.

Sanjay Kumar Choudhury for introducing me to this research topic and provid-

ing their valuable guidance, inspiring discussions and unfailing encouragement

throughout the course of the work. I have enjoyed considerable freedom under

their guidance. They alerted me whenever I was on the wrong track during my

research. They shaped how I think, write and do research at a very deep level

and also taught me how to see where an approach will fail even before I try it. I

have learned from them how to think like a linguist and a computer scientist at the

same time. Their enthusiasm made my research a lot of fun. I marvel at their patience with my writing, struggling to understand every bit of it, always

raising questions and providing new exciting ideas. Their advice, encouragement,

constructive criticism helped me a lot in this research. They always had faith in

my capabilities and made sure that I expressed myself more clearly. They pushed me to explore newer avenues of research. Whenever I felt that I could

not get any further with grammar checking they came up with new ideas which

motivated me to work hard. I could not imagine having better advisers and men-

tors for my MS research. I consider myself fortunate to be their advisee and am honoured to have been their student. Work on this dissertation would not

have been possible without encouragement, contributions and constant support

from them. During my research work I was part of a friendly and stimulating research group that gave me much pleasure. Many thanks to all my colleagues of

Language Technology section of CDAC Kolkata and my friends of IIT Chennai for

creating an inspiring research environment with interesting projects, seminars and

conferences. I especially want to mention Sudipta Debnath for being the helping

hand next door whenever I needed one. His inspiration and useful suggestions moti-

vated me in this work. Many of the ideas embodied in this study were crystallized

with the help of his support. The programming libraries and subroutines he provided were the pillars on which the prototype of the system was built. I don't have

enough words to express my feelings, respect and thanks to him. I would also

like to thank Abhijit Chatterjee, Debarun Kar, K.V.S Dileep and Mridusmita Mitra

for helping me a lot, in particular by collecting valuable resources for my study and carefully proofreading my write-ups. They were the first reviewers of some of

the chapters of this thesis. They noticed many typos, errors, and strange sentence

structures. Their work was much more valuable than that of any grammar checker!

Any errors that still remain in this dissertation are my sole responsibility, but I can

assure you that there are far fewer now than there used to be. I would also

like to thank Pampa Bhattarchayya and Mridusmita Mitra for always being helpful

with necessary linguistic information whenever I needed it. I would like to

thank Subash Chandra, the first user of my system as a second language learner of Bangla, Hindi being his mother tongue. He provided a good deal of non-native data which helped me a lot in this research work. I would also like to

thank Pradeep Raychoudhury for helping me to prepare the materials for presen-

tations and posters. Some persons deserve special mention for discussions that

contributed quite directly to this research: Barnali Pal, Sita Rajmohan, Tulika Basu,

Joyanta Basu and Rajib Roy. I am very appreciative of my classmates and friends

at IIT Chennai who participated in this study. These include Debarun, Dileep,

Sourav, Prateek, Smith and many more. They created a nice research environment

where we argued, discussed and nurtured our ideas during my time at IIT, or sometimes over telephone conversations. Without their support, this

study would not have been possible. I am grateful to my organization CDAC

Kolkata for providing me with the necessary leave and infrastructure for continuing my research as an external candidate. I would like to express my heartfelt

gratitude to Executive Director of CDAC Kolkata Col. A. K. Nath and Ex-executive

directors Sri A.B. Saha and Sri R. Rabindra Kumar for their inspiration, motivation

and support for this study. I am very much thankful to all the faculty members,

staff members and research scholars of the Department of Computer Science and

Engineering of IIT Madras for their direct or indirect help in various forms during

my course work and research work. The NLP community has provided excel-

lent feedback and essential criticism on my work through anonymous reviews.

Many useful comments have been provided by conference attendees. I am espe-

cially thankful to Prof. Robert Dale (Macquarie University), Prof. Sudeshna Sarkar

(IIT Kharagpur) and Prof. Pushpak Bhattacharyya (IIT Bombay) for their valuable

suggestions and feedback during my presentation at ICON 2011. I am also thank-

ful to Dr. Michael Gamon (Microsoft Research) and Prof. Kevin Knight (USC,

Information Sciences Institute) for providing invaluable feedback and necessary

references. My deepest gratitude goes to my parents who have encouraged me

to pursue my studies, for being there for me and for always believing in me. My

parents have always encouraged my education right from my childhood, which became an eternal driving force for me to pursue a higher degree. Their

prayers are always a great source of strength for me. Their support has brought

me to where I am now. I cannot find appropriate words to thank my wife Soma for

her steady support, encouragement and love throughout the difficult times in my

career. She must also be thanked for her care throughout my research work.

Many a time she helped me decide the titles of the papers I wrote for

conferences. She carefully read my write-ups and motivated me to think from a reader's point of view. Along with everything else, I am grateful for her constant

support. She did everything she could to make sure I had enough time to finish my

work. I also could never have completed this study without all the encouragement

and support which I have received from my elder brother, parents-in-law and my

sister-in-law. Thank you for always being there. I am indebted to you all a lot

and cannot thank you enough. I owe all of my success to the essential things that

my family has given me over the years. I am dedicating this thesis to my family.

Finally, I thank all my well-wishers who directly or indirectly contributed to the

completion of this thesis.

ABSTRACT

KEYWORDS: Automatic Grammar Correction, Natural Language Generation, Automatic Error Corpora Creation, Active Learning Based Complexity Estimation.

Learning a new language is an integral part of human life. Even after years of

learning, a person remains prone to mistakes. These errors are due to a lack of knowledge of the target language and the influence of previously learnt languages [Leacock et al., 2010]. As a consequence, it has been felt that automatic

detection and correction of grammatical errors will be of immense help as an aid

for language learning.

Automatic detection and correction of grammatical errors in a morphologi-

cally rich and free word order language like Bangla is a non-trivial task. Little

research has been done on detection and correction of grammatical errors in such

languages. For Bangla, this work needs to be done de novo. The problem

is to automatically detect and correct an ungrammatical Bangla sentence having

postpositional and nominal inflectional errors. A methodology needs to be devised

for correcting the mistakes committed by users and also to provide relevant ex-

amples supporting the suggested correction. To have an idea of how strongly

we can rely on such a correction, it will be useful to devise a measure of sentence

complexity with respect to the grammar correction task. If a sentence is complex,

the user should not be overly reliant on the correction suggested by the system.

Conversely, if the complexity measure is low, the user can confidently choose the

suggestion.

A sufficiently large error corpus is essential for training and testing of grammar

correction methodology. Manual collection of huge error corpora is a tedious and

time-consuming task. There is a dearth of error corpora for the Bangla language.

Therefore, a synthetic error corpora creation methodology has been proposed.

Divergence between two languages influences second language learners to commit

grammatical mistakes. It has been widely studied that the divergence between a

pair of languages has a profound effect on various fields of NLP [Dorr et al.,

2002; Dave et al., 2001; Goyal and Sinha, 2009]. The effect of divergence becomes

more pronounced and acute for widely varying language pairs like English and Bangla

[Bhattacharyya et al., 2011; Dave et al., 2001; Goyal and Sinha, 2009]. Bangla is a

morphologically rich language [Bhattacharya et al., 2005; Dandapat et al., 2004] and

has a free word order. Therefore, state-of-the-art Context Free Grammar (CFG) is not applicable here [Begum et al., 2008; Shieber, 1985; Bharati et al., 2010]. In

addition to this, lack of robust parsers, insufficient linguistic rules and dearth of

error annotated parallel corpora make this grammar correction task much more

challenging. To address these issues, a novel approach has been proposed for

automatic detection and correction of Bangla grammatical errors using a Natural

Language Generation (NLG) technique.

Evaluation of grammar correction systems is one of the challenges in this area of

research. Performance of most of the available grammar checkers cannot be com-

pared as different systems address different types of errors. Moreover, testing on a

common dataset is particularly problematic when different grammar checkers are

designed for different languages. To circumvent these problems, a Methodology

for Evaluation of Grammar Assessment (MEGA) combining a Graded Acceptabil-

ity Assessment Metric (GAAM) and a Complexity Measurement Metric (CMM) has

been introduced. Initially, MEGA has been applied on our Natural Language Gen-

eration (NLG) based Bangla grammar checker. Since direct comparison between

available English grammar checker and the NLG based Bangla grammar checker

is not possible, the NLG based system has been compared against a prototype

Bangla grammar checker based on standard Naïve Bayes classification. Results

show that the NLG based approach for Bangla grammatical error detection and

correction system outperforms the Naïve Bayes classifier system.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS i

ABSTRACT v

LIST OF TABLES xii

LIST OF FIGURES xiv

ABBREVIATIONS xv

NOTATION xviii

1 INTRODUCTION 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Divergence Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 LITERATURE SURVEY 10
2.1 Spell Checker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Grammar Checker . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Automatic Grammar Correction Approaches . . . . . . . . . . . . 18
2.3.1 Rule-based Approach . . . . . . . . . . . . . . . . . . . . . 19
2.3.2 Machine Learning Approach . . . . . . . . . . . . . . . . . 27
2.3.3 Statistical Machine Translation Approach . . . . . . . . . . 39
2.4 Comparison between existing approaches . . . . . . . . . . . . . . 41
2.5 Open Problems and Future Directions . . . . . . . . . . . . . . . . 44

3 AUTOMATIC CREATION OF BANGLA ERROR CORPUS 46
3.1 Errors in Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.1.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 Experimental Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.1 Bangla POS Tagger . . . . . . . . . . . . . . . . . . . . . . . 59
3.3.2 Confidence Score and Mal-rule Filters . . . . . . . . . . . . 63
3.4 Result and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 66

4 BANGLA GRAMMATICAL ERROR DETECTION AND CORRECTION 71
4.1 Pruning of the Search Space . . . . . . . . . . . . . . . . . . . . . . 75
4.2 Selection of the Best Correction . . . . . . . . . . . . . . . . . . . . 75

5 EVALUATION 80
5.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.2 Standard Evaluation Metric . . . . . . . . . . . . . . . . . . . . . . 82
5.3 Test Suite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.4 Evaluation Methodology . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4.1 Evaluation using Standard Metrics . . . . . . . . . . . . . . 84
5.4.2 Graded Acceptability Assessment Metric: . . . . . . . . . . 86
5.4.3 Complexity Estimation of Grammar Correction . . . . . . 89

6 CONCLUSIONS AND FUTURE WORK 107


6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

A Examples of some interesting erroneous sentences corrected by the system. 111

B Examples of incorrect prediction by the system. 112

C Examples of sentences having different complexity 113

D Examples of sentences collected from literature domain 114
LIST OF TABLES

1.1 Examples of single preposition “with” having different types of realization in Bangla . . . . . . 4

2.1 Percentage of various types of errors in Bangla . . . . . . . . . . . 13


2.2 Research on grammar checking in different languages . . . . . . . 17
2.3 A brief road map of grammatical error detection and correction
approaches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Syntax based grammatical error detection and correction approaches. 24
2.5 Effectiveness of Individual Features . . . . . . . . . . . . . . . . . 36

3.1 Examples of errors committed by a Bangla Second Language Learner 49


3.2 Examples of Transposition Operation. . . . . . . . . . . . . . . . . 56
3.3 Examples of Deletion Operation. . . . . . . . . . . . . . . . . . . . 58
3.4 Examples of Addition Operation. . . . . . . . . . . . . . . . . . . . 58
3.5 POS tags used in our tagger . . . . . . . . . . . . . . . . . . . . . . 61
3.6 POS tag distribution in our training and test corpus . . . . . . . . 61
3.7 Accuracy of individual POS tag using HMM . . . . . . . . . . . . 62
3.8 Three most common types of errors . . . . . . . . . . . . . . . . . 62
3.9 Experiment with confidence thresholds for generating erroneous
sentences generated by substitution operation . . . . . . . . . . . 66
3.10 Experiment with confidence thresholds for generating erroneous
sentences generated by transposition operation . . . . . . . . . . 67
3.11 Erroneous sentences generated from a single sentence and selected
according to the confidence score. . . . . . . . . . . . . . . . . . . . 69
3.12 Bangla Echo words and Hyphenated words. . . . . . . . . . . . . 70
3.13 Automatically collected collocated and co-occurred word sequences. 70

4.1 Example of Nominal Morphological Synthesis . . . . . . . . . . . 72


4.2 Example of Nominal Morphological Analysis . . . . . . . . . . . . 73

5.1 Evaluation Measure Formulae . . . . . . . . . . . . . . . . . . . . . 82
5.2 True Positive, False Positive, False Negative and True Negative with
respect to grammatical error detection task. . . . . . . . . . . . . . 83
5.3 Performance evaluation of NLG based system on individual errors
as well as combined errors in five text genres. P indicates Precision
and R indicates Recall. . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.4 Grading Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.5 Features for estimation of grammar correction complexity . . . . 96
5.6 Complexity Score in different complexity level . . . . . . . . . . . 97
5.7 Correlation of complexity score with grammar checkers accuracy 103

LIST OF FIGURES

2.1 Error localization by conventional and reverse dictionary [Chaudhuri, 2002, 2001] . . . . . . 14
2.2 The weighted SpellNet for 6 words . . . . . . . . . . . . . . . . . . 15
2.3 Structure of SpellNet for θ=1 . . . . . . . . . . . . . . . . . . . . . 15
2.4 Simplified functional diagram of grammatical error detection and
correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Simplified functional diagram of a rule-based grammar checker . 20
2.6 Syntax tree generated by MS-NLP System . . . . . . . . . . . . . . 23
2.7 Examples of trigram sequences. . . . . . . . . . . . . . . . . . . . . 28
2.8 Basic architecture of post editing after machine translation. . . . . 34

3.1 Proportion of Errors in Native Speakers and Second Language Learners Corpus. . . . . . 48
3.2 Taxonomy of errors found in Bangla text of second language learners. 50
3.3 Bangla Sentence Length Distribution. . . . . . . . . . . . . . . . . 55
3.4 POS tag association matrix. . . . . . . . . . . . . . . . . . . . . . . 59
3.5 Simplified functional diagram of automatic error corpora creation. 68

4.1 Generative model for well-formed and ill-formed sentence detection. . . . . . 74
4.2 Example of Linguistic function . . . . . . . . . . . . . . . . . . . . 75
4.3 Pruned trellis after applying Linguistic Hard Constraints . . . . . 76
4.4 N-gram matching score between ungrammatical and correct sen-
tences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

5.1 Performance of error detection . . . . . . . . . . . . . . . . . . . . 85


5.2 Performance of error correction . . . . . . . . . . . . . . . . . . . . 86
5.3 Grades given by tester-1 and tester-2 in blind testing . . . . . . . . 88
5.4 Agreement between two testers during manual evaluation . . . . 89

5.5 Screenshot of active learning framework for estimation of text com-
plexity. The explanation of the feature names are available at
http://nlp.cdackolkata.in/testComplexity/FeatDtl.spy . . 104
5.6 Complexity values across different datasets . . . . . . . . . . . . . 105
5.7 POS Tag distributions in different domains. . . . . . . . . . . . . . 105
5.8 Frequency of word distribution across different domains. . . . . . 106
5.9 Complexity measure and Precision score obtained by NLG based
grammar checker and Naïve Bayes classifier systems. . . . . . . . 106

ABBREVIATIONS

ALEK Assessing Lexical Knowledge

ALEP Advanced Language Engineering Platform

AMT Amazon Mechanical Turk

APSG Augmented Phrase Structure Grammar

ASL American Sign Language

BLEU BiLingual Evaluation Understudy

CALI Computer Assisted Language Instruction

CALL Computer Assisted Language Learning

CFG Context Free Grammar

CLEC Chinese Learners of English Corpus

CLEF Cross Language Evaluation Forum

CMM Complexity Measurement Metric

CSP Constraint Satisfaction Problem

EP Example Provider

ERG English Resource Grammar

ESL English Second Language

FLAG Flexible Language and Grammar Checking

FN False Negative

FP False Positive

GAAM Graded Acceptability Assessment Metric

GRADES GRAmmar Diagnostic Expert System

HMM Hidden Markov Model

HOO Helping Our Own

HPSG Head Driven Phrase Structure Grammar

ICICLE Interactive Computer Identification and Correction of Language Errors

LFG Lexical Functional Grammar

LM Language Model

MAT Machine Aided Translation

MCQ Multiple Choice Questions

MI Mutual Information

ML Machine Learning

NGC Norwegian Grammar Checker

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OCR Optical Character Recognition

OT Optimality Theory

PCFG Probabilistic Context Free Grammar

PCSP Partial Constraint Satisfaction Problem

POS Parts-of-Speech

SMT Statistical Machine Translation

S-O-V Subject-Object-Verb

SP Suggestion Provider

SST Standard Speaking Test

SVM Support Vector Machine

S-V-O Subject-Verb-Object

TN True Negative

TP True Positive

WER Word Error Rate

NOTATION

σ2 Variance
γ Brevity Penalty
µ Mean
Ω Complexity Score

TRANSLITERATION KEY USED IN THE DISSERTATION

CHAPTER 1

INTRODUCTION

“Grammar has sometimes been described as the Art of speaking and writing correctly.

But people may possess the Art of correctly using their own language without having any

knowledge of grammar. We define it therefore as the Science which treats of words and

their correct use”. – Alfred [1894]

Most people are fluent in speaking a language, but their writing often suffers because of gaps in grammatical knowledge and oversights at the time of writing. From naïve users to professional writers, most people are vulnerable

to the curse of grammatical mistakes [Leacock et al., 2010]. Casual spoken communication thus differs from formal written text. Written language

has become more or less a prerequisite for daily communication. Moreover, written

communication leaves a deep impact on education. In the context of our everyday use of text editing environments, the need for automatic grammatical error detection and correction cannot be overlooked. We can recall Socrates in this

context, “Correct language is the prerequisite for correct living”.

With the advancement of computational algorithms, people's expectations are increasing day by day. Rather than depending only on mechanical assistance,

we are now seeking intellectual support as well. Vast numbers of people throughout the world deal with texts without proper knowledge of the language, and many of them are not native speakers. Most of them use spelling

correction tools when writing documents on a computer. These tools provide a first step towards writing correct text by reducing the need for human intervention. The second

step of high quality text generation is grammar checking. Grammar checking

is essential for several reasons. It improves the quality of text, saves time, and

supports learning of the language. This tool not only helps native speakers but

also helps second language learners to communicate in other languages. As a

whole, the system plays a pivotal role in Computer Assisted Language Learning

(CALL). It can also serve as a post-processing component of Machine Translation (MT) and Optical Character Recognition (OCR) systems.

1.1 Motivation

A lot of work has been done in grammatical error detection and correction, mainly for the English language. Very little work has been done for Indian languages. The Punjabi grammar checker [Gill and Lehal, 2008] is probably the first and only system developed for an Indian language. Our interest is in developing a grammar checker for the Bangla language. Bangla is the sixth most widely spoken language in the

world [Lewis and Paul, 2009] and the second in India. It is the national language

of Bangladesh. This language belongs to the Indo-Aryan family and originated

from Prakrit which is a sister language of Sanskrit. Sister languages of Bangla are

Oriya, Magahi and Maithili in the west and Assamese in the north east of India

[Chatterjee, 1926]. Bangla, Oriya and Assamese are the eastern most languages

of the Indo-European family of languages. Compared to languages like English,

Bangla is a morphologically rich [Bhattacharya et al., 2005; Dandapat et al., 2004]

language and has relatively free word order. It follows a Subject-Object-Verb (S-O-V) pattern, but the ordering of these three units is flexible; for example, S-V-O is allowed though not commonly used. Till now no significant research and development has been

done on grammatical error detection and correction of morphologically rich and

free word order languages like Bangla. To the best of our knowledge, ours is the

first work in India relating to Bangla grammar correction.

1.2 Divergence Issues

Though previous studies have revealed commonalities in types of errors commit-

ted by second language learners of different languages, some novelties are found

with respect to error production in individual languages. It has been widely studied

that the divergence between a pair of languages has a profound effect on various

fields of NLP [Dorr et al., 2002; Dave et al., 2001; Goyal and Sinha, 2009]. The diver-

gence between the two languages influences the kind of mistakes second language

learners typically commit. Previous studies have revealed that second language

learners of English having mother tongue as Japanese, Chinese, Korean or Russian

produce article errors due to divergence between English and those languages that

do not have any article [Leacock et al., 2010]. Therefore article selection is specif-

ically problematic for speakers of those languages. Similarly, divergence issues

between Bangla and other languages also influence the kind of mistakes Bangla

second language learners typically commit in their text. Bangla does not have prepositions as English does; prepositional meanings are expressed either as postpositions after the noun or as nominal inflections [Bhattacharyya et al., 2011]. The Bangla language may

have different representations of postpositions for the same preposition used in

the English language. Table 1.1 shows how the single preposition “with” is realized in several different ways in Bangla. These differences pose challenges to second language learners of Bangla when using postpositions and nominal inflections.

Table 1.1: Examples of single preposition “with” having different types of realiza-
tion in Bangla

English Bangla
A girl with beautiful eyes. sundara chokhera ekaTi meYe.
A boy with high fever. prachaNDa jbare AkrAnta ekaTi chhele
He wrote with a pen. se pena diYe likhechhila
Milkman mixes water with milk. dudhaoYAlA dudhera sAthe jala meshAna

1.3 Challenges

Grammatical error detection and correction of natural language is a difficult task,

as it deals with the full complexity of language at the time of identification of a

syntactic and semantic structure from input text [Dale, 2011].

Initially most grammar checkers were available as part of word processors, but nowadays they are re-emerging as standalone tools. Though grammar

checker tools are already available for English and for other European languages,

they have not matured enough to guarantee correct results most of the time for

every error. They do not satisfactorily account for the complexities of individ-

ual languages. Moreover, these systems have several limitations. One significant

issue is false alarms, where correct constituents are flagged as incorrect, which badly affects the learners' language acquisition process [Leacock et al., 2010]. Many a time, even clearly ill-formed constructions go undetected. Thus, in spite of automated

grammatical error detection and correction, manual reviewing is indispensable in

order to achieve high quality text. Therefore, there is a need for an efficient and

reliable grammar checker which can alleviate the potential problems of existing

systems.

Available grammar checkers for other languages are developed based on either

rule-based or Machine Learning (ML) based approaches. A rule-based grammar

checker checks the grammatical structure of sentences depending on morpholog-

ical and syntactic analysis. In rule-based approach, rules are manually designed

by linguists to recognize and rectify specific grammatical errors from parse tree

patterns. In Bangla, linguistically rich error correction rules and robust parsers

are not available till date. As Bangla allows free word order, state-of-the-art Con-

text Free Grammar (CFG) is not applicable [Shieber, 1985; Begum et al., 2008;

Bharati et al., 2010] here. Context Free Grammar (CFG) is basically a positional

grammar. It is true that Bangla has a dominant word order, which is SOV (i.e.

Subject-Object-Verb). However, alternative orderings of words are used not only in literature and poetry but are also found in day-to-day news articles. It has been seen that news reporters often use this alternative ordering for emphasis. Evidence of free word order is very frequent in Bangla news cor-

pora and Bangla blogs. Thus, a parser that follows a positional grammar is unable to generate a correct parse tree for a Bangla sentence. Free word order also leads

to structural ambiguity and increases the computational cost of the parser. Dis-

continuities (words that belong together but are not placed in the same phrase) and

long distance dependencies also pose problems for positional grammars [Bartha

et al., 2006; Covington, 1990]. These linguistic phenomena are quite common for

Indian languages. Due to these challenges, the current trend in parsing relatively free word order languages such as the Indian languages is based on the Paninian framework

and Dependency Parsing techniques [Garain and De, 2013; Ghosh. A. and S., 2009;

Zhang and Nivre, 2011; Nivre, 2008]. It has already been reported in the literature that parsers following the Paninian framework (designed for free word order languages) compare well in asymptotic time complexity with parsers for context free grammars (CFGs), which are essentially designed for positional languages [Bharati et al., 2010]. Parsing in the Paninian model is based on karaka relations between verbs

and nouns in a sentence. It does not consider the position of constituents while parsing a sentence. Thus, Paninian grammar enables a parser to handle sentences even when related words are discontinuous or separated over long distances.

In contrast to the rule-based approach, ML based approaches do not need such

handcrafted linguistic rules. This can alleviate the potential problem of rule-based

approaches. ML based grammar checkers rely on sufficiently large annotated

learners' error corpora. There is a dearth of annotated learner error corpora for Bangla text. One of the major problems of building such an error corpus from learn-

ers' data is that the process is very time-consuming. It also requires linguistic

knowledge to examine each sentence of learners’ text to determine the nature and

frequency of errors.

To measure improvement in grammar checker performance, we need to devise

a method that can evaluate the functionality and usability of existing grammar

checkers. Over the last few years, most studies regarding grammatical error de-

tection and correction have focused on the design and development aspects.

Very little attention has been paid to evaluation. Evaluation of such a system is

essential to validate whether the grammar correction methodology adopted by

the system is in the right direction. Performance of most of the existing grammar

checkers cannot be compared as different systems address different types of er-

rors. Moreover, these systems are not tested on a common dataset. Testing on a

common dataset is particularly problematic when different grammar checkers are

designed for different languages. Direct comparison is not possible since different

evaluation metrics are used by different researchers to report performance.

These challenges have motivated us to examine new directions in evaluation.

1.4 Research Objectives

The primary goal of the thesis is to develop a grammar checker for the Bangla language with reasonably good accuracy. Our aim is to contribute a novel grammatical

error detection and correction methodology for morphologically rich and free

word order languages. In this thesis, our focus is to correct postpositional and

nominal inflection errors which are the most frequent mistakes committed by

second language learners of Bangla. We also plan to provide

relevant examples for supporting the suggested correction. Though the thesis

deals with grammatical error detection and correction of Bangla language, our

proposed methodology can easily be re-engineered for other similar languages. To

address the broad objective we have identified the following goals:

• We investigate the different types of errors committed by native speakers and second language learners of Bangla and provide a taxonomy of errors.

• Due to the unavailability of annotated learners' error corpora, we propose an approach for automatic creation of a synthetic error corpus that mimics real world errors. This automatically generated corpus supports training and evaluation of the system's performance.

• We propose a new methodology for correcting the mistakes committed by


users and to provide relevant examples supporting the suggested correction. To have an idea of how strongly we can rely on such a correction, it will be useful to devise a measure of sentence complexity with respect to the grammar correction task. If a sentence is complex, the user should not be overly reliant on the correction suggested by the system. Conversely, if the complexity measure is low, the user can confidently choose the suggestion.

• Finally, we plan to propose a novel evaluation strategy to assess the per-


formance of grammar correction systems even when these systems are not
tested on the same test set. Our aim is to formulate an innovative complexity measurement metric for test data and then to examine the correlation between the complexity value and the accuracy of a system tested on that data.

1.5 Thesis Outline

In this chapter, we have given a brief description of the grammatical error detection problem. Challenges in this research problem are highlighted here. We have

looked at the motivation behind the research and identified the initial research

objectives that have directed the research. The rest of the thesis is organized into

chapters as follows:

Chapter 2 reviews recent prior work in grammatical error detection and correc-

tion. We do not aim to give a comprehensive review of the related work. Such

an attempt is extremely difficult due to the large number of publications in

this area and the diverse language dependent works based on several theo-

ries and techniques used by researchers over the years. Instead, we briefly

review the work based on different techniques used for grammatical error

detection and correction.

Chapter 3 describes our novel approach for automatic creation of Bangla error

corpus for training and evaluation of grammar checkers. Though the present

work focuses on the most frequent grammatical errors in Bangla written text,

a detailed taxonomy of grammatical errors in Bangla is also presented here,

with an aim to increase the coverage of the error corpus in future. Mistakes

committed by native speakers and second language learners are compared

here and reasons behind such errors in their text are also investigated.

Chapter 4 describes our procedure for automatic detection and correction of Bangla

grammatical errors using a Natural Language Generation based approach.

Practical issues pertaining to automatic detection and correction of grammat-

ical errors using this approach are discussed here. In this chapter, we also

discuss the scope and limitations of the proposed approach.

Chapter 5 deals with a novel evaluation methodology of grammatical error detec-

tion and correction to assess its performance. A comparative study between

our grammar checker based on NLG and a baseline system using Naïve

Bayes classifier is carried out to show the performance of our system. An

innovative complexity measurement metric is introduced here to alleviate

the need for a standard test corpus for evaluation of the system. Performance

of each grammar checker is assessed on texts having different complex-

ity and severity of errors. To evaluate the performance, correlation between

complexity of texts and accuracy of the system is also estimated.

Chapter 6 summarises the contributions of the research and concludes with future

directions for possible extensions of the current work.

Appendices. Some appendices have been added in order to cover complementary details. More precisely, the included materials are:

Appendix A: Examples of some interesting erroneous sentences corrected

by the system.

Appendix B: Examples of incorrect prediction by the system.

Appendix C: Examples of some interesting erroneous sentences having dif-

ferent complexity and severity of errors, including those that are difficult

to correct even for human annotators.

Appendix D: Examples of complex sentences collected from Literature Do-

main.

CHAPTER 2

LITERATURE SURVEY

The main focus of the dissertation is on the grammar correction task, specifically for the Bangla language. It has been assumed that the input to our grammar correction

system is free from spelling errors. Thus in the literature survey, we have largely

focused on grammar checking techniques. However, misspelling also contributes

to the faulty construction of a sentence. Therefore, it has been felt that a brief

discussion about different aspects of spell checking is necessary in the context of grammar checking. Interested readers can go through the cited references for more

details on spell checking.

2.1 Spell Checker

The main task of a spell checker is to find the appropriate word the author intended

to type given a misspelled word. A spell checker is used to correct spelling er-

rors in text, to fix the output of Optical Character Recognition (OCR) (typical OCR-generated errors include “rn” for ‘m’ and ‘e’ for ‘c’ [Lopresti and Zhou, 1997]) and Online Handwriting Recognition (OHR). Often it also appears as a preprocessing component of grammar checkers. Spell checkers provide the initial step towards effective writing. Spelling errors in human writing are committed due to homophones (e.g. “it’s” for “its”, “dessert” for “desert”, “piece” for “peace”) and when there is no neat mapping between the structure and pronunciation of words. According to Heift and Schulze [2007], language learners can commit misspellings
due to either a misapplication of morphological rules or other influences from their

native languages or for incomplete knowledge of the morphology and phonology

of the language they are learning. For example, a learner may write “goed” or “writed” in English owing to incomplete knowledge of English irregular verbs [Leacock

et al., 2010]. However, one can argue that this problem is beyond the scope of spell checking and needs to be addressed as part of grammar checking. Spelling er-

rors are classified into two types, namely, Non-Word error and Real-Word error

[Kukich, 1992]. A Non-Word error occurs when a misspelled word is not a valid

dictionary word. Conversely, a Real-Word error occurs when the user writes a valid

dictionary word but it is not suitable in the context of the sentence. Examples of

a Non-Word error and a Real-Word error for a correct sentence “The boys ate their

toast” have been shown below:

Example of a Non-Word error: “The boys ate *thier toast”

Example of a Real-Word error: “The boys ate *there toast”

Here ‘*’ marks the erroneous words. Detection of Real-Word errors is comparatively more complex than detection of Non-Word errors. Often the context words provide clues for detecting Real-Word errors. An alternative classification of

spelling errors is (1) Orthographic error (also known as ‘Cognitive error’) and (2) Typographic error [Kukich, 1992]. An Orthographic error occurs when the author either simply does not know the correct spelling or forgets it while typing. Orthographic errors mostly generate strings that are phonologically identical or very similar to the correct word (e.g. “indicies” for “indices”). As a result, these errors

depend on the spelling and pronunciation conventions of a particular language. Typographic errors occur due to wrongly hit key sequences. Thus the characteristics of this type of error depend on the keyboard layout rather than on the language in which the word has been written.

Spelling correction happens in three stages viz. “Error Detection”, “Candi-

date Corrections Generation” and “Ranking of Candidates”. Structural similarity,

pronunciation similarity, syntactic and semantic context, exploiting knowledge of

sources (like keyboard, OCR, Speech-to-Text etc.) can be used to detect and cor-

rect spelling errors. Structural similarity between a misspelled word and candidate

corrections is estimated using edit distance [Levenshtein, 1966]. To select the best

candidate correction, the minimum edit distance [Wagner and Fischer, 1974] is

preferred. A dynamic programming algorithm [Bellman, 1957] is used to calculate the minimum edit distance. For a good overview of edit distance, please refer to

[Dasgupta et al., 2008; Jurafsky and Martin, 2009]. Pronunciation similarity is usu-

ally measured by Russell's Soundex [Knuth, 1998] and Metaphone [Philips, 1990].

The noisy channel model, a probabilistic approach, is also used for spelling correction. The intuition of the noisy channel model is to treat a misspelled word as an instance of a correct word that has been passed through a noisy channel [Jurafsky and

Martin, 2009]. A comprehensive study of earlier spell checking techniques has been

discussed in [Kukich, 1992] and [Peterson, 1980]. There are various approaches

for Non-Word spelling corrections like Trigram Analysis [Angell et al., 1983], Error

Patterns [Yannakoudakis and Fawthrop, 1983], Triphone Analysis [van Berkel and

De Smedt, 1988], Noisy Channel Model [Kernighan et al., 1990], Using Context

[Agirre et al., 1998], String-to-String Edits [Brill and Moore, 2000] and Pronuncia-

tion Model [Toutanova and Moore, 2002]. To correct Real-Word errors approaches

like Trigram based [Mays et al., 1991], Noisy Channel Model [Mays et al., 1991],

Lexical Cohesion [Hirst and Budanitsky, 2005], Web as a Source of Information

[Whitelaw et al., 2009] and Using Confusion Sets [Golding, 1995; Golding and

Schabes, 1996; Mangu and Brill, 1997] have been discussed in the literature.
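To make the edit distance and noisy channel ideas above concrete, here is a minimal Python sketch: a dynamic programming edit distance that also counts an adjacent transposition as a single operation (matching the substitution, deletion, insertion and transposition error types analysed later in Table 2.1), followed by a toy noisy-channel-style ranking. The lexicon, its probabilities and the per-edit channel penalty are illustrative assumptions, not values from any system surveyed here.

def edit_distance(s, t):
    """Damerau-Levenshtein distance via dynamic programming
    [Levenshtein, 1966; Wagner and Fischer, 1974]."""
    m, n = len(s), len(t)
    # dp[i][j] = minimum edits turning s[:i] into t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                                # i deletions
    for j in range(n + 1):
        dp[0][j] = j                                # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
            # an adjacent transposition counts as a single edit
            if i > 1 and j > 1 and s[i-1] == t[j-2] and s[i-2] == t[j-1]:
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)
    return dp[m][n]

# Toy lexicon with made-up unigram probabilities P(c).
LEXICON = {"their": 0.006, "there": 0.005, "the": 0.040}

def rank_candidates(w, max_dist=2):
    """Noisy channel intuition: choose c maximizing P(c) * P(w | c),
    crudely approximating the channel model P(w | c) by 0.01 per edit."""
    scored = [(c, p * (0.01 ** edit_distance(w, c)))
              for c, p in LEXICON.items() if edit_distance(w, c) <= max_dist]
    return sorted(scored, key=lambda x: -x[1])

print(edit_distance("thier", "their"))  # 1 (a single transposition)
print(rank_candidates("thier"))         # "their" ranks first

On this toy input, “thier” is one transposition away from “their”, so “their” outranks “there” and even “the”, despite the latter's higher unigram probability.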

Works on spell checker development in Indian languages like Bangla [Chaud-

Table 2.1: Percentage of various types of errors in Bangla

Type of error Percentage


Substitution error 66.32
Deletion error 21.88
Insertion error 6.53
Transposition error 5.27

huri, 2002, 2001; Choudhury et al., 2007; UzZaman, 2005; Haque and Kaykobad,

2002; Uzzaman, 2004; Bhatt et al., 2005; Bansal et al., 2004], Assamese [Das et al.,

2002], Punjabi [Lehal, 2007], Marathi [Dixit et al., 2006] etc. are worth mentioning.

Here we discuss two major works on Bangla spelling correction.

Chaudhuri [2002, 2001] has analysed Non-Word patterns in Bangla hand-

written text. These patterns have been collected from samples of answer scripts of

students at various levels of studies like Secondary, Higher Secondary and Under-

graduate. For studying phonetic spelling errors, they have also collected samples

of dictated notes. These notes have been dictated from various topics chosen from

story, novels, books of science, geography, history etc. They have manually col-

lected the misspelled words from these texts. Illegible words and words of length

greater than four but having more than three errors have been rejected. They have

analysed the different types of spelling errors (substitution, deletion, insertion and

transposition) found in Bangla text. The percentages of such errors are shown

in Table 2.1. They have seen that most misspellings take place through omission of the mAtrA (the mAtrA, or shirorekhA, is the horizontal line present at the upper part of many Bangla characters) and of vowel diacritical markers. Mistakes committed in Bangla

compound consonants (called yuktAkShara) due to ignorance are also observed. In

Bangla, dental and cerebral nasal consonants are phonetically very similar. As a

result there is a chance of misspelling when proper spelling rules are not remem-

bered. Details of the error pattern analysis have been reported in [Chaudhuri and Kundu, 2000]. They have proposed a two-stage technique to detect and correct

Non-Word errors in Bangla text. The first stage takes care of phonetic similarity

error and the second stage takes care of errors other than the phonetic similarity.

The phonetically similar characters are mapped into single units of character code.

A new dictionary Dc is constructed over this reduced alphabet. They have also constructed a reverse order dictionary Dr . In Dr , the characters of each word

are kept in reverse order. A phonetically similar but wrongly spelt word can be

easily corrected using Dc . Phonetically non-similar misspelled words are searched

in both the dictionaries. If the word of length n is not found in Dc , then its first k1

characters are matched with words in this dictionary. The last k2 characters of the

same word are searched in Dr . A misspelled word with a single error is located in

the intersection region of the first k1+1 and last k2+1 characters. Figure 2.1 shows the

error localization by conventional and reverse dictionary. Candidate corrections

are suggested by searching in the conventional dictionary for those words start-

ing with the first k1 characters and ending with the last k2 characters.

Figure 2.1: Error localization by conventional and reverse dictionary [Chaudhuri, 2002, 2001]

They have tested their approach on 250k words and reported that all Non-Word errors are correctly

detected, but the false error detection rate of the system is 5%.
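The localization step lends itself to a compact sketch. The following Python fragment mimics the idea with a toy English lexicon standing in for the Bangla dictionaries Dc and Dr (and omitting the phonetic folding): the longest valid prefix gives k1, the reverse dictionary turns the suffix search into a prefix search giving k2, and a single error must lie in the overlap of the first k1+1 and last k2+1 characters. All names below are illustrative assumptions.

# Toy stand-in for the conventional dictionary Dc; the reverse dictionary Dr
# is simulated by reversing every word, so a suffix search becomes a prefix search.
LEXICON = {"grammar", "grammatical", "checker", "correction", "corpus"}

def longest_shared_prefix(word, lexicon):
    """Length k1 of the longest prefix of `word` shared with any lexicon word."""
    best = 0
    for w in lexicon:
        k = 0
        while k < min(len(word), len(w)) and word[k] == w[k]:
            k += 1
        best = max(best, k)
    return best

def localize_and_suggest(word, lexicon):
    k1 = longest_shared_prefix(word, lexicon)
    # Dr role: longest shared suffix = longest shared prefix of reversed strings.
    k2 = longest_shared_prefix(word[::-1], {w[::-1] for w in lexicon})
    # A single error lies in the overlap of the first k1+1 and last k2+1 characters.
    region = (max(0, len(word) - k2 - 1), min(len(word), k1 + 1))
    suggestions = [w for w in lexicon
                   if w.startswith(word[:k1]) and w.endswith(word[len(word) - k2:])]
    return region, suggestions

# "gramnar" has a substitution error at index 4; the region pinpoints it.
print(localize_and_suggest("gramnar", LEXICON))  # ((4, 5), ['grammar'])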

Choudhury et al. [2007] have investigated the difficulties involved in spelling

error detection and correction in Bangla, Hindi and English through the concep-

tualization of a Spelling Network (named SpellNet). This work is inspired by similar work on the complex network approach [Albert and Barabási, 2002; Newman,

2003] and work on phonological neighbours’ network of words [Kapatsinski, 2006;

Vitevitch, 2005]. The SpellNet is a weighted network of words, where the nodes

represent the words and the weights of the edges indicate the orthographic simi-

larity between the pair of words they connect. The structure of a SpellNet is shown in Figure 2.2.

Figure 2.2: The weighted SpellNet for 6 words

Figure 2.3: Structure of SpellNet for θ=1

They have focused on the networks at three different thresholds (θ) of edge weights, that is for θ = 1, 3 and 5. They have studied the properties of the
Complex Network at these θ values for the three languages. They do not consider

higher thresholds as the networks become completely connected at θ = 5. Thresh-

olded counterpart of Figure 2.2, for θ = 1, is shown in Figure 2.3. It has been seen that the orthography of the two Indian languages, Bangla and Hindi, is

highly phonemic in nature, in contrast to the orthography of English. They have

seen that the probability of making a Real-Word error in a language is proportional to

the average weighted degree of SpellNet. They have reported that the probability

of Real-Word error is highest in Hindi followed by Bangla and English.
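One simple reading of the SpellNet construction can be sketched in a few lines of Python: words are nodes, an edge of weight d = edit distance connects any pair with d ≤ θ, and the average weighted degree is the statistic the authors relate to Real-Word error probability. The six-word English vocabulary below is a toy stand-in for the actual Bangla, Hindi and English lexica, so the printed numbers are purely illustrative.

from itertools import combinations

def lev(s, t):
    """Plain Levenshtein distance (row-by-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]

def spellnet(words, theta):
    """Thresholded SpellNet: an edge (u, v) with weight d whenever
    the orthographic distance d = lev(u, v) is at most theta."""
    edges = {}
    for u, v in combinations(words, 2):
        d = lev(u, v)
        if d <= theta:
            edges[(u, v)] = d
    return edges

def avg_weighted_degree(words, edges):
    """Average weighted degree over all nodes of the thresholded network."""
    deg = dict.fromkeys(words, 0)
    for (u, v), d in edges.items():
        deg[u] += d
        deg[v] += d
    return sum(deg.values()) / len(words)

WORDS = ["cat", "bat", "rat", "cart", "card", "dog"]  # toy vocabulary
for theta in (1, 3, 5):
    e = spellnet(WORDS, theta)
    print(theta, len(e), avg_weighted_degree(WORDS, e))

As θ grows the network densifies, and on a real lexicon a denser neighbourhood means more valid words within a few edits, hence more opportunities for Real-Word errors.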

2.2 Grammar Checker

Grammar checking is a process that verifies morphology, syntax and semantics of

an input text. This verification process is executed by the Grammar Checker. The

grammar correction problem belongs to the field of Natural Language Process-

ing (NLP) which is a branch of Artificial Intelligence. Generation of text that is

syntactically and semantically correct is an important aspect in NLP and Natural

Language Generation (NLG). An automatic Grammar Checker is a computerized

writing aid that examines written text to detect and correct grammatical mistakes

and provides necessary feedback to the user. Figure 2.4 shows a basic functional

diagram of grammatical error detection and correction. In some of the reviewed

papers, depending on the technique used, “correction” can be performed without any explicit “detection”. In this figure, the dotted line indicates

this situation. Grammar checkers are one of the most widely used tools in the

area of language processing. Though most of the existing grammar checkers are in

English, grammar checkers for other languages are also available. Table 2.2 shows

research work carried out in languages other than English.

Figure 2.4: Simplified functional diagram of grammatical error detection and cor-
rection

Table 2.2: Research on grammar checking in different languages

Languages Authors
Afan Oromo Tesfaye [2011]
Basque Uria et al. [2009]
Chinese Liu et al. [2008]
French Hermet et al. [2008]
German Schmidt-Wigger and Anje [1998]
Japanese Izumi et al. [2003]
Korean Young-Soog and Chae [1998]
Norwegian Bondi et al. [2002]
Punjabi Gill and Lehal [2008]
Spanish Lozano and Melero [2001]
Swedish Birn [2000]

In this chapter, relevant research work in relation to grammatical error detection and correction is

surveyed. The aim of the chapter is to provide a brief overview of existing grammar checking techniques.

2.3 Automatic Grammar Correction Approaches

Uszkoreit [1996] (quoted in [Hein, 1998]) suggested a four-level scheme for grammar correction approaches, viz.

i. Detection: deals with identification of possible ungrammatical segments.

ii. Recognition: deals with localization and identification of the probable vio-
lated constructions.

iii. Diagnosis: deals with identification of the possible sources of errors.

iv. Correction: deals with construction and ordering of the correct alternatives.
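Read as a software architecture, the four levels suggest a simple pipeline. The Python skeleton below is only an organizational sketch with placeholder logic (the toy return values are our own assumptions, not part of Uszkoreit's scheme): each stage consumes the previous stage's output, ending with ranked corrections.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Diagnosis:
    span: tuple            # (start, end) token offsets of the suspect segment
    violation: str = ""    # which construction is violated (recognition)
    source: str = ""       # probable source of the error (diagnosis)
    corrections: List[str] = field(default_factory=list)  # ordered alternatives

def detect(tokens):
    """Level i: flag possibly ungrammatical segments (placeholder)."""
    return [(0, len(tokens))]          # toy: suspect the whole sentence

def recognize(tokens, span):
    """Level ii: localize and identify the violated construction (placeholder)."""
    return "postposition selection"

def diagnose(tokens, span, violation):
    """Level iii: identify the probable source of the error (placeholder)."""
    return "L1 interference"

def correct(tokens, span, violation):
    """Level iv: construct and rank correct alternatives (placeholder)."""
    return [" ".join(tokens)]          # toy: echo the input

def check(sentence):
    tokens = sentence.split()
    results = []
    for span in detect(tokens):
        v = recognize(tokens, span)
        results.append(Diagnosis(span, v, diagnose(tokens, span, v),
                                 correct(tokens, span, v)))
    return results

for d in check("se pena diYe likhechhila"):   # example sentence from Table 1.1
    print(d)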

Different approaches have been taken for grammatical error detection and

correction by different researchers. Some researchers [Lozano and Melero, 2001;

Bredenkamp et al., 2000; Jensen et al., 1983] follow Rule-based or Parser based ap-

proach, some [Fujishima and Ishizaki, 2011; Izumi et al., 2003; Bigert and Knutsson,

2002; Knight and Chander, 1994] follow purely Machine Learning (ML) based em-

pirical approach, while others [Hermet and Désilets, 2009; Liu et al., 2008] prefer

Statistical Machine Translation (SMT) based approach. A brief road map of gram-

matical error detection and correction approaches is shown in Table 2.3. Research

in the field of grammatical error detection and correction started as early as 1978 [Weischedel et al., 1978]. From 1978 to 2002, most grammar checkers followed the

rule-based approach. Since 2002, ML and SMT approaches have dominated over

Table 2.3: A brief road map of grammatical error detection and correction ap-
proaches.

Rule-based Approach
Nina H. MacDonald and Keenan [1982]: String matching.
Jensen et al. [1983]: Parse fitting.
Douglas and Dale [1992]: Constraint Relaxation.
Bredenkamp et al. [2000]: Syntax-based.
Lozano and Melero [2001]: Syntactic and Semantic analysis.
Machine Learning based Approach
Knight and Chander [1994]: Decision tree classifier.
Scheler and Munchen [1996]: Neural Network.
Bigert and Knutsson [2002]: n-gram Language Model.
Izumi et al. [2003]: Maximum entropy classifier.
Yi et al. [2008]: Web counting.
Fujishima and Ishizaki [2011]: Support Vector Machine.
Statistical Machine Translation based Approach
Liu et al. [2008]: Noisy channel Model.
Hermet and Désilets [2009]: Round trip SMT technique.

the rule-based approach. Based on these approaches, research prototypes have been

built for different languages.

2.3.1 Rule-based Approach

In the rule-based approach, rules are manually designed by linguists to recognize

and rectify specific grammatical errors from parse tree patterns. A rule-based

grammar checking tool checks the grammatical structure of sentences depending

on morphological and syntactic analysis. At the time of morphological analysis,

individual words are mapped to their lexical components and necessary informa-

tion related to their lexical structures is returned. In syntactic analysis, a parser

is used to analyse sentence structure and build its structural representation. This

structural representation maintains grammatical relationship between the words

in a sentence [Rich and Knight, 1991]. The primary goal of a rule-based system

is to parse ill-formed sentences in order to detect and correct the errors in a sen-

tence. Instead of using parsers, some rule-based diagnostic approaches consult

a list of errors. Related error rules are grouped together for the identification of such errors. Figure 2.5 shows a simplified functional diagram of a rule-based grammatical error detection and correction system.

Figure 2.5: Simplified functional diagram of a rule-based grammar checker

Early works in grammatical

error detection and correction were based on pattern matching or rule-based tech-

niques. At that time, the rule-based approach depended only on hand-crafted heuristic rules. Later, improvements were made using computational gram-

mar like Precision Grammar [Bender et al., 2004], Lexical Functional Grammar

[Dalrymple, 2001], Constraint Grammar [Karlsson, 1990a], Head Driven Phrase

Structure Grammar [Proudian and Pollard, 1985], Tree Adjoining Grammar [Joshi

et al., 1975], Augmented Phrase Structure Grammar (APSG) [Heidorn, 1975], etc.

Besides this, smart parsing techniques like Constraint Relaxation [Fouvry, 2003;

Vogel and Cooper, 1995; Bolioli et al., 1992] and Parse Fitting [Jensen et al., 1983]

are also employed for efficient grammar correction. Unix Writer’s workbench

[Nina H. MacDonald and Keenan, 1982] was one of the oldest and widely used

grammar checkers which was based on a string matching algorithm rather than

grammatical processing. But later, CorrectText (Houghton Mifflin Company)

and Grammatik (Aspen Software) introduced some amount of linguistic analy-

sis, while FLAG [Bredenkamp et al., 2000], VIRKKU, MS-NLP [Heidorn, 2000;

Lozano and Melero, 2001], EasyEnglish [Bernth, 1997], NGC [Bondi et al., 2002],

Grammatifix [Arppe, 2000; Birn, 2000] and other systems carried out detailed linguis-

tic analysis. EPISTLE [Heidorn et al., 1982], Critique, GramCheck [Bustamante

and León, 1996], SCRIPSI [Catt and Hirst, 1990] etc. are examples of some exist-

ing grammar checkers that follow the constraint relaxation technique. Today's open

source systems like AbiWord (http://www.abisource.com) use linguistic grammar rules for grammar checking. VP2 [Schuster, 1986], the Intelligent Language

Tutor [Schwind, 1988] and Automated German Tutor [Weischedel et al., 1978] use

linguistic tools like POS taggers and rule-based parsers that depend on relatively small grammars targeting specific errors.

We now briefly discuss the different rule-based strategies adopted by various systems.

Syntax-Based

MS-NLP [Heidorn, 2000; Lozano and Melero, 2001] is a rule-based system that

analyses the English language. It is used as a grammar checker component in Mi-

crosoft Word. The main focus of this system is to detect and correct the specific

types of errors made by native speakers such as subject verb disagreement, num-

ber disagreement, etc. The grammatical error detection process of this system

consists of four stages. In the first stage, the input text is tokenised into individual

words. Then these tokens are passed to the morphological analyser for analysis of

individual components. The second stage of processing is known as “sketch” since

it provides the basic syntactic parsing of the input sentence. The system uses Aug-

mented Phrase Structure grammar [Heidorn, 1975], consisting of a set of binary

phrase structure rules. The third stage is known as “portrait”, informally known

as reattachment. The parsed tree produced by “sketch” is refined in this stage

to produce a more accurate tree attachment of constituents such as prepositional

phrases, relative clauses, or infinitive clauses. A syntax tree generated by the MS-NLP system may look as shown in Figure 2.6. The fourth stage is known as “logical

form”. A semantic graph is produced in this stage to display the basic semantic

relations underlying the syntactic tree. The MS-NLP system consists of a seman-

tic analyser known as MindNet which is used for word sense disambiguation.

Michaud and Mccoy [2001, 2000] proposed an Interactive Computer Identification

and Correction of Language Errors (ICICLE) system to improve the literacy of

American Sign Language(ASL) signers. This system analyzes the grammatical

errors of a text written by deaf students and enables them to generate appropri-

ate text by a tutorial dialog. Feedback is given instead of providing corrections.

The system uses Context Free Grammar (CFG) augmented with error-production

Figure 2.6: Syntax tree generated by MS-NLP System

rules known as mal-rules. Mal-rules precisely describe expected error forms and

focus on previously known error patterns. Mal-rules parse ungrammatical sentences, trigger an error flag on successful parsing of an erroneous sentence,

and use annotation to indicate the types of errors in the sentence. An example

of a mal-rule for detecting the “be” verb deletion error in the English sentence “The

boy honest” may look like VP(error+) → AdjP where the conventional context free

rule is written as VP → V AdjP. Similarly, to handle a missing subject in the English

sentence “The honest”, a mal-rule can be formulated as S(error+) → VP [Schneider

and McCoy, 1998].
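As a concrete illustration of this idea, the following is a minimal Python sketch (ours, not the ICICLE implementation) that encodes the toy grammar above with the NLTK toolkit; the nonterminal name VP_ERR stands in for the error+ annotation:

    import nltk  # assumes the NLTK toolkit is installed

    # Toy grammar: the conventional rule VP -> V AdjP plus the
    # mal-rule VP_ERR -> AdjP that accepts a missing "be" verb.
    grammar = nltk.CFG.fromstring("""
    S -> NP VP | NP VP_ERR
    NP -> Det N
    VP -> V AdjP
    VP_ERR -> AdjP
    AdjP -> Adj
    Det -> 'the'
    N -> 'boy'
    V -> 'is'
    Adj -> 'honest'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the boy honest".split()):
        # a parse that uses the mal-rule flags the error type
        if any(sub.label() == "VP_ERR" for sub in tree.subtrees()):
            print("Error detected: missing 'be' verb")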

Norwegian Grammar Checker (NGC) [Bondi et al., 2002] uses Constraint

Grammar [Karlsson et al., 1995] to detect a wide range of grammatical errors.

The system contains a preprocessor which performs spell checking on the input

text and has three other major components. These components are Morphological

Analyser, Constraint Grammar Disambiguator and Error Detector. Initially, an in-

Table 2.4: Syntax based grammatical error detection and correction approaches.

Authors: Grammar/Rules and Techniques
Lin and Su [1995]: Phrase Level Building (PLB) parsing
Bernth [1997]: English Slot Grammar [McCord, 1980]
Park et al. [1997]: Combinatory Categorial Grammar [Steedman and Baldridge, 2005]
Hein [1998]: Phrase constituent rules and local error rules; chart parser and chart scanner [Jurafsky and Martin, 2009]
Frank et al. [1998]: English Resource Grammar (http://www.delph-in.net/erg/) and XLE (http://www2.parc.com/isl/groups/nltt/xle/)
Bredenkamp et al. [2000]: Trigger and Confirmation rules
Bender et al. [2004]: Precision grammar and English Resource Grammar; Linguistic Knowledge Builder (LKB) (http://moin.delph-in.net/LkbTop)
Khader et al. [2004]: ParGram English Lexical Functional Grammar (http://pargram.b.uib.no/)
Uria et al. [2009]: Constraint Grammar [Karlsson, 1990b]

put sentence is tokenized and spell checked. Then a morphological analyzer and a

POS tagger are used to provide necessary lexical attributes and POS tags. Then the

“Constraint Grammar Disambiguator” filters out the improper tags depending on

the grammatical context. Finally, the Error Detector module detects grammatical

errors depending on the linguistic inputs provided by the previous two modules.

Table 2.4 shows that other researchers follow similar rule-based techniques for grammatical error detection and correction, but their methodologies differ depending on the grammar and language processing tools they have used.

Constraint Relaxation

In Constraint Relaxation technique, broad classes of errors are detected by relaxing

constraints in a unification framework [Fouvry, 2003, 2000; Lascarides et al., 1996].

In this approach, the constraints such as subject-verb agreement are gradually

relaxed until the sentence can be parsed completely. Corrections are suggested after

examining the violated constraints. Degree of error in a sentence is determined by

the order of relaxation.
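The following minimal Python sketch (our illustration, not any of the cited systems) shows the idea on a single subject-verb agreement constraint: a strict check fails, constraints are relaxed one at a time, and the relaxed constraint that makes the check succeed localises the error:

    # Feature structures for a subject NP and a VP; the values are illustrative.
    np = {"number": "sing", "person": 3}
    vp = {"number": "plur", "person": 3}

    def violated(np, vp, relaxed=()):
        # return the agreement constraints that fail, ignoring relaxed ones
        return [f for f in ("number", "person")
                if f not in relaxed and np.get(f) != vp.get(f)]

    if violated(np, vp):                        # strict unification fails
        for c in violated(np, vp):
            if not violated(np, vp, relaxed=(c,)):
                # the sentence unifies once this constraint is relaxed,
                # so the relaxed constraint pinpoints the agreement error
                print("agreement error located at constraint:", c)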

IBM’s EPISTLE [Heidorn et al., 1982] system performs complete linguistic anal-

ysis using rule-based grammar and parser built on that grammar. This system

checks both the grammar and style of English written texts. Grammar checking

module takes care of the improper agreement between subject and verb whereas

the style checking module points out problems regarding excessively complex sen-

tences. A constraint in an EPISTLE rule may looks like:

NP VP (NUMB.AGREE.NUMB(NP)) → VP(SUBJECT=NP).

For analysing a text, this system follows three levels: Word processing, Grammar checking and Style checking. At the Word processing level, the system performs efficient dictionary lookup and also deals with suffixes and prefixes. This dictionary lookup procedure returns necessary attributes of words along with the POS tag for

further processing. According to the English grammar rules, a general language

processing system attempts to parse each sentence in order to check the grammat-

ical construction of the sentence. Sentences that follow the specified grammar rules (constituent class patterns) along with the imposed constraints (restrictive conditions on those patterns) on clauses are parsed successfully. On the other

hand, unsuccessful sentences are parsed again by relaxing some of the conditions

and with some additional rules. The relaxed conditions and the corresponding

problematic constituents of the sentence are noted to provide the indication and

information of grammatical errors. The parse trees developed during grammar

checking are utilized later by Style processing module to detect probable stylistic

problems in the sentence.

CRITIQUE is a text processing system which checks grammar as well as style

using a broad-coverage PLNLP English Grammar [Jensen et al., 1993]. The system

also follows constraint relaxation technique.

Fliedner [2002] proposed a rule-based system to detect NP agreement errors

in German. He has used a shallow parsing based on finite state automata in

combination with constraint relaxation and a method for parse ranking based on

Optimality Theory [Smolensky and Legendre, 2006]. Parse ranking is used to

select a best parse or a small number of best parses from a ‘parse forest’, which is

especially important when grammatical constraints are relaxed, as the number of

possible parses may become quite large.

Other researchers like Douglas and Dale [1992],Dini and Malnati [1993] and

Schwind [1990] also follow Constraint Relaxation technique for grammatical error

detection and correction.

Parse Fitting

A parse fitting procedure was proposed by Jensen et al. [1983] to “fit” together pieces

of a parse-tree when the parser fails to generate the complete parse tree for a

given input sentence. This technique generates a reasonable approximate parse

tree when the rules of a conventional syntactic grammar fail to parse an input

string. The approximate parse tree can serve as input to the remaining stages of

processing. When bottom-up parsing fails to produce a start (S) node covering the string, the fitting procedure begins. The by-product of this unsuccessful

bottom-up parsing is recorded for inspection of various segments of the input

string from error detection and correction perspectives.

Mellish [1989] has presented a generalized parsing strategy based on an active

chart which can diagnose errors in sentences. His proposed technique applies a

top down parser when the bottom up parser fails to produce the complete parse

tree. This is done so that the top down parser can examine the pieces of parse

constituents of the bottom-up parser and provide a suggestion about where the bottom-

up parser might have failed to parse the sentence.

2.3.2 Machine Learning Approach

ML is a field of study of algorithms which predict unknowns from observed

data using inductive inference. Inductive inference provides information about

statistical phenomena and generalizes conclusion from specific examples. These

generalized models help to predict the future data. According to Mitchell [1997],

“A computer program is said to learn from experience E with respect to some class of

tasks T and performance measure P, if its performance at tasks in T, as measured by P,

improves with experience E”. In the field of ML, for grammatical error detection and

correction, some researchers use a Language Modelling approach [Hermet and Désilets, 2009; Bigert and Knutsson, 2002; Chodorow and Leacock, 2000], some

of them [Knight and Chander, 1994; Gamon et al., 2008; Izumi et al., 2003] prefer

a classification-based approach and the rest prefer the web counting method. Now we will

discuss these aforementioned three approaches in brief.

Language Modeling

A Language Model (LM) is basically a probability distribution over all possible

word sequences of a language. LM is used to predict the next word depending

on the previous history [Jurafsky and Martin, 2009]. Probability assigned to a

word sequence of a particular language is indicative of likelihood of this sequence

being uttered by a speaker of that language. From the training corpus, LM gathers

statistical knowledge which is used to estimate the probability of a sentence. The

probability of a sentence containing word sequences w1 , w2 , w3 , · · · , wn can be esti-

mated by decomposing it into a series of product of conditional probability using

chain rule as follows:

P(w1, w2, w3, ..., wn) = P(w1^n)
= P(w1)·P(w2|w1)·P(w3|w1^2) ··· P(wn|w1^(n-1))
= Π_(i=1..n) P(wi|w1^(i-1))    (2.1)

But P(wn|w1^(n-1)) is difficult to compute due to sparseness of data in the training

corpus. To resolve this problem, the probability of a word wn given all the previous

words can be approximated by the probability given only the previous N words.

This N-gram approximation to the conditional probability of the next word in a

sequence is written as: P(wn|w1^(n-1)) ≈ P(wn|w_(n-N+1)^(n-1)).

Figure 2.7 shows examples of trigrams for English sentence “Ram is a good boy”.

Figure 2.7: Examples of trigram sequences.

LM can be used to differentiate ill-formed sentences from well-formed sen-

tences depending on the probability scores of the sentences. If the probability

score is below some predefined threshold value then the sentence is considered

as an ill-formed sentence; otherwise, the sentence is considered grammatically correct. Many

researchers prefer POS tag sequences rather than word sequences. N-grams of

POS tags have many useful properties. Some of the features of the language it-

self are captured by n-grams as they are extracted from a corpus representing the

language. The extracted features contain only local information due to the limited

scope of an n-gram. As each of these n-grams describes an allowable sequence of

n POS tags, they represent a small acceptance grammar of the language.
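As a concrete illustration of the ideas above, the following minimal Python sketch (ours; the corpus, smoothing and threshold are toy choices) scores sentences with an add-one-smoothed bigram model and flags those whose log-probability falls below a threshold:

    import math
    from collections import Counter

    def train(sentences):
        uni, bi = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            uni.update(toks)
            bi.update(zip(toks, toks[1:]))
        return uni, bi

    def log_prob(sentence, uni, bi, vocab_size):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        lp = 0.0
        for w1, w2 in zip(toks, toks[1:]):
            # add-one smoothing keeps unseen bigrams from zeroing the product
            lp += math.log((bi[(w1, w2)] + 1) / (uni[w1] + vocab_size))
        return lp

    uni, bi = train(["Ram is a good boy", "Ram is a boy"])
    THRESHOLD = -10.0  # illustrative; in practice estimated empirically
    score = log_prob("boy good a is Ram", uni, bi, len(uni))
    print("ill-formed" if score < THRESHOLD else "well-formed")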

Bigert and Knutsson [2002] have proposed a robust probabilistic method for

detection of context-sensitive errors. Initially, input sentence is tagged using a POS

tagger. N-gram constituents are collected from resulting tag sequences and then

the occurrence frequency of each n-gram is fetched from the n-gram frequency

table. If the frequency is greater than a predefined threshold value then this con-

struction is considered as grammatically correct. Otherwise, it is considered as

grammatically incorrect because rare constructs are relatively improbable. How-

ever, due to the sparseness of the tags participating in the n-gram, sometimes it

may happen that an n-gram representing an acceptable grammatical construct may

not be encountered in the training data. To mitigate this problem, they have built

a confusion matrix which is a matrix of syntactic distance between POS tags. This

matrix contains information about how suitable one tag is in the context of another.

This information is utilized at the time of replacing one tag with the other. A rare

tag is substituted with a tag of higher frequency suitable in the same context. If tag

t1 is substituted with tag t2 , then t2 is called a representative for t1 . Though a list

of feasible representatives can be easily produced, the problem lies in ordering these representatives. Furthermore, not all representatives are equally appropriate in the

given context. For these reasons they have introduced a weight. Representative

list is built using distance between two tags and to measure this distance, L1-norm

and POS tag n-grams are used. The process of distance calculation is explained

below:

Initially, to obtain a fair comparison between tags of different frequency, normal-

ization of the trigram (tL , t, tR ) is calculated as follows:

n(tL, t, tR) = freq(tL, t, tR) / freq(t)    (2.2)

If t′ is the replacement tag for the tag t, and tL and tR are the two context tags surrounding t, then the distance is calculated by measuring the difference between the normalized frequencies as:

dist_tL,tR(t, t′) = |n(tL, t, tR) − n(tL, t′, tR)|    (2.3)

Finally, all POS tag contexts are considered and the generalized equation becomes:


dist(t, t′) = Σ_(tL,tR) dist_tL,tR(t, t′)    (2.4)

Distance dist(t, t′) calculated using this formula ranges from 0 to 2. When the contexts are identical the value is 0, and when the uses of t and t′ are completely disjoint the value is 2. The probability p(t, t′) of replacing t with t′ is calculated depending

on this distance. When the probability is less than 1, a penalty is introduced as

the tag t′ is less appropriate than tag t in this context. By substituting the tag with

its representative tag and maintaining the similar syntactic structure, their algo-

rithm detects less-frequent grammatical constructions and attempts to transform

them into more-frequent constructions. Even after this transformation, if mod-

ified construction is also a low-frequency construction then the text is expected

to contain an error. Robust rule-based phrase and clause detection modules are used to avoid false alarms generated by the system. Their algorithm utilizes the

information of clause boundaries where clauses are used as the unit for error de-

tection algorithm to operate on it. For the detection of clause boundaries they have

implemented Ejerhed’s algorithm for Swedish [Ejerhed, 1999]. This algorithm is

based on context-sensitive rules operating on POS tags.
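A minimal Python sketch of the distance computation in equations 2.2 to 2.4 (our illustration; the counts below are invented):

    from collections import Counter

    # toy trigram and unigram tag counts harvested from a tagged corpus
    trigram_freq = Counter({("DT", "NN", "VB"): 30, ("DT", "NNS", "VB"): 20,
                            ("IN", "NN", "DT"): 10, ("IN", "NNS", "DT"): 15})
    tag_freq = Counter({"NN": 40, "NNS": 35})

    def norm(tl, t, tr):
        # equation 2.2: trigram frequency normalised by the middle tag's frequency
        return trigram_freq[(tl, t, tr)] / tag_freq[t]

    def dist(t, t_prime):
        # equations 2.3 and 2.4: L1 distance summed over all surrounding contexts
        contexts = {(tl, tr) for (tl, _, tr) in trigram_freq}
        return sum(abs(norm(tl, t, tr) - norm(tl, t_prime, tr))
                   for tl, tr in contexts)

    # a small distance means NNS is a plausible representative for NN
    print(dist("NN", "NNS"))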

Chodorow and Leacock [2000] proposed an unsupervised method to detect gram-

matical errors by inferring negative evidence from the edited text corpus. They

have developed a statistical system known as Assessing Lexical Knowledge (ALEK).

ALEK was trained on a general purpose corpus of English edited text containing

examples of sentences of the target word. Depending on differences between

word’s local context cues, the system identifies inappropriate usage. ALEK infers

negative evidence from the contextual cues that do not co-occur with the target

word. The system collects contextual cues in a ±2 word window around the target

word. Function words (closed-class items) and POS tags are the two kinds of

contextual cues used by the system. Initially, sentences have been tagged using

POS tagger and then the frequency of sequences of adjacent POS tags and function

words are counted. For example, in the sentence “a/AT tall/JJ man/NN”, the occur-

rence frequency of the bi-gram sequences AT+JJ, JJ+NN, a+JJ, and unigram count

of individual POS tags and functional words are calculated. These frequencies

are the basis of their error detection measure. To determine the unusual and rare

combination of POS tags and functional words, ALEK computes Mutual Infor-

mation (MI) based measure. MI based measure is used to find combinations that

occur less often than expected. Usually, n-gram probabilities of ungrammatical se-

quences are much smaller than the product of the unigram probabilities. Then the

value of MI becomes negative. Thus a negative value of MI often indicates that a

syntactic rule is violated. The experimental result shows that ALEK performs with

80% Precision and 20% Recall.
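The MI cue that drives this detection can be sketched in a few lines of Python (ours; the probabilities below are invented for illustration):

    import math

    def mutual_information(p_xy, p_x, p_y):
        # negative MI: the pair co-occurs less often than chance would predict,
        # which ALEK treats as evidence of a violated syntactic rule
        return math.log2(p_xy / (p_x * p_y))

    print(mutual_information(p_xy=0.004, p_x=0.05, p_y=0.04))    # positive: common pair
    print(mutual_information(p_xy=0.00001, p_x=0.05, p_y=0.04))  # negative: suspicious pair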
Powers [1997] explored the concept of Differential Grammar and applied bigram frequency of POS tag sequences to discriminate be-

tween ill-formed and well-formed sentences. An empirically established threshold

value is used to decide whether a bi-gram indicates an error. A Differential Grammar

can be defined as a small set of environments that helps to distinguish between a

pair of confused words in all contexts. According to Powers [1997] the definition of

Differential Grammar is “A minimal set of syntactically significant environments that

differentiate amongst a set of possible targets.” A Differential Grammar is not actually

a linguistic grammar; it is basically designed to discriminate a token from a set

of confusable alternatives based on most likely occurrence in a given context. It

does not have a concept of rule like traditional rule oriented grammar but rather

it is very simple, more specialized and lexically-focussed. In order to differenti-

ate between correct target word and one or more incorrect confused words, this

grammar utilizes high-order N-gram statistics. The n-gram contexts are reduced

based on high frequency important tokens like words, numbers, punctuation and

affixes.

Henrich and Reuter [2009] used an n-gram based statistical approach for language-independent grammar checking. For checking a sentence, their extraction of n-grams starts with all pentagrams of tokens for the whole sentence. Then the

process continues with the corresponding quadrigrams of tokens, going on with

the trigrams of tokens and so on. If an n-gram is not found in the database, it is as-

sumed that this n-gram is wrong. An error level is calculated corresponding to the

number of n-grams which are not found in the database. The smallest erroneous

n-gram finally points to the error in the input text. All these errors are summed

up and the result is compared to an overall error threshold. If it is higher than the

threshold, then the sentence is marked as wrong. They used wildcard (*) in the

erroneous n-gram for finding most probable n-gram sequence from the training

database. They also stored temporal adverb-verb and adjective-noun agreement

for statistical analysis of the agreement of a temporal adverb with the tense of the

corresponding verb and the agreement of an adjective to its corresponding noun.
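The descending n-gram lookup can be sketched as follows (a minimal Python illustration of the scheme, assuming a set of n-grams harvested from a reference corpus; the wildcard search and agreement tables are omitted):

    def error_level(tokens, known, max_n=5):
        errors = 0
        # start with the longest n-grams and fall back to shorter ones
        for n in range(max_n, 1, -1):
            for i in range(len(tokens) - n + 1):
                if tuple(tokens[i:i + n]) not in known:
                    errors += 1  # every unseen n-gram raises the error level
        return errors

    known = {("is", "a", "good"), ("a", "good", "boy"),
             ("is", "a"), ("a", "good"), ("good", "boy")}
    print(error_level("is a good boy".split(), known, max_n=3))  # 0: below threshold
    print(error_level("a boy good".split(), known, max_n=3))     # 3: likely erroneous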

Nagata et al. [2004] proposed a simple statistical model based on conditional

probabilities of articles for detecting article errors committed by Japanese learners

in English text. Their model detects article errors based on three head words:

head verb (v), preposition (prep) and head noun (n). Initially, from the input

sentence three head words are extracted. Then all the head words are reduced

to their stem/root form and also converted to lower case. Then a quadruple like

(Ii, v, prep, n) is prepared. Now the probability of a particular article Ii given v, prep and

n is calculated as follows:

P(Ii|v, prep, n) = f(Ii, v, prep, n) / Σ_(m=1..k) f(Im, v, prep, n)    (2.5)

where symbol f denotes the frequency of occurrence of a particular tuple and k is

the total number of articles. To estimate whether a particular article class is relatively improbable in a given tuple, they formulated the following equation:

S(Ii, v, prep, n) = P(Ii|v, prep, n) / max_(1≤m≤k) P(Im|v, prep, n)    (2.6)

When S(Ii, v, prep, n) is less than some predefined threshold θ (0 < θ < 1), then

an article error is detected. To avoid the sparseness problem during probability

estimation they used backed-off smoothing [Jurafsky and Martin, 2009] technique.

Their system achieved 77% Precision, 64% Recall and F-measure of 0.70 when they

set the threshold θ to 0.334.
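Equations 2.5 and 2.6 translate directly into the following minimal Python sketch (ours; the tuple counts are invented and "null" stands for the zero article):

    # toy counts f(I, v, prep, n) for the head-word tuple (go, to, station)
    freq = {("the", "go", "to", "station"): 50,
            ("a", "go", "to", "station"): 3,
            ("null", "go", "to", "station"): 12}
    ARTICLES = ("the", "a", "null")

    def p(article, v, prep, n):
        # equation 2.5: relative frequency of the article in this tuple
        total = sum(freq.get((m, v, prep, n), 0) for m in ARTICLES)
        return freq.get((article, v, prep, n), 0) / total

    def s(article, v, prep, n):
        # equation 2.6: ratio against the most probable article
        return p(article, v, prep, n) / max(p(m, v, prep, n) for m in ARTICLES)

    THETA = 0.334
    if s("a", "go", "to", "station") < THETA:
        print("article error detected")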

Classification

In the classification-based approach, individual sentences are classified as being either

correct or incorrect using features extracted from training data. Classification ap-

proaches of different researchers differ in their use of features and classifiers

like Naïve Bayes [Mitchell, 1997], Balanced Winnow [Littlestone, 1988], Support

Vector Machine [Campbell and Ying, 2011], Voted Perceptron [Freund and Schapire, 1999], Maximum Entropy [Berger et al., 1996], Decision Tree [Mitchell, 1997; Quin-

lan, 1986] etc.

Scheler and Munchen [1996] used a feature model of the semantics of plural de-

terminers to detect and correct grammatical errors of definiteness. They used

an Artificial Neural Network to learn a function that maps the semantic feature

representation to category of indefinite/definite article. They have also provided

surface-oriented textual encoding of their text corpus to reduce the informational

content of the text without losing its essential components.

Knight and Chander [1994] used decision tree classifier over lexical features

for detection and correction of article errors in the Japanese to English machine

translation outputs. Figure 2.8 shows basic architecture of their post editing task.

A set of binary features was developed by them to characterize noun phrases.

Figure 2.8: Basic architecture of post editing after machine translation.

These binary features are either lexical or abstract which includes POS tags, plural

markers, tense and subcategories like superlative adjectives, mass nouns etc. To

build the decision tree, each feature maintains three types of measures. These

three types of measures are frequency of occurrence, distribution of a/an for noun

phrases in which the features are present and distribution for those without the

features. To choose the best feature an information-theoretic approach has been

taken. The decision tree is built depending on the datasets and the feature-based

split. Their post editing algorithm achieved an overall accuracy rate of 78% on

financial text.
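In the same spirit, a decision-tree classifier over binary noun-phrase features can be sketched with scikit-learn (a toy illustration, not Knight and Chander's feature set or data):

    from sklearn.tree import DecisionTreeClassifier

    # toy binary NP features: [has_plural_marker, has_superlative_adj, is_mass_noun]
    X = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 1]]
    y = ["the", "null", "null", "the", "null"]  # article observed in edited text

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[0, 1, 0]]))  # -> ['the'] for a singular superlative NP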

Gamon et al. [2008] used a decision tree classifier along with a language model

for determining article and prepositional errors in a sentence. They used a language

model which was trained on the Gigaword corpus. The language model was used

to provide additional information to filter out invalid suggestions. Their system

has three main components: Suggestion Provider (SP), Language Model (LM)

and Example Provider (EP). Initially, an input sentence is tokenized and POS

tagged. Then these tokens are sent to the SP module which employs decision

tree classifier for providing suggestions. All suggestions from the SP module are

collected and sent to the LM. Here the suggestions are ranked based on probability

score assigned by the LM. Finally, the EP returns example sentences containing

suggested correction by using query in the web. This information is provided to

the user to choose the suggestion and to make an informed decision about the

correction. They achieved 55% accuracy for article error detection tested on 6K

CLEC and 46% accuracy for prepositional error detection tested on 8K CLEC test

corpus.

Table 2.5: Effectiveness of Individual Features

Feature %Correct
Word/POS of all words in NP 80.41
Word/POS of w(NP-1) + Head/POS 77.98
Head/POS 77.30
POS of all words in NP 73.96
Word/POS of w(NP+1) 72.97
Word/POS of w(NP[1]) 72.53
POS of w(NP[1]) 72.52
Word/POS of w(NP-1) 72.30
POS of Head 71.98
Head’s Countability 71.85

Izumi et al. [2003] used Maximum Entropy classifier for handling insertion,

omission and replacement errors. They have tested their model on 1915 sentences

collected from Standard Speaking Test (SST) corpus and have achieved approxi-

mately 50% Recall and 60% Precision using this approach. Han et al. [2004, 2006]

have trained a Maximum Entropy classifier to select among a/an, the, or zero ar-

ticle for noun phrases, based on a set of features extracted from the local context

of each. The system was trained on 6 million noun phrases from the MetaMetric

Lexical corpus. On an average, there were about 390,000 features in their Maxi-

mum Entropy model. The system was tested on 668 TOEFL essays and achieved

90% Precision and 40% Recall. Table 2.5 shows effectiveness of individual features

used by them. De Felice and Pulman [2007] proposed a machine-learning based

approach to detect prepositional errors depending on a syntactic and semantic

context. They used a richer set of syntactic and semantic features. Their approach

suggests a preposition which is most likely to occur in that context. The context of

the prepositions which are found in an English corpus is represented by a vector

containing 307 features. They assumed that a set of 307 features may capture all the

latent elements of a sentence that may help to recognize a preposition accurately.

They selected these features based on a study of most frequent errors generated by

English learners. Head Noun, Number, Noun Type, WordNet information, Named

Entity information and ± 2 POS tag window are some examples of features they

have used. They also used additional features like whether the noun is modified

by a predeterminer, possessive, numeral and/or a relative clause or whether it is

part of a ‘there is · · · ’ phrase. To learn associations between contexts and prepo-

sitions these vectors are processed by a voted perceptron algorithm. Artificially

created test set containing preposition errors were used to test their system. They

found that their system can successfully detect between 76% and 81% of errors. Later, De Felice and Pulman [2009] used a Maximum Entropy classifier for the correction of prepositional errors in the English writing of second language learners. The feature

set used by them contains a wider range of syntactic and semantic elements, includ-

ing a full syntactic analysis of the data. Their system achieved average Precision

of 42% and Recall of 35%. Fujishima and Ishizaki [2011] proposed a method to

identify inappropriate word combinations in a raw English corpus using an unsu-

pervised algorithm based on One-Class Support Vector Machines (OC-SVMs).

Combined with n-gram language models and document categorization technique,

their OC-SVM classifier classifies a sentence into ill-formed or well-formed class.

Oyama and Matsumoto [2008] also proposed a similar approach. They combined

n-gram features and supervised document categorization technique based on the

hard margin SVM to find the learners’ error in Japanese text. But Fujishima and

Ishizaki [2011] followed an unsupervised technique to reduce the required com-

putational cost. To compare their approach, they built a supervised SVM classifier

following the approach taken by Oyama and Matsumoto [2008] and reported that

using their unsupervised algorithm they achieved almost the same prediction ac-

curacy as the supervised learning algorithm. They tested their system on 3155

selected erroneous sentences and achieved accuracy 79.30% with bigram model,

86.63% with the trigram model and 34.34% with the quadrigram model. They found that the classification accuracy of the quadrigram model is lower than that of the trigram model.

Web Counting

Empirical NLP systems rely on a large sized corpus of text in order to resolve

ambiguity. The corpus helps to determine which candidate is more frequent than

others in similar contexts. The accuracy of the disambiguation process improves with

the increase in the size of the corpus [Banko and Brill, 2001]. As the World Wide

Web (WWW) is the largest corpus till now, many researchers incorporate web

frequency counts to identify and correct writing errors made by non-native writers

of English. To correct collocation and determiner error Yi et al. [2008] proposed a

web-based proof reading methodology. Initially, an input sentence is preprocessed

using POS tagger and chunker to identify the check points. These check points

depict the context around the determiner and collocation. To find the appropriate

examples from the web, queries are generated in three granularity levels (viz.

reduced sentence level, chunk level and word level) according to the syntax of a

sentence. Generally, number of queries depends on the number of target solution

set. To find the appropriate examples of determiner from the web, a query may

look like { Wi−2 Wi−1 null Wi+1 Wi+2 } or { Wi−2 Wi−1 a Wi+1 Wi+2 } or { Wi−2 Wi−1 an

Wi+1 Wi+2 } or { Wi−2 Wi−1 the Wi+1 Wi+2 }. Since long queries have fewer web counts

than short queries, each count is multiplied with the number of words in the

query. If the weighted count is very low then the web is unable to provide enough

support to determine the existence of an error. Otherwise, ratio between weighted

count for query containing writer’s determiner and maximum weighted count for

query containing other determiner is calculated and compared to a predefined

threshold. If the ratio is smaller than the threshold then an error is flagged.

Evaluation of the system on a real-world ESL corpus reported 62% Precision and 41% Recall.
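The weighted-count comparison can be sketched as follows (a minimal Python illustration; hit_count is a placeholder for a search-engine count API, and the threshold and support values are invented):

    def hit_count(query):
        # placeholder: wrap a real search-engine count API here
        raise NotImplementedError

    def weighted_count(words):
        # long queries get fewer hits, so each count is scaled by query length
        return hit_count(" ".join(words)) * len(words)

    def check_determiner(left, writer_det, right,
                         candidates=("a", "an", "the", ""),
                         threshold=0.1, min_support=100):
        counts = {d: weighted_count(left + ([d] if d else []) + right)
                  for d in candidates}
        if max(counts.values()) < min_support:
            return None  # too little web evidence to decide
        ratio = counts[writer_det] / max(counts.values())
        return ratio < threshold  # True means an error is flagged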

Hermet et al. [2008] described a web-based frequency count algorithm to detect and correct prepositional errors in French. They use a two-

phase hybrid approach combining rule-based and statistical approaches. In the

first phase, a short expression is generated using a rule-based method in order to

capture the context around the preposition in the input sentence. In the second

phase, web searching technique is used to evaluate the frequency of this expression

by considering alternative prepositions instead of the original one. They tested

this algorithm on a corpus of 133 French sentences written by intermediate second

language learners and they achieved 69.9% accuracy. They have also reported that

when a corpus of frequent n-grams is used instead of the web, the performance of

their system degrades.

2.3.3 Statistical Machine Translation Approach

Some researchers have used monolingual Statistical Machine Translation (SMT)

paradigm to detect and correct grammatical errors in text. They have trained

parallel corpora of grammatical and ungrammatical sentences and translate from

ungrammatical to grammatical sentences using their SMT system. Brockett et al.

[2006] showed that a noisy channel model (instantiated within the paradigm of

SMT) can successfully provide editorial assistance for non-native writers. SMT

technique provides a natural mechanism for suggesting a correction, rather than

simply indicating an error flag. Their system is able to correct 61.81% of mistakes in

a set of naturally occurring examples of mass noun errors found on the World Wide

Web. Liu et al. [2008] proposed a noisy channel model along with a novel relative

position language model for correcting word order errors in sentences produced

by second language learners of Chinese. To detect word order errors, they used

SVM classifier whereas for correcting those detected errors they followed a noise

channel model. For a given erroneous sentence E having word order errors, their

model tries to find out the most probable corrected sentence using equation 2.7.

Ĉ = argmax_C P(C|E) = argmax_C P(E|C)·P(C)    (2.7)

Here, C represents a corrected sentence, P(C|E) is the reordering model and P(C) is

the probability of corrected sentence. The probability of C is estimated using a lan-

guage model derived from a large corpus of correct sentences. A weighted relative

position score is used as a language model P(C) to circumvent the limitation of cap-

turing long distance lexical relationships by a usual n-gram language model. The

reordering model estimates the transformation probability of a reordered sentence

for a given input sentence. For this model, they used probability of C generated by

PCFG (Probabilistic Context Free Grammar) as a structural transformation proba-

bility. Experimental result shows that the overall accuracy of their error detection

module is 96.7%. They used BLEU([Papineni et al., 2002]) score as the evaluation

metric to evaluate the performance of their word order error correction module.

Their result shows that their error correction methodology outperforms the usual

n-gram based approach. They also found that the proposed system’s performance,

in terms of BLEU score, can be improved by 20.3% and 26.5% when compared to n-gram and SMT-based baseline systems, respectively. Hermet and Désilets [2009]

proposed a “round-trip” bilingual SMT technique to correct preposition errors in

French learner writing. A writer’s L2 language is translated to the writer’s L1

language and then back to L2. When the round-trip MT system encounters an ill-

formed chunk in L2 language, it makes a word-by-word translation of that chunk.

Afterwards, when the system tries to translate L1 to L2, it produces a better L2

translation of that chunk than original L2 sentence. Thus using round-trip trans-

lation, errors present in the L2 sentence have been repaired. They tested their

methodology on 133 French sentences containing prepositional error and reported

66.4% accuracy using their round-trip SMT method. The performance of their

Round Trip SMT method was slightly worse than their web-count [Hermet et al.,

2008] method. Later, they proposed a hybrid method combining the round-trip

SMT and web-count. In this hybrid model, round-trip SMT works as a back-up

when their generated query using web-count method got almost zero hit. Using

this hybrid model they achieved 82.1% accuracy.

2.4 Comparison between existing approaches

A rule-based system is based on core linguistic knowledge. It depends on hand-

crafted rules generated by language experts. In this approach, it is easy to incorpo-

rate domain knowledge into linguistic knowledge which provides highly accurate

results. Other than the above mentioned advantages, a rule-based system is easy

to understand. Thus, the user can easily extend the rules for handling new error

types. Rules can be built incrementally by starting with just one rule and then

extending it. Each rule of a rule-based system can be easily configured. A rule-

based system provides detailed analysis of the learner’s writing using linguistic

knowledge and provides reasonable feedback. Such feedback helps learners to im-

prove their writing skill. Furthermore, the linguistic knowledge acquired for one

natural language processing system may be reused to build knowledge required

for a similar task in another system. Both grammatical and ungrammatical sen-

tences can be parsed using constraint relaxation. The errors in an ungrammatical

sentence can be easily identified based on the constraints which are relaxed during

parsing of the sentence. The main advantage of using mal-rules is the simplicity

with which they can generate feedback. High precision can be achieved by ap-

plying properly created constraints and mal-rules. The main disadvantage of the

rule-based approach is that the complexity of the grammar increases exponentially

as we try to solve different types of errors. Rule-based approaches need a lot of

manual effort. This increases cognitive load on the human analyst and also in-

creases the degree of ambiguity in the grammar. Moreover, constraint relaxation

technique is not well suited for parsing sentences with missing or extra words.

Constraints and mal-rules have to be pre-defined. Improperly designed mal-rules

due to casual observations of domain experts can also pose a problem. Shallow

parsing is preferable to parsing with Precision grammar when there is a dearth

of sufficient linguistic rules. It is difficult to detect the potential erroneous words

within an input sentence without using an explicit error model. Failure of parsing

does not always reliably ensure that the input sentence is ungrammatical because

the insufficient coverage of grammatical rules may also be a cause of unsuccessful

parsing. The effort required for grammatical error detection and correction varies

depending on the involvement of the error types and the grammatical context in

which the errors occur. However, one of the main disadvantages of rule-based

approach is that it requires complete grammar rules to cover all types of sentence

constructions. Though a variety of grammar formalisms is available, robust parsers with sufficient linguistic rules are still not available. Moreover, existing

rule-based parsers suffer from the curse of natural language ambiguities which un-

necessarily produce more than one parse tree even for the correct input sentence.

These are the limitations of parsing strategy.

On the other hand, Machine Learning (ML) based approaches usually rely on

large sized training data and parallel texts. When the training set and the test

set are similar, then ML approach provides good results. Data sparseness poses a

problem for ML. Due to data sparseness, many grammatical constructs may never

have been encountered. As, most of the time, an ML based system does not provide necessary comments on errors, users are usually surprised when the system predicts a correct sentence as wrong. Results of ML based systems are difficult to interpret.

Sometimes debugging the reasons of system’s failure becomes very complicated

because the results are generated by aggregating probabilities and frequencies.

Another problem is that some ML based systems rely on threshold values which

are usually estimated heuristically. Threshold may vary depending on the domain

of text where the system is trained or tested. If an erroneous word interacts with

other erroneous words then the correction of either error cannot be done indepen-

dently. Moreover, if other errors lie within the context window of an erroneous

word, then the extracted features depending on that context window may also

contain some of these errors leading to unreliable classification. Unfortunately,

corpora that are used are not large enough to cover the full range of lexical patterns

of a given language. That implies some lexical occurrences are left unexamined.

One solution is to use the World Wide Web as a linguistic corpus. An advantage

of web based grammar correction is that it is dynamic in nature. The web search

hits change with the change of language and also reflect the current state of the

language. Moreover, most of the contents of web are freely accessible. Inspite of

several advantages, it has lots of limitations. Kilgarriff [2007] correctly pointed

out several limitations of the Web Count approach for grammar correction. Firstly,

commercial search engines do not provide the root/stem or POS tag of the given

input sentence. Secondly, there are constraints on numbers of queries and numbers

of hits per query. Thirdly, search hits are for pages, not for instances. Last but not

the least, web count results vary for different search engines.

In contrast to Rule-based approach, SMT approach does not rely on handcrafted

complex linguistic rules or regular expressions. Therefore, little or no linguistic

expertise is required for developing and maintaining applications. SMT approach

heavily relies on the availability of large amount of parallel training sentences. The

expense and difficulty of collecting large quantities of raw and annotated learners’

parallel corpora pose an obstacle to this approach.

2.5 Open Problems and Future Directions

Despite existing approaches, reliable grammatical error detection and correction is

still a very difficult task. We cannot simply apply the existing approaches for our

Bangla grammatical error detection and correction task. As discussed in Chapter

1, Bangla is a morphologically rich language [Bhattacharya et al., 2005; Dandapat

et al., 2004] and has free word order. State-of-the-art CFG is not applicable [Shieber,

1985; Begum et al., 2008; Bharati et al., 2010] here. In addition to this, lack of robust

parsers, insufficient linguistic rules and dearth of error annotated parallel corpora

make this grammar correction task much more challenging.

We prefer Natural Language Generation (NLG) [Dale et al., 1990; Reiter and

Dale, 2000; Hovy, 1991; Dale et al., 1998] approach instead of Natural Language

Understanding (NLU) [Allen, 1987]. The main reason behind preferring this ap-

proach is that we need not model the ungrammatical sentences as has been done

in classification based or statistical machine translation based approach. Broad

coverage linguistic rules are also not required, unlike in a rule-based system. This sys-

tem is suitable where robust parsers and linguistic rules are not available. The

NLG based approach maps non-linguistic representations to natural language ex-

pressions. Any system based on this approach identifies the main keywords in a

sentence and then reconstructs the sentence from these keywords. This technique

is suitable for erroneous sentences where major corrections are required. The as-

sumption behind this approach is that the user can supply the important key words

of the sentence, even if the user is unable to write a grammatically correct sentence.

It consists of two main steps. Initially, without considering grammatical errors and

other noise, the NLG based system extracts a meaning representation from the input sentence; then, from the meaning representation, it generates a grammatical

sentence. Baptist and Seneff [2000] followed NLG approach for their conversa-

tional system named GENESIS. We have applied NLG approach for our Bangla

grammatical error detection and correction which will be discussed in Chapter 4.

CHAPTER 3

AUTOMATIC CREATION OF BANGLA ERROR

CORPUS

“A collection of texts assumed to be representative of a given language, or other subset of

a language, to be used for linguistic analysis.” – Francis [1982]

A sufficiently large error corpus is essential for training and testing of any

grammar correction methodology. There is a dearth of error-annotated learner

corpus of Bangla text. One of the major problems of building error corpus from

learners’ data is that the process is very time consuming. It also requires linguistic

knowledge to examine each sentence of learners’ text to determine nature and

frequency of errors. To overcome this problem, a corpus of ungrammatical Bangla

sentences has been created automatically considering performance errors and lan-

guage learning errors that occur frequently. This chapter is more closely aligned

to the task of automatic error corpora creation. Before starting our discussion on

automated error corpus creation methodology, we illustrate types of text errors

committed by Bangla Second Language Learners at the time of writing text.

3.1 Errors in Text

Bangla Second Language Learners often commit grammatical mistakes while writ-

ing text because of their lack of language knowledge (Language Learning Error)
and due to oversight, carelessness or tiredness (performance error). Performance

errors can occur mainly due to four operations: insertion, deletion, transposition

and substitution. When an error involves more than one operation, it is known as

Composite Error. There are two primary concerns at the time of automatic error

corpus creation: the first is to be linguistically realistic and the second is to mimic the error scenarios that occur naturally. To analyse the kind of naturally

produced error scenario we have collected 1500 sentences from 10 standard native

students’ exam papers of Bangla and also have collected second language learn-

ers’ data from students whose first language is either Hindi or Oriya or Telegu.

Performance errors and language learning errors that occurred in their text are then

carefully analysed. Exam papers are collected with the assumption that students

make more mistakes at the time of examination as they are usually in a hurry to

complete their answers within the limited time period. In the course of studying

Second Language Learners' text, it has been found that the proportion of errors caused by the substitution operation is much higher than that of any other operation. We have

seen that substitution errors and deletion errors committed by second language

learners are 14% and 18% higher than native speakers. However, an interesting

observation was that insertion errors committed by native speakers are much higher

(21%) than second language learners. The proportion of transposition errors com-

mitted by second language learners and native speakers is much less than for any other operation. Transposition errors committed by native speakers are slightly higher (4%) than those by second language learners.

lected from native speakers' and second language learners' real data, we found that native speakers committed 13.5% errors whereas the percentage of errors committed by second language learners is 34%.

Figure 3.1 shows the proportion of performance errors caused by each of the

four operations.

Figure 3.1: Proportion of Errors in Native Speakers and Second Language Learners Corpus.

The Native Speakers and the Second Language Learners make the same kinds of mistakes, such as misuse of punctuation and cohorts/homophones.

But study [Leacock et al., 2010] shows that Second Language Learners make many

more mistakes than native speakers. Most frequent error types produced by native

speakers may not be produced by second language learners. For example, errors

generated while writing complex sentences are infrequent for language learners,

as most of the time language learners avoid writing complex sentences. They

write complex sentences only when they have enough confidence in their ability

to construct them correctly. Second Language Learners can be of two types viz.

L1 and L2. The kinds of errors produced by L1 Language Learners are influenced

by their native language. When native languages are similar but not identical,

L1 produces errors due to negative transfers. They fail to find exact equivalence

between these two languages. On the other hand, L2 Language Learners produce

errors because of their incomplete knowledge of syntactic and/or morphological

irregularities. They face trouble due to the novelty of the new language [Leacock

et al., 2010]. After analyzing the collected Bangla second language learners’ data

we came to know that the above statements (quoted in [Leacock et al., 2010]) are

also true for Bangla language. Therefore, learners who learn Bangla language

Table 3.1: Examples of errors committed by a Bangla Second Language Learner

Example 1 (Operation: Substitution)
Bangla: aami baajaare eka iimaanadaara puruSha dekhalaama
English: I saw an honest man in the market
Comment: The user did not find a suitable Bangla word for "honest".

Example 2 (Operation: Substitution)
Bangla: aami ka.Daka chaa khaaba
English: I will drink strong tea
Comment: The user did not find a suitable Bangla word for "strong".

Example 3 (Operation: Substitution)
Bangla: aapani ki sochachhena ?
English: What are you thinking ?
Comment: The Bangla root verb is replaced by a Hindi root verb.

Example 4 (Operations: Substitution, Transposition and Deletion)
Bangla: duudhabaalaa jala sa.nge duudha milaaYa
English: Milkman mixes water with milk
Comment: The nominal inflection "er" of "jaler" is deleted; "milaaYa" is substituted in place of "mishaYa"; "jalera sa.nge duudha" is transposed in place of "duudher sa.nge jala".

having the background of Oriya, Assamese or Hindi as native language produce

different kinds of errors than learners having native languages like Malayalam,

Tamil, Telegu or English. We have classified the types of errors according to

the operations involved in performance error and also depending on language

learning errors. Table 3.1 shows examples of errors committed by a Bangla Second

Language Learner having mother tongue Hindi. Figure 3.2 shows taxonomy of

errors found in Bangla text of second language learners. We shall now elaborate below the different kinds of errors made by second language learners.

Figure 3.2: Taxonomy of errors found in Bangla text of second language learners. The taxonomy distinguishes Operational Errors (Transposition; Addition: Repeated Word, Unnecessary Word; Deletion: Implicit Subject, Implicit Object; Substitution: Cohort Replacement) from Grammatical Errors (Tense Error, Person Error, Case Error, Adjectival Suffix Error, Pronoun Error, Sentence Fragment Error, Subject-Verb Agreement Error and Count Error).

1. Transposition Operation:
• Incorrect Sentence:

Bangla: theke gaachha phala pa.De


English: from tree fruit falls.
Here the postposition theke (from) is placed before the noun gaachha (tree).
• Correct Sentence:
Bangla: gaachha theke phala pa.De
English: Fruit falls from tree.

2. Addition Operations:
(a) Repeated words:
Bangla: aami ekati *bhaala bhaala Chele
English: I am a *good good boy.
(b) Unnecessary words:
Bangla: paramaaNu anu apekShaa *adhika kShudratara
English: atom is *more smaller than molecule.

3. Deletion Operations:
(a) Implicit Subject:
Bangla: *[ ] tomaara maÑgala karuna (Subject:iishbara is missing here)
English: May *[ ] bless you. (Subject: God is missing here)
(b) Implicit Verb:
Bangla: tumi ki maadhyamika pariikShaa *[ ] ? (Verb: debe is missing here)
English: Will you *[ ] matriculation exam? (Verb: give is missing here)

4. Substitution Operations:
(a) Similar word or Cohort replacement:
(* indicates an error word in the sentence)

• Incorrect Sentence:
Bangla: *bale baagha thaake
English: *tell tiger lives
• Correct Sentence:
Bangla: bane baagha thaake
English: Tiger lives in forest
Here bale and bane are cohorts of each other, but bale is a verb and bane is a noun. In the literature this type of error is also known as a real-word spelling

Types of Grammatical Errors

1. Tense Error:
• Example 1:
Bangla: aami prashnapatra pa.Daba o uttara diYechhilaama
English: I will read the question paper and I gave the answer.

• Example 2:
Bangla: gatakaala aami sinemaa Jaaba
English: Yesterday I will go to Cinema.

• Example 3:
Bangla: Jakhana aami darajaa khulachhilaama takhana se ghare Dhuke pa.Dechhila
English: When I was opening the door then he entered the room.

2. Person Error:

• Example:
Bangla: chhaatraraa nishchaYa bidyaalaYa Jaabe Jadi *se pariikShaa dite
chaaYa
English: students will definitely go to school if *he wants to appear in the exam.
The plural sense of 'students' has been lost by the singular representation 'he'.

3. Case Error:

• Example:
Bangla: eTaa *kaakaaraa bai
English: This is uncle’s book
The genitive case suffix ra of the noun kaakaa (uncle) has been wrongly changed to raa.

4. Adjectival Suffix Error:

• Example:
Bangla: *daYaamaYii shikShaka aasachhena
English: The kind-hearted teacher is coming
The feminine suffix maYii of the word daYaa (kindness) has been used in place of the masculine suffix maYa, which goes with shikShaka (male teacher).

5. Improper use of punctuation:

• Example 1:
Bangla: tomaara naama ki |
English: What is your name .
Here the punctuation | is used instead of the ‘?’ symbol.

• Example 2:
Bangla: aami*, dekhalaama se aasachhe |
English: I, saw he is coming.

6. Sentence Fragment:

• Example:
Bangla: aami gaana gaa_iba *| jadi tumi naacha |
English: I will sing. if you dance.

7. Invalid Subject-Verb agreement:

• Example:
Bangla: aami bhaata *khaabena
English: I eat rice
Here the subject aami (I) is the first person non honorific but the person
information of the verb khaabena (eat) is third person honorific.

8. Count Error:

• Example:
Bangla: aamaara tinajana bandhu aachhe : jaYanta, raajiiba, debaaruna o
saurabha |
English: I have three friends: Joyanta, Rajib, Debarun and Saurabh.

3.1.1 Previous Work

Stemberger [1982] investigates the performance errors of native speakers' spoken language and reports the proportion of the four types of errors as follows: substitution

(48%) > insertion (24%) > deletion (17%) > combination (11%). Foster [2005] has

manually created an error corpus for English and has classified missing word

errors based on the Part of Speech tag of this missing word. According to her “98%

of the missing POS come from the following list (the frequency distribution in the

error corpus is given in brackets): determiner (28%) > verb (23%) > preposition

(21%) > pronoun (10%) > noun (7%) > to (7%) > conjunct (2%)”. But manual creation of such a corpus is a very time-consuming and non-trivial task. Brockett

et al. [2006] created an artificial error corpus by introducing mass/count noun

errors. They treated the error correction task in the machine translation point of

view. Their aim was to apply Statistical Machine Translation (SMT) technique

for converting ungrammatical sentences containing mass/count noun errors to

grammatical sentences. Wagner et al. [2007] have suggested a novel approach

of automated error corpus creation. They have carried out a detailed analysis of

Missing Word Errors, Extra Word Errors, Agreement Errors and Covert Errors. Lee

and Seneff [2008] created artificial error corpora by introducing verb form errors.

To mimic the real life errors, Foster and Andersen [2009] designed the GenERRate

tool. Their algorithm generates error corpus by introducing error along the line of

the previously specified real life error templates.

3.2 Experimental Dataset

Various online resources are available nowadays from which Bangla Unicode

sentences can be collected. These include -

1. Bangla online news papers like “Ananda Bazar Patrika” (http://www.anandabazar.


com/)
2. Online version of Bangla literatures written by Rabindranath Tagore, Sarat
Chandra Chattapadhyay and Bankim Chandra Chattapadhyay (http://

www.nltr.org/) published by Society for Natural Language Technology Re-
search (SNLTR).

3. Bangla blogs (http://www.amarblog.com/) etc.

Special care needs to be taken at the time of selecting well-formed sentences

due to different reasons. In Bangla literature, diglossic variations are found in

the form of “Sadhu” and “Chalit”. Sentences written in “Sadhu” are mostly found

in the writings of Bankim Chandra Chattapadhyay and in the writings of Rabindranath Tagore

and Sarat Chandra Chattapadhyay. Sentences written in “Sadhu” are not used in

day-to-day communication. On the other hand, most recent works follow “Chalit”

form as it is used in daily communications. Due to informal communication,

Bangla blogs contain sentences in “Benglish” (a mixed language of Bangla and English) [Kundu and Chandra, 2012]. Sentences written in “Sadhu” and

“Benglish” are not important in our case, as our focus is to detect and correct grammatical errors in sentences used in day-to-day written communication. Therefore, at

the time of sentence selection from online websites (like http://www.nltr.org/,

http://www.amarblog.com/ etc.), we have manually filtered out the sentences

written in “Sadhu” form and “Benglish” language. In addition to that, we have

collected sentences from a detective novel, namely “Feluda Samagra” written by

Satyajit Ray. The reason behind selecting the novel is that sentences are written

in “Chalit” and most of the sentences are simple and representative of those that

are used in day-to-day communication. We have also collected sentences from

“Jekhane Dactar Nei” a Bengali book translated from English work “Where There

is no Doctor”. Thus, we have collected Unicode sentences from various domains

including Literature, Sports, Health, Politics and Business (2005-2012).

We assumed that the syntax and semantics of the collected sentences are correct

as they are mostly collected from different newswires which are normally edited

and proof-read. Corpora from multiple domains have been collected to avoid the

skewed distribution of data. From this set of collected Bangla sentences (approx

4,68,582), the sentence length distribution has been measured. It is found that sen-

tences containing 9 words are the most frequent in this corpus. Figure 3.3 shows

the Bangla Sentence length distribution.

Figure 3.3: Bangla Sentence Length Distribution.

3.3 Methodology

Now we will discuss our novel approach for error corpus generation. The proce-

dure is as follows:

Step-1 If a grammatical sentence contains n words, then transposition between two consecutive words can generate (n-1) sentences, with the assumption that only one transposition is done in each sentence. Table 3.2 shows 3 sentences generated from a sentence containing 4 words. Though the last two examples

Table 3.2: Examples of Transposition Operation.

Operation Example
Source gaachha theke phala pa.De2
Transposition-1 theke gaachha phala pa.De
Transposition-2 gaachha phala theke pa.De
Transposition-3 gaachha theke pa.De phala

in the table are grammatically correct, transposition-2 is semantically

weird and transposition-3 is relatively uncommon.

Step-2 Transposition of highly collocated sequences surely induces noise in a

grammatical sentence. Erroneous sentences have been automatically gener-

ated by changing the word order of different types of Bangla collocated word sequences collected from the corpus. We distinguish between the following

three categories: echo words (if w1 w2 is a word sequence, w2 has no meaning of its own), hyphenated words (w1 and w2 are connected by a hyphen) and highly collocated words. Extraction of echo words and hyphenated words is simple. One can use a simple regular expression [a-zA-Z]+-[a-zA-Z]+ for collecting hyphenated words from the corpus and [\s]([a-z]([a-z]+)\s+[a-z]\2)[\s]3

for collecting echo words. For collecting collocated and co-occurring word sequences from the corpus, a statistical approach [Manning and Schütze, 1999] has been used. The variance (σ2) of the number of words separating word w2 from word w1 has been estimated, and low-variance word sequences have been filtered using a statistical significance test (t-test) at the 99.5% confidence level.

The null hypothesis H0 is that the word sequences (w1 w2 ) appear indepen-

dently in the corpus. These filtered word sequences are cross verified with
2 Bangla Sentence: gaachha theke phala pa.De; English Word Meaning: Tree from fruit fall; English Translation: Fruit falls from tree.
3 Python regex notation has been used here. The pattern will match word pairs where the first characters of the two words differ and the remaining character sequences are identical. For example, in the word pair “nardamaa Tardamaa”, only the first character differs and the rest of the character sequence is the same for both words.

Mutual Information (MI) values between w1 and w2. The word sequences having higher Mutual Information, lower variance and a t-value

greater than 2.57 (considering α=0.005) have been considered as collocated

words. MI between words w1 and w2 has been estimated as follows:

\[ MI(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\,p(w_2)} \tag{3.1} \]

where \( p(w_1, w_2) = \frac{Count(w_1, w_2)}{N} \), \( Count(w_1, w_2) \) is the number of sentences in which w1 and w2 co-occur, and N is the number of sentences in the training corpus. The probabilities in the denominator of equation 3.1 are calculated accordingly.
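To make this step concrete, the following is a minimal Python sketch (the thesis already uses Python regex notation) of the extraction and filtering described above. The function names and default thresholds are illustrative, and adjacent-bigram counts stand in for the sentence-level co-occurrence counts actually used here.

import math
import re
from collections import Counter

def hyphenated_and_echo(text):
    """Collect hyphenated and echo word pairs with the regexes of Step-2."""
    hyphenated = re.findall(r'[a-zA-Z]+-[a-zA-Z]+', text)
    echo = re.findall(r'\s([a-z]([a-z]+)\s+[a-z]\2)\s', text)
    return hyphenated, [pair for pair, _tail in echo]

def collocations(sentences, t_threshold=2.57, mi_threshold=0.0):
    """t-test plus Mutual Information filter for candidate collocations."""
    unigrams, bigrams, n = Counter(), Counter(), 0
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        n += len(tokens)
    kept = []
    for (w1, w2), c12 in bigrams.items():
        x_bar = c12 / n                                # observed bigram probability
        mu = (unigrams[w1] / n) * (unigrams[w2] / n)   # expected under H0 (independence)
        t = (x_bar - mu) / math.sqrt(x_bar / n)        # s^2 approximated by x_bar
        mi = math.log2(x_bar / mu)
        if t > t_threshold and mi > mi_threshold:      # t > 2.57 for alpha = 0.005
            kept.append((w1, w2, round(t, 2), round(mi, 4)))
    return kept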

Step-3 Another way of generating erroneous sentences is by replacing a word with

its cohorts and homophones. Cohorts are generated using regular expression

by adding, deleting or substituting a single character or moving character

sequences in a word. These generated words are then verified with spelling

dictionary to ensure that the generated words are correctly spelled. In this

process, if we assume that, on average, k words/cohorts can be generated from a single word, then k x n sentences can be generated from a

sentence containing n words. The Levenshtein [1966] distance can also be used to prune the over-generated cohort words. Words having minimum edit distance with

the original word are selected for the cohort list.
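A minimal sketch of this cohort generation is given below, assuming a set-based spelling dictionary; the dictionary contents and the transliterated alphabet are hypothetical placeholders.

def cohorts(word, dictionary, alphabet):
    """Generate spelling-valid cohorts of `word` that are one edit away."""
    candidates = set()
    for i in range(len(word) + 1):
        head, tail = word[:i], word[i:]
        if tail:
            candidates.add(head + tail[1:])                      # deletion
        for ch in alphabet:
            candidates.add(head + ch + tail)                     # insertion
            if tail:
                candidates.add(head + ch + tail[1:])             # substitution
        if len(tail) > 1:
            candidates.add(head + tail[1] + tail[0] + tail[2:])  # move characters
    candidates.discard(word)
    return candidates & dictionary   # keep only correctly spelled cohorts

# Example with a hypothetical dictionary; yields {'maachha', 'gaachhe'}:
print(cohorts("gaachha", {"maachha", "gaachhe", "phala"},
              "abcdefghijklmnopqrstuvwxyz"))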

Step-4 By deleting a particular word from a sentence containing n words, we can generate n sentences, each containing (n-1) words. Table 3.3 shows 4 sentences generated from a sentence containing 4 words, where each generated sentence contains 3 words.

Table 3.3: Examples of Deletion Operation.

Operation Example
Source gaachha theke phala pa.De
Deletion - 1 theke phala pa.De
Deletion - 2 gaachha phala pa.De
Deletion - 3 gaachha theke pa.De
Deletion - 4 gaachha theke phala

Table 3.4: Examples of Addition Operation.

Operation Example
Source gaachha theke phala pa.De
Addition - 1 W gaachha theke phala pa.De
Addition - 2 gaachha W theke phala pa.De
Addition - 3 gaachha theke W phala pa.De
Addition - 4 gaachha theke phala W pa.De
Addition - 5 gaachha theke phala pa.De W

Step-5 By adding a word from a vector

\[ W = \begin{pmatrix} w_1 \\ w_2 \\ w_3 \\ \vdots \\ w_V \end{pmatrix} \]

in the (n+1) possible positions of a sentence containing n words, we can generate V x (n+1) sentences, where V is the length of the vector. Here we consider that only one word is inserted at a time. Table 3.4 shows the sentences generated by the addition operation. Thus, applying step-1 to step-5, we can generate approximately (n-1) + k x n + n + V x (n+1) sentences from a sentence containing n words.
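The transposition, deletion and addition operations of steps 1, 4 and 5 amount to the following sketch (substitution, step-3, was sketched above); the single-word vocabulary is only for illustrating the counts.

def transpositions(tokens):
    """Step-1: swap each pair of consecutive words -> (n-1) variants."""
    return [tokens[:i] + [tokens[i + 1], tokens[i]] + tokens[i + 2:]
            for i in range(len(tokens) - 1)]

def deletions(tokens):
    """Step-4: drop each word in turn -> n variants of (n-1) words."""
    return [tokens[:i] + tokens[i + 1:] for i in range(len(tokens))]

def additions(tokens, vocabulary):
    """Step-5: insert each vocabulary word at each of the n+1 positions."""
    return [tokens[:i] + [w] + tokens[i:]
            for i in range(len(tokens) + 1) for w in vocabulary]

sentence = "gaachha theke phala pa.De".split()
variants = (transpositions(sentence) + deletions(sentence)
            + additions(sentence, ["aama"]))
print(len(variants))   # (n-1) + n + V*(n+1) = 3 + 4 + 5 = 12 for n=4, V=1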

Step-6 Figure 3.4 shows an N x N tag association matrix which is generated after analyzing 5000 manually POS tagged Bangla sentences having different syntactic categories. Every possible sequence of two POS tags is searched for programmatically in this tagged corpus. On a successful match, the cell of the matrix corresponding to the tag sequence is filled with 1; otherwise the cell contains 0. A cell with zero value indicates an invalid relationship, i.e. the POS tag of column Ni cannot occur after the tag of row Nj; in other words, the POS tag Ni does not follow tag Nj. For example, a post position (PPS) cannot appear after an intensifier (INT). Consulting this matrix, mal-rules can be generated which can be used for transposition of the word sequence of a sentence after it has been annotated by an automatic POS tagger. A description of our POS tagger is given in subsection 3.3.1.

Figure 3.4: POS tag association matrix.
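A sketch of how such a matrix and the resulting mal-rules might be derived is shown below; the function names and data layout are illustrative.

def tag_association_matrix(tagged_sentences, tagset):
    """Build the binary tag association matrix of Step-6.

    `tagged_sentences` is a list of POS-tag sequences, e.g. [["CN", "PPS", ...], ...].
    matrix[a][b] == 1 means tag b has been observed immediately after tag a.
    """
    matrix = {a: {b: 0 for b in tagset} for a in tagset}
    for tags in tagged_sentences:
        for prev, nxt in zip(tags, tags[1:]):
            matrix[prev][nxt] = 1
    return matrix

def mal_rules(matrix):
    """Tag pairs never observed in the corpus: candidate invalid sequences."""
    return [(a, b) for a, row in matrix.items() for b, v in row.items() if v == 0]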

3.3.1 Bangla POS Tagger

In this research, we have used a HMM based POS tagger [Dandapat and Sarkar,

2006; Rabiner and Juang, 1993; Van Gael et al., 2009; Cutting et al., 1992] which has

been developed in our lab (http://nlp.cdackolkata.in/nlpcdack/POSTAG/index.spy).

The POS tagger has been trained on 5345 annotated sentences having 13215 unique

words. When only a small number of annotated sentences is available, a smaller tagset is preferred [Bharati et al., 2006]. It has been seen that sentences annotated with fewer tags lead to more efficient machine learning. Moreover, when the number of tags is smaller, the accuracy of manual tagging is higher due to less disagreement among annotators [Bharati et al., 2006]. However, the tagset should

not be so coarse that important lexical and grammatical information encoded in the sentence is missed out. Standardization of tagsets for Indian languages is a very challenging task. Studies related to these issues have already been reported

in [Bharati et al., 2006; Sankaran et al., 2008] and tagsets have been designed for

Bangla language based on these studies. We have collected 5345 raw sentences

from MIT Bangla corpora4. Initially we decided to have a finer tagset (containing 90 tags), but later we came up with a comparatively coarse tagset having only 14 basic tags (see Table 3.5). Our final tagset has been prepared after con-

sulting and comparing with available tagsets like the Penn tagset5 [Santorini, 1990],

tagset designed by IIIT Hyderabad6 [Bharati et al., 2006], the BIS POS tagset [Dash,

2013] and tagset reported in [Sankaran et al., 2008]. The sentences which had been

previously annotated with finer categories and other tagsets are automatically an-

notated with these 14 tags. Then, errors induced during such automatic mapping

are manually verified and corrected. Thus, 5345 sentences are manually annotated

using 14 basic tags.

Our test set contains 500 Bangla sentences having 3228 unique

words. The number of unknown words7 in our test set is 1392. Table 3.6 shows

the POS tag distribution in our training and test corpus. Table 3.7 shows the accuracy of each individual POS tag on our training and test sentences. Table 3.8 shows the top three
4 http://tdil.mit.gov.in/
5 http://www.cst.dk/mulinco/filer/PennTreebankTS.html
6 http://ltrc.iiit.ac.in/nlptools2010/files/documents/POS-Tag-List.pdf
7 The term “Unknown words” means the number of unique words that are not found in the training corpus.

Table 3.5: POS tags used in our tagger

POS Tag Description


PN Proper Noun
CN Common Noun
JJ Adjective
RB Adverb
PR Pronoun
VBF Finite Verb
VNF Non Finite Verb
VBN Verbal Noun
INT Intensifier
PPS Post Position
CC Conjunct
IND Indeclinable
DGT Digit
PUNC Punctuation

Table 3.6: POS tag distribution in our training and test corpus

POS Tag Distribution in Training set (%) Distribution in Test set (%)
CN 20.79 29.72
PUNC 14.20 14.97
JJ 12.81 7.19
VBF 11.73 13.38
PN 8.89 7.37
PR 5.65 7.5
CC 5.33 4.6
VBN 4.61 1.65
RB 3.34 4.25
VNF 2.68 0.08
IND 2.59 2.22
PPS 0.82 2.25
DGT 0.77 0.03
INT 0.60 4.79

Table 3.7: Accuracy of individual POS tag using HMM

POS Tag Accuracy in Train set (%) Accuracy in Test set (%)
PUNC 99.96 96.13
CN 99.35 95.24
PR 96.97 91.43
JJ 96.35 89.28
VBF 95.98 85.93
PPS 95.96 83.40
CC 95.58 81.29
VNF 94.45 79.00
VBN 93.95 82.24
RB 90.66 81.06
IND 90.56 76.67
INT 89.42 73.05
DGT 80.43 66.30
PN 3.93 11.30

Table 3.8: Three most common types of errors

Actual Tag Predicted Tag Prediction Error (%)


PN CN 88.7
DGT PN 26.95
INT JJ 20.07

wrong predictions by our POS tagger. The reason behind such wrong predictions

is the small number of occurrences of these tags in our training corpus. The lexical gap [Manning, 2011] in the training corpus and the number of unknown words in the test corpus are other reasons for wrong predictions. From Table 3.8 we can see that often

PN tag is predicted as CN. Both PN and CN are nouns in the broader category.

Words tagged as PN or CN are agglutinated with similar morphological suffixes.

Therefore, wrong prediction of PN as CN does not hamper the performance of our

work. However, prediction of INT as JJ and DGT as PN are serious issues that need

to be considered. To disambiguate INT from JJ we have searched the words in the

Bangla word-tag dictionary. If the words belong to a closed set of INT containing

words like “ati”, “bhIShana”, “khuba” etc., then we simply tag the words as INT. Similarly, if a word follows a digit pattern like [0-9]+, [0-9]+/[0-9]+/[0-9]+ or [0-9]+[a-zA-Z]+, the word is tagged as DGT. Thus, applying linguistic and pattern matching rules after our tagging module, we reduce the errors of our POS tagging.
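These post-processing rules might look as follows; the intensifier list shown is only a partial illustration of the closed set.

import re

INTENSIFIERS = {"ati", "bhIShana", "khuba"}   # partial closed set of intensifiers
DIGIT_PATTERNS = re.compile(r"^(?:[0-9]+|[0-9]+/[0-9]+/[0-9]+|[0-9]+[a-zA-Z]+)$")

def postprocess(word, hmm_tag):
    """Rule-based correction applied after the HMM tagger (a sketch)."""
    if word in INTENSIFIERS:
        return "INT"                 # INT wrongly tagged as JJ
    if DIGIT_PATTERNS.match(word):
        return "DGT"                 # digits wrongly tagged as PN
    return hmm_tag

print(postprocess("khuba", "JJ"))        # -> INT
print(postprocess("12/05/2013", "PN"))   # -> DGT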

3.3.2 Confidence Score and Mal-rule Filters

Following the above mentioned procedure, we can generate erroneous sentences

from a corpus of grammatical sentences. Our procedure generates approximately

(n-1)+ k x n + n + V x (n+1) sentences from a sentence containing n words.

Therefore, the number of generated sentences using this method increases with

the number of words in a grammatical sentence. We have seen that the mode of

the sentence length distribution of our collected Bangla corpora is 9. This implies

that the number of sentences generated by our procedure is 8 + k x 9 + 9 + V x 10; that many sentences can be generated from a single sentence having 9 words. If we have 33513 9-word sentences in our corpus of approximately 4,68,582 grammatical sentences, then 33513 x (8 + k x 9 + 9 + V x 10) sentences can be generated

using our method. Some Bangla sentences may have as many as 57 words but we

are not considering such cases as such sentences are very infrequent (see Figure

3.3). Moreover, as Indian languages are relatively free word order, some valid

well-formed sentences also get generated after this noise induction procedure.

Therefore manual filtering of ungrammatical sentences from this set of (n-1)+ k x

n + n + V x (n+1) sentences is a tedious task. Proper sampling is required so that

sentences indicative of more frequently made errors have higher probability of

getting selected. Therefore we have applied both rule-based and statistical based

approach for collecting significant sample from this population. Initially we pass

the sentences though our HMM based POS tagger (see subsection 3.3.1)and then

generated tag sequences are passed through mal-rule detector which collect the

sentences containing improper POS tag sequences. We also have calculated the

confidence score of each sentence by calculating bigram, Mutual Information (MI)

and Relative Position Score [Liu et al., 2008]. A numeric score is assigned to deter-

mine the quality of the sentence. The sentence-level confidence measure is based

on the score of each and every individual word in the sentence. Confidence score

estimation using N-gram, measures the grammatical soundness of the sentence

and MI based confidence score, measures the lexical consistency [Raybaud et al.,

2009]. MI is used to detect whether the presence of one word reduces the uncertainty about the appearance of another word in the same sentence. The confidence score of a sentence

using MI has been calculated as follows:

\[ Score(S) = Score(w_1, w_2, w_3, \cdots, w_n) = \frac{1}{n}\sum_{i=1}^{n} Score(w_i) = \frac{1}{n}\sum_{i=1}^{n}\frac{\sum_{j=1,\, j \neq i}^{n} MI(w_j, w_i)}{n-1} \tag{3.2} \]

Here MI(w_j, w_i) is calculated using equation 3.1. MI based confidence measures

do not take word order into account and instead focus on long range lexical

relationships. For this reason, we have also estimated the relative position based

confidence score. Confidence score of a sentence using Relative Position Score [Liu

et al., 2008] has been calculated as follows:

\[ RP_{score}(S) = RP_{score}(w_1, w_2, w_3, \cdots, w_n) = \frac{\displaystyle\sum_{j=2}^{n} \frac{1}{j-1} \sum_{i=1}^{j-1} \frac{freq_{Dep}(w_i, w_j)}{freq_{Ind}(w_i, w_j)}}{n-1} \tag{3.3} \]

where freq_Dep(w_i, w_j) is the number of sentences in which w_i and w_j co-occur with the constraint that w_j appears after w_i in the sentence, and freq_Ind(w_i, w_j) is the number of sentences in which w_i and w_j co-occur without any positional constraint.
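The two estimators of equations 3.2 and 3.3 can be sketched as below; mi, freq_dep and freq_ind are assumed to be lookups into precomputed corpus statistics rather than part of the thesis's own interface.

def mi_score(tokens, mi):
    """Sentence-level MI confidence (equation 3.2); `mi(a, b)` is MI of eq. 3.1."""
    n = len(tokens)
    word_scores = [
        sum(mi(tokens[j], tokens[i]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return sum(word_scores) / n

def rp_score(tokens, freq_dep, freq_ind):
    """Relative Position confidence (equation 3.3)."""
    n = len(tokens)
    total = 0.0
    for j in range(1, n):              # 0-based j corresponds to j = 2..n
        inner = sum(freq_dep(tokens[i], tokens[j]) / freq_ind(tokens[i], tokens[j])
                    for i in range(j))
        total += inner / j             # 0-based j equals (j-1) in 1-based notation
    return total / (n - 1)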

Information has been used for proper selection of the erroneous sentences generated by the substitution operation. Low Mutual Information ensures that a word in the

sentence is wrongly placed in the context of the other words. Bigram and Rel-

ative position scores have been used to select the erroneous sentences generated

by transposition operations. We have three confidence thresholds θbigram , θMI and

θRS for each of the three metrics. The range of the estimated scores varies with

the number of words in a sentence and selection of the confidence score estimator.

Therefore, we have normalized the scores generated by each estimator so that the confidence values lie between 0 and 1. The normalization has been done using

the following formula:

\[ \text{Normalized Score} = \frac{\text{Actual Score} - \text{Minimum Score}}{\text{Maximum Score} - \text{Minimum Score}} \tag{3.4} \]

Erroneous sentences generated by substitution operation are selected if their nor-

malized MI score is less than θMI . Similarly, erroneous sentences generated by

transposition operations are selected if their normalized bigram score is less than

θbigram and normalized RS is less than θRS . It implies that bigram and RS are

combined by a logical AND operation. These confidence thresholds are selected

Table 3.9: Experiment with confidence thresholds for generating erroneous sen-
tences generated by substitution operation

θMI Precision Recall F-Score
0.1 0.9 0.36 0.514286
0.2 0.9 0.367347 0.521739
0.3 0.9 0.367347 0.521739
0.4 0.9 0.367347 0.521739
0.5 0.88 0.458333 0.60274
0.6 0.911765 0.632653 0.746988
0.7 0.925 0.787234 0.850575
0.8 0.82 0.87234 0.845361
0.9 0.84 0.823529 0.831683

experimentally. Table 3.9 and Table 3.10 show the change of Precision and Recall of automatic error corpus creation with the confidence thresholds. We have seen that the automatic error corpus creation methodology achieved its highest F-Score when θMI

= 0.7 (for sentences generated by substitution operation), θbigram = 0.5, and θRS = 0.9

(for sentences generated by transposition operation). The error corpora creation

procedure with an English example is shown in Figure 3.5.
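The normalization and threshold-based selection just described amount to the following sketch; the candidate representation is an assumption, and the thresholds are the best values reported in Tables 3.9 and 3.10.

def normalize(scores):
    """Min-max normalization of equation 3.4, mapping scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def select_errors(candidates, theta_mi=0.7, theta_bigram=0.5, theta_rs=0.9):
    """Keep candidate error sentences whose normalized scores fall below the thresholds.

    `candidates` is a list of dicts with keys 'op', 'mi', 'bigram', 'rs'
    holding already-normalized scores.
    """
    kept = []
    for c in candidates:
        if c["op"] == "substitution" and c["mi"] < theta_mi:
            kept.append(c)
        elif c["op"] == "transposition" and (
                c["bigram"] < theta_bigram and c["rs"] < theta_rs):  # logical AND
            kept.append(c)
    return kept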

3.4 Result and Discussion

Following the experimental procedure described in section 3.3 we have gener-

ated erroneous sentences from randomly selected 1000 sentences from a corpus

of grammatical sentences. Then these generated ill-formed sentences are filtered

using mal-rule detector and depending on the confidence score (see subsection

3.3.2). After manually analysing the random sample of generated ill-formed sen-

tences, we found that 87% of generated sentences are really ungrammatical. Most

of these generated sentences have invalid POS tag sequences. Though some of

the generated sentences have valid POS tag sequences, the word sequences in

Table 3.10: Experiment with confidence thresholds for generating erroneous sen-
tences generated by transposition operation

Erroneous sentences generated by transposition operation


θbigram θRS Precision Recall F-Score
0.2 0.5 0.769231 0.285714 0.416667
0.2 0.6 0.8125 0.787879 0.8
0.2 0.7 0.875 0.8 0.835821
0.2 0.8 0.875 0.8 0.835821
0.2 0.9 0.875 0.8 0.835821
0.3 0.5 0.769231 0.285714 0.416667
0.3 0.6 0.848485 0.8 0.823529
0.3 0.7 0.848485 0.8 0.823529
0.3 0.8 0.848485 0.8 0.823529
0.3 0.9 0.878788 0.828571 0.852941
0.4 0.5 0.846154 0.314286 0.458333
0.4 0.7 0.878788 0.805556 0.84058
0.4 0.8 0.878788 0.805556 0.84058
0.4 0.9 0.878788 0.805556 0.84058
0.5 0.5 0.769231 0.30303 0.434783
0.5 0.6 0.848485 0.823529 0.835821
0.5 0.7 0.757576 0.862069 0.806452
0.5 0.8 0.818182 0.84375 0.830769
0.5 0.9 0.818182 0.9 0.857143
0.6 0.5 0.846154 0.407407 0.55
0.6 0.6 0.848485 0.823529 0.835821
0.6 0.7 0.878788 0.805556 0.84058
0.6 0.8 0.878788 0.805556 0.84058

Figure 3.5: Simplified functional diagram of automatic error corpora creation.

Table 3.11: Erroneous sentences generated from a single sentence and selected
according to the confidence score.

Bangla Sentence Bigram MI R_S


Correct Sentence
gaachha theke phala pa.De 7.40E-026 0.6502461741 0.4810439560
Error sentences generated by Transposition operation
theke gaachha phala pa.De 3.02E-033 0.6502461741 0.4334249084
gaachha theke pa.De phala 1.85E-025 0.6502461741 0.43477564103
gaachha phala theke pa.De 2.64E-029 0.6502461741 0.4641941392
Error sentences generated by Addition operation
gaachha theke phala phala pa.De 6.59E-033 0.8127406288 0.5180275743
gaachha gaachha theke phala pa.De 6.65E-033 1.05182834701 0.49725020350
gaachha theke theke phala pa.De 7.50E-029 0.7025908508583 0.5030321530
Error sentences generated by Substitution operation
gaachha Theke phala pa.De 6.61E-033 -5.5447936457 0.3600000002
gaana theke phala pa.De 7.53E-030 -1.74079366467 0.39056776562
gaachha theke kala pa.De 3.76E-029 -3.3069949612 0.3964285715
maachha theke phala pa.De 7.58E-030 0.55386208974 0.40056776557
Error sentences generated by Deletion operation
gaachha phala pa.De 7.30E-026 0.59991544233 0.4375
theke phala pa.De 6.71E-023 0.23883813519 0.4367845696
gaachha theke pa.De 2.08E-018 0.64066086710 0.408854166667

these sentences are infrequent. Experimental results also show that 13% of the generated sentences are grammatical, because the insertion, deletion and substitution operations sometimes generate another grammatical construction. Table 3.11

shows sample of Bangla erroneous sentences generated by our method from a

grammatical sentence with their aforementioned confidence score. In this Table,

the first sentence is a correct sentence and the remaining erroneous sentences are

generated automatically. In this Table, R_S indicates the relative position score of a sentence. Using the echo word, hyphenated word and collocation collection methodology discussed in step 2 of section 3.3, we have obtained the desired results. Table 3.12 shows Bangla echo words and hyphenated words collected

from the corpus. Transposition between them might cause an error to be induced in a sentence. Transpositions of echo words are not allowable, but transpositions of

Table 3.12: Bangla Echo words and Hyphenated words.

Echo Words Hyphenated Words


oShudha TaShudha aNu-paramaaNu
kha_i Ta_i adala-badal
goYendaa ToYendaa anumata-abhimata
chaakara baakara asukha-bisukha
chaNDaala phaNDaala aaina-aadaalata
jaata paata kaapa.Da-chopa.Da
nardamaa Tardamaa kaamanaa-baasanaa

Table 3.13: Automatically collected collocated and co-occurred word sequences.

W1 W2 Relative Positions MEAN SD TVAL MI


jij∼njaasaa karala 1 1 0 5.99 0.02028
chautrisha nambara 1 1 0 4 0.0106
ghaad.a naad.ala 1 1 0 3.16 0.008667
kamyunisTa paaTira 1 1 0 2.65 0.005921
chamake uThala 1 1 0 2.64 0.003883
satyi kathaa 1 1 0 2.7 0.002006
khrii puu 1,8,10 1.83 2.56 5.4 0.0295

hyphenated words are sometimes allowed. For example, we may sometimes use

“baasanaa- kaamanaa” in place of “kaamanaa-baasanaa”, though these appearances

are very infrequent. Table 3.13 shows some automatically collected collocated and co-occurring word sequences along with their relative positions, the mean and standard deviation of the relative positions, the t-value and the Mutual Information between these word sequences. Transposition of automatically collected echo words, hyphenated words and collocated words induces noise in a grammatical sentence, and this procedure of automatic noise induction gives very good results.

CHAPTER 4

BANGLA GRAMMATICAL ERROR DETECTION

AND CORRECTION

“The principal design of a Grammar of any language is to teach us to express ourselves with

propriety in that language, and to be able to judge of every phrase and form of construction,

whether it be right or not.” – Lowth [1762]

The NLG based approach has been used for grammatical error detection and

correction of Bangla language. There are two levels of operations in an NLG based

approach. In the first level, the input word sequence (w1, w2, w3, · · · , wn) of a sentence is transformed into over-generated word vectors (w⃗1, w⃗2, w⃗3, · · · , w⃗n), which form a trellis of all possible sentences. In the second level, a Language Model

with optimal search algorithm is used for selecting the best path from this search

space. The Language Model is used specifically for scoring the various paths of the

trellis. The best path indicates the grammatically well-formed sentence whereas

the worst path indicates an ill-formed sentence. To create the trellis, the input word sequences are first passed through the HMM based POS tagger (see subsection

3.3.1) and a rule based morphological analyser that reduces each word to its root

form. Then using a morphological synthesizer, each root is over-generated by

including all possible suffixes with the root. In this phase, proper care is necessary

for selection and ordering of the suffixes. The most common ordering of suffixes

is a classifier followed by a case marker followed by an emphasizer. At the time of

morphological analysis and synthesis we have considered 14 classifiers (edera, eraa,


Table 4.1: Example of Nominal Morphological Synthesis

Root Classifier Case Marker Emphasizer Generated Word


chhele raa chheleraa
chhele gulo chhelegulo
chhele Taa chheleTaa
chhele Ti chheleTi
chhele dera chheledera
chhele gulo ke chheleguloke
chhele Taa ke chheleTaake
chhele Ti ke chheleTike
chhele dera ke chhelederake
chhele gulo ke i chhelegulokei
chhele Taa ke i chheleTaakei
chhele Ti ke i chheleTikei
chhele dera ke i chhelederakei
chhele gulo ke o chhelegulokeo
chhele Taa ke o chheleTaakeo
chhele Ti ke o chheleTikeo
chhele dera ke o chhelederakeo
chhele gulo i chheleguloi
chhele Taa i chheleTaai
chhele Ti i chheleTii
chhele dera i chhelederai
chhele gulo o chheleguloo
chhele Taa o chheleTaao
chhele Ti o chheleTio
chhele dera o chhelederao

khaanaa,khaani, guli, gulo, Tuku, Taa, Ti, Te, To, dera, bRinda, raa), 10 case markers (e,

ete, era, ere, ke, te, Ya, Ye, ra, re) and 2 emphasizers (i, o). Morphological constituents

of a Bangla Noun (W) can be represented as W = R + CL? + CA? + EM? Here R,

CL, CA and EM denote Root, Classifier, Case Marker and Emphasizer respectively.

The symbol ‘?’ indicates CL, CA and EM can occur zero or one time, i.e. CL, CA

and EM can be implicit for a given word. Thus following this rule, for a given

root word ‘chhele’ we can generate inflected words as shown in Table 4.1.
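Following the W = R + CL? + CA? + EM? rule, the nominal synthesis can be sketched as below; over-generation is intentional, since the language model later selects among the candidates.

from itertools import product

# Suffix inventories from the thesis (a subset appears in Table 4.1).
CLASSIFIERS = ["edera", "eraa", "khaanaa", "khaani", "guli", "gulo",
               "Tuku", "Taa", "Ti", "Te", "To", "dera", "bRinda", "raa"]
CASE_MARKERS = ["e", "ete", "era", "ere", "ke", "te", "Ya", "Ye", "ra", "re"]
EMPHASIZERS = ["i", "o"]

def synthesize(root):
    """Over-generate nominal forms; each of CL, CA, EM may be absent ("")."""
    forms = set()
    for cl, ca, em in product([""] + CLASSIFIERS, [""] + CASE_MARKERS,
                              [""] + EMPHASIZERS):
        forms.add(root + cl + ca + em)
    return forms

words = synthesize("chhele")
print("chhelegulokei" in words)   # True: chhele + gulo + ke + i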

Table 4.2: Example of Nominal Morphological Analysis

Iteration Word Stripped Word Suffixes


1 chhelegulokei chheleguloke i
2 chheleguloke chhelegulo ke
3 chhelegulo chhele gulo

We have used a rule-based morphological analyser for Bangla. Initially, Part of Speech (POS) wise suffix lists have been prepared. In our NLG based grammar

correction system we have used our noun morphological analyser during correc-

tion of nominal inflectional errors. We have used a simple suffix striping algorithm

for noun morphology. The suffix stripping algorithm simply checks if the word

has any suffixes from the previously collected suffix list. This checking is done by

using regular expression. Then the suffix is stripped from the word. The same

procedure iterates on the remaining string (after stripping the suffixes from the

word). The number of iterations depends on the rules. For example, the stripping procedure iterates three times for nouns. Finally, the remaining string is searched

in the root word dictionary for verifying its existence. If the root word is a proper

noun, then it will not be found in the root word dictionary. Table 4.2 shows the stripping steps in each iteration during the analysis of the noun ‘chhelegulokei’.
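A sketch of this iterative stripping is given below, reusing the suffix inventories listed above; a real analyser would also keep the intermediate analyses to handle the ambiguous forms discussed next.

import re

# Iteration order mirrors Table 4.2: emphasizer, then case marker, then classifier.
SUFFIX_ROUNDS = [
    re.compile(r"(i|o)$"),
    re.compile(r"(ete|era|ere|ke|te|Ya|Ye|ra|re|e)$"),
    re.compile(r"(khaanaa|khaani|guli|gulo|Tuku|Taa|Ti|Te|To|edera|dera|bRinda|eraa|raa)$"),
]

def strip_noun(word, root_dictionary):
    """Iterative suffix stripping for nouns (a sketch of the analyser)."""
    stem, suffixes = word, []
    for pattern in SUFFIX_ROUNDS:
        m = pattern.search(stem)
        if m:
            suffixes.insert(0, m.group(1))
            stem = stem[:m.start()]
    if stem in root_dictionary:
        return stem, suffixes
    return word, []   # fall back: treat the whole word as an (unverified) root

print(strip_noun("chhelegulokei", {"chhele"}))  # ('chhele', ['gulo', 'ke', 'i'])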

There are some Bangla nouns that appear both in root form and in inflected form.

Examples of such Bangla nouns are “jaamaai”, “maalaai” etc. The word “jaamaai”

appears as a whole root word, meaning ‘son-in-law’ in English, or in an inflected form like ‘jaamaa’ + ‘i’. Here ‘jaamaa’ is the root, whose meaning in

English is ‘shirt’ and suffix ‘i’ is agglutinated with it to intensify the meaning (i.e.

‘only shirt’). Similarly, the word ‘maalaai’ is a root word which means “a special

type of sweet”; alternatively it also means a “garland” if the word is analyzed

as ‘maalaa’ + ‘i’. In such scenarios our noun morphological analyser returns the

whole word (considering it as a root with no inflection) and also the morphed form (i.e.

root + suffixes). The noun morphological analyser has been tested on 300 Bangla

inflected Common Nouns and 300 Bangla Proper Nouns. The morphological anal-

yser yields an accuracy of 98.4% on Common Noun and 91.3% on Proper Noun

data. In case of post positions, the whole post position list is simply over gen-

erated, instead of using a morphological synthesizer. The Bigram model is used

here for calculating the scores of each node in the trellis. To avoid the sparseness

problem of data, Jelinek Mercer Smoothing [Jelinek and Mercer, 1980] is applied.

The Viterbi [1967] algorithm is used for selecting the optimal path from the trellis

depending on the scores generated by the language model. Figure 4.1 shows the selection of the most probable well-formed sentence from the trellis with a bold line; ill-formed sentences are marked with dotted lines.

Figure 4.1: Generative model for well-formed and ill-formed sentence detection.
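The second-level search can be sketched as Viterbi decoding over the trellis with a bigram language model; bigram_prob is assumed to be a smoothed (e.g. Jelinek-Mercer) model returning nonzero probabilities, and "<s>" is an assumed start symbol.

import math

def viterbi_best_sentence(trellis, bigram_prob):
    """Select the highest-scoring path through a word trellis (a sketch).

    `trellis` is a list of candidate-word lists, one per position
    (the over-generated word vectors).
    """
    # best[word] = (log-score of best path ending in `word`, backpointer)
    best = {w: (math.log(bigram_prob("<s>", w)), None) for w in trellis[0]}
    history = [best]
    for column in trellis[1:]:
        best = {}
        for w in column:
            prev_scores = ((history[-1][p][0] + math.log(bigram_prob(p, w)), p)
                           for p in history[-1])
            best[w] = max(prev_scores)
        history.append(best)
    # Backtrack from the best final word.
    word, _ = max(history[-1].items(), key=lambda kv: kv[1][0])
    path = [word]
    for column in reversed(history[1:]):
        word = column[path[0]][1]
        path.insert(0, word)
    return path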

4.1 Pruning of the Search Space

The availability of a good POS tagger and selectional restriction rules helps in avoiding certain paths in the trellis. A rule-based function can be used to prune the search

space. Our linguistic function is defined by a set of hard constraints which are ba-

sically a knowledge base of linguistic selectional rules. For example, our linguistic

function can be defined as shown in Figure 4.2.

The function returns 1 when a certain condition is satisfied and 0 otherwise. Ap-

Figure 4.2: Example of Linguistic function

plying Linguistic Hard Constraints on a trellis shown in figure 4.1, we can get a

pruned trellis (relatively smaller search space than the previous one) as shown in

figure 4.3.
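In the spirit of Figure 4.2, a linguistic hard-constraint function and the pruning it licenses might be sketched as follows; the single rule shown is only the PPS-after-INT example mentioned earlier, not the thesis's full rule base.

INVALID_TAG_PAIRS = {("INT", "PPS")}   # e.g. a post position cannot follow an intensifier

def linguistic_function(prev_tag, next_tag):
    """Return 1 if the tag transition is allowed, 0 otherwise."""
    return 0 if (prev_tag, next_tag) in INVALID_TAG_PAIRS else 1

def prune(trellis_tags):
    """Keep only trellis edges whose tag transition satisfies the hard constraints."""
    edges = []
    for level in range(len(trellis_tags) - 1):
        for a in trellis_tags[level]:
            for b in trellis_tags[level + 1]:
                if linguistic_function(a, b):
                    edges.append((level, a, b))
    return edges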

4.2 Selection of the Best Correction

It is also important to ensure that the corrected sentence is not too far away from the

ungrammatical one. To ensure this, initially k-best correct sentences are selected

from the trellis and then modified BLEU [Papineni et al., 2002] Score and Word

Error Rate (WER) are applied. BLEU is the geometric mean of n-gram match of

words with a brevity penalty and WER is calculated using Levenshtein Distance

(Edit Distance) between the ungrammatical sentence and the correct sentence. WER

Figure 4.3: Pruned trellis after applying Linguistic Hard Constraints

is calculated as follows:

\[ WER(W, C) = \frac{Insertion + Deletion + Substitution}{N_r} \tag{4.1} \]

Here W is the ungrammatical sentence, C is the correct sentence and N_r is the number of words in the ungrammatical sentence. The higher the value of WER, the lower the similarity between the two strings. The value of WER usually varies from 0 to 1, though it sometimes exceeds 1 when the correct sentence is longer than the ungrammatical one due to insertion

operation. Our aim is to select the correct sentence from a set of correct sentences

having a minimum WER rate. The BLEU score is calculated as follows:

\[ BLEU(W, C) = \gamma \cdot \exp\left(\sum_{n=1}^{N} \lambda_n \log\big(Prec(W, C)\big)\right) \tag{4.2} \]

Here \( \exp\left(\sum_{n=1}^{N} \lambda_n \log\big(Prec(W, C)\big)\right) \) is the weighted geometric mean of the modified n-gram Precision Prec(W, C), using n-grams up to length N, where the Precision is calculated as:

\[ Prec(W, C) = \frac{Count_{match\ n\text{-}gram}(W, C)}{Count_{n\text{-}gram}(C)} \tag{4.3} \]

and the positive weight \( \lambda_n \) is calculated as \( \lambda_n = \frac{1}{(N+1)-n} \), i.e. when N = 3, the weight for unigram matching is \( \lambda_1 = 0.33 \), the weight for bigram matching is \( \lambda_2 = 0.5 \), and for trigram matching \( \lambda_3 = 1 \). Figure 4.4 shows the n-gram matching scores between

ungrammatical and grammatical sentences. BLEU is a Precision based measure.

The brevity penalty is introduced to compensate for the possibility of a hypothesis correction achieving high Precision by having fewer words than the input ungrammatical sentence. Consider sentence-2 and sentence-3 in figure 4.4. Grammatical sentence-2 has the unigram Precision value

Figure 4.4: N-gram matching score between ungrammatical and correct sentences

\[ Prec(W, C_2) = \frac{Count_{match\ unigram}(W, C_2)}{Count_{unigram}(C_2)} = \frac{4}{7} \tag{4.4} \]

And grammatical sentence-3 has the unigram Precision value

\[ Prec(W, C_3) = \frac{Count_{match\ unigram}(W, C_3)}{Count_{unigram}(C_3)} = \frac{4}{6} \tag{4.5} \]

As the calculation above shows, grammatical sentence 3 has a higher Precision value than sentence 2, though sentence 2 is closer to the input ungrammatical sentence

than sentence 3. The brevity penalty is calculated as follows:





\[ \gamma = \begin{cases} 1 & \text{if } c > r \\ e^{\left(1 - \frac{r}{c}\right)} & \text{if } c \leq r \end{cases} \tag{4.6} \]

Here c is the length of the correct sentence and r is the length of the ungrammatical sentence. A higher BLEU score indicates that the correct sentence is closer to the input sentence. Thus, using a high BLEU score and a low WER, we can select a suggested representative from the set of candidate correct sentences, ensuring that the correct sentence is not too far from the ungrammatical one.
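The two selection measures can be sketched as below; combining them by subtracting WER from BLEU in best_correction is an illustrative choice, as the thesis only requires a high BLEU and a low WER.

import math
from collections import Counter

def wer(ungrammatical, candidate):
    """Word Error Rate via token-level Levenshtein distance (equation 4.1)."""
    w, c = ungrammatical.split(), candidate.split()
    d = [[0] * (len(c) + 1) for _ in range(len(w) + 1)]
    for i in range(len(w) + 1):
        for j in range(len(c) + 1):
            if i == 0 or j == 0:
                d[i][j] = i + j
            else:
                d[i][j] = min(d[i - 1][j] + 1,                           # deletion
                              d[i][j - 1] + 1,                           # insertion
                              d[i - 1][j - 1] + (w[i - 1] != c[j - 1]))  # substitution
    return d[-1][-1] / len(w)   # N_r = words in the ungrammatical sentence

def bleu(ungrammatical, candidate, max_n=3):
    """Modified BLEU of equations 4.2-4.6, with lambda_n = 1/((N+1)-n)."""
    w, c = ungrammatical.split(), candidate.split()
    log_mean = 0.0
    for n in range(1, max_n + 1):
        w_grams = Counter(tuple(w[i:i + n]) for i in range(len(w) - n + 1))
        c_grams = Counter(tuple(c[i:i + n]) for i in range(len(c) - n + 1))
        matched = sum(min(w_grams[g], k) for g, k in c_grams.items())
        prec = matched / sum(c_grams.values()) if c_grams else 0.0
        log_mean += (1.0 / ((max_n + 1) - n)) * math.log(prec or 1e-9)
    gamma = 1.0 if len(c) > len(w) else math.exp(1 - len(w) / len(c))
    return gamma * math.exp(log_mean)

def best_correction(ungrammatical, k_best):
    """Pick the candidate with high BLEU and low WER (a simple combination)."""
    return max(k_best, key=lambda c: bleu(ungrammatical, c) - wer(ungrammatical, c))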

Our NLG based approach can also be used to correct performance errors. The only modification needed is in the trellis generation process. To generate the trellis,

each word of the sentence is replaced with its cohort or homophones. Cohorts are

generated using regular expression by adding, deleting or substituting a single

character or moving character sequences in a word. These generated words are

then verified with a Bangla spelling dictionary to ensure that the generated words are correctly spelled. In this process, if we assume that, on average, k words/cohorts can be generated from a single word, then k x n sentences can be gen-

erated from a sentence containing n words. Levenshtein distance can also be used to prune the over-generated cohort words. Words having minimum edit distance

with the original word are selected for the cohort list. Following this procedure

we can generate “mAchha” for the given word “gAchha”, and “khAtA”, “chhAtA” for

a given word “pAtA” and so on.

In our approach detection and correction is done in a single phase. It may

happen that the user has provided a grammatical sentence and the correction pro-

vided by the system is another grammatical sentence. In this scenario, we consider the system-provided correction more natural than the input sentence, because the system chooses its correction, based on the language model, from all possible sentences generated from the input sentence.

CHAPTER 5

EVALUATION

“The most serious mistakes are not being made as a result of wrong answers. The truly

dangerous thing is asking the wrong question.” – Drucker [2010]

Nowadays, grammar checkers are widely available as part of word processors

or as standalone components. But there is still a considerable room for improve-

ment in their error handling abilities. In order to quantify any improvement, we

need to devise a methodology for evaluating the effectiveness and acceptability

of a grammar checker. Over the last few years, most of the studies regarding

grammatical error detection and correction have been focused towards the design

and development aspects but relatively less attention has been directed towards

evaluation issues [Leacock et al., 2010; Chodorow et al., 2012]. The current work

attempts to address this gap by taking a fresh look at standard measures and moti-

vating the need for a finer grained evaluation based on a detailed characterization

of the complexity of sentences.

5.1 Challenges

An obvious way to compare two or more grammar checkers is to test them on

same test set and compare the results. But due to the lack of substantially large

standard test corpora, comparison among existing grammatical error detection and correction approaches is presently hindered. Moreover, direct comparison of


existing approaches is not possible since most of the available approaches mainly

focus on some specific types of grammatical errors [Leacock et al., 2010]. These approaches are also tested on different test sets, which vary in size and error density. Furthermore, different researchers use different evaluation metrics for the same types of errors [Tetreault et al., 2010; Dickinson et al., 2011; Rozovskaya and Roth, 2011]. Sometimes, different metrics have been used for different aspects of the same task. For example, in [Han et al., 2010] the performance of omission-type preposition error correction was reported in terms of accuracy, whereas the performance of extraneous and replacement-type preposition error correction was reported using Precision and Recall. Some researchers [Park and Levy, 2011]

preferred BLEU [Papineni et al., 2002] and METEOR [Lavie and Agarwal, 2007]

as evaluation metrics for their grammar correction task. However, Chodorow et al. [2012] recommended reporting True Positive (TP), False Positive (FP), False

Negative (FN) and True Negative (TN) in addition to any metrics derived from

them so that any reader can calculate other measures that the authors of a particular

paper did not choose to include. It would thus be a worthwhile enterprise to look

into new possibilities in the evaluation process. In this chapter, we have introduced

a novel methodology for evaluation of grammar assessment (MEGA) to measure

the acceptability of the grammatical error detection and correction system and to

circumvent the need of gold standard test corpora during comparison among the

systems targeting different types of errors. MEGA has been applied on our Bangla

grammar checker based on NLG approach. Since direct comparison between

existing English grammar checkers and the NLG based Bangla grammar checker

is not possible, the NLG based system has been compared against a prototype

Bangla grammar checker based on standard Naïve Bayes classification.

In the next section we will discuss different evaluation metrics.

Table 5.1: Evaluation Measure Formulae

Metrics Formulae
Precision TP / (TP + FP)
Recall TP / (TP + FN)
F1-Score 2 * (Precision * Recall) / (Precision + Recall)
Accuracy (TP + TN) / (TP + TN + FP + FN)

5.2 Standard Evaluation Metric

The efficiency of grammatical error detection and correction systems is usually mea-

sured by metrics like Precision, Recall, F-Score and Accuracy. These measures

generally indicate how often grammatical incorrectness is rejected and how often

grammatical correctness is accepted. Table 5.1 shows the definition of various

evaluation measures. A True Positive (TP) occurs whenever the machine’s judgement corresponds to the manual judgement. In this context, a TP occurs if an error exists

in the text and the grammar checker rightly detects that error. False Positive (FP)

occurs when the system identifies existence of an error even when there is no such

error in the text. False Negative (FN) occurs when the system misses an error in

the text. True Negative (TN) occurs when the system correctly identifies absence

of errors in the text. Table 5.2 shows these relationships with respect to the grammatical error detection task. Now we shall discuss some important aspects regarding the preparation of the test suite, which is central to the evaluation process.

Table 5.2: True Positive, False Positive, False Negative and True Negative with respect to the grammatical error detection task.

                          Grammatical Errors (Condition)
Error Detection (Test)    Present              Absent
Found                     True Positive        False Positive
Not Found                 False Negative       True Negative

5.3 Test Suite

Most often, test suites for grammar checkers are prepared by using a set of well-

formed sentences and a set of ill-formed sentences. A test suite of well-formed

sentences is prepared from a collection of proof read and edited sentences which are

easily available from online newswire. The reason behind preferring newswire is to avoid a skewed distribution of data, since newswire has a good representation

of data from diverse domains. A test suite of ill-formed sentences should ideally cover a wide range of sentences, some having single errors and many others having multiple types of errors. However, sufficient numbers of ill-formed

sentences are not easily available. Manual creation of fully annotated learners’ error corpora is an expensive, time-consuming and non-trivial task. To avoid the problem of creating a corpus of manual errors, one can synthetically generate error corpora to simulate real errors (as discussed in Chapter 3). There have been previous works [Foster and Andersen, 2009; Lee et al., 2011] on using synthesized error data

which indicate that artificial error corpora can be a valid source of evaluation. Due

to unavailability of standard test corpora, one solution is to test the system with

different domains having different structural complexity and hardness of errors1 .

For this reason, we have categorized our test corpora across axes like domains

(examples include Business, Politics, Sports, Literature and Health), structural

complexity (like simple sentence, complex sentence and compound sentence) and

types of errors and their proportions. Our error corpus contains 66% post position errors, 29% noun inflectional errors, 3% determiner errors and 2% combined errors.

The proportions are selected depending on the probabilities of actual errors found in

learners’ writing.

5.4 Evaluation Methodology

Evaluating a grammatical error detection and correction system requires various

criteria, such as output quality, maintainability and user satisfaction. However,

satisfying all of them at the same time is quite difficult. We present evaluation of

Bangla grammatical error detection and correction systems using standard metrics like Precision, Recall and F-Score. We also propose two new metrics, namely the

Graded Acceptability Assessment Metric (GAAM) and Complexity Measurement

Metric (CMM). GAAM measures the acceptability of the system whereas CMM

circumvents the need of gold standard test corpora during comparison among

different grammar checkers targeting different types of errors.

5.4.1 Evaluation using Standard Metrics

We have compared our NLG based system with another grammatical error detec-

tion system that uses a Naïve Bayes classifier. The Naïve Bayes classifier follows the
1 The term “hardness of errors” indicates the complexity of grammar correction due to the presence of errors in the sentence.

method reported in Golding [1995]. Four features, namely, word-word, word-tag,

tag-word and tag-tag sequences have been used in this classification algorithm.

The classifier has been trained on 4,68,582 well-formed Bangla Unicode sentences and the same number of ill-formed sentences. The well-formed sentence collection

procedure has been elaborated in section 3.2. Ill-formed sentences are generated

by inserting errors into a corpus of correct text using a combination of the confidence

score estimator and mal-rule filter (see Chapter 3).

Error detection performance of the NLG based grammar checker has been eval-

uated on a predefined set of 1500 well-formed sentences and 1500 automatically

generated ill-formed sentences. The Naïve Bayes classifier is tested using the same test sentences that were used for the NLG based system, and the true acceptance

and false rejection rates are also verified. Figure 5.1 shows the performance of

these two error detection approaches. A comparison of the two error correction models is shown in figure 5.2. The synthetically generated ill-formed sentences

Figure 5.1: Performance of error detection

Figure 5.2: Performance of error correction

have been divided into some subcategories, so that each subcategory contains spe-

cific types of errors like post positional errors, determiner errors and case marker

errors. We have also tested the NLG based system on individual subsets as well

as on the total set. Details of the proportion of errors were mentioned in section

5.3. Table 5.3 shows the performance of the NLG based system on different types

of domains as well as different types of errors.

5.4.2 Graded Acceptability Assessment Metric:

We have introduced a novel Graded Acceptability Assessment Metric (GAAM)

for evaluating the acceptability of grammar checkers. Here, we have performed

a blind testing to evaluate the acceptability of the system’s outputs. In blind

testing, we only provide the system’s output (suggestion) to two testers having

Table 5.3: Performance evaluation of NLG based system on individual errors as
well as combined errors in five text genres. P indicates Precision and R
indicates Recall.

Error Types Business Health Sports Literature Politics


Post Position P= 84.83 % P= 86.13% P= 81.82% P= 66.19% P= 84.36%
R=83.64% R=85.3% R=81.04% R=64.65% R=82.40%
Determiner P= 66.67% P= 75% P= 42.86% P= 50% P= 57.14%
R=44.44% R=42.86% R=27.27% R=33.33% R=33.33%
Case marker P= 87.14% P= 82.86% P= 88.06% P= 79.71% P= 86.96%
R=82.43% R=81.69% R=85.51% R=76.39% R=81.01%
Combined P= 70% P= 72.73% P= 72.73% P= 63.63% P= 72.72%
R=53.85% R=61.54% R=47.06% R=46.67% R=53.33%

Table 5.4: Grading Scale

0 Not acceptable
1 Acceptable with difficulty
2 Fully acceptable.

no knowledge of the input, whereas in the open testing the input sentences are

provided. We have consciously avoided open testing and subsequent comparison

between blind and open testing, as our intention was not to investigate the bias of

the testers in the presence of the inputs. In blind testing, testers are requested to grade the output sentences on a three-level grading scale (0, 1 and 2) depending on their acceptability. The grading scale is defined in Table 5.4. Depending on the users’ grades, the

acceptability of the system is calculated by the following formula.


\[ \text{Acceptability of the system } \phi = \frac{\sum_{s=1}^{N} \mu_s}{N} \times 50\% \tag{5.1} \]

where N is the number of test sentences and µ_s is the mean acceptability grade for each sentence, calculated as



\[ \mu_s = \frac{\sum_{e=1}^{n} G_e}{n} \tag{5.2} \]

Here G_e is the grade (0, 1 or 2) given by evaluator e, and n is the number of evaluators. Using this formula, we found that the GAAM score of our system’s output is 80.26%, tested on 1000 synthetically generated ill-formed sentences. Figure 5.3 shows the result of blind testing.

Figure 5.3: Grades given by tester-1 and tester-2 in blind testing
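Equations 5.1 and 5.2 amount to the following computation; the grades shown in the example are hypothetical.

def gaam(grades):
    """Graded Acceptability Assessment Metric of equations 5.1-5.2.

    `grades` is a list of per-sentence grade lists, one grade in {0, 1, 2}
    per evaluator.
    """
    mu = [sum(g) / len(g) for g in grades]   # mean grade per sentence
    return sum(mu) / len(mu) * 50            # x 50% maps [0, 2] onto [0, 100]

print(gaam([[2, 2], [1, 2], [0, 1]]))  # 66.66...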

Now the question is: does the score remain the same after testing on a larger number of ill-formed sentences? To answer this

question, we have done a statistical significance test. Using a t-test we found that

the acceptability of the system’s output lies within the confidence interval [80.21 ±

1.17] with 95% confidence. We have also calculated the inter-annotator agreement

by the kappa statistic [Cohen, 1960; Fleiss, 1981]. Using Cohen’s kappa we get the

kappa score between two testers as k=0.34. Agreement between two testers is

shown as a radar graph in figure 5.4. Each axis in the graph represents a system’s

suggestion corresponding to an erroneous sentence chosen randomly from a set

of 38 sentences. The plot is guided by the acceptability scores for those system suggestions provided by each individual tester.

Figure 5.4: Agreement between two testers during manual evaluation

5.4.3 Complexity Estimation of Grammar Correction

Very often, comparison between available grammar checkers is not possible due

to unavailability of a common test set. Moreover, if two grammar checkers are

developed for two different languages, then the performance of these systems cannot be compared, as there is no possibility of a common test set. To get rid of this

problem, we are proposing a novel Complexity Measurement Metric (CMM) by

which it will be possible to compare two grammar checkers developed for different

languages without the need of a common test set. Our approach is to estimate the complexity of grammar correction for a given input test set and then find the correlation between the performance of the grammar checker and the complexity

value of the test data. Our hypothesis is that these correlation values will indicate how well a grammar checker performs on test data of a given complexity. Thus, even if two test sets are not similar but have the same complexity value, we can compare the performance of the two systems depending on the complexity of the

grammar correction problem. A significant research challenge is to estimate the

complexity of a grammar correction problem in the context of a given erroneous

test sentence. A first step would be to identify the important features that increase

complexity in the text. On the surface, this problem has some resemblances to the

problem of estimating readability of text [McCallum and Peterson, 1982; Kim et al.,

2012; Collins-Thompson et al., 2011; Heilman et al., 2008; Collins-Thompson and

Callan, 2004, 2005; Collins-Thompson, 2011]. In the context of Bangla sentences,

readability estimation has been explored by [Sinha et al., 2012]. Some features

proposed for assessing readability may be utilized in complexity estimation of

grammar correction under the basic premise that text which is harder to read will

be harder to correct. This makes sense when we observe that in the process of

manually correcting a piece of erroneous text, we first try to understand roughly

the meaning of that text depicted by the word sequence and then attempt to

place words in particular positions of the text so that the meaning of the text

can be properly conveyed. Sentences that are complex to read are often hard to

understand and are more complex to correct. We thus surveyed readability and

lexical richness estimation metrics proposed till date, like Flesch Kincaid Reading

Ease, Gunning Fog index [McCallum and Peterson, 1982], Smog [McLaughlin,

1969], Lix, Rix, Yule’s characteristic [Yule, 1944], Simpson’s Index, Guiraud Index

[Daller, 2010] and Uber Index etc. However, not all features used in the problem of

estimating readability of text [Sinha et al., 2012; McCallum and Peterson, 1982; Kim

et al., 2012; Collins-Thompson et al., 2011; Heilman et al., 2008; Collins-Thompson

and Callan, 2004, 2005; Collins-Thompson, 2011] are directly applicable in our

case, as we are dealing with erroneous text. As a result, we have introduced new features, which are explained in the next subsection.

Feature Set of Complexity Estimation

Complexity of text occurs mainly for two reasons. Firstly, a sentence might not

contain enough information to convey the concept behind the sentence. Secondly,

a sentence might contain lots of information that increases the cognitive load to

decode the intended meaning. Complexity of text can be classified as syntac-

tic complexity and cognitive complexity. Syntactic complexity reflects elements

such as sentence length, amount of embedding, and range and sophistication of

structures [Lourdes, 2003; Bachman, 1976]. Here we will concentrate only on the

syntactic complexity and will define syntactic/lexical features to measure cognitive

load indirectly. Consider the following features responsible for text and grammar

correction complexity.

Presence of Comma: The presence of commas contributes to the overall readability of a sentence [Hill and Murray, 1998]. A comma in the proper place in a sentence can lead to faster reading times and reduces the need to re-read the

entire sentence. Commas also help to reduce problems arising from ambi-

guities; the “garden path effect” [Ferreira et al., 2001] can be greatly reduced

if commas are correctly present after introductory phrases and reduced rela-

tive clauses [Israel et al., 2012]. Several studies have provided evidence that

readers experience difficulty when they read “garden path sentences” like

“The old man the boat”. A “garden path sentence” [Pazzani, 1984] is one that is exceptionally hard for the reader to parse. The presence of commas in the proper places can decrease the complexity of text.

Multiple Parts of Speech of a Single Word: In most of the languages a particular

word can have different POS. Generally, when a person reads a sentence, the reader builds up a likely meaning for each word and a meaning for the whole sentence, word by word. During sentence processing, if a word appears that changes the meaning of the sentence, the reader switches to the new meaning and continues. If a word has multiple POS tags and a tag which is infrequent is used in the sentence, then it increases the complexity of the sentence. For example, the sentence “The complex houses married and single soldiers and their families” is complex to understand for second language

learners of English. This is because the word ’houses’ is used as a verb here,

which is infrequent as opposed to its use as a noun.

Syntactic Structure: If the thematic roles in a sentence deviate from the usual agent (do-er) before patient (do-ee) order, then the sentence increases cognitive load and thus sentence complexity. For example:

Simpler: The man who killed the Tiger · · ·

Complex: The Tiger whom the man killed · · ·

A reversible passive sentence like “The little rat is chased by the big cat.” is more complex than “The big cat chases the little rat”. Sentence complexity using

syntactic pattern can be defined as :

\[ \text{Sentence Complexity}(s) = \frac{\#LV(s) + \#LN(s)}{\#Clauses(s)} \]

where #LV(s) and #LN(s) are the numbers of verbal and non-verbal links (i.e. Verb Phrases and Noun Phrases) and #Clauses(s) is the number of clauses in

the sentence [Basili and Zanzotto, 2002]. Coordinating conjuncts increase

the complexity because relationships between clauses are not always used

explicitly in the sentence.

Metaphor: Metaphor is an important feature that is responsible for text complexity

found mostly in the literature domain. One can detect metaphor by bigram

analysis of noun-verb agreement. If P(Common Noun | Verb) is less than

some predefined threshold then it can be considered as a metaphor. For

example, “He planted good ideas in their minds.” Here the verb ‘planted’

acts on the noun ‘ideas’ and makes the sentence metaphoric. Generally in

corpus the object that occurs more frequently with verb ‘planted’ are ‘trees’,

‘bomb’ and ’wheat’ etc [Krishnakumaran and Zhu, 2007].

Lexical Density: Psycholinguistic studies have long shown that less densely packed

texts are more easily comprehended, particularly among non-proficient read-

ers. Lexical density is a measure of the ratio of different words to the total

number of words in a text [McCarthy, 1986]. In earlier work [Bradac et al.,

1977] it has been seen that there is a correlation between low lexical density

and comprehension test scores.

T-Unit: T-Unit is an important feature responsible for text complexity. T-Unit is

the “shortest grammatically allowable sentence into which writing can be segmented, or minimally terminable unit” [Hunt, 1965; Sachs and Polio, 2007]. T-Units

which are longer (number of words) and have more subordinate clauses are

more complex [Robb et al., 1986]. A simple sentence or a complex sentence consists of one T-Unit, while a compound sentence consists of more than one

T-Unit [Gaies, 1980]. For example: The Sun rose. The fog dispersed. The general

determined to delay no longer. He gave the order to advance. Here the number of T-Units is 4 and the mean T-Unit length = (Number of Words / Number of T-Units) = 19/4 = 4.75.

When the above passage is rewritten as At Sunrise, the fog having dispersed, the general, determined to delay no longer, gave the order to advance, the number of T-Units is 1 and the mean T-Unit length is 18/1 = 18.00. It is quite

obvious that the second sentence having greater mean T-Unit length is more

complex than the first sentence.

Abstractness: Less frequent (i.e. unfamiliar) words and words that represent

abstract ideas increase text complexity, because the presence of such words requires a greater level of interpretation to understand the intended meaning.

Pronominal Reference: A pronoun always points to a noun or a clause in the

sentence to indicate a reference. As pronouns are used as references and in

many cases the references of the pronouns cross sentence boundaries, if the

sentence starts with a particular pronoun, then it is often difficult to identify the context in which that pronoun is used.

Confusion Set: The confusion set is an important feature for estimating grammar correction complexity. The set of possible corrections for an erroneous word, considered within its surrounding context window, will be referred to as the confusion set henceforth. Consider a sentence S = w_1 w_2 · · · w_i X C Y w_j · · · w_n, where C is a confusion set C = {c_1, c_2, · · · , c_n} from which a particular word c_i needs to be placed to make the sentence correct, and X and Y are the left and right context windows of C. We will say that the given sentence is complex if

\[ count(X c_i Y) = count(X c_j Y) + \theta \]

where c_i ≠ c_j, c_i, c_j ∈ C and 0 ≤ θ ≤ 100. The complexity of the sentence

increases as the value of θ decreases and as the sizes of the context windows X and Y and of the confusion set C increase. For example,

the English sentence “Ram is going C market” is not complex when C = {to, at}, X = {going} and Y = {market}, because θ is very large, as frequency(going to market) ≫ frequency(going at market) in a general English corpus. Manual investigation also shows that the complexity of grammar correction increases with fewer words (especially nouns) on the left side of C and a large number of words on the right side of C. It has been seen that if we have a sentence like S = {X = pronoun} C Y w_j · · · w_n where n > 10, then it is very difficult to find the proper c_i. We also need to know the previous context of the sentence in order to find the pronominal reference that the pronoun is pointing to. The presence of multiple errors that influence each other reduces the comprehensibility of the sentence and in turn creates difficulty at the time of grammar correction.
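A sketch of this feature follows; the count function and the frequencies used in the example are hypothetical stand-ins for corpus statistics.

def confusion_theta(left, right, confusion_set, count):
    """Estimate theta for a confusion set in context.

    `count(x, c, y)` returns the corpus frequency of the pattern X c Y; a
    small theta (similar counts for competing corrections) indicates a
    complex correction decision.
    """
    freqs = sorted((count(left, c, right) for c in confusion_set), reverse=True)
    return freqs[0] - freqs[1]   # gap between the two best candidates

# Hypothetical counts for "Ram is going C market":
counts = {("going", "to", "market"): 954, ("going", "at", "market"): 3}
theta = confusion_theta("going", "market", ["to", "at"],
                        lambda x, c, y: counts.get((x, c, y), 0))
print(theta)   # a large gap -> the sentence is easy to correct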

Other factors can also increase the complexity of sentences, like sentence length, presence of idiomatic expressions, figurative use of words and assimilation of foreign words and phrases in the source text. A hyperbole2 and an understatement3 also increase the complexity. The variations of such features create different levels of complexity in different domains. Thus we have collected a set of features F = {f_i, f_{i+1}, · · · , f_{i+n}, f_j, f_{j+1}, · · · , f_{j+m}}, where the f_i are the features responsible for

the readability of text and the f_j features are responsible for the severity of errors in text.

Table 5.5 shows features used for grammar correction complexity estimation.

Using these features we have design a multiple linear regression model as shown

below:

α_0 + (α_i f_i + α_{i+1} f_{i+1} + · · · + α_{i+n} f_{i+n}) + (β_j f_j + β_{j+1} f_{j+1} + · · · + β_{j+m} f_{j+m}) = Ω

where {α_0, α_i, α_{i+1}, · · · , α_{i+n}} and {β_j, β_{j+1}, · · · , β_{j+m}} are the parameters of the multiple linear regression that need to be learned during the training process and Ω is the complexity score.
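The parameters can be estimated by ordinary least squares. The following is a minimal sketch under the assumption that the features of table 5.5 have already been extracted into a numeric matrix; the data here is random and purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
F = rng.random((1000, 14))         # one row of feature values per sentence
omega = rng.uniform(0, 100, 1000)  # human-assigned complexity scores

# Least-squares fit of omega = alpha_0 + weighted sum of the features.
X = np.hstack([np.ones((F.shape[0], 1)), F])  # prepend intercept column
params, *_ = np.linalg.lstsq(X, omega, rcond=None)

alpha_0, coeffs = params[0], params[1:]  # intercept and feature weights
predicted = X @ params                   # model's complexity scores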
Table 5.5: Features for estimation of grammar correction complexity

Readability of text
Number of words per sentence
Number of punctuation marks per sentence
Number of conjunctions per sentence
Number of discourse markers (e.g. "like", "how", "as") indicating a reason, confirmative or concessive subordinate, per sentence
Number of words having 7 or more letters per sentence
Number of pronouns per sentence
Number of coordinating conjuncts per sentence
Number of infrequent words (unigram count in corpora less than 50) per sentence

Severity of error
Number of errors per sentence
Number of mutually influencing errors per sentence
Length of confusion set C
Value of θ when count(Xc_iY) = count(Xc_jY) + θ and 0 ≤ θ ≤ 100
Number of words on the left side of C
Number of words on the right side of C
Table 5.6: Complexity score ranges for the different complexity levels

Level of Complexity    Numerical Value
Very easy              0-25
Easy                   26-50
Complex                51-75
Very Complex           76-100
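The mapping from a numeric score to one of these levels is a simple range lookup, sketched below:

def complexity_class(score):
    # Map a 0-100 complexity score to the levels of table 5.6.
    if score <= 25:
        return "Very easy"
    elif score <= 50:
        return "Easy"
    elif score <= 75:
        return "Complex"
    return "Very Complex"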


To build the training data we collected 1000 sentences from different domains, with different levels of readability complexity. Then we synthetically induced errors in those sentences; the resulting erroneous sentences had different levels of error density. Furthermore, we also tried to ensure that the sentences contained the features described in table 5.5. Then we defined the complexity score for four levels: "Very easy", "Easy", "Complex" and "Very complex". Thereafter, these erroneous sentences were given to two language experts and two native speakers for correction. We also requested them to enter a complexity score (see table 5.6) according to the difficulty that they faced while correcting those sentences. The proposed multiple linear regression model was then trained on this dataset, and the values of the parameters α_0, α_i, α_{i+1}, · · · , α_{i+n}, β_j, β_{j+1}, · · · , β_{j+m} were estimated. After learning the parameters of the multiple linear regression, we estimated the complexity scores of five text domains (business, health, sports, literature and politics), each containing 500 erroneous sentences. We observed that the relative error of the multiple linear regression model is 0.39. The relative error is calculated as follows:

RelativeError = (1/|N|) ∑_{i=1}^{|N|} |(ActualScore_i − PredictedScore_i) / ActualScore_i|

where |N| is the number of test sentences, and ActualScore_i and PredictedScore_i are the actual complexity score given by the user and the complexity score predicted by the model, respectively.
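For instance, the relative error over a test set could be computed as in the following minimal sketch (the scores are illustrative):

import numpy as np

def relative_error(actual, predicted):
    # Mean absolute relative deviation between user-assigned and
    # model-predicted scores; assumes no actual score is zero.
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual))

print(relative_error([80, 40, 20], [60, 44, 25]))  # 0.2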

Feature Selection

While trying to analyze the cause of this poor performance, we found that some features were irrelevant or redundant. High dimensional feature sets increase the computational cost, and irrelevant features hamper the accurate prediction of complexity. So there is a need to reduce the dimensionality by filtering out the irrelevant and redundant features. But manual identification of the important features from a large number of features is practically not feasible. Correlation analysis was therefore performed on the training data with two objectives: first, to identify the set of redundant attributes by finding features with a high correlation between them; secondly, to find the features that are more relevant for a particular complexity value by looking at the correlation between the target variable and the features. Features having a low correlation (−0.1 to +0.1) with the complexity score have been removed, on the assumption that such features will not contribute to the model for estimating grammar correction complexity.
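A minimal sketch of this filtering step is shown below, reusing the illustrative feature matrix and scores from the regression sketch above:

import numpy as np

rng = np.random.default_rng(0)
F = rng.random((1000, 14))         # illustrative feature matrix
omega = rng.uniform(0, 100, 1000)  # illustrative complexity scores

def select_relevant(F, omega, low=-0.1, high=0.1):
    # Keep only feature columns whose Pearson correlation with the
    # complexity score falls outside the (low, high) band.
    keep = [k for k in range(F.shape[1])
            if not (low < np.corrcoef(F[:, k], omega)[0, 1] < high)]
    return F[:, keep], keep

F_reduced, kept = select_relevant(F, omega)
print(len(kept), "of", F.shape[1], "features retained")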

Following this feature selection procedure, the relative error of our multiple linear regression model improves only to 0.36. The reasons behind this are inadequate training data and the lack of more refined linguistic features. Moreover, the multiple linear regression model only provides a complexity score; it is unable to comprehensively explain which features contribute to the text complexity. To address this issue, a framework based on the idea of active learning has been employed to improve our estimate of the complexity of the text.
There are active learning frameworks already in place, like PROTOS [Bareiss et al., 1990; Clark, 1987], which has been used in the field of medical diagnostics to learn interactively from a domain expert to classify events. The system retains the guided learning cases as well as the causes of failures and the associated explanations for those specific cases. We have followed the PROTOS architecture for active learning of grammar correction complexity, for better generalization, because of the need to elicit knowledge from an expert user and because language specific features may benefit from guided explanations from linguists. We have used the k-Nearest Neighbour algorithm (k-NN) [Mitchell, 1997] within this PROTOS-style framework to estimate the grammar correction complexity of a given input text. Initially the example-base contains examples of the form [<f_1 : v_1>, <f_2 : v_2>, · · · , <f_n : v_n>, c_i], where f_i is a feature name, v_i is its value and c_i is the complexity score of a sentence involving these features. For the English sentence "Ram *go to market", an example may look like [<Num_of_words : 4>, <Num_of_preposition : 1>, <Num_of_Error : 1>, <Num_of_infrequent_words : 1>, 10], where * indicates the erroneous word and the number 10 indicates the complexity of the sentence. At the time of training of the system, sentences of different complexity are presented to the user in a multiple choice question (MCQ) format. The user then provides his correction and a complexity value for the sentence, consulting table 5.6. Based on the feature values extracted from the sentence, the system estimates the complexity score of the same sentence using the k-NN algorithm. Given this setting, the following situations are possible.

Situation 1: The user's selected option is correct, and the user's complexity score and the system's estimated complexity score are not the same.

Situation 2: The user's selected option is incorrect, and the user's complexity score and the system's estimated complexity score are not the same.

If the user's selected option is incorrect, the complexity score provided by the user is very low, and this score does not match the system-generated score, then the system will not ask the user to supply an explanation of the complexity of the given input sentence. In such a situation, it is assumed that the user is not confident enough to guide the system towards a better inference through his interaction. Otherwise, whenever the complexity score provided by the user and the one estimated by the system differ, the interaction based active learning procedure starts. In this case, the system provides explanations of its decision in the form of the common features between the given input sentence and the nearest example from the example-base selected using the k-NN algorithm. It also presents the extra features that are present in the input sentence but not in the nearest matched example, and vice versa. The user then selects, or adds, the features that contribute to the complexity of the given sentence. After receiving the user's feedback, the system generates a new example with the selected and the new features. The new example is inserted into the example-base if it is not already present there. The system also remembers a link between the nearest example provided by the k-NN and the new example generated from the user's feedback, so that whenever this nearest example is selected in the future, the system will automatically map it to the newly generated example. The proposed active learning procedure is shown in Algorithm 1.

A screenshot of the active learning application prototype is shown in figure 5.5. The user can provide a class name (like "Very Easy", "Easy", "Complex" or "Very Complex") instead of entering a specific complexity score; the system's generated complexity score is then mapped to one of these complexity classes.
Algorithm 1 Algorithm for estimation of grammar correction complexity using Active Learning
Require: UsrComScore, UsrSel, MCQ, Example_Base
  {MCQ : Multiple Choice Question}
  {UsrComScore : Complexity score provided by the user}
  {UsrSel : User's selection among the available MCQ options}

  Query_Example ← extract features from MCQ
  Best_Match_Example ← k-NN(Query_Example, Example_Base)
  SysComScore ← Complexity score of the best match example
  if UsrComScore < 50 and UsrSel is incorrect then
    User_Confidence ← low
  else
    User_Confidence ← high
  end if
  if User_Confidence = high and UsrComScore ≠ SysComScore then
    Present the common and extra features between Query_Example and Best_Match_Example
    Ask the user to select/add those features that contribute to the complexity of the sentence
    UsrSelFeatures ← Features given by the user
    Create New_Example using UsrSelFeatures and UsrComScore
    New_Example ← <UsrSelFeatures, UsrComScore>
    if New_Example ∉ Example_Base then
      Example_Base ← Example_Base ∪ New_Example
    end if
  end if
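A minimal sketch of the retrieval step (the k-NN call in Algorithm 1) over such feature-value examples is given below; the Euclidean distance and k = 1 are illustrative choices, not necessarily those of the deployed system:

import math

# Example-base entries: (feature dictionary, complexity score), as in
# [<Num_of_words : 4>, <Num_of_preposition : 1>, ..., 10].
example_base = [
    ({"Num_of_words": 4, "Num_of_preposition": 1,
      "Num_of_Error": 1, "Num_of_infrequent_words": 1}, 10),
    ({"Num_of_words": 18, "Num_of_preposition": 3,
      "Num_of_Error": 2, "Num_of_infrequent_words": 4}, 70),
]

def distance(a, b):
    # Euclidean distance over the union of feature names; a missing
    # feature is treated as zero.
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

def knn_estimate(query, base, k=1):
    # Return the nearest example and the mean score of the k nearest.
    ranked = sorted(base, key=lambda ex: distance(query, ex[0]))
    sys_score = sum(score for _, score in ranked[:k]) / k
    return ranked[0], sys_score

query = {"Num_of_words": 5, "Num_of_preposition": 1, "Num_of_Error": 1}
best_match, sys_com_score = knn_estimate(query, example_base)
print(sys_com_score)  # 10, i.e. "Very easy" in table 5.6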

It is seen that the relative error of the proposed active learning model is 0.16, which is much less than that obtained using multiple regression when tested on the same dataset.

Figure 5.6 shows the complexity scores obtained over 10 trials for each of the five domains. In each trial, we randomly selected 50 sentences from the 500 erroneous sentences of each domain and computed the average complexity score using our active learning based model. From the complexity scores shown in figure 5.6, it is apparent that the complexity of the literature domain is higher than that of any other domain considered here. This is expected, since figurative uses of words are common in this domain, and nouns are ornamented with adjectives and intensifiers. Rhetorical structures are usually found in sentences of the literature domain, and idiomatic and colloquial patterns are used more than in any other domain considered here. Sometimes phrases of a foreign language are present as part of the source text, either in their original orthographic representation or transliterated into the orthography of the source language; such patterns are most common in the dialogue sentences of the literature domain. Unfamiliarity with the foreign language increases the complexity of the text. The use of an informal style sometimes involves region dependent slang that is not found in the other domains considered; unfamiliarity with such slang, due to the sociocultural variation of readers, likewise increases the complexity of the text. Appendix D shows a variety of sentences from the Bangla literature domain, in MCQ format, that appear complex to Bangla second language learners and to Bangla native speakers as well. In figure 5.7, we show the POS tag distributions of the five domains (business, health, sports, literature and politics). It is apparent that punctuation (RD_PUNC), quotatives (CC_CCS_UT), subordinates (CC_CCS), personal pronouns (PR_PRP) and wh-pronouns (PR_PRQ) appear with higher frequency in the literature domain than in any other domain

Table 5.7: Correlation of complexity score with grammar checker accuracy

                                           Naïve Bayes binary classifier   NLG based Grammar Checker
Pearson's coefficient of correlation (r)              -0.91                          -0.87

considered. Punctuation marks such as multiple commas (,) and semicolons (;) in a sentence indicate that the sentence is syntactically complex. Quotative sentences contain multiple phrases in a combination of direct and indirect speech. In most cases, a wh-pronoun points to a noun or clause beyond the boundary of that particular sentence. Figure 5.8 shows the distribution of infrequent words in the five domains (business, health, sports, literature and politics); infrequent words are more common in the literature domain than in any other domain considered.

Figure 5.9 shows the complexity scores obtained from 500 erroneous sentences

of each domain and the respective accuracies obtained by our NLG based grammar

checker.

The Pearson’s correlation [Mangal, 2012] coefficient (r) of the complexity score

against grammar correction accuracies obtained by Naïve Bayes classifier and

the NLG based grammar checker is shown in Table 5.7, which shows a strong

negative linear correlation of complexity scores with accuracies achieved by the

two systems. Thus both classifiers have low accuracies when the complexity is

high, and vice versa. This strengthens the case for the robustness of the proposed

complexity measure.
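For reference, Pearson's r between per-domain complexity scores and system accuracies can be computed as in the following small sketch (the numbers are illustrative, not the values behind Table 5.7):

import numpy as np

# Illustrative per-domain averages for the five domains.
complexity = np.array([62.0, 48.0, 45.0, 78.0, 55.0])
accuracy = np.array([0.71, 0.80, 0.82, 0.55, 0.74])

r = np.corrcoef(complexity, accuracy)[0, 1]
print(round(r, 2))  # strongly negative, as in Table 5.7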

Figure 5.5: Screenshot of the active learning framework for estimation of text complexity. An explanation of the feature names is available at http://nlp.cdackolkata.in/testComplexity/FeatDtl.spy

Figure 5.6: Complexity values across different datasets

Figure 5.7: POS Tag distributions in different domains.

Figure 5.8: Frequency of word distribution across different domains.

Figure 5.9: Complexity measure and precision score obtained by the NLG based grammar checker and the Naïve Bayes classifier systems.

CHAPTER 6

CONCLUSIONS AND FUTURE WORK

This final chapter summarises our contributions and the scope of future work. The aim of the thesis was to present a technique to detect and correct grammatical errors in a morphologically rich and free word order language like Bangla. At the outset, we faced challenges regarding the unavailability of a large error corpus, robust parsers and sufficient linguistic rules, as well as the lack of a standard evaluation metric for grammatical error detection and correction. We have proposed a methodology for the automatic creation of synthetic error corpora by combining statistical and linguistic knowledge. This has been done using a combination of a confidence score estimator and a mal-rule filter to introduce errors into a corpus of correct text. These synthetic corpora have been utilized during evaluation. A similar approach can also be adopted for the generation of synthetic error corpora in other Indian languages where such resources are not yet available. A novel NLG based approach has been proposed for the automatic detection and correction of Bangla grammatical errors. The NLG based approach has been used instead of an NLU based approach to avoid the complexity and ambiguity of the grammar for parsing, and also to circumvent the need for modeling ungrammatical sentences. The proposed approach not only corrects the mistakes committed by the users but also provides relevant examples supporting its corrections. It also estimates the complexity of the grammar correction task, so that the user can be informed about the system's confidence. An active learning based complexity estimator has been used to estimate the complexity of grammar correction. The NLG based approach reported here can also be applied to other Indian languages to build robust grammar checkers.

We have also proposed a Methodology for Evaluation of Grammar Assessment (MEGA), combining a Graded Acceptability Assessment Metric (GAAM) and a Complexity Measurement Metric (CMM). GAAM employs a three level grading scale to calculate an acceptability score based on the judgment of human evaluators, who have three options, viz. a) not acceptable, b) acceptable with difficulty, and c) fully acceptable. CMM is introduced to estimate the complexity of the test data and to find the correlation between the complexity value and the accuracy of the system tested on that data. To provide better generalization and explanations of language specific features, CMM follows an active learning methodology with expert user interaction.

6.1 Contributions

The major contributions of this research are as follows:

• Automatic creation of Bangla error corpora that mimic real world errors.

• An NLG based grammar correction methodology for the Bangla language.

• An active learning based complexity estimator for reliable grammar correction.

• A Graded Acceptability Assessment Metric (GAAM) and an active learning based Complexity Measurement Metric (CMM) for evaluation.

Other relevant contributions of the thesis are as follows:

• A web based grammatical error detection and correction system for the Bangla language, available at http://nlp.cdackolkata.in/nlpcdack/GrammarChecker.

• A detailed survey of grammatical error detection and correction.

• A taxonomy of Bangla grammatical errors.

• An N-gram based Bangla language model.

• A broad compilation of references on work in English and other languages on the grammatical error detection and correction task.

• A resource of well-formed and ill-formed sentences from different text genres, and a test suite of sentences of different complexity levels for testing the Bangla grammar checker.

• Other resources, such as echo words and hyphenated words, collected automatically during synthetic error corpus creation, which may help other research on Bangla language analysis.

6.2 Future Work

As part of future work, a more principled approach needs to be devised to correct grammatical errors with better selectional restrictions. A closer investigation of data from second language learners of Bangla is required to find the patterns of their errors, in order to improve the performance of the system. Instead of the bigram language model, a weighted higher order n-gram based linear learning model with a ±k context window could be designed to provide a better solution. The improvement of our NLG based system relies on some basic NLP components, such as the POS tagger, morphological analyser and morphological synthesizer; these components need to be improved in the future. Finer linguistic constraints need to be defined to prune the search space and to improve the processing speed of the system. As Bangla is a free word order language [Bhattacharya et al., 2005; Dandapat et al., 2004], a CFG based parsing framework has limitations [Shieber, 1985; Begum et al., 2008; Bharati et al., 2010] for analysing Bangla sentences. Hence, a dependency based parsing framework [Nivre, 2005; Popel et al., 2011; Nivre, 2008; Zhang and Nivre, 2011; Chen et al., 2012, 2010, 2011] can be used with our proposed NLG based system for better analysis of input sentences, especially for checking subject-verb agreement. Moreover, a principled approach needs to be devised for sampling the auto-generated error corpus in the boundary cases, and for ensuring that automatically generated error sentences mimic naturally occurring learners' errors. A statistical classifier can make use of active learning to bootstrap the corpus creation process. The parameters of our regression model need to be examined more closely to gain insights into which features are most central in determining complexity. At a later stage, we may also need to study the interactions between the features more closely.
APPENDIX A

Examples of some interesting erroneous sentences

corrected by the system.

Sl No Incorrect Sentence Predicted Correction


1 িনেচর আেলাচনা সে ব াপারটা জানা যােব ৷ িনেচর আেলাচনা থেক ব াপারটা জানা যােব ৷
nichera aalochanaa sa Nge byaapaaraTaa Nichera aalochanaa theke byaapaaraTaa
jaanaa Jaabe — jaanaa Jaabe —
2 এই অসুিবধা দূর করার জন পণ িবিনমেয় ধাতুখ চলন হয় ৷ এই অসুিবধা দূর করার জন পণ িবিনমেয় ধাতুখে র চলন হয় ৷
ei asubidhaa duura karaara janya paNya bini- ei asubidhaa duura karaara janya paNya bin-
maYe dhaatukhaNDa prachalana haYa — imaYe dhaatukhaNDera prachalana haYa

3 খুব একজন হইচই করেত চাইেছ না পুিলশ ৷ খুব একটা হইচই করেত চাইেছ না পুিলশ ৷
khuba ekajana haichai karate chaaichhe naa khuba ekaTaa haichai karate chaaichhe naa
pulisha — pulisha —
4 জনসংখ ার অিধকাংশই আবার াম বাস কের ৷ জনসংখ ার অিধকাংশই আবার ােম বাস কের ৷
janasa.nkhyaara adhikaa.nshai aabaara janasa.nkhyaara adhikaa.nshai aabaara
graama baasa kare — graame baasa kare —
5 তঁােদর জন এক জন ছুিটরা িছেলন । তঁােদর মেধ এক জন ছুিটেত িছেলন ।
taa.Ndera janya eka jana chhuTiraa chhilena taa.Ndera madhJe eka jana chhuTite
chhilena
6 পুরসভা েলাই জন তা পেত বশ কেয়কবার সদরখানােক আসেত পুরসভা থেক তা পেত বশ কেয়কবার সদের আসেত হয় ।
হয় ।
purasabhaaguloi janya taa pete besha purasabhaa theke taa pete besha kaYek-
kaYekabaara sadarakhaanaake aasate haYa abaara sadare aasate haYa
7 বৃি টােক হার ৫০ বিসস পেয় জন ৭৫ বিসস পেয় । বৃি র হার ৫০ বিসস পেয় থেক ৭৫ বিসস পেয় ।
bRRiddhiTaake haara 50 besisa paYenTa bRRiddhira haara 50 besisa paYenTa theke
janya 75 besisa paYenTa 75 besisa paYenTa
8 ওই সারিণ অনুযায়ী নিট রােত হাওড়া নীেচ ছেড় মালদহ শেন ওই সারিণ অনুযায়ী নিট রােত হাওড়া থেক ছেড় মালদহ শেন
পঁৗছােব পর িদন ৮টা ১০ িমিনেট ৷ পঁৗছােব পর িদন ৮টা ১০ িমিনেট ৷
oi saaraNi anuJaaYii TrenaTi raate haao.Daa oi saaraNi anuJaaYii TrenaTi raate
niiche chhe.De maaladaha sTeshane haao.Daa theke chhe.De maaladaha sTeshane
pau.Nchhaabe para dina 8Taa 10 miniTe pau.Nchhaabe para dina 8Taa 10 miniTe —

APPENDIX B

Examples of incorrect prediction by the system.

Sl No Original Sentence Predicted Correction


1 বুধবােরর ঘটনার িববরণ িদেয় ওই ৬ জেনর এক বুধবােরর ঘটনার িববরণ থেক ওই ৬ জেনর এক
জন বেলন াম পাহারা িদি লাম ৷ জন বেলন াম পাহারা িদি লাম ৷
budhabaarera ghaTanaara budhabaarera ghaTanaara
bibaraNa diYe oi 6 janera eka bibaraNa theke oi 6 janera eka
jana balena graama paahaaraa jana balena graama paahaaraa
dichchhilaama — dichchhilaama —
2 সই জন পাথর ভাঙেত হে ৷ সই সে পাথর ভাঙেত হে ৷
sei janya paathara bhaa Nate sei sa Nge paathara bhaa Nate
hachchhe — hachchhe —
3 এ িনেয় মে া কতৃপে র স েক আেলাচনাও এ পয মে া কতৃপে র সে আেলাচনা
কেরেছ ডুক াব ৷ কেরেছ ডুক াব ৷
e niYe meTro kartRRipakShera e paryanta meTro kartRRipak-
samparke aalochanaao shuru Shera sa Nge aalochanaa shuru
karechhe Dukyaaba — karechhe Dukyaaba —
4 নমুনা সং হ কের বােয়াপিসর পে পাঠান তঁারা নমুনা সং হ কের বােয়াপিসর চেয় পাঠান তঁারা
। ।
namunaa sa.ngraha kare baaY- namunaa sa.ngraha kare baaY-
opasira pakShe paaThaana opasira cheYe paaThaana
taa.Nraa taa.Nraa
5 ১১ জুন রামািনয়ার কন ানটায় ওই জাহােজ ১১ জুন রামািনয়ার কন ানটায় ওই জাহােজ
কলকাতার সােথ লাহার পাইপ তালা হয় ৷ কলকাতার সে লাহার পাইপ তালা হয় ৷
11 juna romaaniYaara kanas- 11 juna romaaniYaara kanas-
TaanaTaaYa oi jaahaaje TaanaTaaYa oi jaahaaje
kalakaataara saathe lohaara kalakaataara sa Nge lo-
paaipa tolaa haYa — haara paaipa tolaa haYa —
6 ওই হাসপাতােল দেরাগীেদর ারা ২িট শয া ওই হাসপাতােল দেরাগীেদর মেধ ২িট শয া
রেয়েছ । রেয়েছ ।
oi haasapaataale hRRidarogi- oi haasapaataale hRRidarogi-
idera dbaaraa 2Ti shaJyaa idera madhJe 2Ti shaJyaa
raYechhe raYechhe
7 িক মন িদেয় ৃিত মুেছ যায়িন ৷ িক মেনর মেধ ৃিত মুেছ যায়িন ৷
kintu mana diYe smRRiti kintu manera madhJe smRRiti
muchhe JaaYani — muchhe JaaYani —
8 অি েজন দওয়ার পয িসিল ার এেন িত অি েজন দওয়ার থেক িসিল ার এেন িত
নওয়া হি ল । নওয়া হি ল ।
aksijena deoYaara paryanta aksijena deoYaara theke silin-
silinDaara ene prastuti neoYaa Daara ene prastuti neoYaa
hachchhila hachchhila
APPENDIX C

Examples of sentences having different complexity


APPENDIX D

Examples of sentences collected from literature

domain
REFERENCES
Agirre, E., K. Gojenola, K. Sarasola, and A. Voutilainen, Towards a Single Pro-
posal in Spelling Correction. In Proceedings of the 36th Annual Meeting of the Asso-
ciation for Computational Linguistics and 17th International Conference on Computa-
tional Linguistics, volume 1 of ACL ’98. Association for Computational Linguis-
tics, Stroudsburg, PA, USA, 1998. URL http://dx.doi.org/10.3115/980845.
980850.

Albert, R. and A.-L. Barabási (2002). Statistical Mechanics of Complex Net-


works. Rev. Mod. Phys., 74(1), 47–97. URL http://link.aps.org/doi/10.1103/
RevModPhys.74.47.

Alfred, W., The Elements of English Grammar. 2nd. Pitt Press, Cambridge, 1894.

Allen, J., Natural Language Understanding. Menlo Park, Benjamin/Cummings, 1987.

Angell, R. C., G. E. Freund, and P. Willett (1983). Automatic Spelling Correction


Using a Trigram Similarity Measure. Information Processing and Management,
19(4), 255–261. URL http://www.sciencedirect.com/science/article/pii/
0306457383900225.

Arppe, A., Developing a grammar checker for swedish. In Proceedings of the Twelfth
Nordic Conference in Computational Linguistics. 2000.

Bachman, L. F., Fundamental Considerations in Language Testing. Oxford University


Press, Oxford, 1976.

Banko, M. and E. Brill, Scaling to very large corpora for natural language disambiguation. In ACL. 2001.

Bansal, B., M. Choudhury, P. R. Ray, S. Sarkar, and A. Basu, Isolated-word Error


Correction for Partially Phonemic Languages using Phonetic Cues. In Proceedings
of the International conference on Knowledge based Computer Systems. 2004.

Baptist, L. and S. Seneff, Genesis-ii: A versatile system for language generation


in conversational system applications. In Proceeding ICSLP. Beijing, China, 2000.

Bareiss, E. R., B. E. Porter, and C. C. Wier, Machine Learning and Uncertain


Reasoning. chapter PROTOS: an exemplar-based learning apprentice. Academic
Press Ltd., London, UK, UK, 1990. ISBN 0-12-273252-9, 1–13. URL http://dl.
acm.org/citation.cfm?id=92900.92906.

Bartha, C., T. Spiegelhauer, R. Dormeyer, and I. Fischer (2006). Word order and
discontinuities in dependency grammar. Acta Cybern., 17(3), 617–632. URL http:
//dblp.uni-trier.de/db/journals/actaC/actaC17.html#BarthaSDF06.

Basili, R. and F. M. Zanzotto (2002). Parsing Engineering and Empirical Robust-
ness. Natural Language Engineering, 8(2-3), 97–120.

Begum, R., S. Husain, A. Dhwaj, D. M. Sharma, L. Bai, and R. Sangal, De-


pendency Annotation Scheme for Indian Languages. In Proceedings of IJCNLP.
2008.

Bellman, R., Dynamic Programming. Princeton University Press, Princeton, NJ,


USA, 1957, 1 edition.

Bender, E. M., D. Flickinger, S. Oepen, A. Walsh, and T. Baldwin, Arboretum:


Using a precision grammar for grammar checking in call. In In Proceedings of
the InSTIL/ICALL Symposium: NLP and Speech Technologies in Advanced Language
Learning Systems. 2004.

Berger, A. L., S. A. Della Pietra, and V. J. Della Pietra (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), 39–71. ISSN 0891-2017. URL http://dl.acm.org/citation.cfm?id=234285.234289.

Bernth, A., Easyenglish: a tool for improving document quality. In Proceedings of


the fifth conference on Applied natural language processing (ANLC’97). Stroudsburg,
PA, USA, 1997.

Bharati, A., V. Chaitanya, and R. Sangal, Natural Language Processing: A Paninian


Perspective. PHI, 2010.

Bharati, A., D. M. Sharma, L. Bai, and R. Sangal (2006). AnnCorra: Annotat-


ing Corpora Guidelines for POS and Chunk Annotation for Indian Languages.
Technical report, Language Technologies Research Centre, IIIT-Hyderabad. URL
http://ltrc.iiit.ac.in/tr031/posguidelines.pdf.

Bhatt, A., M. Choudhury, S. Sarkar, and A. Basu, Exploring the Limits of


Spellcheckers: A comparative Study in Bengali and English. In Proceedings of
the Second Symposium on Indian Morphology, Phonology and Language Engineering.
CIIL Mysore, Kharagpur, INDIA, 2005.

Bhattacharya, S., M. Choudhury, S. Sarkar, and A. Basu, Inflectional Morphology


Synthesis for Bengali Noun, Pronoun and Verb Systems. In Proceedings of the
National Conference on Computer Processing of Bangla. Independent University,
Bangladesh, 2005.

Bhattacharyya, P., M. Mitra, and S. Choudhury, Divergence Patterns between


English and Bangla: Machine Translation Perspective. In Proceedings of the 9th
International Conference of Natural Language Processing. AUKBC, Chennai, India,
2011.

Bigert, J. and O. Knutsson, Robust error detection: a hybrid approach com-


bining unsupervised error detection and linguistic knowledge. In Proceedings
ROMAND-02. Frascati, Italy, 2002.

Birn, J., Detecting grammar errors with lingsoft’s swedish grammar checker. In In
Proceedings of the Twelfth Nordic Conference in Computational Linguistics. 2000.

Bolioli, A., L. Dini, and G. Malnati, JDII: Parsing Italian with a robust constraint grammar. In Proceedings of COLING. 1992.

Bondi, J., Johannessen, K. Hagen, and P. Lane, The performance of a grammar


checker with deviant language input. Mounton de Gruyter, Berlin and New
York, 2002.

Bradac, J. J., R. A. Davies, and J. A. Courtright (1977). The Role of Prior Mes-
sage Context in Evaluative Judgments of High- and Low-Diversity Messages.
Language and Speech, 20(4), 295–307.

Bredenkamp, Crysmann, and Petrea, Looking for Errors: A Declarative Formal-


ism for Resource-Adaptive Language Checking. In Proceedings of the 2nd Inter-
national Conference on Language Resources and Evaluation (LREC-2000). Athens,
Greece, 2000.

Brill, E. and R. C. Moore, An Improved Error Model for Noisy Channel Spelling
Correction. In Proceedings of the 38th Annual Meeting on Association for Computa-
tional Linguistics, ACL ’00. Association for Computational Linguistics, Strouds-
burg, PA, USA, 2000. URL http://dx.doi.org/10.3115/1075218.1075255.

Brockett, C., W. B. Dolan, and M. Gamon, Correcting ESL errors using phrasal
SMT techniques. In Proceedings of the 21st International Conference on Computa-
tional Linguistics and the 44th annual meeting of the Association for Computational
Linguistics, ACL-44. Association for Computational Linguistics, Stroudsburg,
PA, USA, 2006. URL http://dx.doi.org/10.3115/1220175.1220207.

Bustamante, F. R. and F. S. León, Gramcheck: A grammar and style checker. In


Proceedings of COLING-96. 1996.

Campbell, C. and Y. Ying (2011). Learning with support vector machines. Synthesis
Lectures on Artificial Intelligence and Machine Learning, 5, 1–95.

Catt, M. and G. Hirst (1990). An intelligent cali system for grammatical error
diagnosis. Computer Assisted Language Learning, 3, 3–26.

Chatterjee, S. K., The Origin and Development of the Bengali Language. Rupa co.,
New Delhi, 1926.

Chaudhuri, B. and P. Kundu (2000). Error Pattern in Bangla Text. International


Journal of Dravidian Linguistics, (2), 48–88.

Chaudhuri, B. B., Reversed Word Dictionary and Phonetically Similar Word


Grouping based Spell-Checker to Bengali Text. In Proceedings of LESAL Workshop.
2001. URL http://www.emille.lancs.ac.uk/lesal/bangla.pdf.

Chaudhuri, B. B., Towards Indian Language Spell-checker Design. In Proceedings of
the Language Engineering Conference, LEC ’02. IEEE Computer Society, Washing-
ton, DC, USA, 2002. ISBN 0-7695-1885-0. URL http://dl.acm.org/citation.
cfm?id=788016.788703.
Chen, W., J. Kazama, Y. Tsuruoka, and K. Torisawa, Improving Graph-based
Dependency Parsing with Decision History. In COLING (Posters). 2010.
Chen, W., J. Kazama, M. Zhang, Y. Tsuruoka, Y. Zhang, Y. Wang, K. Torisawa,
and H. Li, SMT Helps Bitext Dependency Parsing. In EMNLP. 2011.
Chen, W., J. Kazama, M. Zhang, Y. Tsuruoka, Y. Zhang, Y. Wang, K. Torisawa,
and H. Li (2012). Bitext Dependency Parsing With Auto-Generated Bilingual
Treebank. IEEE Transactions on Audio, Speech & Language Processing, 20(5), 1461–
1472.
Chodorow, M., M. Dickinson, R. Israel, and J. R. Tetreault, Problems in Evaluating
Grammatical Error Detection Systems. In COLING. 2012.
Chodorow, M. and C. Leacock, An unsupervised method for detecting grammat-
ical errors. In Proceedings of the 1st North American chapter of the Association for
Computational Linguistics conference (NAACL 2000). San Francisco,CA, 2000.
Choudhury, M., M. Thomas, A. Mukherjee, A. Basu, and N. Ganguly, How
Difficult is it to Develop a Perfect Spell-checker? A Cross-Linguistic Analysis
through Complex Network Approach. In Proceedings of the Second Workshop on
TextGraphs: Graph-Based Algorithms for Natural Language Processing. Association
for Computational Linguistics, Rochester, NY, USA, 2007. URL http://aclweb.
org/anthology//W/W07/W07-0212.pdf.
Clark, P. (1987). PROTOS - A Rational Reconstruction. Technical report, Turing
Institute, Glasgow.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and
Psychological Measurement, 20(1), 37–46. URL http://epm.sagepub.com/cgi/
doi/10.1177/001316446002000104.
Collins-Thompson, K., Enriching Information Retrieval with Reading Level Pre-
diction. In Proceedings of SIGIR 2011 Workshop on Enriching Information Retrieval.
Beijing, China, 2011.
Collins-Thompson, K., P. N. Bennett, R. W. White, S. de la Chica, and D. Sontag,
Personalizing Web Search Results by Reading Level. In Proceedings of the 20th
ACM international conference on Information and knowledge management, CIKM
’11. ACM, New York, NY, USA, 2011. ISBN 978-1-4503-0717-8. URL http:
//doi.acm.org/10.1145/2063576.2063639.
Collins-Thompson, K. and J. Callan (2005). Predicting Reading Difficulty with
Statistical Language Models. Journal of the American Society for Information Science
and Technology, 56(13), 1448–1462. ISSN 1532-2882. URL http://dx.doi.org/
10.1002/asi.20243.

Collins-Thompson, K. and J. P. Callan, A Language Modeling Approach to Pre-
dicting Reading Difficulty. In Proceedings of HLT-NAACL. 2004.
Covington, M. A. (1990). Parsing discontinuous constituents in dependency
grammar. Comput. Linguist., 16(4), 234–236. ISSN 0891-2017. URL http:
//dl.acm.org/citation.cfm?id=124992.124997.
Cutting, D., J. Kupiec, J. Pedersen, and P. Sibun, A Practical Part-of-Speech
Tagger. In Proceedings of the third conference on Applied natural language processing,
ANLC ’92. Association for Computational Linguistics, Stroudsburg, PA, USA,
1992. URL http://dx.doi.org/10.3115/974499.974523.
Dale, R., Helping People Write: Grammar Checking and Beyond. In Tutorial in 9th
International Conference of Natural Language Processing. AUKBC, Chennai, India,
2011.
Dale, R., C. Mellish, and M. Zock, Current Research in Natural Language Generation.
Academic Press, London, 1990.
Dale, R., D. Scott, and B. D. Eugenio (1998). Introduction to the Special Issue on
Natural Language Generation. Computational Linguistic, 24(3), 346–353. ISSN
0891-2017. URL http://dl.acm.org/citation.cfm?id=972749.972751.
Daller, M., Guiraud’s index of lexical richness. In British Association of Applied
Linguistics. 2010.
Dalrymple, M., Lexical Functional Grammar. Syntax and Semantics Series, Xerox
Palo Alto Research Center, 2001.
Damerau, F. J. (1964). A Technique for Computer Detection and Correction of
Spelling Errors. Communications of the ACM, 7(3), 171–176. ISSN 0001-0782. URL
http://doi.acm.org/10.1145/363958.363994.
Dandapat, S. and S. Sarkar, Part of Speech Tagging for Bengali with Hidden
Markov Model. In Proceeding of the NLPAI Machine Learning Competition. 2006.
URL http://ltrc.iiit.ac.in/nlpai_contest06/papers/mla.pdf.
Dandapat, S., S. Sarkar, and A. Basu, A Hybrid Model for Part-of-Speech Tag-
ging and Its Application to Bengali. In Proceedings of International Conference on
Computational Intelligence. 2004.
Das, M., S. Borgohain, J. Gogoi, and S. B. Nair, Design and Implementation of a
Spell Checker for Assamese. In Language Engineering Conference. IEEE Computer
Society, 2002. ISBN 0-7695-1885-0. URL http://dblp.uni-trier.de/db/conf/
lec/lec2002.html#DasBGN02.
Dasgupta, S., C. Papadimitriou, and U. Vazirani, Algorithm. Mc. Graw Hill, 2008.
URL http://www.cs.berkeley.edu/~vazirani/algorithms.html.
Dash, N. S. (2013). Part-of-Speech (POS) Tagging of Bengali Written Text Cor-
pus. Bhasha Bijnan o Prayukti, 1(1). URL http://www.academia.edu/3931246/
Part-of-Speech_POS_Tagging_of_Bengali_Written_Text_Corpus.

Dave, S., J. Parikh, and P. Bhattacharyya (2001). Interlingua-Based English-Hindi
Machine Translation and Language Divergence. Machine Translation, 16(4), 251–
304.

De Felice, R. and S. Pulman, Automatic detection of preposition errors in learner


writing. volume 26. 2009. URL https://www.calico.org.

De Felice, R. and S. G. Pulman, Automatically acquiring models of preposition


use. In Proceedings of the Fourth ACL-SIGSEM Workshop on Prepositions, SigSem
’07. Association for Computational Linguistics, Stroudsburg, PA, USA, 2007.
URL http://portal.acm.org/citation.cfm?id=1654629.1654639.

Dickinson, M., R. Israel, and S.-H. Lee, Developing Methodology for Korean
Particle Error Detection. In Proceedings of the 6th Workshop on Innovative Use of
NLP for Building Educational Applications, IUNLPBEA ’11. Association for Com-
putational Linguistics, Stroudsburg, PA, USA, 2011. ISBN 9781937284039. URL
http://dl.acm.org/citation.cfm?id=2043132.2043142.

Dini, L. and G. Malnati, Weak constraints and preference rules. In Prefernce in


Eurotra, volume 6. 1993.

Dixit, V., S. Dethe, and R. K. Joshi (2006). Design and Implementation of a


Morphology-based Spell-Checker for Marathi, an Indian Language. Special issue
on Human Language Technologies as a Challenge for Computer Science and Linguistics,
309–316.

Dorr, B., L. Pearl, R. Hwa, and N. Habash, DUSTer: A Method for Unraveling
Cross-Language Divergences for Statistical Word-level Alignment. In Proceedings
of AMTA-02. Springer, 2002.

Douglas, S. and R. Dale, Towards robust patr. In Proceedings of the 14th confer-
ence on Computational linguistics, COLING ’92. Association for Computational
Linguistics, Stroudsburg, PA, USA, 1992. URL http://dx.doi.org/10.3115/
992133.992143.

Drucker, P., Men, Ideas, and Politics. Harvard Business School Publishing, 2010.

Ejerhed, E., Finite state segmentation of discourse into clauses. Cambridge University
Press, 1999.

Ferreira, F., K. Christianson, and A. Hollingworth (2001). Misinterpre-


tations of Garden-Path Sentences: Implications for Models of Sentence
Processing and Reanalysis. Journal of Psycholinguistic Research, 30(1), 3–
20. URL http://scholar.google.de/scholar.bib?q=info:Q9cl72BuXYIJ:
scholar.google.com/&output=citation&hl=en&as_sdt=2005&sciodt=0,
5&ct=citation&cd=9.

Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. 2nd edition. John Wiley & Sons, New York.
Fliedner, G., A system for checking np agreement in german texts. In In Proceedings
of the Student Research Workshop at the 40th Annual Meeting of the Association for
Computational Lingustics(ACL). 2002.

Foster, J. (2005). Good Reasons for Noting Bad Grammar: Empirical Investigations into
the Parsing of Ungrammatical Written English. Ph.D. thesis, University of Dublin,
Trinity College, Dublin, Ireland.

Foster, J. and Ø. E. Andersen (2009). GenERRate: Generating errors for use in grammatical error detection. In Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications, 82–90. URL http://www.aclweb.org/anthology/W/W09/W09-2112.

Fouvry, F., Robust Unification for Linguistics. In Proceedings of ROMAND. Lau-


sanne, 2000.

Fouvry, F., Constraint relaxation with weighted feature structures. In Proceedings


of the Eighth International Workshop on Parsing Technologies (IWPT-03), volume 6.
2003.

Francis, N. W., Problems of Assembling and Computerizing Large Corpora. In


S. Johansson (ed.), Computer corpora in English language research. Norwegian
Computing Centre for the Humanities, 1982. ISBN 9788272830273, 7–24.

Frank, A., J. K. Tracy Holloway King, and J. Maxwell, Optimality theory style
constraint ranking in large-scale lfg grammars. In Proceedings of LFG-98. Bris-
bane,Australia, 1998.

Freund, Y. and R. E. Schapire (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296.

Fujishima, S. and S. Ishizaki, Automated detection of usage errors in non-native


english writing. In IES 2011-Emerging Technology for Better Human Life. 2011.

Gaies, S. J. (1980). T-Unit Analysis in Second Language Research: Applications,


Problems and Limitations. TESOL Quarterly, 14(1), 53–60.

Gamon, M., J. Gao, C. Brockett, A. Klementiev, W. B. Dolan, D. Belenko, and


L. Vanderwende, Using contextual speller techniques and language modeling
for esl error correction. 2008.

Garain, U. and S. De, Dependency parsing in bangla. In Technical Challenges and


Design Issues in Bangla Language Processing. USA, 2013.

Ghosh, A., A. Das, P. Bhaskar, and S. Bandyopadhyay, Dependency parser for Bengali: the JU system at ICON 2009. In NLP Tool Contest at the International Conference on Natural Language Processing (ICON 2009). 2009.

Gill, M. S. and G. S. Lehal, A grammar checking system for panjabi. In Proceedings


of the 22nd International Conference on Computational Linguistics. 2008.

Golding, A. R., A Bayesian Hybrid Method for Context-Sensitive Spelling Cor-
rection. In Proceedings of the Third Workshop on Very Large Corpora. 1995. URL
http://arxiv.org/pdf/cmp-lg/9606001.pdf.

Golding, A. R. and Y. Schabes, Combining Trigram-based and Feature-based


Methods for Context-Sensitive Spelling Correction. In Proceedings of the 34th
annual meeting on Association for Computational Linguistics, ACL ’96. Association
for Computational Linguistics, Stroudsburg, PA, USA, 1996. URL http://dx.
doi.org/10.3115/981863.981873.

Goyal, P. and R. M. K. Sinha, Translation Divergence in English-Sanskrit-Hindi


Language Pairs. In Sanskrit Computational Linguistics. 2009.

Han, N.-R., M. Chodorow, and C. Leacock, Detecting errors in english article


usage with a maximum entropy classifier trained on large, diverse corpus. 2004.

Han, N.-R., M. Chodorow, and C. Leacock, Detecting errors in english article


usage by non-native speakers. volume 12. Cambridge University Press, New
York, NY, USA, 2006. ISSN 1351-3249. URL http://portal.acm.org/citation.
cfm?id=1133917.1133922.

Han, N.-R., J. R. Tetreault, S.-H. Lee, and J.-Y. Ha, Using an Error-Annotated
Learner Corpus to Develop an ESL/EFL Error Correction System. In LREC. 2010.

Haque, M. T. and M. Kaykobad, Use of Phonetic Similarity for Bangla Spell


Checker. In Proceedings of 5th International Conference on Computer and Information
Technology. 2002. URL http://research.banglacomputing.net/iccit/ICCIT_
pdf/5th%20ICCIT-2002_p182-p185.pdf.

Heidorn, G. E., Augmented phrase structure grammar. In Theoretical Issues in


Natural Lunguage Processing. 1975.

Heidorn, G. E., Intelligence writing assistance. In A Handbook of Natural Language


Processing: Techniques and Applications for the Processing of Language as Text. Marcel
Dekker, New York,USA, 2000.

Heidorn, G. E., K. Jensen, L. A. Miller, R. J.Byrd, and M. Chodorow., The epistle


text-critiquing system. In IBM Systems Journal, volume 21. 1982.

Heift, T. and M. Schulze, Errors and Intelligence in Computer-Assisted Language


Learning: Parsers and Pedagogues. Routledge Taylor and Francis Group, New
York and London, 2007.

Heilman, M., K. Collins-Thompson, and M. Eskenazi, An Analysis of Statistical


Models and Features for Reading Difficulty Prediction. In Proceedings of the
Third Workshop on Innovative Use of NLP for Building Educational Applications,
EANL ’08. Association for Computational Linguistics, Stroudsburg, PA, USA,
2008. ISBN 978-1-932432-08-4. URL http://dl.acm.org/citation.cfm?id=
1631836.1631845.

Hein, S., A chart-based framework for grammar checking: initial studies. In 11th Nordic Conference in Computational Linguistics. 1998.

Henrich, V. and T. Reuter (2009). LISGrammarChecker:Language Independent Statis-


tical Grammar Checking. Master’s thesis, Hochschule Darmstadt University.

Hermet, M. and A. Désilets, Using first and second language models to correct
preposition errors in second language authoring. In Proceedings of the Fourth
Workshop on Innovative Use of NLP for Building Educational Applications, EdApp-
sNLP ’09. Association for Computational Linguistics, Stroudsburg, PA, USA,
2009. ISBN 978-1-932432-37-4. URL http://dl.acm.org/citation.cfm?id=
1609843.1609853.

Hermet, M., A. Désilets, and S. Szpakowicz, Using the web as a linguistic resource
to automatically correct lexico-syntactic errors. In In Proceedings of the Sixth
International Conference on Language Resources and Evaluation. 2008.

Hill, L. R. and S. W. Murray, Comma and Spaces: The Point of Punctuation. In


Proceedings of the 11th Annual CUNY Conference on Human Sentence Processing.
1998.

Hirst, G. and A. Budanitsky (2005). Correcting Real-Word Spelling Errors by


Restoring Lexical Cohesion. Natural Language Engineering, 11(1), 87–111. ISSN
1351-3249. URL http://dx.doi.org/10.1017/S1351324904003560.

Hovy, E., Approaches to the Planning of Coherent Text. In W. R. S. Cecile L. Paris


and W. C. Mann (eds.), Natural Language Generation in Artificial Intelligence and
Computational Linguistics. Kluwer, Boston, 1991, 83–102.

Hunt, K. W., Grammatical Structures Written at Three Grade Levels. NCTE Research
report, USA, 1965.

Israel, R., J. R. Tetreault, and M. Chodorow, Correcting Comma Errors in Learner


Essays, and Restoring Commas in Newswire Text. In HLT-NAACL. 2012.

Izumi, E., K. Uchimoto, T. Saiga, T. Supnithi, and H. Isahara, Automatic error


detection in the japanese learners’ english spoken data. In Proceedings of the 41st
Annual Meeting on Association for Computational Linguistics - Volume 2, ACL ’03.
Association for Computational Linguistics, Stroudsburg, PA, USA, 2003. ISBN
0-111-456789. URL http://dx.doi.org/10.3115/1075178.1075202.

Jelinek, F. and R. L. Mercer, Interpolated Estimation of Markov Source Parameters


from Sparse Data. In Proceedings of the Workshop on Pattern Recognition in Practice.
Amsterdam, The Netherlands: North-Holland, 1980.

Jensen, K., G. E. Heidorn, L. A. Miller, and Y. Ravin, Parse fitting and prose
fixing: Getting a hold on ill-formedness. In American Journal of Computational
Linguistics. 1983.

Jensen, K., G. E. Heidorn, and S. D. Richardson, Natural language processing: the


PLNLP approach. Kluwer Academic Publishers, 1993.

Joshi, A. K., L. S. Levy, and M. Takahashi (1975). Tree adjunct grammars. Journal of Computer and System Sciences, 10(1), 136–163.

Jurafsky, D. and J. H. Martin, Speech and Language Processing: An Introduction


to Natural Language Processing, Speech Recognition, and Computational Linguistics.
Prentice-Hall, 2009.

Kapatsinski, V. (2006). Sound Similarity Relations in the Mental Lexicon: Model-


ing the Lexicon as a Complex Network. Speech research Lab Progress Report, 27,
133–152.

Karlsson, F., Constraint grammar as a framework for parsing running text. In


Proceedings of the 13th Conference on Computational Linguistics, volume 3. Helsinki,
Finland, 1990a.

Karlsson, F., Constraint Grammar as a Framework for Parsing Running Text. In


Proceedings of the 13th conference on Computational linguistics - Volume 3, COLING
’90. Association for Computational Linguistics, Stroudsburg, PA, USA, 1990b.
ISBN 952-90-2028-7. URL http://dx.doi.org/10.3115/991146.991176.

Karlsson, F., A. Voutilainen, J. Heikkil, and A. Anttila., Constraint grammar: A


language independent system for parsing unrestricted text. In Proceedings of the
19th International Conference on Computational Linguistic. 1995.

Kernighan, M. D., K. W. Church, and W. A. Gale, A Spelling Correction Program


based on a Noisy Channel Model. In Proceedings of the 13th International Confer-
ence on Computational Linguistics, COLING ’90. Association for Computational
Linguistics, Stroudsburg, PA, USA, 1990. URL http://dx.doi.org/10.3115/
997939.997975.

Khader, R. A., T. H. King, and M. Butt, Deep call grammars: The lfg-ot experiment.
2004.

Kilgarriff, A. (2007). Googleology is bad science. Comput. Linguist., 33(1), 147–151.


ISSN 0891-2017. URL http://dx.doi.org/10.1162/coli.2007.33.1.147.

Kim, J. Y., K. Collins-Thompson, P. N. Bennett, and S. T. Dumais, Characterizing


Web Content, User Interests, and Search Behavior by Reading Level and Topic. In
Proceedings of the fifth ACM international conference on Web search and data mining,
WSDM ’12. ACM, New York, NY, USA, 2012. ISBN 978-1-4503-0747-5. URL
http://doi.acm.org/10.1145/2124295.2124323.

Knight, K. and I. Chander, Automated postediting of documents. In In Proceedings


of the Twelfth National Conference on Artificial Intelligent (AAAI). 1994.

Knuth, D. E., The Art of Computer Programming, volume 3. Addison Wesley Long-
man Publishing Co., Inc., Redwood City, CA, USA, 1998, 2 edition. ISBN 0-201-
89685-0.

Krishnakumaran, S. and X. Zhu, Hunting Elusive Metaphors Using Lexical Re-
sources. In Proceedings of the Workshop on Computational Approaches to Figura-
tive Language, FigLanguages ’07. Association for Computational Linguistics,
Stroudsburg, PA, USA, 2007. URL http://dl.acm.org/citation.cfm?id=
1611528.1611531.

Kukich, K. (1992). Techniques for Automatically Correcting Words in Text. ACM


Computing Survey, 24(4), 377–439. ISSN 0360-0300. URL http://doi.acm.org/
10.1145/146370.146380.

Kundu, B. and S. Chandra, Automatic Detection of English Words in Benglish


Text. In Proceedings of 4th International Conference on Intelligent Human Computer
Interaction. 2012.

Lascarides, A., T. Briscoe, P. Street, N. Asher, and A. Copestake (1996). Order


Independent and Persistent Typed Default Unification. Linguistic and Philosophy,
19(1), 1–90.

Lavie, A. and A. Agarwal, Meteor: an Automatic Metric for MT Evalua-


tion with High Levels of Correlation with Human Judgments. In Proceed-
ings of the Second Workshop on Statistical Machine Translation, StatMT ’07. As-
sociation for Computational Linguistics, Stroudsburg, PA, USA, 2007. URL
http://dl.acm.org/citation.cfm?id=1626355.1626389.

Leacock, C., M. Chodorow, M. Gamon, and J. Tetreault, Automatic Grammati-


cal Error Detection for Language Learners. Synthesis Lectures on Human Language
Technologies. Morgan Claypool, 2010.

Lee, J. and S. Seneff, Correcting Misuse of Verb Forms. June. Association for Compu-
tational Linguistics, 2008, 174–182. URL http://www.aclweb.org/anthology/
P/P08/P08-1021.

Lee, S., J. Lee, H. Noh, K. Lee, and G. G. Lee (2011). Grammatical Error Simulation
for Computer-Assisted Language Learning. Knowledge Based System, 24(6), 868–
876.

Lehal, G. S., Design and Implementation of Punjabi Spell Checker. In Interna-


tional Journal of Systemics. 2007. URL http://learnpunjabi.org/pdf/Punjabi%
20Spell%20Checker%20%282%29.pdf.

Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), 707–710.

Lewis, M. P. (ed.) (2009). Ethnologue: Languages of the World. 16th edition. SIL International, Dallas. URL http://www.ethnologue.com/.

Lin, Y. C. and K. Y. Su (1995). A level synchronous approach to ill-formed sentence


parsing and error recovery. Technical report.

Littlestone, N., Learning quickly when irrelevant attributes abound: A new linear-
threshold algorithm. In Machine Learning. 1988.

Liu, C.-H., C.-H. Wu, and M. Harris, Word Order Correction for Language Transfer
Using Relative Position Language Modeling. In 6th International Symposium on
Chinese Spoken Language Processing. 2008.

Lopresti, D. and J. Zhou (1997). Using Consensus Sequence Voting to Correct OCR
Errors. Computer Vision and Image Understanding, 67(1), 39–47. ISSN 1077-3142.
URL http://dx.doi.org/10.1006/cviu.1996.0502.

Lourdes, O. (2003). Syntactic Complexity Measures and Their Relationship to L2


Proficiency A Research Synthesis of College Level L2 Writing. Applied Linguistics,
24(4), 492–518.

Lowth, R., A Short Introduction to English Grammar: With Critical Notes. Millar and
Dodsley, London, 1762.

Lozano and Melero, Spanish nlp projects at microsoft research. In Proceedings


of the 2nd International Workshop of Spanish Language Processing and Language
Technologies. Jaén, Spain, 2001.

Mangal, S. K., Statistic in psychology and education. PHI, India, 2012.

Mangu, L. and E. Brill, Automatic Rule Acquisition for Spelling Correction. In


Proceedings of the Fourteenth International Conference on Machine Learning, ICML
’97. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1997. ISBN
1-55860-486-3. URL http://dl.acm.org/citation.cfm?id=645526.657126.

Manning, C. D., Part-of-Speech Tagging from 97% to 100%: Is It Time for Some
Linguistics? In Proceedings of the 12th international conference on Computational
linguistics and intelligent text processing - Volume Part I, CICLing’11. Springer-
Verlag, Berlin, Heidelberg, 2011. ISBN 978-3-642-19399-6. URL http://dl.acm.
org/citation.cfm?id=1964799.1964816.

Manning, C. D. and H. Schütze, Foundations of Statistical Natural Language Process-


ing. MIT Press, 1999.

Mays, E., F. J. Damerau, and R. L. Mercer (1991). Context based Spelling Cor-
rection. Information Processing and Management, 27(5), 517–522. ISSN 0306-4573.
URL http://dx.doi.org/10.1016/0306-4573(91)90066-U.

McCallum, D. R. and J. L. Peterson, Computer-based readability indexes. In


Proceedings of the ACM ’82 conference, ACM ’82. ACM, New York, NY, USA, 1982.
ISBN 0-89791-085-0. URL http://doi.acm.org/10.1145/800174.809754.

McCarthy, J. (1986). Applications of Circumscription to Formalizing Common-


Sense Knowledge. Artificial Intelligent, 28(1), 89–116.

McCord, M. C., Slot grammars. In Computational Linguistics, volume 6. 1980.

McLaughlin, H. G. (1969). SMOG grading - a new readability formula. Journal of


Reading, 639–646.

Mellish, C. S., Some chart-based techniques for parsing ill-formed input. In Pro-
ceedings of the 27th annual meeting on Association for Computational Linguistics,
ACL ’89. Association for Computational Linguistics, Stroudsburg, PA, USA,
1989. URL http://dx.doi.org/10.3115/981623.981636.
Michaud, L. N. and K. F. Mccoy, An intelligent tutoring system for deaf learners
of written english. In In Proceedings of the Fourth International ACM SIGCAPH
Conference on Assistive Technologies (ASSETS 2000). SIGCAPH, 2000.
Michaud, L. N. and K. F. Mccoy, Error profiling: Toward a model of english
acquisition for deaf learners. In In Proc. of the 39th Annual Meeting and the
10th Conference of the European Chapter of Association for Computational Linguistics
(EACL). 2001.
Mitchell, T., Machine Learning. McGraw-Hill, New York, 1997.
Nagata, R., F. Masui, A. Kawai, and N. Isu, Recognizing Article Errors Based on
the Three Head Words. In CELDA. 2004.
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM
Review, 45, 167–256.
MacDonald, N. H., L. T. Frase, P. S. Gingrich, and S. A. Keenan, The Writer's Workbench: Computer aids for text analysis. In IEEE Transactions on Communications, volume 30. 1982.
Nivre, J. (2005). Dependency Grammar and Dependency Parsing. Technical report,
Växjö University: School of Mathematics and Systems Engineering.
Nivre, J. (2008). Algorithms for Deterministic Incremental Dependency Parsing.
Computational Linguistic, 34(4), 513–553. ISSN 0891-2017. URL http://dx.doi.
org/10.1162/coli.07-056-R1-07-027.
Oyama, H. and Y. Matsumoto, A machine learning approach for error identifi-
cation for learners of japanese. In The society for Teaching Japanese as a Foreign
Language Spring Meeting. 2008.
Papineni, K., S. Roukos, T. Ward, and W.-J. Zhu, Bleu: a method for automatic
evaluation of machine translation. In Proceedings of the 40th Annual Meeting on
Association for Computational Linguistics, ACL ’02. Association for Computational
Linguistics, Stroudsburg, PA, USA, 2002. URL http://dx.doi.org/10.3115/
1073083.1073135.
Park, J. C., M. Palmer, and G. Washburn, An english grammar checker as a writing
aid for students of english as a second language. 1997.
Park, Y. A. and R. Levy, Automated Whole Sentence Grammar Correction Us-
ing a Noisy Channel Model. In Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics: Human Language Technologies - Volume
1, HLT ’11. Association for Computational Linguistics, Stroudsburg, PA, USA,
2011. ISBN 978-1-932432-87-9. URL http://dl.acm.org/citation.cfm?id=
2002472.2002590.

Pazzani, M. J., Conceptual Analysis of Garden-Path Sentences. In Proceedings of the
10th international conference on Computational linguistics, COLING ’84. Association
for Computational Linguistics, Stroudsburg, PA, USA, 1984. URL http://dx.
doi.org/10.3115/980431.980595.

Peterson, J. L. (1980). Computer Programs for Detecting and Correcting Spelling


Errors. Communications of the ACM, 23(12), 676–687. ISSN 0001-0782. URL
http://doi.acm.org/10.1145/359038.359041.

Philips, L. (1990). Hanging on the Metaphone. Computer Language Maga-


zine, 7(12), 39–44. Accessible at http://www.cuj.com/documents/s=8038/
cuj0006philips/.

Popel, M., D. Mareček, N. Green, and Z. Žabokrtský, Influence of Parser Choice on Dependency-Based MT. In Proceedings of the Sixth Workshop on Statistical Machine Translation, WMT '11. Association for Computational Linguistics, Stroudsburg, PA, USA, 2011. ISBN 978-1-937284-12-1. URL http://dl.acm.org/citation.cfm?id=2132960.2133019.

Powers, D. M. W., Learning and application of differential grammars. In Proc. of the Meeting of the ACL Special Interest Group in Natural Language Learning. 1997.

Proudian, D. and C. Pollard, Parsing Head-driven Phrase Structure Grammar. In Proceedings of the 23rd Annual Meeting on Association for Computational Linguistics. Stroudsburg, PA, USA, 1985.

Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1(1), 81–106.

Rabiner, L. and B.-H. Juang, Fundamentals of Speech Recognition. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993. ISBN 0-13-015157-2.

Raybaud, S., D. Langlois, and K. Smaïli, Efficient Combination of Confidence Measures for Machine Translation. In Proceedings of INTERSPEECH. 2009.

Reiter, E. and R. Dale, Building Natural Language Generation Systems. Cambridge University Press, New York, NY, USA, 2000. ISBN 0-521-62036-8.

Rich, E. and K. Knight, Artificial Intelligence, 2nd edition. McGraw-Hill, New York, 1991.

Robb, T., S. Ross, and I. Shortreed (1986). Salience of Feedback on Error and Its
Effect on EFL Writing Quality. TESOL Quarterly, 20, 83–93.

Rozovskaya, A. and D. Roth, Algorithm Selection and Model Adaptation for ESL
Correction Tasks. In ACL. 2011.

Sachs, R. and C. Polio (2007). Learners’ Uses of Two Types of Written Feedback
on a L2 Writing Revision Task. Studies in Second Language Acquisition, 29, 67–100.

Sankaran, B., K. Bali, M. Choudhury, T. Bhattacharya, P. Bhattacharyya, G. N. Jha, S. Rajendran, K. Saravanan, L. Sobha, and K. V. Subbarao, A Common Parts-of-Speech Tagset Framework for Indian Languages. In LREC. European Language Resources Association, 2008. URL http://dblp.uni-trier.de/db/conf/lrec/lrec2008.html#SankaranBCBBJRSSS08.

Santorini, B. (1990). Part-of-Speech Tagging Guidelines for the Penn Treebank Project (3rd revision, 2nd printing). Technical report, Department of Linguistics, University of Pennsylvania, Philadelphia, PA, USA.

Scheler, G., With raised eyebrows or the eyebrows raised? A neural network approach to grammar checking for definiteness. In Proceedings of the Second International Conference on New Methods in Language Processing (NeMLaP-2). Bilkent University, Ankara, 1996.

Schmidt-Wigger, A., Grammar and Style Checking for German. In Proceedings of the Second International Workshop on Controlled Language Applications. Pittsburgh, PA, 1998.

Schneider, D. A. and K. F. McCoy, Recognizing syntactic errors in the writing of second language learners. In Proceedings of the 17th International Conference on Computational Linguistics (COLING '98), volume 2. 1998.

Schuster, E. (1986). The role of native grammars in correcting errors in second language learning. Computational Intelligence, 2, 93–98.

Schwind, C., Sensitive parsing: error analysis and explanation in an intelligent language tutoring system. In COLING '88. 1988.

Schwind, C. B., Feature grammar for semantic analysis. In Computational Intelligence. 1990.

Shieber, S. (1985). Evidence Against the Context-Freeness of Natural Language. Linguistics and Philosophy, 8(3), 333–343. URL http://www.eecs.harvard.edu/shieber/Biblio/Papers/shieber85.pdf.

Sinha, M., S. Sharma, T. Dasgupta, and A. Basu, New Readability Measures for
Bangla and Hindi Texts. In COLING (Posters). 2012.

Smolensky, P. and G. Legendre, The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar. MIT Press, Cambridge, MA, 2006.

Steedman, M. and J. Baldridge, Combinatory categorial grammar. In Non-Transformational Syntax. Blackwell, 2005.

Stemberger, J. P. (1982). Syntactic errors in speech. Journal of Psycholinguistic Research.

Tesfaye, D. (2011). A rule-based Afan Oromo grammar checker. International Journal of Advanced Computer Science and Applications, 2(8).

Tetreault, J. R., J. Foster, and M. Chodorow, Using Parse Features for Preposition
Selection and Error Detection. In ACL (Short Papers). 2010.

Toutanova, K. and R. C. Moore, Pronunciation Modeling for Improved Spelling
Correction. In Proceedings of the 40th Annual Meeting on Association for Computa-
tional Linguistics, ACL ’02. Association for Computational Linguistics, Strouds-
burg, PA, USA, 2002. URL http://dx.doi.org/10.3115/1073083.1073109.
Uria, L., B. Arrieta, A. D. de Ilarraza, M. Maritxalar, and M. Oronoz, Determiner errors in Basque: Analysis and automatic detection. In Procesamiento del Lenguaje Natural. 2009.
Uszkoreit, H., Grammar Checking: Theory, Practice and Lessons learned in LATESLAV.
Prague, 1996.
UzZaman, N., A Bangla Phonetic Encoding for Better Spelling Suggestion. In
Proceedings of 7th International Conference on Computer and Information Technology.
2004.
UzZaman, N. (2005). Phonetic Encoding for Bangla and its Application to
Spelling checker, Transliteration, Cross Language Information Retrieval and
Name Searching. BRAC University. URL http://citeseerx.ist.psu.edu/
viewdoc/download?doi=10.1.1.173.1756&rep=rep1&type=pdf. Undergradu-
ate Thesis.
van Berkel, B. and K. De Smedt, Triphone Analysis: A Combined Method for
the Correction of Orthographical and Typographical Errors. In Proceedings of the
Second Conference on Applied Natural Language Processing, ANLC ’88. Association
for Computational Linguistics, Stroudsburg, PA, USA, 1988. URL http://dx.
doi.org/10.3115/974235.974250.
Van Gael, J., A. Vlachos, and Z. Ghahramani, The Infinite HMM for Unsupervised
PoS Tagging. In Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing: Volume 2 - Volume 2, EMNLP ’09. Association for Compu-
tational Linguistics, Stroudsburg, PA, USA, 2009. ISBN 978-1-932432-62-6. URL
http://dl.acm.org/citation.cfm?id=1699571.1699601.
Viterbi, A. (1967). Error Bounds for Convolutional Codes and an Asymptotically
Optimum Decoding Algorithm. IEEE Transactions on Information Theory, 13(2),
260–269. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?
arnumber=1054010.
Vitevitch, M. S., Phonological neighbors in a small world: What can graph theory tell us about word learning? In Spring 2005 Talk Series on Networks and Complex Systems. Indiana University, 2005.
Vogel, C. and R. Cooper, Robust chart parsing with mildly inconsistent feature
structures. In Nonclassical Feature Systems, volume 10. 1995.
Wagner, J., J. Foster, and J. van Genabith, A Comparative Evaluation of Deep and
Shallow Approaches to the Automatic Detection of Common Grammatical Er-
rors. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Lan-
guage Processing and Computational Natural Language Learning (EMNLP-CoNLL).
Association for Computational Linguistics, Prague, Czech Republic, 2007.

Wagner, R. A. and M. J. Fischer (1974). The String-to-String Correction Problem.
J. ACM, 21(1), 168–173. ISSN 0004-5411. URL http://doi.acm.org/10.1145/
321796.321811.

Weischedel, R. M., W. M. Voge, and M. James (1978). An artificial intelligence approach to language instruction. Artificial Intelligence, 10, 225–240.

Whitelaw, C., B. Hutchinson, G. Y. Chung, and G. Ellis, Using the Web for Lan-
guage Independent Spellchecking and Autocorrection. In Proceedings of the 2009
Conference on Empirical Methods in Natural Language Processing, EMNLP ’09. As-
sociation for Computational Linguistics, Stroudsburg, PA, USA, 2009. ISBN 978-
1-932432-62-6. URL http://dl.acm.org/citation.cfm?id=1699571.1699629.

Yannakoudakis, E. J. and D. Fawthrop (1983). The Rules of Spelling Errors. Information Processing and Management, 19(2), 87–99.

Yi, X., J. Gao, and W. B. Dolan, A web-based English proofing system for English as a second language users. In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP). 2008.

Young-Soog Chae, Improvement of Korean proofreading system using corpus and collocation rules. In Proceedings of the 12th Pacific Asia Conference on Language, Information and Computation. Singapore, 1998.

Yule, G. U., The Statistical Study of Literary Vocabulary. Cambridge University Press,
1944.

Zhang, Y. and J. Nivre, Transition-Based Dependency Parsing with Rich Non-Local Features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2, HLT '11. Association for Computational Linguistics, Stroudsburg, PA, USA, 2011. ISBN 978-1-932432-88-6. URL http://dl.acm.org/citation.cfm?id=2002736.2002777.

LIST OF PAPERS BASED ON THESIS

1. Bibekananda Kundu, Sutanu Chakraborti and Sanjay Kumar Choudhury. NLG Approach for Bangla Grammatical Error Correction. In Proceedings of the 9th International Conference on Natural Language Processing, Macmillan Publishers, pp. 225–230, (2011).

2. Bibekananda Kundu, Sutanu Chakraborti and Sanjay Kumar Choudhury. Combining Confidence Score and Mal-rule Filters for Automatic Creation of Bangla Error Corpus: Grammar Checker Perspective. In Proceedings of the 13th International Conference on Intelligent Text Processing and Computational Linguistics, Springer LNCS, 7182, pp. 462–477, (2012).

3. Bibekananda Kundu, Sutanu Chakraborti and Sanjay Kumar Choudhury. Complexity Guided Active Learning for Bangla Grammar Correction. In Proceedings of the 10th International Conference on Natural Language Processing, pp. 40–49, (2013).

