
REPORT ON

“AUTOMATED WRONG GRAMMAR DETECTOR”
Submitted to:
Dr. Vishal Gupta
By

Shashank Agrawal: 2020A7PS0073P


Shadan Hussain: 2020A7PS0134P
Introduction

Grammar checking is the process of identifying and correcting grammatical
errors in a text. English is the dominant language of science and technology,
so non-native English speakers need to read, write, and speak it with correct
grammar. This creates a need for grammar-checking software.

Manual grammar checking becomes impractical when human resources are limited,
the document is large, or the checking must be performed often. It is therefore
advantageous to automate the grammar-checking procedure. A grammar-checking
program can automatically detect and correct incorrect, unorthodox, or
problematic usage of the underlying grammar.

The purpose of this systematic study is to examine the existing literature, identify
current issues and propose potential future study areas. Since the algorithm for
checking grammar depends on the particular language, here we provide analysis
solely for the English language.

We start our discussion by exploring the types of errors that are commonly
encountered and then survey various techniques for grammar checking. We then
shift our attention to the most significant of these, the noisy channel model,
for addressing syntax and spelling errors.
Types of Errors and Broad Techniques for Grammar checking:
Before implementing any grammar-checking approach, it is important to identify
the major types of errors and how they are classified.

Grammatical errors can be classified in several ways, depending on the
criterion used.

Fig. 1. Classification of errors based on (a) frequency, (b) validity, (c) level and (d)
combining (a), (b) and (c)
Various Techniques for Grammar checking

1.) Rule based approach :


The traditional method of grammar checking involves manually creating
grammatical rules. These high-quality rules are formulated by linguists. A
part-of-speech-tagged English text is compared against the given set of rules,
and any matching rule is applied to correct the error. The approach seems
simple, since adding, editing, or removing a rule is straightforward; yet
developing rules requires substantial understanding of the grammar of the
underlying language. Rule-based systems can give thorough explanations for the
errors they highlight, which makes them particularly useful for
computer-assisted language learning. However, manually maintaining hundreds of
grammatical rules is tedious.
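To make the idea of matching rules against part-of-speech-tagged text concrete, here is a minimal sketch of a single hand-written rule for 'a' versus 'an'. The function name `article_rule`, the Penn-style tags, and the hand-tagged input are assumptions made for illustration; the rule looks only at the first letter of the next word, so it ignores sound-based exceptions such as "an hour".

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); tags here are hand-assigned

VOWELS = "aeiou"


def article_rule(tokens: List[Token]) -> List[Tuple[int, str, str]]:
    """Return (index, found, suggested) for each 'a'/'an' mismatch.

    Simplified, spelling-based rule: 'an' before a vowel letter, 'a' otherwise.
    """
    issues = []
    for i in range(len(tokens) - 1):
        word, tag = tokens[i]
        next_word = tokens[i + 1][0]
        # The rule only fires on determiners that are 'a' or 'an'.
        if tag != "DT" or word.lower() not in ("a", "an"):
            continue
        expected = "an" if next_word[0].lower() in VOWELS else "a"
        if word.lower() != expected:
            issues.append((i, word, expected))
    return issues


tagged = [
    ("She", "PRP"), ("ate", "VBD"), ("a", "DT"), ("apple", "NN"),
    ("and", "CC"), ("an", "DT"), ("banana", "NN"),
]
print(article_rule(tagged))  # [(2, 'a', 'an'), (5, 'an', 'a')]
```

Because each rule is an isolated function, adding or removing one does not affect the others, which is the maintainability property described above; the cost is that every rule has to be written by hand.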
Some examples of models of grammar checking tools which follow
rule-based approach:

English Grammar Checker


- utilizes Combinatory Categorial Grammar (CCG) to derive
syntax information of a sentence in a categorical lexicon
- each lexicon is a collection of lexical entries, where entries define the
acceptable words for that lexicon
- it defines the order of appearance of categories of words in a sentence
- can detect spelling errors, article or determiner errors, agreement errors,
missing or extra elements, and verb tense errors
- other types of errors, such as wrong word choice, preposition errors, and
run-ons, could not be detected

Island processing based approach


- First, the input text is broken into sentences and words
- Then, a set of finite state automata assigns a category (noun, verb, etc.)
to each word; the categorized words are the important 'islands' of the sentence
- The islands are assigned special features, which are stored in registers
- The error detection automaton is then called; it matches the word features
to decide whether there is an error and suggests a correction
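The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny pattern list `CATEGORY_PATTERNS`, the category names, and the single agreement check are hypothetical, and regular expressions stand in for the finite state automata.

```python
import re
from typing import Dict, List, Optional

# Hypothetical mini-lexicon; each regular expression plays the role of one automaton.
CATEGORY_PATTERNS = [
    ("PRON_SG3", re.compile(r"^(he|she|it)$", re.I)),
    ("PRON_OTHER", re.compile(r"^(i|you|we|they)$", re.I)),
    ("VERB_SG3", re.compile(r"^(goes|runs|writes|eats)$", re.I)),
    ("VERB_BASE", re.compile(r"^(go|run|write|eat)$", re.I)),
]
SG3_FORM = {"go": "goes", "run": "runs", "write": "writes", "eat": "eats"}
BASE_FORM = {v: k for k, v in SG3_FORM.items()}


def split_sentences(text: str) -> List[str]:
    """Step 1: break the input text into sentences."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def categorize(word: str) -> Optional[str]:
    """Step 2: assign a category to a word, or None if it is not an island."""
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.match(word):
            return name
    return None


def check_sentence(sentence: str) -> List[str]:
    """Steps 3-4: store island features in registers, then match them."""
    registers: Dict[str, str] = {}
    suggestions = []
    for word in re.findall(r"[A-Za-z']+", sentence):
        category = categorize(word)
        if category is None:
            continue  # words between islands are skipped
        if category.startswith("PRON"):
            registers["subject"] = category
        elif "subject" in registers:
            subject = registers["subject"]
            if subject == "PRON_SG3" and category == "VERB_BASE":
                suggestions.append(f"'{word}' -> '{SG3_FORM[word.lower()]}'")
            elif subject == "PRON_OTHER" and category == "VERB_SG3":
                suggestions.append(f"'{word}' -> '{BASE_FORM[word.lower()]}'")
    return suggestions


for s in split_sentences("She go to school. They eats rice. He writes code."):
    print(s, check_sentence(s))
```

Only the words recognized by some pattern take part in the check, which is the point of island processing: the rest of the sentence does not need to be parsed.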

2.) Machine Learning based technique :


Currently, the most widely used technology for grammar checking is machine
learning, and supervised learning yields the most effective results. These
approaches rely on an annotated corpus, which is used to perform statistical
analysis on the text in order to find and repair grammar problems
automatically. In contrast to rule-based systems, it is difficult to explain
the errors these systems flag. Because they learn from the underlying corpus,
machine learning-based solutions do not require a comprehensive understanding
of grammar. The absence of a large annotated corpus complicates the deployment
of these approaches for grammar checking, and the results depend heavily on
the cleanliness of the corpus.

SMT based approach:


- makes use of Statistical Machine Translation (SMT) to detect and correct
grammar errors
- translates the whole erroneous phrase instead of individual words
- uses the noisy channel model for error correction
- sentences containing errors are used to create training data that maps an
erroneous string to the correct one

We examine this approach in more detail later in the report.

3.) Hybrid Approach :


Combining machine learning with rule-based approaches can increase the
system's performance. Certain mistakes are best addressed with a rule-based
approach (e.g., the usage of 'a' or 'an') and others with machine learning
(e.g., determiner errors), so each component of the hybrid approach should be
deployed based on its "competence". A corpus of text may be used to train a
system to recognise valid sentence patterns, and the results can then be
filtered using custom-designed criteria. The hybrid approach is useful for
tackling a broad variety of complicated mistakes. In addition, it greatly
reduces the arduous task of creating a large number of rules.

In the next section we shall discuss examples of each of these approaches.

Maximum Entropy Classifier based approach:


- uses maximum entropy model
- trained with prepositions along with a set of associated feature-value pairs
(its context)
- predicts the probability of each preposition in the given context and then
compares it with the preposition used by the writer
- The erroneous preposition is replaced with most probable preposition
- each context is classified into one of the 34 classes of prepositions.
- To detect extra prepositions, the authors devised two rules. Rule 1 deals
with repetition of the same preposition: an error is detected when the same
POS tag is used. Rule 2 deals with the wrong addition of a preposition between
a plural noun and a quantifier.
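The prediction step of such a classifier can be sketched as below. Everything specific here is an assumption for illustration: three prepositions instead of the 34 classes, two context features (previous and next word), and hand-set values in `WEIGHTS` where a real system would learn them from a corpus.

```python
import math
from typing import Dict, List, Tuple

PREPOSITIONS = ["in", "on", "at"]  # illustrative subset of the preposition classes

# Hypothetical feature weights; in practice these are estimated during training.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "in": {"next=morning": 2.0, "next=monday": -1.0},
    "on": {"next=monday": 2.5, "next=morning": -0.5},
    "at": {"next=noon": 2.0, "prev=arrive": 1.0},
}


def extract_features(tokens: List[str], i: int) -> List[str]:
    """Context of the preposition at position i as feature-value strings."""
    features = []
    if i > 0:
        features.append("prev=" + tokens[i - 1].lower())
    if i + 1 < len(tokens):
        features.append("next=" + tokens[i + 1].lower())
    return features


def predict(features: List[str]) -> Dict[str, float]:
    """Maximum entropy form: p(y | x) is proportional to exp(sum of active weights)."""
    scores = {
        p: sum(WEIGHTS[p].get(f, 0.0) for f in features) for p in PREPOSITIONS
    }
    z = sum(math.exp(s) for s in scores.values())
    return {p: math.exp(s) / z for p, s in scores.items()}


def check_preposition(tokens: List[str], i: int) -> Tuple[str, str]:
    """Return (preposition used by the writer, most probable preposition)."""
    probabilities = predict(extract_features(tokens, i))
    best = max(probabilities, key=probabilities.get)
    return tokens[i], best


used, suggested = check_preposition(["We", "meet", "in", "Monday"], 2)
print(used, "->", suggested)  # in -> on
```

When the writer's preposition differs from the most probable one, the sketch simply reports the alternative; a practical system would likely also require a minimum probability margin before suggesting a change.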

THE NOISY-CHANNEL MODEL

The noisy channel model is a framework used in spell checking, question
answering, speech recognition, and machine translation. It attempts to recover
the intended spelling or pronunciation of a misspelled or mispronounced word.
It consists of two main components: a base language model and a noise model.
The base language model is a probabilistic language model that generates an
'error-free' sentence with a certain probability. The probabilistic noise
model then takes this sentence and decides whether or not to corrupt it by
injecting different kinds of errors, such as misspellings, improper article
usage, or improper word forms. We then want the probability
p(S(original)|S(observed)), where S(original) is the original sentence created
by the base language model and S(observed) is the observed erroneous sentence.
Using Bayes' rule, we have the following relation:

\[
p[S(\mathrm{orig}) \mid S(\mathrm{obs})] = \frac{p[S(\mathrm{obs}) \mid S(\mathrm{orig})] \; p[S(\mathrm{orig})]}{p[S(\mathrm{obs})]}
\]
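As a short worked step, the correction chosen by the model is the candidate original sentence that maximizes this posterior. The symbol \hat{S} is introduced here only for illustration. Since p[S(obs)] is the same for every candidate, it can be dropped from the maximization:

```latex
\hat{S}
  = \arg\max_{S(\mathrm{orig})} \; p[S(\mathrm{orig}) \mid S(\mathrm{obs})]
  = \arg\max_{S(\mathrm{orig})} \; \frac{p[S(\mathrm{obs}) \mid S(\mathrm{orig})] \; p[S(\mathrm{orig})]}{p[S(\mathrm{obs})]}
  = \arg\max_{S(\mathrm{orig})} \; p[S(\mathrm{obs}) \mid S(\mathrm{orig})] \; p[S(\mathrm{orig})]
```

Here p[S(obs)|S(orig)] is supplied by the noise model and p[S(orig)] by the base language model.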

For the language model, we can use well-known probabilistic models that have
established methods for learning their parameters, such as n-gram models or
probabilistic context-free grammars (PCFGs).
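As an illustration of how the parameters of such a language model can be learned, the following is a minimal sketch of a bigram model estimated from counts. The two-sentence corpus, the function names, and the choice of add-one smoothing are assumptions made here for brevity, not something the cited work prescribes.

```python
from collections import Counter
from typing import List, Set, Tuple


def train_bigram(corpus: List[List[str]]) -> Tuple[Counter, Counter, Set[str]]:
    """Count bigrams and their left-hand contexts over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {w for sentence in corpus for w in sentence} | {"</s>"}
    return contexts, bigrams, vocab


def sentence_prob(sentence: List[str], contexts: Counter, bigrams: Counter,
                  vocab: Set[str]) -> float:
    """Product of add-one smoothed bigram probabilities for the sentence."""
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        prob *= (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
    return prob


# Tiny hypothetical training corpus.
corpus = [["she", "goes", "home"], ["he", "goes", "home"]]
contexts, bigrams, vocab = train_bigram(corpus)

good = sentence_prob(["she", "goes", "home"], contexts, bigrams, vocab)
bad = sentence_prob(["she", "go", "home"], contexts, bigrams, vocab)
print(good > bad)  # True
```

The model assigns a higher probability to the word sequence it has seen, which is the signal the noisy channel model uses to prefer an 'error-free' sentence.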

By tracing all feasible paths from the language model through the noise model
that end in the observed sentence, we can find the most likely error-free
sentence for a given observed sentence.
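The search described above can be sketched on a toy scale as follows. All numbers and tables are hypothetical: `BIGRAM_P` stands in for a trained language model, `CONFUSIONS` and `P_ERROR` stand in for a learned noise model, and the candidates are enumerated exhaustively, which is only feasible for very short sentences.

```python
import math
from itertools import product
from typing import List

# Hypothetical noise model: an intended word may be emitted as one of these words.
CONFUSIONS = {"their": ["there"], "there": ["their"], "goes": ["go"], "go": ["goes"]}
P_ERROR = 0.1  # assumed probability that a confusable word is corrupted

# Hypothetical bigram probabilities standing in for a trained language model.
BIGRAM_P = {
    ("<s>", "she"): 0.5, ("she", "goes"): 0.4, ("she", "go"): 0.01,
    ("goes", "there"): 0.3, ("go", "there"): 0.3,
    ("goes", "their"): 0.01, ("go", "their"): 0.01,
}
FLOOR = 1e-4  # probability assigned to unseen bigrams


def lm_logprob(sentence: List[str]) -> float:
    """log p[S(orig)] under the stand-in language model."""
    tokens = ["<s>"] + sentence
    return sum(math.log(BIGRAM_P.get(bg, FLOOR)) for bg in zip(tokens[:-1], tokens[1:]))


def noise_logprob(original: List[str], observed: List[str]) -> float:
    """log p[S(obs) | S(orig)] under word-by-word substitution noise."""
    total = 0.0
    for intended, seen in zip(original, observed):
        alternatives = CONFUSIONS.get(intended, [])
        if intended == seen:
            total += math.log(1 - P_ERROR) if alternatives else 0.0
        elif seen in alternatives:
            total += math.log(P_ERROR / len(alternatives))
        else:
            return float("-inf")  # the noise model cannot produce this word
    return total


def candidates(observed: List[str]) -> List[List[str]]:
    """Every original sentence the noise model could have turned into `observed`."""
    options = [
        [w] + [o for o, alts in CONFUSIONS.items() if w in alts] for w in observed
    ]
    return [list(c) for c in product(*options)]


def correct(observed: List[str]) -> List[str]:
    """Pick the candidate maximizing p[S(obs) | S(orig)] * p[S(orig)]."""
    return max(candidates(observed),
               key=lambda s: lm_logprob(s) + noise_logprob(s, observed))


print(correct(["she", "go", "their"]))  # ['she', 'goes', 'there']
```

Working in log probabilities avoids numerical underflow when many small probabilities are multiplied; a realistic system would replace the exhaustive enumeration with a more efficient search over the paths.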
References:
1.) Madhavi Soni et al. A Systematic Review of Automated Grammar Checking
in English Language
2.) Park Levy et al. Automated Whole Sentence Grammar Correction Using a
Noisy Channel Model
3.) Shi-Li Zhe et al. Automated Error Detection of Vocabulary Usage in College
English Writing
