Specifically, the three metrics are expressed as:

$$\text{Precision} = \frac{TP}{TP + FP}, \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN}, \tag{2}$$

$$F_{0.5} = \frac{1.25 \times \text{Precision} \times \text{Recall}}{0.25 \times \text{Precision} + \text{Recall}}, \tag{3}$$

where $TP$, $FP$ and $FN$ represent the true positives, false positives and false negatives of the predictions, respectively. We use the official scoring program provided by CoNLL2014, adapted to be compatible with the latest Python environment.

Baselines. In this report, we perform the GEC task on three systems:

• ChatGPT: We query ChatGPT manually rather than through an API because ChatGPT's behavior is unstable. For example, when a query sentence resembles a question or demand, ChatGPT may stop performing GEC and respond to the “demand” instead. After a few trials, we find a prompt that works well for ChatGPT:

    Do grammatical error correction on all the following sentences I type in the conversation.

  We query ChatGPT with this prompt for each test sample.

• Grammarly: Grammarly is a prevalent cloud-based English typing assistant. It reviews spelling, grammar, punctuation, clarity, engagement, and delivery mistakes in English texts, detects plagiarism and suggests replacements for the identified errors (Wikipedia contributors, 2023b). As stated by Grammarly, every day, 30 million people and 50,000 teams around the world use Grammarly with their writing (Grammarly, 2023). When querying Grammarly, we open a text file and paste all the test samples into separate paragraphs. We enable all grammar corrections in the settings and ask it to correct only the issues flagged as correctness problems (red underline), while leaving those flagged for clarity (blue underline), engagement (green underline) and delivery (purple underline) unchanged. We iterate this process several times until Grammarly detects no further errors.

• GECToR: Besides Grammarly, we also compare ChatGPT with GECToR (Omelianchuk et al., 2020), a state-of-the-art research model for GEC, which also exhibits good performance on the CoNLL2014 task. We adopt the implementation based on the pre-trained RoBERTa model.

3.2 Results and Analysis

Overall Performance. Table 2 presents the overall performance of the three systems. As seen, ChatGPT obtains the highest recall, GECToR obtains the highest precision, and Grammarly achieves a better balance between the two metrics, resulting in the highest F0.5 score. These results suggest that ChatGPT tends to correct as many errors as possible, which may lead to more overcorrections. In contrast, GECToR corrects only those errors it is confident about, which leaves many errors uncorrected. Grammarly combines the advantages of both, so it performs more stably.

ChatGPT Performs Worse on Long Sentences? To understand which kind of sentences ChatGPT is good at, we divide the 100 test sentences into three equally sized categories, namely, Short, Medium and Long. Table 3 shows the results with respect to sentence length. As seen, the gap between ChatGPT and Grammarly narrows significantly on short sentences. In contrast, ChatGPT performs much worse on longer sentences, at least in terms of the existing evaluation metrics.

ChatGPT Goes Beyond One-by-One Corrections. We inspect the output of the three systems, especially those for long sentences, and find that
System | Sentence
--- | ---
Source | For an example , if exercising is helpful for family potential disease , we can always look for more chances for the family to go exercise .
Reference | For example , if exercising (OR exercise) is helpful for a potential family disease , we can always look for more chances for the family to do exercise .
GECToR | For example , if exercising is helpful for family potential disease , we can always look for more chances for the family to go exercise .
Grammarly | For example , if exercising is helpful for a family ’s potential disease , we can always look for more chances for the family to go exercise .
ChatGPT | For example , if exercise is helpful in preventing potential family diseases , we can always look for more opportunities for the family to exercise .
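
As a minimal sketch, the metrics in Equations (1)-(3) and a length-based split like the one behind Table 3 could be computed as follows. This is not the official CoNLL2014 scorer: the function names are hypothetical, returning 0.0 for an empty denominator is an assumed convention (the official scorer may handle such cases differently), and measuring sentence length by whitespace-separated tokens is likewise an assumption, since the text does not state how length is measured or how 100 sentences are divided into three groups.

```python
from typing import List, Sequence, Tuple


def precision_recall_f05(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute Precision, Recall and F0.5 from edit counts, per Eqs. (1)-(3).

    Assumption: an empty denominator yields 0.0 instead of raising an error.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = 0.25 * precision + recall
    f05 = 1.25 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f05


def split_by_length(
    sentences: Sequence[str],
) -> Tuple[List[str], List[str], List[str]]:
    """Sort sentences by token count and cut them into three near-equal groups.

    Assumption: length is the number of whitespace-separated tokens. When the
    total is not divisible by three, the last group gets the extra items.
    """
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    n = len(ordered)
    first_cut, second_cut = n // 3, (2 * n) // 3
    return ordered[:first_cut], ordered[first_cut:second_cut], ordered[second_cut:]


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    p, r, f = precision_recall_f05(tp=30, fp=10, fn=30)
    print(f"Precision={p:.3f} Recall={r:.3f} F0.5={f:.3f}")
```

With the hypothetical counts above, precision is 0.75 and recall is 0.5, so F0.5 is about 0.682. The score sits closer to precision than to recall, which is why a system with many overcorrections can lose on F0.5 despite having the highest recall.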