
The Design of a Proofreading Software Service

Raphael Mudge NLP Hacker, Automattic

Overview
• What is After the Deadline
• Where can you use it
• How it works
• Where to get it

What is AtD?
• A software service, checks:
  • Spelling
  • Real-word errors
  • Style
  • Grammar

A Software Service?

Where can you use it?
• In your browser
  • Google Chrome and Firefox
• With your blog
  • WordPress and IntenseDebate
• On your site
  • TinyMCE and jQuery

Google Chrome

Firefox

OpenOffice.org

How much use?
May 2010:

3.5 million requests
100-140K requests/day

Design Goals
• Speed
• Simplicity
• A working solution

Spell Checking
Is word in dictionary?
  No :( → Generate Suggestions → Sort Suggestions
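
The first two steps of this flow can be sketched as follows. This is a minimal illustration, not AtD's actual code: the four-word `DICTIONARY`, the `edits1` helper, and the choice to generate candidates one edit away from the input are all assumptions made for the example.

```python
import string

# Toy dictionary (assumption); a real service would load a large word list.
DICTIONARY = {"the", "written", "word", "ward"}

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def check(word):
    """Return None for a dictionary word, else a list of candidate suggestions."""
    if word in DICTIONARY:
        return None
    return sorted(edits1(word) & DICTIONARY)

print(check("word"))  # None
print(check("wrd"))   # ['ward', 'word']
```

The candidates come back in alphabetical order here; putting the best one first is the job of the sorting step.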

Sorting Suggestions
• Compare suggestion, error
  • Do the first letters match?
  • Edit distance
  • Probability of suggestion in context

Sorting Suggestions
• "The written wrd"
• Suggestions: ward, word
  • First letters match
  • Edit distance = 1
  • Pn(ward | written) = 0.00%
  • Pn(word | written) = 0.17%
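
The three comparisons for this example can be computed with a short sketch. The `edit_distance` and `features` functions are hypothetical names, and plain Levenshtein distance is assumed as the edit-distance measure; the context probabilities are simply the values quoted above.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def features(error, suggestion, context_probability):
    """Collect the three comparisons between an error and one suggestion."""
    return {
        "first_letters_match": error[0] == suggestion[0],
        "edit_distance": edit_distance(error, suggestion),
        "p_in_context": context_probability,
    }

# Context probabilities (in percent) are the ones quoted on the slide.
for suggestion, p in [("ward", 0.00), ("word", 0.17)]:
    print(suggestion, features("wrd", suggestion, p))
```

Both suggestions tie on the first two features, so only the probability in context separates them.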

Language Model
P(word)
count(word) / total

Pn(word|previous)
count(previous word) / count(previous)

Pp(word|next)
Pn(next|word) * P(word) / P(next)
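
These three formulas can be tried out on a toy corpus. The sketch below assumes a tiny made-up sentence as the training text and uses the hypothetical function names `p`, `pn`, and `pp`; a real model would be built from a large corpus.

```python
from collections import Counter

# Toy corpus (assumption), already tokenized on whitespace.
corpus = "the written word is not the spoken word".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def p(word):
    """P(word) = count(word) / total."""
    return unigrams[word] / total

def pn(word, previous):
    """Pn(word | previous) = count(previous word) / count(previous)."""
    if unigrams[previous] == 0:
        return 0.0
    return bigrams[(previous, word)] / unigrams[previous]

def pp(word, nxt):
    """Pp(word | next) = Pn(next | word) * P(word) / P(next), by Bayes' rule."""
    if unigrams[nxt] == 0:
        return 0.0
    return pn(nxt, word) * p(word) / p(nxt)

print(p("word"))              # 0.25
print(pn("word", "written"))  # 1.0
print(pp("written", "word"))  # 0.5
```

The last value reads naturally: "word" occurs twice in the toy corpus and is preceded by "written" once, so the chance that "written" comes before a given "word" is one half.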

Sorting Suggestions
• We want to calculate:
  score(suggestion, error, context)

• Answer? Neural networks
  • Trained with misspelled words
  • Returns a value 0.0 … 1.0
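
To make the idea concrete, here is a deliberately tiny stand-in: a single sigmoid unit trained by gradient descent, which is the smallest thing that could be called a neural network. The network shape, the feature encoding (first letters match, 1 / edit distance, context probability in percent), and the six training rows are all invented for illustration and are not AtD's data or architecture.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(weights, bias, x):
    """Output of one sigmoid unit: always strictly between 0.0 and 1.0."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(examples, epochs=2000, rate=0.5):
    """Stochastic gradient descent on log loss for a single sigmoid unit."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, label in examples:
            error = score(weights, bias, x) - label
            weights = [w - rate * error * xi for w, xi in zip(weights, x)]
            bias -= rate * error
    return weights, bias

# Hypothetical training rows:
# (first letters match, 1 / edit distance, context probability in %) -> right fix?
examples = [
    ((1.0, 1.0, 0.9), 1),
    ((1.0, 0.5, 0.7), 1),
    ((0.0, 1.0, 0.8), 1),
    ((1.0, 1.0, 0.0), 0),
    ((0.0, 0.5, 0.1), 0),
    ((1.0, 0.5, 0.0), 0),
]
weights, bias = train(examples)

# Features for the "The written wrd" example.
candidates = {"ward": (1.0, 1.0, 0.00), "word": (1.0, 1.0, 0.17)}
ranked = sorted(candidates,
                key=lambda s: score(weights, bias, candidates[s]),
                reverse=True)
print(ranked)
```

Because the two candidates differ only in their context probability, the ranking depends entirely on the weight the unit learns for that feature.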

Spell Checker Evaluation

Method / Numbers from: Sebastian Deorowicz and Marcin G. Ciura. 2005. Correcting spelling errors by modeling their causes. International Journal of Applied Mathematics and Computer Science, 15(2):275–285.

Real-Word Errors
Is word part of a confusion set?
  Yes → Sort Confusion Set in Context

Sorting Confusion Set
• Features
  • P(word)
  • Pn(word | previous)
  • Pp(word | next)
  • Pn(word | previous, previous2)
  • Pp(word | next, next2)
• Score function: Neural Network
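
A rough sketch of ranking a confusion set in context follows. It assumes a toy corpus and two example confusion sets, computes only the unigram and bigram features (the two-word-context features are left out for brevity), and replaces the neural network with a plain sum of the features as a placeholder score.

```python
from collections import Counter

# Toy corpus and confusion sets (assumptions for illustration only).
corpus = ("they lost their keys . put it over there . "
          "their keys are over there").split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

CONFUSION_SETS = [{"their", "there"}, {"to", "too", "two"}]

def features(word, previous, nxt):
    """P(word), P(word | previous), and P(word | next) from raw counts."""
    p_word = unigrams[word] / total
    p_prev = bigrams[(previous, word)] / unigrams[previous] if unigrams[previous] else 0.0
    # Bayes: P(next | word) * P(word) / P(next) reduces to count(word next) / count(next).
    p_next = bigrams[(word, nxt)] / unigrams[nxt] if unigrams[nxt] else 0.0
    return (p_word, p_prev, p_next)

def rank(word, previous, nxt):
    """Order the word's confusion set by a placeholder score, best first."""
    for confusion_set in CONFUSION_SETS:
        if word in confusion_set:
            # Summing the features stands in for the trained network here.
            return sorted(confusion_set,
                          key=lambda w: (-sum(features(w, previous, nxt)), w))
    return [word]  # not part of any confusion set: nothing to reorder

print(rank("there", "lost", "keys"))  # ['their', 'there']
print(rank("their", "over", "."))     # ['there', 'their']
```

If the word the writer typed is not ranked first, the higher-ranked member of the set becomes the suggestion.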

Real-Word Error Evaluation

Evaluation captures grammar checker and statistical method. Details at: http://wp.me/pCBVi-a1

Grammar Checker
• The error:
  • I wonder if this is your companies way of providing support?

• The rule:
  • Pattern: your .*/NNS
  • Suggestion: your \1:possessive

Grammar Checker
• How AtD sees the sentence:
  • I/PRP wonder/VBP if/IN this/DT is/VBZ your/PRP$ companies/NNS way/NN of/IN providing/VBG support/NN

• How AtD sees it:
  • your companies way = 0.000004%
  • your company's way = 0.000030%
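
Putting the rule, the tagged sentence, and the two probabilities together might look like the sketch below. It is only an illustration: the rule is hard-coded rather than parsed from a rule file, the `POSSESSIVE` lookup is a hypothetical stand-in for the `:possessive` transform, the match requires a word after the plural noun, and the probabilities are assumed to act as a filter that keeps the suggestion only when it is the more likely phrase.

```python
TAGGED = ("I/PRP wonder/VBP if/IN this/DT is/VBZ your/PRP$ companies/NNS "
          "way/NN of/IN providing/VBG support/NN")

# Hypothetical lookup standing in for the ":possessive" transform.
POSSESSIVE = {"companies": "company's"}

# Phrase probabilities (in percent) as quoted on the slide.
PHRASE_PROBABILITY = {
    "your companies way": 0.000004,
    "your company's way": 0.000030,
}

def parse(tagged):
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in tagged.split()]

def apply_rule(tokens):
    # Rule from the slide: pattern "your .*/NNS", suggestion "your \1:possessive".
    hits = []
    for i in range(len(tokens) - 2):
        (w1, _), (w2, t2), (w3, _) = tokens[i:i + 3]
        if w1.lower() == "your" and t2 == "NNS" and w2 in POSSESSIVE:
            original = f"{w1} {w2} {w3}"
            suggested = f"{w1} {POSSESSIVE[w2]} {w3}"
            # Keep the suggestion only if it is the more probable phrase.
            if PHRASE_PROBABILITY.get(suggested, 0.0) > PHRASE_PROBABILITY.get(original, 0.0):
                hits.append((original, suggested))
    return hits

print(apply_rule(parse(TAGGED)))
# [('your companies way', "your company's way")]
```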

Design Principles
• Speed over accuracy
• Simplicity over complexity
• Do what works

Why now?
• Cheap hardware
• Persistent internet
• Lots of data

Open Source
• Server technology is GPL
• Bootstrap data available
• Front-end technology is LGPL

Where to get it
http://open.afterthedeadline.com  AtD Technology (GPL)
http://www.afterthedeadline.com  Homepage
mailto: raffi@automattic.com  My email, I don't bite.