
The Design of a Proofreading Software Service

Raphael Mudge
NLP Hacker, Automattic
Overview
 What is After the Deadline
 Where can you use it
 How it works
 Where to get it
What is AtD?
 A software service, checks:
 Spelling
 Real-word errors
 Style
 Grammar
A Software Service?
Where can you use it?
 In your browser: Google Chrome and Firefox
 With your blog: WordPress and IntenseDebate
 On your site: TinyMCE and jQuery
Google Chrome
Firefox
OpenOffice.org
How much use?
May 2010:
3.5 million requests

100-140K requests/day
Design Goals
 Speed
 Simplicity
 A working solution
Spell Checking
Is word in dictionary?
 No :( -> Generate Suggestions -> Sort Suggestions
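The flow above can be sketched in a few lines of Python. This is a minimal illustration, not AtD's implementation: the toy `DICTIONARY`, the `edits1` candidate generator (all strings one edit away), and the `check` function are assumptions made for the example.

```python
import string

# Hypothetical toy dictionary; a real checker would load a full word list.
DICTIONARY = {"the", "written", "word", "ward"}

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def check(word, sort_key=None):
    """Return [] for a known word, otherwise the sorted in-dictionary suggestions."""
    if word in DICTIONARY:
        return []                             # word is in the dictionary: nothing to do
    suggestions = edits1(word) & DICTIONARY   # generate suggestions
    return sorted(suggestions, key=sort_key)  # sort suggestions (alphabetical by default)

print(check("wrd"))
```

Passing a different `sort_key` is where a context-aware ranking would plug in.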
Sorting Suggestions
 Compare suggestion, error
 Do the first letters match?
 Edit distance
 Probability of suggestion in context
Sorting Suggestions
 “The written wrd”
 Suggestions: ward, word
First letters match
Edit distance = 1
Pn(ward | written) = 0.00%
Pn(word | written) = 0.17%
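The comparison features for "wrd" can be computed as in the sketch below. It assumes plain Levenshtein distance for the edit distance (the slides do not say which variant is used), and the `features` helper and its key names are hypothetical. The context probabilities are the ones quoted on the slide.

```python
def edit_distance(a, b):
    """Levenshtein distance, keeping only one row of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def features(suggestion, error, context_probability):
    """Bundle the three comparison features for one suggestion."""
    return {
        "first_letters_match": suggestion[0] == error[0],
        "edit_distance": edit_distance(suggestion, error),
        "p_in_context": context_probability,
    }

# Context probabilities taken from the slide: 0.00% and 0.17%.
print(features("ward", "wrd", 0.0))
print(features("word", "wrd", 0.0017))
```

Both suggestions tie on the first two features, so only the probability in context separates them.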
Language Model
P(word) = count(word) / total

Pn(word | previous) = count(previous, word) / count(previous)

Pp(word | next) = Pn(next | word) * P(word) / P(next)
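A minimal sketch of these three estimates over unigram and bigram counts follows. The `BigramModel` class and the seven-word corpus are made up for illustration; a real model would be built from a large corpus and would likely need smoothing, which is not shown.

```python
from collections import Counter

class BigramModel:
    """Count-based language model over a list of tokens."""

    def __init__(self, tokens):
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.total = len(tokens)

    def p(self, word):
        # P(word) = count(word) / total
        return self.unigrams[word] / self.total if self.total else 0.0

    def pn(self, word, previous):
        # Pn(word | previous) = count(previous, word) / count(previous)
        seen = self.unigrams[previous]
        return self.bigrams[(previous, word)] / seen if seen else 0.0

    def pp(self, word, nxt):
        # Pp(word | next) = Pn(next | word) * P(word) / P(next), by Bayes' rule
        p_next = self.p(nxt)
        return self.pn(nxt, word) * self.p(word) / p_next if p_next else 0.0

# Hypothetical toy corpus.
model = BigramModel("the written word is the written law".split())
print(model.pn("word", "written"))   # "written" is followed by "word" once out of twice
print(model.pn("ward", "written"))   # "ward" never follows "written" in this corpus
```

Deriving Pp from Pn with Bayes' rule means only forward bigram counts have to be stored.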
Sorting Suggestions
We want to calculate:

score(suggestion, error, context)

 Answer? Neural networks


 Trained with misspelled words
 Returns a value 0.0 … 1.0
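To make the shape of such a score function concrete, here is a sketch of the forward pass of a small feed-forward network with sigmoid units. The layer sizes and the weights in `HIDDEN` and `OUTPUT` are untrained placeholders invented for this example; AtD's actual network and its trained weights are not shown in the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(inputs, hidden_weights, output_weights):
    """Forward pass of a one-hidden-layer network; the result lies between 0 and 1.

    Each weight row carries one extra trailing entry that acts as the bias.
    """
    x = list(inputs) + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in hidden_weights]
    h = hidden + [1.0]
    return sigmoid(sum(w * v for w, v in zip(output_weights, h)))

# Placeholder weights (NOT trained): 3 inputs -> 2 hidden units -> 1 output.
HIDDEN = [[1.0, -1.0, 50.0, 0.0], [0.5, -0.5, 10.0, 0.0]]
OUTPUT = [1.0, 1.0, -1.0]

# Inputs: first letters match (1.0), edit distance (1.0), probability in context.
print(score([1.0, 1.0, 0.0017], HIDDEN, OUTPUT))  # "word"
print(score([1.0, 1.0, 0.0], HIDDEN, OUTPUT))     # "ward"
```

In a trained network the weights would come from examples of misspelled words paired with their corrections.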
Spell Checker Evaluation

Method / Numbers from:


Sebastian Deorowicz and Marcin G. Ciura. 2005. Correcting spelling errors by modeling their causes. International Journal of Applied Mathematics and Computer Science, 15(2):275–285.
Real-Word Errors
Is word part of a confusion set?
 Yes -> Sort Confusion Set in Context
Sorting Confusion Set
 Features
 P(word)
 Pn(word | previous)
 Pp(word | next)
 Pn(word | previous, previous2)
 Pp(word | next, next2)
 Score function : Neural Network
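The lookup-then-sort step might look like the sketch below. The two confusion sets and the `score_fn` callback are hypothetical; in AtD the score would come from the neural network fed with the features listed above.

```python
# Hypothetical confusion sets; a real list would be much longer.
CONFUSION_SETS = [
    {"their", "there", "they're"},
    {"to", "too", "two"},
]

def confusion_set_for(word):
    """Return the confusion set containing word, or None if there is none."""
    for candidates in CONFUSION_SETS:
        if word in candidates:
            return candidates
    return None

def rank_in_context(word, previous, nxt, score_fn):
    """Sort the word's confusion set by score_fn(candidate, previous, next), best first."""
    candidates = confusion_set_for(word)
    if candidates is None:
        return [word]  # not a confusable word: leave it alone
    return sorted(candidates, key=lambda c: score_fn(c, previous, nxt), reverse=True)

# Made-up scores for the context "lost ___ keys", for illustration only.
toy_scores = {"their": 0.7, "they're": 0.2, "there": 0.1}
ranking = rank_in_context("there", "lost", "keys", lambda c, p, n: toy_scores[c])
print(ranking)
```

If the top-ranked candidate differs from the word the user typed, the word can be flagged as a possible real-word error.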
Real-Word Error Evaluation

Evaluation captures grammar checker and statistical method. Details at: http://wp.me/pCBVi-a1
Grammar Checker
 The error:
I wonder if this is your companies
way of providing support?
The rule:
Pattern: your .*/NNS
Suggestion: your \1:possessive
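A rule of this shape can be matched against part-of-speech tagged tokens as sketched below. This is an illustration rather than AtD's rule engine: it assumes each pattern element is a word regex with an optional `/TAG` suffix, reads `\1` as the second matched word, and uses a hypothetical one-entry `POSSESSIVE` lookup in place of a real possessive transform.

```python
import re

TAGGED = ("I/PRP wonder/VBP if/IN this/DT is/VBZ your/PRP$ companies/NNS "
          "way/NN of/IN providing/VBG support/NN")

# Hypothetical lookup standing in for a real possessive transform.
POSSESSIVE = {"companies": "company's"}

def parse(tagged):
    """Split 'word/TAG' tokens into (word, tag) pairs."""
    return [tuple(tok.rsplit("/", 1)) for tok in tagged.split()]

def matches(spec, word, tag):
    """spec is 'wordregex' or 'wordregex/TAG'."""
    word_re, _, want_tag = spec.partition("/")
    if want_tag and tag != want_tag:
        return False
    return re.fullmatch(word_re, word, re.IGNORECASE) is not None

def apply_rule(pattern, tokens):
    """Return (start index, matched words) for every window matching the pattern."""
    specs = pattern.split()
    hits = []
    for i in range(len(tokens) - len(specs) + 1):
        window = tokens[i:i + len(specs)]
        if all(matches(s, w, t) for s, (w, t) in zip(specs, window)):
            hits.append((i, [w for w, _ in window]))
    return hits

def suggest(words):
    # Mirrors "your \1:possessive", reading \1 as the second matched word (an assumption).
    return "your " + POSSESSIVE[words[1]]

for start, words in apply_rule("your .*/NNS", parse(TAGGED)):
    print(start, " ".join(words), "->", suggest(words))
```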
Grammar Checker
How AtD sees the sentence:
 I/PRP wonder/VBP if/IN this/DT is/VBZ
your/PRP$ companies/NNS way/NN of/IN
providing/VBG support/NN

How AtD sees it:


your companies way = 0.000004%
your company's way = 0.000030%
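The two figures suggest a simple statistical check: presumably the suggestion is kept only when the language model finds it more likely than the original phrase. A minimal sketch, with the hypothetical helper `more_likely` and the slide's percentages converted to fractions:

```python
def more_likely(original, suggestion, phrase_probability):
    """True when the suggested phrase is more probable than the original one."""
    return phrase_probability(suggestion) > phrase_probability(original)

# Probabilities from the slide: 0.000004% and 0.000030%.
PROBS = {
    "your companies way": 0.000004 / 100,
    "your company's way": 0.000030 / 100,
}

flag = more_likely("your companies way", "your company's way", PROBS.get)
print(flag)
```

A filter like this would let a rule match broadly while the language model rejects rewrites that do not fit the context.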
Design Principles
 Speed over accuracy
 Simplicity over complexity
 Do what works
Why now?
 Cheap hardware
 Persistent internet
 Lots of data
Open Source
 Server technology is GPL
 Bootstrap data available
 Front-end technology is LGPL
Where to get it
http://open.afterthedeadline.com
AtD Technology (GPL)
http://www.afterthedeadline.com
Homepage
mailto: raffi@automattic.com
My email, I don’t bite.
