anrprogrammer
A summary of my Statistics, Mathematics, CSC, AI, Machine Learning Projects
A spell-checker in R
I came across Dr. Peter Norvig’s blog about writing a basic spell-checker (http://norvig.com/spell-correct.html), and
just had to try to implement it in R. Please excuse the ugly-ish code (I have not optimized it or commented it
adequately at this point, but you can get the idea of what it does by reading Dr. Norvig’s blog). If anyone knows of
any pre-built spell-checker packages in R, please let me know in a comment!
I do not think R is a particularly good language for this sort of activity, but I got it to work out fine. The first few lines
here create a list of common words, and their frequencies in the English language. The following lines may take a
few minutes to run on an average machine, but I will try to upload them soon so that you can just download the
table instead of creating it yourself…
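As a rough illustration of what such a frequency table might look like (a sketch only, not the code from the original post), the snippet below counts word occurrences in a tiny made-up sample. The names `sample_text`, `Tokenize` and `word_freq` are hypothetical, and a real table would be built from a far larger body of English text.

```r
# Hypothetical sample text standing in for a large English corpus
sample_text <- "The quick brown fox jumps over the lazy dog. The dog sleeps."

# Lower-case the text and extract runs of letters as words
Tokenize <- function(text) {
  text <- tolower(text)
  unlist(regmatches(text, gregexpr("[a-z]+", text)))
}

# table() counts each distinct word; sort() puts the most common words first
word_freq <- sort(table(Tokenize(sample_text)), decreasing = TRUE)

print(head(word_freq, 3))
```

Counting raw occurrences is enough here because only the relative ordering of candidate words matters when choosing a correction.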
The above functions generate “neighbors” of words, determine probabilities of the neighbors, and return the best
ones. Function “CorrectDocument” will correct an entire document (with special characters and punctuation
removed), and “Correct” will simply correct a word. Here are some sample runs.
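For readers without the original listing, here is a minimal sketch of the idea, following the approach in Dr. Norvig's post but limited to words one edit away. It is an assumption-laden reconstruction, not the code from this post: the helper name `Neighbors`, the tiny `word_freq` table and its counts are all hypothetical, and raw counts stand in for probabilities.

```r
# Hypothetical word counts; a real table would come from a large corpus
word_freq <- c(the = 500, spelling = 12, is = 300, hard = 20, word = 40)

# All strings one edit away: deletes, transposes, replaces, inserts
Neighbors <- function(word) {
  chars <- strsplit(word, "")[[1]]
  n <- length(chars)
  join <- function(x) paste(x, collapse = "")
  # Drop the i-th character
  deletes <- vapply(seq_len(n), function(i) join(chars[-i]), character(1))
  # Swap each character with the one after it
  transposes <- vapply(seq_len(max(n - 1, 0)), function(i) {
    swapped <- chars
    swapped[c(i, i + 1)] <- chars[c(i + 1, i)]
    join(swapped)
  }, character(1))
  # Replace the i-th character with each letter a-z
  replaces <- unlist(lapply(seq_len(n), function(i) {
    vapply(letters, function(ch) {
      x <- chars
      x[i] <- ch
      join(x)
    }, character(1), USE.NAMES = FALSE)
  }))
  # Insert each letter a-z at every gap, including both ends
  inserts <- unlist(lapply(0:n, function(i) {
    vapply(letters, function(ch) join(append(chars, ch, after = i)),
           character(1), USE.NAMES = FALSE)
  }))
  unique(c(deletes, transposes, replaces, inserts))
}

# Correct a single word: keep known words, else pick the most frequent neighbor
Correct <- function(word, freq = word_freq) {
  word <- tolower(word)
  if (word %in% names(freq)) return(word)
  candidates <- intersect(Neighbors(word), names(freq))
  if (length(candidates) == 0) return(word)  # no known neighbor, leave as is
  candidates[which.max(freq[candidates])]
}

# Correct a whole document after stripping punctuation and special characters
CorrectDocument <- function(text, freq = word_freq) {
  text <- tolower(text)
  words <- unlist(regmatches(text, gregexpr("[a-z]+", text)))
  fixed <- vapply(words, Correct, character(1), freq = freq, USE.NAMES = FALSE)
  paste(fixed, collapse = " ")
}

print(Correct("speling"))
print(CorrectDocument("Speling is hrd"))
```

Restricting the search to one edit keeps the candidate set small; words that are two or more edits from any known word are returned unchanged in this sketch.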
As you can see, this function is clearly not perfect. It handles some basic corrections automatically, but
there are still improvements to be made. More to come!
https://anrprogrammer.wordpress.com/2012/02/08/a-spell-checker-in-r/ 3/6
9/17/2017 A spell-checker in R | anrprogrammer
This entry was posted in cipher, R, Statistics and tagged data mining, machine learning, R, statistics on February 8,
2012 [https://anrprogrammer.wordpress.com/2012/02/08/a-spell-checker-in-r/] .
Richie Cotton
February 8, 2012 at 12:04 pm
Very nice, but did you know about the aspell function in the utils package?
Yes, I had come across that in my search. I didn’t think much about it since you cannot directly supply it with a
string and have it corrected, but it could easily perform the same function (and better, I’m sure) if I just output the
string to a file and then read it in using aspell. I will likely do the latter if I need any serious spell-correction done
with R, but learning the idea behind very basic spell-checking was interesting!
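The file-based workaround could look roughly like the sketch below. `CheckString` is a hypothetical helper name, and `utils::aspell()` depends on an external Aspell, Hunspell or Ispell installation with a dictionary, so the function simply returns `NULL` when no such program is on the system path. Note that `aspell()` reports suspect words together with suggestions rather than returning a corrected string.

```r
# Hypothetical helper: spell-check a string by writing it to a temporary
# file and passing that file to utils::aspell()
CheckString <- function(text) {
  # aspell() needs an external checker; bail out quietly if none is installed
  programs <- Sys.which(c("aspell", "hunspell", "ispell"))
  if (!any(nzchar(programs))) {
    message("No external spell-checker found; nothing checked.")
    return(NULL)
  }
  tmp <- tempfile(fileext = ".txt")
  on.exit(unlink(tmp))  # remove the temporary file when done
  writeLines(text, tmp)
  utils::aspell(tmp)
}

result <- CheckString("This sentense has a speling eror")
print(result)
```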
luiscarlosmr
February 10, 2012 at 2:55 pm
# Deletes: every string formed by removing one character from word
# TRY: word[-i]
Deletes <- function(word) {
  N <- nchar(word)
  out <- character(N)
  word <- unlist(strsplit(word, NULL))
  # Loop over every character position, not just the first
  for (i in seq_len(N)) {
    out[i] <- paste(word[-i], collapse = "")
  }
  return(out)
}
As you can see I just added a line. Have a great time and this code is really interesting to me. Thank you.
Deepak
October 2, 2012 at 12:39 pm
Dan
October 15, 2012 at 2:49 am
Nope, it is completely free to use, and has no license associated with it.
Dan
October 16, 2012 at 9:57 pm
Awesome! I’d love to use it in a project! Unfortunately, “no license associated with it” doesn’t exactly mean “free to
use” because it’s copyrighted by default. Do you mean that you release it into the public domain? Thanks again, this
was an awesome lesson, and I’d love to be able to adapt it for a need of mine.
Ankit
July 9, 2013 at 2:15 pm
Hello, I am working on a project where I need to build a spell checker and that’s how I stumbled onto your blog
post. But I’m facing an issue when I am trying to execute the code. R gives me the following error:
Do you have any idea what could be the reason for such an error?