http://wordnet.princeton.edu/wordnet/related-projects/#web
Definition:
The process of removing suffixes from words to obtain
their base or root form.
Example:
‘fishing’, ‘fished’, ‘fish’, ‘fisher’ → ‘fish’
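A minimal sketch of suffix stripping that reproduces the example above. It is not the Porter algorithm: the suffix list, the `naive_stem` name, and the `min_stem_len` parameter are assumptions made up for illustration, and real stemmers apply many more rules and conditions.

```python
# Hypothetical, illustrative suffix list; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "er", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word remains to act as a stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


words = ["fishing", "fished", "fish", "fisher"]
stems = [naive_stem(w) for w in words]
print(stems)
```

A stripper this simple over-stems many words (for example, it would cut "winter" to "wint"), which is why practical systems rely on a tested implementation such as the Porter Stemmer.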
Porter Stemmer
http://tartarus.org/~martin/PorterStemmer/
WordNet Stemmer
http://tipsandtricks.runicsoft.com/Other/JavaStemmer.html
Tokenization
The process of breaking a stream of text up into
“words” and punctuation marks.
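A minimal sketch of the definition above, using a regular expression to split text into word tokens and single punctuation marks. The pattern and the `tokenize` name are assumptions for illustration; production tokenizers handle contractions, abbreviations, numbers, and URLs with more care.

```python
import re

# A run of word characters, or one character that is neither
# a word character nor whitespace (i.e. a punctuation mark).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return the word and punctuation tokens of text, in order."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Hello, world!")
print(tokens)
```

Note that this pattern splits a contraction such as "don't" into three tokens, which may or may not be the desired behaviour.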
Sentence Splitting
Part of Speech Tagging
Example: