Professional Documents
Culture Documents
SPRING, 2022
[ ]: import nltk
# from nltk.book import *
from nltk.stem.porter import *
from nltk.stem import *
[ ]: #The nltk.tag module defines functions and classes for manipulating tagged␣
,→tokens, which combine a basic
#token value with a tag. Tags are case-sensitive strings that identify some␣
,→property of a token, such as
#its part of speech. Tagged tokens are encoded as tuples (tag, token)\
import nltk.tag
1
0.0.2 Various Tagsets
[ ]: treebank.tagged_words()[1:20]
[ ]: treebank.tagged_words(tagset ='universal')[1:20]
[ ]: nltk.help.brown_tagset('VB.*')
[ ]: text = "The bitterest tears shed over graves are for words left unsaid and deeds␣
,→left undone - Harriet Beecher Stowe"
tok_text = word_tokenize(text)
nltk.pos_tag(tok_text)
print(tagged_token)
0.0.5 Corpus reader functions are named based on the type of information they return.
2
Find most common tags
[ ]: brown_tagged = b.tagged_words(categories='science_fiction', tagset='universal')