
Practical File

ON
Natural Language Processing CSE0742[P]

Submitted By:                          Submitted To:

Vinayak Singh                          Mr. H.N. Verma
BETN1CS19068                           Associate Professor
7th Semester                           ITM University, Gwalior

INDEX
S. No.  Practical List                                          Page No.  Remarks
1       Installation and working on classical toolkit          3
2       Write a program for Tokenization                        4
3       Write a program for stop word removal                   5
4       Write a program for stemming                            6, 7
5       Write a program for Lemmatization                       8
6       Write a program for parts-of-speech (POS) tagging       9, 10
7       Implement N-Gram model                                  11, 12
8       Generating Unigram, Bigram, Trigram in NLTK             13, 14
9       Write a program for the Cocke-Younger-Kasami algorithm  15, 16

EXPERIMENT NO: 1

Installation and working on classical toolkit.
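The later experiments all use NLTK, so a reasonable reading of "classical toolkit" here is NLTK itself. A minimal sketch of a post-install smoke test (assuming the library was installed beforehand with `pip install nltk`) might look like this:

```python
# A minimal smoke test after installing NLTK (assumes `pip install nltk`
# has already been run).
import nltk

# Download the resources the later experiments rely on.
nltk.download("punkt", quiet=True)       # sentence/word tokenizer models
nltk.download("stopwords", quiet=True)   # stop word lists
nltk.download("wordnet", quiet=True)     # lemmatizer dictionary

print("NLTK version:", nltk.__version__)
```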

EXPERIMENT NO: 2

Write a program for Tokenization

EXPERIMENT NO: 3

Write a program for stop word removal


Stop Words: A stop word is a commonly used word (such as “the”, “a”, “an”,
“in”) that a search engine has been programmed to ignore, both when indexing
entries for searching and when retrieving them as the result of a search query.
We would not want these words to take up space in our database or consume
valuable processing time, so we remove them easily by keeping a list of the
words we consider stop words. NLTK (Natural Language Toolkit) in Python ships
stop word lists for 16 different languages.

EXPERIMENT NO: 4

Write a program for Stemming


Stemming is the process of reducing inflected or derived words to a common
stem (root form). Stemming programs are commonly referred to as stemming
algorithms or stemmers. For example, a stemming algorithm maps the words
“chocolates”, “chocolatey”, and “choco” to the same root, “chocolate”; note
that the stem a real stemmer produces need not be a valid dictionary word.
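A minimal sketch using NLTK's Porter stemmer (the word list is an illustrative example):

```python
# A minimal sketch of stemming with NLTK's Porter stemmer.
from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["programs", "programming", "running", "easily"]
stems = {w: ps.stem(w) for w in words}
for w, s in stems.items():
    print(w, "->", s)
```

Note that "easily" stems to "easili", illustrating that stems are not always dictionary words.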

EXPERIMENT NO: 5

Write a program for Lemmatization


Lemmatization in NLTK is the algorithmic process of finding the lemma of a
word depending on its meaning and context. Lemmatization usually refers to the
morphological analysis of words, which aims to remove inflectional endings. It
returns the base or dictionary form of a word, known as the lemma.

EXPERIMENT NO: 6

Write a program for parts-of-speech (POS) tagging


POS tag list:

CC    coordinating conjunction
CD    cardinal digit
DT    determiner
EX    existential "there" (as in "there is"; think of it as "there exists")
FW    foreign word
IN    preposition/subordinating conjunction
JJ    adjective: 'big'
JJR   adjective, comparative: 'bigger'
JJS   adjective, superlative: 'biggest'
LS    list marker: 1)
MD    modal: could, will
NN    noun, singular: 'desk'
NNS   noun, plural: 'desks'
NNP   proper noun, singular: 'Harrison'
NNPS  proper noun, plural: 'Americans'
PDT   predeterminer: 'all the kids'
POS   possessive ending: parent's
PRP   personal pronoun: I, he, she
PRP$  possessive pronoun: my, his, hers
RB    adverb: very, silently
RBR   adverb, comparative: better
RBS   adverb, superlative: best
RP    particle: give 'up'
TO    to: go 'to' the store
UH    interjection: errrrrrrrm
VB    verb, base form: take
VBD   verb, past tense: took
VBG   verb, gerund/present participle: taking
VBN   verb, past participle: taken
VBP   verb, sing. present, non-3d: take
VBZ   verb, 3rd person sing. present: takes
WDT   wh-determiner: which
WP    wh-pronoun: who, what
WP$   possessive wh-pronoun: whose
WRB   wh-adverb: where, when

EXPERIMENT NO: 7

Implement N-Gram Model.


N-grams are contiguous sequences of words, symbols, or tokens in a
document. In technical terms, they can be defined as neighbouring
sequences of items in a document. They come into play when we deal with text
data in NLP (Natural Language Processing) tasks.
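A minimal sketch of generating n-grams from a token list in plain Python, sliding a window of size n over the sequence (the sample sentence is made up):

```python
# A minimal sketch of n-gram generation in plain Python.
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I love natural language processing".split()
print(ngrams(tokens, 2))
# [('I', 'love'), ('love', 'natural'), ('natural', 'language'),
#  ('language', 'processing')]
```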

EXPERIMENT NO: 8

Generating Unigram, Bigram, Trigram in NLTK

Unigrams or 1-grams

Bigrams or 2-grams

Trigrams or 3-grams

EXPERIMENT NO: 9

Write a program for the Cocke-Younger-Kasami (CYK) Algorithm.


The CYK algorithm solves the membership problem (does a given string belong
to the language of a grammar G?) using a dynamic programming approach. The
algorithm is based on the principle that the solution to the problem for a
span [i, j] can be constructed from the solution to the subproblem [i, k] and
the solution to the subproblem [k, j]. The algorithm requires the grammar G
to be in Chomsky Normal Form (CNF); note that any context-free grammar can be
systematically converted to CNF. This restriction is employed so that each
problem can be divided into exactly two subproblems and not more, which
bounds the time complexity.
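A small self-contained sketch of the CYK membership test. The toy CNF grammar below (nonterminal names S, NP, VP, Det, N, V and the example sentence) is made up for illustration:

```python
# A minimal sketch of the CYK membership test for a CNF grammar.
from itertools import product

# CNF grammar: binary rules A -> B C and terminal rules A -> 'word'.
binary_rules = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
terminal_rules = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}

def cyk(words):
    """Return True iff the word sequence is derivable from S."""
    n = len(words)
    # table[i][length] = nonterminals deriving words[i : i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][1] = set(terminal_rules.get(w, set()))
    for length in range(2, n + 1):        # span length
        for i in range(n - length + 1):   # span start
            for k in range(1, length):    # split point: two subproblems
                for B, C in product(table[i][k], table[i + k][length - k]):
                    table[i][length] |= binary_rules.get((B, C), set())
    return "S" in table[0][n]

print(cyk("the dog chased the cat".split()))  # True
print(cyk("chased the dog".split()))          # False
```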

