
WM-CSE3024

LAB L29+30

ASSESSMENT 1

NAME: RITVIK BILLA


REG NO. 20BCE0306
DATE: 07-02-2022

Submitted To: Hiteshwar Kumar Azad.


a) AIM:
To create a Python program to tokenize words in a given text using the NLTK toolkit.

CODE:

from nltk.tokenize import word_tokenize

# text to be split into word tokens
text = ("What is Web Mining? Web Mining is the process of applying 'Data Mining' "
        "techniques to extract information from Web documents and services. The main "
        "purpose of web mining is discovering useful information from the World-Wide "
        "Web and its usage patterns")

print(word_tokenize(text))

OUTPUT:

RESULT:
In this program we used the word_tokenize function of the NLTK toolkit to split the given input text into word tokens. The program ran successfully and produced the output shown in the screenshot of the code and output, printing every word token in the given input.
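
Note: word_tokenize and sent_tokenize rely on pre-trained tokenizer models, and part (c) relies on the stop-word corpus; none of these are bundled with the NLTK package itself. A minimal one-time setup sketch (assuming internet access; newer NLTK versions may additionally ask for the 'punkt_tab' resource):

import nltk

nltk.download('punkt')      # models used by word_tokenize and sent_tokenize
nltk.download('stopwords')  # English stop-word list used in part (c)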

b) AIM:
To create a Python program to tokenize sentences in a given text using the NLTK toolkit.

CODE:

from nltk.tokenize import sent_tokenize

# text to be split into sentences
text = ("What is Web Mining? Web Mining is the process of applying 'Data Mining' "
        "techniques to extract information from Web documents and services. The main "
        "purpose of web mining is discovering useful information from the World-Wide "
        "Web and its usage patterns")

print(sent_tokenize(text))

OUTPUT:

RESULT:
In this program we used the sent_tokenize function of the NLTK toolkit to split the given input text into sentences. The program ran successfully and produced the desired output shown in the screenshot, printing every sentence in the given input.
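
As a small illustration of the return format (a hypothetical example, not the screenshot output): sent_tokenize returns a Python list with one string per detected sentence.

from nltk.tokenize import sent_tokenize

# splits on sentence boundaries and returns a list of strings,
# here ['Web mining is useful.', 'It has three main types.']
print(sent_tokenize("Web mining is useful. It has three main types."))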

c) AIM:
To create a Python program to remove stop words and punctuation from a given text and list the remaining words using the NLTK toolkit.

CODE:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

text = ("What is Web Mining? Web Mining is the process of applying 'Data Mining' "
        "techniques to extract information from Web documents and services. The main "
        "purpose of web mining is discovering useful information from the World-Wide "
        "Web and its usage patterns")

# punctuation characters to strip from the text
punct = "!@#$%^&*()-[]{}:;',.?/|\\`~_+="

# removing punctuation character by character
no_punct = ""
for char in text:
    if char not in punct:
        no_punct = no_punct + char

# removing all stop words from the tokenized text
stop_words = stopwords.words('english')
text_token = word_tokenize(no_punct)
filtered_text = [word for word in text_token if word not in stop_words]

print(filtered_text)

OUTPUT:

RESULT:
In this we first removed the punctuation from the text, using loop and by appending
only chars ( i.e chars which are not in punct string) in no_punct string. Then in second
part we removed stop words using stopwords function. Then tokenize the final string
we get after removing stop words and punctuation from text and printed the filtered
text. The programme was run successfully with the help of nltk toolkit.
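
One caveat worth noting: NLTK's English stop-word list is all lowercase, so capitalised tokens such as 'The' or 'What' are not removed by the comparison above unless the tokens are lowercased first. A possible refinement (a sketch, not part of the submitted program) that lowercases tokens for the comparison and uses Python's built-in string.punctuation in place of a hand-written punct string:

import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

text = "What is Web Mining? It extracts information from Web documents."
stop_words = set(stopwords.words('english'))

# strip punctuation with str.translate, then filter tokens case-insensitively
no_punct = text.translate(str.maketrans('', '', string.punctuation))
filtered_text = [w for w in word_tokenize(no_punct) if w.lower() not in stop_words]
print(filtered_text)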
