Wm-Cse3024: Lab L29+30
LAB L29+30
ASSESSMENT 1
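a) AIM:
To create a Python programme to tokenize the words of the given input using the NLTK toolkit.
CODE: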
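from nltk.tokenize import word_tokenize
# sample text; the Punkt tokenizer data must be downloaded once with nltk.download()
input = ("What is Web Mining? Web Mining is the process of 'Data Mining' techniques, "
         "and extract information from Web documents and services. The main purpose of web "
         "mining is discovering useful information from the World-Wide Web and it's usage "
         "patterns")
# split the text into word and punctuation tokens
print(word_tokenize(input))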
OUTPUT:
RESULT:
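In this part we used the word_tokenize function of the NLTK toolkit to split the given
input into word tokens. The programme ran successfully and produced the output shown in
the screenshot of the code and output. It printed every token of the given input as a
list: the words together with punctuation marks such as '?' and '.', which word_tokenize
returns as separate tokens.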
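To show what word tokenization does underneath, here is a minimal sketch using only Python's standard re module. It is a simplified stand-in, not NLTK's algorithm: the regular expression and the name simple_word_tokenize are assumptions made for illustration, and unlike word_tokenize (which splits "it's" into "it" and "'s") it keeps such a contraction as one token.

```python
import re

# Simplified stand-in for word tokenization (NOT NLTK's algorithm):
# a token is a run of word characters, optionally joined by internal
# hyphens or apostrophes, or else a single non-space punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def simple_word_tokenize(text):
    """Return word and punctuation tokens in the order they appear in text."""
    return TOKEN_PATTERN.findall(text)


sample = "What is Web Mining? Web Mining is the process of 'Data Mining' techniques."
print(simple_word_tokenize(sample))
# → ['What', 'is', 'Web', 'Mining', '?', 'Web', 'Mining', 'is', 'the', 'process',
#    'of', "'", 'Data', 'Mining', "'", 'techniques', '.']
```

The second alternative, [^\w\s], is what makes punctuation come out as separate tokens, mirroring the behaviour seen in the word_tokenize output.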
b) AIM:
To create a Python programme to tokenize the given input into sentences using the NLTK toolkit.
CODE:
from nltk.tokenize import sent_tokenize
print(sent_tokenize(input))  # input is the text string defined in part a
OUTPUT:
RESULT:
In this part we used the sent_tokenize function of the NLTK toolkit to split the given
input into sentences. The programme ran successfully and gave the desired output, as
shown in the screenshot. It printed all the sentences of the given input as a list.
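Sentence tokenization can be sketched the same way: split wherever whitespace follows '.', '?' or '!'. This is a naive stand-in for the pre-trained Punkt model that sent_tokenize uses, shown only for illustration (simple_sent_tokenize is a hypothetical name). Punkt is designed to recognise abbreviations, whereas this sketch will wrongly break "Dr. Smith" into two sentences.

```python
import re

# Naive sentence splitter (a simplified stand-in, NOT NLTK's Punkt model):
# split wherever whitespace follows '.', '?' or '!'.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def simple_sent_tokenize(text):
    """Return the non-empty sentences of text, stripped of surrounding whitespace."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text) if s.strip()]


text = ("What is Web Mining? Web Mining is the process of 'Data Mining' techniques, "
        "and extract information from Web documents and services. The main purpose of web "
        "mining is discovering useful information from the World-Wide Web and it's usage "
        "patterns")
for sentence in simple_sent_tokenize(text):
    print(sentence)
```

The lookbehind (?<=[.?!]) keeps the end punctuation attached to each sentence, so only the whitespace between sentences is consumed by the split.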
c) AIM:
To create a Python programme to remove stop words and punctuation from the given input
and list the remaining words using the NLTK toolkit.
CODE:
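import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
text = ("What is Web Mining? Web Mining is the process of 'Data Mining' techniques, "
        "and extract information from Web documents and services. The main purpose of web "
        "mining is discovering useful information from the World-Wide Web and it's usage "
        "patterns")
# every ASCII punctuation character
punct = string.punctuation
no_punct = ""
# removing punctuation: keep only the characters that are not in punct
for char in text:
    if char not in punct:
        no_punct = no_punct + char
# removing stop words: tokenize the cleaned text, then drop NLTK's English stop words
# (the stopwords corpus must be downloaded once with nltk.download())
stop_words = set(stopwords.words('english'))
filtered_text = [word for word in word_tokenize(no_punct) if word.lower() not in stop_words]
print(filtered_text)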
OUTPUT:
RESULT:
In this part we first removed the punctuation from the text using a loop that appends to
the no_punct string only those characters that are not in the punct string. In the
second part we tokenized the punctuation-free string with word_tokenize and removed the
stop words, i.e. the words found in NLTK's English list stopwords.words('english').
Finally we printed the filtered words. The programme ran successfully with the help of
the NLTK toolkit.
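The two cleaning steps can also be written more compactly. The sketch below assumes a small, hypothetical stop-word set in place of stopwords.words('english') so that it runs without the NLTK data; it deletes punctuation with str.translate in a single pass instead of building no_punct one character at a time, and splits on whitespace instead of calling word_tokenize. The name filter_words is illustrative only.

```python
import string

# A tiny, hypothetical stop-word set for illustration only;
# NLTK's stopwords.words('english') list is much larger.
STOP_WORDS = {"what", "is", "the", "of", "and", "from", "it"}

# Translation table that deletes every character in string.punctuation.
REMOVE_PUNCT = str.maketrans("", "", string.punctuation)


def filter_words(text, stop_words=STOP_WORDS):
    """Delete punctuation, split on whitespace, drop stop words (case-insensitive)."""
    no_punct = text.translate(REMOVE_PUNCT)
    return [word for word in no_punct.split() if word.lower() not in stop_words]


print(filter_words("What is Web Mining? Web Mining is the process of 'Data Mining' techniques,"))
# → ['Web', 'Mining', 'Web', 'Mining', 'process', 'Data', 'Mining', 'techniques']
```

Because punctuation is deleted before splitting, hyphenated words are joined ("World-Wide" becomes "WorldWide"), the same effect the character loop has. Comparing word.lower() against a lower-case set is what lets capitalised stop words such as "What" be removed.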