
M.Sc. IT (AI&ML) - III
P83A1NLP: Natural Language Processing
Practical List
1. Write a python program to explain various methods of the OS Module.
Code:-
import os

print(os.name)                         # name of the OS-dependent module (e.g. 'nt')
os.mkdir("D:\\New_folder")             # create a new directory
print(os.getcwd())                     # current working directory
os.chdir("D:\\")                       # change the working directory
os.rmdir("D:\\New_folder")             # remove the (empty) directory

# create and write a text file, then rename it with os.rename
fw = open("D:\\02file.txt", 'w')
fw.write("This is awesome")
fw.close()
os.rename("D:\\02file.txt", "D:\\Python1.txt")
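A few more commonly used os methods can be sketched as below; the directory name is illustrative, and `tempfile` is used so the sketch works on any drive.

```python
import os
import tempfile

# work in a temporary directory so the sketch is portable
base = tempfile.mkdtemp()
sub = os.path.join(base, "demo")       # os.path.join builds portable paths
os.mkdir(sub)

print(os.path.exists(sub))             # True: the directory was created
print(os.listdir(base))                # ['demo'] — directory contents
print(os.environ.get("PATH") is not None)  # read an environment variable

os.rmdir(sub)                          # clean up
os.rmdir(base)
```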

Output:-

Yash Amin 21084341001 1


2. Write a python program to show various ways to read and write as well
as append the data in a text file.
Code:-
with open("File12.txt", "w") as fw:
    fw.write("Hello\n")
    fw.write("World\n")
    print("Written in file")

with open("File12.txt", "a") as fa:
    fa.write("Nice\n")
    fa.write("to\n")
    fa.write("Meet\n")
    fa.write("You\n")
    print("Appended in file")

with open("File12.txt", "r") as fr:
    a = fr.read()
    print(a)
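The question asks for various ways to read; besides `read()`, a file can be read line by line. A minimal sketch, reusing the same illustrative file name:

```python
# write a small file, then read it back in three different ways
with open("File12.txt", "w") as fw:
    fw.write("Hello\nWorld\n")

with open("File12.txt") as fr:
    print(fr.readline().strip())   # readline() returns one line at a time

with open("File12.txt") as fr:
    print(fr.readlines())          # readlines() returns a list of lines

with open("File12.txt") as fr:
    for line in fr:                # iterating reads lazily, line by line
        print(line.strip())
```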

Output:-



3. Write a python program to show various ways to read and write as well
as append the data in a word file.
Code:-
import docx

doc = docx.Document()
doc.add_paragraph("This is first paragraph of a MS Word file.")
doc.add_paragraph("This is the second paragraph of a MS Word file.")
doc.add_heading("This is level 1 heading", 0)
doc.add_heading("This is level 2 heading", 1)
doc.save("D:/file1.docx")

all_paras = doc.paragraphs
for para in all_paras:
    print(para.text)
    print("-------")

Output:-



4. Write a python program to demonstrate the words and sentences
tokenizing using NLTK. Also show the concept of Bigrams, Trigrams &
Ngrams.
Code:-
import nltk
import pandas as pd
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk import bigrams, trigrams, ngrams

# 'df' is assumed to be a DataFrame with a 'text' column, e.g.:
df = pd.DataFrame({'text': ["Natural language processing makes computers understand text."]})

df['sentences'] = df['text'].apply(sent_tokenize)
df['tokenized'] = df['text'].apply(word_tokenize)
df['lower'] = df['tokenized'].apply(lambda x: [word.lower() for word in x])
stop_words = set(stopwords.words('english'))
df['stopwords_removed'] = df['lower'].apply(
    lambda x: [word for word in x if word not in stop_words])
wnl = WordNetLemmatizer()
df['lemmatized'] = df['stopwords_removed'].apply(
    lambda x: [wnl.lemmatize(word) for word in x])
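The question also asks for bigrams, trigrams, and n-grams. A minimal sketch using NLTK's helpers on an illustrative sentence, tokenized with `.split()` so no extra corpora are needed:

```python
from nltk import bigrams, trigrams, ngrams

tokens = "natural language processing is fun".split()

print(list(bigrams(tokens)))     # pairs of adjacent tokens
print(list(trigrams(tokens)))    # triples of adjacent tokens
print(list(ngrams(tokens, 4)))   # general n-grams, here n = 4
```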

Output:-



5. Write a python program to demonstrate the concept of Frequency
Distribution in text or document.
Code:-
from nltk.corpus import stopwords, webtext
from nltk import bigrams
from nltk.probability import FreqDist

text_data = webtext.words('D:\\abc.txt')
stop_words = set(stopwords.words('english'))
f_w = []
for word in text_data:
    if word not in stop_words:
        if len(word) > 3:
            f_w.append(word)

bigram = bigrams(f_w)
freq_dist = FreqDist(bigram)
print(freq_dist.most_common(10))   # ten most frequent bigrams
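FreqDist itself can be checked on a small illustrative word list, with no corpus file required:

```python
from nltk.probability import FreqDist

words = ["data", "text", "data", "word", "data", "text"]
fd = FreqDist(words)

print(fd["data"])          # count of a single word
print(fd.most_common(2))   # the two most frequent samples with counts
print(fd.N())              # total number of samples seen
```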

Output:-



6. Write a python program to implement the removal of stop words from a
document according to the English dictionary using NLTK.
Code:-
from nltk.corpus import stopwords, webtext

text_data = webtext.words('D:\\abc.txt')
stop_words = set(stopwords.words('english'))
f_w = []
for word in text_data:
    if word not in stop_words:
        f_w.append(word)
print(f_w)

Output:-



7. Write a python program to implement Part-of-Speech tagging in NLTK.
Code:-
import nltk
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer

# 'df' is assumed to be a DataFrame with a 'text' column, e.g.:
df = pd.DataFrame({'text': ["The striped bats are hanging on their feet."]})

df['tokenized'] = df['text'].apply(word_tokenize)
df['lower'] = df['tokenized'].apply(lambda x: [word.lower() for word in x])
stop_words = set(stopwords.words('english'))
df['stopwords_removed'] = df['lower'].apply(
    lambda x: [word for word in x if word not in stop_words])

nltk.download('averaged_perceptron_tagger')
df['pos_tags'] = df['stopwords_removed'].apply(nltk.tag.pos_tag)

# map Treebank tags to WordNet POS constants (default: noun)
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN

df['wordnet_pos'] = df['pos_tags'].apply(
    lambda x: [(word, get_wordnet_pos(pos_tag)) for (word, pos_tag) in x])

# lemmatize using the mapped POS tags
wnl = WordNetLemmatizer()
df['lemmatized'] = df['wordnet_pos'].apply(
    lambda x: [wnl.lemmatize(word, tag) for (word, tag) in x])
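The tag-mapping logic can be checked on its own. WordNet represents parts of speech as single characters (`wordnet.ADJ == 'a'`, `VERB == 'v'`, `NOUN == 'n'`, `ADV == 'r'`), so this sketch uses those literals directly and needs no corpus download:

```python
# WordNet POS constants are single characters: ADJ='a', VERB='v', NOUN='n', ADV='r'
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return 'a'          # adjective (Treebank JJ, JJR, JJS)
    elif tag.startswith('V'):
        return 'v'          # verb (VB, VBD, VBG, ...)
    elif tag.startswith('N'):
        return 'n'          # noun (NN, NNS, NNP, ...)
    elif tag.startswith('R'):
        return 'r'          # adverb (RB, RBR, RBS)
    return 'n'              # default to noun

print(get_wordnet_pos('VBG'))  # v
print(get_wordnet_pos('JJ'))   # a
print(get_wordnet_pos('XYZ'))  # n (fallback)
```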

Output:-



8. Write a python program to implement stemming and lemmatization with
NLTK.

Code:-

from nltk.stem import PorterStemmer, WordNetLemmatizer

ps = PorterStemmer()
df['stemming'] = df['stopwords_removed'].apply(
    lambda x: [ps.stem(word) for word in x])

wnl = WordNetLemmatizer()
df['lemmatized'] = df['stopwords_removed'].apply(
    lambda x: [wnl.lemmatize(word) for word in x])

df.head()
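The difference between stemming and lemmatization shows up clearly on a few illustrative words; PorterStemmer needs no corpus download:

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()
for w in ["running", "studies", "flies", "happily"]:
    print(w, "->", ps.stem(w))
# note: stems are not always dictionary words (e.g. 'studies' -> 'studi'),
# whereas a lemmatizer returns valid words ('studies' -> 'study')
```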
Output:-

9. Write a python program to implement the Named Entity Recognition
(NER) using NLTK.
Code:-
import spacy
from spacy import displacy

# note: this solution uses spaCy's pre-trained pipeline for NER
NER = spacy.load("en_core_web_sm")

raw_text = ("The Indian Space Research Organisation (ISRO) is the national space "
            "agency of India, headquartered in Bengaluru. It operates under the "
            "Department of Space which is directly overseen by the Prime Minister "
            "of India while the Chairman of ISRO acts as executive of DOS as well.")

text1 = NER(raw_text)
for word in text1.ents:
    print(word.text, word.label_)

Output:-

10. Write a complete NLP task for cleaning and pre-processing text using
NLTK.
Code:-
import nltk
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer

# 'df' is assumed to be a DataFrame with a 'text' column, e.g.:
df = pd.DataFrame({'text': ["The striped bats are hanging on their feet."]})

df['tokenized'] = df['text'].apply(word_tokenize)
df['lower'] = df['tokenized'].apply(lambda x: [word.lower() for word in x])
stop_words = set(stopwords.words('english'))
df['stopwords_removed'] = df['lower'].apply(
    lambda x: [word for word in x if word not in stop_words])

nltk.download('averaged_perceptron_tagger')
df['pos_tags'] = df['stopwords_removed'].apply(nltk.tag.pos_tag)

# map Treebank tags to WordNet POS constants (default: noun)
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN

df['wordnet_pos'] = df['pos_tags'].apply(
    lambda x: [(word, get_wordnet_pos(pos_tag)) for (word, pos_tag) in x])

# lemmatize using the mapped POS tags
wnl = WordNetLemmatizer()
df['lemmatized'] = df['wordnet_pos'].apply(
    lambda x: [wnl.lemmatize(word, tag) for (word, tag) in x])



df.head()
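The whole pre-processing pipeline above can also be wrapped in one reusable function. A pure-Python sketch with a tiny hand-written stop-word list (NLTK's `stopwords.words('english')` is far larger), so no corpus download is needed:

```python
import re

# illustrative mini stop-word list; replace with NLTK's for real use
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "and", "to"}

def clean_text(text):
    """Lowercase, strip punctuation, tokenize, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)   # remove punctuation and digits
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]

print(clean_text("The quick, brown Fox is in the Garden!"))
# ['quick', 'brown', 'fox', 'garden']
```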

Output:-

