
1/23/23, 6:23 PM Untitled0.ipynb - Colaboratory

KARAKA.RUPASREE 20BCI7108

import nltk
nltk.download()

NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?


Identifier> punkt
Downloading package punkt to /root/nltk_data...
Package punkt is already up-to-date!

---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> q
True
#Q1
import nltk
para="""The name of my village is Pakdiyar which falls in the Gopalganj district of Bihar.During the summer and winter vacations, I visit
My grandparents house is one of the biggest pakka houses in the village.My grandmother does a lot of social work for the villagers. There
People fetch water from these sources for daily use, irrigation, etc. They celebrate their joys together and stand united in tough times.
Every person in my village is hard working. My village does not have tall buildings and glittering lights. But it has peace, warmth and a
I love spending vacations in my village along with my parents and grandparents."""
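from nltk.tokenize import word_tokenize
# Split the paragraph into tokens (punctuation marks count as tokens too)
wt = word_tokenize(para)
print(wt)
print("\nNo.of Words in the Paragraph: ",len(wt))
from nltk.probability import FreqDist
# Count how often each token occurs
fd = FreqDist(wt)
# Named freq_pairs rather than `list`, which would shadow the built-in
freq_pairs = [(m, n) for m, n in fd.items()]
print("\n", freq_pairs)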

['The', 'name', 'of', 'my', 'village', 'is', 'Pakdiyar', 'which', 'falls', 'in', 'the', 'Gopalganj', 'district', 'of', 'Bihar.Durin

No.of Words in the Paragraph: 140

[('The', 1), ('name', 1), ('of', 4), ('my', 5), ('village', 7), ('is', 3), ('Pakdiyar', 1), ('which', 1), ('falls', 1), ('in', 7),
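The frequency table above can also be approximated with the standard library alone, since NLTK's FreqDist is a subclass of collections.Counter. The following is a minimal sketch, not the notebook's method: `simple_tokenize` is a hypothetical helper built on a rough regular expression, and the sample sentence is made up for illustration, so its counts will not match word_tokenize exactly.

```python
import re
from collections import Counter

def simple_tokenize(text):
    """Rough stand-in for word_tokenize: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

# Made-up sample text, used only to illustrate the counting step
sample = "My village is small. My village is peaceful."
tokens = simple_tokenize(sample)
counts = Counter(tokens)

print("Number of tokens:", len(tokens))
print(list(counts.items()))
```

As in the cell above, punctuation is counted as a token, so the total is a token count rather than a strict word count.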

#Q2
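from urllib import request
# Plain-text edition of "Crime and Punishment" from Project Gutenberg
url = "http://www.gutenberg.org/files/2554/2554-0.txt"
response = request.urlopen(url)
# Read the raw bytes and decode them to a string
raw = response.read().decode('utf8')
# Note: this prints the entire book in one call
print(raw)
# word_tokenize was imported in Q1
uwt = word_tokenize(raw)
print(uwt)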

https://colab.research.google.com/drive/12Xymz9HL-V5nzLd23uQwGvRKBnb73_Fs#scrollTo=tCC-7ykhqkBC&printMode=true 1/3
Gutenberg-tm concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg-tm eBooks with only a loose network of
volunteer support.

Project Gutenberg-tm eBooks are often created from several printed
editions, all of which are confirmed as not protected by copyright in
the U.S. unless a copyright notice is included. Thus, we do not
necessarily keep eBooks in compliance with any particular paper
edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org

This website includes information about Project Gutenberg-tm,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how to
subscribe to our email newsletter to hear about new eBooks.

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

#Q3
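# Frequency distribution over every token in the downloaded text
fd1 = FreqDist(uwt)
# (token, count) pairs in order of first appearance
list1 = [(m, n) for m, n in fd1.items()]
print(list1)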

[('\ufeffThe', 1), ('Project', 84), ('Gutenberg', 28), ('eBook', 11), ('of', 3849), ('Crime', 4), ('and', 6279), ('Punishment', 2),
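The first token in this output, '\ufeffThe', carries a byte order mark (BOM) left over from the start of the file. Python's 'utf-8-sig' codec removes a leading BOM while decoding, whereas 'utf8' keeps it. A small sketch on a made-up byte string follows; in the notebook the equivalent change would be `response.read().decode('utf-8-sig')`.

```python
# Bytes that begin with a UTF-8 BOM, imitating the start of the downloaded file
data = "\ufeffThe Project".encode("utf-8")

with_bom = data.decode("utf-8")         # BOM survives as '\ufeff'
without_bom = data.decode("utf-8-sig")  # BOM is stripped during decoding

print(repr(with_bom[:4]))
print(repr(without_bom[:4]))
```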

#Q4
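from matplotlib import pyplot as plt
# Plot the 30 most frequent tokens (non-cumulative counts)
fd1.plot(30, cumulative=False)
# The four most frequent tokens; punctuation marks count as tokens here
print(fd1.most_common(4))
plt.show()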

[(',', 16177), ('.', 8908), ('the', 7447), ('and', 6279)]
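The most common tokens here are punctuation marks and function words. If word frequencies are the goal, one option is to keep only alphabetic tokens and lowercase them before counting. The sketch below uses collections.Counter and a made-up token list for illustration; with the notebook's data the filter would be applied to `uwt`.

```python
from collections import Counter

# Made-up tokens standing in for the tokenized novel
tokens = ["the", ",", "Crime", "and", ".", "the", "Punishment", ",", "crime", "the"]

# Keep alphabetic tokens only and fold case so "Crime" and "crime" are counted together
words = [t.lower() for t in tokens if t.isalpha()]
word_counts = Counter(words)

print(word_counts.most_common(2))
```

Very common words such as "the" and "and" still rank high after this step; removing them would need a stop-word list, which is not shown here.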
