Import From Import: 'Gutenberg' 'Punkt'

The document demonstrates how to analyze a text from the Gutenberg corpus using NLTK in Python. It downloads the required Gutenberg and punkt packages, selects a text from the corpus, and then counts the characters, words, and sentences in that text. It also displays the first 100 characters, the first 50 words, and the first 5 sentences.

Uploaded by

I yr IT 10-Cherisha S

8/17/23, 9:24 PM Exp4_TSA - Jupyter Notebook

1. Import NLTK and the Gutenberg corpus reader

In [20]: import nltk
         from nltk.corpus import gutenberg

2. Download the Gutenberg corpus and the Punkt tokenizer data

In [21]: nltk.download('gutenberg')
         nltk.download('punkt')

[nltk_data] Downloading package gutenberg to /root/nltk_data...
[nltk_data] Package gutenberg is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!

Out[21]: True

3. Display all text associated with Gutenberg corpus

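In [12]: gutenberg_files = gutenberg.fileids()

         # Collect the raw text of every file into one string
         # (this cell builds the string; it does not print it)
         gutenberg_text = ''
         for file_id in gutenberg_files:
             gutenberg_text += gutenberg.raw(file_id)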
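
Repeated += on a string is fine for a corpus of this size, but collecting the pieces and joining them once is the more common idiom. The sketch below assumes a hypothetical helper, concat_texts, which is not part of NLTK. It takes the reader function as a parameter so that it can be tried without the corpus; the dictionary is stand-in data invented for illustration.

```python
def concat_texts(file_ids, read_raw):
    """Return the raw text of every file id, concatenated in order.

    concat_texts is a hypothetical helper, not part of NLTK.
    """
    return ''.join(read_raw(file_id) for file_id in file_ids)


# Stand-in data so the sketch runs without downloading the corpus.
sample_corpus = {
    'a.txt': 'First file. ',
    'b.txt': 'Second file.',
}
combined = concat_texts(sample_corpus.keys(), sample_corpus.get)
print(len(combined), 'characters in total')
```

In the notebook, the equivalent call would pass gutenberg.fileids() as file_ids and gutenberg.raw as read_raw.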

4. Extract any one file from Gutenberg corpus

In [13]: selected_file_id = 'shakespeare-hamlet.txt'
         selected_text = gutenberg.raw(selected_file_id)

5. Find the number of characters in the selected corpus

In [14]: num_characters = len(selected_text)
         print("Number of characters:", num_characters)

Number of characters: 162881

6. Display the first 100 characters from the selected corpus

In [15]: first_100_characters = selected_text[:100]
         print("First 100 characters:", first_100_characters)

First 100 characters: [The Tragedie of Hamlet by William Shakespeare 1599]

Actus Primus. Scoena Prima.

Enter Barnardo a

7. Find the number of words in the selected corpus

In [16]: words = nltk.word_tokenize(selected_text)
         num_words = len(words)
         print("Number of words:", num_words)

Number of words: 36372

8. Display the first 50 words from the selected corpus

In [17]: first_50_words = ' '.join(words[:50])
         print("First 50 words:", first_50_words)

First 50 words: [ The Tragedie of Hamlet by William Shakespeare 1599 ]
Actus Primus . Scoena Prima . Enter Barnardo and Francisco two Centinels .
Barnardo . Who 's there ? Fran . Nay answer me : Stand & vnfold your selfe
Bar . Long liue the King Fran . Barnardo ?
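
The token list in step 8 shows that nltk.word_tokenize returns punctuation marks such as [ and . as separate tokens, so the figure reported in step 7 counts tokens rather than words in the everyday sense. The following is a minimal sketch of one way to separate the two, using the first ten tokens of the step 8 output as hard-coded sample data. The is_word rule (purely alphabetic tokens) is an assumption chosen for illustration; it also drops numbers such as 1599 and clitics such as 's.

```python
# First ten tokens copied from the step 8 output.
sample_tokens = ['[', 'The', 'Tragedie', 'of', 'Hamlet', 'by',
                 'William', 'Shakespeare', '1599', ']']


def is_word(token):
    """Illustrative rule: keep tokens made only of letters."""
    return token.isalpha()


alpha_tokens = [tok for tok in sample_tokens if is_word(tok)]
print("Tokens:", len(sample_tokens))
print("Alphabetic tokens:", len(alpha_tokens))
```

In the notebook, the same list comprehension could be applied to the full words list from step 7 to get a word count that excludes punctuation.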

9. Find the total number of sentences in the selected corpus

In [18]: sentences = nltk.sent_tokenize(selected_text)
         num_sentences = len(sentences)
         print("Total number of sentences:", num_sentences)

Total number of sentences: 2355
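
The counts reported in steps 5, 7, and 9 can be combined into rough averages. The sketch below hard-codes the three figures printed by the notebook; the variable names are chosen here for illustration. The results are approximate, since the token count includes punctuation and the character count includes spaces and line breaks.

```python
# Counts printed by the notebook for shakespeare-hamlet.txt.
reported_characters = 162881
reported_tokens = 36372      # from nltk.word_tokenize, punctuation included
reported_sentences = 2355

tokens_per_sentence = reported_tokens / reported_sentences
# Raw characters include whitespace, so this is only a rough figure.
chars_per_token = reported_characters / reported_tokens

print(f"Average tokens per sentence: {tokens_per_sentence:.2f}")
print(f"Average characters per token: {chars_per_token:.2f}")
```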

10. Display the first 5 sentences from the selected corpus

In [19]: first_5_sentences = '\n'.join(sentences[:5])
         print("First 5 sentences:\n", first_5_sentences)

First 5 sentences:
[The Tragedie of Hamlet by William Shakespeare 1599]

Actus Primus.
Scoena Prima.
Enter Barnardo and Francisco two Centinels.
Barnardo.
Who's there?
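
In the output above, the first sentence spans several lines because nltk.sent_tokenize keeps the line breaks that occur inside a sentence, so joining with '\n' does not make the sentence boundaries obvious. One possible way to print exactly one sentence per line is to collapse internal whitespace and number the sentences. The sample list below is typed in from the output above (the exact number of line breaks inside the first sentence is an assumption), and normalize_whitespace is a hypothetical helper.

```python
def normalize_whitespace(sentence):
    """Collapse runs of spaces and newlines into single spaces."""
    return ' '.join(sentence.split())


# Sample sentences typed in from the step 10 output.
sample_sentences = [
    "[The Tragedie of Hamlet by William Shakespeare 1599]\n\nActus Primus.",
    "Scoena Prima.",
    "Enter Barnardo and Francisco two Centinels.",
    "Barnardo.",
    "Who's there?",
]

numbered = [
    f"{index}. {normalize_whitespace(sentence)}"
    for index, sentence in enumerate(sample_sentences, start=1)
]
print('\n'.join(numbered))
```

In the notebook, sample_sentences would be replaced by sentences[:5] from step 9.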
