Experiment No.02
A.1 Aim:
To explore and practice the NLTK toolkit
A.2 Prerequisite:
Python basics
A.3 Outcome:
After successful completion of this experiment, students will be able to:
1. Install the NLTK toolkit, Anaconda, and Jupyter Notebook
2. Use the NLTK toolkit
A.4 Theory:
Natural language processing (NLP) is a field that focuses on making natural human language
usable by computer programs. NLTK, or Natural Language Toolkit, is a Python package built
for working with NLP. It provides various text processing libraries along with many test
datasets. A variety of tasks can be performed using NLTK, such as tokenizing and parse tree
visualization.
A lot of the data that you could be analyzing is unstructured data and contains human-readable
text. Before you can analyze that data programmatically, you first need to preprocess it.
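As a rough illustration of such preprocessing, the sketch below lowercases a sentence, splits it into tokens, and drops punctuation. It assumes NLTK is already installed. The helper name preprocess and the sample sentence are made up for this example, and wordpunct_tokenize is chosen only because it works without any downloaded datasets.

```python
from nltk.tokenize import wordpunct_tokenize


def preprocess(text):
    """Lowercase the text, split it into tokens, and keep alphabetic tokens only."""
    tokens = wordpunct_tokenize(text.lower())
    return [tok for tok in tokens if tok.isalpha()]


# Sample sentence (hypothetical input for illustration).
print(preprocess("NLTK is a Python package for NLP!"))
```

Real preprocessing pipelines often add further steps, such as stop-word removal, depending on the task.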
Installing NLTK:
Downloading the NLTK datasets is optional, but you can do so if you feel that you need them
before starting to work on the problem:
import nltk
nltk.download()
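If you prefer to fetch only specific datasets, one possible approach is sketched below. The resource names and paths in RESOURCES are assumptions about what later exercises might need, and the helper names missing_resources and download_missing are hypothetical. The names required can also differ between NLTK versions.

```python
import nltk

# Assumed resources for illustration: download name -> path inside nltk_data.
RESOURCES = {
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
}


def missing_resources(resources=RESOURCES):
    """Return the names of resources not found in the local nltk_data folders."""
    missing = []
    for name, path in resources.items():
        try:
            nltk.data.find(path)  # raises LookupError if the resource is absent
        except LookupError:
            missing.append(name)
    return missing


def download_missing(resources=RESOURCES):
    """Download only the resources that are not yet available (needs internet)."""
    for name in missing_resources(resources):
        nltk.download(name)
```

Calling download_missing() once from a notebook cell should then fetch only what is absent, instead of opening the full interactive downloader.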
Jupyter Notebook can be run on a local desktop requiring no internet access (as described in
this document) or can be installed on a remote server and accessed through the internet.
The easiest way for a beginner to get started with Jupyter Notebooks is by installing Anaconda.
Anaconda is the most widely used Python distribution for data science and comes pre-loaded
with all the most popular libraries and tools.
Some of the biggest Python libraries in Anaconda include NumPy, pandas, and Matplotlib,
though the full list of 1000+ packages is much longer. Anaconda thus lets us hit the ground running
with a fully stocked data science workshop without the hassle of managing countless
installations or worrying about dependencies and OS-specific (read: Windows-specific)
installation issues.
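To confirm that these libraries are actually available in your environment, a small check like the following can be run in a notebook cell. This is only a sketch: the helper name installed is hypothetical, and nltk is added to the list because this experiment uses it.

```python
import importlib.util

# Package names to look for; adjust this list to your own needs.
PACKAGES = ["numpy", "pandas", "matplotlib", "nltk"]


def installed(packages=PACKAGES):
    """Map each package name to True if it can be imported in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


for name, ok in installed().items():
    print(f"{name}: {'available' if ok else 'missing'}")
```

The check only looks a package up and does not import it, so it stays fast even for large libraries.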
Running Jupyter
On Windows, you can run Jupyter via the shortcut Anaconda adds to your Start menu, which will
open the Jupyter dashboard in a new tab in your default web browser.
Natural Language Processing (NLP) Tutorial with Python & NLTK Tutorial link:
https://www.youtube.com/watch?v=X2vAabgKiuM&t=1315s
https://www.youtube.com/watch?v=FLZvOKSCkxY&list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL
PART B
(PART B : TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the
practical. The soft copy must be uploaded on the portal, or emailed to the concerned lab
in-charge faculty at the end of the practical if no portal access is available.)
B.3 Conclusion:
(Students must write the conclusive statements as per the attainment of individual outcomes listed above and
learning/observation noted in section B.2)
1. Tokenization: NLTK can be used for tokenizing text, breaking it into individual words or
sentences. This process is fundamental in many NLP tasks.
2. Part-of-Speech Tagging: NLTK allows you to perform part-of-speech tagging, where
each word in a text is labeled with its corresponding part of speech (e.g., noun, verb,
adjective).
3. Named Entity Recognition (NER): NLTK provides tools for recognizing named entities
in text, such as person names, locations, and organizations.
4. Text Classification: NLTK enables you to build and train machine learning models for
text classification tasks, such as sentiment analysis, spam detection, or topic
categorization.
5. Sentiment Analysis: NLTK can be used for sentiment analysis, determining the sentiment
(positive, negative, neutral) of a piece of text.
6. Language Detection: NLTK includes functionality for language detection, identifying the
language in which a text is written.
7. Stemming and Lemmatization: NLTK provides tools for stemming and lemmatization,
reducing words to their base or root form.
8. Frequency Distribution: NLTK allows you to analyze the frequency distribution of words
in a text.
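A few of the capabilities listed above (tokenization, stemming, and frequency distribution) can be tried together in a short sketch such as the one below. It assumes NLTK is installed. The sample text is made up, and wordpunct_tokenize and PorterStemmer are used here because they need no downloaded datasets; other tokenizers and stemmers in NLTK may give different results.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text for illustration.
text = "Cats run. A cat runs. The dogs kept running."

# Tokenization: split the text into word and punctuation tokens,
# then keep only the alphabetic ones.
tokens = wordpunct_tokenize(text.lower())
words = [tok for tok in tokens if tok.isalpha()]

# Stemming: reduce each word to its stem (e.g. "running" -> "run").
stemmer = PorterStemmer()
stems = [stemmer.stem(word) for word in words]

# Frequency distribution: count how often each stem occurs.
freq = FreqDist(stems)
print(freq.most_common(3))
```

Counting stems instead of raw words groups forms such as "run", "runs", and "running" under a single entry.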
Anaconda's bundled libraries include NumPy, pandas, Matplotlib, and more. These libraries
provide powerful tools for data manipulation, analysis, visualization, and machine learning.
Cross-Platform Support: Anaconda is available for Windows, macOS, and Linux,
making it accessible to a wide range of users.