
SVKM’S NMIMS (Deemed-to-be University)

MUKESH PATEL SCHOOL OF TECHNOLOGY MANAGEMENT AND ENGINEERING


NAVI MUMBAI CAMPUS

Experiment No.02

A.1 Aim:
To explore and practice the NLTK toolkit.

A.2 Prerequisite:
Python basics

A.3 Outcome:
After successful completion of this experiment, students will be able to:
1. Install the NLTK toolkit, Anaconda, and Jupyter Notebook
2. Use the NLTK toolkit

A.4 Theory:
Natural language processing (NLP) is a field that focuses on making natural human language
usable by computer programs. NLTK, the Natural Language Toolkit, is a Python package built
for working with NLP. It provides a variety of text processing libraries along with a large
collection of test datasets. NLTK supports many tasks, such as tokenizing and parse tree
visualization.

A lot of the data that you could be analyzing is unstructured data and contains human-readable
text. Before you can analyze that data programmatically, you first need to preprocess it.
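
As a minimal sketch of what such preprocessing can look like, the snippet below splits a sentence into tokens with NLTK's `wordpunct_tokenize` (a regular-expression tokenizer that needs no extra data download), lowercases the tokens, and drops punctuation. The `preprocess` helper name and the sample sentence are illustrative assumptions, not part of the experiment.

```python
from nltk.tokenize import wordpunct_tokenize


def preprocess(text):
    """Tokenize text, lowercase each token, and keep alphabetic tokens only."""
    tokens = wordpunct_tokenize(text)  # splits on whitespace and punctuation
    return [token.lower() for token in tokens if token.isalpha()]


sample = "NLTK makes text processing easier!"
print(preprocess(sample))
# prints ['nltk', 'makes', 'text', 'processing', 'easier']
```

Real projects usually add further steps (for example stop-word removal), which require downloading additional NLTK data.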

Installing NLTK:

Use pip to install NLTK on your system:

pip install nltk
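
To confirm that the installation worked, a small check along the following lines can be run from Python. The `is_installed` helper is a hypothetical name introduced here for illustration; it relies only on the standard library's `importlib.util.find_spec`, which returns `None` when a top-level package cannot be found.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located by the import system."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the packages used in this experiment.
for name in ("nltk", "notebook"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package is reported as missing, rerun the corresponding pip command in the same Python environment.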


Downloading the datasets:

This step is optional; run it if you feel that you need the datasets before starting to work
on the problem.

import nltk
nltk.download()

Jupyter Notebook App


The Jupyter Notebook App is a server-client application that allows editing and running
notebook documents via a web browser. The Jupyter Notebook App can be executed on a local
desktop requiring no internet access (as described in this document) or can be installed on a
remote server and accessed through the internet.

In addition to displaying, editing, and running notebook documents, the Jupyter Notebook App has
a “Dashboard” (Notebook Dashboard), a “control panel” that shows local files and allows you to
open notebook documents or shut down their kernels.

The easiest way for a beginner to get started with Jupyter Notebooks is by installing Anaconda.
Anaconda is the most widely used Python distribution for data science and comes pre-loaded
with all the most popular libraries and tools.

Some of the biggest Python libraries included in Anaconda are NumPy, pandas, and
Matplotlib, though the full list of 1000+ packages is extensive. Anaconda thus lets us hit the
ground running with a fully stocked data science workshop without the hassle of managing
countless installations or worrying about dependencies and OS-specific (read: Windows-specific)
installation issues.

To get Anaconda, simply:

1. Download the latest version of Anaconda for Python 3.8.
2. Install Anaconda by following the instructions on the download page and/or in the executable.

If you are a more advanced user with Python already installed and prefer to manage your
packages manually, you can just use pip:

pip3 install jupyter

Running Jupyter
On Windows, you can run Jupyter via the shortcut Anaconda adds to your start menu, which will
open a new tab in your default web browser that should look something like the following
screenshot.

The Notebook Interface


Now that you have an open notebook in front of you, its interface will hopefully not look entirely
alien. After all, Jupyter is essentially just an advanced word processor.

Try the hello world code:


print('Hello World!')
Hello World!

Natural Language Processing (NLP) Tutorial with Python & NLTK Tutorial link:
https://www.youtube.com/watch?v=X2vAabgKiuM&t=1315s

https://www.youtube.com/watch?v=FLZvOKSCkxY&list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL

PART B
(PART B : TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the
practical. The soft copy must be uploaded on the portal or, if no portal access is available,
emailed to the concerned lab in-charge faculty at the end of the practical.)

Roll. No. : A082 Name: Shubham Kumar


Class: B.Tech CSBS Batch: 2
Date of Experiment: Date of Submission:
Grade:

B.1 Task to do:


Task 1. Install the NLTK toolkit.
Task 2. Install Anaconda with the Jupyter Notebook App.
Task 3. Write a basic program (hello world) with the Jupyter Notebook App.

B.2 Observations and Learning:


(Students must write the observations and learning based on their understanding built about the subject matter
and inferences drawn)

1. Provide screenshot for the task given.



B.3 Conclusion:
(Students must write the conclusive statements as per the attainment of individual outcomes listed above and
learning/observation noted in section B.2)

B.4 Curiosity Questions:



1. Identify the uses of the NLTK toolkit.

1. Tokenization: NLTK can be used for tokenizing text, breaking it into individual words or
sentences. This process is fundamental in many NLP tasks.
2. Part-of-Speech Tagging: NLTK allows you to perform part-of-speech tagging, where
each word in a text is labeled with its corresponding part of speech (e.g., noun, verb,
adjective).
3. Named Entity Recognition (NER): NLTK provides tools for recognizing named entities
in text, such as person names, locations, and organizations.
4. Text Classification: NLTK enables you to build and train machine learning models for
text classification tasks, such as sentiment analysis, spam detection, or topic
categorization.
5. Sentiment Analysis: NLTK can be used for sentiment analysis, determining the sentiment
(positive, negative, neutral) of a piece of text.
6. Language Detection: NLTK includes functionality for language detection, identifying the
language in which a text is written.
7. Stemming and Lemmatization: NLTK provides tools for stemming and lemmatization,
reducing words to their base or root form.
8. Frequency Distribution: NLTK allows you to analyze the frequency distribution of words
in a text.
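
Three of the uses listed above (tokenization, stemming, and frequency distribution) can be sketched together as follows. This is only an illustration under stated assumptions: the sample text is made up, and the snippet deliberately uses `wordpunct_tokenize` and `PorterStemmer` because neither needs an additional NLTK data download. Tasks such as part-of-speech tagging or named entity recognition need extra resources fetched with `nltk.download()` and are not shown.

```python
from nltk.probability import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Made-up sample text for illustration.
text = "The cat jumps. The cats jumped. A cat is jumping."

# Tokenization: split into word and punctuation tokens, keep lowercase words.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Stemming: reduce each word to its stem (e.g. "jumped" -> "jump").
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

# Frequency distribution: count how often each stem occurs.
freq = FreqDist(stems)
print(freq.most_common(3))
```

Because the different forms of "cat" and "jump" collapse to the same stem, the counts reflect word families rather than exact spellings.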

2. Explain the use of Anaconda and Jupyter Notebook.

1. Anaconda: Anaconda is a popular open-source distribution of the Python and R
programming languages for data science and machine learning tasks. It comes pre-packaged
with a vast collection of essential libraries, tools, and environments that make it
easier for data scientists and developers to set up their data analysis and machine learning
workflows.
Key features and uses of Anaconda:
- Package Management: Anaconda provides a package manager called conda,
which simplifies the installation, updating, and management of various Python
libraries and packages used in data science.
- Environment Management: One of the significant advantages of Anaconda is its
ability to create and manage isolated environments. These environments allow
you to have different Python versions and package dependencies for different
projects, preventing conflicts between packages.
- Graphical Interface: Anaconda comes with a user-friendly desktop application
called Anaconda Navigator (a graphical launcher rather than a full IDE), which
allows you to manage environments, launch Jupyter Notebook, and install new
packages through a graphical interface.
- Comprehensive Libraries: Anaconda includes many popular data science and
machine learning libraries like NumPy, pandas, SciPy, scikit-learn, Matplotlib,
and more. These libraries provide powerful tools for data manipulation, analysis,
visualization, and machine learning.
- Cross-Platform Support: Anaconda is available for Windows, macOS, and Linux,
making it accessible to a wide range of users.

2. Jupyter Notebook (formerly IPython Notebook): Jupyter Notebook is an interactive
computing environment that enables users to create and share documents containing live
code, equations, visualizations, and narrative text. It supports multiple programming
languages, but it is widely used with Python for data analysis and exploration.

Key features and uses of Jupyter Notebook:
- Interactive Coding: Jupyter Notebook allows users to execute code cells
interactively, which means you can run individual code blocks and see the results
immediately. This is particularly useful for data exploration and debugging.
- Markdown Support: Jupyter Notebook supports Markdown, allowing you to add
rich-text elements like headers, bullet points, images, and hyperlinks to provide
context and explanations in your code documentation.
- Data Visualization: With Jupyter Notebook, you can create visualizations using
libraries like Matplotlib, Seaborn, and Plotly, allowing you to generate interactive
plots and graphs directly in the notebook.
- Easy Sharing and Collaboration: Notebooks can be easily shared with others,
making it a great tool for collaborating on data analysis projects or presenting
findings to stakeholders.
- Reproducibility: Jupyter Notebooks promote reproducible research because all
code, explanations, and visualizations are contained within a single document.
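
To illustrate the "single document" point, a notebook file (`.ipynb`) is stored as JSON holding a list of Markdown and code cells. The sketch below builds a minimal notebook structure with only the standard library; the `make_notebook` helper and the cell contents are hypothetical, and the field layout assumes notebook format version 4.4.

```python
import json


def make_notebook(markdown_text, code_text):
    """Build a minimal notebook dictionary with one Markdown and one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,  # the cell has not been run yet
                "outputs": [],
                "source": code_text,
            },
        ],
    }


notebook = make_notebook("# Experiment 2", "print('Hello World!')")
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Saving `notebook_json` to a file with the `.ipynb` extension should give a document that the Notebook Dashboard can open, with the explanation and the code side by side.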
