NLP Lab Expdoc New

NATURAL LANGUAGE
PROCESSING & APPLICATIONS

21EC4082
STUDENT ID: ACADEMIC YEAR: 2023-24

STUDENT NAME:
Table of Contents
1. Session 01: Introductory Session NA

2. Session 02: Tokenization_of_text #1 #
3. Session 03: Text_2_Sequences #2 #
4. Session 04: One_Hot_Encoding #3 #
5. Session 05: Vectorization_of_texts #4 #
6. Session 06: Databases_how_to_Use #5#
7. Session 07: Parsing_nltk_toolbox #6 #
8. Session 08: TF_Testing_fail #7 #
9. Session 09: IDF_Why #8 #
10. Session 10: TFIDF_Vertorization #9 #
11. Session 11: TF_IDF_Failure_meaning #10 #
12. Session 12: Distance_Metrics #11 #
13. Session 13: Word_similarities_nltk #12 #
14. Session 14: Document_recognition_tfidf_vectors #13 (Adv/Peer) #
15. Session 15: Zipf's_Law_nlp #14 (Adv/Peer) #
16. Session 16: Simple_topic_modelling_ex #15 (Adv/Peer) #
17. Session 17: PCA_From_SCratch #16 (Adv/Peer)#
18. Session 18: Singular_Value_Decomposition_SVD_Ex #17 (Adv/Peer) #
19. Session 19: Latent_Semantic_Analysis_SVD #18 (Adv/Peer) #
20. Session 20: spam_dect_class #19 (Adv/Peer) #
21. Session 21: Sentiment_Analysis_RNN #20 (Adv/Peer) #
https://github.com/pvvkishore/NLP-A_LAB_2023 : Code for the entire lab sessions.

A.Y. 2023-24 LAB/SKILL CONTINUOUS EVALUATION
S.N Date Experiment Name Pre- In-Lab (25M) Post- Viva Total Faculty
o Lab Program/ Data and Analysis & Lab Voce (50M) Signature
(10M) Procedure Results Inference (10M) (5M)
(5M) (10M) (10M)
1. Introductory Session -NA-
Tokenization_of_text #1
2.
Text_2_Sequences #2
3.
One_Hot_Encoding #3
4.
Vectorization_of_texts #4
5.
Databases_how_to_Use #5
6.
Parsing_nltk_toolbox #6
7.
TF_Testing_fail #7
8.
IDF_Why #8
9.
TFIDF_Vertorization #9
10.
TF_IDF_Failure_meaning #10
11.
Distance_Metrics #11
12
13. Word_similarities_nltk #12
14. Document_recognition_tfidf_vectors
S.N Date Experiment Name Pre- In-Lab (25M) Post- Viva Total Faculty
o Lab Program/ Data and Analysis & Lab Voce (50M) Signature
(10M) Procedure Results Inference (10M) (5M)
(5M) (10M) (10M)
#13 (Adv/Peer)
Zipf's_Law_nlp #14 (Adv/Peer)
15.
Simple_topic_modelling_ex #15
16.
(Adv/Peer)
PCA_From_SCratch #16 (Adv/Peer)
17.
Singular_Value_Decomposition_SVD_Ex
18.
#17 (Adv/Peer)
Latent_Semantic_Analysis_SVD #18
19.
(Adv/Peer)
spam_dect_class #19 (Adv/Peer)
20.
Sentiment_Analysis_RNN #20
21
(Adv/Peer)
Experiment # <TO BE FILLED BY STUDENT> Student ID <TO BE FILLED BY STUDENT>
Date <TO BE FILLED BY STUDENT> Student Name <TO BE FILLED BY STUDENT>
Experiment Title: Tokenization_of_text
Aim/Objective:
The aim is to compare and evaluate different tokenization techniques or libraries, such as NLTK,
SpaCy, and TensorFlow, to determine their effectiveness in handling various types of text data.
Description:
Tokenization is the 1st step in any NLP model. The experiment may aim to explore how tokenization
using NLTK, spaCy, and TensorFlow can be integrated into a broader NLP pipeline or used as a
preprocessing step for tasks such as sentiment analysis, machine translation, named entity
recognition, or text summarization. The focus is on understanding the impact of tokenization choices
on downstream model performance. The experiment may aim to analyze the performance
characteristics of tokenization using NLTK and TensorFlow.
Pre-Requisites:
Install Python 3.6 and above using.
1. https://pip.pypa.io/en/stable/installation/
2. https://packaging.python.org/en/latest/tutorials/installing-packages/
3. https://pypi.org/project/nltk/
4.
https://www.tensorflow.org/install/pip
5.
https://spacy.io/usage
6. https://pypi.org/project/gensim/
Pre-Lab:
This Section must contain at least 5 Descriptive type questions or Self-Assessment Questions which
help the student to understand the Program/Experiment that must be performed in the Laboratory
Session.
1. What is tokenization in the context of NLP?
2. How can you tokenize a sentence into individual words using NLTK?
3. What is the purpose of tokenizing text in NLP?
4. Name a few tokenization techniques other than word tokenization.
Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

APPLICATIONS
Course Code(s) 21EC4082, 21EC4082A, 21EC4082P Page 1 of 106
5. How can you tokenize a text document into sentences using NLTK?

APPLICATIONS
In-Lab:
1. Apply tokenization methods in the NLTK library on a 5-line text data available in NLTK.
2. Apply tokenization methods in the TF library on a 5-line text data available in NLTK.
3. Draw comparisons based on text handling capabilities.
 Procedure/Program:
This Section is meant for the student to Write the program/Procedure for the Experiment

APPLICATIONS
(Leave at least 2-3 Pages to record the Procedure/Program)

APPLICATIONS
 Data and Results:
This Section is meant for the students to collect, record the results generated during the
Program/Experiment execution. Include instructions on how to present the results, such as creating
tables, graphs, or visualizations.
(Leave at least 1 Page to record the results)

APPLICATIONS
 Analysis and Inferences:
This Section is meant for the students to analyse their data, perform calculations Include
questions or prompts to encourage critical thinking and interpretation of the data
(Leave at least 1 Page for each Program)
Sample VIVA-VOCE Questions (In-Lab):
1. What is tokenization?
2. According to your exp which tokenizer API is the best?
3. How NLTK and TensorFlow handle tokenization for different languages.
4. List the Metrics used to Evaluate Tokenization Techniques.
5. Can you tokenize multiple text documents simultaneously using TensorFlow.
APPLICATIONS

APPLICATIONS
Post-Lab:
1. Try tokenization in the spaCy library and compare with the NLTK and Tensorflow.
2. Try tokenization on big corpus dataset given below.
https://www.kaggle.com/datasets/thoughtvector/customer-support-on-twitter
This Section is meant for the student to Write the program/Procedure for Experiment

APPLICATIONS

APPLICATIONS
(Leave at least 2-3 Pages for each Procedure/ Program/ Solution)
(Leave at least 1 Page for recording the results)
questions or prompts to encourage critical thinking and interpretation of the data.
(Leave at least 1 Page for recording the analysis and inferences)
Evaluator Remark (if Any):
Marks Secured: _____out of 50
Signature of the Evaluator with Date
Evaluator MUST ask Viva-voce prior to signing and posting marks for each experiment.

APPLICATIONS
Experiment Title: Text_2_Sequences
Aim/Objective:
The aim is to evaluate different techniques or libraries, such as NLTK, SpaCy, and TensorFlow, to
determine their effectiveness in converting text to a sequence of numbers.
Description:
The objective of converting text to a sequence of numbers is a fundamental step in natural language
processing (NLP) tasks. The primary goal of this conversion is to represent textual data in a numerical
format that machine learning models can process effectively. To convert text to a numerical format
that enables the application of machine learning and NLP techniques.
Pre-Requisites:
4.
5.
Pre-Lab:
Session.
1. Why convert text to numbers?
2. How effective is the method used by you?
3. Are all sentences in the text considered to have the same length? If No, What did you do.
4. In NLTK, which function is used to assign numeric IDs to tokens?

APPLICATIONS
5. What is the difference between word tokenization and sentence tokenization?
In-Lab:
1. Apply tokenization and convert a sequence of sentences in the NLTK library to a sequence of
numbers.
2. Convert a 10-sentence dataset with multiple-length sentences into a number array of equal
size for ML model training.

APPLICATIONS

APPLICATIONS

APPLICATIONS
1. What does NLTK's FreqDist class provide?

APPLICATIONS
2. According to your exp which API is the best?

3. Do you think your sequence conversion is suitable for GPT.
4. List the Metrics used to Evaluate sequence conversion Techniques.
5. Can you convert using spaCy.

APPLICATIONS
Post-Lab:
1. Try normalization of converted numbers from text data.

2. Try text to sequences on big corpus dataset given below.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
Experiment Title: One Hot Encoding of Text
Aim/Objective:
The aim is to convert the text into numbers and eventually code those converted numbers into
encodings for downstream NLP tasks using NLTK, SpaCy, and TensorFlow.
Description:
One hot encoding of text data is a process of transforming categorical data, such as words or
symbols, into numerical data that can be used by machine learning models. It involves creating a
binary vector for each categorical value, where only one element is 1 and the rest are 0. The length
of the vector is equal to the number of unique categories in the data. One hot encoding allows the
representation of categorical data as multidimensional binary vectors that can be fed to models that
require numerical input.
Pre-Requisites:
4.
5.
Pre-Lab:
Session.
1. Why convert text to encoded ones and zeros?
3. Are all sentences in the text considered to have the same length? If No, what did you do.
4. The function get_dummiesis role in one hot encoding?

APPLICATIONS
5. Which according to you is a good NLP practice: OHE words or sentences?
In-Lab:
1. Apply One Hot Encodings and convert a sequence of sentences in the NLTK library to a
sequence of numbers and then OHE.
2. Convert a 10-sentence dataset with multiple-length sentences into a OHE array of equal size
for ML model training.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
1. What is one hot encoding and why is it used?

2. What are the advantages and disadvantages of one hot encoding?
3. How can you implement one hot encoding in Python using pandas or scikit-learn?
4. What are some alternatives to one hot encoding for categorical data?
5. How does one hot encoding affect the dimensionality and sparsity of the data?

APPLICATIONS
Post-Lab:
1. Try using OHE data for training a simple neural network model.
2. Try text to OHE on big corpus dataset given below and train a ANN model.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
Experiment Title: Vectorization_of_texts
Aim/Objective:
The aim is to convert text into vectors by computing term frequencies and create a corpus.
Description:
The objective of converting text to a sequence of numbers using TF vectorizer function. The primary
goal of this conversion is to represent textual data in a numerical format that machine learning
models can process effectively. To convert text to a numerical format that enables the application of
machine learning and NLP techniques.
Pre-Requisites:
4.
5.
Pre-Lab:
Session.
1. Why TF is better than OHE?
3. What is the mathematical formulation to compute TF.
4. Is TF a good representation for text transformation?
5. What difference did you find between OHE and TF?

APPLICATIONS
In-Lab:
1. Apply tokenization and convert a sequence of sentences in the NLTK library to a sequence of
numbers. Use those sequences and calculate term frequencies for representing text data on
a small corpus.
2. Convert a 10-sentence dataset with multiple-length sentences into TF representations and
compare them with OHE.

APPLICATIONS

APPLICATIONS

APPLICATIONS
1. What does TF stand for?

APPLICATIONS
2. According to your exp which text encoding is the best?

3. Do you think your sequence conversion is suitable for GPT.
5. Can you convert using spaCy.

APPLICATIONS
Post-Lab:
1. Try an ANN model on the transformed text using TF.

2. Try TF conversion big corpus dataset given below and apply ANN training,

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
Experiment Title: Text Datasets_how_to_use
Aim/Objective:
The aim is to use the online resources of text data to test NLP applications.
Description:
A text corpus is a large and structured collection of texts, typically stored in a digital format, that
serves as a linguistic resource for language analysis and research. It consists of a diverse range of
written or spoken texts from various sources and domains, such as books, articles, newspapers,
websites, social media, conversations, and more.
Pre-Requisites:
4.
5.
Pre-Lab:
1. How can I create a text corpus from a collection of documents using Python?
2. What Python libraries can I use to tokenize and preprocess text data for corpus creation?
3. How can I handle different file formats (e.g., PDF, Word documents) when building a text
corpus in Python?
4. What are the steps involved in cleaning and preprocessing text data for corpus creation?
5. How can I remove stopwords and punctuation from text documents when creating a corpus
in Python?

APPLICATIONS
In-Lab:
1. From NLTK library, download and apply wordnet package of built-in corpus. Extract the
requirements of a text dataset and tokenize the text.
2. From spaCy, use en_core_web_sm (English Small) corpus and tokenize this text.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
1. What are the different types of text datasets available in NLTK?

2. Can you give an example of a text dataset available in NLTK?
3. How can you access and explore the content of a text dataset in NLTK?
4. Can you explain the concept of text datasets in spaCy?
5. Do you know spaCy can handle multi-language text datasets? If Yes, name two.
Post-Lab:
1. Try to encode the wordnet text into TF vectors and OHE. Measure the corpus size occupied
by them in memory.
2. Try to find some text datasets available online and load into your current program.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
Experiment Title: Parsing_nltk_toolbox
Aim/Objective:
The aim is to analyze the grammatical structure of sentences in natural language text data using
NLTK and spaCy.
Description:
To perform parsing in NLTK, you typically start by defining a grammar using CFG, then apply a
parsing algorithm to parse a sentence and obtain a parse tree or dependency tree representation.
NLTK provides functions and methods to assist in these tasks, such as nltk.CFG for defining CFG,
nltk.ChartParser for chart parsing, and nltk.DependencyParser for dependency parsing.By utilizing
NLTK's parsing capabilities, you can analyze sentence structure, extract syntactic information, and
facilitate further natural language understanding and processing tasks.
Pre-Requisites:
4.
5.
Pre-Lab:
1. What is parsing in natural language processing (NLP), and what is its goal?
2. Explain the concept of Context-Free Grammars (CFG) and their role in parsing.
3. What are the two main parsing strategies supported by NLTK?

APPLICATIONS
4. How does recursive descent parsing work in NLTK?
5. How can you define a CFG in NLTK?

In-Lab:
1. Analyze the grammatical structure of sentences and extract syntactic information on small
text corpus to identify the performance of the libraries used in NLTK.
2. Show that the parser used in spaCy and NLTK libraries have the capability to extract
semantic information.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
1. What does NLTK's parser called?

2. According to your exp which parsing model is the best?
3. Do you think parsing is necessary in other languages too.
4. List the Metrics used to Evaluate parsing Techniques.
5. How data is stored after parsing.

APPLICATIONS
Post-Lab:
1. Try parsing using context free grammar on wordnet text data in NLTK.
2. Try parsing on big corpus dataset given below.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
Experiment Title: TF_Testing_fail
Aim/Objective:
The aim is to evaluate term frequency (TF) on large text corpuses note its breaking point.
Description:
The formula for calculating term frequency (TF) is:
TF = (Number of occurrences of the term in the document) / (Total number of terms in the
document)
The TF value reflects the relative importance or prevalence of a term within a specific document. It
helps to identify which terms are more frequently used and potentially carry more significance or
relevance in the context of that document. However, term frequency alone does not consider the
significance of the term in the overall corpus.
Pre-Requisites:
4.
5.
Pre-Lab:
1. Why TF when OHE is simple?
2. According to you why TF fails, hypothesize?
3. How can you calculate the term frequency of a specific term in a document using NLTK?
4. How can you count the occurrences of a specific term in a list of tokens using NLTK?

APPLICATIONS
5. How do you normalize the term frequency to account for the document length in NLTK?
In-Lab:
1. Compute TF vectors on large corpus in NLTK library and identify the reason why it cannot
capture the sematic information in the text data.
2. Investigate deeply the above process on a text corpus of your choice to arrive at the solution
in a faster way.

APPLICATIONS

APPLICATIONS

APPLICATIONS
1. Why TF fails to capture the semantics?

APPLICATIONS
2. According to your exp is there an alternative to the TF?

3. Does TF reduce dimensionality or increase it.
4. List the Metrics used to Evaluate the performance of TF representations.
5. Can you convert the above using spaCy.

APPLICATIONS
Post-Lab:
1. Try to compare dimensionality of TF and OHE. Which is the best show through program.
2. Try to explain how TF fails on big corpus dataset given below.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
Experiment Title: IDF_Why
Aim/Objective:
The aim is to evaluate the importance of IDF as an alternative to TF, which is projected as an
information retrieval to quantify the importance or rarity of a term in a collection of documents.
Description:
The IDF of a term is calculated as the logarithm of the ratio between the total number of documents
in the collection and the number of documents that contain the term. The formula for IDF is as
follows:
IDF = log(N / DF), N: Total number of documents in the collection, DF: Number of documents that
contain the term. The IDF value increases as the term becomes less frequent in the document
collection. It helps to identify terms that are relatively rare and potentially carry more important or
distinctive information. Terms with higher IDF scores are considered to have more discriminative
power.
Pre-Requisites:
4.
5.
Pre-Lab:
1. Why IDF is more important than TF?
2. How effective is IDF is discriminating text?
3. How can you calculate IDF for a specific term using Python and a given collection of
documents?
4. In NLTK, which function is used to perform IDF calculations?

APPLICATIONS
5. Can you handle the presence of stop words during IDF calculations in NLTK?
In-Lab:
1. Compute IDF on small text data and show how IDF is better than TF in the context of text
discrimination in documents of a corpus.
2. Use wordnet dataset in NLTK and show how IDF beats TF as a text discriminator.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
1. Which is a better TF or IDF?

2. According to your exp what is the intuition behind IDF and its significance in information
retrieval and text mining.?
3. Do you think IDF calculated for a term in a collection of documents is suitable for GPT.
5. Can you handle terms with zero IDF.

APPLICATIONS
Post-Lab:
1. Try to compute IDF and TF together using spaCY.

2. Try IDF on big corpus dataset given below.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
Experiment Title: TFIDF_Vertorization
Aim/Objective:
The aim is to transform a collection of documents into a numerical representation with TF-IDF
vectors.
Description:
The TF-IDF value is computed by multiplying the term frequency (TF) of a term in a document by the
inverse document frequency (IDF) of the term. Each document is represented as a vector, where
each dimension corresponds to a unique term in the collection. The TF-IDF value for each term in the
document becomes the value of the corresponding dimension in the vector.
Pre-Requisites:
4.
5.
Pre-Lab:
Session.
1. Why TF-IDF works for text vectorization?
3. Why log is used IDF calculations.
4. Which function in NLTK is used to extract TF-IDF vectors?

APPLICATIONS
5. Can spaCy extract TF-IDF vectors?
In-Lab:
1. Apply TF-IDF vectorization model in NLTK on a small set of text data and show the
representation is better than TF and IDF.
2. Convert the NLTK wordnet corpus into a TF-IDF data representaiton.

APPLICATIONS

APPLICATIONS

APPLICATIONS
1. Can you calculate TF-IDF values for terms in a collection of documents?

APPLICATIONS
2. According to you explain the concept of term weighting and its role in TF-IDF calculations.
3. Which Python libraries or modules can be used to perform TF-IDF calculations.
4. List the Metrics used to Evaluate TF-IDF vectors.
5. Can you handle stop words or common words when computing TF-IDF using Python.

APPLICATIONS
Post-Lab:
1. Try spaCy library to convert text to TF-IDF.

2. Try TF-IDF conversion on big corpus dataset given below.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS
Experiment Title: TF_IDF_Failure_meaning
Aim/Objective:
The aim is to evaluate the reason for failure of TF-IDF vectors and estimate the meaning of failure
to represent the text data in terms of semantics, context extraction and corpus size.
Description:
TF-IDF does not capture the semantic meaning of words or the context in which they are used. It
treats each term independently, without considering their relationships within the document or
across the collection. This can lead to issues when dealing with tasks that require a deeper
understanding of language, such as sentiment analysis or question-answering. TF-IDF is influenced
by document length, as longer documents generally have higher term frequencies. This bias can
result in longer documents dominating the similarity or importance measures, overshadowing
shorter and potentially relevant documents. TF-IDF treats documents as bags of words, disregarding
the order and context in which the words appear. This can be problematic in tasks like text
generation or language translation, where word order and context play a crucial role.
Pre-Requisites:
4.
5.
Pre-Lab:
1. Why TF-IDF fails?
2. How effective is TF-IDF vectorization on long documents?
3. Does they preserve context between the words.
4. In NLTK, which function is used to assign TF-IDF vectors to text?

APPLICATIONS
5. What is the length of TF-IDF vector?

In-Lab:
1. Apply TF-IDF and evaluate the vectors to check their failure related to context and semantic
representation.
2. Show the reason for failure of TF-IDF on large datasets such as wordnet in nltk.

APPLICATIONS

APPLICATIONS

APPLICATIONS
1. What does NLTK's TF-IDF vectorizer do.

APPLICATIONS
2. According to your exp which API is the best for TF_IDF?

3. Do you think your TF-IDF is suitable for GPT.
4. List the Metrics used to Evaluate TF-IDF conversion Techniques.
5. Can you convert large text to TF-IDF using spaCy.

APPLICATIONS
Post-Lab:
1. Try normalization of converted TF-IDF vectors from text data and does they still fail.
2. Try TF-IDF on big corpus dataset given below and use ANN for classification.

APPLICATIONS

APPLICATIONS

APPLICATIONS

APPLICATIONS

NLP Lab Expdoc New

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

NLP Lab Expdoc New

Uploaded by

Copyright:

Available Formats

NATURAL LANGUAGE

PROCESSING & APPLICATIONS

STUDENT ID: ACADEMIC YEAR: 2023-24

1. Session 01: Introductory Session NA

https://github.com/pvvkishore/NLP-A_LAB_2023 : Code for the entire lab sessions.

13. Word_similarities_nltk #12

Experiment Title: Tokenization_of_text

Install Python 3.6 and above using.

1. What is tokenization in the context of NLP?

3. What is the purpose of tokenizing text in NLP?

4. Name a few tokenization techniques other than word tokenization.

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

(Leave at least 2-3 Pages to record the Procedure/Program)

 Data and Results:

(Leave at least 1 Page to record the results)

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

 Analysis and Inferences:

(Leave at least 1 Page for each Program)

Sample VIVA-VOCE Questions (In-Lab):

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

(Leave at least 2-3 Pages for each Procedure/ Program/ Solution)

 Data and Results:

(Leave at least 1 Page for recording the results)

 Analysis and Inferences:

(Leave at least 1 Page for recording the analysis and inferences)

Evaluator Remark (if Any):

Marks Secured: _____out of 50

Signature of the Evaluator with Date

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

Experiment Title: Text_2_Sequences

Install Python 3.6 and above using.

1. Why convert text to numbers?

2. How effective is the method used by you?

4. In NLTK, which function is used to assign numeric IDs to tokens?

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

5. What is the difference between word tokenization and sentence tokenization?

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

(Leave at least 2-3 Pages to record the Procedure/Program)

 Data and Results:

(Leave at least 1 Page to record the results)

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

 Analysis and Inferences:

(Leave at least 1 Page for each Program)

Sample VIVA-VOCE Questions (In-Lab):

1. What does NLTK's FreqDist class provide?

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

2. According to your exp which API is the best?

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

1. Try normalization of converted numbers from text data.

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

(Leave at least 2-3 Pages for each Procedure/ Program/ Solution)

 Data and Results:

(Leave at least 1 Page for recording the results)

Course Title NATURAL LANGUAGE PROCESSING & ACADEMIC YEAR: 2023-24

 Analysis and Inferences:

(Leave at least 1 Page for recording the analysis and inferences)

Evaluator Remark (if Any):