
COIMBATORE INSTITUTE OF TECHNOLOGY, COIMBATORE

DEPARTMENT OF COMPUTING
(ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING)
IV YEAR M.Sc. (AIML) – VIII Semester
CONTINUOUS ASSESSMENT TEST (CAT) – 1
19MAMEL03 – NATURAL LANGUAGE PROCESSING LAB

Time: 10.00 am – 1.00 pm Maximum Marks: 50

Date: 08.03.2023

COURSE OUTCOMES
CO1: Practice text and word representations in NLP
CO2: Use text segmentation, text summarization and categorization to process text data
CO3: Implement and evaluate different NLP applications using machine learning and deep learning methods
SET A

QNo Question CO Marks RBT

1. (a) Using regular expressions (RE), extract the usernames from the email addresses present in the document below. [CO1, CO2 | 4 marks | AP]

Document = "The new registrations are aiml@gmail.com ,
languageprocessing@gmail.com. If you find any
disruptions, kindly contact nlp@gmail.com or
help@gmail.com "
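
One possible approach to 1(a) is sketched below. The pattern is an assumption about what counts as an email address here (a simplified shape, not a full address grammar), and the variable names are illustrative only.

```python
import re

document = ("The new registrations are aiml@gmail.com , "
            "languageprocessing@gmail.com. If you find any "
            "disruptions, kindly contact nlp@gmail.com or "
            "help@gmail.com ")

# Simplified email shape (an assumption, not a full RFC grammar):
# the capture group holds the username, i.e. the part before "@".
EMAIL_PATTERN = re.compile(
    r"([A-Za-z0-9._%+-]+)@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)

# With a single capture group, findall returns only the captured usernames.
usernames = EMAIL_PATTERN.findall(document)
print(usernames)
```

The domain part requires at least one character after each dot, so the full stop that ends the sentence after the second address is not swallowed into the match.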

(b) Remove all punctuation from the given document. [4 marks]

Document = "The match has concluded !!! India has won
the match . Will we find the finals too ? !"
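
A minimal sketch for 1(b), assuming "punctuation" means the ASCII characters in Python's `string.punctuation`. Collapsing the leftover whitespace is an extra step the question does not ask for.

```python
import string

document = ("The match has concluded !!! India has won "
            "the match . Will we find the finals too ? !")

# A translation table with a deletion set removes every ASCII punctuation mark.
no_punct = document.translate(str.maketrans("", "", string.punctuation))

# Optional tidy-up: collapse the spaces left where punctuation used to be.
cleaned = " ".join(no_punct.split())
print(cleaned)
```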

(c) Detect all the laptop names present in the given document. [4 marks]

Document = "For my official use, I prefer Dell. For gaming
purposes, I love asus, for entertainment purpose I go for
Lenovo "
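
One way to approach 1(c) is a case-insensitive lookup against a list of known brands. The list `LAPTOP_BRANDS` below is a hypothetical, hand-written gazetteer and not part of the question. A trained named-entity recogniser would be an alternative, but it is not shown here.

```python
import re

document = ("For my official use, I prefer Dell. For gaming "
            "purposes, I love asus, for entertainment purpose I go for "
            "Lenovo ")

# Hypothetical brand list (an assumption for illustration only).
LAPTOP_BRANDS = ["Dell", "Asus", "Lenovo", "HP", "Acer", "Apple"]

# Word boundaries stop a brand from matching inside a longer word;
# IGNORECASE lets the lowercase "asus" in the document match "Asus".
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, LAPTOP_BRANDS)) + r")\b",
    re.IGNORECASE,
)

found = pattern.findall(document)
print(found)
```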

2. For the given dataset, build a text classifier model using Naive Bayes that can classify hotel reviews. [CO3 | AP]

a) Load the dataset [2 marks]
b) Use text normalization approaches on the dataset [4 marks]
c) Extract the reviews [4 marks]
d) Fit the classifier and display the confusion matrix [4 marks]
e) Train the model and predict the result [4 marks]
(Dataset link: Given)
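
The steps a) to e) can be sketched in plain Python as follows. The provided dataset is not reproduced in this paper, so `TRAIN` and `TEST` are made-up stand-ins, and the `NaiveBayes` class is a hand-rolled multinomial Naive Bayes with add-one smoothing written for illustration. In the lab, a library implementation such as scikit-learn's `MultinomialNB` would be a common choice instead.

```python
import math
import re
from collections import Counter, defaultdict

# a) "Load" the data: hypothetical stand-in for the provided dataset.
TRAIN = [
    ("The room was clean and the staff were friendly", "positive"),
    ("Great location and excellent breakfast", "positive"),
    ("Lovely stay, comfortable bed and helpful staff", "positive"),
    ("Dirty room and rude staff", "negative"),
    ("Terrible service, the bathroom was dirty", "negative"),
    ("Noisy, overpriced and the breakfast was awful", "negative"),
]
TEST = [
    ("Friendly staff and a clean comfortable room", "positive"),
    ("Rude service and an awful dirty bathroom", "negative"),
]

def normalize(text):
    """b) Normalise: lowercase, drop non-letters, split into tokens."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in samples:  # c) extract review text and label
            tokens = normalize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in normalize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

# d) Fit the classifier and build a confusion matrix keyed by (actual, predicted).
model = NaiveBayes().fit(TRAIN)
confusion = Counter((actual, model.predict(text)) for text, actual in TEST)
for (actual, predicted), n in sorted(confusion.items()):
    print(f"actual={actual:<8} predicted={predicted:<8} count={n}")

# e) Predict the label of a new, unseen review.
new_review = "The staff were helpful and the room was clean"
prediction = model.predict(new_review)
print(prediction)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a word that is missing from one class from zeroing out that class entirely.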

Marks Split-up: Lab Exercises - 10 / Viva - 10 / CAT - 30


SET B

QNo Question CO Marks RBT

1. (a) Consider the given document: [CO1, CO2 | 4 marks | AP]

"the outbreak of coronavirus disease 2019 (COVID-19) has
created a global health crisis that has had a deep impact on
the way we perceive our world and our everyday lives. Not
only the rate of contagion and patterns of transmission
threatens our sense of agency, but the safety measures put
in place to contain the spread of the virus also require social
distancing by refraining from doing what is inherently
human, which is to find solace in the company of others.
Within this context of physical threat, social and physical
distancing, as well as public alarm, what has been (and can
be) the role of the different mass media channels in our
lives on individual, social and societal levels? Mass media
have long been recognized as powerful forces shaping how
we experience the world and ourselves. This recognition is
accompanied by a growing volume of research, that closely
follows the footsteps of technological transformations (e.g.
radio, movies, television, the internet, mobiles) and the
zeitgeist (e.g. cold war, 9/11, climate change) in an attempt
to map mass media major impacts on how we perceive
ourselves, both as individuals and citizens. Are media
(broadcast and digital) still able to convey a sense of unity
reaching large audiences, or are messages lost in the noisy
crowd of mass self-communication? "

Remove all the stop words from the document.
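
A minimal sketch of stop-word removal, shown on the first sentence of the passage only. The `STOP_WORDS` set is a small hand-picked list assumed for illustration. A fuller list, such as the English stop words shipped with NLTK's `stopwords` corpus, would normally be used in its place.

```python
import re

# Excerpt: first sentence of the passage above.
text = ("the outbreak of coronavirus disease 2019 (COVID-19) has "
        "created a global health crisis that has had a deep impact on "
        "the way we perceive our world and our everyday lives.")

# Hand-picked stop words (an assumption; not an official list).
STOP_WORDS = {"the", "of", "has", "a", "that", "had", "on", "we", "our",
              "and", "is", "in", "to", "by", "from", "what", "which"}

# Tokenise on runs of letters, digits and hyphens so "COVID-19" stays whole.
tokens = re.findall(r"[A-Za-z0-9-]+", text)

# Compare in lowercase so capitalised stop words are removed as well.
filtered = [t for t in tokens if t.lower() not in STOP_WORDS]
print(" ".join(filtered))
```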
(b) Plot the word frequency for the top 10 words of the above document using a dispersion plot. [4 marks]
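
The sketch below computes the two ingredients of such a plot on the last sentence of the passage: the ten most frequent tokens, and the positions (offsets) at which each occurs, which is what a lexical dispersion plot draws. The actual drawing is left out, since it needs a plotting library. NLTK's `Text.dispersion_plot` is one option, assuming NLTK and matplotlib are installed.

```python
import re
from collections import Counter

# Excerpt: last sentence of the passage above.
text = ("Are media (broadcast and digital) still able to convey a sense "
        "of unity reaching large audiences, or are messages lost in the "
        "noisy crowd of mass self-communication?")

# Lowercase tokens; hyphenated words are kept as single tokens.
tokens = re.findall(r"[a-z0-9-]+", text.lower())

# Ten most frequent tokens with their counts.
top10 = Counter(tokens).most_common(10)

# Offsets of each top word: the x-positions a dispersion plot would mark.
offsets = {
    word: [i for i, tok in enumerate(tokens) if tok == word]
    for word, _ in top10
}

for word, count in top10:
    print(f"{word:<12} {count}  at positions {offsets[word]}")
```

In practice, stop words would usually be removed first so that the top 10 are content words rather than "of" or "are".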
(c) Generate bigrams and trigrams for the above document. [4 marks]
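
An n-gram is a window of n consecutive tokens. The helper `ngrams` below is an illustrative name written for this sketch, applied to a short excerpt of the passage. NLTK offers a function of the same name in `nltk.util`, which could be used instead.

```python
import re

# Excerpt from the passage above.
text = "Mass media have long been recognized as powerful forces"
tokens = re.findall(r"[a-z0-9-]+", text.lower())

def ngrams(seq, n):
    """Return every run of n consecutive items in seq as a tuple."""
    # zip over n shifted views of the sequence; it stops at the shortest one.
    return list(zip(*(seq[i:] for i in range(n))))

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams[:3])
print(trigrams[:3])
```

A sequence of k tokens yields k - n + 1 n-grams, so the nine tokens here give eight bigrams and seven trigrams.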
2. For the given dataset, build a text classifier using Naive Bayes that can classify spam mails. [CO3 | AP]

a) Load the dataset [2 marks]
b) Use text normalization approaches on the dataset [4 marks]
c) Extract the keywords [4 marks]
d) Build the model and classify it [4 marks]
e) Predict the result for a new mail [4 marks]
(Dataset link: Given)
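
A compact sketch of steps a) to e) follows. The provided dataset is not included in this paper, so `MAILS` is a made-up toy sample. The functions `normalize`, `train`, `keywords` and `classify` are illustrative names. "Keywords" is interpreted here, as an assumption, as the words whose smoothed likelihood is most skewed towards one class.

```python
import math
import re
from collections import Counter

# a) Hypothetical toy sample standing in for the provided dataset.
MAILS = [
    ("Win a free prize now, click here", "spam"),
    ("Free cash offer, claim your prize today", "spam"),
    ("Limited offer: win cash now", "spam"),
    ("Meeting moved to Monday morning", "ham"),
    ("Please review the project report today", "ham"),
    ("Lunch on Monday? Let me know", "ham"),
]

def normalize(text):
    """b) Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def train(mails):
    """d) Count documents and words per class (multinomial Naive Bayes)."""
    docs, words = Counter(), {}
    for text, label in mails:
        docs[label] += 1
        words.setdefault(label, Counter()).update(normalize(text))
    vocab = set().union(*words.values())
    return docs, words, vocab

def log_likelihood(token, label, words, vocab):
    """Add-one smoothed log P(token | label)."""
    total = sum(words[label].values())
    return math.log((words[label][token] + 1) / (total + len(vocab)))

def keywords(label, other, words, vocab, k=3):
    """c) Words most indicative of `label` relative to `other`."""
    def skew(tok):
        return (log_likelihood(tok, label, words, vocab)
                - log_likelihood(tok, other, words, vocab))
    # Sort by descending skew; ties are broken alphabetically.
    return sorted(vocab, key=lambda tok: (-skew(tok), tok))[:k]

def classify(text, docs, words, vocab):
    """e) Return the class with the highest posterior log score."""
    total_docs = sum(docs.values())
    def score(label):
        s = math.log(docs[label] / total_docs)
        for tok in normalize(text):
            if tok in vocab:  # skip words unseen during training
                s += log_likelihood(tok, label, words, vocab)
        return s
    return max(docs, key=score)

docs, words, vocab = train(MAILS)
spam_keywords = keywords("spam", "ham", words, vocab)
new_mail_label = classify("Claim your free cash prize now", docs, words, vocab)
print(spam_keywords, new_mail_label)
```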

Marks Split-up: Lab Exercises - 10 / Viva - 10 / CAT - 30
