Text Mining Problem Statement

Uploaded by

mamta thakur

100% found this document useful (1 vote)

219 views3 pages

Copyright

Available Formats

DOCX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

100% found this document useful (1 vote)

219 views3 pages

Text Mining Problem Statement

Uploaded by

mamta thakur

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 3

Search inside document

Topic: Text Mining (NLP)

Instructions:
Please share your answers filled in-line in the word document. Submit code
separately wherever applicable.

Please ensure you update all the details:

Name: _____________ Batch ID: ___________
Topic: Text Mining and NLP

Grading Guidelines:
1. An assignment submission is considered complete only when correct and executable code(s) are
submitted along with the documentation explaining the method and results. Failing to submit either of
those will be considered an invalid submission and will not be considered for evaluation.
2. Assignments submitted after the deadline will affect your grades.

Grading Criteria:
Submission Date Submission Date
Correct On time A 100
80% & above On time B 85 Correct Late
50% & above On time C 75 80% & above Late
50% & below On time D 65 50% & above Late
E 55 50% & below
Copied/No Submission F 45

● Grade A: (>= 90): When all assignments are submitted on or before the given deadline.
● Grade B: (>= 80 and < 90):
o When assignments are submitted on time but less than 80% of problems are completed.
(OR)
o All assignments are submitted after the deadline.

● Grade C: (>= 70 and < 80):

o When assignments are submitted on time but less than 50% of the problems are
completed.
(OR)
o Less than 80% of problems in the assignments are submitted after the deadline.

● Grade D: (>= 60 and < 70):

o Assignments submitted after the deadline and with 50% or less problems.

● Grade E: (>= 50 and < 60):

o Less than 30% of problems in the assignments are submitted after the deadline.
(OR)
o Less than 30% of problems in the assignments are submitted before the deadline.

● Grade F: (< 50): No submission (or) malpractice.

Hints:
1. Business Problem
1.1. What is the business objective?
1.2. Are there any constraints?
2. Work on each feature of the dataset to create a data dictionary as displayed in
the below image:

2.1 Make a table as shown above and provide information about the features such as its
data type and its relevance to the model building. And if not relevant, provide reasons and a
description of the feature.

3. Data Pre-processing
3.1 Data Cleaning, Feature Engineering, etc.
3.2 Outlier Treatment
4. Exploratory Data Analysis (EDA):
4.1. Summary
4.2. Univariate analysis
4.3. Bivariate analysis

5. Model Building
5.1 Extract text data from websites such as Amazon, Snapdeal, IMDB, Twitter, etc.
5.2 Clean the data and build a word cloud for both positive and negative
words. Perform Sentiment Analysis as well.
5.3 Briefly explain the model output in the documentation.

6. Write about the benefits/impact of the solution - in what way does the
business (client) benefit from the solution provided?

Problem Statement: -

In the era of widespread internet use, it is necessary for businesses to understand what
the consumers think of their products. If they can understand what the consumers like or
dislike about their products, they can improve them and thereby increase their profits by
keeping their customers happy. For this reason, they analyze the reviews of their products
on websites such as Amazon or Snapdeal by using text mining and sentiment analysis
techniques.

Task 1:
1. Extract reviews of any product from e-commerce website Amazon.
2. Perform sentiment analysis on this extracted data and build a unigram and bigram
word cloud.

Task 2:
1. Extract reviews for any movie from IMDB and perform sentiment analysis.

Task 3:

1. Choose any other website on the internet and do some research on how to
extract text and perform sentiment analysis

NaiveBayes - Problem Statement
Document5 pages
NaiveBayes - Problem Statement
SANKALP ROY
0% (1)
Multinomial Problem Statement
Document28 pages
Multinomial Problem Statement
Moayad
No ratings yet
Missing Values
Document3 pages
Missing Values
vinutha
No ratings yet
DataPreparation Outlier Treatment
Document3 pages
DataPreparation Outlier Treatment
onkar diwate
100% (1)
CRISP DM Business Understanding Completed
Document18 pages
CRISP DM Business Understanding Completed
Jaya Ujaya
No ratings yet
KNN - Problem Statement ANSWER
Document8 pages
KNN - Problem Statement ANSWER
Naga Shiva
100% (1)
Topic: Dimension Reduction With PCA: Instructions
Document8 pages
Topic: Dimension Reduction With PCA: Instructions
Alesya alesya
No ratings yet
Association Rules Ans
Document28 pages
Association Rules Ans
Priya kamble
No ratings yet
CRISP DM Business Understanding - Data Science
Document15 pages
CRISP DM Business Understanding - Data Science
Ganesh V Gaonkar
No ratings yet
Association Rules Problem Statement
Document5 pages
Association Rules Problem Statement
vinutha
50% (2)
CRISP DM Business Understanding
Document17 pages
CRISP DM Business Understanding
keerthi
No ratings yet
CRISP DM Business Understanding
Document18 pages
CRISP DM Business Understanding
Rahul Sathiyanarayanan
No ratings yet
Missing Values
Document6 pages
Missing Values
R N Guru
No ratings yet
Transformations Problem Statement
Document7 pages
Transformations Problem Statement
R N Guru
No ratings yet
Zero-Variance Feature Removal
Document3 pages
Zero-Variance Feature Removal
vinutha
0% (1)
Zero Variance-Problem Statement
Document4 pages
Zero Variance-Problem Statement
R N Guru
50% (2)
Hierarchical Clustering: Instructions
Document4 pages
Hierarchical Clustering: Instructions
Praveen Urs R
67% (3)
Problem Statements:: Inferential Statistics
Document5 pages
Problem Statements:: Inferential Statistics
vinutha
No ratings yet
Discretization Problem Statement
Document3 pages
Discretization Problem Statement
vinutha
No ratings yet
Day13 K Means Clustering
Document4 pages
Day13 K Means Clustering
Priya kamble
No ratings yet
Clustering analysis reveals optimum number of clusters for airlines and crime data
Document9 pages
Clustering analysis reveals optimum number of clusters for airlines and crime data
nehal gundrapally
100% (1)
R - Assignment
Document2 pages
R - Assignment
Ravalika Peenamala
No ratings yet
Basic Statistics (Module - 3)
Document7 pages
Basic Statistics (Module - 3)
nehal gundrapally
No ratings yet
Day10 Mathematical Foundations
Document4 pages
Day10 Mathematical Foundations
kishna
No ratings yet
Duplication - Typecasting-Problem Statement
Document3 pages
Duplication - Typecasting-Problem Statement
vinutha
100% (1)
DataPreparation - Outlier - Treatment ASSIGNMENT 1
Document7 pages
DataPreparation - Outlier - Treatment ASSIGNMENT 1
Hari Machavrapu
100% (1)
11 Network Analytics - Problem Statement
Document4 pages
11 Network Analytics - Problem Statement
abhishek kandharkar
25% (4)
CRISP - ML (Q) - Business Understanding Assignment
Document11 pages
CRISP - ML (Q) - Business Understanding Assignment
Anakha Prasad
No ratings yet
8.dummy Variables
Document4 pages
8.dummy Variables
Sunkari Arika
No ratings yet
06.discretization Problem Statement
Document2 pages
06.discretization Problem Statement
gujan thakur
50% (2)
Statistics and Probability
Document8 pages
Statistics and Probability
Alesya alesya
No ratings yet
15 KNN - Problem Statement
Document3 pages
15 KNN - Problem Statement
abhishek kandharkar
0% (1)
13.exploratory Data Analysis
Document8 pages
13.exploratory Data Analysis
Karthik Peddineni
0% (1)
CRISP - ML (Q) - Business Understanding
Document13 pages
CRISP - ML (Q) - Business Understanding
Tejas S
No ratings yet
Module 03 Assignment
Document13 pages
Module 03 Assignment
suresh avadutha
100% (1)
Data Assigment 1
Document32 pages
Data Assigment 1
Sukhwinder Kaur
100% (1)
Name:Silpa Batch Id: Analysis: WDEO 171220 Topic: Principal Component
Document7 pages
Name:Silpa Batch Id: Analysis: WDEO 171220 Topic: Principal Component
Sravani Adapa
100% (1)
CRISP ML (Q) Business Understanding
Document17 pages
CRISP ML (Q) Business Understanding
Nikesh bobade
No ratings yet
Name: Siti Mursyida Abdul Karim (Data Science Program) Topic: Assignment - EDA
Document13 pages
Name: Siti Mursyida Abdul Karim (Data Science Program) Topic: Assignment - EDA
siti mursyida
100% (1)
Basic Statistics (Module - 3)
Document9 pages
Basic Statistics (Module - 3)
prudhvi nadimpalli
No ratings yet
Assignments Module038566
Document6 pages
Assignments Module038566
karthik karti
No ratings yet
Business Uderstanding Solved1 - Module 1
Document11 pages
Business Uderstanding Solved1 - Module 1
pooja mukunde
No ratings yet
Inferential Statistics
Document10 pages
Inferential Statistics
Sapana Sonawane
No ratings yet
Discretization Problem Statement
Document4 pages
Discretization Problem Statement
R N Guru
No ratings yet
Business Moments Graphic Assignmebt
Document11 pages
Business Moments Graphic Assignmebt
Anakha Prasad
No ratings yet
Assignment 2
Document7 pages
Assignment 2
suresh avadutha
No ratings yet
Module-Preliminaries For Data Analysis - Data Science
Document5 pages
Module-Preliminaries For Data Analysis - Data Science
Ganesh V Gaonkar
100% (1)
Python Data Types and Sets Frequency
Document3 pages
Python Data Types and Sets Frequency
Tejas S
100% (1)
Inferential Statistics (AutoRecovered)
Document12 pages
Inferential Statistics (AutoRecovered)
Shariq Ansari
No ratings yet
Preliminaries For Data Analysis: Problem Statements
Document4 pages
Preliminaries For Data Analysis: Problem Statements
Anakha Prasad
No ratings yet
Main Key
Document4 pages
Main Key
kishna
No ratings yet
Day02-Data Understanding Answer Asit 25082022
Document4 pages
Day02-Data Understanding Answer Asit 25082022
love
No ratings yet
CRISP ML (Q) Business Understanding
Document12 pages
CRISP ML (Q) Business Understanding
Tharani vimala
No ratings yet
Basic Statistics (Module - 4 ( - 1) ) : © 2013 - 2020 360digitmg. All Rights Reserved
Document3 pages
Basic Statistics (Module - 4 ( - 1) ) : © 2013 - 2020 360digitmg. All Rights Reserved
Proli Pravalika
No ratings yet
Problem Statement - Mathematical Foundations
Document6 pages
Problem Statement - Mathematical Foundations
Samyak Varshney
No ratings yet
Topics: Descriptive Statistics and Probability: Name of Company Measure X
Document4 pages
Topics: Descriptive Statistics and Probability: Name of Company Measure X
Simarjeet Singh
No ratings yet
K-Nearest Neighbors: Instructions
Document4 pages
K-Nearest Neighbors: Instructions
swapnil karade
No ratings yet
Network Analytics - Problem Statement
Document4 pages
Network Analytics - Problem Statement
swapnil karade
No ratings yet
Transformations Problem Statement
Document3 pages
Transformations Problem Statement
vinutha
No ratings yet
DataPreparation Outlier Treatment
Document5 pages
DataPreparation Outlier Treatment
R N Guru
No ratings yet
Negative Words For NLP
Document81 pages
Negative Words For NLP
hfujsx
No ratings yet
One Plus Seven
Document48 pages
One Plus Seven
mamta thakur
No ratings yet
Positive Words
Document34 pages
Positive Words
Sirius_Vishal
No ratings yet
Stop Words
Document10 pages
Stop Words
Tomas Guerra Cámara
No ratings yet
GYROSCOPE Manual
Document8 pages
GYROSCOPE Manual
Aman Bansal
No ratings yet
Chapter 5
Document42 pages
Chapter 5
Hong Anh
No ratings yet
Budha Dal Aarti Aarta FULL
Document1 page
Budha Dal Aarti Aarta FULL
Vishal Singh
100% (1)
Compare and Contrast History Essay
Document9 pages
Compare and Contrast History Essay
Giselle Posada
No ratings yet
Nanotechnology in Textiles
Document4 pages
Nanotechnology in Textiles
kevin cagud Phillip
No ratings yet
Kindergarten Quarter 4 Standards For Lesson Plans
Document2 pages
Kindergarten Quarter 4 Standards For Lesson Plans
LydiaDietsch
No ratings yet
Managing An Angry Patient
Document15 pages
Managing An Angry Patient
masa.dalati.1
No ratings yet
POMR Satiti Acute Cholangitis
Document30 pages
POMR Satiti Acute Cholangitis
Ika Ayu
No ratings yet
Philippine School Action Plan for Scouting Program
Document1 page
Philippine School Action Plan for Scouting Program
Laira Joy Salvador - Viernes
No ratings yet
DEF CON 30 - Hadrien Barral - Emoji Shellcoding ?, ?, and ? - Presentation
Document141 pages
DEF CON 30 - Hadrien Barral - Emoji Shellcoding ?, ?, and ? - Presentation
kumar sanjay
No ratings yet
Attendance: Umut Kurtoğlu
Document2 pages
Attendance: Umut Kurtoğlu
Havva
No ratings yet
Cucs 016 13 PDF
Document16 pages
Cucs 016 13 PDF
Anonymous SlyvspdB
No ratings yet
Lucky Textile
Document5 pages
Lucky Textile
Saim Bin Rashid
No ratings yet
PreciControl CMV IgG Avidity - Ms - 05942322190.V4.En
Document2 pages
PreciControl CMV IgG Avidity - Ms - 05942322190.V4.En
ARIF AHAMMED P
No ratings yet
Effect of Grain Boundary Thermal Expansion on Silicon Nitride Fracture Toughness
Document8 pages
Effect of Grain Boundary Thermal Expansion on Silicon Nitride Fracture Toughness
brijesh kinkhab
No ratings yet
LLM Thesis On Human Rights
Document7 pages
LLM Thesis On Human Rights
wssotgvcf
100% (2)
TR 101 - Issue 2
Document101 pages
TR 101 - Issue 2
ergismilo
No ratings yet
Physics Universe Models
Document14 pages
Physics Universe Models
Tracy zorca
50% (2)
Physics
Document3 pages
Physics
Mohit Tiwari
No ratings yet
Ngāti Kere: What 87 Can Achieve Image 500
Document17 pages
Ngāti Kere: What 87 Can Achieve Image 500
Angela Houkamau
No ratings yet
The Biology of Vascular Epiphytes Zotz 2016 PDF
Document292 pages
The Biology of Vascular Epiphytes Zotz 2016 PDF
Evaldo Pape
100% (1)
Rhythm MP - The Music Page - Theory Made Easy For Little Children Level 1
Document9 pages
Rhythm MP - The Music Page - Theory Made Easy For Little Children Level 1
Amilacic
No ratings yet
Feminist Criticism in Frankenstein, Equus and The Turn of The Screw
Document4 pages
Feminist Criticism in Frankenstein, Equus and The Turn of The Screw
Lucia Toledo
No ratings yet
Parenteral Fluid Therapy: Types of Intravenous Solution
Document18 pages
Parenteral Fluid Therapy: Types of Intravenous Solution
Kathleen Joy Costales Magtanong
100% (1)
Creative 2nd Quarter
Document6 pages
Creative 2nd Quarter
Janice Cordova
No ratings yet
Hotel Training Report
Document14 pages
Hotel Training Report
Butchick Concepcion Malasa
100% (1)
CCNA Security Instructor Lab Manual v1 - p8
Document1 page
CCNA Security Instructor Lab Manual v1 - p8
MeMe Amro
No ratings yet
PAPD Cable glands for hazardous areas
Document2 pages
PAPD Cable glands for hazardous areas
Gulf Trans Power
No ratings yet
IGO Operations Manual PDF
Document2,251 pages
IGO Operations Manual PDF
Mardi Wirengkoso
100% (1)
Digital Media TYBMM (Advertising & Journalism) Semester VI
Document5 pages
Digital Media TYBMM (Advertising & Journalism) Semester VI
Kartavya Jain
No ratings yet