Welcome to Scribd!

Skip carousel

Slides - Text Mining PDF

Uploaded by

Vikas Mishra

0% found this document useful (0 votes)

15 views12 pages

Original Title

Slides+-+Text+Mining.pdf

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

15 views12 pages

Slides - Text Mining PDF

Uploaded by

Vikas Mishra

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 12

Search inside document

Text Analytics

Machine Learning

Proprietary content. ©Great Learning. All Rights Reserved.

Unauthorized use or distribution prohibited.
Agenda

• Introduction to Text Analytics

• Text Analytics and Applications

• Unstructured Vs Structured Data, Cleaning

• Bag of words, Word Frequencies

• Hierarchical Clustering

• Sentiment Analysis

• Word Embeddings

• Ensemble Methods

Proprietary content. ©Great Learning. All Rights Reserved.

Unauthorized use or distribution prohibited.
2
Text Analytics

• The process of drawing meaning out of written

communication.

• to understand online reviews, tweets, call center agent

notes, survey results, and other types of written feedback
that capture insight into your customers.

• Spam detection

• translation

• search and crawl

• sentimental analysis

• entity modeling to support fact based decision making

• text summarization
Proprietary content. ©Great Learning. All Rights Reserved.
Unauthorized use or distribution prohibited.
3
Text: Unstructured Data

• Structured: Data is organized into pre-defined structure like

a table of database - with rows and columns.

• UnStructured Data: Data does not have a pre-defined

structure. Think of a collection of emails, a bunch of satellite
images or the entire text of speeches from the british
parliament since 1803.

Proprietary content. ©Great Learning. All Rights Reserved.

Unauthorized use or distribution prohibited.
4
Modeling/representing text

• Bag of words - Documents simply represented by the words

in the document and their frequencies. Disregards grammar
and word order

• Bayesian SPAM filter

• Semantic - mapping natural language rules to get a formal

representation of the meaning of the text

• Name entity identification

Proprietary content. ©Great Learning. All Rights Reserved.

Unauthorized use or distribution prohibited.
5
Bag of words
• Corpus:

• A: John likes to play soccer

• B: John is reading a book

John likes soccer play book reading a is to

A 1 1 1 1 1
B 1 1 1 1 1

Proprietary content. ©Great Learning. All Rights Reserved.

Unauthorized use or distribution prohibited.
6
n-gram model

• The Bag-of-words model is an orderless document representation. Only

the counts of words matter.

• We could do this also by choosing consecutive pairs (2-gram) and

representing each pair

• A: John likes to play soccer

• B: John is reading a book

• 2-gram (bigram):

John likes likes to play soccer to play John is is reading reading a a book

A 1 1 1 1
B 1 1 1 1

Unauthorized use or distribution prohibited.
7
Cleaning text

• Stop words: Common words that are not useful in providing

value or context. Eg: ‘the’, ‘an’, ‘in’ etc.

• Stemming: Returning words to their original stem. Eg:

‘Chopping’, ‘Chopped’ are all replaced with ‘Chop’

• Lower case conversion

• Remove punctuations

• Strip extra white spaces

• Remove numbers

Unauthorized use or distribution prohibited.
8
Example

Unauthorized use or distribution prohibited.
9
Example

Unauthorized use or distribution prohibited.
10
Term-Document Matrix (TDM)
Doc 1 Doc 2 … Doc N
Term 1

Term 2
…

…
Term M

Document-Term Matrix (DTM)

Term 1 Term 2 … Term M
Doc 1
Doc 2
…
Doc N

Unauthorized use or distribution prohibited.
11
• Each document is represented by a vector in the term document
matrix

• This lends itself to a number of ML techniques

• For example, these vectors (documents) can be clustered to

identify similar documents

Unauthorized use or distribution prohibited.
12

Fluency, Grades 1 - 3: Activities That Support Research-Based Instruction
From Everand
Fluency, Grades 1 - 3: Activities That Support Research-Based Instruction
Starin W. Lewis
Rating: 4 out of 5 stars
4/5 (1)
FreePhonicsReadingPassagesforFluencyandComprehensionSampler 1
Document18 pages
FreePhonicsReadingPassagesforFluencyandComprehensionSampler 1
Mony Fernandez
No ratings yet
Effective Mechanics of Writing
Document23 pages
Effective Mechanics of Writing
umar altaf
No ratings yet
Speed Reading For Dummies
From Everand
Speed Reading For Dummies
Richard Sutz
Rating: 3 out of 5 stars
3/5 (7)
Skinnybones (Novel Study)
From Everand
Skinnybones (Novel Study)
Sonja Suset
No ratings yet
Machine Learning in Geoscience
Document22 pages
Machine Learning in Geoscience
Ade Prayuda
No ratings yet
Deep Learning Interview Questions - Deep Learning Questions
Document21 pages
Deep Learning Interview Questions - Deep Learning Questions
hehee
No ratings yet
English For Information Technology Level 1
Document33 pages
English For Information Technology Level 1
Sergio Rosales
No ratings yet
Referencing and Avoiding Plagiarism
Document35 pages
Referencing and Avoiding Plagiarism
What le fuck
No ratings yet
How To Avoid Plagiarism
Document4 pages
How To Avoid Plagiarism
Somrita Bera
No ratings yet
NLP Part1
Document67 pages
NLP Part1
QADEER AHMAD
No ratings yet
Upload Leaflet
Document64 pages
Upload Leaflet
Mr Locker
No ratings yet
AI in Warehousing
Document5 pages
AI in Warehousing
Aryan
No ratings yet
Presentation On AI
Document15 pages
Presentation On AI
Naqeeb Ur Rehman
0% (1)
Lesson 1-Levels of Comprehension-Expository Text Analysis
Document29 pages
Lesson 1-Levels of Comprehension-Expository Text Analysis
Tibetan Fox
No ratings yet
Training: Slide Making - Slide Methodology
Document18 pages
Training: Slide Making - Slide Methodology
susheel kumar
No ratings yet
Key Thesis Writing Skills 2
Document12 pages
Key Thesis Writing Skills 2
Ahmad
No ratings yet
Classroom Assessment Resource Package ELA
Document13 pages
Classroom Assessment Resource Package ELA
hongbong
No ratings yet
Lesson Plan GET IP Grade 5 English FAL
Document11 pages
Lesson Plan GET IP Grade 5 English FAL
Gqezeh Msaneh
No ratings yet
Projectable Journey's Book Strategy Grade 2 Unit 1
Document8 pages
Projectable Journey's Book Strategy Grade 2 Unit 1
Daannaahh
No ratings yet
Levi Velasquez Paz3
Document1 page
Levi Velasquez Paz3
Antonio Aburto Cortez
No ratings yet
1 Reading Process Reading Strategies
Document15 pages
1 Reading Process Reading Strategies
Honey Wind Oriel
No ratings yet
1.570 ATP 2023-24 GR 6 IsiZulu HL Final
Document21 pages
1.570 ATP 2023-24 GR 6 IsiZulu HL Final
lengseunice0
No ratings yet
Chapter 1. The Reading Writing Connection
Document10 pages
Chapter 1. The Reading Writing Connection
margarita gregory
No ratings yet
Text As Connected Speech
Document57 pages
Text As Connected Speech
Chello Ann Pelaez Asuncion
No ratings yet
Natural Language Processing
Document21 pages
Natural Language Processing
Feroz Ahmed
No ratings yet
9no-Egb-A1.2-Ingles-Student-Book (1) - 4
Document1 page
9no-Egb-A1.2-Ingles-Student-Book (1) - 4
July C
No ratings yet
Reading Strategies
Document12 pages
Reading Strategies
fyama honorato
No ratings yet
Interactive Notebook Pages
Document16 pages
Interactive Notebook Pages
Ibrahim Afifi
No ratings yet
Lecture 1 Introduction To NLP
Document16 pages
Lecture 1 Introduction To NLP
Ayush Gupta
No ratings yet
Unit I Natural Language Basics Foundations of Natural Language Processing
Document14 pages
Unit I Natural Language Basics Foundations of Natural Language Processing
I yr IT 10-Cherisha S
No ratings yet
Jim Henson British
Document17 pages
Jim Henson British
J3m
No ratings yet
English 6 TRP
Document279 pages
English 6 TRP
Shireen Xada
No ratings yet
InglesSec PDF
Document9 pages
InglesSec PDF
fernandar_121
No ratings yet
IT1306 - Lesson 03 - FINAL
Document99 pages
IT1306 - Lesson 03 - FINAL
vishminu95
No ratings yet
5.2 Natural Language Processing
Document43 pages
5.2 Natural Language Processing
punit mishra
No ratings yet
Skimming Scanning and Detailed Reading
Document2 pages
Skimming Scanning and Detailed Reading
Xumar Həsənova
No ratings yet
Star
Document2 pages
Star
api-546696759
No ratings yet
MN-1007 Session 1 Oct 2020
Document12 pages
MN-1007 Session 1 Oct 2020
Chris Costa
No ratings yet
Research Writing
Document9 pages
Research Writing
hamdan123456
No ratings yet
10-Steps To Writing A Research Paper
Document3 pages
10-Steps To Writing A Research Paper
Anna H
No ratings yet
Pinto, Ambar Rosales, Omar - ILIN165-2023-Assignment 2 Peer-Editing
Document11 pages
Pinto, Ambar Rosales, Omar - ILIN165-2023-Assignment 2 Peer-Editing
Ambar Anais
No ratings yet
AA Basic Writing Skills Session 1
Document41 pages
AA Basic Writing Skills Session 1
Pong
No ratings yet
Training: Communication - Principles of Effective Communication
Document11 pages
Training: Communication - Principles of Effective Communication
susheel kumar
No ratings yet
NLP Part 2
Document14 pages
NLP Part 2
Sukeshan R
No ratings yet
5 C of COmmunication Brochure
Document2 pages
5 C of COmmunication Brochure
Harry Sidhu
No ratings yet
Celeste Speaking 1
Document23 pages
Celeste Speaking 1
mauro
No ratings yet
2021 PERSONAL FINANCE - Module 1 LP 1-3
Document76 pages
2021 PERSONAL FINANCE - Module 1 LP 1-3
Ariella Salvacion
No ratings yet
ELT Methods and Practices
Document62 pages
ELT Methods and Practices
marlene
No ratings yet
Unit5 - Reading 2
Document62 pages
Unit5 - Reading 2
marlene
No ratings yet
TY QB (Articles and Interviews)
Document12 pages
TY QB (Articles and Interviews)
ciaraellenbrowne2000
No ratings yet
Learning Tips
Document3 pages
Learning Tips
Dominic Galliard
No ratings yet
How To Study Law
Document10 pages
How To Study Law
mukwevho269
No ratings yet
Technology On The Move: Writing Task: Give Advantages and Disadvantages
Document1 page
Technology On The Move: Writing Task: Give Advantages and Disadvantages
Jessica Centeno
No ratings yet
Basic Research Methods: Annie Gabriel Library
Document5 pages
Basic Research Methods: Annie Gabriel Library
Dhiraj Pathak
No ratings yet
Book and Movie Reviews
Document14 pages
Book and Movie Reviews
bharath somashetti
No ratings yet
Chapter 3 CLASS
Document17 pages
Chapter 3 CLASS
Tãjriñ Biñte Ãlãm
No ratings yet
ATP 2023-24 GR 6 English HL Final
Document23 pages
ATP 2023-24 GR 6 English HL Final
paseka1124
No ratings yet
Presentation Skills For Managers - 2
Document46 pages
Presentation Skills For Managers - 2
ttmdm
No ratings yet
PoC 1 Updated 2016
Document23 pages
PoC 1 Updated 2016
Nguyễn Tài
No ratings yet
AA Basic Writing Skills Session 1
Document41 pages
AA Basic Writing Skills Session 1
Dhugaan Galgalaaf
No ratings yet
Mark Scheme (Results) Summer 2019: Pearson Edexcel International GCSE in English Language (4EB1) Paper 01R
Document19 pages
Mark Scheme (Results) Summer 2019: Pearson Edexcel International GCSE in English Language (4EB1) Paper 01R
Nairit
100% (1)
First - Grade 11 - Peta 01 Guidelines
Document1 page
First - Grade 11 - Peta 01 Guidelines
Jean
No ratings yet
Eyes Before Ease
From Everand
Eyes Before Ease
Larry Beason
No ratings yet
Robotics: AU Course Code:ME 8099 Our New Course Code: Year/Semester: IV / VII
Document10 pages
Robotics: AU Course Code:ME 8099 Our New Course Code: Year/Semester: IV / VII
ManojKumar M
No ratings yet
Technical Seminar Report 1cg20me402
Document34 pages
Technical Seminar Report 1cg20me402
K King
No ratings yet
What Will Schools of The Future Look Like
Document4 pages
What Will Schools of The Future Look Like
Linh Nguyen
No ratings yet
The Ultimate AI Guide
Document38 pages
The Ultimate AI Guide
Rochak Bohora
No ratings yet
Machine Learning For Complex and Unmanned Systems: February 2024
Document2 pages
Machine Learning For Complex and Unmanned Systems: February 2024
Dương Trần Mỹ Linh
No ratings yet
CSE3013-Artificial Intelligence CAT1 E1 Slot Section - A
Document4 pages
CSE3013-Artificial Intelligence CAT1 E1 Slot Section - A
VIT ONLINE COURSES
No ratings yet
Paraphrasing Tool - QuillBot AI
Document1 page
Paraphrasing Tool - QuillBot AI
Khosi Ndabankuli
No ratings yet
I. Background: Impact of Ai in The Philippines' Bpos
Document5 pages
I. Background: Impact of Ai in The Philippines' Bpos
Enzo Paraiso
No ratings yet
Drafting A Debate
Document11 pages
Drafting A Debate
HACKER CHAUDHARY
No ratings yet
Use of Robotics and Automation in Construction
Document5 pages
Use of Robotics and Automation in Construction
arlene gammad
No ratings yet
Ai Support Material Class 9
Document161 pages
Ai Support Material Class 9
saumya jude
No ratings yet
Dilshad Ali: Career Objective
Document2 pages
Dilshad Ali: Career Objective
AI GHOST
No ratings yet
Autonomous Robotics in The Home
Document4 pages
Autonomous Robotics in The Home
api-271227136
No ratings yet
Write and Train Your Own Custom Machine Learning Models Using PyCaret
Document18 pages
Write and Train Your Own Custom Machine Learning Models Using PyCaret
Teto Schedule
No ratings yet
Appreciative Inquiry Theory and Critique
Document17 pages
Appreciative Inquiry Theory and Critique
shaikh javed
No ratings yet
Chapter 5 AI
Document27 pages
Chapter 5 AI
amanterefe99
No ratings yet
AI Assignment
Document6 pages
AI Assignment
Khurram Abbas
No ratings yet
WAT Topics
Document13 pages
WAT Topics
Ayusman Arya
No ratings yet
Artificial Intelligence
Document10 pages
Artificial Intelligence
AYUSH PRATAP 9th 'A'
No ratings yet
Pega Customer Service Application Datasheet PDF
Document2 pages
Pega Customer Service Application Datasheet PDF
Priyanshu Yadav
No ratings yet
Guide To Digital Identity Verification Report
Document21 pages
Guide To Digital Identity Verification Report
delbertfgvbnhdoughty45
No ratings yet
Price List
Document18 pages
Price List
Ippi
No ratings yet
Artificial Intelligence in Educ - Eric C. K. Cheng Springer 2022
Document488 pages
Artificial Intelligence in Educ - Eric C. K. Cheng Springer 2022
Dritan Laci
No ratings yet
Endless Legend Modding Tutorial
Document56 pages
Endless Legend Modding Tutorial
samuel_farrant
No ratings yet
AIN1501 - Study Unit - 2
Document4 pages
AIN1501 - Study Unit - 2
Hazel Nyamukapa
No ratings yet
Blockchain and IoT-based Cognitive Edge Framework For Sharing Economy Services in A Smart City
Document12 pages
Blockchain and IoT-based Cognitive Edge Framework For Sharing Economy Services in A Smart City
Nivi Ps
No ratings yet