
Q. What information is lost when we represent a document as a bag-of-words model?

A bag-of-words is a representation of text that describes the occurrence of words within a document. It involves two things:
A vocabulary of known words
A measure of the presence of known words.

However, it suffers from some shortcomings:


Vocabulary: The vocabulary requires careful design, most specifically in order to manage its size, which impacts the sparsity of the document representations.

Sparsity: Sparse representations are harder to model, both for computational reasons (space and time complexity) and for information reasons, where the challenge is for the models to harness so little information in such a large representational space.

Meaning: Discarding word order ignores the context, and in turn the meaning, of words in the document (semantics). Context and meaning can offer a lot to the model: if modelled, they could tell the difference between the same words differently arranged ("this is interesting" vs "is this interesting"), synonyms ("old bike" vs "used bike"), and much more.
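
A minimal sketch of this order-insensitivity, assuming scikit-learn is available (any bag-of-words implementation behaves the same way):

    from sklearn.feature_extraction.text import CountVectorizer

    # Two documents with the same words in a different order.
    docs = ["this is interesting", "is this interesting"]
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)

    print(vectorizer.get_feature_names_out())  # ['interesting' 'is' 'this']
    print(X.toarray())  # [[1 1 1]
                        #  [1 1 1]] -- identical vectors: word order is lost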

Q. Why is the denominator dropped to simplify the naive Bayes classification algorithm?

The denominator does not depend on the class: it remains the same for every class being scored. Since classification only needs to find the class with the highest posterior, the denominator can be dropped and the equality replaced by proportionality without changing the ranking of the classes, which simplifies the calculation.
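
In LaTeX notation, the standard derivation for a document d and candidate class c is:

    \hat{c} = \arg\max_{c} P(c \mid d)
            = \arg\max_{c} \frac{P(d \mid c)\, P(c)}{P(d)}
            = \arg\max_{c} P(d \mid c)\, P(c)

Because P(d) is identical for every candidate class, removing it does not change which class attains the maximum.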

Q. How can we use semi-supervised methods for lexicon learning? Provide two examples.

Semi-supervised machine learning is a combination of supervised and unsupervised learning. It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both approaches while avoiding the challenge of finding a large amount of labeled data. That means you can train a model to label data without needing as much labeled training data.

However, there are situations where some of the cluster labels, outcome variables, or information about relationships within the data are known. This is where semi-supervised clustering comes in: it uses some known cluster information to classify other unlabeled data, i.e. it uses both labeled and unlabeled data, just like semi-supervised machine learning in general. For lexicon learning, two classic examples are: (1) Hatzivassiloglou and McKeown's method, which expands a small seed set of polarity-labeled adjectives using conjunction patterns over an unlabeled corpus ("fair and legitimate" suggests shared polarity, "fair but brutal" suggests opposite polarity); and (2) Turney's method, which labels candidate words and phrases by their pointwise mutual information (PMI) with positive seeds such as "excellent" versus negative seeds such as "poor". A sketch of the second idea follows.
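
A toy sketch of Turney-style seed expansion; the corpus, seed words, and sentence-level co-occurrence counting here are invented for illustration (the original method used web hit counts):

    import math
    from collections import Counter
    from itertools import combinations

    # Toy unlabeled corpus; in practice this would be very large.
    corpus = [
        "the food was excellent and delightful",
        "excellent service and a delightful view",
        "the room was poor and dirty",
        "poor lighting and a dirty carpet",
    ]
    pos_seeds, neg_seeds = {"excellent"}, {"poor"}

    n = len(corpus)
    word_counts, pair_counts = Counter(), Counter()
    for sentence in corpus:
        words = set(sentence.split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(words, 2))

    def pmi(a, b):
        # PMI over sentence-level co-occurrence, lightly smoothed.
        joint = (pair_counts[frozenset((a, b))] + 1e-6) / n
        return math.log(joint / ((word_counts[a] / n) * (word_counts[b] / n)))

    def polarity(word):
        # Turney-style score: association with positive seeds
        # minus association with negative seeds.
        return (sum(pmi(word, s) for s in pos_seeds)
                - sum(pmi(word, s) for s in neg_seeds))

    print(polarity("delightful"))  # > 0: labeled positive
    print(polarity("dirty"))       # < 0: labeled negative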

Q. If you have three classes A, B, and C in the test set, with 10, 10, and 80 examples respectively, what do you think is a better averaging measure? Will you prefer micro or macro averaging, and for what reason will you select one?

Micro- and macro-averages (for whatever metric) compute slightly different things, and thus their interpretation differs. A macro-average computes the metric independently for each class and then takes the average (hence treating all classes equally), whereas a micro-average aggregates the contributions of all classes to compute the average metric. In a multi-class classification setup, micro-averaging is preferable if you suspect there might be class imbalance (i.e. you may have many more examples of one class than of the others), which is exactly the situation here with the 10/10/80 split.

If the per-class scores vary widely (a large standard deviation across classes), the macro-average does not stem from uniform performance among the classes; in that case it may be easier to compute the weighted macro-average, which weights each class by its support and is in essence another way of computing the micro-average.
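
A small sketch comparing the averages on a synthetic 10/10/80 test set, assuming scikit-learn is available; the noisy predictions are invented for illustration:

    import numpy as np
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)

    # Test set matching the question: 10 x A, 10 x B, 80 x C.
    y_true = np.array([0] * 10 + [1] * 10 + [2] * 80)

    # A fake classifier that is perfect on the majority class C
    # but noisy on the minority classes A and B.
    y_pred = y_true.copy()
    y_pred[:20] = rng.integers(0, 3, 20)

    # Micro-average is dominated by the 80 examples of class C;
    # macro-average treats A, B, and C equally.
    print("micro:", f1_score(y_true, y_pred, average="micro"))
    print("macro:", f1_score(y_true, y_pred, average="macro"))
    print("weighted:", f1_score(y_true, y_pred, average="weighted"))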

Q. Why is the naive Bayes algorithm called naive? What are the two assumptions of multinomial naive Bayes?

Naive Bayes is a simple and powerful algorithm for predictive modelling. It is called naive because it assumes that each input variable is independent of the others given the class.

The two assumptions of multinomial naive Bayes are conditional independence (feature probabilities are independent of each other given the class) and positional independence, i.e. the bag-of-words assumption that a word's position in the document does not matter.
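
A minimal sketch of how these assumptions are used to score a document; the per-class word probabilities and priors below are made-up numbers for illustration:

    # Made-up likelihoods P(w | c) and priors P(c) for two classes.
    p_word = {
        "pos": {"great": 0.20, "movie": 0.10, "boring": 0.01},
        "neg": {"great": 0.02, "movie": 0.10, "boring": 0.15},
    }
    prior = {"pos": 0.5, "neg": 0.5}

    doc = ["great", "movie"]

    # Conditional independence lets P(d | c) factor into a product of
    # per-word likelihoods; the bag-of-words assumption means we ignore
    # where in the document each word occurred.
    scores = {c: prior[c] for c in prior}
    for c in prior:
        for w in doc:
            scores[c] *= p_word[c][w]  # Bayes numerator; P(d) is dropped

    print(scores)                       # ~0.01 for 'pos', ~0.001 for 'neg'
    print(max(scores, key=scores.get))  # 'pos'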
