
Q. What information is lost when we represent a document as a bag-of-words model?

A bag-of-words is a representation of text that describes the occurrence of words within a document. It involves two things:
A vocabulary of known words
A measure of the presence of known words.

However, it suffers from some shortcomings:


Vocabulary: The vocabulary requires careful design, most specifically in order to manage its size, which impacts the sparsity of the document representations.

Sparsity: Sparse representations are harder to model, both for computational reasons (space and time complexity) and for information reasons, where the challenge is for the models to harness so little information in such a large representational space.

Meaning: Discarding word order ignores the context, and in turn the meaning, of words in the document (semantics). Context and meaning can offer a lot to the model: if modelled, they could tell the difference between the same words differently arranged ("this is interesting" vs "is this interesting"), synonyms ("old bike" vs "used bike"), and much more.
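
A minimal sketch of this order-insensitivity, assuming scikit-learn is available (any bag-of-words implementation behaves the same way):

    from sklearn.feature_extraction.text import CountVectorizer

    # Two documents with the same words in a different order.
    docs = ["this is interesting", "is this interesting"]
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)

    print(vectorizer.get_feature_names_out())  # ['interesting' 'is' 'this']
    print(X.toarray())  # [[1 1 1]
                        #  [1 1 1]] -- identical vectors: word order is lost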

Q. Why is the denominator dropped to simplify the naive Bayes classification algorithm?

The denominator does not depend on the class: it remains the same for every class being scored. Since classification only needs to find the class with the highest posterior, the denominator can be dropped and the equality replaced by proportionality without changing the ranking of the classes, which simplifies the calculation.
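
In LaTeX notation, the standard derivation for a document d and candidate class c is:

    \hat{c} = \arg\max_{c} P(c \mid d)
            = \arg\max_{c} \frac{P(d \mid c)\, P(c)}{P(d)}
            = \arg\max_{c} P(d \mid c)\, P(c)

Because P(d) is identical for every candidate class, removing it does not change which class attains the maximum.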

Q. How can we use semi-supervised methods for lexicon learning? Provide two examples.

Semi-supervised machine learning is a combination of supervised and unsupervised learning. It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both approaches while avoiding the challenge of finding a large amount of labeled data. That means you can train a model to label data without needing as much labeled training data.

However, there are situations where some of the cluster labels, outcome variables, or information about relationships within the data are known. This is where semi-supervised clustering comes in: it uses some known cluster information to classify other unlabeled data, i.e. it uses both labeled and unlabeled data, just like semi-supervised machine learning in general. For lexicon learning, two classic examples are: (1) Hatzivassiloglou and McKeown's method, which expands a small seed set of polarity-labeled adjectives using conjunction patterns over an unlabeled corpus ("fair and legitimate" suggests shared polarity, "fair but brutal" suggests opposite polarity); and (2) Turney's method, which labels candidate words and phrases by their pointwise mutual information (PMI) with positive seeds such as "excellent" versus negative seeds such as "poor". A sketch of the second idea follows.
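
A toy sketch of Turney-style seed expansion; the corpus, seed words, and sentence-level co-occurrence counting here are invented for illustration (the original method used web hit counts):

    import math
    from collections import Counter
    from itertools import combinations

    # Toy unlabeled corpus; in practice this would be very large.
    corpus = [
        "the food was excellent and delightful",
        "excellent service and a delightful view",
        "the room was poor and dirty",
        "poor lighting and a dirty carpet",
    ]
    pos_seeds, neg_seeds = {"excellent"}, {"poor"}

    n = len(corpus)
    word_counts, pair_counts = Counter(), Counter()
    for sentence in corpus:
        words = set(sentence.split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(words, 2))

    def pmi(a, b):
        # PMI over sentence-level co-occurrence, lightly smoothed.
        joint = (pair_counts[frozenset((a, b))] + 1e-6) / n
        return math.log(joint / ((word_counts[a] / n) * (word_counts[b] / n)))

    def polarity(word):
        # Turney-style score: association with positive seeds
        # minus association with negative seeds.
        return (sum(pmi(word, s) for s in pos_seeds)
                - sum(pmi(word, s) for s in neg_seeds))

    print(polarity("delightful"))  # > 0: labeled positive
    print(polarity("dirty"))       # < 0: labeled negative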

Q. If you have three classes A, B, and C in the test set, with 10, 10, and 80 examples respectively, what do you think is a better averaging measure? Will you prefer micro or macro averaging, and for what reason will you select one?

Micro- and macro-averages (for whatever metric) compute slightly different things, and thus their interpretation differs. A macro-average computes the metric independently for each class and then takes the average (hence treating all classes equally), whereas a micro-average aggregates the contributions of all classes to compute the average metric. In a multi-class classification setup, micro-averaging is preferable if you suspect there might be class imbalance (i.e. you may have many more examples of one class than of the others), which is exactly the situation here with the 10/10/80 split.

If the per-class scores vary widely (a large standard deviation across classes), the macro-average does not stem from uniform performance among the classes; in that case it may be easier to compute the weighted macro-average, which weights each class by its support and is in essence another way of computing the micro-average.
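
A small sketch comparing the averages on a synthetic 10/10/80 test set, assuming scikit-learn is available; the noisy predictions are invented for illustration:

    import numpy as np
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)

    # Test set matching the question: 10 x A, 10 x B, 80 x C.
    y_true = np.array([0] * 10 + [1] * 10 + [2] * 80)

    # A fake classifier that is perfect on the majority class C
    # but noisy on the minority classes A and B.
    y_pred = y_true.copy()
    y_pred[:20] = rng.integers(0, 3, 20)

    # Micro-average is dominated by the 80 examples of class C;
    # macro-average treats A, B, and C equally.
    print("micro:", f1_score(y_true, y_pred, average="micro"))
    print("macro:", f1_score(y_true, y_pred, average="macro"))
    print("weighted:", f1_score(y_true, y_pred, average="weighted"))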

Q. Why is the naive Bayes algorithm called naive? What are the two assumptions of multinomial naive Bayes?

Naive Bayes is a simple and powerful algorithm for predictive modelling. It is called naive because it assumes that each input variable is independent of the others given the class.

The two assumptions of multinomial naive Bayes are conditional independence (feature probabilities are independent of each other given the class) and positional independence, i.e. the bag-of-words assumption that a word's position in the document does not matter.
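
A minimal sketch of how these assumptions are used to score a document; the per-class word probabilities and priors below are made-up numbers for illustration:

    # Made-up likelihoods P(w | c) and priors P(c) for two classes.
    p_word = {
        "pos": {"great": 0.20, "movie": 0.10, "boring": 0.01},
        "neg": {"great": 0.02, "movie": 0.10, "boring": 0.15},
    }
    prior = {"pos": 0.5, "neg": 0.5}

    doc = ["great", "movie"]

    # Conditional independence lets P(d | c) factor into a product of
    # per-word likelihoods; the bag-of-words assumption means we ignore
    # where in the document each word occurred.
    scores = {c: prior[c] for c in prior}
    for c in prior:
        for w in doc:
            scores[c] *= p_word[c][w]  # Bayes numerator; P(d) is dropped

    print(scores)                       # ~0.01 for 'pos', ~0.001 for 'neg'
    print(max(scores, key=scores.get))  # 'pos'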
