Sparsity: Sparse representations are harder to model, both for computational reasons (space and time complexity) and for informational reasons: the model must extract what little signal each document carries from a very large representational space.
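As a minimal sketch of this sparsity (using an assumed toy corpus), each document's bag-of-words vector has one dimension per vocabulary word, yet only a handful of those dimensions are non-zero:

```python
from collections import Counter

# Hypothetical toy corpus; real corpora have vocabularies of tens of thousands.
docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "a very large corpus makes the vocabulary huge",
]

# Global vocabulary: the dimensionality of every bag-of-words vector.
vocab = sorted({word for doc in docs for word in doc.split()})

rows = []
for doc in docs:
    counts = Counter(doc.split())
    nonzero = len(counts)  # dimensions this document actually uses
    rows.append((nonzero, len(vocab)))
    print(f"{nonzero}/{len(vocab)} non-zero dimensions")
```

Even in this tiny example most dimensions of each vector are zero; at realistic vocabulary sizes the ratio becomes far more extreme.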
Meaning: Discarding word order ignores context, and in turn the meaning of words in the document (semantics). Context and meaning, if modelled, could help the model tell the difference between the same words arranged differently ("this is interesting" vs "is this interesting"), recognize synonyms ("old bike" vs "used bike"), and much more.
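A quick sketch of why order loss matters: under a bag-of-words representation, the statement and the question from the example above collapse to the very same vector.

```python
from collections import Counter

statement = "this is interesting"
question = "is this interesting"

# Bag-of-words keeps only word counts, so order is discarded entirely.
bow_statement = Counter(statement.split())
bow_question = Counter(question.split())

print(bow_statement == bow_question)  # prints True: identical representations
```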
The denominator does not change: it is the same for every class. Because it is constant across all the classes, it can be dropped and the posterior computed up to proportionality, which simplifies the calculation.
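A small sketch (with made-up probabilities) showing that dropping the shared denominator P(x) does not change which class wins the argmax:

```python
# Hypothetical priors and likelihoods for a two-class problem.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": 0.02, "ham": 0.005}  # P(x | class), assumed values

# Unnormalized scores: P(class) * P(x | class)
scores = {c: priors[c] * likelihoods[c] for c in priors}

# Full posteriors: divide every score by the same evidence P(x)
evidence = sum(scores.values())
posteriors = {c: scores[c] / evidence for c in scores}

best_unnormalized = max(scores, key=scores.get)
best_normalized = max(posteriors, key=posteriors.get)
print(best_unnormalized == best_normalized)  # prints True: same winner
```

Since every class score is divided by the same constant, the ranking of classes is unchanged, which is exactly why the denominator can be dropped.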
Q. How can we use semi-supervised methods for lexicon learning? Provide two examples.
However, there are situations where some of the cluster labels, outcome variables, or information about relationships within the data are known. This is where semi-supervised clustering comes in. Semi-supervised clustering uses some known cluster information to classify other unlabeled data; that is, it uses both labeled and unlabeled data, just like semi-supervised machine learning in general.
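One simple form of this idea is seeded clustering; here is a minimal sketch (toy 1-D data, hypothetical labels) that seeds cluster centroids from the few labeled points and then assigns each unlabeled point to the nearest centroid:

```python
# A few points whose cluster membership is known (the "seeds").
labeled = {"low": [1.0, 1.5], "high": [9.0, 10.0]}
unlabeled = [0.8, 2.0, 8.5, 9.6]

# Seed each cluster's centroid from its labeled members.
centroids = {name: sum(pts) / len(pts) for name, pts in labeled.items()}

def assign(x):
    # Nearest-centroid rule for an unlabeled point.
    return min(centroids, key=lambda name: abs(x - centroids[name]))

assignments = {x: assign(x) for x in unlabeled}
print(assignments)
```

A full seeded k-means would additionally re-estimate the centroids after assignment; this sketch shows only the seeding step that distinguishes the semi-supervised setting from fully unsupervised clustering.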
Q. If you have three classes A, B, and C in a test set, with 10, 10, and 80 examples respectively, what is the better averaging measure? Would you prefer micro- or macro-averaging, and for what reason?
Micro- and macro-averages (of whatever metric) compute slightly different things, and thus their interpretation differs. A macro-average computes the metric independently for each class and then takes the average (treating all classes equally), whereas a micro-average aggregates the contributions of all classes to compute the average metric. In a multi-class classification setup, the micro-average is preferable if you suspect there might be class imbalance (i.e. you may have many more examples of one class than of the others).
A large standard deviation among the per-class values tells us that the average does not stem from uniform precision across the classes. In that case it may be easier to compute the weighted macro-average (each class weighted by its support), which in essence is another way of computing the micro-average.
Q. Why is the naive Bayes algorithm called naive? What are the two assumptions for multinomial naive Bayes?
Naive Bayes is a simple and powerful algorithm for predictive modelling. It is called naive because it assumes that each input variable is independent of the others given the class. For multinomial naive Bayes, the two assumptions are the bag-of-words assumption (the position of a word in the document does not matter) and conditional independence (word likelihoods are independent of each other given the class).
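A minimal from-scratch sketch (toy training counts, add-one smoothing, hypothetical labels) shows the independence assumption in action: the document score is just a sum of per-word log-likelihoods, i.e. a product of probabilities treated as independent given the class.

```python
from math import log

# Assumed toy training data: each class is a flat list of observed words.
train = {
    "pos": "good great good fun".split(),
    "neg": "bad boring bad dull".split(),
}
vocab = {w for words in train.values() for w in words}

def score(doc_words, label):
    words = train[label]
    total = len(words)
    # Conditional independence: sum the per-word log-likelihoods.
    # Add-one (Laplace) smoothing avoids zero probabilities.
    s = 0.0
    for w in doc_words:
        s += log((words.count(w) + 1) / (total + len(vocab)))
    return s

doc = "good fun".split()
prediction = max(train, key=lambda lbl: score(doc, lbl))
print(prediction)
```

Class priors are uniform here and therefore omitted from the score; a fuller implementation would add log P(class) to each sum.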