SNLP 2016
Input
A document d
A fixed set of classes C = {c1 , c2 , . . . , cn }
Output
A predicted class c ∈ C
Spam
black-list-address OR (“dollars” AND “have been selected”)
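A hand-coded rule like the one above can be sketched as a simple predicate (the blacklist entry and messages below are made-up examples, not a real filter):

```python
# Minimal sketch of a hand-coded spam rule; the blacklist is hypothetical.
BLACKLIST = {"spam@example.com"}

def is_spam(sender: str, body: str) -> bool:
    # black-list-address OR ("dollars" AND "have been selected")
    return (sender in BLACKLIST
            or ("dollars" in body and "have been selected" in body))

print(is_spam("friend@example.com", "you have been selected to win dollars"))  # True
print(is_spam("friend@example.com", "lunch at noon?"))  # False
```

Such rules can reach high accuracy but are brittle and expensive to maintain, which motivates the supervised classifiers below.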
Naïve Bayes
Logistic regression
Support-vector machines
...
P(c|d) = P(d|c) P(c) / P(d)
P(x1, x2, ..., xn | c)
Conditional Independence
Assume the feature probabilities P(xi | cj) are independent given the class cj:
P(x1, x2, ..., xn | c) = P(x1 | c) · P(x2 | c) · ... · P(xn | c)
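Under this assumption the likelihood factorizes into a product of per-word terms, and classification picks the class maximizing P(c) Πi P(xi | c). A minimal sketch, with made-up word probabilities (logs are used to avoid underflow):

```python
import math

# Toy conditional probabilities P(word | class); all numbers are hypothetical.
p_word_given_class = {
    "spam": {"dollars": 0.5, "now": 0.3, "meeting": 0.2},
    "ham":  {"dollars": 0.1, "now": 0.3, "meeting": 0.6},
}
p_class = {"spam": 0.4, "ham": 0.6}

def score(c, words):
    # Naive Bayes: log P(c) + sum_i log P(x_i | c)
    return math.log(p_class[c]) + sum(
        math.log(p_word_given_class[c][w]) for w in words)

words = ["dollars", "now"]
best = max(p_class, key=lambda c: score(c, words))
print(best)  # spam
```

Here "spam" wins because 0.4 · 0.5 · 0.3 = 0.06 exceeds 0.6 · 0.1 · 0.3 = 0.018, even though "ham" has the higher prior.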
P̂(wi | c) = (count(wi, c) + 1) / Σw∈V (count(w, c) + 1)
          = (count(wi, c) + 1) / ((Σw∈V count(w, c)) + |V|)
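The add-one (Laplace) smoothed estimate above can be computed directly from counts; a small sketch with hypothetical counts and vocabulary:

```python
from collections import Counter

# count(w, c): toy word counts for one class c (hypothetical numbers).
count = Counter({"dollars": 2, "win": 1, "now": 1, "selected": 1})
# V: vocabulary of the whole training set, across all classes (hypothetical).
vocab = {"dollars", "win", "now", "selected", "meeting", "lunch"}

def p_hat(w):
    # (count(w, c) + 1) / (sum_w count(w, c) + |V|)
    return (count[w] + 1) / (sum(count.values()) + len(vocab))

print(round(p_hat("dollars"), 3))  # (2+1)/(5+6) = 0.273
print(round(p_hat("meeting"), 3))  # (0+1)/(5+6) = 0.091  (unseen in class c)
```

Note that a word unseen in class c still gets non-zero probability, which is the point of the smoothing: a single zero factor would otherwise wipe out the whole product.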
But if we use only the word features, and use all the words in the text, then Naïve Bayes has an important similarity to language modeling.
Each class can be thought of as a separate unigram language model.
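Concretely, scoring a document under one class's model is just the unigram product P(d|c) = Πi P(wi|c); a tiny sketch with toy probabilities for a single class model:

```python
# One class viewed as a unigram language model (toy probabilities).
class_lm = {"i": 0.1, "love": 0.1, "this": 0.05, "fun": 0.01, "film": 0.1}

p = 1.0
for w in ["i", "love", "this", "fun", "film"]:
    p *= class_lm[w]  # P(d|c) = product of unigram probabilities
print(f"{p:.1e}")  # 5.0e-07
```

The class whose language model assigns the document the highest probability (times the class prior) wins.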
Multi-value classification
A document can belong to 0, 1 or > 1 classes
For each pair of classes ⟨c1, c2⟩, how many documents from c1 were incorrectly assigned to c2? (when c2 ≠ c1)
Recall
Fraction of docs in class i classified correctly: cii / Σj cij
Precision
Fraction of docs assigned class i that are actually about class i: cii / Σj cji
Accuracy
Fraction of docs classified correctly: (Σi cii) / N
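Given a confusion matrix c where c[i][j] counts documents of true class i predicted as class j, all three measures follow directly from the formulas above; a sketch with a made-up 3-class matrix:

```python
# Confusion matrix: rows = true class, columns = predicted class (toy numbers).
c = [[5, 1, 0],
     [2, 3, 1],
     [0, 1, 3]]
N = sum(sum(row) for row in c)  # total number of documents

for i in range(len(c)):
    recall = c[i][i] / sum(c[i])                    # c_ii / sum_j c_ij (row)
    precision = c[i][i] / sum(row[i] for row in c)  # c_ii / sum_j c_ji (column)
    print(i, round(recall, 2), round(precision, 2))

accuracy = sum(c[i][i] for i in range(len(c))) / N
print(accuracy)  # 11/16 = 0.6875
```

Recall sums along a row (all true members of class i); precision sums along a column (everything the classifier put into class i).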
Macro-averaging
Compute performance for each class, then average
Micro-averaging
Collect decisions for all the classes, compute contingency table, evaluate.
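On the same kind of confusion matrix, the two averages can be compared directly (toy numbers; note that for single-label multi-class data, micro-averaged precision pools all decisions and reduces to accuracy):

```python
# Confusion matrix: rows = true class, columns = predicted class (toy numbers).
c = [[5, 1, 0],
     [2, 3, 1],
     [0, 1, 3]]
k = len(c)

# Macro: average the per-class precisions, each class weighted equally.
per_class_prec = [c[i][i] / sum(row[i] for row in c) for i in range(k)]
macro = sum(per_class_prec) / k

# Micro: pool all decisions first (total true positives / total predictions).
micro = sum(c[i][i] for i in range(k)) / sum(sum(row) for row in c)

print(round(macro, 3))  # 0.688
print(round(micro, 3))  # 0.688
```

The two agree here by coincidence; with skewed class sizes, micro-averaging is dominated by the large classes while macro-averaging treats every class equally.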
Macro-averaged precision: the mean of the per-class precision values, (1/|C|) Σi Precisioni
In a text classification task, assume that you want to categorize various documents into 3 classes: ‘politics’, ‘sports’, and ‘nature’. The table below shows 16 such documents, their true labels, and the label assigned by your classifier.
Construct the confusion matrix and find the macro-averaged and micro-averaged precision values. Which category has the highest recall?