
Text Classification

SNLP 2016

CSE, IIT Kharagpur

November 7th, 2016



Example: Positive or negative movie review?



Example: Male or Female Author?



Example: What is the subject of this article?



Text Classification

Assigning subject categories, topics, or genres


Spam detection
Authorship identification
Age/gender identification
Language identification
Sentiment analysis
...





Text classification: problem definition

Input
A document d
A fixed set of classes C = \{c_1, c_2, \ldots, c_n\}

Output
A predicted class c ∈ C





Classification Methods: Hand-coded rules

Rules based on combinations of words or other features

Spam
black-list-address OR (“dollars” AND “have been selected”)

Pros and Cons

Accuracy can be high if the rules are carefully refined by an expert, but
building and maintaining these rules is expensive.

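The rule above translates directly into code. A minimal sketch, where `blacklist` is a hypothetical stand-in for a real list of known spam senders:

```python
# Hand-coded spam rule from the slide:
# black-list-address OR ("dollars" AND "have been selected").
# `blacklist` is a placeholder for a real list of known spam senders.
blacklist = {"winner@lottery.example", "promo@deals.example"}

def is_spam(sender: str, body: str) -> bool:
    text = body.lower()
    return sender in blacklist or (
        "dollars" in text and "have been selected" in text)

print(is_spam("promo@deals.example", "Hello"))                          # True
print(is_spam("a@b.example", "You have been selected to win dollars"))  # True
print(is_spam("a@b.example", "Meeting at noon"))                        # False
```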


Classification Methods: Supervised Machine Learning

Naïve Bayes
Logistic regression
Support-vector machines
...



Naïve Bayes Intuition

A simple classification method based on Bayes’ rule.

Relies on a very simple representation of the document: a bag of words.



Bag of words for document classification

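The original slide's figure is not recoverable. As a sketch of the idea: a bag of words keeps only word counts and discards order entirely (the example document is invented):

```python
from collections import Counter

# Bag of words: the document reduces to word counts; word order is discarded.
doc = "great movie great acting but a predictable plot"
bag = Counter(doc.split())
print(bag)  # Counter({'great': 2, 'movie': 1, 'acting': 1, ...})
```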




Bayes’ rule for documents and classes

For a document d and a class c:

P(c|d) = \frac{P(d|c)\,P(c)}{P(d)}

Naïve Bayes Classifier

c_{MAP} = \arg\max_{c \in C} P(c|d)
        = \arg\max_{c \in C} P(d|c)\,P(c)
        = \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n | c)\,P(c)

The denominator P(d) is dropped because it is identical for every class, and
the document d is represented by a set of features x_1, x_2, \ldots, x_n.




Naïve Bayes classification assumptions

P(x_1, x_2, \ldots, x_n | c) cannot be estimated directly (there are too many
possible feature combinations), so two simplifying assumptions are made.

Bag of words assumption
Assume that the position of a word in the document doesn't matter.

Conditional independence
Assume the feature probabilities P(x_i | c_j) are independent given the class c_j:

P(x_1, x_2, \ldots, x_n | c) = P(x_1|c) \cdot P(x_2|c) \cdots P(x_n|c)

c_{NB} = \arg\max_{c \in C} P(c) \prod_{x \in X} P(x|c)

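On the slide the decision rule is a product of probabilities; in practice it is computed as a sum of log probabilities, because multiplying many small numbers underflows. A minimal sketch, assuming `log_prior[c]` and `log_likelihood[c][w]` have already been estimated (estimation is the topic of the next slides):

```python
import math

def classify(words, classes, log_prior, log_likelihood):
    """Return argmax_c [ log P(c) + sum_i log P(x_i | c) ].

    Summing logs replaces the product on the slide to avoid underflow.
    A word never seen with a class gets probability 0 (log = -inf) here;
    add-1 smoothing, covered later, removes these zeros.
    """
    best_class, best_score = None, -math.inf
    for c in classes:
        score = log_prior[c] + sum(
            log_likelihood[c].get(w, -math.inf) for w in words)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```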



Learning the model parameters

Maximum Likelihood Estimate

\hat{P}(c_j) = \frac{\mathrm{doccount}(C = c_j)}{N_{doc}}

\hat{P}(w_i | c_j) = \frac{\mathrm{count}(w_i, c_j)}{\sum_{w \in V} \mathrm{count}(w, c_j)}

Problem with MLE

Suppose the training data contains no document in the class ‘positive’ with
the word “fantastic”. Then

\hat{P}(\mathrm{fantastic} \mid \mathrm{positive}) = 0,

and that single zero wipes out all other evidence in the product:

c_{NB} = \arg\max_{c} \hat{P}(c) \prod_{x_i \in X} \hat{P}(x_i | c)



Laplace (add-1) smoothing

\hat{P}(w_i | c) = \frac{\mathrm{count}(w_i, c) + 1}{\sum_{w \in V} (\mathrm{count}(w, c) + 1)}
                 = \frac{\mathrm{count}(w_i, c) + 1}{\left(\sum_{w \in V} \mathrm{count}(w, c)\right) + |V|}

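Putting the MLE prior and the add-1 smoothed likelihood together, a minimal training sketch (function and variable names are my own):

```python
from collections import Counter, defaultdict
import math

def train_nb(docs):
    """Train Naïve Bayes from docs, a list of (word_list, class) pairs.

    Returns log-space versions of the MLE prior P̂(c) and the add-1
    smoothed likelihood P̂(w|c) from the slides.
    """
    class_docs = Counter(c for _, c in docs)      # doccount(C = c_j)
    word_counts = defaultdict(Counter)            # count(w, c_j)
    for words, c in docs:
        word_counts[c].update(words)
    vocab = {w for counts in word_counts.values() for w in counts}

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_likelihood = {}
    for c in class_docs:
        total = sum(word_counts[c].values())      # sum_w count(w, c)
        log_likelihood[c] = {w: math.log((word_counts[c][w] + 1) /
                                         (total + len(vocab)))
                             for w in vocab}
    return log_prior, log_likelihood, vocab
```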


A worked example

[The worked example was presented as a sequence of figures, which are not
recoverable here.]
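As a stand-in for the lost figures, here is a tiny invented two-class corpus run end to end with the `train_nb` and `classify` sketches above (whether it matches the original slides is unknown):

```python
# Five invented training documents in two classes.
train = [
    ("fun couple love love".split(), "comedy"),
    ("fast furious shoot".split(), "action"),
    ("couple fly fast fun fun".split(), "comedy"),
    ("furious shoot shoot fun".split(), "action"),
    ("fly fast shoot love".split(), "action"),
]
log_prior, log_likelihood, vocab = train_nb(train)

test = "fast couple shoot fly".split()
print(classify(test, log_prior.keys(), log_prior, log_likelihood))
# -> 'action' under these counts: the smoothed likelihoods of "fast",
#    "shoot", and "fly" favor the action class despite "couple".
```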



Naïve Bayes and Language Modeling

In general, NB classifier can use any feature


URL, email addresses, dictionaries, network features

But if we use only word features, and all the words in the text, Naïve Bayes
has an important similarity to language modeling: each class can be thought
of as a separate unigram language model.



Naïve Bayes as Language Modeling

Which class assigns a higher probability to the sentence?

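The slide's example sentence and probability tables are not recoverable. A sketch of the idea, scoring one sentence under two class-conditional unigram models whose per-word probabilities are hypothetical:

```python
import math

# Each class is a separate unigram language model; P(sentence|c) is the
# product of the per-word probabilities. The numbers are illustrative only.
models = {
    "positive": {"i": 0.1, "love": 0.1,   "this": 0.05, "fun": 0.01,  "film": 0.1},
    "negative": {"i": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1},
}
sentence = "i love this fun film".split()
for c, model in models.items():
    log_p = sum(math.log(model[w]) for w in sentence)
    print(c, log_p)   # the class assigning the higher log-probability wins
```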



Naïve Bayes: More than Two Classes

Multi-value (any-of) classification
A document can belong to 0, 1, or more than one class.

Handling multi-value classification

For each class c ∈ C, build a classifier γ_c to distinguish c from all other
classes c′ ∈ C.
Given a test document d, evaluate it for membership in each class using each γ_c.
d belongs to every class for which γ_c returns true.




Naïve Bayes: More than Two Classes

One-of or multinomial classification

Classes are mutually exclusive: each document belongs to exactly one class.

Binary classifiers may also be used:

For each class c ∈ C, build a classifier γ_c to distinguish c from all other
classes c′ ∈ C.
Given a test document d, evaluate it for membership in each class using each γ_c.
Assign d to the single class whose classifier gives the maximum score.

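Both schemes use the same per-class binary classifiers and differ only in the final decision. A sketch, where `scores` stands in for hypothetical classifier outputs γ_c(d) and the 0.5 threshold is an illustrative stand-in for "γ_c returns true":

```python
# One binary classifier gamma_c per class, applied to the same document d.
# `scores` is a placeholder for {c: gamma_c(d)}; values are illustrative.
scores = {"politics": 0.8, "sports": 0.3, "nature": 0.6}

# Any-of (multi-label): keep every class whose classifier fires.
multi_label = [c for c, s in scores.items() if s >= 0.5]

# One-of (multinomial): keep only the class with the maximum score.
one_of = max(scores, key=scores.get)

print(multi_label)  # ['politics', 'nature']
print(one_of)       # 'politics'
```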


Evaluation: Constructing a Confusion Matrix

For each pair of classes ⟨c_1, c_2⟩, how many documents from c_1 were
incorrectly assigned to c_2 (for c_2 ≠ c_1)?

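A sketch of building the matrix from (actual, predicted) pairs; the pairs are placeholder data:

```python
from collections import Counter

# confusion[(actual, predicted)] = number of documents of class `actual`
# that the classifier assigned to class `predicted`.
pairs = [("politics", "politics"), ("politics", "sports"),
         ("sports", "sports"), ("nature", "politics")]
confusion = Counter(pairs)

print(confusion[("politics", "sports")])  # 1 politics doc mislabeled sports
```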



Per class evaluation measures

Recall
Fraction of docs in class i classified correctly:
\frac{c_{ii}}{\sum_{j} c_{ij}}

Precision
Fraction of docs assigned class i that are actually about class i:
\frac{c_{ii}}{\sum_{j} c_{ji}}

Accuracy
Fraction of docs classified correctly:
\frac{\sum_{i} c_{ii}}{N}

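With the matrix stored as c[i][j] (docs of true class i predicted as j), the three measures read off directly. A minimal sketch, assuming every class appears and is predicted at least once (otherwise the denominators would need guarding):

```python
def per_class_metrics(c, classes):
    """c[i][j]: number of docs with true class i predicted as class j."""
    n = sum(c[i][j] for i in classes for j in classes)
    accuracy = sum(c[i][i] for i in classes) / n
    recall = {i: c[i][i] / sum(c[i][j] for j in classes) for i in classes}
    precision = {i: c[i][i] / sum(c[j][i] for j in classes) for i in classes}
    return precision, recall, accuracy
```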



Micro- vs. Macro-Average

If we have more than one class, how do we combine multiple performance
measures into one quantity?

Macro-averaging
Compute performance for each class, then average.

Micro-averaging
Pool the decisions for all classes into a single contingency table, then evaluate.




Micro- vs. Macro-Average

Macro-averaged precision: (0.5 + 0.9)/2 = 0.7

Micro-averaged precision: 100/120 ≈ 0.83

The micro-averaged score is dominated by performance on the common classes.

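The per-class contingency tables behind these numbers were figures that are lost, but the quoted values pin them down: precision 0.5 forces tp = fp for class 1, precision 0.9 forces tp = 9·fp for class 2, and the pooled 100/120 then gives counts of 10/10 and 90/10. A sketch of both averages from those reconstructed counts:

```python
# Counts reconstructed from the slide's numbers (see derivation above).
counts = {"class1": {"tp": 10, "fp": 10},   # precision 10/20 = 0.5
          "class2": {"tp": 90, "fp": 10}}   # precision 90/100 = 0.9

macro = sum(c["tp"] / (c["tp"] + c["fp"]) for c in counts.values()) / len(counts)
micro = (sum(c["tp"] for c in counts.values()) /
         sum(c["tp"] + c["fp"] for c in counts.values()))

print(macro)  # 0.7
print(micro)  # 0.8333... -- pulled toward the large class2
```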


Sample Problem

In a text-classification task, assume that you want to categorize various
documents into 3 classes: ‘politics’, ‘sports’, and ‘nature’. The table below
shows 16 such documents, their true labels, and the label assigned by your
classifier.

Doc.  Actual    Predicted      Doc.  Actual    Predicted
1     politics  sports         2     politics  politics
3     sports    sports         4     nature    politics
5     politics  politics       6     nature    nature
7     sports    sports         8     sports    nature
9     politics  politics       10    politics  politics
11    politics  nature         12    politics  sports
13    sports    politics       14    nature    politics
15    nature    nature         16    politics  nature

Construct the confusion matrix and find the macro-averaged and micro-averaged
precision values. Which category has the highest recall?

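A sketch to check your hand computation: it builds the confusion matrix from the table and computes both precision averages and the per-class recalls.

```python
from collections import Counter

# (actual, predicted) pairs for documents 1-16, copied from the table.
pairs = [("politics", "sports"),   ("politics", "politics"),
         ("sports",   "sports"),   ("nature",   "politics"),
         ("politics", "politics"), ("nature",   "nature"),
         ("sports",   "sports"),   ("sports",   "nature"),
         ("politics", "politics"), ("politics", "politics"),
         ("politics", "nature"),   ("politics", "sports"),
         ("sports",   "politics"), ("nature",   "politics"),
         ("nature",   "nature"),   ("politics", "nature")]

confusion = Counter(pairs)
classes = ["politics", "sports", "nature"]

precision = {c: confusion[(c, c)] / sum(confusion[(a, c)] for a in classes)
             for c in classes}
recall = {c: confusion[(c, c)] / sum(confusion[(c, p)] for p in classes)
          for c in classes}

macro_p = sum(precision.values()) / len(classes)
micro_p = sum(confusion[(c, c)] for c in classes) / len(pairs)
print(precision, recall, macro_p, micro_p)
```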
