
Assignment No 1:

Submitted by:

1- Muhammad Ali (Reg.#BSSE07163044)

Class:

BSSE-8.

Submitted to:

MISS SAIRA MOIN.

Subject:

NATURAL LANGUAGE PROCESSING.

Department of Computer Sciences

The University of Lahore, Sargodha Campus


Text Classification (Naive Bayes)

The Naive Bayes classifier is a simple classifier that classifies based on probabilities of events. It is commonly applied to text classification. Though it is a simple algorithm, it performs well in many text classification problems.

Other advantages include shorter training time and the need for less training data, which in turn means lower CPU and memory consumption.

As with any machine learning model, we need to have an existing set of examples (training set)
for each category (class).

Let us consider a text classification task in which a sentence is classified as either 'question' or 'statement'. In this case, there are two classes ("question" and "statement"). With a training set, we can train a Naive Bayes classifier which we can then use to automatically categorize a new sentence.
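For illustration, a tiny hand-labelled training set for this task could look like the following Python sketch (the sentences and labels here are invented purely for the example, not taken from any real dataset):

# Hypothetical training set: each entry is (sentence, class).
# The sentences are invented only to illustrate the two classes.
training_set = [
    ("what is the price of this pen", "question"),
    ("where is the book shop", "question"),
    ("what is your name", "question"),
    ("the book is on the table", "statement"),
    ("the price of the pen is ten rupees", "statement"),
    ("this is my name", "statement"),
]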

We need to find out if a new sentence, say, ‘what is the price of the book’ is a question or not.

Bayes’ Theorem:

We need to find the probability of the class 'question' given the new sentence, and the probability of the class 'statement' given the new sentence. In other words, we need to determine which class has the higher probability for the new sentence, i.e., which of the two quantities below is larger.
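By Bayes' theorem, these two quantities can be written as:

\[
P(\text{question} \mid \text{sentence}) = \frac{P(\text{sentence} \mid \text{question}) \, P(\text{question})}{P(\text{sentence})}
\]
\[
P(\text{statement} \mid \text{sentence}) = \frac{P(\text{sentence} \mid \text{statement}) \, P(\text{statement})}{P(\text{sentence})}
\]

where 'sentence' stands for the new sentence to be classified.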

Since the denominator is the same in both equations, we can ignore it and only need to compare the numerators.

The problem here is that the new sentence need not appear anywhere in the training set, in which case its estimated probability is zero. That is, since 'what is the price of the book' did not appear in any of the classes in the training set, the probability is zero for both classes, which is not useful.
So let us split the sentence into words and assume that every word in a sentence is independent of the others. That is, we are no longer looking at the entire sentence, but rather at individual words.
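Under this naive independence assumption, writing the new sentence as the words w_1, ..., w_n, the quantities to compare become (dropping the common denominator as before):

\[
P(\text{question} \mid w_1, \ldots, w_n) \propto P(\text{question}) \prod_{i=1}^{n} P(w_i \mid \text{question})
\]
\[
P(\text{statement} \mid w_1, \ldots, w_n) \propto P(\text{statement}) \prod_{i=1}^{n} P(w_i \mid \text{statement})
\]

Here P(class) is the fraction of training sentences belonging to that class, and P(w_i | class) is estimated from how often the word w_i occurs among all the words of that class in the training set.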

The next step is just to calculate every probability in the above equations.

Now that we have the frequency of each word in each class, we can calculate the probability of each word given a class. Knowing the probability of occurrence of each word in a class, we can substitute these values into the equations above.
Therefore, the new sentence will be classified into the class that yields the higher value, i.e., the class in which its words occur more frequently according to the results.
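A minimal Python sketch of this calculation, assuming the small invented training set from earlier and plain relative word frequencies (no smoothing, exactly as described above, so a word never seen in a class simply drives that class's score to zero):

from collections import Counter

# Hypothetical training set: each entry is (sentence, class).
training_set = [
    ("what is the price of this pen", "question"),
    ("where is the book shop", "question"),
    ("what is your name", "question"),
    ("the book is on the table", "statement"),
    ("the price of the pen is ten rupees", "statement"),
    ("this is my name", "statement"),
]

# Count word frequencies and the number of sentences per class.
word_counts = {"question": Counter(), "statement": Counter()}
class_counts = Counter()
for sentence, label in training_set:
    class_counts[label] += 1
    word_counts[label].update(sentence.split())

def score(sentence, label):
    # P(class) multiplied by the product of P(word | class) for each word,
    # using raw relative frequencies from the training set.
    prior = class_counts[label] / sum(class_counts.values())
    total_words = sum(word_counts[label].values())
    result = prior
    for word in sentence.split():
        result *= word_counts[label][word] / total_words
    return result

new_sentence = "what is the price of the book"
scores = {label: score(new_sentence, label) for label in word_counts}
print(scores)
print("predicted class:", max(scores, key=scores.get))

Running this on 'what is the price of the book' gives a non-zero score only for the 'question' class with this toy data (the word 'what' never occurs in the 'statement' sentences), so the sentence is classified as a question.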
