
EXPERIMENT – 3

Implementation of Classification algorithm


CLASS: TE CMPN A PID- 182048
NAME: Shruti Kamath ROLL NO.: 41

Aim: To implement the Naïve Bayes classifier in a language such as Java or
Python.
S/w Requirement: C, C++, Java, or Python

Theory:
 Explain Classification
 Classification means arranging the mass of data into different classes or
groups on the basis of their similarities and resemblances.
 All similar items of data are put in one class and all dissimilar items of data
are put in different classes.
 Statistical data is classified according to its characteristics.
 For example, if we have collected data regarding the number of students
admitted to a university in a year, the students can be classified on the basis of
sex. In this case, all male students will be put in one class and all female
students will be put in another class.
 The students can also be classified on the basis of age, marks, marital status,
height, etc.
 The set of characteristics we choose for the classification of the data depends
upon the objective of the study.
 For example, if we want to study the religious mix of the students, we classify
the students on the basis of religion.
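As a minimal illustration of the grouping described above, the sketch below classifies a few student records by a chosen characteristic. The records and the helper name `classify_by` are hypothetical and were made up for this example only.

```python
from collections import defaultdict

# Hypothetical student records, invented for illustration.
students = [
    {"name": "A", "sex": "male", "age": 19},
    {"name": "B", "sex": "female", "age": 20},
    {"name": "C", "sex": "female", "age": 19},
    {"name": "D", "sex": "male", "age": 21},
]

def classify_by(records, characteristic):
    """Group record names into classes sharing the same value of `characteristic`."""
    classes = defaultdict(list)
    for record in records:
        classes[record[characteristic]].append(record["name"])
    return dict(classes)

by_sex = classify_by(students, "sex")
by_age = classify_by(students, "age")
print(by_sex)
print(by_age)
```

Changing the second argument changes the classification, which mirrors the point that the chosen characteristic depends on the objective of the study.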

 Explain Bayes theorem


 In statistics and probability theory, Bayes’ theorem (also known as
Bayes’ rule) is a mathematical formula used to determine the conditional
probability of events.
 Essentially, Bayes’ theorem describes the probability of an event based on
prior knowledge of the conditions that might be relevant to the event.
 The theorem is named after the English statistician Thomas Bayes, whose
work on the formula was published posthumously in 1763. It is considered the
foundation of the statistical inference approach called Bayesian inference.
Formula for Bayes’ Theorem
Bayes’ theorem is expressed by the following formula:

P(A|B) = P(B|A) × P(A) / P(B)

Where:

P(A|B) – the probability of event A occurring, given event B has occurred


P(B|A) – the probability of event B occurring, given event A has occurred
P(A) – the probability of event A
P(B) – the probability of event B
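The formula can be checked numerically with a short sketch. The probabilities below are hypothetical values chosen only to illustrate the arithmetic (A = "has a condition", B = "test is positive"), and the function name `bayes` is an assumption of this example.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Hypothetical inputs, not taken from any real data.
p_a = 0.01              # P(A)
p_b_given_a = 0.95      # P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)

# The law of total probability gives P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(round(p_a_given_b, 3))
```

With these inputs P(B) is 0.059 and P(A|B) comes out at roughly 0.161, so even a fairly accurate test leaves the posterior low when the prior P(A) is small.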

 Explain Naïve Bayes Algorithm with example


 Naive Bayes is a classification technique based on Bayes’ theorem with an
assumption of independence among predictors. In simple terms, a Naive Bayes
classifier assumes that the presence of a particular feature in a class is
unrelated to the presence of any other feature.
 For example, a fruit may be considered to be an apple if it is red, round, and about
3 inches in diameter. Even if these features depend on each other or upon the
existence of the other features, all of these properties independently contribute to
the probability that this fruit is an apple and that is why it is known as ‘Naive’.
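 The Naive Bayes model is easy to build and particularly useful for very large
data sets. Despite its simplicity, Naive Bayes can perform competitively with,
and sometimes outperform, more sophisticated classification methods.

Bayes’ theorem gives the posterior probability P(c|x) from P(c), P(x) and P(x|c):

P(c|x) = P(x|c) × P(c) / P(x)

Above,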

 P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).


 P(c) is the prior probability of class.
 P(x|c) is the likelihood which is the probability of predictor given class.
 P(x) is the prior probability of predictor.
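Under the independence assumption the likelihood factors as P(x|c) = P(x1|c) × P(x2|c) × ... × P(xn|c), and because P(x) is the same for every class, the classes can be compared using P(c) multiplied by that product. The sketch below applies this to the fruit example. All probabilities are hypothetical values invented for illustration, and it assumes "apple" and "other" are the only two classes.

```python
from math import prod

# Hypothetical priors and per-feature likelihoods (not estimated from data).
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9, "about_3_inches": 0.7},
    "other": {"red": 0.3, "round": 0.5, "about_3_inches": 0.2},
}

def unnormalized_posterior(cls, features):
    """P(c) times the product of P(feature|c), treating features as independent."""
    return priors[cls] * prod(likelihoods[cls][f] for f in features)

observed = ["red", "round", "about_3_inches"]
scores = {c: unnormalized_posterior(c, observed) for c in priors}

# Summing over all classes gives P(x), which normalizes the scores.
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

best = max(posteriors, key=posteriors.get)
print(posteriors, best)
```

Dividing by the evidence does not change which class wins, which is why many implementations skip that step and compare the unnormalized scores directly.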

Pseudo Code:
Input: Training dataset T
Values of the predictor variables in the testing data, F = (f1, f2, f3, ..., fn)

Output: The predicted class of the testing data

Steps:
1. Read the training dataset T.
2. Calculate the mean and standard deviation of the predictor variables in each class.
3. Repeat
i. Calculate the probability of fi using the Gaussian density equation in each class
until the probability of all the predictor variables has been calculated.
4. Calculate the likelihood for each class.
5. Return the class with the greatest likelihood.
Note: these steps describe Naive Bayes for continuous predictors. The
implementation below uses categorical attributes, so it estimates each
P(fi|class) from counts instead of the Gaussian density.
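The steps above can be sketched for continuous predictors as follows. This is a minimal sketch, not the implementation used in this experiment: the training data, the class labels and the function names `fit`, `predict` and `gaussian_pdf` are hypothetical. It also assumes the sample standard deviation and multiplies each class score by the class prior, which the pseudocode does not state explicitly.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def gaussian_pdf(x, mu, sigma):
    """Gaussian (normal) density of x for mean mu and standard deviation sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * sigma)

def fit(rows, labels):
    """Steps 1-2: per class, store the prior and the (mean, std) of each predictor."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        # zip(*members) iterates over columns, i.e. one predictor at a time.
        stats = [(mean(col), stdev(col)) for col in zip(*members)]
        model[label] = (len(members) / len(rows), stats)
    return model

def predict(model, sample):
    """Steps 3-5: multiply the densities per class and return the best class."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior  # assumption: include the class prior in the score
        for value, (mu, sigma) in zip(sample, stats):
            score *= gaussian_pdf(value, mu, sigma)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data T: two numeric predictors per row.
T = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (3.0, 4.2), (3.2, 3.9), (2.9, 4.0)]
classes = ["a", "a", "a", "b", "b", "b"]

model = fit(T, classes)
print(predict(model, (1.1, 2.0)))
print(predict(model, (3.1, 4.1)))
```

Each class needs at least two rows and a non-zero spread per predictor, otherwise `stdev` raises an error or the density divides by zero. For many predictors, summing log-densities instead of multiplying densities is a common way to avoid numeric underflow.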
Implementation:

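def calcCount(x, y):
    # Count the rows of the global dataset d that contain both value x and class label y.
    # Membership is tested on the whole row, so attribute values must be unique across
    # columns (hence 'STyes'/'STno' for the student column instead of 'yes'/'no').
    count = 0
    for element in d:
        if set([x, y]).issubset(set(element)):
            count += 1
    return count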

# Training data: age, income, student, credit rating, class (buys computer)
d=[['young','high','STno','fair','no'],
   ['young','high','STno','good','no'],
   ['middle','high','STno','fair','yes'],
   ['old','medium','STno','fair','yes'],
   ['old','low','STyes','fair','yes'],
   ['old','low','STyes','good','no'],
   ['middle','low','STyes','good','yes'],
   ['young','medium','STno','fair','no'],
   ['young','low','STyes','fair','yes'],
   ['old','medium','STyes','fair','yes'],
   ['young','medium','STyes','good','yes'],
   ['middle','medium','STno','good','yes'],
   ['middle','high','STyes','fair','yes'],
   ['old','medium','STno','good','no']
  ]

# Count the rows in each class (the class label is the last column).
y=0
n=0
for row in d:
    if row[4]=='yes':
        y+=1
    else:
        n+=1
# Prior probabilities P(Yes) and P(No)
py=y/len(d)
pn=n/len(d)
print("P(Yes)="+str(py)+"\nP(No)="+str(pn))
print("--------------------------------")

print("1.AGE:")
# Each p_* variable holds a raw count; dividing by the class count gives P(value|class).
p_ay=calcCount('young','yes')
print("P(age young|Yes)="+str(p_ay/y))
p_atfy=calcCount('middle','yes')
print("P(age middle|Yes)="+str(p_atfy/y))
p_agfy=calcCount('old','yes')
print("P(age old|Yes)="+str(p_agfy/y))
p_an=calcCount('young','no')
print("P(age young|No)="+str(p_an/n))
p_atfn=calcCount('middle','no')
print("P(age middle|No)="+str(p_atfn/n))
p_agfn=calcCount('old','no')
print("P(age old|No)="+str(p_agfn/n))
print("--------------------------------")

print("2.INCOME:")
p_ihy=calcCount('high','yes')
print("P(income high|Yes)="+str(p_ihy/y))
p_imy=calcCount('medium','yes')
print("P(income medium|Yes)="+str(p_imy/y))
p_ily=calcCount('low','yes')
print("P(income low|Yes)="+str(p_ily/y))
p_ihn=calcCount('high','no')
print("P(income high|no)="+str(p_ihn/n))
p_imn=calcCount('medium','no')
print("P(income medium|no)="+str(p_imn/n))
p_iln=calcCount('low','no')
print("P(income low|no)="+str(p_iln/n))
print("--------------------------------")

print("3.STUDENT:")
p_syy=calcCount('STyes','yes')
print("P(student yes|yes)="+str(p_syy/y))
p_sny=calcCount('STno','yes')
print("P(student no|yes)="+str(p_sny/y))
p_syn=calcCount('STyes','no')
print("P(student yes|no)="+str(p_syn/n))
p_snn=calcCount('STno','no')
print("P(student no|no)="+str(p_snn/n))
print("--------------------------------")

print("4.CREDIT RATING:")
p_crfy=calcCount('fair','yes')
print("P(credit_rating fair|Yes)="+str(p_crfy/y))
p_crey=calcCount('good','yes')
print("P(credit_rating good|Yes)="+str(p_crey/y))
p_crfn=calcCount('fair','no')
print("P(credit_rating fair|no)="+str(p_crfn/n))
p_cren=calcCount('good','no')
print("P(credit_rating good|no)="+str(p_cren/n))

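# New sample to classify: age, income, student, credit rating
data = 'young,medium,STyes,fair'
newData = data.split(',')
print('')
print("DATA SAMPLE")
print(newData)
# The p_* variables hold raw counts, so each is divided by its class count to get
# P(attribute|class) before multiplying by the class prior.
p_newyes=(p_ay/y)*(p_imy/y)*(p_syy/y)*(p_crfy/y)*(py)
p_newno=(p_an/n)*(p_imn/n)*(p_syn/n)*(p_crfn/n)*(pn)
print('The student will ',end="")
if(p_newyes>p_newno):
    print('buy computer')
else:
    print('not buy computer')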
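The listing above hard-codes one variable per attribute value and one test sample. As a hedged alternative, the sketch below builds the counts once and scores any sample by column position. The names `train` and `predict` and the `alpha` parameter are assumptions of this sketch. With `alpha=1` it applies Laplace (add-one) smoothing, a technique the listing above does not use; `alpha=0` gives the plain count-based estimates.

```python
from collections import Counter

# The same 14 training rows as the listing above:
# age, income, student, credit rating, class label.
rows = [
    ['young', 'high', 'STno', 'fair', 'no'],
    ['young', 'high', 'STno', 'good', 'no'],
    ['middle', 'high', 'STno', 'fair', 'yes'],
    ['old', 'medium', 'STno', 'fair', 'yes'],
    ['old', 'low', 'STyes', 'fair', 'yes'],
    ['old', 'low', 'STyes', 'good', 'no'],
    ['middle', 'low', 'STyes', 'good', 'yes'],
    ['young', 'medium', 'STno', 'fair', 'no'],
    ['young', 'low', 'STyes', 'fair', 'yes'],
    ['old', 'medium', 'STyes', 'fair', 'yes'],
    ['young', 'medium', 'STyes', 'good', 'yes'],
    ['middle', 'medium', 'STno', 'good', 'yes'],
    ['middle', 'high', 'STyes', 'fair', 'yes'],
    ['old', 'medium', 'STno', 'good', 'no'],
]

def train(rows):
    """Count classes, (column, value, class) triples, and distinct values per column."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = Counter(
        (col, value, row[-1]) for row in rows for col, value in enumerate(row[:-1])
    )
    n_values = [len({row[col] for row in rows}) for col in range(len(rows[0]) - 1)]
    return class_counts, feature_counts, n_values

def predict(sample, class_counts, feature_counts, n_values, alpha=1):
    """Return (best class, scores), where each score is P(c) * product of P(x_i|c)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(c)
        for col, value in enumerate(sample):
            # alpha > 0 adds pseudo-counts so an unseen value never gives zero.
            num = feature_counts[(col, value, label)] + alpha
            den = count + alpha * n_values[col]
            score *= num / den
        scores[label] = score
    return max(scores, key=scores.get), scores

model = train(rows)
label, scores = predict(['young', 'medium', 'STyes', 'fair'], *model, alpha=0)
print(label, scores)

# Without smoothing, a single zero count wipes out the whole product.
_, raw = predict(['middle', 'high', 'STno', 'good'], *model, alpha=0)
_, smoothed = predict(['middle', 'high', 'STno', 'good'], *model, alpha=1)
print(raw['no'], smoothed['no'])
```

No 'middle' row has class 'no' in the training data, so the unsmoothed score for 'no' is exactly zero whatever the other attributes say. Smoothing keeps that score small but non-zero.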

Output:
Conclusion:
