Implementation of Classification Algorithm: Experiment - 3

EXPERIMENT – 3
Implementation of Classification algorithm

CLASS: TE CMPN A PID- 182048
NAME: Shruti Kamath ROLL NO.: 41
Aim: To implement a classifier (Naïve Bayes) in a language such as Java or
Python.
S/w Requirement: C, C++, Java or Python
Theory:
➢ Explain Classification
• Classification means arranging a mass of data into different classes or
groups on the basis of their similarities and resemblances.
• Similar items of data are put in the same class, and dissimilar items are
put in different classes.
• Statistical data is classified according to its characteristics.
• For example, if we have collected data on the number of students admitted
to a university in a year, the students can be classified on the basis of
sex. In this case, all male students are put in one class and all female
students are put in another class.
• The students can also be classified on the basis of age, marks, marital status,
height, etc.
• The set of characteristics we choose for classifying the data depends
upon the objective of the study.
• For example, if we want to study the religious mix of the students, we classify
the students on the basis of religion.
• In data mining, a classifier learns such a grouping from labelled training
data and uses it to predict the class of new, unlabelled records.
➢ Explain Bayes theorem

• In statistics and probability theory, Bayes’ theorem (also known as
Bayes’ rule) is a mathematical formula used to determine the conditional
probability of events.
• Essentially, Bayes’ theorem describes the probability of an event based on
prior knowledge of conditions that might be relevant to the event.
• The theorem is named after the English statistician Thomas Bayes, whose
work on it was published posthumously in 1763. It is considered the foundation
of the statistical inference approach called Bayesian inference.
Formula for Bayes’ Theorem
Bayes’ theorem is expressed in the following formula:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Where:
P(A|B) – the probability of event A occurring, given event B has occurred

P(B|A) – the probability of event B occurring, given event A has occurred
P(A) – the probability of event A
P(B) – the probability of event B
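As a quick check of the formula, the following minimal sketch plugs in counts read off the 14-row training table used in the Implementation section, taking A as "buys computer = yes" and B as "age = young". The function name `bayes_posterior` is a hypothetical helper, not part of the original experiment.

```python
from fractions import Fraction

def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b

# Counts taken from the 14-row training table in the Implementation section:
# A = "buys computer = yes", B = "age = young".
p_a = Fraction(9, 14)          # 9 of the 14 rows are labelled 'yes'
p_b = Fraction(5, 14)          # 5 of the 14 rows have age 'young'
p_b_given_a = Fraction(2, 9)   # 2 of the 9 'yes' rows have age 'young'

p_a_given_b = bayes_posterior(p_b_given_a, p_a, p_b)
print(p_a_given_b)         # 2/5
print(float(p_a_given_b))  # 0.4
```

The result agrees with a direct count: 2 of the 5 'young' rows are labelled 'yes'.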
➢ Explain Naïve Bayes Algorithm with example

• It is a classification technique based on Bayes’ theorem with an assumption of
independence among predictors. In simple terms, a Naive Bayes classifier
assumes that the presence of a particular feature in a class is unrelated to the
presence of any other feature.
• For example, a fruit may be considered to be an apple if it is red, round, and about
3 inches in diameter. Even if these features depend on each other or upon the
existence of the other features, a Naive Bayes classifier treats each of them as
contributing independently to the probability that this fruit is an apple, which is
why it is known as ‘Naive’.
• A Naive Bayes model is easy to build and particularly useful for very large data sets.
Despite its simplicity, Naive Bayes can perform competitively with, and sometimes
outperform, more sophisticated classification methods.
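The posterior probability of a class is computed as:
$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$
In this formula,
• P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood, which is the probability of the predictor given the class.
• P(x) is the prior probability of the predictor.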
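The independence assumption means the likelihood P(x|c) can be written as a product of per-feature terms. The sketch below illustrates this for the fruit example; the function name, the dictionary layout, and all probability values are hypothetical and chosen only for illustration.

```python
from math import prod

def naive_bayes_posteriors(priors, likelihoods, sample):
    """Return P(c|x) for every class c under the naive independence assumption.

    priors:      {class: P(c)}
    likelihoods: {class: {feature: {value: P(value|c)}}}
    sample:      {feature: value}
    """
    # Unnormalised score per class: P(c) times the product of P(x_i|c).
    scores = {
        c: priors[c] * prod(likelihoods[c][f][v] for f, v in sample.items())
        for c in priors
    }
    evidence = sum(scores.values())  # P(x), summed over all classes
    return {c: s / evidence for c, s in scores.items()}

# Hypothetical probabilities for the fruit example (illustrative values only).
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"colour": {"red": 0.8}, "shape": {"round": 0.9}},
    "other": {"colour": {"red": 0.3}, "shape": {"round": 0.4}},
}
sample = {"colour": "red", "shape": "round"}

posteriors = naive_bayes_posteriors(priors, likelihoods, sample)
print(max(posteriors, key=posteriors.get))  # apple
```

Because P(x) is the same for every class, it can be skipped when only the most probable class is needed.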
Pseudo Code:
Input: Training dataset T
Values of the predictor variables in the testing data, F = (f1, f2, f3, ..., fn)
Output: The predicted class of the testing data
Steps:
1. Read the training dataset T.
2. Calculate the mean and standard deviation of the predictor variables in each class.
3. Repeat
i. Calculate the probability of fi using the Gaussian density equation in each class
until the probability of all the predictor variables has been calculated.
4. Calculate the likelihood for each class.
5. Select the class with the greatest likelihood.
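The steps above describe the Gaussian variant of Naive Bayes, which applies to numeric predictors. A minimal sketch follows, with three assumptions: the class prior is included in the score, the sample standard deviation is used, and every class has at least two rows with non-zero spread. The function names and the toy height/weight data are made up for illustration and are not part of the experiment.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def fit(rows, labels):
    """Steps 1-2: class prior plus mean and standard deviation per predictor."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        # zip(*members) turns the rows of one class into columns.
        stats = [(mean(col), stdev(col)) for col in zip(*members)]
        model[label] = (len(members) / len(rows), stats)
    return model

def gaussian_pdf(x, mu, sigma):
    """Gaussian density used in step 3."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def predict(model, features):
    """Steps 3-5: multiply the densities per class and pick the largest score."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        for x, (mu, sigma) in zip(features, stats):
            score *= gaussian_pdf(x, mu, sigma)
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up toy data: (height in cm, weight in kg).
T = [(180, 80), (175, 77), (185, 90), (160, 55), (165, 60), (155, 50)]
y = ["tall", "tall", "tall", "short", "short", "short"]

model = fit(T, y)
print(predict(model, (178, 82)))  # tall
print(predict(model, (158, 52)))  # short
```

The implementation in this report works on categorical attributes instead, so it estimates each probability from frequency counts and does not need a density function.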
Implementation:
def calcCount(x, y):
    # Count the rows of d that contain both values x and y.
    count = 0
    for element in d:
        if set([x, y]).issubset(set(element)):
            count += 1
    return count
d=[['young','high','STno','fair','no'],
['young','high','STno','good','no'],
['middle','high','STno','fair','yes'],
['old','medium','STno','fair','yes'],
['old','low','STyes','fair','yes'],
['old','low','STyes','good','no'],
['middle','low','STyes','good','yes'],
['young','medium','STno','fair','no'],
['young','low','STyes','fair','yes'],
['old','medium','STyes','fair','yes'],
['young','medium','STyes','good','yes'],
['middle','medium','STno','good','yes'],
['middle','high','STyes','fair','yes'],
['old','medium','STno','good','no']
]
y=0
n=0
for i in range(len(d)):
    if d[i][4] == 'yes':
        y += 1
    else:
        n += 1
py=y/len(d)
pn=n/len(d)
print("P(Yes)="+str(py)+"\nP(No)="+str(pn))
print("--------------------------------")
p_ay=calcCount('young','yes')
print("1.AGE:")
print("P(age young|Yes)="+str(p_ay/y))
p_atfy=calcCount('middle','yes')
print("P(age middle|Yes)="+str(p_atfy/y))
p_agfy=calcCount('old','yes')
print("P(age old|Yes)="+str(p_agfy/y))
p_an=calcCount('young','no')
print("P(age young|No)="+str(p_an/n))
p_atfn=calcCount('middle','no')
print("P(age middle|No)="+str(p_atfn/n))
p_agfn=calcCount('old','no')
print("P(age old|No)="+str(p_agfn/n))
print("--------------------------------")
print("2.INCOME:")
p_ihy=calcCount('high','yes')
print("P(income high|Yes)="+str(p_ihy/y))
p_imy=calcCount('medium','yes')
print("P(income medium|Yes)="+str(p_imy/y))
p_ily=calcCount('low','yes')
print("P(income low|Yes)="+str(p_ily/y))
p_ihn=calcCount('high','no')
print("P(income high|no)="+str(p_ihn/n))
p_imn=calcCount('medium','no')
print("P(income medium|no)="+str(p_imn/n))
p_iln=calcCount('low','no')
print("P(income low|no)="+str(p_iln/n))
print("--------------------------------")
print("3.STUDENT:")
p_syy=calcCount('STyes','yes')
print("P(student yes|yes)="+str(p_syy/y))
p_sny=calcCount('STno','yes')
print("P(student no|yes)="+str(p_sny/y))
p_syn=calcCount('STyes','no')
print("P(student yes|no)="+str(p_syn/n))
p_snn=calcCount('STno','no')
print("P(student no|no)="+str(p_snn/n))
print("--------------------------------")
print("4.CREDIT RATING:")
p_crfy=calcCount('fair','yes')
print("P(credit_rating fair|Yes)="+str(p_crfy/y))
p_crey=calcCount('good','yes')
print("P(credit_rating good|Yes)="+str(p_crey/y))
p_crfn=calcCount('fair','no')
print("P(credit_rating fair|no)="+str(p_crfn/n))
p_cren=calcCount('good','no')
print("P(credit_rating good|no)="+str(p_cren/n))
data = 'young,medium,STyes,fair'
newData = data.split(',')
print('')
print("DATA SAMPLE")
print(newData)
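# p_ay, p_imy, etc. hold counts, so divide by the class size to get
# conditional probabilities before multiplying by the prior.
p_newyes=(p_ay/y)*(p_imy/y)*(p_syy/y)*(p_crfy/y)*(py)
p_newno=(p_an/n)*(p_imn/n)*(p_syn/n)*(p_crfn/n)*(pn)
print('The student will ',end="")
if(p_newyes>p_newno):
    print('buy computer')
else:
    print('not buy computer')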
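The script above works only for the one sample named in its final lines, and the parsed `newData` list is never used. As a possible generalisation, the sketch below classifies any sample against the same 14 rows. The names `rows` and `classify` are hypothetical, and no smoothing is applied, so an attribute value never seen with a class gives that class a score of zero.

```python
from collections import Counter

# Same 14 training rows as above: age, income, student, credit rating, label.
rows = [
    ['young', 'high', 'STno', 'fair', 'no'],
    ['young', 'high', 'STno', 'good', 'no'],
    ['middle', 'high', 'STno', 'fair', 'yes'],
    ['old', 'medium', 'STno', 'fair', 'yes'],
    ['old', 'low', 'STyes', 'fair', 'yes'],
    ['old', 'low', 'STyes', 'good', 'no'],
    ['middle', 'low', 'STyes', 'good', 'yes'],
    ['young', 'medium', 'STno', 'fair', 'no'],
    ['young', 'low', 'STyes', 'fair', 'yes'],
    ['old', 'medium', 'STyes', 'fair', 'yes'],
    ['young', 'medium', 'STyes', 'good', 'yes'],
    ['middle', 'medium', 'STno', 'good', 'yes'],
    ['middle', 'high', 'STyes', 'fair', 'yes'],
    ['old', 'medium', 'STno', 'good', 'no'],
]

def classify(rows, sample):
    """Return (best_label, scores) for a list of attribute values."""
    label_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in label_counts.items():
        score = count / len(rows)  # prior P(label)
        for i, value in enumerate(sample):
            # Match by column position so that equal strings in
            # different columns cannot be confused.
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= matches / count  # P(value | label)
        scores[label] = score
    return max(scores, key=scores.get), scores

label, scores = classify(rows, 'young,medium,STyes,fair'.split(','))
print(label)  # yes
```

Matching by column position removes the need for the `STyes`/`STno` prefixes, which the original uses to keep student values distinct from the class labels.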
Output:
Conclusion:
