Tuesday (10am-11am)
Wednesday (10am-11am & 3pm-4pm)
Friday (9am-10am, 11am-12pm, 2pm-3pm)
Dr. Srinivasa L. Chakravarthy
&
Smt. Jyotsna Rani Thota
Department of CSE
GITAM Institute of Technology (GIT)
Visakhapatnam – 530045
Email: slade@gitam.edu & jthota@gitam.edu
20 August 2020
EID 403 Machine Learning
Course objectives
Notation-
We can determine the MAP hypothesis by using Bayes theorem to calculate the
posterior probability of each hypothesis. hMAP is defined as:

hMAP = argmax(h∈H) P(h|D) = argmax(h∈H) P(D|h) P(h)
Bayesian Theorem-(cont.)
As an Example-
Consider a diagnosis problem with two hypotheses: the patient has a particular disease, or the patient does not.
A patient takes lab test, and the result comes back positive.
Prior knowledge is that over the entire population of people only .008 have this
disease.
As an Example-(cont.)
P(disease) = .008          P(¬disease) = .992
P(+ | disease) = .98       P(− | disease) = .02
P(+ | ¬disease) = .03      P(− | ¬disease) = .97
As an Example-(cont.)
If we observe a new patient for whom the lab test returns a positive result, then
P(+|disease) P(disease) = .98 × .008 = .0078
P(+|¬disease) P(¬disease) = .03 × .992 = .0298
so hMAP = ¬disease: despite the positive test, the MAP hypothesis is that the patient does not have the disease.
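The MAP computation for the lab-test example above can be sketched in a few lines of Python; the probabilities are the ones given in the table, and the hypothesis names are illustrative:

```python
# MAP inference for the lab-test example; probabilities from the slide.
priors = {"disease": 0.008, "no_disease": 0.992}     # P(h)
p_pos = {"disease": 0.98, "no_disease": 0.03}        # P(+ | h)

# Unnormalized posteriors P(+|h) * P(h); the Bayes denominator P(+)
# is the same for both hypotheses, so it does not affect the argmax.
scores = {h: p_pos[h] * priors[h] for h in priors}
h_map = max(scores, key=scores.get)                  # "no_disease"
```

Note that normalizing by P(+) = .0078 + .0298 would give the actual posterior probabilities, but is unnecessary for picking the MAP hypothesis.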
And so, every hypothesis consistent with D has posterior probability 1/|VSH,D|, while every inconsistent hypothesis has posterior probability 0.
So, here we are interested in minimizing the expected code length, which is
achieved by assigning -log2 pi bits to encode message i.
● -log2 P(h) is description length of h under the optimal encoding for the
hypothesis space H.
○ We denote it as LCH(h)= -log2 P(h) where CH is optimal code for
hypothesis space H.
● -log2P(D|h) is the description length of the training data D given
hypothesis h.
○ We denote it as LCD|H(D|h) = -log2 P(D|h) where CD|H is optimal code
for describing data D.
The MDL principle recommends choosing the hypothesis that minimizes the
sum of these two description lengths.
Assume that codes C1 and C2 are used to represent the hypothesis and the data
given the hypothesis, respectively; then

hMDL = argmin(h∈H) LC1(h) + LC2(D|h)
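A minimal sketch of MDL selection, using made-up values for P(h) and P(D|h):

```python
import math

# MDL hypothesis selection sketch (hypothetical P(h), P(D|h) pairs).
# L_C1(h) = -log2 P(h); L_C2(D|h) = -log2 P(D|h); h_MDL minimizes the sum.
hypotheses = {"h1": (0.3, 0.10), "h2": (0.1, 0.40), "h3": (0.6, 0.05)}

def description_length(p_h, p_d_given_h):
    return -math.log2(p_h) - math.log2(p_d_given_h)

h_mdl = min(hypotheses, key=lambda h: description_length(*hypotheses[h]))

# Minimizing -log2 P(h) - log2 P(D|h) is the same as maximizing
# P(D|h) P(h), so h_MDL coincides with h_MAP on the same numbers.
h_map = max(hypotheses, key=lambda h: hypotheses[h][0] * hypotheses[h][1])
```

This makes the connection in the text concrete: under these optimal encodings, the MDL hypothesis is exactly the MAP hypothesis.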
● If the possible classification of the new example can take on any value vj
from some set V,
● Then the probability P(vj|D) that the correct classification for the new
instance is vj, is

P(vj|D) = Σ(hi∈H) P(vj|hi) P(hi|D)
Bayes optimal classifier-(cont.)
Any system that classifies new instances according to the above equation is
called a Bayes optimal classifier.
Note- On average, this method maximizes the probability that the new instance
is classified correctly, compared with any other classification method using
the same hypothesis space and prior knowledge.
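The sum over hypotheses can be sketched as follows, with hypothetical posteriors and deterministic hypotheses (h1 predicts "+", h2 and h3 predict "−"):

```python
# Bayes optimal classification sketch with three hypothetical hypotheses.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}    # P(hi | D)
predicts = {"h1": "+", "h2": "-", "h3": "-"}      # each hi is deterministic

# P(vj|D) = sum over hi of P(vj|hi) * P(hi|D); with deterministic
# hypotheses, P(vj|hi) is 1 when hi predicts vj and 0 otherwise.
p_v = {v: sum(p for h, p in posteriors.items() if predicts[h] == v)
       for v in ("+", "-")}
label = max(p_v, key=p_v.get)                     # "-" (0.6 vs 0.4)
```

Note that the Bayes optimal label ("−") differs from the prediction of the single MAP hypothesis (h1, which predicts "+"): the classifier pools the votes of all hypotheses, weighted by their posteriors.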
GIBBS Algorithm-
1. Choose a hypothesis h from H at random, according to the posterior
probability distribution P(h|D).
2. Use h to predict the classification of the next instance x.
Surprisingly, the expected error of the Gibbs algorithm is at most twice the
expected error of the Bayes optimal classifier:

E[errorGibbs] ≤ 2 E[errorBayesOptimal]
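A sketch of the two steps above, reusing the same hypothetical posteriors as in the Bayes optimal example:

```python
import random

# Gibbs algorithm sketch: draw ONE hypothesis at random according to
# P(h|D), and let it alone classify the next instance.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}    # hypothetical P(hi | D)
predicts = {"h1": "+", "h2": "-", "h3": "-"}

def gibbs_classify(rng=random):
    h = rng.choices(list(posteriors), weights=list(posteriors.values()))[0]
    return predicts[h]

label = gibbs_classify()   # "+" with probability 0.4, "-" with probability 0.6
```

Unlike the Bayes optimal classifier, Gibbs avoids summing over all hypotheses, at the cost of a randomized (and on average somewhat worse) prediction.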
Naive Bayes Classifier-
Along with decision trees, neural networks, and nearest-neighbor methods, it is
one of the most practical learning methods. Successful applications include:
● Diagnosis
● Classifying text documents
Naive Bayes Classifier-(cont.)
Assume a target function f: X → V, where each instance x is described by a
conjunction of attribute values ⟨a1, a2, ..., an⟩.
The Bayesian approach to classifying a new instance is to assign the most
probable target value vMAP, defined as:

vMAP = argmax(vj∈V) P(vj|a1, a2, ..., an) = argmax(vj∈V) P(a1, a2, ..., an|vj) P(vj)
In other words, the assumption is that given the target value of the instance,
the probability of observing the conjunction a1, a2, ..., an is just the product of the
probabilities for the individual attributes:

P(a1, a2, ..., an|vj) = Πi P(ai|vj)

which gives the naive Bayes classifier:

vNB = argmax(vj∈V) P(vj) Πi P(ai|vj)
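The product form above can be sketched directly; the prior and conditional tables below are hypothetical training-set estimates:

```python
from math import prod

# Naive Bayes sketch: v_NB = argmax over vj of P(vj) * prod_i P(ai|vj).
prior = {"yes": 0.64, "no": 0.36}                        # P(vj)
cond = {                                                 # P(ai | vj)
    "yes": {"outlook=sunny": 0.22, "wind=strong": 0.33},
    "no":  {"outlook=sunny": 0.60, "wind=strong": 0.60},
}

def classify(attributes):
    scores = {v: prior[v] * prod(cond[v][a] for a in attributes)
              for v in prior}
    return max(scores, key=scores.get)

label = classify(["outlook=sunny", "wind=strong"])       # "no"
```

Only one probability per attribute per class is needed, rather than one per attribute combination, which is what makes the method practical.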
Bayesian belief networks-
We define the joint space of the set of variables Y to be the cross product
V(Y1) × V(Y2) × … × V(Yn).
The probability distribution over this joint space is called joint probability
distribution.
● A bayesian belief network describes the joint probability distribution for a set
of variables.
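The factored form of the joint distribution can be sketched with a tiny hypothetical two-node network (Rain → WetGrass); the variable names and probability tables are illustrative:

```python
# Belief network sketch: the network encodes the joint distribution as
#   P(y1, ..., yn) = prod_i P(yi | Parents(Yi)).
p_rain = {True: 0.2, False: 0.8}                     # P(Rain)
p_wet = {True: {True: 0.9, False: 0.1},              # P(WetGrass | Rain)
         False: {True: 0.05, False: 0.95}}

def joint(rain, wet):
    # One entry of the joint distribution, built from the local tables.
    return p_rain[rain] * p_wet[rain][wet]

# The four joint entries sum to 1, as any probability distribution must.
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
```

The network stores only the local conditional tables, yet any entry of the full joint distribution can be recovered as such a product.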
Conditional Independence
X is conditionally independent of Y given Z if P(X | Y, Z) = P(X | Z), i.e.,
once Z is known, knowing Y tells us nothing more about X.
Representation
Learning Bayesian Belief Networks-
The learning task for Bayesian networks has several variants: the network
structure may be known or unknown, and the training examples may give values
for all of the network variables or for only some of them.
If the structure is known and all variables are observable, learning reduces to
estimating the conditional probability table entries, as in naive Bayes.
If the structure is unknown, the problem is much harder: the learner must also
search over the space of possible network structures.