MAXIMUM LIKELIHOOD HYPOTHESES FOR PREDICTING PROBABILITIES
• Consider the setting in which we wish to learn a nondeterministic (probabilistic) function f: X → {0,1}, which has two discrete output values.
• Equivalently, we can learn a neural network (or other real-valued hypothesis) whose output is the probability that f(x) = 1, i.e., learn a target function f': X → [0,1] such that f'(x) = P(f(x) = 1).
• Example: the instance space X describes the symptoms of medical patients, and the target function f(x) = 1 if the patient survives, 0 otherwise.
• NOTE: here f(x) is probabilistic. Among patients with identical symptoms x, 92% may survive and 8% may not, i.e., f'(x) = 0.92, so P(f(x) = 1) = 0.92 and P(f(x) = 0) = 0.08.
• Training data D = {<x1, d1>, ..., <xm, dm>}, where di is the observed 0/1 value of f(xi).
• Treat both xi and di as random variables, and assume each training example is drawn independently of the others.
• Under this independence assumption we can write
    P(D|h) = ∏_{i=1}^{m} P(xi, di | h)    ...(1)
• Assume further that the probability of encountering any particular instance xi is independent of the hypothesis h. For example, the probability that our training set contains a particular patient xi is independent of our hypothesis about survival rates (though the patient's outcome di does depend on h).
• Applying the product rule to equation (1):
    P(D|h) = ∏_{i=1}^{m} P(di | h, xi) P(xi)    ...(2)
• What is P(di | h, xi), the probability of observing outcome di for instance xi given that hypothesis h holds? It is h(xi) if di = 1 and (1 − h(xi)) if di = 0, which can be written compactly as
    P(di | h, xi) = h(xi)^di (1 − h(xi))^(1−di)
• Use this expression to substitute for P(di | h, xi) in equation (2):
    P(D|h) = ∏_{i=1}^{m} h(xi)^di (1 − h(xi))^(1−di) P(xi)    ...(3)
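• To make equation (3) concrete, here is a minimal Python sketch. The dataset, feature encoding, and candidate hypotheses are invented for illustration only; the hypothesis-independent factor ∏ P(xi) is omitted because it does not change the ranking of hypotheses.

def likelihood(h, examples):
    # Product over examples of h(x)^d * (1 - h(x))^(1 - d),
    # i.e., P(D|h) up to the hypothesis-independent factor prod P(xi).
    p = 1.0
    for x, d in examples:
        p *= h(x) if d == 1 else (1.0 - h(x))
    return p

# Toy data: (symptom severity, observed outcome d = 1 if the patient survived)
D = [(0.2, 1), (0.9, 0), (0.4, 1), (0.7, 1)]

# Two candidate hypotheses, each mapping x to an estimate of P(f(x) = 1)
h_a = lambda x: 0.9 - 0.5 * x    # survival probability falls with severity
h_b = lambda x: 0.5              # ignores the symptoms entirely

print(likelihood(h_a, D))   # ~0.169
print(likelihood(h_b, D))   # 0.0625, so h_a is the more likely hypothesis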
• Now we write an expression for the maximum likelihood hypothesis:
    h_ML = argmax_{h∈H} ∏_{i=1}^{m} h(xi)^di (1 − h(xi))^(1−di) P(xi)
• The factor P(xi) can be dropped: it is a prior probability over instances, independent of the hypothesis, so it has no effect on which h maximizes the product:
    h_ML = argmax_{h∈H} ∏_{i=1}^{m} h(xi)^di (1 − h(xi))^(1−di)    ...(4)
• The expression on the right side of equation (4) is the same form seen in the binomial distribution.
• To maximize such a product it is easier to work with the log of the likelihood, which is maximized by the same hypothesis:
    h_ML = argmax_{h∈H} ∑_{i=1}^{m} [di ln h(xi) + (1 − di) ln(1 − h(xi))]    ...(5)
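• To see equation (5) at work, consider the simplest possible hypothesis space: constant predictions h(x) = p for every x. The log likelihood then reduces to the binomial form ∑ [di ln p + (1 − di) ln(1 − p)], whose maximizer is the observed fraction of positive outcomes. The sketch below (with toy outcomes invented for illustration) confirms this numerically with a grid search.

import math

def log_likelihood(p, outcomes):
    # Equation (5) restricted to the constant hypothesis h(x) = p:
    # sum of d * ln(p) + (1 - d) * ln(1 - p) over observed outcomes d.
    return sum(d * math.log(p) + (1 - d) * math.log(1 - p) for d in outcomes)

outcomes = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]   # 8 of 10 patients survive

# Grid search over candidate constant hypotheses p in (0, 1)
candidates = [i / 100 for i in range(1, 100)]
p_ml = max(candidates, key=lambda p: log_likelihood(p, outcomes))

print(p_ml)                            # 0.8
print(sum(outcomes) / len(outcomes))   # 0.8, the analytic maximum likelihood estimate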