Probabilistic Models
Statistical Reasoning - Introduction
• In the logic-based approaches we have assumed that everything is either believed false
or believed true.
• It is often useful to represent the fact that something is probably true, or true with
probability (say) 0.65.
• This is useful for dealing with problems where there is randomness and
unpredictability (such as in games of chance), and also for dealing with problems where
we could, given sufficient information, work out exactly what is true.
• This requires techniques for probabilistic reasoning, such as Bayesian probability theory.
Probabilistic Reasoning
A way of knowledge representation in which we apply the concept of
probability to indicate the uncertainty in knowledge.
Example scenarios:
“It will rain today.”
“The behavior of someone in some situation.”
“The outcome of a match between two teams.”
Sentences qualified by words such as “probably”, “likely”, “maybe”, “perhaps”, and
“possibly” assume that something will happen but are not sure about it; such
sentences call for probabilistic reasoning.
The probability of an event is the proportion of cases in which the
event occurs.
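As a minimal sketch in Python, the empirical probability of an event is the count of occurrences divided by the number of trials (the sample below is made up for illustration):

```python
# Empirical probability: the proportion of cases in which the event occurs.
# The outcome list is a hypothetical sample of 10 observed days.
outcomes = ["rain", "dry", "rain", "dry", "dry",
            "rain", "dry", "dry", "dry", "dry"]

p_rain = outcomes.count("rain") / len(outcomes)
print(p_rain)  # 0.3 — the event occurred in 3 of 10 cases
```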
• Probability can be expressed mathematically as a numerical index ranging
from zero (an absolute impossibility) to unity (an absolute certainty).
• Most events have a probability index strictly between 0 and 1, which means
that each event has at least two possible outcomes: a favorable outcome
(success) and an unfavorable outcome (failure).
Probability and Bayes’ Theorem
• The basic approach statistical methods adopt to deal with uncertainty is via the
axioms of probability
• Probabilities are (real) numbers in the range 0 to 1.
Conditional probability
• Let A be an event in the world and B be another event. Suppose that events A and
B are not mutually exclusive, but occur conditionally on the occurrence of the
other.
• The probability that event A will occur if event B occurs is called the conditional
probability.
• Conditional probability is denoted mathematically as p(A|B) in which the vertical
bar represents GIVEN and the complete probability expression is interpreted as
“Conditional probability of event A occurring given that event B has occurred”.
Conditional probability
• The weatherman might state that your area has
a probability of rain of 40 percent. However,
this fact is conditional on many things, such as
the probability of…
• …a cold front coming to your area.
• …rain clouds forming.
• …another front pushing the rain clouds
away.
Independent and Dependent Events
• Independent Event : each event is not affected by any other events
(Example : tossing a coins) .
• Dependent Event : the next event depends on what happened in the previous event,
and is called dependent (Example : marbles in bag)-
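A small sketch of the two cases, using exact fractions (the marble counts are made-up numbers for illustration):

```python
from fractions import Fraction

# Independent events: two fair coin tosses.
# P(heads on toss 1 AND heads on toss 2) = 1/2 * 1/2.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)
print(p_two_heads)  # 1/4

# Dependent events: drawing 2 red marbles WITHOUT replacement
# from a hypothetical bag of 3 red and 2 blue marbles.
# The second draw depends on the first: P = 3/5 * 2/4.
p_two_red = Fraction(3, 5) * Fraction(2, 4)
print(p_two_red)  # 3/10
```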
Conditional probability
• The number of times A and B can occur, or the probability that
both A and B will occur, is called the joint probability of A and B.
It is represented mathematically as p(A ∩ B). The number of ways
B can occur is the probability of B, p(B), and thus

  p(A|B) = p(A ∩ B) / p(B)

• Similarly, the conditional probability of event B occurring given
that event A has occurred equals

  p(B|A) = p(B ∩ A) / p(A)
Hence, p(B ∩ A) = p(B|A) × p(A), or equivalently p(A ∩ B) = p(B|A) × p(A).

Substituting the last equation into

  p(A|B) = p(A ∩ B) / p(B)

yields the Bayesian rule:

  p(A|B) = p(B|A) × p(A) / p(B)

Derivation of p(A ∩ B) = p(B|A) × p(A):
Step 1: Write out the multiplication rule: P(A and B) = P(A) × P(B|A)
Step 2: Divide both sides of the equation by P(A): P(A and B) / P(A) = P(A) × P(B|A) / P(A)
Step 3: Cancel P(A) on the right side of the equation: P(A and B) / P(A) = P(B|A)
Step 4: Rewrite the equation: P(B|A) = P(A and B) / P(A)

In summary:
  P(A|B) = P(A ∩ B) / P(B)
  P(B|A) = P(A ∩ B) / P(A)  ⇒  P(A ∩ B) = P(B|A) × P(A)
  P(A|B) = P(B|A) × P(A) / P(B)
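The derivation can be checked numerically; the probabilities below are made-up values chosen only to illustrate the algebra:

```python
# Numerical check of the Bayesian rule with hypothetical probabilities.
p_a = 0.4          # P(A)
p_b_given_a = 0.5  # P(B|A)
p_b = 0.25         # P(B)

# Multiplication rule: P(A ∩ B) = P(B|A) * P(A)
p_a_and_b = p_b_given_a * p_a

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.8

# Consistency: P(A|B) also equals P(A ∩ B) / P(B)
print(abs(p_a_given_b - p_a_and_b / p_b) < 1e-12)  # True
```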
Bayes’ Theorem
Bayes’ Theorem finds the probability of an event occurring given the probability of
another event that has already occurred.
Basically, we are trying to find the probability of event A, given that event B is true.
Event B is also termed the evidence.
P(A) is the prior probability of A, i.e. the probability of the event before the evidence
is seen. The evidence is an attribute value of an unknown instance (here, it is
event B).
P(A|B) is the posterior probability of A, i.e. the probability of the event after the
evidence is seen.
Bayesian rule
  p(A|B) = p(B|A) × p(A) / p(B)
where:
p(A|B) is the conditional probability that event A occurs given that event B has occurred;
p(B|A) is the conditional probability of event B occurring given that event A has occurred;
p(A) is the probability of event A occurring;
p(B) is the probability of event B occurring.
Bayesian rule, with the terms labelled:

  P(C|x) = P(C) × p(x|C) / p(x)

  posterior = prior × likelihood / evidence
• Prior Probability: describes the degree to which we believe the model accurately
describes reality, based on all of our prior information.
• So we can know how probable our hypothesis was before observing the evidence.
For example, if 0.01 (1%) of a population has the food allergy, then the probability that
a person drawn at random has the food allergy is 0.01.
• Likelihood: describes how well (how accurately) our model predicts the data.
• Normalizing Constant: the constant that makes the posterior density integrate
to one.
• Posterior Probability: represents the degree to which we believe the given model
accurately describes the situation, given the available data and all of our prior
information.
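The four terms can be sketched with the 1% food-allergy prior mentioned above; note that the test-accuracy figures (the two likelihoods) are hypothetical values introduced only for illustration:

```python
# Bayes with the slide's 1% food-allergy prior.
# The test-accuracy numbers below are hypothetical assumptions.
p_allergy = 0.01               # prior P(C)
p_pos_given_allergy = 0.90     # likelihood p(x|C), assumed
p_pos_given_healthy = 0.05     # false-positive rate, assumed

# Evidence p(x) = normalizing constant, via total probability.
p_pos = (p_pos_given_allergy * p_allergy
         + p_pos_given_healthy * (1 - p_allergy))

# Posterior P(C|x): still small after a positive test,
# because the prior is so low.
p_allergy_given_pos = p_pos_given_allergy * p_allergy / p_pos
print(round(p_allergy_given_pos, 3))  # 0.154
```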
Naive Bayesian
• The Naive Bayesian classifier is based on Bayes’ theorem with the
independence assumptions between predictors.
• It assumes that the effect of the value of a predictor (x) on a given class
(c) is independent of the values of other predictors.
• This assumption is called class conditional independence.
• Despite its simplicity, the Naive Bayesian classifier often does
surprisingly well and is widely used because it often outperforms more
sophisticated classification methods.
• Bayes theorem provides a way of calculating the posterior
probability, P(c|x), from P(c), P(x), and P(x|c).
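Under class conditional independence the likelihood factorizes, P(x1,…,xn | c) = Π P(xi | c), so a class score is just the prior times a product of per-feature likelihoods. A minimal sketch, with hypothetical probability tables:

```python
# Naive Bayes scoring under class-conditional independence:
# score(c) = P(c) * product over features of P(xi | c).
# The probability tables below are hypothetical, for illustration only.
p_c = {"yes": 0.6, "no": 0.4}
p_xi_given_c = {
    "yes": {"sunny": 0.2, "cool": 0.3},
    "no":  {"sunny": 0.6, "cool": 0.2},
}

def class_score(c, features):
    """Unnormalized posterior: P(c) * product of P(xi|c)."""
    score = p_c[c]
    for f in features:
        score *= p_xi_given_c[c][f]
    return score

x = ("sunny", "cool")
scores = {c: class_score(c, x) for c in p_c}
prediction = max(scores, key=scores.get)
print(prediction)  # no (0.048 beats 0.036 for these made-up numbers)
```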
Example dataset
Question: For the day <sunny, cool, high, strong>, what’s the play prediction?
Dataset
• The dataset is divided into two parts,
namely the feature matrix and the response vector.
• The feature matrix contains all the vectors (rows)
of the dataset, in which each vector consists of the
values of the features. In the above dataset, the
features are ‘Outlook’, ‘Temperature’,
‘Humidity’ and ‘Windy’.
• The response vector contains the value of the class
variable (prediction or output) for each row of the
feature matrix. In the above dataset, the class
variable name is ‘Play golf’.
Assumption: class conditional independence — the likelihood factorizes as
P(x1, …, xn | c) = P(x1|c) × P(x2|c) × … × P(xn|c).
Bayes Rule for classification
• Assume the prior, evidence and likelihood are known.
• Plug them into Bayes’ formula to obtain P(C|x).
• Choose C = 1 if P(C=1|x) > P(C=0|x).
Consider the given training examples:

#   Outlook   Temp.  Humidity  Windy  Play tennis
1   Sunny     Hot    High      False  No
2   Sunny     Hot    High      True   No
3   Overcast  Hot    High      False  Yes
4   Rainy     Mild   High      False  Yes
5   Rainy     Cool   Normal    False  Yes
6   Rainy     Cool   Normal    True   No
7   Overcast  Cool   Normal    True   Yes
8   Sunny     Mild   High      False  No
9   Sunny     Cool   Normal    False  Yes
10  Rainy     Mild   Normal    False  Yes
11  Sunny     Mild   Normal    True   Yes
12  Overcast  Mild   High      True   Yes
13  Overcast  Hot    Normal    False  Yes
14  Rainy     Mild   High      True   No

Question: Find the play prediction for the day <sunny, cool, high, strong>, using the Naïve Bayes classifier.
Step 1 Calculate the class probability p(C)
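The class probabilities, likelihood counts, and the prediction itself can be sketched in Python from the 14-example table; we assume the queried “strong” wind corresponds to Windy = True:

```python
from collections import Counter

# The 14 training examples: (Outlook, Temp, Humidity, Windy, Play).
data = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]

# Step 1: class probabilities P(C) from the class counts.
counts = Counter(row[-1] for row in data)
p_c = {c: n / len(data) for c, n in counts.items()}  # Yes: 9/14, No: 5/14

def likelihood(i, value, c):
    """P(x_i = value | C = c), estimated by counting within class c."""
    rows = [r for r in data if r[-1] == c]
    return sum(r[i] == value for r in rows) / len(rows)

def predict(x):
    """Choose the class maximizing P(c) * product of P(x_i | c)."""
    scores = {}
    for c in p_c:
        s = p_c[c]
        for i, v in enumerate(x):
            s *= likelihood(i, v, c)
        scores[c] = s
    return max(scores, key=scores.get)

# The queried day <sunny, cool, high, strong>, taking "strong" as Windy=True:
print(predict(("Sunny", "Cool", "High", True)))  # No
```

The “No” score, 5/14 × 3/5 × 1/5 × 4/5 × 3/5 ≈ 0.0206, beats the “Yes” score, 9/14 × 2/9 × 3/9 × 3/9 × 3/9 ≈ 0.0053.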
So now we are done with our pre-computations and
the classifier is ready (the training phase is complete).
Testing phase:
Let us test it on a new set of features:
today = (Outlook = Sunny, Temp = Hot, Humidity = Normal, Windy = False)
Test features: ‘today’ = (Outlook = Sunny, Temp = Hot, Humidity = Normal, Windy = False)
• These numbers can be converted into probabilities by
making their sum equal to 1 (normalization).
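The normalization step can be sketched as follows; the two unnormalized scores are the rounded values obtained for the ‘today’ example with this dataset (treat them as illustrative):

```python
# Normalization: divide each unnormalized class score by their sum
# so the two results add up to 1. Scores are rounded illustrative
# values for the 'today' test day.
score_yes = 0.0141
score_no = 0.0068

total = score_yes + score_no
p_yes = score_yes / total
p_no = score_no / total
print(round(p_yes, 2), round(p_no, 2))  # 0.67 0.33
```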