
Statistical reasoning

Probabilistic Models
Statistical reasoning - Introduction
• In logic-based approaches we have assumed that everything is either believed false
or believed true.
• It is often useful to represent the fact that something is probably true, or true with
probability (say) 0.65.
• This is useful for dealing with problems where there is randomness and
unpredictability (such as in games of chance), and also for dealing with problems where
we could, given sufficient information, work out exactly what is true.
• This requires techniques for probabilistic reasoning - Bayesian probability theory.
Probabilistic Reasoning
— A way of representing knowledge in which we apply the concept of
probability to indicate the uncertainty in knowledge.
— Example scenarios:
— "It will rain today."
— "The behavior of someone in some situation."
— "A match between two teams."
— Probable sentences ("probably", "likely", "maybe", "perhaps" and
"possibly") assume that something will happen but are not certain about it -
such statements call for probabilistic reasoning.
— The probability of an event is the proportion of cases in which the
event occurs.

• Probability can be expressed mathematically as a numerical index ranging
from zero (an absolute impossibility) to unity (an absolute certainty).
• Most events have a probability index strictly between 0 and 1, which means
that each such event has at least two possible outcomes: a favorable outcome or
success, and an unfavorable outcome or failure.

P(success) = p = s / (s + f) = the number of successes / the number of possible outcomes

P(failure) = q = f / (s + f) = the number of failures / the number of possible outcomes

p + q = 1
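As a quick numeric check, here is a minimal sketch in Python, assuming a fair six-sided die where "success" means rolling a six (a hypothetical example, not from the slides):

```python
# Success/failure probabilities for one roll of a fair six-sided die,
# where "success" means rolling a six (hypothetical illustration).
s = 1              # number of ways to succeed (roll a six)
f = 5              # number of ways to fail (roll 1-5)

p = s / (s + f)    # P(success) = 1/6
q = f / (s + f)    # P(failure) = 5/6

assert abs((p + q) - 1.0) < 1e-12   # p + q = 1 always holds
print(p, q)        # 0.1666... 0.8333...
```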

Probability and Bayes’ Theorem
• The basic approach statistical methods adopt to deal with uncertainty is via the
axioms of probability
• Probabilities are (real) numbers in the range 0 to 1.

Conditional probability

• Let A be an event in the world and B be another event. Suppose that events A and
B are not mutually exclusive, but occur conditionally on the occurrence of each
other.
• The probability that event A will occur if event B occurs is called the conditional
probability.
• Conditional probability is denoted mathematically as p(A|B), in which the vertical
bar represents GIVEN, and the complete probability expression is interpreted as
"the conditional probability of event A occurring given that event B has occurred".

p(A|B) = the number of times A and B can occur / the number of times B can occur
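A tiny numeric illustration of this ratio, using hypothetical counts (not from the slides):

```python
# Hypothetical counts: out of 100 trials, B occurs in 40,
# and A and B occur together in 10 of those.
n_A_and_B = 10     # times A and B occur together
n_B = 40           # times B occurs

p_A_given_B = n_A_and_B / n_B   # 10/40 = 0.25
print(p_A_given_B)
```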

Conditional probability
• The weatherman might state that your area has
a probability of rain of 40 percent. However,
this fact is conditional on many things, such as
the probability of…
• …a cold front coming to your area.
• …rain clouds forming.
• …another front pushing the rain clouds
away.

• We say that the conditional probability of rain occurring depends on all the above events.

Independent and Dependent Events
• Independent event: an event that is not affected by any other event
(example: tossing a coin).
• Dependent event: an event that depends on what happened in the previous event
(example: drawing marbles from a bag without replacement); the sketch below contrasts the two.
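A minimal Python sketch contrasting the two cases, with hypothetical numbers (a bag of 2 red and 3 blue marbles, drawn without replacement):

```python
# Independent events: two coin tosses - the second toss ignores the first.
p_heads = 1 / 2
p_two_heads = p_heads * p_heads            # 0.25

# Dependent events: drawing 2 marbles from a bag of 2 red + 3 blue
# WITHOUT replacement (hypothetical counts). The second draw's
# probability changes because the first marble is not put back.
p_first_red = 2 / 5
p_second_red_given_first_red = 1 / 4       # 1 red left among 4 marbles
p_two_reds = p_first_red * p_second_red_given_first_red   # 0.1

print(p_two_heads, p_two_reds)
```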

Conditional probability

• The number of times A and B can occur, or the probability that
both A and B will occur, is called the joint probability of A and B.
• It is represented mathematically as p(A∩B). The number of ways
B can occur is the probability of B, p(B), and thus

p(A|B) = p(A∩B) / p(B)

• Similarly, the conditional probability of event B occurring given
that event A has occurred equals

p(B|A) = p(B∩A) / p(A)
Hence, p(B∩A) = p(B|A) × p(A), or equivalently p(A∩B) = p(B|A) × p(A).

Substituting the last equation into the equation

p(A|B) = p(A∩B) / p(B)

yields the Bayesian rule:

P(A|B) = P(B|A) × P(A) / P(B)

Step 1: Write out the multiplication rule: P(A∩B) = P(A) × P(B|A)
Step 2: Divide both sides of the equation by P(A): P(A∩B) / P(A) = P(A) × P(B|A) / P(A)
Step 3: Cancel P(A) on the right side of the equation: P(A∩B) / P(A) = P(B|A)
Step 4: Rewrite the equation: P(B|A) = P(A∩B) / P(A), i.e. P(A∩B) = P(B|A) × P(A)
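As a numeric sanity check of the rule, here is a minimal sketch over a hypothetical joint distribution of two binary events:

```python
# Verify Bayes' rule P(A|B) = P(B|A) * P(A) / P(B) on a hypothetical
# joint distribution over two binary events A and B.
p_joint = {  # p(A=a, B=b); the four entries sum to 1
    (True, True): 0.12, (True, False): 0.18,
    (False, True): 0.28, (False, False): 0.42,
}
p_A = p_joint[(True, True)] + p_joint[(True, False)]   # marginal P(A) = 0.30
p_B = p_joint[(True, True)] + p_joint[(False, True)]   # marginal P(B) = 0.40
p_A_given_B = p_joint[(True, True)] / p_B              # 0.30
p_B_given_A = p_joint[(True, True)] / p_A              # 0.40

# Bayes' rule recovers P(A|B) from P(B|A), P(A) and P(B):
assert abs(p_A_given_B - p_B_given_A * p_A / p_B) < 1e-12
```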
Bayes’ Theorem
— Bayes' Theorem finds the probability of an event occurring given the probability of
another event that has already occurred.

— Basically, we are trying to find the probability of event A, given that event B is true.
Event B is also termed the evidence.
— P(A) is the prior probability of A (i.e., the probability of the event before the evidence
is seen). The evidence is an attribute value of an unknown instance (here, it is
event B).
— P(A|B) is the posterior probability of A, i.e., the probability of the event after the
evidence is seen.

Bayesian rule
p(A|B) = p(B|A) × p(A) / p(B)

where:
p(A|B) is the conditional probability that event A occurs given that event B has occurred;
p(B|A) is the conditional probability of event B occurring given that event A has occurred;
p(A) is the probability of event A occurring;
p(B) is the probability of event B occurring.

Bayesian rule (applied to classification):

P(C|x) = P(C) × p(x|C) / p(x)

i.e. posterior = (prior × likelihood) / evidence

• P(C|x) is the posterior probability of class (target) given predictor (attribute).


• P(C) is the prior probability of class - class probability.
• P(x|C) is the likelihood which is the probability of predictor given class - conditional probability
• P(x) is the prior probability of predictor.

• Prior probability: it describes the degree to which we believe the model accurately
describes reality, based on all of our prior information.
• It tells us how probable our hypothesis was before observing the evidence.
For example, if 0.01 (1%) of a population has the food allergy, then the probability that
a person drawn at random would have the food allergy is 0.01.
• Likelihood: it describes how well (how accurately) our model predicts the data.
• Normalizing constant: the constant that makes the posterior density integrate
to one.
• Posterior probability: it represents the degree to which we believe the given model
accurately describes the situation, given the available data and all of our prior
information. A worked numeric example follows below.
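A worked sketch tying these four terms together, reusing the 1% allergy prior above; the test's 0.90 sensitivity and 0.05 false-positive rate are hypothetical numbers added for illustration:

```python
# Posterior probability of a food allergy given a positive test.
# The 0.01 prior comes from the slide; the test accuracy figures
# below are hypothetical assumptions for illustration only.
p_allergy = 0.01                    # prior
p_pos_given_allergy = 0.90          # likelihood (assumed sensitivity)
p_pos_given_no_allergy = 0.05       # assumed false-positive rate

# Evidence (normalizing constant): total probability of a positive test.
p_pos = (p_pos_given_allergy * p_allergy
         + p_pos_given_no_allergy * (1 - p_allergy))    # 0.0585

# Posterior = prior * likelihood / evidence
p_allergy_given_pos = p_allergy * p_pos_given_allergy / p_pos
print(round(p_allergy_given_pos, 3))   # ~0.154 - the low prior dominates
```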

Naive Bayesian
• The Naive Bayesian classifier is based on Bayes' theorem with
independence assumptions between predictors.
• It assumes that the effect of the value of a predictor (x) on a given class
(c) is independent of the values of other predictors.
• This assumption is called class conditional independence.
• Despite its simplicity, the Naive Bayesian classifier often does
surprisingly well and is widely used, as it frequently outperforms more
sophisticated classification methods.
• Bayes' theorem provides a way of calculating the posterior
probability, P(c|x), from P(c), P(x), and P(x|c).
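Under class conditional independence, the likelihood of a whole feature vector factorizes into per-feature terms, so (dropping the constant P(x)) the posterior is proportional to:

P(c | x1, ..., xn) ∝ P(c) × P(x1|c) × P(x2|c) × ... × P(xn|c)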

Example dataset

— Question: For the day <sunny, cool, high, strong>, what is the play
prediction? (The training examples are tabulated below.)

Dataset
• The dataset is divided into two parts,
namely the feature matrix and the response
vector.
• The feature matrix contains all the vectors (rows)
of the dataset, in which each vector consists of the
values of the independent features. In this
dataset, the features are 'Outlook', 'Temperature',
'Humidity' and 'Windy'.
• The response vector contains the value of the class
variable (prediction or output) for each row of the
feature matrix. In this dataset, the class
variable name is 'Play tennis'.

Assumption

— The fundamental Naive Bayes assumption is that each feature
makes an independent and equal contribution to the outcome.
— Example: we assume that no pair of features is dependent. For example, the
temperature being 'Hot' has nothing to do with the humidity, and the outlook
being 'Rainy' has no effect on the wind. Hence, the features are assumed to
be independent (class conditional independence).
— Secondly, each feature is given the same weight (or importance). For example,
knowing the temperature and humidity alone can't predict the outcome
accurately. None of the attributes is irrelevant, and each is assumed to
contribute equally to the outcome.

Bayes Rule for classification
• Assume the prior, evidence and likelihood are known.
• Plug them into the Bayes formula to obtain P(C|x).
• Choose C = 1 if P(C=1|x) > P(C=0|x).

Detailed steps for classification (a code sketch follows this list):

• The posterior probability is calculated by first constructing a
frequency table for each attribute against the target.
• Then the frequency tables are transformed into likelihood tables, and finally
the Naive Bayesian equation is used to calculate the posterior probability for
each class.
• The class with the highest posterior probability is the outcome of the
prediction.
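A minimal Python sketch of these steps, assuming the 14-example weather dataset given on the next slide; it builds the frequency tables, converts them to likelihoods, and scores each class (no Laplace smoothing, mirroring the slides):

```python
from collections import Counter, defaultdict

# The 14 training examples from the slides: (Outlook, Temp, Humidity, Windy) -> Play
data = [
    (("Sunny","Hot","High",False),"No"),     (("Sunny","Hot","High",True),"No"),
    (("Overcast","Hot","High",False),"Yes"), (("Rainy","Mild","High",False),"Yes"),
    (("Rainy","Cool","Normal",False),"Yes"), (("Rainy","Cool","Normal",True),"No"),
    (("Overcast","Cool","Normal",True),"Yes"), (("Sunny","Mild","High",False),"No"),
    (("Sunny","Cool","Normal",False),"Yes"), (("Rainy","Mild","Normal",False),"Yes"),
    (("Sunny","Mild","Normal",True),"Yes"),  (("Overcast","Mild","High",True),"Yes"),
    (("Overcast","Hot","Normal",False),"Yes"), (("Rainy","Mild","High",True),"No"),
]

# Step 1: class counts -> prior P(C)
class_counts = Counter(label for _, label in data)
prior = {c: n / len(data) for c, n in class_counts.items()}

# Step 2: frequency tables per (attribute index, value, class) -> likelihood P(x|C)
freq = defaultdict(Counter)
for features, label in data:
    for i, value in enumerate(features):
        freq[(i, value)][label] += 1

def likelihood(i, value, c):
    return freq[(i, value)][c] / class_counts[c]

# Step 3: posterior score P(C) * prod P(xi|C); P(x) is a common factor, so skip it
def predict(features):
    scores = {c: prior[c] for c in class_counts}
    for c in scores:
        for i, value in enumerate(features):
            scores[c] *= likelihood(i, value, c)
    total = sum(scores.values())
    probs = {c: s / total for c, s in scores.items()}   # normalize to sum to 1
    return max(probs, key=probs.get), probs

print(predict(("Sunny","Cool","High",True)))    # -> ('No', ...)
print(predict(("Sunny","Hot","Normal",False)))  # -> ('Yes', ...)
```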

Consider the given training examples:

#  | Outlook  | Temp. | Humidity | Windy | Play tennis
1  | Sunny    | Hot   | High     | False | No
2  | Sunny    | Hot   | High     | True  | No
3  | Overcast | Hot   | High     | False | Yes
4  | Rainy    | Mild  | High     | False | Yes
5  | Rainy    | Cool  | Normal   | False | Yes
6  | Rainy    | Cool  | Normal   | True  | No
7  | Overcast | Cool  | Normal   | True  | Yes
8  | Sunny    | Mild  | High     | False | No
9  | Sunny    | Cool  | Normal   | False | Yes
10 | Rainy    | Mild  | Normal   | False | Yes
11 | Sunny    | Mild  | Normal   | True  | Yes
12 | Overcast | Mild  | High     | True  | Yes
13 | Overcast | Hot   | Normal   | False | Yes
14 | Rainy    | Mild  | High     | True  | No

— Question: Find the play prediction for the day <sunny, cool, high, strong>,
using the Naïve Bayes classifier.
Step 1: Calculate the class probability P(C)

Out of the 14 training examples, 9 have Play tennis = Yes and 5 have Play tennis = No:

P(Yes) = 9/14
P(No) = 5/14
Step 2a: Calculate P(x|C) for Outlook

Outlook  | Yes | No | P(Outlook|Yes) | P(Outlook|No)
Sunny    | 2   | 3  | 2/9            | 3/5
Overcast | 4   | 0  | 4/9            | 0/5
Rainy    | 3   | 2  | 3/9            | 2/5
Step 2b: Calculate P(x|C) for Temperature

Temperature | Yes | No | P(Temp|Yes) | P(Temp|No)
Hot         | 2   | 2  | 2/9         | 2/5
Mild        | 4   | 2  | 4/9         | 2/5
Cool        | 3   | 1  | 3/9         | 1/5
Step 2c: Calculate P(x|C) for Humidity

Humidity | Yes | No | P(Humidity|Yes) | P(Humidity|No)
High     | 3   | 4  | 3/9             | 4/5
Normal   | 6   | 1  | 6/9             | 1/5
Step 2d: Calculate P(x|C) for Windy

Windy | Yes | No | P(Windy|Yes) | P(Windy|No)
True  | 3   | 3  | 3/9          | 3/5
False | 6   | 2  | 6/9          | 2/5
Summary of the pre-computed probabilities:

Class probabilities P(C): P(Yes) = 9/14, P(No) = 5/14
Likelihoods P(xi|C) for each attribute value xi: as tabulated in Steps 2a-2d above.
So now we are done with our pre-computations and
the classifier is ready (the training phase is complete).

Testing phase:
Let us test it on a new set of features:
today = (Outlook = Sunny, Temp = Hot, Humidity = Normal, Windy = False)

Test features: today = (Outlook = Sunny, Temp = Hot, Humidity = Normal, Windy = False)

• So, the (unnormalized) probability of playing tennis is given by:
  P(Yes) × P(Sunny|Yes) × P(Hot|Yes) × P(Normal|Yes) × P(False|Yes)
  = 9/14 × 2/9 × 2/9 × 6/9 × 6/9 ≈ 0.0141

• The (unnormalized) probability of not playing tennis is given by:
  P(No) × P(Sunny|No) × P(Hot|No) × P(Normal|No) × P(False|No)
  = 5/14 × 3/5 × 2/5 × 1/5 × 2/5 ≈ 0.0069

• These numbers can be converted into probabilities by
normalizing them so that they sum to 1:

P(Yes|today) = 0.0141 / (0.0141 + 0.0069) ≈ 0.67
P(No|today) = 0.0069 / (0.0141 + 0.0069) ≈ 0.33

Since P(Yes|today) > P(No|today), the prediction is that tennis would be played: 'Yes'.

