
AI for DS – Theory CSP, Wumpus world and Probabilistic Reasoning

Constraint satisfaction problems (CSP) with AC-3 and backtracking.


Constraint satisfaction problems, or CSPs for short, are a flexible approach to search that has proven useful in many AI-style problems.
CSPs can be used to solve problems such as:
• graph coloring: given a graph, a coloring of the graph means assigning each of its vertices a color such that no pair of vertices connected by an edge have the same color
o in general, this is a very hard problem, e.g. determining whether a graph can be colored with 3 colors is NP-hard
o many problems boil down to graph coloring, or related problems
• job shop scheduling: e.g. suppose you need to complete a set of tasks, each of which has a duration, and constraints upon when they can start and stop (e.g. task c can't start until both task a and task b are finished)
o CSPs are a natural way to express such problems
• cryptarithmetic puzzles: e.g. suppose you are told that TWO + TWO = FOUR, that each letter corresponds to a different digit from 0 to 9, and that a number can't start with 0 (so T and F are not 0); what, if any, are the possible values for the letters?
o while these are not directly useful problems, they are a simple test case for CSP solvers
The basic idea is to have a set of variables that can be assigned values in a constrained way.
A CSP contains:
• X: a set of variables {X1, …, Xn} to which you want to assign values;
• D: a set of domains {D1, …, Dn}, one domain per variable, each containing the possible values its variable can take; we will only consider finite domains (you can certainly have CSPs with infinite domains, e.g. real numbers, but we won't consider such problems here);
• C: a set of constraints that specify allowable assignments of values to variables, e.g. A > B + 2 and C < B
o for example, a binary constraint consists of a pair of different variables, (Xi, Xj), and a set of pairs of values that Xi and Xj can take on at the same time
o we will usually only deal with binary constraints; constraints between three or more variables are possible (e.g. Xi, Xj, Xk are all different), but they don't occur too frequently, and can be decomposed into binary constraints
Given a CSP, you want to assign values to all the variables while respecting the constraints. The CSP is solved when every variable has a value and all the constraints are satisfied at the same time.
Backtracking is an algorithm for finding such a solution. You start with an empty state {}, which means that no variable has a value. Then you pick one variable from the set of variables (the order in which you choose variables may affect the performance of the algorithm; there are heuristics for this, such as MRV, Minimum Remaining Values). Say we choose A first; we then pick a value from the domain Da (the order in which values are tried may also use a heuristic). Imagine Da = {1, 2, 3}. We pick 1. Now we check whether A = 1 violates any constraint; if it does, it is not a good assignment. If it doesn't, we set A = 1 and move to the state {A=1}. We keep going in this way. Suppose you next pick B and the value 1; this violates the constraint A > B + 2. Now you have two options: if there is another value to try for B, try it; if not, the assignment A = 1 was wrong, and you need to go back to the state {} and try A = 2, and so on.
Here's the pseudocode of the backtracking:
function backtracking(csp) returns a solution or failure
    return recursive_backtracking({}, csp) // {} is the initial state

function recursive_backtracking(state, csp) returns a solution or failure
    if state is complete then return state // all variables have a value
    var <- selectUnassignedVariable(csp)
    for each value in orderValues(csp, var) // values of the domain of var
        if var = value is consistent with the constraints given state
            add {var = value} to state
            result <- recursive_backtracking(state, csp)
            if result != failure then return result
            remove {var = value} from state
    return failure
Note that selectUnassignedVariable and orderValues are heuristics (they may just return the first element of the set).
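To make the procedure concrete, here is a minimal runnable sketch of the backtracking pseudocode in Python. The representation is an assumption made for illustration: a CSP is given as a sequence of variables, a dict of domains, and a dict mapping ordered variable pairs to predicates; no variable- or value-ordering heuristic is used.

def backtracking(variables, domains, constraints):
    return recursive_backtracking({}, variables, domains, constraints)

def consistent(var, value, state, constraints):
    # Check the tentative assignment var=value against every assigned variable.
    for other, other_value in state.items():
        if (var, other) in constraints and not constraints[(var, other)](value, other_value):
            return False
        if (other, var) in constraints and not constraints[(other, var)](other_value, value):
            return False
    return True

def recursive_backtracking(state, variables, domains, constraints):
    if len(state) == len(variables):                     # all variables have a value
        return state
    var = next(v for v in variables if v not in state)   # first unassigned (no heuristic)
    for value in domains[var]:
        if consistent(var, value, state, constraints):
            state[var] = value
            result = recursive_backtracking(state, variables, domains, constraints)
            if result is not None:
                return result
            del state[var]                               # undo and try the next value
    return None                                          # signals failure to the caller

# Example from the text: A > B + 2 and C < B, with domains {1,...,5}.
doms = {v: [1, 2, 3, 4, 5] for v in "ABC"}
cons = {("A", "B"): lambda a, b: a > b + 2, ("C", "B"): lambda c, b: c < b}
print(backtracking("ABC", doms, cons))                   # {'A': 5, 'B': 2, 'C': 1}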
Now, what is AC-3, and why and when do we use it? First, AC-3 is used as a preprocessing step, like this:
function solveCSP(csp)
    AC3(csp)
    return backtracking(csp)
This answers the "when". Basically, AC-3 detects conflicts that assignments would run into during the backtracking and removes them in advance, by cutting down the domains of the variables in the CSP. When two variables share a constraint, we say that there is an arc between them. The arc between A and B is consistent if:
• A->B is consistent: for each value a that A can take, there is a value b that B can take respecting the constraint;
• and B->A is consistent: for each value b that B can take, there is a value a that A can take respecting the constraint.
Imagine you have the constraints A > B and B > C. You'll have the following set of arcs: {A->B, B->A, B->C, C->B}. What AC-3 does is select an arc from this set, say A->B, and for each value a that A can take, check whether there is a value b that B can take respecting the constraint. If so, the domain of A stays the same; if not, remove a from the domain of A. Every time a value is removed from the domain of a variable, the arcs pointing to that variable from its other neighbors must be rechecked, because the removal may have made them inconsistent. For example, if processing B->C shrinks the domain of B, the arc A->B must be rechecked.
So, here's the pseudocode:
function AC3(csp) returns the csp, possibly with reduced domains
    queue <- a queue with all the arcs of the csp
    while queue is not empty
        (X, Y) <- getFirst(queue)
        if removeInconsistentValues(X, Y, csp) then
            foreach Z in neighbors(X) - {Y}
                add (Z, X) to queue
    return csp

function removeInconsistentValues(X, Y, csp) returns true if a value was removed
    valueRemoved <- false
    foreach x in domain(X, csp)
        if there is no value y in domain(Y, csp) that satisfies the constraint between X and Y
            remove x from domain(X, csp)
            valueRemoved <- true
    return valueRemoved
The objective of AC-3 is to cut down the domains of the CSP, removing values that are guaranteed not to appear in any solution. This makes the subsequent backtracking faster.
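Here is a matching runnable sketch of AC-3 in Python, under the same assumed representation as the backtracking sketch above; each binary constraint must be entered as two directed arcs, one predicate per direction.

from collections import deque

def ac3(domains, constraints):
    # `constraints` maps each ordered pair (X, Y) to a predicate on (x, y).
    queue = deque(constraints.keys())            # all arcs of the CSP
    while queue:
        x, y = queue.popleft()
        if remove_inconsistent_values(x, y, domains, constraints):
            if not domains[x]:                   # empty domain: the CSP is unsolvable
                return False
            for (z, w) in constraints:
                if w == x and z != y:            # recheck arcs pointing at x
                    queue.append((z, x))
    return True

def remove_inconsistent_values(x, y, domains, constraints):
    predicate = constraints[(x, y)]
    removed = False
    for vx in list(domains[x]):
        # vx survives only if some vy in domain(y) satisfies the constraint.
        if not any(predicate(vx, vy) for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed

# Example from the text: A > B + 2 and C < B, each entered as two arcs.
doms = {v: [1, 2, 3, 4, 5] for v in "ABC"}
cons = {("A", "B"): lambda a, b: a > b + 2, ("B", "A"): lambda b, a: a > b + 2,
        ("C", "B"): lambda c, b: c < b,     ("B", "C"): lambda b, c: c < b}
ac3(doms, cons)
print(doms)   # {'A': [5], 'B': [2], 'C': [1]}: here AC-3 alone solves the CSP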
Constraint Propagation: Arc Consistency
as mentioned above, we are only considering CSPs with finite domains, and with binary constraints between variables
it turns out that many such CSPs can be translated into a simpler CSP that is guaranteed to have the same solutions
the idea we will consider here is arc consistency
the CSP variable Xi is said to be arc consistent with respect to variable Xj if for every value in Di there is at least one
corresponding value in Dj that satisfies the binary constraint on Xi and Xj
an entire CSP is said to be arc consistent if every variable is arc consistent with every other variable
if a CSP is not arc consistent, then it turns out that there are relatively efficient algorithms that will make it arc
consistent
• and the resulting arc consistent CSP will have the same solutions as the original CSP, but often a smaller search
space
• for some problems, making a CSP arc consistent may be all that is necessary to solve it
the most popular arc consistency algorithm is AC-3
• on a CSP with c constraints and a maximum domain size of d, AC-3 runs in O(cd³) time
Backtracking Search
arc consistency can often reduce the size of the domains of a CSP, but it cannot by itself guarantee finding a solution, so some kind of search is needed
intuitively, the idea of backtracking search is to repeat the following steps:
• pick an unassigned variable
• assign it a value that doesn't violate any constraints
o if there is no such value, then backtrack to the previously assigned variable and try a different value for it
a basic backtracking search is complete, i.e. it will find solutions if they exist, but in practice it is often very slow on
large problems
• keep in mind that, in general, solving CSPs is NP-hard, and so the best known general-purpose solving
algorithms run in worst-case exponential time
so various improvements are made to basic backtracking, such as:
• rules for choosing the next variable to assign
o one rule is to always choose the variable with the minimum remaining values (MRV) in its domain
▪ the intuition is that small domains have fewer choices and so will hopefully lead to failure more
quickly than big domains
▪ in practice, usually performs better than random or static variable choice, but it depends on the
problem
• rules for choosing what value to assign to the current variable
o one rule is to choose the least-constraining value from the domain of the variable being assigned, i.e.
the value that rules out the fewest choices for neighbor variables
o in practice, can perform well, but depends on the problem
• interleaving search and inference: forward checking (see the sketch after this list)
o after a variable is assigned in backtracking search, we can rule out all domain values for associated
variables that are inconsistent with the assignment
o forward-checking combined with the MRV variable selection heuristic is often a good combination
• rules for choosing what variable to backtrack to
o basic backtracking is sometimes called chronological backtracking because it always backtracks to the
most recently assigned variable
o but it’s possible to do better, e.g. conflict-directed backtracking
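As a rough illustration of two of these improvements, here is a sketch of MRV selection and forward checking in Python, under the same assumed constraint representation as the earlier sketches; these are fragments one could slot into the backtracking code, not a full solver.

def select_mrv_variable(state, variables, domains):
    # MRV heuristic: among unassigned variables, pick the one whose current
    # domain has the fewest remaining values.
    return min((v for v in variables if v not in state),
               key=lambda v: len(domains[v]))

def forward_check(var, value, domains, constraints):
    # After tentatively assigning var=value, prune inconsistent values from
    # the domains of var's neighbors. Returns the pruned (variable, value)
    # pairs so they can be restored on backtracking, or None on a wipe-out.
    pruned = []
    for (x, y), pred in constraints.items():
        if x == var:
            for vy in list(domains[y]):
                if not pred(value, vy):
                    domains[y].remove(vy)
                    pruned.append((y, vy))
            if not domains[y]:
                return None      # a neighbor lost every value: fail immediately
    return pruned

# Tiny demo with the constraint A > B + 2 written as the arc (A, B):
domains = {"A": [1, 2, 3, 4, 5], "B": [1, 2, 3, 4, 5]}
constraints = {("A", "B"): lambda a, b: a > b + 2}
print(forward_check("A", 4, domains, constraints))   # prunes B = 2, 3, 4, 5
print(domains["B"])                                  # [1]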
an interesting fact about these improvements to backtracking search is that they are domain-independent: they can be applied to any CSP
• you can, of course, still use domain-specific heuristics if you have any
it's also worth pointing out that fast and memory-efficient implementations are a non-trivial engineering challenge and require some thought
• plus, experiments with different combinations of heuristic rules are often needed to find the most efficient
variation
Local Search: Min-conflicts
backtracking search solves a CSP by assigning one variable at a time
another approach to solving a CSP is to assign all the variables, and then modify this assignment to make it better
this is a kind of local search on CSPs, and for some problems it can be extremely effective
example: the 4-queens problem
• place 4 queens randomly on a 4-by-4 board, one queen per column
• pick a queen, and re-position it in its column so that it attacks the fewest other queens; this is known as the min-conflicts heuristic
• repeat the previous step until there are no moves that reduce conflicts; if the puzzle is solved, you're done; if not, re-start the entire process with a new random placement of queens
local search using min-conflicts can be applied to any CSP as follows:

function Min-Conflicts(csp, max_steps) returns a solution or failure
    inputs: csp, a constraint satisfaction problem
            max_steps, the number of steps allowed before giving up
    current <- an initial complete assignment for csp
    for i = 1 to max_steps do
        if current is a solution then return current
        var <- a randomly chosen conflicted variable from csp.VARIABLES
        value <- the value v for var that minimizes CONFLICTS(var, v, current, csp)
        set var = value in current
    return failure

the CONFLICTS function returns the number of variables in the rest of the assignment that violate a constraint when var = v
a tailored version of Min-Conflicts can solve the n-queens problem for n = 3 million in less than a minute (on 1990s computers!)
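Here is a runnable sketch of min-conflicts for n-queens in Python; the board representation (one queen per column) follows the 4-queens example above, and max_steps is an arbitrary cutoff chosen for illustration.

import random

def min_conflicts_nqueens(n, max_steps=100_000):
    # State: queens[col] = row of the queen in that column (one per column).
    queens = [random.randrange(n) for _ in range(n)]

    def conflicts(col, row):
        # Count queens in other columns attacking square (row, col).
        return sum(1 for c in range(n) if c != col and
                   (queens[c] == row or abs(queens[c] - row) == abs(c - col)))

    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(c, queens[c]) > 0]
        if not conflicted:
            return queens                        # solution found
        col = random.choice(conflicted)          # random conflicted variable
        # Min-conflicts value choice: the row minimizing the number of attacks.
        queens[col] = min(range(n), key=lambda r: conflicts(col, r))
    return None                                  # give up after max_steps

print(min_conflicts_nqueens(8))   # e.g. [4, 6, 1, 5, 2, 0, 3, 7]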

The Wumpus World in Artificial Intelligence

Wumpus world:
The Wumpus world is a simple example world used to illustrate the worth of a knowledge-based agent and to demonstrate knowledge representation. It was inspired by Gregory Yob's 1973 video game Hunt the Wumpus.
The Wumpus world is a cave of 4×4 rooms connected by passageways, so there are 16 rooms in total, each connected to its neighbors. A knowledge-based agent will move through this world. The cave has a room containing a beast called the Wumpus, who eats anyone who enters that room. The Wumpus can be shot by the agent, but the agent has only a single arrow. The Wumpus world also contains some Pit rooms, which are bottomless; if the agent falls into a Pit, he will be stuck there forever. The exciting thing about this cave is that one room may contain a heap of gold. The agent's goal is to find the gold and climb out of the cave without falling into a Pit or being eaten by the Wumpus. The agent earns a reward if he comes out with the gold, and a penalty if he is eaten by the Wumpus or falls into a pit.
Note: the Wumpus is static and cannot move.
Following is a sample diagram of the Wumpus world, showing some rooms with Pits, one room with the Wumpus, and the agent at square [1, 1].

There are also some components which can help the


agent to navigate the cave. These components are
given as follows:
a. The rooms adjacent to the Wumpus room are
smelly, so that it would have some stench.
b. The room adjacent to PITs has a breeze, so if the
agent reaches near to PIT, then he will perceive the
breeze.
c. There will be glitter in the room if and only if the
room has gold.
d. The Wumpus can be killed by the agent if the agent
is facing to it, and Wumpus will emit a horrible
scream which can be heard anywhere in the cave.
PEAS description of Wumpus world:
To explain the Wumpus world, we give its PEAS description below:
Performance measure:
o +1000 reward points if the agent comes out of the cave with the gold.
o -1000 points penalty for being eaten by the Wumpus or falling into the pit.
o -1 for each action, and -10 for using the arrow.
o The game ends when the agent either dies or comes out of the cave.
Environment:
o A 4×4 grid of rooms.
o The agent initially is in square [1, 1], facing right.
o The locations of the Wumpus and the gold are chosen randomly, excluding the first square [1, 1].
o Each square of the cave can be a pit with probability 0.2 except the first square.
Actuators:
o Left turn
o Right turn
o Move forward
o Grab
o Release
o Shoot
Sensors:
o The agent will perceive the stench if he is in a room adjacent to the Wumpus (not diagonally).
o The agent will perceive a breeze if he is in a room directly adjacent to a Pit.
o The agent will perceive the glitter in the room where the gold is present.
o The agent will perceive a bump if he walks into a wall.
o When the Wumpus is shot, it emits a horrible scream which can be perceived anywhere in the cave.
These percepts can be represented as a five-element list, with an indicator for each sensor. For example, if the agent perceives stench and breeze, but no glitter, no bump, and no scream, the percept is:
[Stench, Breeze, None, None, None].
The Wumpus world Properties:
o Partially observable: The Wumpus world is partially observable because the agent can only perceive its immediate surroundings, such as the adjacent rooms.
o Deterministic: It is deterministic, as the outcome of each action is exactly specified.
o Sequential: The order of actions matters, so it is sequential.
o Static: It is static as Wumpus and Pits are not moving.
o Discrete: The environment is discrete.
o One agent: The environment is a single agent as we have one agent only and Wumpus is not considered as an agent.
Exploring the Wumpus world:
Now we will explore the Wumpus world and will determine how the agent will find its goal by applying logical reasoning.
Agent's First step:
Initially, the agent is in the first room, square [1,1], and we already know that this room is safe for the agent; to show in diagram (a) below that the room is safe, we add the symbol OK. Symbol A represents the agent, B the breeze, G glitter or gold, V a visited room, P a pit, and W the Wumpus.

At room [1,1] the agent does not feel any breeze or stench, which means the adjacent squares are also OK.

Agent's second step:
Now the agent needs to move forward, so it will move to either [1,2] or [2,1]. Suppose the agent moves to room [2,1]. There the agent perceives a breeze, which means a Pit is nearby. The Pit can be in [3,1] or [2,2], so we add the symbol P? to mark those squares as possible Pit rooms.
Now the agent will stop and think, and will not make any harmful move. The agent goes back to room [1,1]. Rooms [1,1] and [2,1] have now been visited by the agent, so we use the symbol V to mark the visited squares.
Agent's third step:
At the third step, the agent moves to room [1,2], which is OK. In room [1,2] the agent perceives a stench, which means there must be a Wumpus nearby. But the Wumpus cannot be in [1,1] (by the rules of the game), nor in [2,2] (the agent detected no stench when he was at [2,1]). Therefore the agent infers that the Wumpus is in room [1,3]. Also, in the current room there is no breeze, which means [2,2] contains no Pit and no Wumpus. So [2,2] is safe, we mark it OK, and the agent moves on to [2,2].
Agent's fourth step:
At room [2,2] there is no stench and no breeze, so suppose the agent decides to move to [2,3]. At room [2,3] the agent perceives glitter, so it should grab the gold and climb out of the cave.
Knowledge base for Wumpus world
In the previous topic we saw the Wumpus world and how a knowledge-based agent explores it. Now we will create a knowledge base for the Wumpus world and derive some proofs about it using propositional logic.
The agent starts from the first square [1,1], and we already know that this room is safe for the agent. To build a knowledge base for the Wumpus world, we will use some rules and atomic propositions. We need a symbol for each location [i, j] in the Wumpus world, where i is the row and j the column.
Atomic proposition variables for the Wumpus world:
o Let Pi,j be true if there is a Pit in room [i, j].
o Let Bi,j be true if the agent perceives a breeze in [i, j].
o Let Wi,j be true if there is a Wumpus in square [i, j] (dead or alive).
o Let Si,j be true if the agent perceives a stench in square [i, j].
o Let Vi,j be true if square [i, j] has been visited.
o Let Gi,j be true if there is gold (and glitter) in square [i, j].
o Let OKi,j be true if room [i, j] is safe.
Note: for a 4 × 4 board there will be 7 × 4 × 4 = 112 propositional variables.
Some propositional rules for the Wumpus world (the ones used in the proof below):
R1: ¬S11 → ¬W11 ∧ ¬W12 ∧ ¬W21
R2: ¬S21 → ¬W21 ∧ ¬W22 ∧ ¬W31
R4: S12 → W13 ∨ W12 ∨ W22 ∨ W11
Note: the lack of quantifiers in propositional logic means we need a similar rule for each cell.


Representation of Knowledgebase for Wumpus world:
Following is a simple KB for the Wumpus world when the agent moves from room [1,1] to room [2,1]:

Here in the first row we list the propositional variables for room [1,1], showing that the room has no Wumpus (¬W11), no stench (¬S11), no Pit (¬P11), no breeze (¬B11), no gold (¬G11), has been visited (V11), and is safe (OK11).
In the second row we list the propositional variables for room [1,2], showing that there is no Wumpus and no Pit, the stench and breeze are unknown (the agent has not visited room [1,2] yet), the room is not visited, and the room is safe.
In the third row we list the propositional variables for room [2,1], showing that there is no Wumpus (¬W21), no stench (¬S21), no Pit (¬P21), a perceived breeze (B21), no glitter (¬G21), the room is visited (V21), and the room is safe (OK21).
Prove that the Wumpus is in room [1, 3]
We can prove that the Wumpus is in room [1, 3] using the propositional rules derived for the Wumpus world and the inference rules.
o Apply Modus Ponens with ¬S11 and R1:
We first apply Modus Ponens to R1, which is ¬S11 → ¬W11 ∧ ¬W12 ∧ ¬W21, and ¬S11, which gives the output ¬W11 ∧ ¬W12 ∧ ¬W21.

o Apply And-Elimination:
Applying And-Elimination to ¬W11 ∧ ¬W12 ∧ ¬W21, we get three statements: ¬W11, ¬W12, and ¬W21.
o Apply Modus Ponens to ¬S21 and R2:
Now we apply Modus Ponens to ¬S21 and R2, which is ¬S21 → ¬W21 ∧ ¬W22 ∧ ¬W31; this gives the output ¬W21 ∧ ¬W22 ∧ ¬W31.

o Apply And-Elimination:
Again applying And-Elimination, now to ¬W21 ∧ ¬W22 ∧ ¬W31, we get three statements: ¬W21, ¬W22, and ¬W31.
o Apply Modus Ponens to S12 and R4:
Applying Modus Ponens to S12 and R4, which is S12 → W13 ∨ W12 ∨ W22 ∨ W11, we get the output W13 ∨ W12 ∨ W22 ∨ W11.


o Apply Unit Resolution on W13 ∨ W12 ∨ W22 ∨ W11 and ¬W11:
Applying Unit Resolution to W13 ∨ W12 ∨ W22 ∨ W11 and ¬W11, we get W13 ∨ W12 ∨ W22.
o Apply Unit Resolution on W13 ∨ W12 ∨ W22 and ¬W22:
Applying Unit Resolution to W13 ∨ W12 ∨ W22 and ¬W22, we get W13 ∨ W12.
o Apply Unit Resolution on W13 ∨ W12 and ¬W12:
Applying Unit Resolution to W13 ∨ W12 and ¬W12, we get W13 as output; hence it is proved that the Wumpus is in room [1, 3].
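As a sanity check on this proof, a small Python sketch can enumerate every candidate Wumpus location and keep only those consistent with the percepts gathered so far (no stench at [1,1] or [2,1], stench at [1,2]). This is brute-force model checking rather than the resolution proof itself, and the encoding below is an assumption made for illustration.

# The 4x4 grid, with rooms indexed [i, j] as in the text.
def neighbors(cell):
    i, j = cell
    return {(i + di, j + dj) for di, dj in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 1 <= i + di <= 4 and 1 <= j + dj <= 4}

cells = [(i, j) for i in range(1, 5) for j in range(1, 5)]

# A candidate Wumpus location w is consistent with the percepts iff it is
# adjacent to the stench square [1,2], and is neither one of the stench-free
# visited squares nor adjacent to one of them.
def consistent(w):
    return ((1, 2) in neighbors(w)
            and w not in neighbors((1, 1)) | {(1, 1)}
            and w not in neighbors((2, 1)) | {(2, 1)})

print([w for w in cells if consistent(w)])   # [(1, 3)]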

Probabilistic reasoning in Artificial intelligence


Uncertainty:
So far, we have studied knowledge representation using first-order logic and propositional logic with certainty, which means we were sure about the predicates. With this kind of knowledge representation we might write A→B, meaning if A is true then B is true; but consider a situation where we are not sure whether A is true or not. Then we cannot express this statement; this situation is called uncertainty. So to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty to occur in the real world.
1. Information obtained from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to indicate the uncertainty in
knowledge. In probabilistic reasoning, we combine probability theory with logic to handle the uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that results from laziness and ignorance.
In the real world there are many scenarios where the certainty of something is not confirmed, such as "it will rain today", "the behavior of someone in some situation", or "a match between two teams or two players". These are sentences for which we can assume an outcome but cannot be sure about it, so here we use probabilistic reasoning.
Need of probabilistic reasoning in AI:
• When there are unpredictable outcomes.
• When the specifications or possibilities of predicates become too large to handle.
• When an unknown error occurs during an experiment.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
• Bayes' rule
• Bayesian Statistics
Note: We will learn the above two rules in later chapters.
As probabilistic reasoning uses probability and related terms, so before understanding probabilistic reasoning, let's understand some common
terms:
Probability: Probability can be defined as the chance that an uncertain event will occur; it is a numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, the two extremes representing total uncertainty and total certainty.
• 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
• P(A) = 0, indicates total uncertainty in an event A.
• P(A) =1, indicates total certainty in an event A.
We can find the probability of an uncertain event by using the formula:

P(A) = (number of desired outcomes) / (total number of outcomes)

• P(¬A) = probability of event A not happening.
• P(¬A) + P(A) = 1.
Event: Each possible outcome of a variable is called an event.
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real world.
Prior probability: The prior probability of an event is the probability computed before observing new information.
Posterior probability: The probability calculated after all evidence or information has been taken into account; it is a combination of the prior probability and the new information.
Conditional probability:
Conditional probability is the probability of an event occurring given that another event has already happened.
Suppose we want the probability of event A given that event B has already occurred, "the probability of A under the condition B"; it can be written as:

P(A|B) = P(A⋀B) / P(B)

where P(A⋀B) is the joint probability of A and B, and P(B) is the marginal probability of B.
If the probability of A is given and we need to find the probability of B, it is given as:

P(B|A) = P(A⋀B) / P(A)
This can be explained with a Venn diagram: once B has occurred, the sample space is reduced to the set B, and we can calculate the probability of event A given B by dividing P(A⋀B) by P(B).

Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What percentage of the students who like English also like mathematics?
Solution:
Let A be the event that a student likes mathematics and B the event that a student likes English. Then:

P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 ≈ 0.57

Hence, 57% of the students who like English also like mathematics.
Bayes' theorem in Artificial intelligence
Bayes' theorem:
Bayes' theorem is also known as Bayes' rule or Bayes' law, and it is the basis of Bayesian reasoning; it determines the probability of an event from uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is an application of Bayes' theorem,
which is fundamental to Bayesian statistics.
It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
Bayes' theorem allows updating the probability prediction of an event by observing new information of the real world.
Example: if cancer is related to one's age, then by using Bayes' theorem we can determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B:
From the product rule we can write:
P(A ⋀ B) = P(A|B) P(B)
Similarly, for the probability of event B with known event A:
P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B)    ... (a)

The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it reads as the probability of hypothesis A given observed evidence B.
P(B|A) is called the likelihood: assuming the hypothesis is true, the probability of the evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability: the pure probability of the evidence.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can be written as:

P(Ai|B) = P(B|Ai) P(Ai) / Σk P(Ak) P(B|Ak)

where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Applying Bayes' rule:
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful when we have good estimates of these three terms and want to determine the fourth. Suppose we want to perceive the effect of some unknown cause and compute that cause; then Bayes' rule becomes:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)
Example-1:
Question: What is the probability that a patient has meningitis given a stiff neck?
Given data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time. He is also aware of some further facts:
• The known probability that a patient has meningitis is 1/30,000.
• The known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b the proposition that the patient has meningitis. Then:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02

P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.0013

Hence, we can assume that about 1 in 750 patients with a stiff neck has meningitis.
Example-2:
Question: A single card is drawn from a standard deck of playing cards. The probability that the card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e. the probability that the drawn card is a king given that it is a face card.
Solution:

P(King|Face) = P(Face|King) P(King) / P(Face)    ... (i)

P(King): probability that the card is a king = 4/52 = 1/13
P(Face): probability that the card is a face card = 12/52 = 3/13
P(Face|King): probability that the card is a face card given that it is a king = 1
Putting all values into equation (i), we get:

P(King|Face) = (1 × 1/13) / (3/13) = 1/3
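Both worked examples can be checked with a few lines of Python; the helper below is simply equation (a) written as a function.

def bayes(likelihood, prior, evidence):
    # Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
    return likelihood * prior / evidence

# Example 1: P(meningitis | stiff neck) = P(a|b) P(b) / P(a)
print(bayes(0.8, 1 / 30000, 0.02))   # ~0.00133, i.e. about 1 in 750

# Example 2: P(King | Face) = P(Face|King) P(King) / P(Face)
print(bayes(1.0, 1 / 13, 3 / 13))    # 0.333..., i.e. 1/3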

Application of Bayes' theorem in Artificial intelligence:


Following are some applications of Bayes' theorem:
• It is used to calculate the next step of the robot when the already executed step is given.
• Bayes' theorem is helpful in weather forecasting.
• It can solve the Monty Hall problem.
Bayesian Belief Network in artificial intelligence
Bayesian belief networks are a key computer technology for dealing with probabilistic events and solving problems that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a
directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic because they are built from a probability distribution and use probability theory for prediction and anomaly detection.
Real-world applications are probabilistic in nature, and to represent the relationships between multiple events we need a Bayesian network. Bayesian networks can be used for various tasks, including prediction, anomaly detection, diagnostics, automated insight, reasoning, time series prediction, and decision making under uncertainty.
A Bayesian network can be used to build models from data and expert opinions, and it consists of two parts:
• Directed Acyclic Graph
• Table of conditional probabilities.
The generalized form of a Bayesian network, which represents and solves decision problems under uncertain knowledge, is known as an influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:
• Each node corresponds to a random variable, which can be continuous or discrete.
• Arcs (directed arrows) represent causal relationships or conditional dependencies between random variables; each directed link connects a pair of nodes in the graph. A link means that one node directly influences the other; if there is no directed link, the nodes are independent of each other.
o In the diagram above, A, B, C, and D are random variables represented by the nodes of the network graph.
o If node B is connected to node A by a directed arrow, then node A is called the parent of node B.
o Node C is independent of node A.
Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic graph, or DAG.
The Bayesian network has mainly two components:
• Causal Component
• Actual numbers
Each node in a Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
A Bayesian network is based on the joint probability distribution and conditional probability, so let's first understand the joint probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.
The joint probability P[x1, x2, x3, ..., xn] can be expanded by the chain rule:

P[x1, x2, x3, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]

In a Bayesian network, for each variable Xi we can write:

P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Explanation of Bayesian network:
Let's understand the Bayesian network through an example by creating a directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to burglaries, but it also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken the responsibility of informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he confuses the phone ringing with the alarm and calls then too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses the alarm. Here we would like to compute the probability of a burglar alarm.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia called Harry.
Solution:
• The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of the Alarm and directly affect the probability of the alarm going off, while David's and Sophia's calls depend on the alarm.
• The network thus represents our assumptions that the neighbors do not directly perceive the burglary, do not notice minor earthquakes, and do not confer before calling.
• The conditional distribution for each node is given as a conditional probability table, or CPT.
• Each row in a CPT must sum to 1, because the entries of a row represent an exhaustive set of cases for the variable.
• In a CPT, a Boolean variable with k Boolean parents contains 2^k probabilities; hence, if there are two parents, the CPT will contain 4 probability values.
List of all events occurring in this network:
• Burglary (B)
• Earthquake(E)
• Alarm(A)
• David Calls(D)
• Sophia calls(S)
We can write the events of the problem statement in the form of the probability P[D, S, A, B, E], and rewrite this probability statement using the joint probability distribution:

P[D, S, A, B, E] = P[D | S, A, B, E] P[S, A, B, E]
= P[D | S, A, B, E] P[S | A, B, E] P[A, B, E]
= P[D | A] P[S | A, B, E] P[A, B, E]
= P[D | A] P[S | A] P[A | B, E] P[B, E]
= P[D | A] P[S | A] P[A | B, E] P[B] P[E]   (B and E are independent)
Let's take the observed prior probabilities for the Burglary and Earthquake components:
P(B = True) = 0.002, the probability of a burglary.
P(B = False) = 0.998, the probability of no burglary.
P(E = True) = 0.001, the probability of a minor earthquake.
P(E = False) = 0.999, the probability that no earthquake occurred.
We can provide the conditional probabilities as per the tables below.
Conditional probability table for Alarm A:
The conditional probability of Alarm A depends on Burglary and Earthquake:

B     | E     | P(A = True) | P(A = False)
True  | True  | 0.94        | 0.06
True  | False | 0.95        | 0.05
False | True  | 0.31        | 0.69
False | False | 0.001       | 0.999

Conditional probability table for David's calls:
The conditional probability that David calls depends on the Alarm:

A     | P(D = True) | P(D = False)
True  | 0.91        | 0.09
False | 0.05        | 0.95

Conditional probability table for Sophia's calls:
The conditional probability that Sophia calls depends on its parent node "Alarm":

A     | P(S = True) | P(S = False)
True  | 0.75        | 0.25
False | 0.02        | 0.98
From the formula for the joint distribution, we can write the problem statement as:

P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B ∧ ¬E) P(¬B) P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045
Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
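The same computation can be written as a short Python sketch; the dict-based CPT encoding below is an assumption made for illustration, not a general Bayesian network library.

# CPTs from the tables above, encoded as plain Python dicts.
P_B = {True: 0.002, False: 0.998}                    # prior P(B)
P_E = {True: 0.001, False: 0.999}                    # prior P(E)
P_A = {(True, True): 0.94, (True, False): 0.95,      # P(A=True | B, E)
       (False, True): 0.31, (False, False): 0.001}
P_D = {True: 0.91, False: 0.05}                      # P(D=True | A)
P_S = {True: 0.75, False: 0.02}                      # P(S=True | A)

def val(p_true, x):
    # Probability that a Boolean variable equals x, given P(X = True).
    return p_true if x else 1 - p_true

def joint(d, s, a, b, e):
    # P(D,S,A,B,E) = P(D|A) P(S|A) P(A|B,E) P(B) P(E)
    return (val(P_D[a], d) * val(P_S[a], s) * val(P_A[(b, e)], a)
            * P_B[b] * P_E[e])

# The worked query P(S, D, A, not B, not E):
print(joint(d=True, s=True, a=True, b=False, e=False))   # ~0.00068045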
The semantics of Bayesian networks:
There are two ways to understand the semantics of a Bayesian network:
1. As a representation of the joint probability distribution; this view is helpful for understanding how to construct the network.
2. As an encoding of a collection of conditional independence statements; this view is helpful for designing inference procedures.
