
18CSC305J– Artificial Intelligence

UNIT – 2
Unit 2 List of Topics

• Searching techniques – Uninformed search – General search Algorithm
• Uninformed search Methods – Breadth First Search
• Uninformed search Methods – Depth First Search
• Uninformed search Methods – Depth Limited Search
• Uninformed search Methods – Iterative Deepening Search
• Bi-directional search
• Informed search – Generate and Test, Best First Search
• Informed search – A* Algorithm
• AO* search
• Local search Algorithms – Hill Climbing, Simulated Annealing
• Local Beam Search
• Genetic Algorithms
• Adversarial search Methods – Game playing – Important concepts
• Game playing and knowledge structure
• Game as a search problem – Minimax Approach
• Minimax Algorithm
• Alpha-beta pruning
• Game theory problems
General Search Strategies

• Blind search: traversing the search space until the goal node is found (might be doing exhaustive search).
  Techniques: Breadth First, Depth First, Depth Limited, Iterative Deepening search, Uniform Cost search.
  Guarantees a solution.

• Heuristic search: the search process takes place by traversing the search space with applied rules (information).
  Techniques: Greedy Best First Search, A* Algorithm.
  There is no guarantee that a solution is found.
Important Terms

• Search space – possible conditions and solutions.
• Initial state – state where the searching process starts.
• Goal state – the ultimate aim of the searching process.
• Problem space – “what to solve”.
• Searching strategy – strategy for controlling the search.
• Search tree – tree representation of the search space, showing possible solutions from the initial state.
Blind Search : Breadth First Search

(Figure: a search tree whose nodes are numbered 1–4 in BFS expansion order, level by level.)

Blind Search : Breadth First Search

BFS characteristics
Completeness: if the branching factor b is finite and the goal node is at depth d, BFS will eventually find it.
Optimality: BFS is optimal if path cost is a non-decreasing function of depth (otherwise, the shallowest goal node is not necessarily optimal).
Time complexity: 1 + b + b^2 + b^3 + . . . + b^d + b(b^d − 1) = O(b^(d+1)).
Space complexity: O(b^(d+1)).
(b – branching factor; d – depth of the goal node)
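The level-by-level expansion above can be sketched in a few lines of Python. This is a minimal illustration, not reference code from the course; the search space is assumed to be a dict mapping each state to a list of child states.

```python
from collections import deque

def bfs(start, goal, successors):
    """Breadth-first search; returns a path from start to goal, or None.

    `successors` is assumed to map a state to its (finite) child states.
    """
    frontier = deque([[start]])          # FIFO queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # shallowest path leaves first
        node = path[-1]
        if node == goal:
            return path
        for child in successors.get(node, []):
            if child not in visited:     # avoid re-expanding states
                visited.add(child)
                frontier.append(path + [child])
    return None
```

Because the FIFO queue releases shallow paths first, the first path returned uses the fewest edges, matching the optimality condition stated above.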

spring 2011
Blind Search : Depth First Search (DFS)
Implementation:
fringe = LIFO queue, i.e., put successors at front.
(Figure: a search tree whose nodes are numbered in DFS expansion order – the deepest branch is followed first before backtracking to siblings.)

Blind Search : Depth First Search (DFS)

DFS characteristics
Small space requirements: only the path to the current node and
the siblings of each node in the path are stored.
Backtracking search generates only one successor for each
node.
Completeness: no, if the expanded subtree has an infinite depth.
Optimality: no – a deeper solution lying in a subtree that is expanded earlier may be returned first.
Time complexity: O(b^m).
Space complexity: O(bm) (linear!).
(m – maximum depth of the search tree)
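The LIFO-fringe idea can be sketched the same way as BFS, with an explicit stack of paths in place of the FIFO queue. A minimal sketch under the same dict-based graph assumption:

```python
def dfs(start, goal, successors):
    """Depth-first search with a LIFO stack; returns a path or None."""
    frontier = [[start]]                 # stack of partial paths
    while frontier:
        path = frontier.pop()            # LIFO: deepest path expands first
        node = path[-1]
        if node == goal:
            return path
        # reversed() so the leftmost child ends up on top of the stack
        for child in reversed(successors.get(node, [])):
            if child not in path:        # avoid cycles along this path
                frontier.append(path + [child])
    return None
```

Only the current path and the untried siblings along it sit on the stack, which is where the O(bm) space bound comes from.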

Search Methods

Blind Search : Depth Limited Search (DLS)


In Depth Limited Search, we first set a constraint on how deep (how far from the root) we will go.

If we fix the depth limit to 2 (as in the figure), DLS proceeds like DFS but never expands nodes below depth 2; the goal is found only if it lies within this limit.
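The depth constraint is easiest to express recursively. A minimal sketch (same assumed dict-based graph; the limit counts edges from the root):

```python
def dls(node, goal, successors, limit):
    """Depth-limited search: DFS that never descends below `limit` edges.

    Returns a path to the goal, or None if none exists within the limit.
    """
    if node == goal:
        return [node]
    if limit == 0:                       # cutoff reached, stop descending
        return None
    for child in successors.get(node, []):
        result = dls(child, goal, successors, limit - 1)
        if result is not None:
            return [node] + result
    return None
```

With limit 2 a goal at depth 3 is simply not found, which is why DLS is incomplete when the limit is chosen too small.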

Blind Search : Iterative Deepening DFS (ID-DFS)

(Figures: depth-limited search repeated with increasing depth limits.)
IDS characteristics
Completeness: yes.
Optimality: yes, if step cost = 1.
Time complexity:
(d + 1)b^0 + d·b^1 + (d − 1)b^2 + . . . + b^d = O(b^d).
Space complexity: O(bd).

Numerical comparison for b = 10, d = 5

N(IDS) = 6 + 50 + 400 + 3000 + 20000 + 100000 = 123456

N(BFS) = 10 + 100 + 1000 + 10000 + 100000 + 999990 = 1111100

Conclusion
IDS exhibits better performance because, unlike BFS, it does not generate the nodes at depth d + 1.
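Iterative deepening is just the depth-limited search above wrapped in a loop over limits 0, 1, 2, ... A minimal self-contained sketch (the inner helper mirrors DLS; the graph is again an assumed dict):

```python
def ids(start, goal, successors, max_depth=50):
    """Iterative deepening: repeat depth-limited DFS with limits 0, 1, 2, ...

    Re-expanding shallow nodes costs little, because most nodes of a
    b-ary tree sit at the deepest level.
    """
    def dls(node, limit):
        if node == goal:
            return [node]
        if limit == 0:
            return None
        for child in successors.get(node, []):
            found = dls(child, limit - 1)
            if found is not None:
                return [node] + found
        return None

    for limit in range(max_depth + 1):
        path = dls(start, limit)
        if path is not None:
            return path                  # first hit is at the shallowest depth
    return None
```

Like BFS it returns a shallowest solution, but with only O(bd) memory, which is the trade-off the numerical comparison above illustrates.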
APPLICATION: MAZE GAME

• HOW TO REACH THE GOAL?


APPLICATION: MAZE GAME

• BFS SOLUTION?
– S-1-2-3-5-8-10-12-14-16-19-G
– SEARCH SOLUTION FOUND IN 12 STEPS

• DFS SOLUTION?
– S-1-2-3-6-5-8-9-10-11-13-16-18-G
– SEARCH SOLUTION FOUND IN 14 STEPS
Blind Search : Uniform Cost Search

• This algorithm comes into play when a different cost is


available for each edge.
• The primary goal of the uniform-cost search is to find a
path to the goal node which has the lowest cumulative
cost.
• Uniform-cost search expands nodes according to their path costs from the root node. It can be used to solve any graph/tree where the optimal cost is in demand.
• A uniform-cost search algorithm is implemented using a priority queue, which gives maximum priority to the lowest cumulative cost.
• Uniform cost search is equivalent to BFS algorithm if the
path cost of all edges is the same.
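The priority-queue idea above can be sketched with Python's `heapq`. A minimal illustration, assuming the weighted graph is a dict mapping each state to a list of (cost, neighbour) pairs:

```python
import heapq

def ucs(start, goal, edges):
    """Uniform-cost search over a weighted graph.

    `edges[u]` is a list of (cost, v) pairs. Returns (total_cost, path)
    for the cheapest path, or None if the goal is unreachable.
    """
    frontier = [(0, [start])]            # priority queue ordered by g(n)
    best = {}                            # cheapest expansion cost per state
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:                 # goal test at expansion => optimal
            return cost, path
        if node in best and best[node] <= cost:
            continue                     # already expanded more cheaply
        best[node] = cost
        for step, child in edges.get(node, []):
            heapq.heappush(frontier, (cost + step, path + [child]))
    return None
```

Testing the goal when a node is popped (not when it is generated) is what makes the first goal returned the cheapest one; with all edge costs equal this degenerates to BFS, as noted above.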
Uniform Cost Search - Example

The minimum-cost path is S->D->C->G2, and since G2 is one of the destination nodes, we have found our path.

In this way we can find the path with the minimum cumulative cost from the start node to a goal node – S->D->C->G2, with total cost 13 (marked in green).
Uniform Cost Search

Implementation: fringe = queue ordered by path cost g(n).
Equivalent to breadth-first if all step costs are equal.
Breadth-first is only optimal if the step cost is non-decreasing with depth (e.g. constant). Can we guarantee optimality for any step cost?
Uniform-cost search: expand the node with the smallest path cost g(n).

Uniform Cost Search

Uniform-cost search strategy
Expands the node with the lowest path cost from the start node first (if all step costs are equal, UCS = BFS).
Completeness: yes, if the cost of each step is ≥ s > 0.
Optimality: yes – nodes are expanded in increasing order of path cost.
Time and space complexity: O(b^(1 + ⌊C*/s⌋)), where C* is the cost of the optimal solution.

Blind search - Bidirectional Search

(Figure: two searches run simultaneously – forward from the start and backward from the goal – until their frontiers meet.)
Summary of Blind Search Algorithms

Criterion     Breadth-First   Depth-First
Time          b^d             b^m
Space         b^d             bm
Optimal?      Yes             No
Complete?     Yes             No

b: branching factor; d: solution depth; m: maximum depth
Summary of Blind Search Algorithms

Algorithm   Space            Time             Complete   Optimal
BFS         Θ(b^d)           Θ(b^d)           Yes        Yes
DFS         Θ(bm)            Θ(b^m)           No         No
UCS         Θ(b^⌈C*/ε⌉)      Θ(b^⌈C*/ε⌉)      Yes        Yes
DLS         Θ(bl)            Θ(b^l)           No         No
IDS         Θ(bd)            Θ(b^d)           Yes        Yes

(l: depth limit; ε: minimum step cost; C*: cost of the optimal solution)
Informed Search Algorithms

• Generate and Test


• Best-first search
• Greedy best-first search
• A* search
• Heuristics
Generate-and-test

• Very simple strategy – just keep guessing:

do while goal not accomplished
    generate a possible solution
    test solution to see if it is a goal

• Heuristics may be used to determine the specific rules for solution generation.
Generate-and-test
Example - Traveling Salesman Problem
(TSP)
• Traveler needs to visit n cities.
• Know the distance between each pair of cities.
• Want to know the shortest route that visits all
the cities once.
• n=80 will take millions of years to solve
exhaustively!

Generate-and-test

TSP Example
(Figure: four cities A, B, C, D with a distance on each edge – A-B 6, B-C 3, C-D 4, D-A 5 – and diagonal distances 1 and 2.)
Generate-and-test Example

• TSP – generation of possible solutions is done in lexicographical order of cities:
1. A - B - C - D
2. A - B - D - C
3. A - C - B - D
4. A - C - D - B
...

(Figure: the search tree enumerating the city orderings, branching on the next unvisited city.)
Best First Search Algorithms

• Idea: use an evaluation function f(n) for each node


– f(n) provides an estimate for the total cost.
Expand the node n with smallest f(n).

• Implementation:
Order the nodes in fringe increasing order of cost.

• Special cases:
– greedy best-first search
– A* search
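The slide's point that greedy search and A* are the same algorithm with different f can be made concrete. A minimal sketch, assuming a dict-based graph and an arbitrary evaluation function f passed in by the caller:

```python
import heapq
import itertools

def best_first(start, goal, successors, f):
    """Generic best-first search: always expand the fringe node with
    the smallest f(n). Greedy search and A* differ only in f."""
    counter = itertools.count()          # tie-breaker for equal f values
    frontier = [(f(start), next(counter), [start])]
    visited = set()
    while frontier:
        _, _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for child in successors.get(node, []):
            if child not in visited:
                heapq.heappush(frontier,
                               (f(child), next(counter), path + [child]))
    return None
```

Passing f = h gives greedy best-first search; passing f = g + h gives A*. The fringe is exactly the "nodes ordered by increasing cost" described above.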
Romania with straight-line dist.
Greedy best-first search

• f(n) = estimate of cost from n to goal


• e.g., f(n) = straight-line distance from n to
Bucharest
• Greedy best-first search expands the node
that appears to be closest to goal.
Greedy best-first search example
GBFS is not complete

(Figure: a graph with start state a, goal state g and intermediate states b, c; using f(n) = straight-line distance, greedy best-first search gets stuck among the nearer states and never reaches the goal.)
Properties of greedy best-first search

• Complete? No – can get stuck in loops.
• Time? O(b^m), but a good heuristic can give dramatic improvement.
• Space? O(b^m) – keeps all nodes in memory.
• Optimal? No –
e.g. Arad → Sibiu → Rimnicu Vilcea → Pitesti → Bucharest is shorter!
A* search
• Idea: avoid expanding paths that are already
expensive
• Evaluation function f(n) = g(n) + h(n)
• g(n) = cost so far to reach n
• h(n) = estimated cost from n to goal
• f(n) = estimated total cost of path through n to goal
• Greedy Best First search has f(n) = h(n)
• Uniform Cost search has f(n) = g(n)
Heuristic Search : A* Algorithm

• Widely known algorithm (pronounced “A star” search).
• Evaluates nodes by combining g(n), the cost to reach the node, and h(n), the estimated cost to get from the node to the goal.
• f(n) = g(n) + h(n): the estimated cost of the cheapest solution through n.
• Complete and optimal, provided the heuristic is admissible.
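The f = g + h bookkeeping looks like this in Python. A minimal sketch (not the course's reference code), assuming a dict-based weighted graph and a dict of heuristic values:

```python
import heapq

def astar(start, goal, edges, h):
    """A* search: expand the node minimizing f(n) = g(n) + h(n).

    `edges[u]` lists (cost, v) pairs; `h` maps a state to its heuristic
    value. With an admissible h, the first goal expansion is optimal.
    """
    frontier = [(h[start], 0, [start])]  # (f, g, path), ordered by f
    best_g = {}
    while frontier:
        f, g, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return g, path
        if node in best_g and best_g[node] <= g:
            continue                     # already reached more cheaply
        best_g[node] = g
        for step, child in edges.get(node, []):
            g2 = g + step
            heapq.heappush(frontier, (g2 + h[child], g2, path + [child]))
    return None
```

On a small graph shaped like the worked example on the next slide (S reaches A for 2 and D for 3, with h(A)=8, h(D)=9), A* expands S, then A (f = 10 beats f = 12), and returns the cheap goal at cost 4.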
Heuristic Search : A* Algorithm

(Figure: a search tree rooted at S with heuristic values S:10, A:8, D:9, E:4, H:3, G:0, G':0 and step costs on the edges.)

Path cost for S-D-G:
f(S) = g(S) + h(S) = 0 + 10 = 10
f(D) = (0+3) + 9 = 12
f(G) = (0+3+3) + 0 = 6
Total path cost = f(S) + f(D) + f(G) = 28

Path cost for S-A-G':
f(S) = 0 + 10 = 10
f(A) = (0+2) + 8 = 10
f(G') = (0+2+2) + 0 = 4
Total path cost = f(S) + f(A) + f(G') = 24

Path S-A-G' is chosen – lowest cost.
Application of Heuristic Search :
A* Algorithm – Snake & Ladder

(Figure: a 12-square snake-and-ladder board searched with A*; squares are annotated with g(n), h(n) and f(n) values, e.g. the start square has g(n)=0, h(n)=11, f(n)=11.)
A* search example

(Figures: step-by-step A* expansion.)
try yourself !

(Figure: a weighted graph from the start S to the goal G with step costs on the edges.)

Straight-line distances:
h(S-G)=10, h(A-G)=7, h(D-G)=1, h(F-G)=1, h(B-G)=10, h(E-G)=8, h(C-G)=20, h(G-G)=0

The graph above shows the step-costs for different paths going from the start (S) to
the goal (G). On the right you find the straight-line distances.

1. Draw the search tree for this problem. Avoid repeated states.

2. Give the order in which the tree is searched (e.g. S-C-B...-G) for A* search.
Use the straight-line dist. as a heuristic function, i.e. h=SLD,
and indicate for each node visited what the value for the evaluation function, f, is.
Admissible heuristics

• A heuristic h(n) is admissible if for every node n,


h(n) ≤ h*(n), where h*(n) is the true cost to reach the
goal state from n.
• An admissible heuristic never overestimates the cost to
reach the goal, i.e., it is optimistic
• Example: hSLD(n) (never overestimates the actual road
distance)
• Theorem: If h(n) is admissible, A* using TREE-SEARCH
is optimal

Admissible heuristics – 8 Puzzle Problem

Try it out!
START
1 2 3
7 8 4
6 5

GOAL
1 2 3
8 4
7 6 5

Admissible heuristics – 8 Puzzle Problem

START           GOAL
1 2 3           1 2 3
7 8 4           8 _ 4
6 _ 5           7 6 5

Successors of START (moving the blank):
1 2 3       1 2 3       1 2 3
7 _ 4       7 8 4       7 8 4
6 8 5       _ 6 5       6 5 _
H=5         H=4         H=6
Admissible heuristics

E.g., for the 8-puzzle:


• h1(n) = number of misplaced tiles
• h2(n) = total Manhattan distance
(i.e., no. of squares from desired location of each tile)

• h1(S) = ?
• h2(S) = ?

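Both heuristics are easy to compute. A minimal sketch: states are assumed to be length-9 tuples read row by row, with 0 for the blank; the values asserted below are for the START/GOAL boards of the "Try it out!" slide.

```python
def h1(state, goal):
    """Number of misplaced tiles (the blank, 0, is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    """Total Manhattan distance of each tile from its goal square."""
    pos = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue                     # the blank does not count
        r, c = divmod(i, 3)              # current row, column
        gr, gc = pos[tile]               # goal row, column
        total += abs(r - gr) + abs(c - gc)
    return total
```

Both never overestimate the true number of moves (each move fixes at most one misplaced tile and shortens total Manhattan distance by at most 1), so both are admissible.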
Evaluation of Search Algorithms

Completeness
Is the algorithm guaranteed to find a solution if one exists?

Optimality
When the algorithm finds a solution, is this the optimal one?

Time complexity
How long does it take to find a solution? (Often measured as the number of nodes generated during search.)

Space complexity
How much memory is needed to perform the search? (Often measured as the maximum number of nodes stored in memory.)
Algorithm AO*

1. Initialize: Set G* = {s}, f(s) = h(s)
   If s ∈ T, label s as SOLVED
2. Terminate: If s is SOLVED, then terminate
3. Select: Select a non-terminal leaf node n from the marked sub-tree below s in G*
4. Expand: Make explicit the successors of n
   For each successor, m, not already in G*:
   Set f(m) = h(m)
   If m is a terminal node, label m as SOLVED
5. Cost Revision: Call cost-revise(n)
6. Loop: Go To Step 2.
Cost Revision in AO*: cost-revise(n)

1. Create Z = {n}
2. If Z = {} return
3. Select a node m from Z such that m has no descendants in Z
4. If m is an AND node with successors r1, r2, … rk:
   Set f(m) = Σ [ f(ri) + c(m, ri) ]
   Mark the edge to each successor of m
   If each successor is labeled SOLVED, then label m as SOLVED
5. If m is an OR node with successors r1, r2, … rk:
   Set f(m) = min { f(ri) + c(m, ri) }
   Mark the edge to the best successor of m
   If the marked successor is labeled SOLVED, label m as SOLVED
6. If the cost or label of m has changed, then insert those parents of m into Z for which m is a marked successor
7. Go to Step 2.
Searching OR Graphs

• How does AO* fare when the graph has only OR nodes?
• What are the roles of lower-bound and upper-bound estimates?
– Pruning criteria: LB > UB

Searching Game Trees
• Consider an OR tree with two types of OR nodes, namely Min nodes and Max nodes
• In Min nodes we select the minimum cost successor
• In Max nodes we select the maximum cost successor
• Terminal nodes are winning or losing states
– It is often infeasible to search up to terminal nodes
– We use heuristic costs to compare non-terminal nodes
Shallow and Deep Pruning

(Figure: two game trees with Max and Min nodes illustrating a shallow cut-off and a deep cut-off – branches whose values can no longer affect the root are pruned.)
Local search algorithms
• In many optimization problems, the path to the goal is
irrelevant; the goal state itself is the solution

• State space = set of "complete" configurations


• Find configuration satisfying constraints, e.g., n-queens
• In such cases, we can use local search algorithms
• keep a single "current" state, try to improve it.
• Very memory efficient (only remember current state)

Types of Local Search

• Hill-climbing Search
• Simulated Annealing Search
Hill Climbing

• Searching for a goal state = Climbing to


the top of a hill.

• Generate-and-test + direction to move.


• Heuristic function to estimate how close a
given state is to a goal state.

Hill Climbing

Algorithm
1. Evaluate the initial state.
2. Loop until a solution is found or there are
no new operators left to be applied:
− Select and apply a new operator
− Evaluate the new state:
goal → quit
better than current state → new current state

Steepest Ascent Hill Climbing

• Considers all the moves from the current state.

• Selects the best one as the next state.


Steepest-Ascent Hill Climbing
• current ← start node
• loop do
– neighbor ← a highest-valued successor of current
– if neighbor.Value <= current.Value then return current.State
– current ← neighbor
• end loop
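The loop above translates almost line for line into Python. A minimal sketch, assuming the caller supplies a successor generator and an objective function to maximize:

```python
def hill_climb(start, neighbors, value):
    """Steepest-ascent hill climbing: move to the best neighbor,
    stop when no neighbor improves on the current state."""
    current = start
    while True:
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current               # local maximum reached
        current = best
```

For example, maximizing -(x-3)^2 over the integers with neighbors x-1 and x+1 climbs from any start straight to x = 3 and stops there; on a function with several peaks it would stop at whichever local maximum is nearest, which is exactly the weakness discussed on the following slides.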
Steepest Ascent/Descent Hill Climbing

current
6

4 10 3 2 8

What if current had a value of 12?

81
Hill Climbing Example
A local heuristic function
Count +1 for every block that sits on the correct thing. The goal state has
the value +8.
Count -1 for every block that sits on an incorrect thing. In the initial state
blocks C, D, E, F, G, H count +1 each. Blocks A, B count -1 each , for the
total of +4.
Move 1 gives the value +6 (A is now on the correct support). Moves 2a
and 2b both give +4 (B and H are wrongly situated). This means we have
a local maximum of +6.

Hill Climbing Example
A global heuristic function
Count +N for every block that sits on a correct stack of N things. The goal
state has the value +28.
Count -N for every block that sits on an incorrect stack of N things. That is,
there is a large penalty for blocks tied up in a wrong structure.
In the initial state C, D, E, F, G, H count -1, -2,
-3, -4, -5, -6. A counts -7 , for the total of -28.

Hill Climbing Example

Move 1 gives the value -21 (A is now on the correct support).


Move 2a gives -16, because C, D, E, F, G, H
count -1, -2, -3, -4, -5, -1.
Move 2b gives -15, because C, D, E, F, G
count -1, -2, -3, -4, -5.
There is no local maximum!
Moral: sometimes changing the heuristic function is all we need.

Hill-climbing search
• Problem: depending on initial state, can get stuck in local maxima

Hill Climbing: Disadvantages

Local maximum
A state that is better than all of its
neighbours, but not better than
some other states far away.
Plateau
A flat area of the search space in
which all neighbouring states have
the same value.
Ridge
The orientation of the high region, compared to
the set of available moves, makes it impossible
to climb up. However, two moves executed
serially may increase the height.
Hill Climbing: Disadvantages

Ways Out

• Backtrack to some earlier node and try


going in a different direction.
• Make a big jump to try to get into a new
solution.
• Moving in several directions at once.
Unit 2 List of Topics

• Searching techniques – Uninformed search – • AO* search


General search Algorithm • Local search Algorithms-Hill Climbing,
• Uninformed search Methods – Breadth First Simulated Annealing
Search
• Local Beam Search
• Uninformed search Methods – Depth First • Genetic Algorithms
Search
• Uninformed search Methods – Depth limited
• Adversarial search Methods-Game
Search
playing-Important concepts
• Game playing and knowledge structure.
• Uniformed search Methods- Iterative
Deepening search
• Game as a search problem-Minimax Approach
• Bi-directional search
• Minimax Algorithm

• Informed search- Generate and test, Best First


search • Alpha beta pruning
• Informed search-A* Algorithm • Game theory problems
Simulated Annealing – Basic Steps

(Figure: flowchart of the basic simulated annealing steps.)
Simulated annealing search
• Idea: escape local maxima by allowing some "bad"
moves but gradually decrease their frequency.

• This is like smoothing the cost landscape.

• One can prove: If T decreases slowly enough, then


simulated annealing search will find a global
optimum with probability approaching 1 (however,
this may take VERY long)

Simulated annealing search

• A variation of hill climbing in which, at the


beginning of the process, some downhill
moves may be made.

• To do enough exploration of the whole space


early on, so that the final solution is
relatively insensitive to the starting state.

• Lowering the chances of getting caught at a


local maximum, or plateau, or a ridge.
Simulated annealing search
Physical Annealing
• Physical substances are melted and then
gradually cooled until some solid state is
reached.
• The goal is to produce a minimal-energy state.
• Annealing schedule: if the temperature is
lowered sufficiently slowly, then the goal will
be attained.
• Nevertheless, there is some probability of a transition to a higher energy state: p = e^(−ΔE/T).
Simulated annealing algorithm

(Figure: pseudocode of the simulated annealing algorithm.)
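The accept-with-probability e^(−ΔE/T) rule and a geometric cooling schedule can be sketched as follows. This is a minimal illustration; the starting temperature, cooling rate and step count are arbitrary choices, and the run is seeded only so the example is reproducible:

```python
import math
import random

def simulated_annealing(start, neighbor, value,
                        t0=100.0, cooling=0.95, steps=1000):
    """Minimize `value`: always accept improving moves, accept worsening
    moves with probability exp(-dE / T), and cool T geometrically."""
    random.seed(0)                       # reproducible run for the example
    current = start
    t = t0
    for _ in range(steps):
        candidate = neighbor(current)
        delta = value(candidate) - value(current)
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate          # sometimes a "bad" move is taken
        t *= cooling                     # annealing schedule
    return current
```

Early on (large T) it wanders almost freely, exploring the space; as T shrinks, e^(−ΔE/T) collapses toward 0 and the behaviour becomes plain hill descent, which is exactly the escape-then-settle story told above.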
Simulated annealing example
1. The objective function to minimize is a simple function of two variables:
   min f(x) = (4 − 2.1·x1^2 + x1^4/3)·x1^2 + x1·x2 + (−4 + 4·x2^2)·x2^2,
   with 0 <= x1, x2 <= 20, each variable encoded in 5 bits; Tmin = 50

k        x=(x1,x2)   f(x)       x'=(x1',x2')   f(x')      ΔE = f(x')−f(x)   ΔE<0?   e^(−ΔE/T)    r     Accept?
T=300    (2, 9)      25941.73   (0, 8)         16128      −9813.73          Yes     –            –     Accept
T=150    (0, 8)      16128      (1, 12)        82382.23   66254.23          No      1.494E−192   0.6   Accept
T=75     (1, 12)     82382.23   (0, 14)        152880     70497.77          No      0            0.7   Accept
T=37.5   …
Local Beam Search

• Keep track of k states rather than just one, as


in hill climbing
• In comparison to beam search we saw earlier,
this algorithm is state-based rather than
node-based.

Local Beam Search

• Begins with k randomly generated states


• At each step, all successors of all k states are generated
• If any one is a goal, the algorithm halts
• Otherwise, it selects the best k successors from the complete list, and repeats
Local Beam Search

• Successors can become concentrated in a small part of state


space
• Stochastic beam search: choose k successors, with probability
of choosing a given successor increasing with value
• Like natural selection: successors (offspring) of a state
(organism) populate the next generation according to its
value (fitness)

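The keep-the-best-k loop described above can be sketched directly. A minimal illustration: the caller supplies a random-state generator, a successor function, an objective to maximize and a goal test; the seed is fixed only so the example is reproducible:

```python
import random

def local_beam_search(k, random_state, successors, value, is_goal,
                      iters=100):
    """Local beam search: pool the successors of the current k states
    and keep the k best; stop as soon as a goal state appears."""
    random.seed(1)                       # reproducible start states
    states = [random_state() for _ in range(k)]
    for _ in range(iters):
        for s in states:
            if is_goal(s):
                return s
        pool = [c for s in states for c in successors(s)]
        if not pool:
            return max(states, key=value)
        states = sorted(pool, key=value, reverse=True)[:k]  # best k survive
    return max(states, key=value)
```

Because the k survivors are chosen from the pooled successor list, information flows between the parallel searches – unlike running k independent hill climbs – but, as noted above, the beam can also collapse into one small region of the state space.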
Genetic Algorithms

• Genetic Algorithms(GAs) are adaptive heuristic search


algorithms that belong to the larger part of evolutionary
algorithms. Genetic algorithms are based on the ideas of
natural selection and genetics.
• They are commonly used to generate high-quality solutions for optimization problems and search problems.
• In simple words, they simulate “survival of the fittest” among individuals of consecutive generations for solving a problem. Each generation consists of a population of individuals, and each individual represents a point in the search space and a possible solution. Each individual is represented as a string of characters/integers/floats/bits. This string is analogous to the Chromosome.
Genetic Algorithms

• Search Space

– The population of individuals is maintained within the search space. Each individual represents a solution in the search space for the given problem. Each individual is coded as a finite-length vector (analogous to a chromosome) of components. These variable components are analogous to Genes. Thus a chromosome (individual) is composed of several genes (variable components).
– A Fitness Score is given to each individual which shows the ability of an individual to “compete”. Individuals with an optimal (or near-optimal) fitness score are sought.
Genetic Algorithms

• Cross Over
• Mutation
GA can be summarized as:

• 1) Randomly initialize populations p


• 2) Determine fitness of population
• 3) Until convergence repeat:
– a) Select parents from population
– b) Crossover and generate new population
– c) Perform mutation on new population
– d) Calculate fitness for new population

https://www.geeksforgeeks.org/genetic-algorithms/
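Steps 1–3 above can be sketched as a small GA over 5-bit strings, in the spirit of the worked example that follows (fitness F(z) = z^2 on the decoded integer). This is a minimal sketch: the population size, mutation rate and generation count are arbitrary choices, and the seed is fixed only for reproducibility.

```python
import random

def genetic_algorithm(fitness, n_bits=5, pop_size=8, p_cross=0.6,
                      p_mut=0.05, generations=40):
    """Minimal GA over bit strings: fitness-proportional selection,
    single-point crossover, bit-flip mutation."""
    random.seed(0)                       # reproducible run
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]

    def decode(bits):                    # bit string -> integer
        return int("".join(map(str, bits)), 2)

    def select(pop, scores):             # roulette-wheel selection
        total = sum(scores)
        r = random.uniform(0, total)
        acc = 0.0
        for ind, s in zip(pop, scores):
            acc += s
            if acc >= r:
                return ind
        return pop[-1]

    for _ in range(generations):
        scores = [fitness(decode(ind)) for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(pop, scores), select(pop, scores)
            c1, c2 = p1[:], p2[:]
            if random.random() < p_cross:          # single-point crossover
                point = random.randint(1, n_bits - 1)
                c1 = p1[:point] + p2[point:]
                c2 = p2[:point] + p1[point:]
            for c in (c1, c2):                     # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < p_mut:
                        c[i] = 1 - c[i]
            new_pop += [c1, c2]
        pop = new_pop[:pop_size]
    best = max(pop, key=lambda ind: fitness(decode(ind)))
    return decode(best)
```

Selection here is roulette-wheel (probability proportional to fitness, like the Pi column in the example below); crossover splices two parents at a random point, and mutation flips individual bits, mirroring the Crossover and Mutation columns of the worked table.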
Random Number Table for Solving GA Problem – GA Calculation

GA Example

Initial Pop   Decimal   F(z)   Pi = F(z)/ΣF(z)   Ei = F(z)/Avg F(z)   Actual Count   New Pop   Crossover   Mutation   Offspring   Decimal
00010         2         4      0.004             0.016                0              1|1010    11110       –          11110       30
01001         9         81     0.084             0.338                0              11010     –           11110      11110       30
11010         26        676    0.706             2.825                3              11010     –           –          11010       26
01110         14        196    0.206             0.81                 1              0|1110    01010       –          01010       10

Sum F(z) = 957
Avg F(z) = 239.25

F(z) = x^2, 0 <= x <= 30
Pc = 0.6, i.e. 0.6 × 4 = 2.4, i.e. 2 strings undergo crossover
Pm = 0.2, i.e. 0.2 × 4 = 0.8, i.e. 1 string is mutated
Adversarial search Methods-Game
playing-Important concepts

• Adversarial search: search based on game theory – agents in a competitive environment.
• According to game theory, a game is played between two players. To complete the game, one has to win it and the other automatically loses.
• Such conflicting goals call for adversarial search.
• Game playing techniques target games where only human intelligence and logic matter, excluding factors such as luck:
– Tic-Tac-Toe, Checkers, Chess – only mind works, no luck works.
Adversarial search Methods-Game
playing-Important concepts
• Techniques required to get
the best optimal solution
(Choose Algorithms for best
optimal solution within
limited time)
– Pruning: A technique
which allows ignoring the
unwanted portions of a
search tree which make no
difference in its final
result.
– Heuristic Evaluation Function: allows us to approximate the cost value at each level of the search tree, before reaching the goal node.
Game playing and knowledge structure-
Elements of Game Playing search
• To play a game, we use a game tree to know all the possible
choices and to pick the best one out. The following are the
elements of game playing:
• S0: It is the initial state from where a game begins.
• PLAYER (s): It defines which player has the current turn to
make a move in the state.
• ACTIONS (s): It defines the set of legal moves to be used in
a state.
• RESULT (s, a): It is a transition model which defines the
result of a move.
• TERMINAL-TEST (s): It defines that the game has ended and
returns true.
• UTILITY (s, p): It defines the final value with which the
game has ended. This function is also known as the Objective
function or Payoff function – the prize which the winner will
get, i.e.
• (-1): If the PLAYER loses.
• (+1): If the PLAYER wins.
• (0): If there is a draw between the PLAYERS.
• For example, in chess or tic-tac-toe, we have two or three
possible outcomes: win, lose, or draw, with values +1, -1 or 0.
• Game Tree for Tic-Tac-Toe
– Nodes: game states; Edges: moves taken by players
https://www.tutorialandexample.com/adversarial-search-in-artificial-intelligence/
Game playing and knowledge structure-
Elements of Game Playing search
• INITIAL STATE (S0): The top node in
the game-tree represents the initial
state in the tree and shows all the
possible choices from which to pick one.
• PLAYER (s): There are two
players, MAX and MIN. MAX begins
the game by picking one best move
and place X in the empty square box.
• ACTIONS (s): Both the players can
make moves in the empty boxes,
turn by turn.
• RESULT (s, a): The moves made
by MIN and MAX will decide the
outcome of the game.
• TERMINAL-TEST(s): When all the
empty boxes will be filled, it will be the
terminating state of the game.
• UTILITY: At the end, we will get to
know who wins: MAX or MIN, and
accordingly, the price will be given to
them.
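The formal elements above (S0, PLAYER, ACTIONS, RESULT, TERMINAL-TEST, UTILITY) can be sketched in code. The game here is a toy Nim variant invented for illustration (players alternately take 1 or 2 sticks; whoever takes the last stick wins) – it is not from the slides, but the function names mirror the formalism exactly:

```python
# Toy game used only to illustrate the formal elements; the names
# S0, PLAYER, ACTIONS, RESULT, TERMINAL_TEST, UTILITY mirror the slides.

S0 = (5, "MAX")           # initial state: 5 sticks, MAX to move

def PLAYER(s):
    return s[1]           # which player has the current turn

def ACTIONS(s):
    sticks, _ = s
    return [n for n in (1, 2) if n <= sticks]   # legal moves

def RESULT(s, a):
    sticks, p = s         # transition model: remove a sticks, swap turn
    return (sticks - a, "MIN" if p == "MAX" else "MAX")

def TERMINAL_TEST(s):
    return s[0] == 0      # game over when no sticks remain

def UTILITY(s, p):
    # the player who just moved took the last stick and wins
    winner = "MIN" if s[1] == "MAX" else "MAX"
    return +1 if winner == p else -1
```

For instance, `RESULT((5, "MAX"), 2)` yields `(3, "MIN")`, and a terminal state reached on MAX's move has utility +1 for MAX.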
Game as a search problem

• Types of algorithms in Adversarial search


– In a normal search, we follow a sequence of actions to
reach the goal or to finish the game optimally. But in
an adversarial search, the result depends on the
players which will decide the result of the game. It is
also obvious that the solution for the goal state will be
an optimal solution because the player will try to win
the game with the shortest path and under limited
time.
• Minimax Algorithm
• Alpha-beta Pruning
Game Playing vs. Search

• Game vs. search problem

• "Unpredictable" opponent → solution is a strategy
specifying a move for every possible opponent reply

• Time limits → unlikely to find goal, must
approximate
Game Playing

• Formal definition of a game:


– Initial state
– Successor function: returns list of (move,
state) pairs
– Terminal test: determines when game over
Terminal states: states where game ends
– Utility function (objective function or payoff
function): gives numeric value for terminal
states
We will consider games with 2 players (Max and Min);
Max moves first.
Game Tree Example:
Tic-Tac-Toe

Tree from
Max’s
perspective
Minimax Algorithm

• Minimax algorithm
– Perfect play for deterministic, 2-player game
– Max tries to maximize its score
– Min tries to minimize Max’s score
– Goal: move to position of highest minimax value
Identify best achievable payoff against best play
Minimax Algorithm

Payoff for Max


Minimax Rule

• Goal of game tree search: to determine one move for Max


player that maximizes the guaranteed payoff for a given
game tree for MAX
Regardless of the moves the MIN will take
• The value of each node (Max and MIN) is determined by (back
up from) the values of its children
• MAX plays the worst case scenario:
Always assume MIN to take moves to maximize his pay-off
(i.e., to minimize the pay-off of MAX)
• For a MAX node, the backed up value is the maximum of the
values associated with its children
• For a MIN node, the backed up value is the minimum of the
values associated with its children
Minimax procedure

1. Create start node as a MAX node with current board configuration


2. Expand nodes down to some depth (i.e., ply) of lookahead in the game.
3. Apply the evaluation function at each of the leaf nodes
4. Obtain the “back up" values for each of the non-leaf nodes from its
children by Minimax rule until a value is computed for the root node.
5. Pick the operator associated with the child node whose backed up
value determined the value at the root as the move for MAX
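The minimax procedure above can be sketched compactly. The game tree below is hypothetical, with leaves already scored by a static evaluator (integers) and internal nodes represented as lists of children; levels alternate between MAX and MIN:

```python
# Minimax on an explicit game tree: leaves are static-evaluation
# values, internal nodes are lists of children; levels alternate
# MAX / MIN starting from a MAX root.

def minimax(node, maximizing):
    if isinstance(node, int):          # leaf: apply static evaluation
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX root with three MIN children; MIN nodes back up 3, 0 and 2,
# so the MAX root backs up 3
tree = [[3, 9], [0, 7], [2, 6]]
print(minimax(tree, True))   # 3
```

The move chosen for MAX is the child whose backed-up value determined the root's value (here, the first child).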
Minimax Search

[Figure: a two-ply game tree built up step by step. A static
evaluator scores the leaves 2, 7, 1, 8; the MIN nodes back up the
minima 2 and 1 of their children; the MAX root backs up the
maximum, 2 – this is the move selected by minimax.]
Minimax Algorithm (cont’d)

[Figure: the backing-up process on a tree with leaf payoffs for
Max of 3, 9, 0, 7, 2, 6. The MIN nodes back up the minima 3, 0
and 2 of their children; the MAX root then backs up the
maximum, 3.]
Minimax Algorithm (cont’d)

• Properties of minimax algorithm:


• Complete? Yes (if tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
m – maximum depth of tree; b – branching factor
• Space complexity? O(bm) (depth-first exploration,
if it generates all successors at once)
m – maximum depth of the tree; b – legal moves
Minimax Algorithm

• Limitations
– Not always feasible to traverse entire tree
– Time limitations
• Key Improvement
– Use evaluation function instead of utility
• Evaluation function provides estimate of utility at given
position
α-β Pruning

• Can we improve search by reducing the size of


the game tree to be examined?

Yes!!! Using alpha-beta pruning

Principle
– If a move is determined worse than another move already
examined, then there is no need for further examination of the
node.
α-β Pruning Example
Alpha-Beta Pruning (αβ prune)

• Rules of Thumb

– α is the best (highest) value found so far along the path for
Max
– β is the best (lowest) value found so far along the path for Min
– Search below a MIN node may be alpha-pruned if
its β ≤ α of some MAX ancestor
– Search below a MAX node may be beta-pruned if
its α ≥ β of some MIN ancestor.
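The two rules of thumb translate directly into code. This is a minimal sketch on the same kind of explicit tree used for minimax (leaves are integers, internal nodes are lists); the example tree is chosen so that the cutoffs actually fire and the 9-valued leaves are never examined:

```python
# Alpha-beta pruning: identical result to plain minimax, but branches
# are cut off as soon as alpha >= beta.

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):          # leaf: static evaluation
        return node
    if maximizing:
        v = float("-inf")
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if alpha >= beta:          # beta cut-off at a MAX node
                break
        return v
    else:
        v = float("inf")
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            beta = min(beta, v)
            if beta <= alpha:          # alpha cut-off at a MIN node
                break
        return v

# After the first MIN node backs up 3 (root alpha = 3), the second and
# third MIN nodes are alpha-pruned on their first leaves (0 and 2),
# so the 9-valued leaves are never evaluated.
tree = [[3, 5], [0, 9], [2, 9]]
print(alphabeta(tree, float("-inf"), float("inf"), True))   # 3
```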
Alpha-Beta Pruning Example

1. Search below a MIN


node may be
alpha-pruned if the
beta value is <= to
the alpha value of
some MAX ancestor.

2. Search below a MAX
node may be
beta-pruned if the
alpha value is >= to
the beta value of
some MIN ancestor.
Alpha-Beta Pruning Example (cont’d)

[Figure: the two rules applied step by step to a two-ply tree with
leaf values 3, 5, 0, 2. The first MIN node backs up min(3, 5) = 3,
giving the MAX root α = 3. At the second and third MIN nodes, the
remaining branches are alpha-pruned as soon as their β values
(0 and 2) fall to or below that α; the root backs up 3.]
The α-β algorithm
Another Example
Example

[Figure: a four-level tree with MAX root value 5. Leaves: 5, 0, 6,
1, 3, 2, 4, 7; third-level MAX values: 5, 6, 3, 7; second-level MIN
values: 5 and 3. Rule 2 beta-prunes the branch below the 6-valued
MAX node once its α = 6 ≥ the MIN ancestor’s β = 5, and rule 1
alpha-prunes the last MAX node once the right MIN node’s β = 3 ≤
the root’s α = 5.]
Why is it called α-β?

•α is the value of the best (i.e.,


highest-value) choice found so
far at any choice point along
the path for max

•If v is worse than α, max will


avoid it

prune that branch

•Define β similarly for min


Properties of α-β Prune

• Pruning does not affect final result

• Good move ordering improves effectiveness of

pruning (e.g., in chess, try captures first, then
threats, forward moves, then backward moves…)

• With "perfect ordering," time complexity = O(b^(m/2))

doubles the depth of search that alpha-beta pruning can
explore

Example of the value of reasoning about which


computations are relevant (a form of metareasoning)
What is Game Theory?

• It deals with Bargaining.

• The whole process can be expressed


Mathematically

• Based on Behavior Theory, has a more casual


approach towards study of Human Behavior.

• It also considers how people Interact in Groups.


Game Theory Definition

•Theory of rational behavior for interactive decision problems.

• In a game, several agents strive to maximize their (expected) utility


index by choosing particular courses of action, and each agent's final
utility payoffs depend on the profile of courses of action chosen by
all agents.

•The interactive situation, specified by the set of participants, the


possible courses of action of each agent, and the set of all possible
utility payoffs, is called a game;

• the agents 'playing' a game are called the players.


Definitions

Definition: Zero-Sum Game – A game in


which the payoffs for the players always adds
up to zero is called a zero-sum game.

Definition: Maximin strategy – If we


determine the least possible payoff for each
strategy, and choose the strategy for which this
minimum payoff is largest, we have the
maximin strategy.
A Further Definition

Definition: Constant-sum and nonconstant-sum game –


If the payoffs to all players add up to the same constant,
regardless which strategies they choose, then we have a
constant-sum game. The constant may be zero or any
other number, so zero-sum games are a class of
constant-sum games. If the payoff does not add up to a
constant, but varies depending on which strategies are
chosen, then we have a non-constant sum game.
Game theory: assumptions

(1) Each decision maker has available to him two


or more well-specified choices or sequences of
choices.

(2) Every possible combination of plays available


to the players leads to a well-defined end-state
(win, loss, or draw) that terminates the game.

(3) A specified payoff for each player is associated


with each end-state.
Game theory: assumptions (Cont)

(4) Each decision maker has perfect


knowledge of the game and of his opposition.

(5) All decision makers are rational; that is,


each player, given two alternatives, will select
the one that yields him the greater payoff.
Rules, Strategies, Payoffs, and Equilibrium

⚫ A game is a contest involving two or more decision


makers, each of whom wants to win
⚫ Game theory is the study of how optimal strategies are
formulated in conflict
⚫ A player's payoff is the amount that the player wins
or loses in a particular situation in a game.
⚫ A players has a dominant strategy if that player's
best strategy does not depend on what other players
do.
⚫ A two-person game involves two parties (X and Y)
⚫ A zero-sum game means that the sum of losses for one player
must equal the sum of gains for the other. Thus, the overall sum
is zero
Rules, Strategies, Payoffs, and Equilibrium

⚫ Economic situations are treated as games.


⚫ The rules of the game state who can do what, and
when they can do it.
⚫ A player's strategy is a plan for actions in each
possible situation in the game.
⚫ Strategies taken by others can dramatically affect
the outcome of our decisions
⚫ In the auto industry, the strategies of competitors to
introduce certain models with particular features can
impact the profitability of other carmakers
Payoff Matrix - Store X

• Two competitors are planning radio and


newspaper advertisements to increase their
business. This is the payoff matrix for store X.
A negative number means store Y has a
positive payoff

Game Outcomes

Minimax Criterion
⚫ Look to the “cake cutting problem” to explain
⚫ Cutter – maximize the minimum the Chooser will
leave him
⚫ Chooser – minimize the maximum the Cutter will
get
                          Chooser:
Cutter                    Choose bigger piece     Choose smaller piece

Cut cake as evenly        Half the cake minus     Half the cake plus
as possible               a crumb                 a crumb

Make one piece bigger     Small piece             Big piece
than the other
Minimax Criterion

⚫ The game favors competitor X, since all values


are positive except one.
⚫ This means X would get a positive payoff in 3 of
the 4 strategies and Y has a positive payoff in
only 1 strategy
⚫ Since Y must play the game (do something
about the competition), he will play to minimize
total losses using the minimax criterion.
Minimax Criterion

⚫ For a two-person, zero-sum game, each person chooses


the strategy that minimizes the maximum loss or
maximize one’s minimum gains
⚫ Player Y (columns) is looking at a maximum loss of 3
under strategy Y1 and a loss of 5 under Y2
⚫ Y should choose Y1 which results in a maximum loss of 3
(minimum of 3 and 5) – minimum of the maximums (upper
value of the game)
⚫ The minimum payoffs for X (rows) are +3 (strategy X1 )
and -5 (strategy X2)
⚫ X should choose strategy X1 – the maximum of the
minimums (lower value of the game)
Minimax Criterion

⚫ If the upper and lower values are the same, the number is called the
value of the game and an equilibrium or saddle point condition exists
⚫ The value of a game is the average or expected game outcome if the game
is played an infinite number of times
⚫ A saddle point indicates that each player has a pure strategy i.e., the
strategy is followed no matter what the opponent does
Saddle Point

• Von Neumann likened the solution point to the point


in the middle of a saddle shaped mountain pass
– It is, at the same time, the maximum elevation reached by
a traveler going through the pass to get to the other side
and the minimum elevation encountered by a mountain
goat traveling the crest of the range

Pure Strategy - Minimax Criterion

                          Player Y’s Strategies     Minimum Row
                            Y1          Y2          Number

Player X’s strategies
  X1                        10           6             6
  X2                       -12           2           -12

Maximum Column Number       10           6
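The saddle-point test on this matrix can be checked in a few lines. This is a sketch of the maximin/minimax computation for the row player X's payoffs, using the 2×2 matrix above:

```python
# Pure-strategy saddle-point check: lower value = max of row minima
# (X's maximin), upper value = min of column maxima (Y's minimax).

payoff = [[10, 6], [-12, 2]]   # X's payoffs; rows X1, X2; columns Y1, Y2

row_minima = [min(row) for row in payoff]          # [6, -12]
col_maxima = [max(col) for col in zip(*payoff)]    # [10, 6]

lower = max(row_minima)   # lower value of the game (X plays X1)
upper = min(col_maxima)   # upper value of the game (Y plays Y2)

if lower == upper:
    print("saddle point; value of the game =", lower)   # 6
```

Because the lower and upper values coincide at 6, the game has a saddle point and both players have a pure strategy (X1, Y2).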

Mixed Strategy Game

⚫ When there is no saddle point, players will play each strategy for a
certain percentage of the time
⚫ The most common way to solve a mixed strategy is to use the expected
gain or loss approach
⚫ A player plays each strategy a particular percentage of the time so that the
expected value of the game does not depend upon what the opponent
does
                  Y1              Y2              Expected Gain
                  (P)             (1-P)
  X1 (Q)           4               2              4P + 2(1-P)
  X2 (1-Q)         1              10              1P + 10(1-P)
Expected Loss    4Q + 1(1-Q)     2Q + 10(1-Q)
Mixed Strategy Game
: Solving for P & Q

4P + 2(1-P) = 1P + 10(1-P)
or: P = 8/11 and 1-P = 3/11
Expected payoff:
1P + 10(1-P)
= 1(8/11) + 10(3/11)
EPX = 3.46

4Q + 1(1-Q) = 2Q + 10(1-Q)
or: Q = 9/11 and 1-Q = 2/11
Expected payoff:
EPY = 3.46
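The algebra above generalizes: for any 2×2 mixed-strategy game, equalizing the two expected payoffs gives closed-form expressions for P and Q. A sketch with exact rational arithmetic, using the slide's matrix:

```python
from fractions import Fraction as F

# X's payoff matrix from the slide: rows X1, X2; columns Y1, Y2
M = [[F(4), F(2)],
     [F(1), F(10)]]

# Equalize X's row expectations 4P + 2(1-P) = 1P + 10(1-P) for P
# (Y's probability of Y1), and Y's column expectations for Q
# (X's probability of X1). Both share the same denominator.
den = M[0][0] - M[0][1] - M[1][0] + M[1][1]      # 11
P = (M[1][1] - M[0][1]) / den                     # 8/11
Q = (M[1][1] - M[1][0]) / den                     # 9/11

value = M[0][0] * P + M[0][1] * (1 - P)           # 38/11 ≈ 3.46
print(P, Q, value)
```

`value` equals 38/11 ≈ 3.46, matching EPX = EPY above: neither player's expected payoff depends on what the opponent does.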

Mixed Strategy Game : Example

• Using the solution procedure for a mixed


strategy game, solve the following game

Mixed Strategy Game
Example
• This game can be solved by setting up the
mixed strategy table and developing the
appropriate equations:

Mixed Strategy Game: Example

Two-Person Zero-Sum and Constant-Sum
Games
Two-person zero-sum and constant-sum games are played according to
the following basic assumption:

Each player chooses a strategy that enables him/her to do the best he/she
can, given that his/her opponent knows the strategy he/she is following.

A two-person zero-sum game has a saddle point if and only if


max (row minimum) = min (column maximum)          (1)
all rows            all columns
Two-Person Zero-Sum and Constant-Sum
Games (Cont)

If a two-person zero-sum or constant-sum game has a saddle point, the row


player should choose any strategy (row) attaining the maximum on the right
side of (1). The column player should choose any strategy (column) attaining
the minimum on the right side of (1).
In general, we may use the following method to find the optimal strategies and
value of two-person zero-sum or constant-sum game:

Step 1 Check for a saddle point. If the game has none, go on to step 2.
Two-Person Zero-Sum and Constant-Sum
Games (Cont)

Step 2 Eliminate any of the row player’s dominated strategies. Looking at


the reduced matrix (dominated rows crossed out), eliminate any of the
column player’s dominated strategies and then those of the row player.
Continue until no more dominated strategies can be found. Then proceed to
step 3.

Step 3 If the game matrix is now 2 x 2, solve the game graphically.


Otherwise, solve by using a linear programming method.
Zero Sum Games

• Game theory assumes that the decision maker and the


opponent are rational, and that they subscribe to the
maximin criterion as the decision rule for selecting
their strategy
• This is often reasonable when the other player is an
opponent out to maximize his/her own gains, e.g.
competitor for the same customers.
• Consider:
Player 1 with three strategies S1, S2, and S3 and Player
2 with four strategies OP1, OP2, OP3, and OP4.
Zero Sum Games (Cont)

• The value 4 achieved by both players is


called the value of the game
• The intersection of S2 and OP2 is called a
saddle point. A game with a saddle point is
also called a game with an equilibrium
solution.
• At the saddle point, neither player can
improve their payoff by switching strategies
Zero Sum Games- To do problem!

Let’s take the following example: Two TV channels (1 and 2) are competing
for an audience of 100 viewers. The rule of the game is to simultaneously
announce the type of show the channels will broadcast. Given the payoff
matrix below, what type of show should channel 1 air?
Two-person zero-sum game – Dominance
property

Dominance Method Steps (Rules)

Step-1: If all the elements of Column-i are greater than or equal to the corresponding
elements of any other Column-j, then Column-i is dominated by Column-j and it
is removed from the matrix.
e.g. If Column-2 ≥ Column-4, then remove Column-2
Step-2: If all the elements of Row-i are less than or equal to the corresponding elements of
any other Row-j, then Row-i is dominated by Row-j and it is removed from the
matrix.
e.g. If Row-3 ≤ Row-4, then remove Row-3
Step-3: Repeat Step-1 & Step-2 as long as any Row or Column is dominated; otherwise stop
the procedure.
Two-person zero-sum game – Dominance
property- To do problem!

Player A \ Player B     B1    B2    B3    B4
  A1                     3     5     4     2
  A2                     5     6     2     4
  A3                     2     1     4     0
  A4                     3     3     5     2

Solution (after eliminating dominated rows and columns):

Player A \ Player B     B3    B4
  A2                     2     4
  A4                     5     2
The Prisoner’s Dilemma

•The prisoner’s dilemma is a universal concept. Theorists now realize that


prisoner’s dilemmas occur in biology, psychology, sociology, economics, and
law.
•The prisoner’s dilemma is apt to turn up anywhere a conflict of interests exists
-- and the conflict need not be among sentient beings.
• Study of the prisoner’s dilemma has great power for explaining why animal
and human societies are organized as they are. It is one of the great ideas of
the twentieth century, simple enough for anyone to grasp and of fundamental
importance (...).
• The prisoner’s dilemma has become one of the premier philosophical and
scientific issues of our time. It is tied to our very survival (W. Poundstone,1992,
p. 9).
Prisoner’s Dilemma

• Two members of a criminal gang are arrested and


imprisoned.
– They are placed under solitary confinement and have no chance of
communicating with each other

• The district attorney would like to charge them


with a recent major crime but has insufficient
evidence
– He has sufficient evidence to convict each of them of a lesser charge
– If he obtains a confession from one or both the criminals, he can convict
either or both on the major charge.
Prisoner’s Dilemma

• The district attorney offers each the chance to turn


state’s evidence.

– If only one prisoner turns state’s evidence and testifies against his partner he
will go free while the other will receive a 3 year sentence.
– Each prisoner knows the other has the same offer
– The catch is that if both turn state’s evidence, they each receive a 2 year
sentence
– If both refuse, each will be imprisoned for 1 year on the lesser charge
A game is described by

• The number of players


• Their strategies and their turn
• Their payoffs (profits, utilities etc) at the outcomes of the
game
Payoff matrix
Game Theory Definition
Normal- or strategic form

                        Player B
                   Left         Right

Player A  Top      3, 0         0, -4
          Bottom   2, 4         -1, 3
How to solve a situation like this?

• The most simple case is where there is a optimal choice of


strategy no matter what the other players do; dominant
strategies.
• Explanation: For Player A it is always better to choose Top, for
Player B it is always better to choose left.
• A dominant strategy is a strategy that is best no matter what
the other player does.
Nash equilibrium

• If Player A’s choice is optimal given Player B’s choice, and B’s
choice is optimal given A’s choice, a pair of strategies is a
Nash equilibrium.
• When the other players’ choice is revealed neither player like
to change her behavior.
• If a set of strategies are best responses to each other, the
strategy set is a Nash equilibrium.
Payoff matrix
Normal- or strategic form

                        Player B
                   Left         Right

Player A  Top      1, 1         2, 3*
          Bottom   2, 3*        1, 2
Solution

• Here you can find a Nash equilibrium; Top is the best


response to Right and Right is the best response to Top.
Hence, (Top, Right) is a Nash equilibrium.
• But there are two problems with this solution concept.
Problems

• A game can have several Nash equilibriums. In this case also


(Bottom, Left).
• There may not be a Nash equilibrium (in pure strategies).
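Both pure Nash equilibria of the game above can be found by brute force: a cell is an equilibrium exactly when each payoff is a best response to the other player's choice. A sketch:

```python
# Brute-force pure-strategy Nash equilibria of a 2-player game.
# payoffs[r][c] = (payoff to A, payoff to B).

def pure_nash(payoffs):
    R, C = len(payoffs), len(payoffs[0])
    eq = []
    for r in range(R):
        for c in range(C):
            a, b = payoffs[r][c]
            best_a = all(payoffs[r2][c][0] <= a for r2 in range(R))
            best_b = all(payoffs[r][c2][1] <= b for c2 in range(C))
            if best_a and best_b:          # mutual best responses
                eq.append((r, c))
    return eq

# rows: Top, Bottom; columns: Left, Right (the matrix above)
game = [[(1, 1), (2, 3)],
        [(2, 3), (1, 2)]]
print(pure_nash(game))   # [(0, 1), (1, 0)] → (Top, Right) and (Bottom, Left)
```

The two equilibria returned are exactly (Top, Right) and (Bottom, Left), illustrating the multiplicity problem; on a game with no pure equilibrium the function returns an empty list.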
Payoff matrix
Normal- or strategic form

                        Player B
                   Left         Right

Player A  Top      1, -1        -1, 1
          Bottom   -1, 1        1, -1
Nash equilibrium in mixed strategies

• Here it is not possible to find strategies that are best


responses to each other.
• If players are allowed to randomize their strategies we can
find a solution; a Nash equilibrium in mixed strategies.
• An equilibrium in which each player chooses the optimal
frequency with which to play her strategies given the
frequency choices of the other agents.
The prisoner’s dilemma

Two persons have committed a crime, they are held in


separate rooms. If they both confess they will serve
two years in jail. If only one confess she will be free
and the other will get the double time in jail. If both
deny they will be hold for one year.
Prisoner’s dilemma
Normal- or strategic form

Prisoner B

Confess Deny
Prisoner A

Confess -2, -2 0, -4

Deny -4, 0 -1, -1*

Solution
Confess is a dominant strategy for both. If both
Deny they would be better off. This is the
dilemma.
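The dilemma can be verified mechanically: Confess is strictly better for a prisoner no matter what the other does. A sketch over the payoff matrix above (payoffs are negated years in jail):

```python
# Prisoner's dilemma payoffs: pd[(A's move, B's move)] = (to A, to B)
pd = {("C", "C"): (-2, -2), ("C", "D"): (0, -4),
      ("D", "C"): (-4, 0),  ("D", "D"): (-1, -1)}

# Confess ("C") strictly dominates Deny ("D") for prisoner A:
# better payoff against either choice by B.
confess_dominates = all(pd[("C", other)][0] > pd[("D", other)][0]
                        for other in ("C", "D"))
print(confess_dominates)   # True
```

By symmetry the same holds for B, so (Confess, Confess) is the equilibrium even though (Deny, Deny) would leave both better off.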
Nash Equilibrium – To do Problems!

HENRY McD (1)


L R L R
JANE U 8,7 4,6 KFC U 9,9* 1,10
D 6,5 7,8 (1)
D 10,1 2,2

COKE
L R B
PEPSI U 6,8* 4,7 L R
A U 7,6* 5,5
D 7,6 3,7
D 4,5 6,4
GAME PLAYING & MECHANISM DESIGN

Mechanism Design is the design of games or


reverse engineering of games; could be called
Game Engineering

Involves inducing a game among the players


such that in some equilibrium of the game,
a desired social choice function is implemented
GAME PLAYING & MECHANISM DESIGN

Mother
Social Planner
Mechanism Designer

Kid 1 Kid 2
Rational and Rational and
Intelligent Intelligent
Example 1: Mechanism Design
Fair Division of a Cake
GAME PLAYING & MECHANISM DESIGN

Tenali Rama
(Birbal)
Mechanism Designer

Baby
Mother 1 Mother 2
Rational and Rational and
Intelligent Player Intelligent Player

Example 2: Mechanism Design


Truth Elicitation through an Indirect Mechanism
GAME PLAYING & MECHANISM DESIGN

One Seller, Multiple Buyers, Single Indivisible Item

Example: B1: 40, B2: 45, B3: 60, B4: 80

Winner: whoever bids the highest; in this case B4

Payment: Second Highest Bid: in this case, 60.

Vickrey showed that this mechanism is Dominant Strategy


Incentive Compatible (DSIC) ;Truth Revelation is good for
a player irrespective of what other players report
MECHANISM DESIGN: EXAMPLE 3 : VICKREY AUCTION
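The Vickrey rule on the example bids can be sketched in a few lines; the winner is the highest bidder, and the price charged is the second-highest bid:

```python
# Vickrey (second-price sealed-bid) auction: highest bidder wins,
# pays the second-highest bid.

def vickrey(bids):
    ranked = sorted(bids, key=bids.get, reverse=True)   # bidders by bid
    winner = ranked[0]
    price = bids[ranked[1]]          # second-highest bid
    return winner, price

bids = {"B1": 40, "B2": 45, "B3": 60, "B4": 80}
print(vickrey(bids))   # ('B4', 60)
```

B4 wins but pays 60 rather than his own bid of 80 – the payment does not depend on the winner's bid, which is why truthful bidding is a dominant strategy (DSIC).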
GAME PLAYING & MECHANISM DESIGN

Four Basic Types of Auctions

[Figure: English (ascending) auction – buyers bid upward 0, 10, 20,
30, 40, 45, 50, 55, 58, 60, stop. Dutch (descending) auction – the
auctioneer lowers the price 100, 90, 85, 75, 70, 65, 60, stop.
First-price sealed bid – bids 40, 50, 55, 60; winner is buyer 4 at
price 60 (his own bid). Vickrey (second-price sealed bid) – bids
40, 45, 60, 80; winner is buyer 4 at price 60, the second-highest
bid.]


END
UNIT-2
17-03-2021 18CSC305J_AI_UNIT3 1
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-Knowledge
base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-Propositional
logic- Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-Knowledge
representation using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief network
• Probabilistic reasoning-Probabilistic reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster -shafer theory

Knowledge Representation & Reasoning
• The second most important concept in AI
• If we are going to act rationally in our environment, then we must have some way of
describing that environment and drawing inferences from that representation.
• how do we describe what we know about the world ?
• how do we describe it concisely ?
• how do we describe it so that we can get hold of the right piece of knowledge when
we need it ?
• how do we generate new pieces of knowledge ?
• how do we deal with uncertain knowledge ?
Knowledge Representation & Reasoning
Knowledge

Declarative Procedural
• Declarative knowledge deals with factoid questions (what is the capital of
India? Etc.)
• Procedural knowledge deals with “How”
• Procedural knowledge can be embedded in declarative knowledge
Planning
Given a set of goals, construct a sequence of actions that achieves
those goals:
• often very large search space
• but most parts of the world are independent of most other
parts
• often start with goals and connect them to actions
• no necessary connection between order of planning and order
of execution
• what happens if the world changes as we execute the plan
and/or our actions don’t produce the expected results?
Learning

• If a system is going to act truly appropriately, then it must


be able to change its actions in the light of experience:
• how do we generate new facts from old ?
• how do we generate new concepts ?
• how do we learn to distinguish different situations in
new environments ?
What is knowledge representation?

•Knowledge representation and reasoning (KR, KRR) is the part of Artificial


intelligence which concerned with AI agents thinking and how thinking contributes
to intelligent behavior of agents.
•It is responsible for representing information about the real world so that a
computer can understand and can utilize this knowledge to solve the complex real
world problems such as diagnosis a medical condition or communicating with
humans in natural language.
•It is also a way which describes how we can represent knowledge in artificial
intelligence. Knowledge representation is not just storing data into some database,
but it also enables an intelligent machine to learn from that knowledge and
experiences so that it can behave intelligently like a human.
What to Represent?
Following are the kind of knowledge which needs to be represented in AI systems:
•Object: All the facts about objects in our world domain. E.g., Guitars contains
strings, trumpets are brass instruments.
•Events: Events are the actions which occur in our world.
•Performance: It describe behavior which involves knowledge about how to do
things.
•Meta-knowledge: It is knowledge about what we know.
•Facts: Facts are the truths about the real world and what we represent.
•Knowledge-Base: The central component of the knowledge-based agents is the
knowledge base. It is represented as KB. The Knowledgebase is a group of the
Sentences (Here, sentences are used as a technical term and not identical with the
English language).
Approaches to knowledge Representation
• Representational Adequacy: - the ability to represent all of the kinds of
knowledge that are needed in that domain.
• Inferential Adequacy: - the ability to manipulate the representational structures
in such a way as to derive new structures corresponding to new knowledge
inferred from old.
• Inferential Efficiency: - the ability to incorporate into the knowledge structure
additional information that can be used to focus the attention of the inference
mechanism in the most promising directions.
• Acquisitional Efficiency: - the ability to acquire new information easily. The
simplest case involves direct insertion by a person of new knowledge into the
database.
Knowledge Representation Issues
• It becomes clear that particular knowledge representation models allow for more specific more powerful
problem solving mechanisms that operate on them.
• Examine specific techniques that can be used for representing & manipulating knowledge within programs.
• Representation & Mapping
• Facts :- truths in some relevant world
• These are the things we want to represent.
• Representations of facts in some chosen formalism.
• Things we are actually manipulating. Structuring these entities is as two levels.
• The knowledge level, at which facts including each agent’s behavior & current goals are described.

[Figure: the mapping between Facts and Internal Representations –
English understanding maps English sentences into internal
representations, and English generation maps internal
representations back into English.]
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-Knowledge
base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-Propositional
logic- Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-Knowledge
representation using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief network
• Probabilistic reasoning-Probabilistic reasoning over time-Probabilistic reasoning over
time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster-Shafer theory
A KNOWLEDGE-BASED AGENT
• A knowledge-based agent includes a knowledge base and an inference system.
• A knowledge base is a set of representations of facts of the world.
• Each individual representation is called a sentence.
• The sentences are expressed in a knowledge representation language.
• The agent operates as follows:
1. It TELLs the knowledge base what it perceives.
2. It ASKs the knowledge base what action it should perform.
3. It performs the chosen action.
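The TELL/ASK loop above can be sketched in Python. This is a minimal illustration, not a full inference system: the knowledge base is just a set of time-stamped percept facts, and the ASK policy (grab on glitter, otherwise move forward) is an assumed toy rule, not from the slides.

```python
# Minimal sketch of a knowledge-based agent's TELL/ASK operating loop.
# All names and the trivial action policy are illustrative assumptions.

class KnowledgeBasedAgent:
    def __init__(self):
        self.kb = set()          # knowledge base: a set of (time, percept) facts
        self.t = 0               # current time step

    def tell(self, percept):
        """TELL the knowledge base what the agent perceives."""
        self.kb.add((self.t, percept))

    def ask(self):
        """ASK the knowledge base what action to perform (toy policy)."""
        latest = {p for (t, p) in self.kb if t == self.t}
        return "grab" if "glitter" in latest else "forward"

    def step(self, percept):
        self.tell(percept)       # 1. TELL the KB what it perceives
        action = self.ask()      # 2. ASK the KB what action to perform
        self.t += 1
        return action            # 3. perform the chosen action

agent = KnowledgeBasedAgent()
print(agent.step("glitter"))   # grab
print(agent.step("breeze"))    # forward
```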
Requirements for a Knowledge-Based Agent
1. "what it already knows" [McCarthy '59]
A knowledge base of beliefs.
2. "it must first be capable of being told" [McCarthy '59]
A way to put new beliefs into the knowledge base.
3. "automatically deduces for itself a sufficiently wide class of
immediate consequences" [McCarthy '59]
A reasoning mechanism to derive new beliefs from ones already
in the knowledge base.
ARCHITECTURE OF A KNOWLEDGE-BASED
AGENT
• Knowledge Level.
• The most abstract level: describe agent by saying what it knows.
• Example: A taxi agent might know that the Golden Gate Bridge connects San
Francisco with the Marin County.
• Logical Level.
• The level at which the knowledge is encoded into sentences.
• Example: Links(GoldenGateBridge, SanFrancisco, MarinCounty).
• Implementation Level.
• The physical representation of the sentences in the logical level.
• Example: ‘(links goldengatebridge sanfrancisco marincounty)
THE WUMPUS WORLD ENVIRONMENT
• The Wumpus computer game
• The agent explores a cave consisting of rooms connected by passageways.
• Lurking somewhere in the cave is the Wumpus, a beast that eats any agent that
enters its room.
• Some rooms contain bottomless pits that trap any agent that wanders into the
room.
• Occasionally, there is a heap of gold in a room.
• The goal is to collect the gold and exit the world without being eaten
A TYPICAL WUMPUS WORLD
• The agent always starts in the
field [1,1].
• The task of the agent is to
find the gold, return to the
field [1,1] and climb out of
the cave.

17-03-2021
16 18CSC305J_AI_UNIT3
AGENT IN A WUMPUS WORLD: PERCEPTS
• The agent perceives
• a stench in the square containing the Wumpus and in the adjacent squares (not
diagonally)
• a breeze in the squares adjacent to a pit
• a glitter in the square where the gold is
• a bump, if it walks into a wall
• a woeful scream everywhere in the cave, if the wumpus is killed
• The percepts are given as a five-symbol list. If there is a stench and a breeze, but no
glitter, no bump, and no scream, the percept is
[Stench, Breeze, None, None, None]
WUMPUS WORLD ACTIONS
• go forward
• turn right 90 degrees
• turn left 90 degrees
• grab: Pick up an object that is in the same square as the agent
• shoot: Fire an arrow in a straight line in the direction the agent is facing. The arrow
continues until it either hits and kills the wumpus or hits the outer wall. The agent
has only one arrow, so only the first Shoot action has any effect
• climb is used to leave the cave. This action is only effective in the start square
• die: This action automatically and irretrievably happens if the agent enters a square
with a pit or a live wumpus
ILLUSTRATIVE EXAMPLE: WUMPUS WORLD
•Performance measure
• gold +1000,
• death -1000
(falling into a pit or being eaten by the wumpus)
• -1 per step, -10 for using the arrow
•Environment
• Rooms / squares connected by doors.
• Squares adjacent to wumpus are smelly
• Squares adjacent to pit are breezy
• Glitter iff gold is in the same square
• Shooting kills wumpus if you are facing it
• Shooting uses up the only arrow
• Grabbing picks up gold if in same square
• Releasing drops the gold in same square
• Randomly generated at start of game. Wumpus only senses current room.
•Sensors: Stench, Breeze, Glitter, Bump, Scream [perceptual inputs]
•Actuators: Left turn, Right turn, Forward, Grab, Release, Shoot
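A hedged sketch of the percept rules above in Python: using only the rule that pits make adjacent squares breezy (and the Wumpus makes them smelly), a square with no stench and no breeze lets the agent mark all of its neighbours as OK. The 4×4 bounds and function names are illustrative assumptions.

```python
# Sketch: inferring provisionally safe squares from a single percept in a
# 4x4 Wumpus world. Grid coordinates are 1-based; names are illustrative.

def adjacent(square):
    """Neighbours of a square inside the 4x4 grid (no diagonals)."""
    x, y = square
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 1 <= x + dx <= 4 and 1 <= y + dy <= 4]

def safe_neighbours(square, percept):
    """percept is the 5-tuple (stench, breeze, glitter, bump, scream)."""
    stench, breeze, *_ = percept
    if not stench and not breeze:
        return set(adjacent(square))   # no pit or Wumpus can be adjacent
    return set()                       # nothing can be concluded yet

# In [1,1] with percept [None, None, None, None, None]:
print(sorted(safe_neighbours((1, 1), (False, False, False, False, False))))
# -> [(1, 2), (2, 1)]
```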
WUMPUS WORLD CHARACTERIZATION
Fully observable? No – only local perception
Deterministic? Yes – outcomes are exactly specified
Static? Yes – the Wumpus and the pits do not move
Discrete? Yes
Single-agent? Yes – the Wumpus is essentially a "natural feature."
EXPLORING A WUMPUS WORLD
The knowledge base of the agent
consists of the rules of the
Wumpus world plus the percept
“nothing” in [1,1]
Boolean percept feature values: <0, 0, 0, 0, 0>, i.e.,
[None, None, None, None, None] for [Stench, Breeze, Glitter, Bump, Scream].
EXPLORING A WUMPUS WORLD
T = 0: The KB of the agent consists of the rules of the Wumpus world plus the
percept "nothing" in [1,1]: [None, None, None, None, None] for
[Stench, Breeze, Glitter, Bump, Scream].
By inference, the agent's knowledge base also has the information that
[2,1] and [1,2] are okay; these are added as propositions.
This is the world "known" to the agent at time = 0.
EXPLORING A WUMPUS WORLD
T = 0 → T = 1: the agent has moved and now perceives
[None, Breeze, None, None, None] for [Stench, Breeze, Glitter, Bump, Scream].
[Grid diagram legend: A – agent, V – visited, B – breeze, P? – possible pit.]
Where next at T = 1? What follows? Pit(2,2) or Pit(3,1).
EXPLORING A WUMPUS WORLD
T = 3: [4×4 grid diagram: W – Wumpus, S – stench, P – pit, P? – possible pit.]
The percept is [Stench, None, None, None, None] for
[Stench, Breeze, Glitter, Bump, Scream].
Where is the Wumpus? The Wumpus cannot be in (1,1) or in (2,2) (why?) ➔ the Wumpus is in (1,3).
No breeze in (1,2) ➔ no pit in (2,2); but we know there is a
pit in (2,2) or (3,1) ➔ pit in (3,1).
EXPLORING A WUMPUS WORLD
We reasoned about the possible states the Wumpus world can be in,
given our percepts and our knowledge of the rules of the Wumpus
world, i.e., the content of the KB at T = 3.
What follows is what holds true in all those worlds that satisfy what is
known at that time T = 3 about the particular Wumpus world we are in.
Example property: P_in_(3,1)
Models(KB) ⊆ Models(P_in_(3,1))
Essence of logical reasoning:
Given all we know, Pit_in_(3,1) holds.
("The world cannot be different.")
NO INDEPENDENT ACCESS TO THE WORLD
• The reasoning agent often gets its knowledge about the facts of the world as a sequence of
logical sentences and must draw conclusions only from them without independent access to
the world.
• Thus it is very important that the agent’s reasoning is sound!
SUMMARY OF KNOWLEDGE BASED AGENTS
• Intelligent agents need knowledge about the world for making good decisions.
• The knowledge of an agent is stored in a knowledge base in the form of sentences in a
knowledge representation language.
• A knowledge-based agent needs a knowledge base and an inference mechanism. It
operates by storing sentences in its knowledge base, inferring new sentences with the
inference mechanism, and using them to deduce which actions to take.
• A representation language is defined by its syntax and semantics, which specify the
structure of sentences and how they relate to the facts of the world.
• The interpretation of a sentence is the fact to which it refers. If this fact is part of the
actual world, then the sentence is true.
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-Knowledge base
agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-Propositional logic-
Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-Knowledge representation
using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief network
• Probabilistic reasoning-Probabilistic reasoning over time-Probabilistic reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster-Shafer theory
What is a Logic?
• A language with concrete rules
• No ambiguity in representation (may be other errors!)
• Allows unambiguous communication and processing
• Very unlike natural languages e.g. English
• Many ways to translate between languages
• A statement can be represented in different logics
• And perhaps differently in same logic
• Expressiveness of a logic
• How much can we say in this language?
• Not to be confused with logical reasoning
• Logics are languages, reasoning is a process (may use logic)
Syntax and Semantics
• Syntax
• Rules for constructing legal sentences in the logic
• Which symbols we can use (English: letters, punctuation)
• How we are allowed to combine symbols
• Semantics
• How we interpret (read) sentences in the logic
• Assigns a meaning to each sentence
• Example: “All lecturers are seven foot tall”
• A valid sentence (syntax)
• And we can understand the meaning (semantics)
• This sentence happens to be false (there is a counterexample)
Propositional Logic
• Syntax
• Propositions, e.g. “it is wet”
• Connectives: and (∧), or (∨), not (¬), implies (→), iff (↔, equivalent)
• Brackets, T (true) and F (false)
• Semantics (Classical AKA Boolean)
• Define how connectives affect truth
• “P and Q” is true if and only if P is true and Q is true
• Use truth tables to work out the truth of statements
Predicate Logic
• Propositional logic combines atoms
• An atom contains no propositional connectives
• Have no structure (today_is_wet, john_likes_apples)
• Predicates allow us to talk about objects
• Properties: is_wet(today)
• Relations: likes(john, apples)
• True or false
• In predicate logic each atom is a predicate
• e.g. first order logic, higher-order logic
First Order Logic
• More expressive logic than propositional
• Used in this course (Lecture 6 on representation in FOL)
• Constants are objects: john, apples
• Predicates are properties and relations:
• likes(john, apples)
• Functions transform objects:
• likes(john, fruit_of(apple_tree))
• Variables represent any object: likes(X, apples)
• Quantifiers qualify values of variables
• True for all objects (universal): ∀X. likes(X, apples)
• Exists at least one object (existential): ∃X. likes(X, apples)
Example: FOL Sentence
• “Every rose has a thorn”
• For all X
• if (X is a rose)
• then there exists Y
• (X has Y) and (Y is a thorn)
• One possible formalization: ∀X (rose(X) → ∃Y (has(X, Y) ∧ thorn(Y)))
Example: FOL Sentence
• “On Mondays and Wednesdays I go to John’s house for dinner”
⚫ One possible formalization: ∀X ((monday(X) ∨ wednesday(X)) → go(me, johns_house, X))
⚫ Note the change from "and" (in English) to "or" (in the logic)
– Translating between English and logic is problematic
Higher Order Logic
• More expressive than first order
• Functions and predicates are also objects
• Described by predicates: binary(addition)
• Transformed by functions: differentiate(square)
• Can quantify over both
• E.g. define red functions as having zero at 17
• Much harder to reason with
Beyond True and False
• Multi-valued logics
• More than two truth values
• e.g., true, false & unknown
• Fuzzy logic uses probabilities, truth value in [0,1]
• Modal logics
• Modal operators define mode for propositions
• Epistemic logics (belief)
• e.g. p (necessarily p), p (possibly p), …
• Temporal logics (time)
• e.g. p (always p), p (eventually p), …
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-
Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-
Propositional logic- Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-Knowledge
representation using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief
network
• Probabilistic reasoning-Probabilistic reasoning over time-Probabilistic
reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster-Shafer theory
Propositional logic
• Propositional logic consists of:
• The logical values true and false (T and F)
• Propositions: “Sentences,” which
• Are atomic (that is, they must be treated as indivisible units, with
no internal structure), and
• Have a single logical value, either true or false
• Operators, both unary and binary; when applied to logical values, yield
logical values
• The usual operators are and, or, not, and implies
Truth tables
• Logic, like arithmetic, has operators, which apply to one, two, or more
values (operands)
• A truth table lists the results for each possible arrangement of operands
• Order is important: x op y may or may not give the same result as y op x
• The rows in a truth table list all possible sequences of truth values for n
operands, and specify a result for each sequence
• Hence, there are 2n rows in a truth table for n operands
Unary operators
• There are four possible unary operators:

X | Identity (X)        X | Constant true (T)
T |     T               T |      T
F |     F               F |      T

X | Negation (¬X)       X | Constant false (F)
T |     F               T |      F
F |     T               F |      F

• Only the last of these (negation) is widely used (and has a symbol, ¬, for the operation)
Combined tables for unary operators

X Constant T Constant F Identity ¬X


T T F T F
F T F F T
Binary operators
• There are sixteen possible binary operators:

X Y
T T T T T T T T T T F F F F F F F F
T F T T T T F F F F T T T T F F F F
F T T T F F T T F F T T F F T T F F
F F T F T F T F T F T F T F T F T F

• All these operators have names, but I haven’t tried to fit them in
• Only a few of these operators are normally used in logic
Useful binary operators
• Here are the binary operators that are traditionally used:

AND OR IMPLIES BICONDITIONAL


X Y XY XY XY XY
T T T T T T
T F F T F F
F T F T T F
F F F F T T

• Notice in particular that material implication (→) only approximately means the same as the
English word "implies"
• All the other operators can be constructed from a combination of these (along with unary
not, ¬)
Logical expressions
• All logical expressions can be computed with some combination of and (∧),
or (∨), and not (¬) operators
• For example, logical implication can be computed this way:

X Y | ¬X | ¬X ∨ Y | X→Y
T T |  F |   T    |  T
T F |  F |   F    |  F
F T |  T |   T    |  T
F F |  T |   T    |  T

• Notice that X → Y is equivalent to ¬X ∨ Y
Another example
• Exclusive or (xor) is true if exactly one of its operands is true

X Y | ¬X ¬Y | X∧¬Y | ¬X∧Y | (X∧¬Y)∨(¬X∧Y) | X xor Y
T T |  F  F |  F   |  F   |       F       |    F
T F |  F  T |  T   |  F   |       T       |    T
F T |  T  F |  F   |  T   |       T       |    T
F F |  T  T |  F   |  F   |       F       |    F

• Notice that (X∧¬Y)∨(¬X∧Y) is equivalent to X xor Y
World
• A world is a collection of propositions and logical expressions relating those
propositions
• Example:
• Propositions: JohnLovesMary, MaryIsFemale, MaryIsRich
• Expressions:
MaryIsFemale ∧ MaryIsRich → JohnLovesMary
• A proposition "says something" about the world, but since it is atomic (you
can't look inside it to see component parts), propositions tend to be very
specialized and inflexible
Models
A model is an assignment of a truth value to each proposition, for example:
• JohnLovesMary: T, MaryIsFemale: T, MaryIsRich: F
• An expression is satisfiable if there is a model for which the expression is true
• For example, the above model satisfies the expression
MaryIsFemale ∧ MaryIsRich → JohnLovesMary
• An expression is valid if it is satisfied by every model
• This expression is not valid:
MaryIsFemale ∧ MaryIsRich → JohnLovesMary
because it is not satisfied by this model:
JohnLovesMary: F, MaryIsFemale: T, MaryIsRich: T
• But this expression is valid:
MaryIsFemale ∧ MaryIsRich → MaryIsFemale
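The model-enumeration idea above is easy to sketch: list every assignment of truth values, then test satisfiability with any() and validity with all(). Encoding the example expression as a Python lambda is an illustrative choice.

```python
from itertools import product

# A model assigns True/False to each proposition; an expression is
# satisfiable if some model makes it true, valid if every model does.
def models(props):
    for values in product([True, False], repeat=len(props)):
        yield dict(zip(props, values))

props = ["JohnLovesMary", "MaryIsFemale", "MaryIsRich"]
# MaryIsFemale ∧ MaryIsRich → JohnLovesMary, written as ¬(F ∧ R) ∨ J:
expr = lambda m: not (m["MaryIsFemale"] and m["MaryIsRich"]) or m["JohnLovesMary"]

satisfiable = any(expr(m) for m in models(props))
valid = all(expr(m) for m in models(props))
print(satisfiable, valid)   # True False
```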
Inference rules in propositional logic
• Here are just a few of the rules you can apply when reasoning in propositional logic:
modus ponens (from X and X → Y, conclude Y), modus tollens (from ¬Y and X → Y,
conclude ¬X), and-elimination, and-introduction, or-introduction, and
double-negation elimination.
Implication elimination
• A particularly important rule allows you to get rid of
the implication operator, →:
• X → Y ≡ ¬X ∨ Y
• We will use this later on as a necessary tool for
simplifying logical expressions
• The symbol ≡ means "is logically equivalent to"
Conjunction elimination
• Another important rule for simplifying logical expressions
allows you to get rid of the conjunction (and) operator, ∧:
• This rule simply says that if you have an and operator at the
top level of a fact (logical expression), you can break the
expression up into two separate facts:
• MaryIsFemale ∧ MaryIsRich
• becomes:
• MaryIsFemale
• MaryIsRich
Inference by computer
• To do inference (reasoning) by computer is basically a search process,
taking logical expressions and applying inference rules to them
• Which logical expressions to use?
• Which inference rules to apply?
• Usually you are trying to “prove” some particular statement
• Example:
• it_is_raining ∨ it_is_sunny
• it_is_sunny → I_stay_dry
• it_is_raining → I_take_umbrella
• I_take_umbrella → I_stay_dry
• To prove: I_stay_dry
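Whether I_stay_dry really follows can be checked by brute-force model checking, in the spirit of the Models(KB) discussion earlier: the goal is entailed iff it is true in every model that satisfies all four premises. A small sketch:

```python
from itertools import product

# I_stay_dry is entailed iff it is true in every model of the four premises.
props = ["it_is_raining", "it_is_sunny", "I_take_umbrella", "I_stay_dry"]

def premises(m):
    return ((m["it_is_raining"] or m["it_is_sunny"]) and
            (not m["it_is_sunny"] or m["I_stay_dry"]) and          # sunny -> dry
            (not m["it_is_raining"] or m["I_take_umbrella"]) and   # raining -> umbrella
            (not m["I_take_umbrella"] or m["I_stay_dry"]))         # umbrella -> dry

entailed = all(m["I_stay_dry"]
               for values in product([True, False], repeat=4)
               for m in [dict(zip(props, values))]
               if premises(m))
print(entailed)   # True
```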
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-
Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-
Propositional logic- Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-Knowledge
representation using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief
network
• Probabilistic reasoning-Probabilistic reasoning over time-Probabilistic
reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster-Shafer theory
Reasoning Patterns
• Inference in propositional logic is NP-complete!
• However, inference in propositional logic shows
monotonicity:
• Adding more rules to a knowledge base does not
affect earlier inferences
Forward and backward reasoning
• Situation: You have a collection of logical expressions (premises), and
you are trying to prove some additional logical expression (the
conclusion)
• You can:
• Do forward reasoning: Start applying inference rules to the logical
expressions you have, and stop if one of your results is the
conclusion you want
• Do backward reasoning: Start from the conclusion you want, and
try to choose inference rules that will get you back to the logical
expressions you have
• With the tools we have discussed so far, neither is feasible
Example
• Given:
• it_is_raining ∨ it_is_sunny
• it_is_sunny → I_stay_dry
• it_is_raining → I_take_umbrella
• I_take_umbrella → I_stay_dry
• You can conclude, for example:
• ¬it_is_sunny → it_is_raining
• ¬I_take_umbrella → it_is_sunny
• ¬I_stay_dry → ¬I_take_umbrella
• Etc., etc. ... there are just too many things you can conclude!
Predicate calculus
• Predicate calculus is also known as “First Order Logic” (FOL)
• Predicate calculus includes:
• All of propositional logic
• Logical values true, false
• Variables x, y, a, b,...
• Connectives ¬, ∧, ∨, →, ↔
• Constants KingJohn, 2, Villanova,...
• Predicates Brother, >,...
• Functions Sqrt, MotherOf,...
• Quantifiers ∀, ∃
Constants, functions, and predicates
• A constant represents a “thing”--it has no truth value, and it
does not occur “bare” in a logical expression
• Examples: DavidMatuszek, 5, Earth, goodIdea
• Given zero or more arguments, a function produces a
constant as its value:
• Examples: motherOf(DavidMatuszek), add(2, 2),
thisPlanet()
• A predicate is like a function, but produces a truth value
• Examples: greatInstructor(DavidMatuszek),
isPlanet(Earth), greater(3, add(2, 2))
Universal quantification
• The universal quantifier, ∀, is read as "for each"
or "for every"
• Example: ∀x, x² ≥ 0 (for all x, x² is greater than or equal to zero)
• Typically, → is the main connective with ∀:
∀x, at(x,Villanova) → smart(x)
means "Everyone at Villanova is smart"
• Common mistake: using ∧ as the main connective with ∀:
∀x, at(x,Villanova) ∧ smart(x)
means "Everyone is at Villanova and everyone is smart"
• If there are no values satisfying the condition, the result is true
• Example: ∀x, isPersonFromMars(x) → smart(x) is true
Existential quantification
• The existential quantifier, ∃, is read "for some" or "there exists"
• Example: ∃x, x² < 0 (there exists an x such that x² is less than zero)
• Typically, ∧ is the main connective with ∃:
∃x, at(x,Villanova) ∧ smart(x)
means "There is someone who is at Villanova and is smart"
• Common mistake: using → as the main connective with ∃:
∃x, at(x,Villanova) → smart(x)
This is true if there is someone at Villanova who is smart...
...but it is also true if there is someone who is not at Villanova
By the rules of material implication, the result of F → T is T
Properties of quantifiers
• x y is the same as y x
• x y is the same as y x

• x y is not the same as y x


• x y Loves(x,y)
• “There is a person who loves everyone in the world”
• More exactly: x y (person(x)  person(y)  Loves(x,y))
• y x Loves(x,y)
• “Everyone in the world is loved by at least one person”

• Quantifier duality: each can be expressed using the other


• x Likes(x,IceCream) x Likes(x,IceCream)
• x Likes(x,Broccoli) x Likes(x,Broccoli)
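Over a finite domain these quantifier properties can be tested directly: ∀ becomes all() and ∃ becomes any(). The toy loves relation below is an assumed example chosen so that ∀y ∃x holds while ∃x ∀y fails, showing that the two quantifier orders really differ.

```python
# Over a finite domain, ∀ is all() and ∃ is any(). The relation below is
# an illustrative example, not from the slides.
people = ["alice", "bob", "carol"]
loves = {("alice", "bob"), ("bob", "carol"), ("carol", "alice")}

# ∃x ∀y Loves(x, y): there is someone who loves everyone
exists_forall = any(all((x, y) in loves for y in people) for x in people)
# ∀y ∃x Loves(x, y): everyone is loved by at least one person
forall_exists = all(any((x, y) in loves for x in people) for y in people)

print(exists_forall, forall_exists)   # False True
```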
Parentheses
• Parentheses are often used with quantifiers
• Unfortunately, everyone uses them differently, so don’t be upset at any
usage you see
• Examples:
• (∀x) person(x) → likes(x,iceCream)
• (∀x) (person(x) → likes(x,iceCream))
• (∀x) [ person(x) → likes(x,iceCream) ]
• ∀x, person(x) → likes(x,iceCream)
• ∀x (person(x) → likes(x,iceCream))
• I prefer parentheses that show the scope of the quantifier
• ∃x (x > 0) ∧ ∃x (x < 0)
More rules
• Now there are numerous additional rules we can apply!
• Here are two exceptionally important rules:
• ¬∀x, p(x) → ∃x, ¬p(x)
"If not every x satisfies p(x), then there exists an x that does not satisfy
p(x)"
• ¬∃x, p(x) → ∀x, ¬p(x)
"If there does not exist an x that satisfies p(x), then all x do not satisfy
p(x)"
• In any case, the search space is just too large to be feasible
• This was the case until 1965, when J. Robinson discovered resolution
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-
Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-
Propositional logic- Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-Knowledge
representation using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief
network
• Probabilistic reasoning-Probabilistic reasoning over time-Probabilistic
reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster-Shafer theory
Logic by computer was infeasible
• Why is logic so hard?
• You start with a large collection of facts (predicates)
• You start with a large collection of possible transformations (rules)
• Some of these rules apply to a single fact to yield a new fact
• Some of these rules apply to a pair of facts to yield a new fact
• So at every step you must:
• Choose some rule to apply
• Choose one or two facts to which you might be able to apply the rule
• If there are n facts
• There are n potential ways to apply a single-operand rule
• There are n * (n - 1) potential ways to apply a two-operand rule
• Add the new fact to your ever-expanding fact base
• The search space is huge!
The magic of resolution
• Here’s how resolution works:
• You transform each of your facts into a particular form, called a clause
(this is the tricky part)
• You apply a single rule, the resolution principle, to a pair of clauses
• Clauses are closed with respect to resolution--that is, when you
resolve two clauses, you get a new clause
• You add the new clause to your fact base
• So the number of facts you have grows linearly
• You still have to choose a pair of facts to resolve
• You never have to choose a rule, because there’s only one
The fact base
• A fact base is a collection of “facts,” expressed in predicate calculus, that are presumed to be true (valid)
• These facts are implicitly “anded” together
• Example fact base:
• seafood(X) → likes(John, X) (where X is a variable)
• seafood(shrimp)
• pasta(X) → likes(Mary, X) (where X is a different variable)
• pasta(spaghetti)
• That is,
• (seafood(X) → likes(John, X)) ∧ seafood(shrimp) ∧
(pasta(Y) → likes(Mary, Y)) ∧ pasta(spaghetti)
• Notice that we had to change some Xs to Ys
• The scope of a variable is the single fact in which it occurs
Clause form
• A clause is a disjunction ("or") of zero or more literals, some or all of
which may be negated
• Example:
sinks(X) ∨ dissolves(X, water) ∨ ¬denser(X, water)
• Notice that clauses use only “or” and “not”—they do not use “and,”
“implies,” or either of the quantifiers “for all” or “there exists”
• The impressive part is that any predicate calculus expression can be
put into clause form
• Existential quantifiers, ∃, are the trickiest ones
Unification
• From the pair of facts (not yet clauses, just facts):
• seafood(X) → likes(John, X) (where X is a variable)
• seafood(shrimp)
• We ought to be able to conclude
• likes(John, shrimp)
• We can do this by unifying the variable X with the constant shrimp
• This is the same "unification" as is done in Prolog
• This unification turns seafood(X) → likes(John, X) into
seafood(shrimp) → likes(John, shrimp)
• Together with the given fact seafood(shrimp), the final deductive
step is easy
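A minimal sketch of the unification step described above: terms are strings or tuples, and (Prolog-style) names starting with an uppercase letter are treated as variables. This simplified version omits the occurs check; all names are illustrative.

```python
# Minimal unification sketch: terms are strings (constants or variables)
# or tuples like ("seafood", "X"); uppercase-initial strings are variables.
def unify(t1, t2, subst=None):
    subst = dict(subst or {})
    def is_var(t): return isinstance(t, str) and t[:1].isupper()
    def walk(t):   return walk(subst[t]) if is_var(t) and t in subst else t
    t1, t2 = walk(t1), walk(t2)
    if t1 == t2:
        return subst
    if is_var(t1):
        subst[t1] = t2
        return subst
    if is_var(t2):
        subst[t2] = t1
        return subst
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None   # clash: distinct constants or mismatched structure

# unify seafood(X) with seafood(shrimp):
print(unify(("seafood", "X"), ("seafood", "shrimp")))   # {'X': 'shrimp'}
```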
The resolution principle
• Here it is:
• From X ∨ someLiterals
and ¬X ∨ someOtherLiterals
----------------------------------------------
conclude: someLiterals ∨ someOtherLiterals
• That's all there is to it!
• Example:
• broke(Bob) ∨ well-fed(Bob)
¬broke(Bob) ∨ ¬hungry(Bob)
--------------------------------------
well-fed(Bob) ∨ ¬hungry(Bob)
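The resolution principle above can be sketched for propositional clauses, representing a clause as a frozenset of (name, polarity) literals; this representation is an illustrative choice, not a standard library API.

```python
# Sketch of the resolution principle: a clause is a frozenset of literals,
# and a literal is a (name, polarity) pair.
def resolve(c1, c2):
    """Return all resolvents of two clauses, one per complementary pair."""
    resolvents = []
    for (name, pol) in c1:
        if (name, not pol) in c2:
            resolvents.append((c1 - {(name, pol)}) | (c2 - {(name, not pol)}))
    return resolvents

c1 = frozenset({("broke", True), ("well_fed", True)})    # broke ∨ well_fed
c2 = frozenset({("broke", False), ("hungry", False)})    # ¬broke ∨ ¬hungry
print(resolve(c1, c2))
# one resolvent, corresponding to: well_fed(Bob) ∨ ¬hungry(Bob)
```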
A common error
• You can only do one resolution at a time
• Example:
• broke(Bob) ∨ well-fed(Bob) ∨ happy(Bob)
¬broke(Bob) ∨ ¬hungry(Bob) ∨ ¬happy(Bob)
• You can resolve on broke to get:
• well-fed(Bob) ∨ happy(Bob) ∨ ¬hungry(Bob) ∨ ¬happy(Bob) ≡ T
• Or you can resolve on happy to get:
• broke(Bob) ∨ well-fed(Bob) ∨ ¬broke(Bob) ∨ ¬hungry(Bob) ≡ T
• Note that both legal resolutions yield a tautology (a trivially true statement, containing
X ∨ ¬X), which is correct but useless
• But you cannot resolve on both at once to get:
• well-fed(Bob) ∨ ¬hungry(Bob)
Contradiction
• A special case occurs when the result of a resolution (the resolvent) is
empty, or “NIL”
• Example:
• hungry(Bob)
¬hungry(Bob)
----------------
NIL
• In this case, the fact base is inconsistent
• This will turn out to be a very useful observation in doing resolution
theorem proving
A first example
• “Everywhere that John goes, Rover goes. John is at school.”
• at(John, X) → at(Rover, X) (not yet in clause form)
• at(John, school) (already in clause form)
• We use implication elimination to change the first of these into clause
form:
• ¬at(John, X) ∨ at(Rover, X)
• at(John, school)
• We can resolve these on at(-, -), but to do so we have to unify X with
school; this gives:
• at(Rover, school)
Refutation resolution
• The previous example was easy because it had very few clauses
• When we have a lot of clauses, we want to focus our search on the
thing we would like to prove
• We can do this as follows:
• Assume that our fact base is consistent (we can’t derive NIL)
• Add the negation of the thing we want to prove to the fact base
• Show that the fact base is now inconsistent
• Conclude the thing we want to prove
Example of refutation resolution
• “Everywhere that John goes, Rover goes. John is at school. Prove that Rover is
at school.”
1. ¬at(John, X) ∨ at(Rover, X)
2. at(John, school)
3. ¬at(Rover, school) (this is the added clause, the negation of the goal)
• Resolve #1 and #3, unifying X with school:
4. ¬at(John, school)
• Resolve #2 and #4:
5. NIL
• Conclude the negation of the added clause: at(Rover, school)
• This seems a roundabout approach for such a simple example, but it works well
for larger problems
A second example
• Start with:
• it_is_raining ∨ it_is_sunny
• it_is_sunny → I_stay_dry
• it_is_raining → I_take_umbrella
• I_take_umbrella → I_stay_dry
• Proof:
• Convert to clause form:
1. it_is_raining ∨ it_is_sunny
2. ¬it_is_sunny ∨ I_stay_dry
3. ¬it_is_raining ∨ I_take_umbrella
4. ¬I_take_umbrella ∨ I_stay_dry
• Prove that I stay dry by adding the negation of the goal:
5. ¬I_stay_dry
• Resolve:
6. (5, 2) ¬it_is_sunny
7. (6, 1) it_is_raining
8. (5, 4) ¬I_take_umbrella
9. (8, 3) ¬it_is_raining
10. (9, 7) NIL
▪ Therefore, ¬(¬I_stay_dry), i.e., I_stay_dry
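Putting the pieces together, a minimal refutation prover for the example above: add the negated goal, then resolve pairs of clauses until the empty clause (NIL) appears or nothing new can be derived. This is an unoptimized sketch of the procedure, not production code.

```python
# Propositional refutation resolution: clauses are frozensets of
# (name, polarity) literals; an empty resolvent is NIL.
def resolve(c1, c2):
    out = []
    for (name, pol) in c1:
        if (name, not pol) in c2:
            out.append((c1 - {(name, pol)}) | (c2 - {(name, not pol)}))
    return out

def refutes(clauses):
    """True iff the clause set is inconsistent (the empty clause is derivable)."""
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a == b:
                    continue
                for r in resolve(a, b):
                    if not r:
                        return True    # derived NIL: set is inconsistent
                    new.add(r)
        if new <= clauses:
            return False               # saturated without deriving NIL
        clauses |= new

kb = [frozenset({("raining", True), ("sunny", True)}),
      frozenset({("sunny", False), ("dry", True)}),
      frozenset({("raining", False), ("umbrella", True)}),
      frozenset({("umbrella", False), ("dry", True)}),
      frozenset({("dry", False)})]    # negation of the goal I_stay_dry
print(refutes(kb))   # True, so I_stay_dry is entailed
```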
Converting sentences to CNF
1. Eliminate all ↔ connectives:
(P ↔ Q) ≡ ((P → Q) ∧ (Q → P))
2. Eliminate all → connectives:
(P → Q) ≡ (¬P ∨ Q)
3. Reduce the scope of each negation symbol to a single predicate:
¬¬P ≡ P
¬(P ∧ Q) ≡ ¬P ∨ ¬Q
¬(P ∨ Q) ≡ ¬P ∧ ¬Q
¬(∀x)P ≡ (∃x)¬P
¬(∃x)P ≡ (∀x)¬P
4. Standardize variables: rename all variables so that each quantifier has its own
unique variable name
Converting sentences to clausal form: Skolem constants and functions
5. Eliminate existential quantification by introducing Skolem
constants/functions:
(∃x)P(x) ≡ P(c)
c is a Skolem constant (a brand-new constant symbol that is not used in any
other sentence)
(∀x)(∃y)P(x,y) ≡ (∀x)P(x, f(x))
Since ∃ is within the scope of a universally quantified variable, use a Skolem
function f to construct a new value that depends on the universally
quantified variable.
f must be a brand-new function name not occurring in any other sentence in
the KB.
E.g., (∀x)(∃y)loves(x,y) ≡ (∀x)loves(x, f(x))
In this case, f(x) specifies the person that x loves
Converting sentences to clausal form
6. Remove universal quantifiers by (1) moving them all to the left end;
(2) making the scope of each the entire sentence; and (3) dropping
the "prefix" part
Ex: (∀x)P(x) ≡ P(x)
7. Put into conjunctive normal form (a conjunction of disjunctions) using the
distributive and associative laws:
(P ∧ Q) ∨ R ≡ (P ∨ R) ∧ (Q ∨ R)
(P ∨ Q) ∨ R ≡ (P ∨ Q ∨ R)
8. Split conjuncts into separate clauses
9. Standardize variables so each clause contains only variable names
that do not occur in any other clause
An example
(∀x)(P(x) → ((∀y)(P(y) → P(f(x,y))) ∧ ¬(∀y)(Q(x,y) → P(y))))
2. Eliminate →
(∀x)(¬P(x) ∨ ((∀y)(¬P(y) ∨ P(f(x,y))) ∧ ¬(∀y)(¬Q(x,y) ∨ P(y))))
3. Reduce scope of negation
(∀x)(¬P(x) ∨ ((∀y)(¬P(y) ∨ P(f(x,y))) ∧ (∃y)(Q(x,y) ∧ ¬P(y))))
4. Standardize variables
(∀x)(¬P(x) ∨ ((∀y)(¬P(y) ∨ P(f(x,y))) ∧ (∃z)(Q(x,z) ∧ ¬P(z))))
5. Eliminate existential quantification
(∀x)(¬P(x) ∨ ((∀y)(¬P(y) ∨ P(f(x,y))) ∧ (Q(x,g(x)) ∧ ¬P(g(x)))))
6. Drop universal quantification symbols
(¬P(x) ∨ ((¬P(y) ∨ P(f(x,y))) ∧ (Q(x,g(x)) ∧ ¬P(g(x)))))
Example
7. Convert to conjunction of disjunctions
(¬P(x) ∨ ¬P(y) ∨ P(f(x,y))) ∧ (¬P(x) ∨ Q(x,g(x))) ∧
(¬P(x) ∨ ¬P(g(x)))
8. Create separate clauses
¬P(x) ∨ ¬P(y) ∨ P(f(x,y))
¬P(x) ∨ Q(x,g(x))
¬P(x) ∨ ¬P(g(x))
9. Standardize variables
¬P(x) ∨ ¬P(y) ∨ P(f(x,y))
¬P(z) ∨ Q(z,g(z))
¬P(w) ∨ ¬P(g(w))
Running example
• All Romans who know Marcus either hate Caesar or
think that anyone who hates anyone is crazy

• ∀x, [ Roman(x) ∧ know(x, Marcus) ] →
  [ hate(x, Caesar) ∨
    (∀y, ∃z, hate(y, z) → thinkCrazy(x, y)) ]
Step 1: Eliminate implications
• Use the fact that x → y is equivalent to ¬x ∨ y

• ∀x, [ Roman(x) ∧ know(x, Marcus) ] →
  [ hate(x, Caesar) ∨
    (∀y, ∃z, hate(y, z) → thinkCrazy(x, y)) ]

• ∀x, ¬[ Roman(x) ∧ know(x, Marcus) ] ∨
  [ hate(x, Caesar) ∨
    (∀y, ¬(∃z, hate(y, z)) ∨ thinkCrazy(x, y)) ]
Step 2: Reduce the scope of ¬
• Reduce the scope of negation to a single term, using:
• ¬(¬p) ≡ p
• ¬(a ∧ b) ≡ (¬a ∨ ¬b)
• ¬(a ∨ b) ≡ (¬a ∧ ¬b)
• ¬∀x, p(x) ≡ ∃x, ¬p(x)
• ¬∃x, p(x) ≡ ∀x, ¬p(x)

• ∀x, ¬[ Roman(x) ∧ know(x, Marcus) ] ∨
  [ hate(x, Caesar) ∨
    (∀y, ¬(∃z, hate(y, z)) ∨ thinkCrazy(x, y)) ]

• ∀x, [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
  [ hate(x, Caesar) ∨
    (∀y, ∀z, ¬hate(y, z) ∨ thinkCrazy(x, y)) ]
Step 3: Standardize variables apart
• ∃x, P(x) ∧ ∃x, Q(x)
becomes
∃x, P(x) ∧ ∃y, Q(y)
• This is just to keep the scopes of variables from
getting confused
• Not necessary in our running example

Step 4: Move quantifiers
• Move all quantifiers to the left, without changing their relative
positions

• ∀x, [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
  [ hate(x, Caesar) ∨
    (∀y, ∀z, ¬hate(y, z) ∨ thinkCrazy(x, y)) ]

• ∀x, ∀y, ∀z, [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
  [ hate(x, Caesar) ∨
    (¬hate(y, z) ∨ thinkCrazy(x, y)) ]
Step 5: Eliminate existential quantifiers

• We do this by introducing Skolem functions:


• If ∃x, p(x) then just pick one; call it x’
• If the existential quantifier is under control of a
universal quantifier, then the picked value has to be
a function of the universally quantified variable:
• If ∀x, ∃y, p(x, y) then ∀x, p(x, y(x))
• Not necessary in our running example

Step 6: Drop the prefix (quantifiers)
• ∀x, ∀y, ∀z, [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
  [ hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkCrazy(x, y)) ]
• At this point, all the quantifiers are universal quantifiers
• We can just take it for granted that all variables are
universally quantified
• [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
  [ hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkCrazy(x, y)) ]
Step 7: Create a conjunction of disjuncts

• [ ¬Roman(x) ∨ ¬know(x, Marcus) ] ∨
  [ hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkCrazy(x, y)) ]

becomes

¬Roman(x) ∨ ¬know(x, Marcus) ∨
hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkCrazy(x, y)
Step 8: Create separate clauses
• Every place we have an ∧, we break our expression up
into separate pieces
• Not necessary in our running example

Step 9: Standardize apart
• Rename variables so that no two clauses have the same
variable
• Not necessary in our running example

• Final result:
¬Roman(x) ∨ ¬know(x, Marcus) ∨
hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkCrazy(x, y)

• That’s it! It’s a long process, but easy enough to do
mechanically
Resolution
• Resolution is a sound and complete inference procedure for FOL
• Reminder: Resolution rule for propositional logic:
• P1 ∨ P2 ∨ ... ∨ Pn
• ¬P1 ∨ Q2 ∨ ... ∨ Qm
• Resolvent: P2 ∨ ... ∨ Pn ∨ Q2 ∨ ... ∨ Qm
• Examples
• P and ¬P ∨ Q : derive Q (Modus Ponens)
• (¬P ∨ Q) and (¬Q ∨ R) : derive ¬P ∨ R
• P and ¬P : derive False [contradiction!]
• (P ∨ Q) and (¬P ∨ ¬Q) : derive True
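As a sketch of how this rule drives a proof, here is a minimal propositional resolution-refutation loop in Python. The clause set is the umbrella example from earlier in these notes, with `~` marking negation; the representation and the brute-force saturation strategy are choices of this sketch, not a standard algorithm statement:

```python
def negate(lit):
    """Flip the sign of a literal written with a '~' prefix."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving clause c1 with clause c2."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def refute(clauses):
    """Saturate under resolution; True iff the empty clause (NIL) appears."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolvents(c1, c2):
                    if not r:
                        return True        # derived NIL: contradiction found
                    new.add(frozenset(r))
        if new <= clauses:
            return False                   # saturated without contradiction
        clauses |= new

# Umbrella KB plus the negated goal ~I_stay_dry:
kb = [{"it_is_raining", "it_is_sunny"},
      {"~it_is_sunny", "I_stay_dry"},
      {"~it_is_raining", "I_take_umbrella"},
      {"~I_take_umbrella", "I_stay_dry"},
      {"~I_stay_dry"}]
print(refute(kb))  # True: the KB entails I_stay_dry
```

Dropping the negated goal leaves a satisfiable clause set, so `refute` then returns False: refutation proves entailment, not inconsistency of the KB itself.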
Resolution in first-order logic
• Given sentences
P1 ∨ ... ∨ Pn
Q1 ∨ ... ∨ Qm
• in conjunctive normal form:
• each Pi and Qi is a literal, i.e., a positive or negated predicate symbol with its
terms,
• if Pj and ¬Qk unify with substitution list θ, then derive the resolvent sentence:
subst(θ, P1 ∨ ... ∨ Pj-1 ∨ Pj+1 ∨ ... ∨ Pn ∨ Q1 ∨ ... ∨ Qk-1 ∨ Qk+1 ∨ ... ∨ Qm)
• Example
• from clause P(x, f(a)) ∨ P(x, f(y)) ∨ Q(y)
• and clause ¬P(z, f(a)) ∨ ¬Q(z)
• derive resolvent P(z, f(y)) ∨ Q(y) ∨ ¬Q(z)
• using θ = {x/z}
Resolution refutation
• Given a consistent set of axioms KB and goal sentence Q, show that KB
|= Q
• Proof by contradiction: Add ¬Q to KB and try to prove false.
i.e., (KB |- Q) ↔ (KB ∧ ¬Q |- False)
• Resolution is refutation complete: it can establish that a given sentence
Q is entailed by KB, but can’t (in general) be used to generate all logical
consequences of a set of sentences
• Also, it cannot be used to prove that Q is not entailed by KB.
• Resolution won’t always give an answer since entailment is only
semidecidable
• And you can’t just run two proofs in parallel, one trying to prove Q and the
other trying to prove ¬Q, since KB might not entail either one
Refutation resolution proof tree
• Clauses:
1. ¬allergies(w) ∨ sneeze(w)
2. ¬cat(y) ∨ ¬allergic-to-cats(z) ∨ allergies(z)
3. cat(Felix)
4. allergic-to-cats(Lise)
5. ¬sneeze(Lise)                     (negated query)
• Refutation:
resolve 1, 2 with {w/z}:    ¬cat(y) ∨ sneeze(z) ∨ ¬allergic-to-cats(z)
resolve with 3, {y/Felix}:  sneeze(z) ∨ ¬allergic-to-cats(z)
resolve with 4, {z/Lise}:   sneeze(Lise)
resolve with 5, {}:         false
We need answers to the following questions

• How to convert FOL sentences to conjunctive normal form (a.k.a. CNF, or
clause form): normalization and skolemization
• How to unify two argument lists, i.e., how to find their most general
unifier (mgu) θ: unification
• How to determine which two clauses in KB should be resolved next
(among all resolvable pairs of clauses) : resolution (search) strategy

Unification
• Unification is a “pattern-matching” procedure
• Takes two atomic sentences, called literals, as input
• Returns “Failure” if they do not match and a substitution list, θ, if they do
• That is, unify(p,q) = θ means subst(θ, p) = subst(θ, q) for two atomic
sentences, p and q
• θ is called the most general unifier (mgu)
• All variables in the given two literals are implicitly universally
quantified
• To make literals match, replace (universally quantified) variables by
terms

Unification algorithm
procedure unify(p, q, θ)
Scan p and q left-to-right and find the first corresponding
terms where p and q “disagree” (i.e., p and q not equal)
If there is no disagreement, return θ (success!)
Let r and s be the terms in p and q, respectively,
where disagreement first occurs
If variable(r) then {
Let θ = union(θ, {r/s})
Return unify(subst(θ, p), subst(θ, q), θ)
} else if variable(s) then {
Let θ = union(θ, {s/r})
Return unify(subst(θ, p), subst(θ, q), θ)
} else return “Failure”
end
Unification: Remarks
• Unify is a linear-time algorithm that returns the most
general unifier (mgu), i.e., the shortest-length substitution
list that makes the two literals match.
• In general, there is not a unique minimum-length
substitution list, but unify returns one of minimum length
• A variable can never be replaced by a term containing that
variable
Example: x/f(x) is illegal.
• This “occurs check” should be done in the above pseudo-
code before making the recursive calls
Unification examples
• Example:
• parents(x, father(x), mother(Bill))
• parents(Bill, father(Bill), y)
• {x/Bill, y/mother(Bill)}
• Example:
• parents(x, father(x), mother(Bill))
• parents(Bill, father(y), z)
• {x/Bill, y/Bill, z/mother(Bill)}
• Example:
• parents(x, father(x), mother(Jane))
• parents(Bill, father(y), mother(y))
• Failure
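The unification procedure sketched above can be written compactly in Python. Terms are represented as nested tuples `(functor, arg, ...)`, single lowercase letters are variables, and capitalised strings are constants — representation choices of this sketch, not a standard API. The occurs check from the remarks slide is included:

```python
def is_var(t):
    # Variables are single lowercase letters (x, y, z, ...);
    # anything else is a constant or a compound term.
    return isinstance(t, str) and len(t) == 1 and t.islower()

def subst(theta, t):
    """Apply substitution list theta to term t."""
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return tuple(subst(theta, a) for a in t)
    return t

def occurs(v, t):
    """Occurs check: does variable v appear inside term t?"""
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a) for a in t)

def unify(p, q, theta=None):
    """Return an mgu of p and q extending theta, or None on failure."""
    theta = {} if theta is None else theta
    p, q = subst(theta, p), subst(theta, q)
    if p == q:
        return theta
    if is_var(p):
        return None if occurs(p, q) else {**theta, p: q}  # x/f(x) is illegal
    if is_var(q):
        return unify(q, p, theta)
    if (isinstance(p, tuple) and isinstance(q, tuple)
            and len(p) == len(q) and p[0] == q[0]):       # same functor
        for a, b in zip(p[1:], q[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

p1 = ("parents", "x", ("father", "x"), ("mother", "Bill"))
p2 = ("parents", "Bill", ("father", "Bill"), "y")
print(unify(p1, p2))  # {'x': 'Bill', 'y': ('mother', 'Bill')}
```

The third slide example fails as expected: `Jane` and `Bill` are distinct constants, so no substitution can make the literals match.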
Resolution example
Practice example : Did Curiosity kill the cat
• Jack owns a dog. Every dog owner is an animal lover. No animal lover
kills an animal. Either Jack or Curiosity killed the cat, who is named
Tuna. Did Curiosity kill the cat?
• These can be represented as follows:
A. (∃x) Dog(x) ∧ Owns(Jack, x)
B. (∀x) ((∃y) Dog(y) ∧ Owns(x, y)) → AnimalLover(x)
C. (∀x) AnimalLover(x) → ((∀y) Animal(y) → ¬Kills(x, y))
D. Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna)
E. Cat(Tuna)
F. (∀x) Cat(x) → Animal(x)
GOAL
G. Kills(Curiosity, Tuna)
• Convert to clause form (D is a Skolem constant)
A1. Dog(D)
A2. Owns(Jack, D)
B. ¬Dog(y) ∨ ¬Owns(x, y) ∨ AnimalLover(x)
C. ¬AnimalLover(a) ∨ ¬Animal(b) ∨ ¬Kills(a, b)
D. Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna)
E. Cat(Tuna)
F. ¬Cat(z) ∨ Animal(z)
• Add the negation of the query:
¬G: ¬Kills(Curiosity, Tuna)
• The resolution refutation proof
R1: G, D, {} (Kills(Jack, Tuna))
R2: R1, C, {a/Jack, b/Tuna} (~AnimalLover(Jack),
~Animal(Tuna))
R3: R2, B, {x/Jack} (~Dog(y), ~Owns(Jack, y),
~Animal(Tuna))
R4: R3, A1, {y/D} (~Owns(Jack, D),
~Animal(Tuna))
R5: R4, A2, {} (~Animal(Tuna))
R6: R5, F, {z/Tuna} (~Cat(Tuna))
R7: R6, E, {} FALSE

• The proof tree (predicates abbreviated: K = Kills, AL = AnimalLover,
A = Animal, D = Dog, O = Owns, C = Cat; J = Jack, T = Tuna)
¬G × D, {}:          R1: K(J, T)
R1 × C, {a/J, b/T}:  R2: ¬AL(J) ∨ ¬A(T)
R2 × B, {x/J}:       R3: ¬D(y) ∨ ¬O(J, y) ∨ ¬A(T)
R3 × A1, {y/D}:      R4: ¬O(J, D) ∨ ¬A(T)
R4 × A2, {}:         R5: ¬A(T)
R5 × F, {z/T}:       R6: ¬C(T)
R6 × E, {}:          R7: FALSE
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-
Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-
Propositional logic- Reasoning patterns
• Unification and Resolution
• Knowledge representation using rules-Knowledge representation using semantic
nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief
network
• Probabilistic reasoning-Probabilistic reasoning over time-Probabilistic reasoning
over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster -shafer theory
Production Rules
• Condition-Action Pairs
• IF this condition (or premise or antecedent) occurs,
THEN some action (or result, or conclusion, or
consequence) will (or should) occur
• IF the traffic light is red AND you have stopped,
THEN a right turn is OK
Production Rules
• Each production rule in a knowledge base represents an
autonomous chunk of expertise
• When combined and fed to the inference engine, the set of rules
behaves synergistically
• Rules can be viewed as a simulation of the cognitive behaviour
of human experts
• Rules represent a model of actual human behaviour
• Predominant technique used in expert systems, often in
conjunction with frames
Forms of Rules
• IF premise, THEN conclusion
• IF your income is high, THEN your chance of being
audited by the Inland Revenue is high
• Conclusion, IF premise
• Your chance of being audited is high, IF your income
is high
Forms of Rules
• Inclusion of ELSE
• IF your income is high, OR your deductions are unusual, THEN
your chance of being audited is high, OR ELSE your chance of
being audited is low
• More complex rules
• IF credit rating is high AND salary is more than £30,000, OR
assets are more than £75,000, AND pay history is not "poor,"
THEN approve a loan up to £10,000, and list the loan in category
"B.”
• Action part may have more information: THEN "approve the loan"
and "refer to an agent"
Characteristics of Rules
            First Part                              Second Part
Names       Premise / Antecedent / Situation / IF   Conclusion / Consequence / Action / THEN
Nature      Conditions, similar to declarative      Resolutions, similar to procedural
            knowledge                               knowledge
Size        Can have many IFs                       Usually only one conclusion
Statement   AND statements: all conditions must be true for the conclusion to be true
            OR statements: if any condition is true, the conclusion is true
Rule-based Inference
• Production rules are typically used as part of a
production system
• Production systems provide pattern-directed control of
the reasoning process
• Production systems have:
• Productions: set of production rules
• Working Memory (WM): description of current state
of the world
• Recognise-act cycle
Production Systems
Production Rules
  C1 → A1
  C2 → A2
  C3 → A3
  ...
  Cn → An
      ↕ match / act
Working Memory  ↔  Environment
      ↓
Conflict Set → Conflict Resolution
Recognise-Act Cycle
• Patterns in WM matched against production rule conditions
• Matching (activated) rules form the conflict set
• One of the matching rules is selected (conflict resolution) and
fired
• Action of rule is performed
• Contents of WM updated
• Cycle repeats with updated WM
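A toy recognise-act cycle in Python, using the red-light rule from the earlier slide as the rule base. The rule names, the facts, and the trivial first-match conflict-resolution policy are illustrative assumptions of this sketch:

```python
# Each rule is (name, condition over working memory, action producing new WM).
rules = [
    ("stop-ok", lambda wm: "light_red" in wm and "stopped" in wm,
                lambda wm: wm | {"right_turn_ok"}),
    ("brake",   lambda wm: "light_red" in wm and "moving" in wm,
                lambda wm: (wm - {"moving"}) | {"stopped"}),
]

def run(wm, rules, max_cycles=10):
    """Recognise-act cycle: match rules against WM, resolve conflicts
    (here: first activated rule that changes WM), act, repeat."""
    for _ in range(max_cycles):
        conflict_set = [(n, act) for n, cond, act in rules if cond(wm)]
        for name, act in conflict_set:     # trivial conflict resolution
            new_wm = act(wm)
            if new_wm != wm:
                wm = new_wm
                break
        else:
            return wm                      # quiescence: no rule changed WM
    return wm

print(sorted(run(frozenset({"light_red", "moving"}), rules)))
# ['light_red', 'right_turn_ok', 'stopped']
```

Swapping the iteration order over `conflict_set` (or sorting it by a salience value) is exactly where the conflict-resolution strategies of the following slides would plug in.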
Conflict Resolution
• Reasoning in a production system can be viewed as a type of
search
• Selection strategy for rules from the conflict set controls
search
• Production system maintains the conflict set as an agenda
• Ordered list of activated rules (those with their conditions
satisfied) which have not yet been executed
• Conflict resolution strategy determines where a newly-
activated rule is inserted
Salience
• Rules may be given a precedence order by assigning a
salience value
• Newly activated rules are placed in the agenda above all rules
of lower salience, and below all rules with higher salience
• Rule with higher salience are executed first
• Conflict resolution strategy applies between rules of the
same salience
• If salience and the conflict resolution strategy can’t
determine which rule is to be executed next, a rule is chosen
at random from the most highly ranked rules
Conflict Resolution Strategies
• Depth-first: newly activated rules placed above other rules in the
agenda
• Breadth-first: newly activated rules placed below other rules
• Specificity: rules ordered by the number of conditions in the LHS
(simple-first or complex-first)
• Least recently fired: fire the rule that was last fired the longest time
ago
• Refraction: don’t fire a rule unless the WM patterns that match its
conditions have been modified
• Recency: rules ordered by the timestamps on the facts that match
their conditions
Salience
• Salience facilitates the modularization of expert systems
in which modules work at different levels of abstraction
• Over-use of salience can complicate a system
• Explicit ordering to rule execution
• Makes behaviour of modified systems less predictable
• Rule of thumb: if two rules have the same salience, are
in the same module, and are activated concurrently,
then the order in which they are executed should not
matter
Common Types of Rules
• Knowledge rules, or declarative rules, state all the facts
and relationships about a problem
• Inference rules, or procedural rules, advise on how to
solve a problem, given that certain facts are known
• Inference rules contain rules about rules (metarules)
• Knowledge rules are stored in the knowledge base
• Inference rules become part of the inference engine



Major Advantages of Rules
• Easy to understand (natural form of knowledge)
• Easy to derive inference and explanations
• Easy to modify and maintain
• Easy to combine with uncertainty
• Rules are frequently independent



Major Limitations of Rules
• Complex knowledge requires many rules
• Search limitations in systems with many rules



Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-
Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-
Propositional logic- Reasoning patterns
• Unification and Resolution
• Knowledge representation using rules-Knowledge representation using semantic
nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief
network
• Probabilistic reasoning-Probabilistic reasoning over time-Probabilistic reasoning
over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster -shafer theory
Semantic Networks
• A semantic network is a structure for representing
knowledge as a pattern of interconnected nodes and
arcs
• Nodes in the net represent concepts of entities,
attributes, events, values
• Arcs in the network represent relationships that hold
between the concepts



Semantic Networks
• Semantic networks can show inheritance
• Relationship types – is-a, has-a
• Semantic Nets - visual representation of relationships
• Can be combined with other representation methods



Semantic Networks
Animal: can breathe, can eat, has skin
  Bird (is-a Animal): can fly, has wings, has feathers
    Canary (is-a Bird): can sing, is yellow
    Ostrich (is-a Bird): runs fast, cannot fly, is tall
  Fish (is-a Animal): can swim, has fins, has gills
    Salmon (is-a Fish): swims upstream, is pink, is edible
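The inheritance behaviour of the bird/fish network above can be sketched with a plain dictionary in Python (the node layout and property names are illustrative choices of this sketch):

```python
# Nodes carry local properties; is-a links give inheritance.
net = {
    "Animal":  {"is-a": None,     "props": {"breathes": True, "eats": True}},
    "Bird":    {"is-a": "Animal", "props": {"flies": True, "has_wings": True}},
    "Canary":  {"is-a": "Bird",   "props": {"sings": True, "colour": "yellow"}},
    "Ostrich": {"is-a": "Bird",   "props": {"flies": False, "tall": True}},
}

def lookup(node, prop):
    """Walk is-a links until the property is found: inheritance with
    overriding, since the most specific node is consulted first."""
    while node is not None:
        if prop in net[node]["props"]:
            return net[node]["props"][prop]
        node = net[node]["is-a"]
    return None

print(lookup("Canary", "flies"))     # True  (inherited from Bird)
print(lookup("Ostrich", "flies"))    # False (local value overrides Bird's)
print(lookup("Canary", "breathes"))  # True  (inherited from Animal)
```

The Ostrich case shows why such a lookup must stop at the first match: the exception stored locally must shadow the inherited default.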


Semantic Networks
ANIMAL: moves, breathes
  DOG (is-a ANIMAL): has tail, barks
    SHEEPDOG (is-a DOG): works sheep
      COLLIE (is-a SHEEPDOG): size: medium
        LASSIE (instance of COLLIE and of FICTIONAL CHARACTER)
    HOUND (is-a DOG): tracks
      BEAGLE (is-a HOUND): size: small
        SNOOPY (instance of BEAGLE and of FICTIONAL CHARACTER;
                friend of CHARLIE BROWN)
Semantic Networks
What does or should a node represent?
• A class of objects?
• An instance of an class?
• The canonical instance of a class?
• The set of all instances of a class?



Semantic Networks
• Semantics of links that define new objects and links that relate
existing objects, particularly those dealing with ‘intrinsic’
characteristics of a given object
• How does one deal with the problems of comparison between
objects (or classes of objects) through their attributes?
• Essentially the problem of comparing object instances
• What mechanisms are there are to handle quantification in
semantic network formalisms?



Transitive inference, but…
• Clyde is an elephant, an elephant is a mammal: Clyde is a
mammal.

• The US President is elected every 4 years, Bush is US President:


Bush is elected every 4 years
• My car is a Ford, Ford is a car company: my car is a car
company



Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-
Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-
Propositional logic- Reasoning patterns
• Unification and Resolution
• Knowledge representation using rules-Knowledge representation using semantic
nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief
network
• Probabilistic reasoning-Probabilistic reasoning over time-Probabilistic reasoning
over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster -shafer theory
Frames
• A frame is a knowledge representation formalism based on the idea of a
frame of reference.
• A frame is a data structure that includes all the knowledge about a
particular object
• Frames organised in a hierarchy Form of object-oriented programming for
AI and ES.
• Each frame describes one object
• Special terminology

M. Minsky (1974) A Framework for Representing Knowledge,


MIT-AI Laboratory Memo 306



Frames
• There are two types of frame:
• Class Frame
• Individual or Instance Frame
• A frame carries with it a set of slots that can represent
objects that are normally associated with a subject of
the frame.



Frames
• The slots can then point to other slots or frames. That
gives frame systems the ability to carry out inheritance
and simple kinds of data manipulation.
• The use of procedures - also called demons in the
literature - helps in the incorporation of substantial
amounts of procedural knowledge into a particular
frame-oriented knowledge base



Frame-based model of semantic
memory
• Knowledge is organised in a data structure
• Slots in structure are instantiated with particular values for a
given instance of data
• ...translation to OO terminology:
• frames == classes or objects
• slots == variables/methods



General Knowledge as Frames

DOG
  Fixed:    legs: 4
  Default:  diet: carnivorous
            sound: bark
  Variable: size:
            colour:

COLLIE
  Fixed:    breed of: DOG
            type: sheepdog
  Default:  size: 65cm
  Variable: colour:


General Knowledge as Frames
MAMMAL:
subclass: ANIMAL
has_part: head

ELEPHANT
subclass: MAMMAL
colour: grey
size: large

Nellie
instance: ELEPHANT
likes: apples



Logic underlies Frames
• ∀x mammal(x) ⇒ has_part(x, head)
• ∀x elephant(x) ⇒ mammal(x)

• elephant(clyde)

mammal(clyde)
has_part(clyde, head)



Logic underlies Frames
MAMMAL:
subclass: ANIMAL
has_part: head
*furry: yes

ELEPHANT
subclass: MAMMAL
has_trunk: yes
*colour: grey
*size: large
*furry: no

Clyde
instance: ELEPHANT
colour: pink
owner: Fred

Nellie
instance: ELEPHANT
size: small
Frames (Contd.)
• Can represent subclass and instance relationships (both
sometimes called ISA or “is a”)
• Properties (e.g. colour and size) can be referred to as slots and
slot values (e.g. grey, large) as slot fillers
• Objects can inherit all properties of parent class (therefore
Nellie is grey and large)
• But can inherit properties which are only typical (usually called
default, here starred), and can be overridden
• For example, mammal is typically furry, but this is not so for an
elephant
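A minimal sketch of frame inheritance with default (starred) slots, using the MAMMAL/ELEPHANT example above; the dictionary layout and the `ako` ("a kind of") link name are assumptions of this sketch:

```python
frames = {
    "MAMMAL":   {"ako": None,       "slots": {"has_part": "head",
                                              "*furry": "yes"}},
    "ELEPHANT": {"ako": "MAMMAL",   "slots": {"has_trunk": "yes",
                                              "*colour": "grey",
                                              "*size": "large",
                                              "*furry": "no"}},
    "Clyde":    {"ako": "ELEPHANT", "slots": {"colour": "pink",
                                              "owner": "Fred"}},
    "Nellie":   {"ako": "ELEPHANT", "slots": {"size": "small"}},
}

def get_slot(frame, slot):
    """Search the frame, then its parents; a starred slot is a default
    and is overridden by any value found lower in the hierarchy."""
    while frame is not None:
        slots = frames[frame]["slots"]
        if slot in slots:
            return slots[slot]
        if "*" + slot in slots:
            return slots["*" + slot]
        frame = frames[frame]["ako"]
    return None

print(get_slot("Clyde", "colour"))    # pink (instance overrides default grey)
print(get_slot("Nellie", "colour"))   # grey (default inherited from ELEPHANT)
print(get_slot("Clyde", "furry"))     # no   (ELEPHANT overrides MAMMAL's default)
print(get_slot("Clyde", "has_part"))  # head (fixed slot from MAMMAL)
```

The `furry` lookup is the "typical but overridable" case from the slide: the MAMMAL default is shadowed by ELEPHANT's more specific default before Clyde is ever consulted.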



Frames (Contd.)
• Provide a concise, structural representation of knowledge in a
natural manner
• Frame encompasses complex objects, entire situations or a
management problem as a single entity
• Frame knowledge is partitioned into slots
• Slot can describe declarative knowledge or procedural
knowledge
• Hierarchy of Frames: Inheritance



Capabilities of Frames
• Ability to clearly document information about a domain model;
for example, a plant's machines and their associated attributes
• Related ability to constrain allowable values of an attribute
• Modularity of information, permitting ease of system expansion
and maintenance
• More readable and consistent syntax for referencing domain
objects in the rules



Capabilities of Frames
• Platform for building graphic interface with object graphics
• Mechanism to restrict the scope of facts considered during
forward or backward chaining
• Access to a mechanism that supports the inheritance of
information down a class hierarchy
• Used as underlying model in standards for accessing KBs (Open
Knowledge Base Connectivity - OKBC)



Summary
• Frames have been used in conjunction with other, less well-
grounded, representation formalisms, like production systems,
when used to build pre-operational or operational expert
systems
• Frames cannot be used efficiently to organise a whole
computation


Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge
reasoning-Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and
inferences-Propositional logic- Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-
Knowledge representation using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and
belief network
• Probabilistic reasoning-Probabilistic reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster -shafer theory
Types of Inference

• Deduction
• Induction
• Abduction



Deduction

• Deriving a conclusion from given axioms and facts


• Also called logical inference or truth preservation
Axiom – All kids are naughty
Fact/Premise – Riya is a kid
Conclusion – Riya is naughty



Induction

• Deriving general rule or axiom from background knowledge and


observations
• Riya is a kid
• Riya is naughty
General axiom which is derived is:
Kids are naughty



Abduction

• A premise is derived from a known axiom and some observations


• All kids are naughty
• Riya is naughty
Inference
Riya is a kid



Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge
reasoning-Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and
inferences-Propositional logic- Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-
Knowledge representation using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and
belief network
• Probabilistic reasoning-Probabilistic reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster -shafer theory
Uncertain knowledge and reasoning

• In real life, it is not always possible to determine the state of the environment as it might not be clear. Due
to partially observable or non-deterministic environments, agents may need to handle uncertainty and deal
with it.
• Uncertain data: Data that is missing, unreliable, inconsistent or noisy
• Uncertain knowledge: When the available knowledge has multiple causes leading to multiple effects or
incomplete knowledge of causality in the domain
• Uncertain knowledge representation: The representations which provides a restricted model of the real
system, or has limited expressiveness
• Inference: In case of incomplete or default reasoning methods, conclusions drawn might not be completely
accurate. Let’s understand this better with the help of an example.
• IF primary infection is bacteremia
• AND site of infection is sterile
• AND entry point is gastrointestinal tract
• THEN organism is bacteroid (0.7).
• In such uncertain situations, the agent does not guarantee a solution but acts on its own
assumptions and probabilities and gives some degree of belief that it will reach the required
solution.



Uncertain knowledge and reasoning

• For example, in case of medical diagnosis consider the rule Toothache → Cavity. This
is not complete, as not all patients having toothache have cavities. So we can write a
more generalized rule Toothache → Cavity ∨ Gum problems ∨ Abscess… To make this
rule complete, we will have to list all the possible causes of toothache. But this is not
feasible, due to the following reasons:
• Laziness- It will require a lot of effort to list the complete set of antecedents and
consequents to make the rules complete.
• Theoretical ignorance- Medical science does not have complete theory for the
domain
• Practical ignorance- It might not be practical that all tests have been or can be
conducted for the patients.
• Such uncertain situations can be dealt with using
Probability theory
Truth Maintenance systems
Fuzzy logic.
Uncertain knowledge and reasoning

Probability
• Probability is the degree of likeliness that an event will occur. It provides a certain degree of belief
in case of uncertain situations. It is defined over a set of events U and assigns value P(e) i.e.
probability of occurrence of event e in the range [0,1]. Here each sentence is labeled with a real
number in the range of 0 to 1, 0 means the sentence is false and 1 means it is true.
• Conditional Probability or Posterior Probability is the probability of event A given that B has
already occurred.
• P(A|B) = (P(B|A) * P(A)) / P(B)
• For example, P(It will rain tomorrow| It is raining today) represents conditional probability of it
raining tomorrow as it is raining today.
• P(A|B) + P(NOT(A)|B) = 1
• Joint probability is the probability of 2 independent events happening simultaneously like rolling
two dice or tossing two coins together. For example, Probability of getting 2 on one dice and 6 on
the other is equal to 1/36. Joint probability has a wide use in various fields such as physics,
astronomy, and comes into play when there are two independent events. The full joint probability
distribution specifies the probability of each complete assignment of values to random variables.



Uncertain knowledge and reasoning

Bayes Theorem
• It is based on the principle that every pair of features being
classified is independent of each other. It calculates probability
P(A|B) where A is class of possible outcomes and B is given
instance which has to be classified.
• P(A|B) = P(B|A) * P(A) / P(B)
• P(A|B) = Probability that A is happening, given that B has
occurred (posterior probability)
• P(A) = prior probability of class
• P(B) = prior probability of predictor
• P(B|A) = likelihood
Uncertain knowledge and reasoning

Consider the following data.


Depending on the weather
(sunny, rainy or overcast), the
children will play(Y) or not
play(N).
Here, the total number of
observations = 14
Probability that children will
play given that weather is
sunny :
P( Yes| Sunny) = P(Sunny |
Yes) * P(Yes) / P(Sunny)
= 0.33 * 0.64 / 0.36
= 0.59
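The same computation in Python, assuming the standard 14-row play-tennis counts (9 Yes / 5 No; 5 sunny days, of which 3 are Yes) that the slide's rounded figures correspond to. With exact fractions the answer is 0.60; the slide's 0.59 comes from rounding the intermediate probabilities first:

```python
# Counts assumed from the slide's 14-observation weather data set:
# 9 "Yes" days, 5 "No" days; 5 sunny days, 3 of them "Yes".
p_yes = 9 / 14                  # P(Yes)        ~ 0.64
p_sunny = 5 / 14                # P(Sunny)      ~ 0.36
p_sunny_given_yes = 3 / 9       # P(Sunny|Yes)  ~ 0.33

# Bayes theorem: P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6
```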
Uncertain knowledge and reasoning

A Bayesian belief network is a probabilistic graphical model for
representing an uncertain domain and for reasoning under
uncertainty. It consists of nodes representing variables and arcs
representing direct connections between them, called causal
correlations. It represents conditional dependencies between
random variables through a Directed Acyclic Graph (DAG). A belief
network consists of:
1. A DAG with nodes labeled with variable
names,
2. Domain for each random variable,
3. Set of conditional probability tables for
each variable given its parents, including
prior probability for nodes with no
parents.



Uncertain knowledge and reasoning

• Let’s have a look at the steps followed.


1. Identify nodes which are the random variables and the possible
values they can have from the probability domain. The nodes can be
boolean (True/ False), have ordered values or integral values.
2. Structure- It is used to represent causal relationships between the
variables. Two nodes are connected if one affects or causes the other
and the arc points towards the effect. For instance, if it is windy or
cloudy, it rains. There is a direct link from Windy/Cloudy to Rains.
Similarly, from Rains to Wet grass and Leave, i.e., if it rains, grass will
be wet and leave is taken from work.



Uncertain knowledge and reasoning

3. Probability- Quantifying relationships between nodes. Conditional
probability:
• P(A ∧ B) = P(A|B) * P(B)
• P(A|B) = P(B|A) * P(A) / P(B)
• P(B|A) = P(A|B) * P(B) / P(A)
• Joint probability: P(X1, …, Xn) = Π i P(Xi | Parents(Xi))
4. Markov property- Bayesian belief networks require the assumption of
the Markov property, i.e., all direct dependencies are shown by using arcs.
Here there is no direct connection between it being Cloudy and Taking a
leave, but there is one via Rains. Belief networks which have the Markov
property are also called independence maps.
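A numeric sketch of the Windy/Cloudy → Rains fragment, applying the joint-probability factorization above; the CPT numbers are invented for illustration, not taken from the slides:

```python
# Hypothetical CPTs for the Windy/Cloudy -> Rains fragment
# (all numbers below are illustrative assumptions).
p_windy = 0.3
p_cloudy = 0.4
p_rains = {  # P(Rains | Windy, Cloudy)
    (True, True): 0.95, (True, False): 0.6,
    (False, True): 0.7, (False, False): 0.05,
}

# Marginalize out the parents:
# P(Rains) = sum over w, c of P(w) * P(c) * P(Rains | w, c)
total = 0.0
for w in (True, False):
    for c in (True, False):
        pw = p_windy if w else 1 - p_windy
        pc = p_cloudy if c else 1 - p_cloudy
        total += pw * pc * p_rains[(w, c)]
print(round(total, 3))  # 0.439
```

The same factorization extends node by node down the DAG, which is what makes the full joint distribution tractable to specify from local CPTs.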
Uncertain knowledge and reasoning
Inference in Belief Networks
• Bayesian Networks provide various types of representations of probability distribution over their
variables. They can be conditioned over any subset of their variables, using any direction of
reasoning.
• For example, one can perform diagnostic reasoning: if the grass is wet or leave has been taken from work, one can update the belief that it has rained, and that it might have been cloudy or windy. In this case reasoning occurs in the opposite direction to the network arcs. One can also perform predictive reasoning, i.e., reasoning from new information about causes to new beliefs about effects, following the direction of the arcs: if it rains, one predicts that the grass will be wet and that leave will be taken. Another form of reasoning involves reasoning about the mutual causes of a common effect. This is called intercausal reasoning.
• There are two possible causes of an effect, represented in the form of a 'V'. For example, the common effect 'Rains' can be caused by two reasons, 'Windy' and 'Cloudy'. Initially, the two causes are independent of each other, but if it rains, the probability of both causes increases. Assume that we know it was windy: this information explains the rainfall and lowers the probability that it was cloudy.
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge
reasoning-Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and
inferences-Propositional logic- Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-
Knowledge representation using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and
belief network
• Probabilistic reasoning-Probabilistic reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster-Shafer theory
Bayesian probability and belief network
• A Bayesian belief network is a key technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:
• "A Bayesian network is a probabilistic graphical model which
represents a set of variables and their conditional dependencies using
a directed acyclic graph."
• It is also called a Bayes network, belief network, decision network,
or Bayesian model.
Bayesian probability and belief network
• Bayesian networks are probabilistic, because these networks are built from
a probability distribution, and also use probability theory for prediction and
anomaly detection.
• Real world applications are probabilistic in nature, and to represent the
relationship between multiple events, we need a Bayesian network. It can also be
used in various tasks including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction, and decision making
under uncertainty.
• A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts:
Directed Acyclic Graph
Table of conditional probabilities.
• The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an Influence diagram.
Bayesian probability and belief network
Directed Acyclic Graph
A Bayesian network graph is made up of nodes
and Arcs (directed links), where:
Each node corresponds to a random variable, which can be continuous or discrete.
Arc or directed arrows represent the causal
relationship or conditional probabilities between
random variables. These directed links or arrows
connect the pair of nodes in the graph.
These links represent that one node directly influences the other node; if there is no directed link, the nodes are independent of each other.
In the above diagram, A, B, C, and D are
random variables represented by the nodes
of the network graph.
If we are considering node B, which is
connected with node A by a directed arrow,
then node A is called the parent of Node B.
Node C is independent of node A.
Bayesian probability and belief network
CONDITIONAL PROBABILITY
• The Bayesian network has mainly two components:
Causal Component
Actual numbers
• Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
• Bayesian network is based on Joint probability distribution and
conditional probability. So let's first understand the joint probability
distribution:
Bayesian probability and belief network
Joint probability distribution:
• If we have variables x1, x2, x3,....., xn, then the probabilities of a
different combination of x1, x2, x3.. xn, are known as Joint probability
distribution.
• The joint distribution P[x1, x2, x3, ..., xn] can be expanded by the chain rule of probability:
• P[x1, x2, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
• = P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] .... P[xn-1 | xn] P[xn].
• In general, for each variable Xi we can write the equation as:
• P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Bayesian probability and belief network
Explanation of Bayesian network:
• Let's understand the Bayesian network through an example by creating a
directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to a burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken the responsibility of informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls at that time too. On the other hand, Sophia likes to listen to loud music, so she sometimes misses the alarm. Here we would like to compute the probability of the burglary alarm.
Bayesian probability and belief network
Problem:
• Calculate the probability that alarm has sounded, but there is neither a burglary, nor an
earthquake occurred, and David and Sophia both called the Harry.
Solution:
• The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, while David's and Sophia's calls depend only on the alarm.
• The network thus encodes our assumptions that the neighbors do not perceive the burglary directly, do not notice minor earthquakes, and do not confer before calling.
• The conditional distribution for each node is given as a conditional probability table, or CPT.
• Each row in a CPT must sum to 1, because the entries represent an exhaustive set of cases for the variable.
• In a CPT, a boolean variable with k boolean parents has 2^k rows, one for each combination of parent values. Hence, if there are two parents, the CPT will contain 4 probability values.
Bayesian probability and belief network
List of all events occurring in this network:
• Burglary (B)
• Earthquake(E)
• Alarm(A)
• David Calls(D)
• Sophia calls(S)
We can write the events of the problem statement as the joint probability P[D, S, A, B, E], and expand it using the chain rule and the network's conditional independencies:
• P[D, S, A, B, E] = P[D | S, A, B, E] P[S, A, B, E]
• = P[D | S, A, B, E] P[S | A, B, E] P[A, B, E]
• = P[D | A] P[S | A, B, E] P[A, B, E]
• = P[D | A] P[S | A] P[A | B, E] P[B, E]
• = P[D | A] P[S | A] P[A | B, E] P[B | E] P[E]
Bayesian probability and belief network
Let's take the observed probability for
the Burglary and earthquake
component:
P(B= True) = 0.002, which is the
probability of burglary.
P(B= False)= 0.998, which is the
probability of no burglary.
P(E= True)= 0.001, which is the
probability of a minor earthquake
P(E= False)= 0.999, which is the probability that an earthquake has not occurred.
Bayesian probability and belief network
We can provide the conditional probabilities as per the tables below.
Conditional probability table for Alarm A:
The conditional probability of Alarm A depends on Burglary and Earthquake:

B      E      P(A= True)   P(A= False)
True   True   0.94         0.06
True   False  0.95         0.05
False  True   0.31         0.69
False  False  0.001        0.999
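As a quick sanity check, each row of a CPT should sum to 1. A small sketch over the Alarm table (the True/False row is taken as 0.95/0.05 so that it satisfies this constraint):

```python
# Verify that every row of the Alarm CPT sums to 1.
cpt_alarm = {
    (True, True):   (0.94, 0.06),
    (True, False):  (0.95, 0.05),
    (False, True):  (0.31, 0.69),
    (False, False): (0.001, 0.999),
}
for parents, (p_true, p_false) in cpt_alarm.items():
    assert abs(p_true + p_false - 1.0) < 1e-9, parents
print("all rows sum to 1")
```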
Bayesian probability and belief network
Conditional probability table for David Calls:
The conditional probability that David calls depends on the probability of Alarm.

A      P(D= True)   P(D= False)
True   0.91         0.09
False  0.05         0.95
Bayesian probability and belief network
Conditional probability table for Sophia Calls:
The conditional probability that Sophia calls depends on its parent node "Alarm".

A      P(S= True)   P(S= False)
True   0.75         0.25
False  0.02         0.98
Bayesian probability and belief network
• From the formula of the joint distribution, we can write the problem statement in the form of a probability distribution:
• P(S, D, A, ¬B, ¬E) = P(S|A) * P(D|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E)
= 0.75 * 0.91 * 0.001 * 0.998 * 0.999
= 0.00068045
Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
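The arithmetic above can be reproduced directly from the CPT entries; a minimal sketch:

```python
# Joint probability P(S, D, A, ¬B, ¬E) for the burglar-alarm network,
# using the CPT values given on the slides.
p_b = 0.002                    # P(B = True)
p_e = 0.001                    # P(E = True)
p_a_given_nb_ne = 0.001        # P(A = True | ¬B, ¬E)
p_d_given_a = 0.91             # P(D = True | A = True)
p_s_given_a = 0.75             # P(S = True | A = True)

joint = (p_s_given_a * p_d_given_a * p_a_given_nb_ne
         * (1 - p_b) * (1 - p_e))
print(round(joint, 8))  # 0.00068045
```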
• The semantics of Bayesian Network:
• There are two ways to understand the semantics of the Bayesian network, which is given below:
1. To understand the network as the representation of the Joint probability distribution.
• It is helpful to understand how to construct the network.
2. To understand the network as an encoding of a collection of conditional independence
statements.
• It is helpful in designing inference procedure.
Bayes' theorem in Artificial intelligence
Bayes' theorem:
• Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian
reasoning, which determines the probability of an event with uncertain
knowledge.
• In probability theory, it relates the conditional probability and marginal
probabilities of two random events.
• Bayes' theorem was named after the British mathematician Thomas Bayes.
The Bayesian inference is an application of Bayes' theorem, which is
fundamental to Bayesian statistics.
• It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).
• Bayes' theorem allows updating the probability prediction of an event by
observing new information of the real world.
Bayes' theorem in Artificial intelligence
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine the probability of cancer more accurately with
the help of age.
• Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B:
• From the product rule we can write: P(A ∧ B) = P(A|B) P(B)
• Similarly, for the probability of event B with known event A: P(A ∧ B) = P(B|A) P(A)
• Equating the right-hand sides of both equations, we get:
• P(A|B) = P(B|A) P(A) / P(B)   ...(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
• It shows the simple relationship between joint and conditional probabilities. Here,
• P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that evidence B has occurred.
• P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.
• P(A) is called the prior probability, the probability of the hypothesis before considering the evidence.
• P(B) is called the marginal probability, the probability of the evidence alone.
• In equation (a), we can in general write P(B) = Σi P(Ai) P(B|Ai), hence Bayes' rule can be written as:
• P(Ai|B) = P(B|Ai) P(Ai) / Σk P(Ak) P(B|Ak)
where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Applying Bayes' theorem in Artificial intelligence
Applying Bayes' rule:
• Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful
in cases where we have a good probability of these three terms and want to determine the fourth one.
Suppose we want to perceive the effect of some unknown cause and compute that cause; then Bayes' rule becomes:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
Example-1:
Question: what is the probability that a patient has diseases meningitis with a stiff neck?
• Given Data:
A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it occurs 80% of
the time. He is also aware of some more facts, which are given as follows:
The Known probability that a patient has meningitis disease is 1/30,000.
The Known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b the proposition that the patient has meningitis. Then we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
P(b|a) = P(a|b) P(b) / P(a) = 0.8 × (1/30000) / 0.02 = 1/750 ≈ 0.0013
• Hence, we can assume that 1 out of 750 patients with a stiff neck has meningitis.
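The same calculation as a short sketch:

```python
# P(meningitis | stiff neck) via Bayes' rule, using the slide's numbers.
p_stiff_given_men = 0.8    # P(a|b)
p_men = 1 / 30000          # P(b)
p_stiff = 0.02             # P(a)

p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(p_men_given_stiff)   # ≈ 0.001333, i.e. about 1 in 750
```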
Applying Bayes' theorem in Artificial
intelligence
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the
card is king is 4/52, then calculate posterior probability P(King|Face), which means the drawn
face card is a king card.
Solution:
P(King): probability that the card is a king = 4/52 = 1/13
P(Face): probability that a card is a face card = 12/52 = 3/13
P(Face|King): probability of a face card given that it is a king = 1
Putting all values into Bayes' rule, we get:
P(King|Face) = P(Face|King) × P(King) / P(Face) = 1 × (1/13) / (3/13) = 1/3
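A quick numerical check of the card example:

```python
# P(King | Face) for a standard 52-card deck.
p_king = 4 / 52              # = 1/13
p_face = 12 / 52             # = 3/13 (J, Q, K in four suits)
p_face_given_king = 1.0      # every king is a face card

p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)     # ≈ 0.3333 (1/3)
```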
Application of Bayes' theorem in Artificial
intelligence:
Following are some applications of Bayes' theorem:
• It is used to calculate the next step of the robot when the already
executed step is given.
• Bayes' theorem is helpful in weather forecasting.
• It can solve the Monty Hall problem.
Probabilistic reasoning
Uncertainty:
• Till now, we have learned knowledge representation using first-order logic and propositional logic with certainty, which means we were sure about the predicates. With this knowledge representation we might write A→B, meaning if A is true then B is true. But consider a situation where we are not sure whether A is true or not; then we cannot express this statement. This situation is called uncertainty.
• So to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty in the real world:
• Information obtained from unreliable sources
• Experimental errors
• Equipment faults
• Temperature variation
• Climate change
Probabilistic reasoning
Probabilistic reasoning:
• Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to indicate the uncertainty
in knowledge. In probabilistic reasoning, we combine probability theory with logic to handle the uncertainty.
• We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that is the result of someone's
laziness and ignorance.
• In the real world, there are lots of scenarios, where the certainty of something is not confirmed, such as "It will rain today,"
"behavior of someone for some situations," "A match between two teams or two players." These are probable sentences for which
we can assume that it will happen but not sure about it, so here we use probabilistic reasoning.
Need of probabilistic reasoning in AI:
• When there are unpredictable outcomes.
• When the specifications or possibilities of predicates become too large to handle.
• When an unknown error occurs during an experiment.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
• Bayes' rule
• Bayesian statistics
Probabilistic reasoning
As probabilistic reasoning uses probability and related terms, so before understanding probabilistic reasoning, let's
understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1:
0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
P(A) = 0 indicates total uncertainty in an event A.
P(A) = 1 indicates total certainty in an event A.
We can find the probability of the complementary event using the formula below.
P(¬A) = probability of event A not happening.
P(¬A) + P(A) = 1.
• Event: Each possible outcome of a variable is called an event.
• Sample space: The collection of all possible events is called sample space.
• Random variables: Random variables are used to represent the events and objects in the real world.
• Prior probability: The prior probability of an event is probability computed before observing new information.
• Posterior Probability: The probability that is calculated after all evidence or information has taken into account. It is a
combination of prior probability and new information.
Probabilistic reasoning
Conditional probability:
• Conditional probability is a probability of occurring an event when another event
has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition of B". It can be written as:
P(A|B) = P(A ∧ B) / P(B)
• where P(A ∧ B) = joint probability of A and B
• P(B) = marginal probability of B.
• If the probability of A is given and we need to find the probability of B, then it is given as: P(B|A) = P(A ∧ B) / P(A)
• This can be explained using a Venn diagram: once B has occurred, the sample space is reduced to set B, and event A can now only occur together with B, so we divide the probability P(A ∧ B) by P(B).
Probabilistic reasoning
Example:
In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of students who like English also like Mathematics?
Solution:
Let A be the event that a student likes Mathematics and B the event that a student likes English.
P(A|B) = P(A ∧ B) / P(B) = 0.4 / 0.7 = 0.57
Hence, 57% of the students who like English also like Mathematics.
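The same conditional-probability computation as a one-line sketch:

```python
# P(likes Mathematics | likes English) for the class example.
p_english = 0.7           # P(B)
p_both = 0.4              # P(A ∧ B)

p_math_given_english = p_both / p_english
print(round(p_math_given_english, 2))  # 0.57
```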
Probabilistic reasoning over time
Definition
• Probabilistic reasoning is the representation of knowledge where the concept of probability is applied to indicate the uncertainty
in knowledge.
Reasons to use Probabilistic Reasoning in AI
• Some reasons to use this way of representing knowledge are given below:
• When we are unsure of the predicates.
• When the possibilities of predicates become too large to list down.
• When, during an experiment, it is known that errors occur.
• Probability of an event = number of outcomes favouring that event / total number of outcomes.
Notations and Properties
• Consider the statement S: March will be cold.
• Probability is often denoted as P(predicate).
• If the chance of March being cold is only 30%, then P(S) = 0.3.
• Probability always takes a value between 0 and 1. If the probability is 0, the event will never occur; if it is 1, it will occur for sure.
• Then, P(¬S) = 0.7.
• This means the probability of March not being cold is 70%.
• Property 1: P(S) + P(¬S) = 1
Probabilistic reasoning over time
Consider the statement T: April will be cold.
• Then, P(S∧T) means Probability of S AND T, i.e., Probability of March and April being cold.
• P(S∨T) means Probability of S OR T, i.e., Probability of March or April being cold.
• Property 2: P(S∨T) = P(S) + P(T) - P(S∧T)
• Proofs for the properties are not given here and you can work them out by yourselves using Venn
Diagrams.
Conditional Probability
• Conditional probability is defined as the probability of an event given another event. It is denoted by P(B|A) and is read as: ''probability of B given A.''
• Property 3: P(B|A) = P(B∧A) / P(A).
Bayes' Theorem
• Given P(A), P(B) and P(A|B), then
• P(B|A) = P(A|B) x P(B) / P(A)
Probabilistic reasoning over time
Bayesian Network
When designing a Bayesian Network, we keep the local probability table at each node.
Bayesian Network - Example
Consider a Bayesian Network as given below:
This Bayesian Network tells us the reason a
particular person cannot study. It may be
either because of no electricity or because of
his lack of interest. The corresponding
probabilities are written in front of the causes.
Probabilistic reasoning over time
Now, as you can see, no cause is dependent on the other, and they directly contribute to the person's inability to study. To fill in the third table we consider four cases. Since the causes are independent, their corresponding probabilities can be multiplied directly.

No Electricity   Not Interested   P(Cannot Study)
F                F                P(No electricity = F) x P(Not Interested = F) = 0.8 x 0.7 = 0.56
F                T                P(No electricity = F) x P(Not Interested = T) = 0.8 x 0.3 = 0.24
T                F                P(No electricity = T) x P(Not Interested = F) = 0.2 x 0.7 = 0.14
T                T                P(No electricity = T) x P(Not Interested = T) = 0.2 x 0.3 = 0.06
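The four rows above can be generated programmatically; a small sketch, assuming the causes are independent as stated:

```python
# Enumerate the joint probabilities of the two independent causes.
p_no_electricity = 0.2   # P(No electricity = T)
p_not_interested = 0.3   # P(Not interested = T)

for elec in (False, True):
    for interested in (False, True):
        p = ((p_no_electricity if elec else 1 - p_no_electricity) *
             (p_not_interested if interested else 1 - p_not_interested))
        print(elec, interested, round(p, 2))  # rows: 0.56, 0.24, 0.14, 0.06
```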
Probabilistic reasoning over time
The updated Bayesian Network is:
Data Mining
• In artificial intelligence and machine learning, data mining, or knowledge discovery in databases, is the nontrivial
extraction of implicit, previously unknown and potentially useful information from data. Statistical methods are used that
enable trends and other relationships to be identified in large databases.
• The major reason that data mining has attracted attention is the wide availability of vast amounts of data, and the need for turning such data into useful information and knowledge. The knowledge gained can be used for applications ranging from risk monitoring, business management, production control, and market analysis to engineering and science exploration.
In general, three types of data mining techniques are used: association, regression, and classification.
Association analysis
• Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together
in a given set of data. Association analysis is widely used to identify the correlation of individual products within shopping
carts.
Regression analysis
• Regression analysis creates models that explain dependent variables through the analysis of independent variables. As an
example, the prediction for a product’s sales performance can be created by correlating the product price and the average
customer income level.
Classification and prediction
• Classification is the process of designing a set of models to predict the class of objects whose class label is unknown. The
derived model may be represented in various forms, such as if-then rules, decision trees, or mathematical formulas.
Data Mining
• A decision tree is a flow-chart-like tree structure where each node denotes a test on an attribute
value, each branch represents an outcome of the test, and each tree leaf represents a class or
class distribution. Decision trees can be converted to classification rules.
• Classification can be used for predicting the class label of data objects. Prediction encompasses
the identification of distribution trends based on the available data.
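As an illustration of how a decision tree maps to if-then classification rules, here is a hypothetical two-attribute tree (the attributes, threshold, and labels are invented for the example):

```python
# A tiny hand-built decision tree and its equivalent if-then rules.
def classify(outlook, humidity):
    # Rule: IF outlook = sunny AND humidity > 75 THEN class = "no"
    if outlook == "sunny":
        return "no" if humidity > 75 else "yes"
    # Rule: IF outlook = rainy THEN class = "no", otherwise "yes"
    return "no" if outlook == "rainy" else "yes"

print(classify("sunny", 80))     # no
print(classify("overcast", 50))  # yes
```

Each root-to-leaf path of the tree corresponds to one if-then rule, which is why the slide notes that decision trees can be converted to classification rules.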
The data mining process consists of an iterative sequence of the following steps:
• Data coherence and cleaning to remove noise and inconsistent data
• Data integration such that multiple data sources may be combined
• Data selection where data relevant to the analysis are retrieved
• Data transformation where data are consolidated into forms appropriate for mining
• Pattern recognition and statistical techniques are applied to extract patterns
• Pattern evaluation to identify interesting patterns representing knowledge
• Visualization techniques are used to present mined knowledge to users
Data Mining
Limits of Data Mining
• GIGO (garbage in, garbage out) is almost always referenced with respect to data mining, as the quality of the knowledge gained through data mining depends on the quality of the historical data. Data inconsistencies and dealing with multiple data sources represent large problems in data management.
• Data cleaning techniques are used to deal with detecting and removing errors and inconsistencies
to improve data quality; however, detecting these inconsistencies is extremely difficult. How can
we identify a transaction that is incorrectly labeled as suspicious? Learning from incorrect data
leads to inaccurate models.
• Another limitation of data mining is that it only extracts knowledge limited to the specific set of
historical data, and answers can only be obtained and interpreted with regards to previous trends
learned from the data.
• This limits one’s ability to benefit from new trends. Because the decision tree is trained
specifically on the historical data set, it does not account for personalization within the
tree. Additionally, data mining (decision trees, rules, clusters) are non-incremental and do not
adapt while in production.
Operation of Fuzzy System
Crisp Input
→ Fuzzification (Input Membership Functions)
Fuzzy Input
→ Rule Evaluation (Rules / Inferences)
Fuzzy Output
→ Defuzzification (Output Membership Functions)
Crisp Output
Building Fuzzy Systems
⚫ Fuzzification
⚫ Inference
⚫ Composition
⚫ Defuzzification
Fuzzification
⚫ Establishes the fact base of the fuzzy system. It identifies the input and output of the
system, defines appropriate IF THEN rules, and uses raw data to derive a
membership function.
⚫ Consider an air conditioning system that determine the best circulation level by
sampling temperature and moisture levels. The inputs are the current temperature
and moisture level. The fuzzy system outputs the best air circulation level: “none”,
“low”, or “high”. The following fuzzy rules are used:
1. If the room is hot, circulate the air a lot.
2. If the room is cool, do not circulate the air.
3. If the room is cool and moist, circulate the air slightly.
⚫ A knowledge engineer determines membership functions that map temperatures
to fuzzy values and map moisture measurements to fuzzy values.
Inference
⚫ Evaluates all rules and determines their truth values. If an input does not
precisely correspond to an IF THEN rule, partial matching of the input data is
used to interpolate an answer.
⚫ Continuing the example, suppose that the system has measured temperature
and moisture levels and mapped them to the fuzzy values of .7 and .1
respectively. The system now infers the truth of each fuzzy rule.
⚫ To do this a simple method called MAX-MIN is used. This method sets the
fuzzy value of the THEN clause to the fuzzy value of the IF clause. Thus, the
method infers fuzzy values of 0.7, 0.1, and 0.1 for rules 1, 2, and 3
respectively.
Composition
⚫ Combines all fuzzy conclusions obtained by inference into a single conclusion.
Since different fuzzy rules might have different conclusions, consider all rules.
⚫ Continuing the example, each inference suggests a different action
⚫ rule 1 suggests a "high" circulation level
⚫ rule 2 suggests turning off air circulation
⚫ rule 3 suggests a "low" circulation level.
⚫ A simple MAX-MIN method of selection is used where the maximum fuzzy value
of the inferences is used as the final conclusion. So, composition selects a fuzzy
value of 0.7 since this was the highest fuzzy value associated with the inference
conclusions.
Defuzzification
⚫ Convert the fuzzy value obtained from composition into a “crisp” value. This process is often complex, since the fuzzy set might not translate directly into a crisp value. Defuzzification is necessary, since controllers of physical systems require discrete signals.
⚫ Continuing the example, composition outputs a fuzzy value of 0.7. This
imprecise value is not directly useful since the air circulation levels are “none”,
“low”, and “high”. The defuzzification process converts the fuzzy output of
0.7 into one of the air circulation levels. In this case it is clear that a fuzzy
output of 0.7 indicates that the circulation should be set to “high”.
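The whole air-conditioning pipeline (fuzzified inputs, MAX-MIN inference, composition) can be sketched as follows. Reading cool = 0.1 is an assumption, chosen to match the inferred rule values 0.7, 0.1, 0.1 on the slides:

```python
# MAX-MIN inference and composition for the air-circulation example.
hot, cool, moist = 0.7, 0.1, 0.1   # fuzzified inputs

# Inference: each THEN clause takes the truth value of its IF clause
# (MIN over conjuncts for compound conditions).
rule1 = hot               # IF hot THEN circulate a lot
rule2 = cool              # IF cool THEN do not circulate
rule3 = min(cool, moist)  # IF cool AND moist THEN circulate slightly

# Composition: MAX over the inferred rule values.
conclusion = max(rule1, rule2, rule3)
print(conclusion)  # 0.7 -> defuzzifies to the "high" circulation level
```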
Defuzzification
⚫ There are many defuzzification methods. Two of the more common
techniques are the centroid and maximum methods.
⚫ In the centroid method, the crisp value of the output variable is
computed by finding the variable value of the center of gravity of the
membership function for the fuzzy value.
⚫ In the maximum method, one of the variable values at which the fuzzy
subset has its maximum truth value is chosen as the crisp value for the
output variable.
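Both techniques can be sketched over a sampled membership function. The sample points and membership values below are made up for illustration; only the two formulas come from the text.

```python
def centroid(xs, mus):
    """Centroid method: discrete center of gravity,
    sum(x * mu(x)) / sum(mu(x))."""
    return sum(x * m for x, m in zip(xs, mus)) / sum(mus)

def maximum(xs, mus):
    """Maximum method: a variable value where the membership
    function peaks (here, the first such point)."""
    return xs[mus.index(max(mus))]

xs = [0, 25, 50, 75, 100]        # hypothetical output-variable samples
mus = [0.0, 0.1, 0.3, 0.7, 0.7]  # hypothetical membership values
print(round(centroid(xs, mus), 2))  # 77.78
print(maximum(xs, mus))             # 75
```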
Example: Design of Fuzzy Expert System – Washing Machine
FDP on AI & Advanced Machine Learning using Data Science, 22/11/19
Fuzzification
Given inputs x1 and x2, find the weight values associated with each input
membership function.
[Figure: membership functions NM, NS, Z, PS, PM plotted over the x1 axis;
the measured x1 fires Z at 0.2 and PS at 0.7.]
W = [0, 0, 0.2, 0.7, 0]
Fuzzy Rules (washing time as a function of dirt and grease):

             Not Greasy  Medium  Greasy
Small Dirt   Very Short  Medium  Long
Medium Dirt  Short       Medium  Long
Large Dirt   Medium      Long    Very Long
Demystifying AI algorithms
DeFuzzification
The output membership functions Very Short, Short, Medium, Long and Very Long
are placed along the washing-time axis at 5, 10, 20, 30, 40 and 60 minutes.
Washing Time Medium = (Y - 20)/(30 - 20)
Washing Time Long = (Y - 30)/(40 - 30)
For the fuzzy output 0.5 obtained from x1 and x2:
(Y - 20)/(30 - 20) = 0.5
Y - 20 = 0.5 * 10 = 5
Y = 25 Mins
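The arithmetic above can be checked with a short sketch. The breakpoints 20 and 30 minutes for the "Medium" membership function are taken from the slide; the function names are my own.

```python
def medium_membership(y):
    """Membership of washing time y in "Medium":
    rises linearly from 20 to 30 minutes."""
    return (y - 20) / (30 - 20)

def crisp_time(mu):
    """Invert the "Medium" membership to get a crisp washing time."""
    return 20 + mu * (30 - 20)

print(crisp_time(0.5))  # 25.0
```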
Knowledge and Reasoning
Table of Contents
• Knowledge and reasoning-Approaches and issues of knowledge reasoning-
Knowledge base agents
• Logic Basics-Logic-Propositional logic-syntax ,semantics and inferences-
Propositional logic- Reasoning patterns
• Unification and Resolution-Knowledge representation using rules-Knowledge
representation using semantic nets
• Knowledge representation using frames-Inferences-
• Uncertain Knowledge and reasoning-Methods-Bayesian probability and belief
network
• Probabilistic reasoning-Probabilistic reasoning over time
• Other uncertain techniques-Data mining-Fuzzy logic-Dempster-Shafer theory
17-03-2021 18CSC305J_AI_UNIT3 207
Dempster Shafer Theory
• Dempster-Shafer Theory was given by Arthur P. Dempster in 1967 and his student
Glenn Shafer in 1976.
This theory was released for the following reasons:
• Bayesian theory is only concerned with single evidences.
• Bayesian probability cannot describe ignorance.
• DST is an evidence theory; it combines all possible outcomes of the problem.
Hence it is used to solve problems where there is a chance that different
evidence will lead to a different result.
The uncertainty in this model is handled as follows:
• Consider all possible outcomes.
• Belief leads to believing in some possibility by bringing out some evidence.
• Plausibility makes the evidence compatible with the possible outcomes.
Dempster Shafer Theory
For example:
Consider a room where four persons, A, B, C and D, are present. Suddenly the lights go out, and
when they come back on, B has been killed by a stab in the back with a knife. No one came
into the room, no one left the room, and B did not commit suicide. We have to find out
who the murderer is.
To solve this, there are the following possibilities:
• Either {A} or {C} or {D} has killed him.
• Either {A, C} or {C, D} or {A, D} have killed him.
• All three of them killed him, i.e. {A, C, D}.
• None of them killed him: {ø}.
These are the possible evidences by which we can find the murderer, using the measure of plausibility.
Using the above example we can say:
Set of possible conclusions (P): {p1, p2, …, pn}
where P is the set of possible conclusions, which must be exhaustive (at least one pi must be true),
and the pi must be mutually exclusive.
The power set will contain 2^n elements, where n is the number of elements in the possible set.
For example, if P = {a, b, c}, then the power set is
{ø, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}}, i.e. 2^3 = 8 elements.
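The 2^n growth of the power set can be illustrated with a short Python sketch (the helper name is my own):

```python
from itertools import chain, combinations

def power_set(items):
    """All subsets of items, from the empty set up to the full set."""
    s = list(items)
    return [set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

subsets = power_set(["a", "b", "c"])
print(len(subsets))  # 8, i.e. 2**3
```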
Dempster Shafer Theory
• Mass function m(K): an interpretation of m({K or B}) is that there is
evidence for {K or B} which cannot be divided among more specific beliefs for K
and B.
• Belief in K: the belief in an element K of the power set is the sum of the masses of
the elements which are subsets of K. This can be explained through an example.
Let K = {a, b, c}:
Bel(K) = m(a) + m(b) + m(c) + m(a, b) + m(a, c) + m(b, c) + m(a, b, c)
• Plausibility of K: the sum of the masses of the sets that intersect with K:
Pl(K) = m(a) + m(b) + m(c) + m(a, b) + m(b, c) + m(a, c) + m(a, b, c)
Characteristics of Dempster-Shafer Theory:
• It handles the ignorance part such that the probabilities of all events aggregate to 1.
• Ignorance is reduced in this theory by adding more and more evidence.
• A combination rule is used to combine various types of possibilities.
Dempster Shafer Theory
Advantages:
• As we add more information, the uncertainty interval reduces.
• DST has a much lower level of ignorance.
• Diagnosis hierarchies can be represented using it.
• A person dealing with such problems is free to think about evidences.
Disadvantages:
• The computation effort is high, as we have to deal with 2^n sets.
Dempster Shafer Problem

Example: 4 people (B, J, S and K) are locked in a room when the light goes out.
When the light comes on, K is dead, stabbed with a knife.
• Not suicide (stabbed in the back)
• No one entered the room
• Assume only one killer

ϴ = {B, J, S}
P(ϴ) = {ø, {B}, {J}, {S}, {B,J}, {B,S}, {J,S}, {B,J,S}}
Detectives, after examining the crime scene, assign mass probabilities to the various
elements of the power set:

Event                    Mass
No one is guilty         0
B is guilty              0.1
J is guilty              0.2
S is guilty              0.1
Either B or J is guilty  0.1
Either B or S is guilty  0.1
Either S or J is guilty  0.3
One of the 3 is guilty   0.1
Dempster Shafer Problem
Belief in A:
The belief in an element A of the power set is the sum of the masses of the
elements which are subsets of A (including A itself).
Ex: Given A = {q1, q2, q3},
Bel(A) = m(q1) + m(q2) + m(q3) + m(q1,q2) + m(q2,q3) + m(q1,q3) + m(q1,q2,q3)
Ex: Given the above mass assignments,
Bel(B) = m(B) = 0.1
Bel(B,J) = m(B) + m(J) + m(B,J) = 0.1 + 0.2 + 0.1 = 0.4
RESULT:

A       {B}  {J}  {S}  {B,J}  {B,S}  {S,J}  {B,J,S}
m(A)    0.1  0.2  0.1  0.1    0.1    0.3    0.1
Bel(A)  0.1  0.2  0.1  0.4    0.3    0.6    1.0
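The belief computation can be sketched in Python using the mass assignments from the detective example (the dictionary layout and function name are my own choices):

```python
# Mass assignments from the detective example; frame of discernment {B, J, S}.
masses = {
    frozenset("B"): 0.1, frozenset("J"): 0.2, frozenset("S"): 0.1,
    frozenset("BJ"): 0.1, frozenset("BS"): 0.1, frozenset("JS"): 0.3,
    frozenset("BJS"): 0.1,
}

def bel(a):
    """Belief in A: sum of masses of all subsets of A (including A itself)."""
    a = frozenset(a)
    return round(sum(v for s, v in masses.items() if s <= a), 10)

print(bel("B"), bel("BJ"), bel("BJS"))  # 0.1 0.4 1.0
```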
Dempster Shafer Problem
Plausibility of A: Pl(A)
The plausibility of an element A, Pl(A), is the sum of all the masses of
the sets that intersect with the set A.
Ex: Pl(B,J) = m(B) + m(J) + m(B,J) + m(B,S) + m(J,S) + m(B,J,S) = 0.9
All results:

A      {B}  {J}  {S}  {B,J}  {B,S}  {S,J}  {B,J,S}
m(A)   0.1  0.2  0.1  0.1    0.1    0.3    0.1
Pl(A)  0.4  0.7  0.6  0.9    0.8    0.9    1.0
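Plausibility can be sketched the same way, summing the masses of sets that intersect A. The mass table below repeats the detective example's assignments; the data structure is assumed, not from the slides.

```python
masses = {
    frozenset("B"): 0.1, frozenset("J"): 0.2, frozenset("S"): 0.1,
    frozenset("BJ"): 0.1, frozenset("BS"): 0.1, frozenset("JS"): 0.3,
    frozenset("BJS"): 0.1,
}

def pl(a):
    """Plausibility of A: sum of masses of all sets intersecting A."""
    a = frozenset(a)
    return round(sum(v for s, v in masses.items() if s & a), 10)

print(pl("B"), pl("BJ"))  # 0.4 0.9
```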
Dempster Shafer Problem
Disbelief (or Doubt) in A: Dis(A)
The disbelief in A is simply Bel(¬A). It is calculated by summing all the
masses of elements which do not intersect with A:
Dis(A) = 1 - Pl(A), or equivalently Pl(A) = 1 - Dis(A)

A       {B}  {J}  {S}  {B,J}  {B,S}  {S,J}  {B,J,S}
Pl(A)   0.4  0.7  0.6  0.9    0.8    0.9    1.0
Dis(A)  0.6  0.3  0.4  0.1    0.2    0.1    0.0

Belief Interval of A:
The certainty associated with a given subset A is defined by the belief interval [Bel(A), Pl(A)].
Ex: the belief interval of {B,S} is [0.3, 0.8].

A       {B}  {J}  {S}  {B,J}  {B,S}  {S,J}  {B,J,S}
m(A)    0.1  0.2  0.1  0.1    0.1    0.3    0.1
Bel(A)  0.1  0.2  0.1  0.4    0.3    0.6    1.0
Pl(A)   0.4  0.7  0.6  0.9    0.8    0.9    1.0
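Disbelief and the belief interval follow directly from belief and plausibility. This sketch again assumes the detective example's mass table and my own function names:

```python
masses = {
    frozenset("B"): 0.1, frozenset("J"): 0.2, frozenset("S"): 0.1,
    frozenset("BJ"): 0.1, frozenset("BS"): 0.1, frozenset("JS"): 0.3,
    frozenset("BJS"): 0.1,
}

def bel(a):
    """Belief: sum of masses of subsets of A."""
    a = frozenset(a)
    return round(sum(v for s, v in masses.items() if s <= a), 10)

def pl(a):
    """Plausibility: sum of masses of sets intersecting A."""
    a = frozenset(a)
    return round(sum(v for s, v in masses.items() if s & a), 10)

def dis(a):
    """Disbelief: Dis(A) = 1 - Pl(A)."""
    return round(1 - pl(a), 10)

def interval(a):
    """Belief interval [Bel(A), Pl(A)]."""
    return (bel(a), pl(a))

print(dis("B"))        # 0.6
print(interval("BS"))  # (0.3, 0.8)
```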
Dempster Shafer Problem
P(A) represents the maximum share of the evidence. We could possibly have, if for all its that intouects with A, the part that intracts actually
valid. So, Pl(A) is the max possible value of prof(A).
Belief intervals and Probability
The probability in A falls some ware between bel (A) and pl(A).
-bel (A) represents the evidence. We have for a directly So proof (A) cannot be less than this value.
- PL(A) represents the maximum share of the evidence we could possibly have. If, for all sets that intersects with A, the part that intersects is
actually valid. So, PL(A) is the max possible value of proof(A).
Belief intervals allow Dempster, Shaffer theory to reason about the degree of certainity or certainity of our beliefs.
A small difference between belief and plausibility shows that we are curtain about our belief.
A large difference shows that we are uncertain about our belief.
however, even with a ‘O’ interval, this does not mean we know which conclusion is right.
A {B} {J} {S} {B,J} {B,S} {S,J} {B,J,S}

M(A) 0.1 0.2 0.1 0.1 0.1 0.3 0.1

Bel (A) 0.1 0.2 0.1 0.4 0.3 0.6 1.0

Pl(A) 0.4 0.7 0.6 0.9 0.8 0.9 1.0

Belief {0.1,0.4} {0.2,0.7} {0.1,0.6} {0.4,0.9} {0.3,0.8} {0.6,0.9} {1,1}


17-03-2021 18CSC305J_AI_UNIT3 216
interval
Thank You