
INDEX

S. No. Topic

Week 1
1 Artificial Intelligence: Introduction
2 Introduction to AI
3 AI Introduction: Philosophy
4 AI Introduction
5 Introduction: Philosophy
6 State Space Search - Introduction
7 Search - DFS and BFS
8 Search - DFID
9 Heuristic Search
10 Hill Climbing
11 Solution Space Search, Beam Search
12 TSP Greedy Methods
13 Tabu Search
14 Optimization I (Simulated Annealing)
15 Optimization II (Genetic Algorithms)
16 Population Based Methods for Optimization
17 Population Based Methods II
18 Branch and Bound, Dijkstra's Algorithm
19 A* Algorithm
20 Admissibility of A*
21 A* Monotone Property, Iterative Deepening A*
22 Recursive Best First Search, Sequence Alignment
23 Pruning the Open and Closed Lists
24 Problem Decomposition with Goal Trees
25 AO* Algorithm
26 Game Playing
27 Game Playing - Minimax Search
28 Game Playing - Alpha-Beta
29 Game Playing - SSS*
30 Rule Based Systems
31 Inference Engines
32 Rete Algorithm
33 Planning
34 Planning - FSSP, BSSP
35 Goal Stack Planning, Sussman's Anomaly
36 Non-linear Planning
37 Plan Space Planning
38 GraphPlan
39 Constraint Satisfaction Problems
40 CSP Continued
41 Knowledge-based Systems
42 Knowledge-based Systems, PL
43 Propositional Logic
44 Resolution Refutation for PL
45 First-order Logic (FOL)
46 Reasoning in FOL
47 Backward Chaining
48 Resolution for FOL
Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 01
Artificial Intelligence Introduction

(Refer Slide Time: 00:20)

Welcome to this course on AI. Let me first start today with the syllabus that we are going to cover; we did a little bit of this in the last class, towards the end, and I am just repeating it for continuity. So, we will spend the first few weeks, no, two or three lectures, on the first part of the course, which is the history and philosophy of AI. And as we will see, it goes back quite a bit in time, and it is qualitatively going to be very different from the rest of the course, which is going to be mostly algorithms.

And we will start with the simplest algorithms like depth first search, breadth first search and so on, move on to heuristic search, in which we look at how search can be guided towards the solution that we are trying to find, and we look at algorithms like hill climbing and tabu search ((Refer Time: 01:12)). We will find that even that is not going to be good enough, so we will try some randomized approaches like simulated annealing, genetic algorithms, and ant colony optimization.

These are basically optimization techniques, but we will try to see them from the search perspective, and then we will look at a very well known algorithm called A star and its variations, which we will see. Then, as I mentioned earlier, we will look at something called goal trees or problem decomposition: if you want to solve a problem, and you want to break it up into parts and solve each part separately, that technique is called problem decomposition.

That leads to an area called rule based systems, which we will look at; we will also do game playing, though perhaps not as late as this, maybe somewhere here, so that I can give you one assignment to start off with, which is to implement a game playing program. And finally, depending on how much time we have left, we should have something on planning and constraint satisfaction, which is a kind of preview of the courses that we offer next semester.

In game playing we will study algorithms like the minimax algorithm, the alpha-beta algorithm, and a heuristic version called SSS star. And then, depending upon how much time we have, we will spend some time on these two topics, planning and constraint satisfaction, in which we look at general algorithms for planning. We will see that by planning we essentially mean finding a sequence of actions which does something useful for you. And we will also look at logic and inferences, because it is not that we are just solving problems of how to do things; we are also making inferences: if we know something, then we know something else.

So, that is the process of making inferences, and the language that we use for representation is logic, and we will spend some time on that. These two topics are actually covered independently, and completely, in two different courses that we offer next semester. One is called planning and constraint satisfaction, and the other one is called knowledge representation and reasoning, which is not the title we are using here.

(Refer Slide Time: 03:27)

So, the textbook that we will follow is the book which I have just published; it has just about come out. There are some textbooks in AI which have been very popular, and earlier I was using a lot of material from them: the Rich and Knight book on AI; Russell and Norvig, which is probably the most well known textbook at this point of time; and a book by Winston which was written earlier.

Then there are certain specialized books: these two books by Fogel and Michalewicz are on certain aspects that we will cover, and this book by Judea Pearl is something we will use for game playing essentially. And these two books, which I will mention again, deal with the history and the philosophy part of AI essentially.

(Refer Slide Time: 04:20)

So, these two books that I mentioned are going to be the subject matter of the first few lectures: the historical and the philosophical perspectives on AI. And you can see why it is a topic, because we use this word intelligence here, and that is something which has concerned people over a long time essentially. We want to see what the thinking has been behind what AI is essentially. So, for these two books, I would recommend that you read at least portions of them; there is the book called AI: The Very Idea.

And we will discuss shortly why this book is different from the rest. John Haugeland is a philosopher by profession, not a computer scientist, and he is looking at the philosophical side of things. One of the key questions we will ask, and today we will start doing that, is: can machines think? I want you to start thinking about this question already. And today we will discuss some of these basic concepts, what is intelligence for example, and Haugeland looks into the philosophy behind this. Pamela McCorduck is also from the social sciences, and she wrote this book quite a long time ago, actually 1974 or something like that.

And I hope you will notice that the title is, if nothing else, at least a little provocative, because she uses the pronoun 'who' for machines. So, she has machines who think, and 'who' is something that we normally use for people essentially, human beings and so on and so forth. So, she is talking about machines who think, not machines which think, for example; and therefore, already there is a suggestion that her own inclination is to believe that, yes, it is possible that machines can think.

These two books we will follow; the slides that I have prepared are mostly from these two books and a little bit from Wikipedia, and I will give you all those sources. For the rest of the course I will not use slides very much; we will just discuss things on the board essentially. So, I want today's class to be a little bit interactive; well, not just today's class, but today's class will be more interactive. And I want you to start thinking about the question of what is intelligence, and we will discuss that; but before we do that, let us just look at the classical definitions that people have given for this field of artificial intelligence.

(Refer Slide Time: 06:58)

So, let us see first what Herbert Simon has to say. Herbert Simon was one of the founding persons in this area of AI, starting in the 1950s; he and his collaborator Allen Newell founded the school at Carnegie Mellon University, and we will see their contribution as we go along. Simon is also one of the few people working in AI who got a Nobel Prize. As you know, we do not get Nobel Prizes in computer science, but Simon got one for economics; he was a multifaceted person who did many things, as people used to earlier.

So, his definition is: we call programs intelligent if they exhibit behaviors that would be regarded as intelligent if they were done by human beings. This is the most common definition of AI that people use: that it is concerned with writing programs, or making machines do things, which would be considered intelligent if they were done by human beings essentially. So, one of the first things that AI people got into was chess playing essentially, because chess playing was always considered to be a hallmark of intelligent behavior; it was only the bright and intelligent people who could play good chess.

(Refer Slide Time: 08:55)

There is a long story of chess playing. The first programs were written in the 1950s; one of the first outlines of the game was given by von Neumann. In the 60s a grandmaster called David Levy, I do not know whether I have it in my history, but maybe it will come later, so let us write it here: around 1968 he wagered a bet that no chess program could beat him for the next ten years, because chess was considered to be something which is very intellectual in nature. Well, luckily for him, he won his bet, which ended in 1978.

But many of you would know that in the mid to late 90s, the then world champion Garry Kasparov was beaten by a chess playing program essentially. Chess, in fact, is not so intellectual in the sense that we tend to talk about, you know, in the philosophical sense. Yes, it requires a lot of computing machinery, and we will see that if you have a lot of computing machinery, you can play good chess.

(Refer Slide Time: 10:06)

Let us look at another old definition; this is by Barr and Feigenbaum, also two old timers in AI. Their definition says that physicists ask what kind of place the universe is, and seek to characterize its behavior systematically; biologists ask what it means for a physical system to be living; and we in AI wonder what kind of information processing system can ask such questions essentially. So, in other words, they are talking about intelligence: physicists are asking questions about the physical world, biologists are asking questions about living creatures; what kind of information processing system could ask such questions?

So, they are essentially asking what kind of system would be intelligent, in that sense of the word. Then Elaine Rich, who as I mentioned wrote one of the popular books in AI, in eighty three or eighty six or something, gives a computer science flavor to the definition. She says that AI is the study of techniques for solving exponentially hard problems in polynomial time, by exploiting knowledge about the problem domain.

Of course, those of you who are diehard theory people would immediately object, saying that you cannot solve a hard problem in polynomial time, because by definition it is a hard problem. But there are two counters to this. One is that we may not necessarily be looking to solve them in polynomial time in the worst case. In certain situations, as we will see with the travelling salesman problem, which is one of the hardest problems that people have encountered, given some constraints on the problem, on how the edges are connected and what the weights on the edges are, you can have much faster solutions.

The second counter to the objection that you cannot solve hard problems in polynomial time is that we are not seeking to find optimal solutions. And this is something which many, many people have observed: human beings are not optimizers; we do not necessarily find the solution that would be considered optimal. We are what some people call satisficers; satisficing essentially means that you are happy with good solutions; they do not have to be optimal essentially.

So, just an example, which sort of strikes me once in a while, living in Chennai: if you are walking along one of the roads in IIT, one thing that you may want to optimize is the amount of shade that you walk through; but we do not have such dense foliage ((Refer Time: 12:47)) that there is shade everywhere, so you have to choose a path essentially. And even if one is conscious of the fact that one wants to walk through shade, and one does not mind walking a little bit longer, so that our objective function is to maximize shade and not worry too much about the length of our path, even then we do not take the zigzag path that we would if we wanted to really follow the shade essentially.

So, we do not optimize in that sense. Even when you want to be away from the sun, you are happy if the path that you are following has a lot of shade, not necessarily the maximum amount of shade. So, in that sense, we do not solve hard problems completely; we do not find optimal solutions, but we tend to find good solutions essentially, and that is what we do all the time. We go shopping; you do not check in ten places, find the minimal cost price, and then buy your product. Even though on the web nowadays you can do that sort of thing, in general if you think that the price is reasonable, we go and buy the stuff essentially.
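
To make this satisficing idea concrete in code, here is a minimal sketch of the greedy nearest-neighbour heuristic for the travelling salesman problem mentioned above; the city coordinates and the function name are made up for illustration. It runs in polynomial time and returns a good tour, not a provably optimal one.

    import math

    def nearest_neighbour_tour(cities):
        # Greedy TSP heuristic: from the current city, always visit the
        # closest unvisited city. O(n^2) time; a "satisficing" tour in
        # Simon's sense, good but not guaranteed optimal.
        unvisited = set(range(1, len(cities)))
        tour = [0]  # start arbitrarily at city 0
        while unvisited:
            here = cities[tour[-1]]
            nxt = min(unvisited, key=lambda i: math.dist(here, cities[i]))
            tour.append(nxt)
            unvisited.remove(nxt)
        return tour

    # Five illustrative cities on a grid; the tour is reasonable, not optimal.
    print(nearest_neighbour_tour([(0, 0), (5, 1), (1, 4), (6, 5), (2, 2)]))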

And one more definition is due to Charniak and McDermott, who also wrote a very famous and popular book on AI, which I used for part of my course; I do not think I mentioned it, maybe I should add it to the list there. They talk about AI being the study of mental faculties through the use of computational models. So, we had said earlier that there are two approaches to AI: one is the cognitive approach, which says that we are trying to understand intelligence.

And the other is the engineering approach, which says that we want to build smart systems, or smart apps as you might say nowadays. So, what this definition says is that we want to study mental faculties, and to do that we will build computational models and use them for the study. Now, before I come to the definition which I like most, look at these definitions. They are saying: if a human being does this, then it is intelligent, and we want to do something similar; so we want to mimic human intelligence.

(Refer Slide Time: 15:05)

So, the definition which I like most comes not from a computer scientist, but from a philosopher, the one we mentioned, John Haugeland, in the book AI: The Very Idea. He says that the fundamental goal of AI is not merely to mimic intelligence or produce some clever fake of intelligence; he says that is not the goal at all. AI wants the genuine article: machines with minds of their own, in the full and literal sense. Now, that raises a very interesting question, and we will debate it a little bit today in class: what do we mean by intelligence, and can machines have it?

And then he goes on to say, and all this is in this book here, that this is not science fiction, but real science, based on a theoretical conception as deep as it is daring: namely, that we are, at root, computers ourselves. So, if we are at root computers ourselves, which means that we are at root machines ourselves, then the question 'can machines think' has been ((Refer Time: 16:01)) solved essentially, because yes, human beings can think, and therefore machines can think. The idea that we want to pursue is that thinking and computing are radically the same; that is the idea in his book, which is AI: The Very Idea; it is a very interesting book.

And those of you who are philosophically inclined should go and have a look at it. This idea, that thinking and computing are kind of tied up together, goes back much before Haugeland. And we will see, either in today's class or in the next class, that the British philosopher Thomas Hobbes was one of the first people to put forward this idea. Hobbes, of course, was not a computer scientist; in those days there was no computer science. He was a political scientist, and that kind of thing.

(Refer Slide Time: 16:54)

So, let us get to the fundamental questions, and this is the part where I want you to give answers, or tell me what you think about these questions. I have not written any answers for this; I have just written the questions, and I will write the answers on the board as and when they come out from the class. So, the question we want to ask is: what is intelligence? I mean, if there is ever going to be a debate about whether machines can be intelligent, or whether machines can think, first we should be clear as to what we mean by intelligent. If I write a program for, let us say, the singular value decomposition of a matrix, is that program intelligent? Well, I do not know.

So, can I have some responses from the class? What is intelligence? Let us forget about what thinking is for now. When would something be called intelligent? What would you require in a system, or in an agent, for you to call it intelligent? What are the fundamental characteristics of intelligent behavior?

Student: Ability to take decisions.

(Refer Slide Time: 18:24)

That is very generic; yes, definitely a part of intelligence, but maybe you could expand on that a little bit. For example, you know, you have a small program which says if something, then something else; it is also taking a decision by looking at some data; obviously, you are looking at something and taking a decision.

Student: Use of knowledge to respond to new situations.

Use of knowledge, of course; you will have to tell me what you mean by knowledge. And this definition has a little bit of inconsistency built into it, in the sense that most of the time when we use knowledge or experience, exploit experience, we use them in situations which are similar, which are not entirely new in that sense. Well, if by a new situation you mean a new problem, then one has to ask the question: what do you mean by that essentially? You know, there is the ((Refer Time: 19:38)) saying which says that you can never step into the same river twice; it is never the same thing.

But nevertheless, I am not disputing what you are saying; I am just trying to get people to respond more. We do, as human beings, make extensive use of knowledge, and we spend close to, what should I say, twenty two years, twenty five years, acquiring knowledge ((Refer Time: 20:05)) that we will later use in our lives. Humans are a very different kind of species, I think; I mean, we are the only species which has schools up to twelfth standard, and then college for four years after that, and then masters, and maybe, you know, a PhD in some cases. No other species spends so much time acquiring knowledge essentially.

Student: Sir, being able to make inductive inferences; not something which just follows from your input, but being able to make something new.

Ok.

Assumptions.

So, I will just use the term inductive inference, or in other words generalization: the ability to generalize. You go to some hotel and you eat masala dosa, and you are happy; you come back. Next time you go there and you have something else, let us say oottapam, and you come back; and then you generalize that this hotel gives you good food, or you might say that, you know, south Indian food is very good.

These kinds of inferences that we come to are inductive inferences: we look at a few instances of something, and from there we generalize that it holds for a certain class of things. I see a few leaves, and all of them are green; then I conclude that all leaves are green, which of course is not true, at least not all the time; maybe in Chennai, yes, when we have leaves, but not in the rest of the world.

Student: Basically, extending that definition: to generalize and classify.

Classify would come under making decisions. What else? Is that all that we do as human beings? Is that all we lay claim to for being intelligent?

Student: Choosing the best available ((Refer Time: 22:10))

Well, that comes here: choosing the best options.

Student: Ability to learn.

Ability to learn, yes, which is a little bit different from this; and we can say that by learn we mean acquire knowledge. One can learn from one's own experience: you do something which gives you a little bit of pain, maybe you touch a hot stove or something like that, two, three times, and then you learn; that is again inductive inference essentially. But we learn all kinds of things; we learn facts.

(Refer Slide Time: 23:19)

To learn relations between things is something that we do quite effectively. So, what do you mean by this? Communication.

Student: not

So, but there is a more fundamental thing here. I mean, expressing something well, incidentally, is feedback we get from all the companies which come to hire people here; they say that our students are not good at communication. But that is not the idea; even that is not what you are talking about, I think; it is the very fact that we can communicate something. So, let me go to the fundamental thing: what does this rely on, something which is specific to the human species?

Student: Speech.

Speech; and before speech, the use of language. Language is something which is unique to, at least we think it is unique to, our species. There are doubts; you know, maybe whales communicate over long distances, and dolphins can communicate, and that cannot be ruled out, but we are not quite sure. And we do see that there are other creatures which make sounds which are obviously aimed, or directed, at least towards their own species, but it is not clear to us what they are conveying actually.

So, it is the use of language which has enabled us to carry forward knowledge. If you have a brilliant scientist like Newton, who is thinking about the universe and the world around him, coming to conclusions, and arriving at some understanding of how the world operates, the fruit of his effort is available to us, and it is available to us only through the medium of language; because we can talk to other people, because we can write books. So printing, of course, was another invention which helped this process; but simply being able to communicate, to tell stories, this whole idea ((Refer Time: 25:52))

You know, stories are passed on from one person to the next, like all the stories that we hear in our subcontinent, the Ramayan, the Mahabharat and so on, which were sort of orally conveyed from generation to generation. And all that is possible entirely through the use of language. It is language which has allowed us to hold on to whatever knowledge we get from our interactions with the world, and pass it on to other people. Anything else one can think of? So, we will take this as a starting point, and then we will see whether machines can be intelligent.

(Refer Slide Time: 26:46)

So, let me move on a little bit and ask the next question. This is not a very complicated question; I just want to be sure that we are all on the same page, because I need to talk of machines thinking and so on. So, what do we mean by a machine? Otherwise we will be stuck trying to answer the question of whether machines can think without knowing what we mean by thinking, and without knowing what exactly we mean by machines. So, both these terms we should know: what do we mean by them essentially?

Student: A device which does a particular task repeatedly.

A device which does a particular task repeatedly.

Student: However...

I am not going to write this here. Is that a complete enough definition of a machine?

Student: A device that reduces human effort.

A device that reduces human effort; what about an exercise machine, a treadmill or something?

Student: Computations.

Something that does computation; but computation is only one kind of activity that we consider. We have machines which grind coffee beans for you; I do not know whether that is doing computation. More fundamentally, when will I call something a machine? That is what I mean by the question. So, if it is not a machine, what can it be?

Student: It follows the given instructions; you instruct it and it does the work for you. It does not think on its own.

He says it does not think on its own, which ((Refer Time: 28:31)) presupposes the answer to the question of whether machines can think: machines are things which cannot think on their own. Now, this bit about following instructions, I do not know; I mean, of course, at some stage in the life of a machine, there are instructions given to it. But if I have an air conditioner, like in this room, or a thermostat somewhere, it is not really following instructions.

Student: But ((Refer Time: 28:56)) some coding or something.

Yes, that is what I said: at some stage in its life, some instructions were given to it. But then I can say the same thing about you as a person, that you are following instructions; your parents said go and attend lectures, do not bunk classes, and that is why you are sitting here in this class ((Refer Time: 29:16)). More fundamentally, when would I call something a machine?

(Refer Slide Time: 29:24)

So, let me give a circular definition: something which acts mechanically. Of course, as ((Refer Time: 29:37)) it is a circular definition; the terms machine and mechanical are related to each other. So, it is not really a good definition in that sense, but it gives an idea of what I am trying to convey, because we can express this more easily: when do you say that something is acting mechanically? And I do not want the answer 'without thinking', because thinking is something which happens at a different level altogether, as we will see.

Basically, it is acting in a well-defined manner, according to certain rules: let us say the laws of physics, if it is a physical machine, or some mathematical laws, if it is a computing machine; something which operates according to a fixed set of rules. So, there is a question that one asks, and we will come to it in a moment; but just to be clear: is a computer a machine? It does operate according to some very well defined laws and so on.
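
As a toy illustration of 'operating according to a fixed set of rules', here is a minimal thermostat-style controller; the temperature thresholds are made up for this sketch. The same input and state always produce the same output, with no thinking involved.

    def thermostat(temp_c, heater_on):
        # Fixed rules: turn the heater on below 19 C, off above 21 C,
        # and otherwise keep the current state.
        if temp_c < 19.0:
            return True
        if temp_c > 21.0:
            return False
        return heater_on

    for t in (18.0, 20.0, 22.0):
        print(t, thermostat(t, heater_on=False))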

Of course, a computer is a very special kind of machine; it is a very flexible kind of machine. This is the whole idea of the stored program, which was discovered, well, not quite discovered, but at least brought forward by Charles Babbage: you can have the same machine, put in a different program, and it will do something different for you. That makes it a very flexible machine; but nevertheless it is a machine, because at the base there is something very repetitive going on.

And whenever we say machine in the rest of this course, we will basically mean a program on a computer. So, when we say 'can a machine think', it means: can we program a computer so that it appears to be thinking, or is thinking? This is a question that is fundamental in the sense that there has been a raging debate, and we will see some arguments against thinking machines in the next slide. For the last fifty, sixty years people have been talking about whether machines can think or not.

So, does anyone here have a strong opinion either way? When I say machine, I mean a computer program: can I program a computer so that it is a thinking machine; is that possible at all? We have tried to list some aspects of what we call intelligent behavior; or is there something missing, which we forgot to mention here, which the computer cannot do, can never do? Is there something like a halting problem ((Refer Time: 32:30)) situation here? So, does anyone have an opinion either way? Does anyone strongly feel that, yes, machines can think, and there is nothing fundamentally against it? Or does anyone hold the opinion that, no, machines cannot think, only we human beings can think?

Student: ((Refer Time: 32:53)) You did not tell us what thinking is.

Well, that is the first question I started asking you. So, we wrote all this stuff by saying that we are using this.

Student: Intelligence.

So, we sort of say that they are closely correlated; thinking is the process out of which intelligence arises, we might say. So, no one has a strong opinion, I take it. That is fine; there is nothing either way. And finally, as to what Haugeland thinks about this question of whether we are machines, his answer is already there: he thinks that we are machines. But is there anyone here who feels strongly about this, that yes, we are machines, or no, we are not machines, we are flesh and blood creatures of carbon, we are not made of silicon; any strong views?

So, supposing I were to put forward the idea that we are machines; what is the argument that you would give to say that, yes, we are also machines? One of the fundamental objections that people raise is, you know, machines versus whatever it is that is called free will. When I asked you a little while ago what you would be if you were not a machine, the answer that some people give is that you have your own free will. So, in some sense, a machine does not have any free will; a machine operates according to a fixed set of instructions and a fixed set of laws, and always obeys those instructions and laws.

Whereas free will is something we do not understand; we do not know whether we have free will or not. I mean, people claim that human beings have free will, but they all go and vote for some Congress or BJP all the time. So, anyway, what is this thing called free will? It basically says that we make choices; that we are the ones who decide how our lives will be, what we will do in the next instant, and things like that. You know, European philosophy, like existentialism, dealt quite a bit, in the post-war period, with this notion of free will and, you know, making choices ((Refer Time: 35:35))

So, if we are machines, then we would not have something called free will; or is that a contradiction? Or, if we are machines, do we, as some Indian thought says, believe that everything is pre-decided; as they say, whatever has to happen will happen. Of course, then we are all machines, and there is no second thought about it. But if I wanted to argue that we are machines, I could give you an argument which says: we grow out of a single cell, to start with, with instructions written in our genetic code about how to build our bodies, what color of eyes to have, all kinds of things.

And then essentially we build ourselves using this, and therefore we become human beings. Just like computers are flexible and can do different things at different times, we are also flexible, maybe a little bit more than current day computers; but in the end we are machines. Or I could give you an argument which says: see, our brain is made up of ten to a hundred billion neurons, and all of them operate by a very simple mechanical procedure. So, our brains are mechanical in nature, and therefore, since our brains control us, we are mechanical in nature; I could give arguments like this. So, what would you say against it, if you were to say anything against it?

Student: We have something called emotion; that is not in machines.

We have something called emotion; that is not in a machine.

Student: We are biased by our emotions.

But how do you know it is not in the machine?

Student: Suppose I turn off my computer.

Suppose your system crashes; can we say it is angry with you? It may not display it in other ways, I think. No, more seriously, why should we say that machines cannot have emotions? I will point you to a book; it is called The Emotion Machine, and it is written by a guy called Marvin Minsky, who was also one of the founders of AI. As we will see in the history of AI as we go along, he founded the MIT AI lab along with John McCarthy; and in the last five, six odd years he has written this book called The Emotion Machine.

So, it actually goes ((Refer Time: 38:36)) against the slightly larger divide; so, what do you mean by emotion and so on and so forth? I could try to characterize emotion by saying that you have memories, and then you have some value labels attached to the memories: some memories are good, some memories are bad. And then you have states which are attached to those value labels, so you are happy or you are sad. One could talk about things like that; but is it something which is exclusive to us? I do not know. And do creatures like dogs and cats have emotions?

Student: Yes

They have; but are they also intelligent? That is another question: is intelligence the prerogative of human beings only, or do we allow dogs and cats, and deer and monkeys, to be intelligent or not?

Student: Yes

But if you go down this ladder of life, so to speak, then you have dogs and cats, then you have mosquitoes somewhere here, then you have bacteria, then you have viruses. At which point do you stop? We are not here to answer this question; we are here to keep in mind that these questions have been asked by many people, and answering them is not our goal; you know, this is not a course on philosophy, but still we should be aware of it. So, here is a small cartoon I got from the web: if we were machines, then I suppose our admiration would be mutual, if you want to call it admiration.

(Refer Slide Time: 40:20)

So, let me give you some arguments which are well known in the literature. The question we are asking is: can a machine think? So, what are the objections? The first is from a guy called Hubert Dreyfus, who says that intelligence depends upon unconscious instincts that can never be captured in formal rules. You may not be able to read this; I did not know how to make it a bit stronger, darker; it is basically from a Wikipedia page on critiques of AI.

Dreyfus has made a career out of saying that AI is not possible; at least he has made a career out of it. What do you think about these unconscious instincts that can never be captured in formal rules? This is one of the arguments people make, these kinds of arguments which say that we often do not know what we are doing, or why we are doing something. I did this, but I do not know why I did it; but does that mean that I was doing something really mysterious, which cannot be reproduced in a machine? Let us take another argument, by the philosopher John Searle; it is called the Chinese Room argument. He asks: can an agent locked in a room, processing questions in Chinese based on a set of syntactic rules, be said to understand Chinese? It is a thought experiment which John Searle proposes, and a very famous argument; just look up the Chinese Room argument on the web and you will get all the descriptions.

So, the idea is that supposing you, as an English speaking person, or a Hindi or Tamil speaking person, are locked up in a room. The room is full of slips of paper which carry syntactic rules, which say: if you see this pattern, then send out this response; if you see that pattern, then send out that response. You do not know what any of it is about; you see some patterns, and you have been instructed to look up and match a pattern, and send out a response based on it. And somebody outside is slipping you pieces of paper below the door with some patterns on them; then you make some other patterns on slips of paper and send them back.
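
Just to make the mechanics of the room concrete, here is a minimal sketch; the rulebook entries are invented for illustration. The point is that whoever, or whatever, applies these rules needs no understanding of what the symbols mean; Searle's operational question is how large such a rulebook would have to be.

    # The patterns and replies are opaque tokens to whoever applies the rules.
    RULEBOOK = {
        "你好吗?": "我很好,谢谢。",    # "How are you?" -> "I am fine, thanks."
        "你会说中文吗?": "当然会。",   # "Do you speak Chinese?" -> "Of course."
    }

    def room(slip):
        # Match the incoming slip against the rulebook and send back the
        # prescribed response; otherwise fall back to a stock reply.
        return RULEBOOK.get(slip, "请再说一遍。")  # "Please say that again."

    print(room("你好吗?"))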

You do not know what is happening. What it turns out to be, apparently, at the end of this, is that somebody is asking questions in Chinese, and you are giving them answers in Chinese. So, John Searle, and this is the Chinese Room thought experiment, asks: supposing this were to happen, would you say that the person who is answering knows Chinese? He says no, because of the way the experiment has been described. The person's behavior looks like intelligent behavior, because he is giving you all the answers; but is it really intelligence? Searle says no.

And of course, there is a little bit of an operational trap there, which is what I have written here: how many rules would an agent need to have for the thought experiment to be convincing? We will see this idea again in a different form as we go along. One more objection comes from the celebrated mathematical physicist Roger Penrose; you must have heard about him. He wrote a book which became quite a hit; it was called The Emperor's New Mind.

(Refer Slide Time: 44:29)

You can see that the name is parodying 'the emperor's new clothes'. He is also asking this question of whether machines can think or not, and his answer is that, no, machines cannot think. We are the only thinking creatures, and he says that there is something happening in our brains which current day physics cannot understand, cannot explain. And that something, he says, is perhaps quantum mechanical; if you want to go into the details, you should look it up on the web and read his book, which is not so easy to read.

But still, he wrote a later book, I forget its name, which is a shorter version of this book. So, that is another argument. Then there are arguments based on, as we mentioned, emotion, intuition, consciousness, ethics. Some people say it would not be ethical to have intelligent machines, so they cannot be intelligent. Now, this is a kind of roundabout argument which says: it would be bad for, I do not know whom, so we cannot have intelligent machines. Of course, we are very ethical people, and we go around suspending twenty eight year old IAS officers because of some small grudges that we hold against them.

(Refer Slide Time: 46:18)

So, there are many arguments which have been put forward, and there have been many counters to them, which I will not talk about, because I want to get on to what Turing said. You all know Alan Turing; he was very instrumental in cracking codes during World War II. He would have been one hundred and one years old if he were alive today; last year was his birth centenary, and lots of things were going on. He said that the question of whether machines can think is just a meaningless question.

Because we are not able even to define it; we made an attempt here to say what thinking is, but it is not very clear, and questions like that are, of course, meaningless. So, what he did was to say: let us not get into this raging debate of whether a machine can think or not; I will give you a test, which is called the imitation game, which we will see in the next slide, and which is now known as the Turing test. It has nothing to do with Turing machines. About this Turing test, we will see in a moment.

(Refer Slide Time: 47:47)

Let us first see the test, and then come back. The Turing test is like this: there is a human judge sitting at a teletype; in current times it may be on a mobile phone, chatting with someone. So, you are chatting with someone; you type in something, and somebody else types something back, and so on and so forth. Turing imagines the teletype connected to a machine on the other side, but there is a wall in between, so you do not know whether it is a machine or a human being.

And what Turing said, and he gave a figure like seventy percent, was that if the machine can fool the judge often enough into thinking that the judge is talking to a human being, then the machine is intelligent. We will come back to the test again. So, what did Turing feel? He felt, and this was in 1950, when he wrote this paper called 'Computing Machinery and Intelligence', which is available on the web, if you go to many places you will just get the paper directly, that in about fifty years' time, which is the year 2000, it would be possible to program computers with a storage capacity of 10^9; so 10^9 was considered to be a big number, and history repeats itself with these kinds of examples.

Bill Gates apparently once said: who on earth will need more than 640K of memory? So, Turing said that with a capacity of 10^9 it would be possible to make computers play the imitation game, the game that we described, so well that an average interrogator will not have more than a 70 percent chance of making the right identification after five minutes of questioning. And he says: I believe that at the end of the century, that is, at the end of fifty years, the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.

It is very difficult to make predictions in these kinds of matters. David Levy said that no machine could beat him; Alan Turing said that machines would ((Refer Time: 50:05)) pass the Turing test; both were wrong, in the sense that we still cannot say that, you know, we have machines which pass the Turing test. Currently there is something called the Loebner Prize, which was instituted by Hugh Loebner, after whom it is named; it is an annual competition where programs are judged for human-like responses.

So, it is not fooling someone, as here, but being judged for human-like responses; and there is a grand prize of 100,000 dollars, in case anyone is interested in some pocket money; I mean, it is still open. So, there is a question which I want to ask; we do not have too much time, we will have to stop soon, and we will start with this in the next class: what do you think of the Turing test as a test of intelligence? Remember, Turing himself said: do not talk about thinking and all that, just accept this test.

(Refer Slide Time: 51:06)

Let me just show you some examples from this year's competition, which I got from the web. This year there are four finalists, who are going to compete on September 14; the finalists have been selected based on some earlier rounds, and this is the transcript from the leader of those four, the best one. So, let me just read it out: there is a judge, and there is this program called Izar. The judge says: hello, I am Ronan, that is the name of the judge, what is your name? The program responds; just try to imagine how you would write a program which would respond like this.

This is the output of this program, which is called Izar. It says: Ronan, it is about time you contacted me, I was waiting for you; my name is Izar, pronounced 'I-zar', but you can call me Izzy if you want; doing anything fun? Too much recursion in AIML; and so on and so forth. The judge says, at some later point, I have not given you the full transcript: I like bananas, which is your favorite fruit? It says: tell me more about your unknown. This is the trick which the program is employing to answer a question which it cannot answer very easily: you are not the only one. So, you have to build in these kinds of tricks; human beings also do that, if you are taking a viva on something you do not know: I do not have a problem with bananas; is that your favorite fruit, the obvious one, and so on. Then it says: I have been getting into hoomii, a type of Mongolian throat singing. So, you have to put in a certain amount of knowledge into your system, to be able to convince the listener; so it is trying to impress.

(Refer Slide Time: 52:55)

So, let me leave you with a program which was written in the 1960s. This program is called ELIZA; you might have heard about it. It was named after Eliza Doolittle, who was a character in Bernard Shaw's play Pygmalion, and we will visit Pygmalion again later. It was a very simple NLP program, written at MIT by Weizenbaum in 1966. It used simple rules to manipulate language: it would read what the user had written, manipulate it a little bit, and throw it back.

So, for example, if you say 'I like bananas', it will simply say 'Why do you like bananas?' It just twists that and sends it back to you. And there is a popular version called DOCTOR, which I am sure you might have seen; it runs a script which makes it look like a psychotherapist. That, of course, makes it easy to ask questions; one of the standard questions these programs ask is: tell me more about your family. You know, if they cannot say anything else, ((Refer Time: 53:58)) tell me more about your family; and as a human being, you would think this program is doing some deep analysis ((Refer Time: 54:05))
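
Here is a minimal sketch of that trick, in the spirit of ELIZA rather than Weizenbaum's actual script; the patterns, the pronoun table, and the fallback line are chosen just to mirror the examples above.

    import re

    # Swap first-person words for second-person ones before reflecting.
    PRONOUNS = {"i": "you", "my": "your", "am": "are", "me": "you"}

    def reflect(phrase):
        return " ".join(PRONOUNS.get(w, w) for w in phrase.lower().split())

    def respond(line):
        # Match a pattern, then twist the user's own words into a question.
        m = re.match(r"i like (.*)", line, re.IGNORECASE)
        if m:
            return "Why do you like {}?".format(reflect(m.group(1)))
        m = re.match(r"i am (.*)", line, re.IGNORECASE)
        if m:
            return "How long have you been {}?".format(reflect(m.group(1)))
        return "Tell me more about your family."  # the stock fallback

    print(respond("I like bananas"))  # -> Why do you like bananas?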

So, here is the Russian scientist who was visiting Stanford, where someone was running a version of this; we can just read it. I have colored these things to show you that, you know, it is just twisting the sentence back. Apparently, after this conversation started, the scientist began pouring out all his words to this program, and so on and so forth. And Weizenbaum found that his secretary was talking to this program all the time; and apparently she was quite furious when she found out that Weizenbaum had access to those conversations.

And nowadays, of course, you know, there is PRISM and everything. Weizenbaum actually found people's responses so disturbing that he wrote a book which says that, no, no, computers cannot do all this kind of thing. So, we are gullible; and I think we will take this up in the next class, with some even older examples of how we look at something and believe that it is doing something intelligent for us. Meanwhile, I would like you to think about this Turing test; in the next class on Wednesday, we will start discussing what we think about the Turing test.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras
Module - 01
Lecture - 02
Introduction to AI
(Refer Slide Time: 00:14)

Has no one heard about this creature? Kirobo is a small robot, 34 centimeters, it comes up to about your knee or something, built in Japan; needless to say, the Japanese are ahead of the rest of the world in robotics. And this robot was sent to space on Sunday; that is how it came into the news. If you were watching some news channel, maybe BBC or something else: the Japanese space agency has sent this robot into space.

It is a small robot which can recognize speech, understand what you are saying, talk back, and recognize faces, and so on. And the idea is that this robot will be a companion for a Japanese astronaut who is scheduled to go up in November sometime; and that is an interesting idea, robots as companions of people.

(Refer Slide Time: 01:36)

So, if you remember, I had mentioned this chess grandmaster David Levy. In 1968 he made a chess bet; we talked about this in the last class. He had said that no program could beat him for the next 10 years, and he luckily survived the bet; because, as we now know, chess playing programs are much better. But in 2008 he is talking about robot companions.

So, he has come from one end of the pendulum, where he believed that a chess program, a computer program, could not do anything interesting, to the other end, where he believes that robots can be companions to human beings. So, he wrote this book, published in 2008, and the title of the book is "Love and Sex with Robots".

And the idea behind the book is something which many parts of the world are now looking at; especially those parts of the world which have aging populations, where they do not have enough young people to take care of the old. Of course, he is not talking only of the old here, but of robots that could take care of people. So, robots can be companions and so on. That is the book he wrote in 2008. And, for example, what looks like a young lady here is in fact a robot.

So, this idea of creating robots in our own image has been around for a long time, and we will look at some of this history today; and robotic companions could well be there in the future. We saw some of this in the last class, so I will just quickly go over it. This is the syllabus, which will be made available somewhere, and the textbooks. And these are the 2 books that we will be following in the next couple of lectures – "AI: The Very Idea" by Haugeland and "Machines Who Think" by Pamela McCorduck.

We saw some definitions of AI. There were 4 things here. One is that machines are intelligent if they do things for which human beings would be considered intelligent. Another definition is that AI is the enterprise of solving hard problems in polynomial time; and we must, of course, qualify that by saying that these are approximate solutions, or that they are not necessarily optimal solutions.

Then there is AI as the study of mental faculties by creating computational models; that is the idea given by Charniak and McDermott. But the definition that we like most is given by Haugeland, and it says that AI is interested in the idea of machines with minds of their own. This is the idea that we will pursue in the next couple of lectures.

(Refer Slide Time: 05:42)

We asked some fundamental questions in the last class: what is intelligence? What is thinking? And we got several responses on what we think intelligent behavior is: problem solving, reasoning, learning, perception and language; language was mentioned in the last class.

(Refer Slide Time: 06:30)

So, let me ask a question here. Language is something which is unique to human beings, and many people believe that it is instrumental in intelligent behavior. But the question that I want to ask is: if you look at language and thought, which came first? Were we thinking because of language, or did language come about because of thinking? In other words, is the ability to think dependent upon the skill of language that we have, or did language come because we are able to think?

So, let me see what people think here. And again, I will emphasize that there is no correct answer to this question, in the sense that it is like the chicken and egg problem. But what do the students think? And I hope the students from Monday will also join in with their opinions. Can we think without language? Let me ask it this way. Or is thinking closely tied to language? You are saying no, we can think without language. So, can you, sort of, justify or support your answer?

Student: We have abstract thoughts; those are not language dependent or something.

So, abstract thoughts are not language dependent; now, that is a somewhat debatable claim. The question I am really asking is: are our thoughts made up of language? Do all our thoughts imply language? Can we think without taking recourse to language? When we say language, we really mean symbols, because language is just one kind of symbol system. Is it possible to think otherwise? Now, you said graphics, or visual images; yeah, that is the thing which comes to mind: if you recall visual images, then you are not really talking about words or things like that.

Interestingly, you have heard of Chomsky, right? Yeah. So, is there anyone who has not heard of Noam Chomsky? What is he doing nowadays? He visited India a few years ago as well. He has actually become a political activist.

(Refer Slide Time: 09:38)

But many years ago, when he was active in linguistics, he put forward the idea of universal grammar; UG, it is called. He said that human beings are born with grammar in their heads; whatever 'in the heads' means, we will not explore that question; but essentially our brains come prewired with the faculty of linguistic ability, which is, sort of, some kind of grammar.

And what he says is that depending upon which place, which society you grow up in, you tune that grammar to the particular language that exists in that society. So, Chomsky, of course, is saying that language came first; that we are born with the ability to use language. And maybe that helped us, of course he is not saying that, but maybe that helped us in our ability to think. But anyway, that is an open question; maybe at a later point I will come back to it.

(Refer Slide Time: 10:55)

We also asked the question: what is a machine? Is the computer a machine? And we said that, yes, we will assume that; whenever we talk of machines thinking, we will be talking about computer programs running. And, of course, we asked the question: are we machines? That is something you can ponder over, and we saw some reactions to it.

(Refer Slide Time: 11:18)

Historically, there have been arguments against thinking machines. We had discussed 3 arguments. Dreyfus says that there is something intuitive going on in our heads, some kind of intuition which we cannot define in terms of rules. So, while all these people were talking about rules as a mechanism for reasoning, Dreyfus said that there are certain kinds of unconscious instincts that we have which cannot be captured in rules.

John Searle, a philosopher, used the Chinese Room argument; and his argument was that just because one can manipulate symbols and convince somebody that one is doing something, that does not imply understanding. Like, for example, children doing long division: do they understand what they are really doing? Or when even younger children do addition: they add 2 numbers by looking up a table, do a carryover, then add, and so on and so forth. They are doing simple manipulations. Do they understand what is behind that activity?

Or older students, should I say, when they are dealing with things like Fourier transforms and so on and so forth: are you doing it mechanically, having mugged up a formula of how to integrate something, or do you understand what is happening behind it? So, what Searle says is that symbol manipulation, the ability to manipulate symbols, is not necessarily a guarantee that you are intelligent; maybe you are just following some rules which somebody has taught you: this is how you add numbers, and so on and so forth.

And Penrose, the celebrated scientist, says that there is something quantum mechanical going on in our brains. So, there are other arguments, based on emotion, intuition, consciousness, ethics, and so on, which we will ignore.

(Refer Slide Time: 13:44)

Then Alan Turing said that the question of whether machines can think is meaningless. Let us not try to answer it, because first you have to answer what thinking is, and only then can you say whether machines can think or not. He said: let me prescribe a test, which is called the imitation game, and which we now call the Turing test.

He prescribed it in the paper mentioned on this slide, 'Computing Machinery and
Intelligence', which appeared in 1950; it is available at the link given on the page. He
believed that 50 years on, which is in 2000, machines would be able to pass his so-called
Turing test.

(Refer Slide Time: 14:03)

And, what is the test? This is where we stopped in the last class. The test is that there is a
human judge sitting out there, interacting over some medium; nowadays it could be a
mobile phone where you are chatting with someone; in those days it was a teletype
connected to another room in which the other party was responding. What Turing said
was that if the human judge can confidently discriminate whether the other side is a
human or a computer, then the computer has failed the test. But if the computer can,
most of the time, convince the judge that the judge is talking to a human, then the
computer has passed the Turing test.

So, this is what is known as the Turing test of intelligence, you might say: to test whether
a system is intelligent, you pass it through the Turing test and then decide whether it is
intelligent. You do not ask what you mean by thinking, or what intelligence is; you do
not go into fundamentals. And, as I said, there is a Loebner prize currently on offer:
100,000 US dollars to anybody who can pass the test, so to speak.

So, the question I left the class with was: what do you think of the Turing test as a test of
intelligence? Do you have any views on this? Is it a good test? Is it a bad test? Do you
agree that if a computer passes the test it will be considered intelligent; does it qualify to
be called intelligent? Any thoughts on this? While you are thinking, meanwhile let me
address the students who have just joined.

So, welcome again. The basic idea of the first few lectures is that they are going to be
a study of the history and philosophy behind AI: what has happened in the last few
hundred years which has led to the development of AI as it is currently. After those 2 or
3 lectures we will have a qualitative shift, and we will spend most of the time studying
algorithms, following the syllabus which I will convey to you. So, is it a good test or a
bad test? Surely you can have some opinion. Yes.

Student: I mean, you cannot judge the intelligence, because such a program can be
prepared on past data, like the type of questions asked in earlier rounds, and respond
based on that data.

Yes, that is precisely what happens. The Loebner prize is a contest which takes place
every year, and as I said, this year the final is going to take place on September 14. One
of the leading programs is called Izar, and this is a conversation transcript from the
earlier competition rounds. So, you have seen it, and you can look at it again.

So, obviously, people who write such programs look at the history, just like students who
write exams look at the history of past questions. That is all allowed; everything is
allowed. Can you write a computer program which will fool the judge, if you want to use
that term, into thinking that it is talking to a human being?

(Refer Slide Time: 17:57)

And, this was the conversation that we saw Izar have. When talking about music, in the
last couple of lines you will see it making statements like: I have been getting into
Hoomii, a type of Mongolian throat singing. What kind of music do you like? So,
obviously, such a program will have to be equipped with general knowledge, at least
that which everybody knows; no human would be without it. So, you have to have that
kind of knowledge.

Of course, you also have to have some kind of rhetorical skill, and ways of getting
around questions and things like that; all that is part of it. But what if I were to give it
two 12-digit numbers to multiply? Say I give two 12-digit numbers and ask: what is
their product? And the computer, poor thing, being a computer, gives me the answer
before I have even finished the question. Would I not be able to say: no, no, you are not
a human being? So, I will again leave it to you to think about this; is it a good test or a
bad test?

(Refer Slide Time: 19:28)

But, you did mention Eliza, a program written in 1966 by Weizenbaum, when computers
had only just come into place. It is a program that does simple manipulation of rules: it
takes the input, does a little bit of twisting, turns it around, and puts it back to the user.
To some people, one version of it, called Doctor, sounded like a psychotherapist, and
they would start talking to the program as if they were talking to a therapist.

There was a Russian scientist visiting Stanford who actually went through this
conversation, and we saw that in the last class. What you see in purple is how this
program manipulates your input to generate its own output. So, something like 'I am
feeling a bit tired' gets the standard response 'why do you think you are feeling a bit
tired', along with stock questions like 'tell me about your family', and so on.
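To see how little machinery this needs, here is a minimal Eliza-style sketch in Python;
the rules are my own illustrative assumptions, not Weizenbaum's actual ones.

import re

# Each rule pairs a pattern with a response template that reuses part of the
# user's input; the last rule is a catch-all fallback.
RULES = [
    (r"i am feeling (.*)", "Why do you think you are feeling {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"my (mother|father|family).*", "Tell me more about your {0}."),
    (r".*", "Please go on."),
]

def respond(user_input: str) -> str:
    text = user_input.lower().strip(".!? ")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            return template.format(*match.groups())

print(respond("I am feeling a bit tired"))
# prints: Why do you think you are feeling a bit tired?

A handful of such twist-and-return rules is all it takes to sound, superficially, like a
therapist.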

Weizenbaum did not like the way people responded to Eliza. It is a very simple program;
there is nothing deeply sophisticated about it. But people used to interact with it as if
they were interacting with somebody who understood the complexities of their problems
and things like that. So, he wrote the book "Computer Power and Human Reason: From
Judgment to Calculation".

He wanted to say, in essence, that AI is not possible; that computers can never be as deep
thinkers as a therapist can be. So, there is a difference between what is and what appears
to be. And human beings have a tendency to suspend their disbelief: we are willing to
watch a James Bond movie and believe that all that is happening is possible, and all
kinds of things like that.

(Refer Slide Time: 21:26)

So, the fact that a man-made artifact could respond to human input easily leads humans
to make a leap of faith and conclude that it responds intelligently and knowledgeably.
Throughout the centuries we have been doing that. In olden times, in Egypt, people
believed that statues which moved and gestured had a sort of soul, and that they could
represent a god or a dead person and communicate through a priest.

So, I said olden times in Egypt, but even today you can find this sort of thing happening
in our country. You have people who will read tea leaves, or people who communicate
with your ancestors, or people who go and get their fortunes foretold by a parrot which
pulls a card out of a bunch of cards. So, we do it all the time. And we believe; well, not
everybody, but we mostly believe that this is possible. Such practices continue to this
day.

And, in Europe, there was a great fascination for such moving figures, moving automata;
statues which could move around, shake their heads, and so on. Pamela McCorduck
writes in her book "Machines Who Think" that in medieval times the art of making
clocks decorated with animated figures was very popular. If you go to Germany, you can
still find them; for example, in clock towers, when it is 12 noon, suddenly there is a lot
of music and some statues come out, do something, and go back in; that kind of stuff.

So, it was common in medieval times that learned men kept robots. By learned men:
society was not as egalitarian then as it is now; there were the kings, the peasants, the
learned men, the traders, and the warriors. There were classes of people, and among
those, the learned men kept robots.

And, most interestingly, to most people there could be little difference between, on the
one hand, a human figure that nodded, bowed, marched, or struck a gong at a precise
and predictable moment, which is entirely feasible (you can construct machinery which
is accurate, and we know that such machinery exists), and, on the other, a human figure
that answered knotty questions and foretold the future.

So, for us there is no difference. If you can construct a statue which can nod its head,
and we ask it a question and it nods its head, we are willing to say: yes, it understands
what I am saying and it is telling my future; that sort of thing.

So, in the study of history that we are going to be doing, there are going to be two
strands: one is this mechanical side of talking statues, moving statues, and things like
that; and the other is going to be the philosophical side, which is about the notion of the
mind, and how the notion of the mind came about. Those questions we will come to a
little bit later. Let us first address the mechanical side of things.

(Refer Slide Time: 24:54)

So, all this is happening in Europe; we have this question of artificial people. In Homer's
"Iliad", Hephaestus is supposed to have created Talos, a man of bronze who would
patrol the beaches of Crete. Hephaestus is also supposed to have created Pandora, who
was commissioned by Zeus, a god, to punish mankind for accepting Prometheus's gift of
fire. Pandora is supposed to carry a casket, but she is so curious about it that she opens it
and lets loose the evils into this world.

Pygmalion: remember the play Pygmalion, in which there was a character called Eliza,
which is where the name of Weizenbaum's program came from. Pygmalion was also a
mythical figure who, disappointed by real women, created Galatea in ivory. The Greeks
had many gods, like we Indians have gods for different kinds of things, and Aphrodite,
one of them, obliged him by breathing life into Galatea; apparently, he fell in love with
his own creation, like in the play.

Then Daedalus, whom you must have heard about, is better known for his artificial
wings; he wanted to fly. But he was also credited with creating lifelike statues that
wheezed and blinked and scuttled about, impressing everyone. So, this is the important
thing: these statues seemed to be autonomous, and if you are autonomous, you must be
intelligent. That is the leap of faith that we keep making.

Then, about 1000 years ago, Pope Sylvester II is said to have made a statue with a
talking head, with a limited vocabulary and a penchant for predicting the future. That is
why people were willing to believe that this talking head could tell your future; on being
asked a query, it would reply yes or no by shaking its head. But all that is myth,
essentially. There is some more mythology.

(Refer Slide Time: 27:18)

Paracelsus, a physician who lived from 1493 to 1541 in Switzerland, is supposed to have
created a little man called a homunculus. He made this statement: "We shall be like gods
... we shall duplicate God's greatest miracle - the creation of man". Because in western
thought we have been created in the image of God himself, and so we can be like him
and create creatures in our own image.

And, Judah Loew ben Bezalel is reported to have sculpted a living man from clay, whom
he called the Golem, to defend the Jews of Prague. In Jewish folklore, a Golem is an
animated anthropomorphic creature made out of inanimate matter. So, that is the kind of
image, the kind of creature, he is supposed to have created. All this material is available
on Wikipedia, and I have given the references from where I have taken the images.

(Refer Slide Time: 28:30)

So, let us talk about real mechanisms; some of the ones above are mythical, of course.
We cannot imagine a man made of clay which could do all these sorts of things. But in
parallel, real machinery was being created. Some of these ideas came from the east, via
the Arabian countries.

And, in 802, Haroun-al-Rashid, whose name we have heard in other contexts as well, is
said to have presented Emperor Charlemagne with an elaborate clock which sent out a
dozen cavaliers from a dozen windows and back again. This is the kind of clockwork
which you can still see if you go to Europe; in a town or city center, this sort of
machinery is still operating.

Then, a group of Arab astrologers is credited with constructing what they called a
thinking machine, the zairja. It was a collection of rotating discs with markings on them;
if you rotated the discs according to some input information, you would compute
something.

But their notion was to generate ideas by mechanical means, with the help of a technique
of breaking down called al-jabr which, as some of you know, is the root of the word
algebra. By combining number values associated with letters and categories, new paths
of insight and thought would be created. So, this fascination with autonomous entities,
autonomous machines which are thinking machines, goes back a long time.

So, this zairja caught the imagination of a Spanish Catalonian missionary called Ramon
Lull, who decided to design a Christian version of it, which he called the Ars Magna. He
said the goal was to bring reason to bear on all subjects, and in this way arrive at the
truth without the trouble of thinking or fact finding. One thing: when you look at
quotations from these times, you must remember that the meanings of some of the words
are a little bit different from what they are now.

Those of you who have read Shakespeare, for example, would know that Shakespeare's
English is a little bit different from our English, and we need to keep that in mind. But
this notion of arriving at truth without the trouble of thinking or fact finding has, of
course, been fulfilled now with programs like Google and so on; you just type something
and you get an answer.

(Refer Slide Time: 31:22)

So, by the middle of the 14th century, large clocks with figures had become popular in
many areas of Germany and Italy, and talking brass heads became closely associated
with learned men again. The Archbishop of Salzburg built a working model of a
complete miniature town, driven by water power from a nearby stream.

(Refer Slide Time: 31:53)

So, one or two more examples. Vaucanson's duck: note the years; he made this around
1730 or so, which is quite a long time ago. Vaucanson was a French inventor. He is
credited with having made an android which could serve dinner and clear tables for
visiting politicians. However, one government official declared that he thought
Vaucanson's tendencies profane, and ordered his workshop to be destroyed.

We will see later that this kind of political and religious oversight has influenced
European thought quite a bit; for example, Copernicus and Galileo and all these people
were rather worried about putting forward their ideas about what the world is really like.
Vaucanson's most famous creation is the mechanical duck, which could appear to be
drinking, eating, quacking, splashing about in water, and digesting its food; it became
very famous in 1739.

There is an image of a replica of this duck, which is lying in some museum somewhere.
Though, of course, the actual duck that he created did not really have digestive abilities:
the food that the duck was supposed to be eating was collected in one container, and the
output was pre-stored and sent out from another container. But he was hopeful that a
truly digesting automaton could one day be designed. So, it is this fascination with
machinery that we are trying to look at.

(Refer Slide Time: 33:52)

Another very famous example is the chess playing Turk by Kempelen. Wolfgang von
Kempelen, 1734-1804, created a chess playing machine known as the Mechanical Turk,
constructed in 1770 to impress the Empress Maria Theresa of Austria. The mechanism
appeared to be able to play a strong game of chess against human opponents, as well as
perform the knight's tour. You must be familiar with the knight's tour on the 64 squares
of the chessboard: can you move a knight so as to cover all the squares exactly once?

And, the figure at the bottom is a knight's tour apparently created by the Mechanical
Turk; it is quite an aesthetic figure to make. Actually, you might have written a program
to create a knight's tour at some point; a small sketch follows below. Now, this was the
automaton which Kempelen took all over Europe; he impressed Napoleon and other
people, beating his general Hectaus.
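A simple way to generate a knight's tour is backtracking search. Here is a minimal
sketch (my illustration; a 5x5 board keeps the naive search fast, while larger boards
usually call for heuristics such as Warnsdorff's rule):

N = 5
MOVES = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]

def tour(row, col, step, board):
    # Place the knight here as the step-th square of the tour.
    board[row][col] = step
    if step == N * N:                    # every square visited exactly once
        return True
    for dr, dc in MOVES:
        r, c = row + dr, col + dc
        if 0 <= r < N and 0 <= c < N and board[r][c] == 0:
            if tour(r, c, step + 1, board):
                return True
    board[row][col] = 0                  # dead end: undo and backtrack
    return False

board = [[0] * N for _ in range(N)]
if tour(0, 0, 1, board):
    for row in board:
        print(row)                       # the order in which squares are visited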

And, here is the picture of the automaton; you can see it, though maybe I should make it
a bit larger. If you look carefully, you can see that inside this box a human chess player
was sitting. So, it was really a hoax, but it was not discovered for a long time. It travelled
for nearly 84 years through Europe and America, beating all kinds of luminaries at chess.

It seems Edgar Allan Poe wrote an essay trying to expose that this chess player could not
be a real machine. So, let us move on to more useful things: mechanical arithmetic. Can
we make machines which will do arithmetic for us? Pascal, of course, you are familiar
with; as students of science you have come across Pascal's name in various places, not
least as the name of a programming language.

So, he invented a mechanical calculator using something called a lantern gear, which we
will not go into. He tried out 50 different prototypes before presenting his machine to
the public in 1645. It was called the Pascaline, or the arithmetic machine, or Pascal's
calculator. It could add and subtract two numbers; that was the limit of its mental
abilities; and it could multiply and divide by repetition.

There is an image of the Pascaline from one of the museums. He received the rights to
produce this machine and sell it in France, but it was not commercially viable for him.
As this article says, the cost and complexity of producing the Pascaline was too much,
and production ceased within a year. Still, this was a real machine which could add,
subtract, and multiply two numbers. So, you can see, we are already looking at the
advent of calculating machines.

(Refer Slide Time: 37:16)

So, all these are famous names; you must have encountered them. For example, Leibniz
we know as an inventor of the calculus; people were multifaceted, doing all kinds of
things, in those days. He was a German philosopher and mathematician. He started to
work on his own calculator after Pascal's death, and invented something called the
Leibniz wheel, or stepped drum, which could do counting in some sense.

Here is the image from the Wikipedia site. You can see there is a gear, in red, fixed to a
rod, and a stepped set of teeth on the orange cylinder. Depending upon where the gear is
positioned, it would be rotated some number of times, determined by how many of those
teeth it encountered. When it was at the lower end it would count 0 or 1; if it was moved
to the extreme right, it would be struck by, let us say, 9 or 10 teeth, and it would count
10. So, it is just a small counting device that he invented.

All these machines in those times were mechanical in nature. This device is called a
stepped drum, and as you can see from this quote, it was used for three centuries, until
the advent of the electronic calculator, which came only much later, in fact in the last
century. Using it, Leibniz built a machine called the stepped reckoner; the name comes
from the kind of drum it uses, the stepped drum.

It was a digital mechanical calculator, invented by Leibniz around 1672 and completed
in 1694. It could perform multiplication by repeated addition, and division by repeated
subtraction, and it could operate on 8-digit numbers: if you multiplied two 8-digit
numbers, it would give you a 16-digit answer. That was its precision. And here is an
image of the stepped reckoner from one of the museums.
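The reckoner's arithmetic is easy to sketch. Assuming only an add/subtract mechanism,
as described above, multiplication and division reduce to loops (a minimal sketch, not a
model of the actual gearing):

def multiply(a: int, b: int) -> int:
    result = 0
    for _ in range(b):          # add a into the accumulator, b times
        result += a
    return result

def divide(a: int, b: int):
    quotient = 0
    while a >= b:               # subtract b until less than b remains
        a -= b
        quotient += 1
    return quotient, a          # quotient and remainder

print(multiply(12345678, 8))    # 98765424
print(divide(100, 7))           # (14, 2)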

(Refer Slide Time: 39:35)

Leibniz, of course, was, as I said, a multifaceted person. He believed that much of human
reasoning could be reduced to calculations of a sort. We are moving towards AI in some
sense: can we have machines, automata, which think and can do reasoning, including
things like arithmetic? But he is going beyond arithmetic: he is saying that human
reasoning could be reduced to calculation of a sort, and that such calculations could
resolve many differences of opinion.

So, here is a quote from Leibniz (and mind, this is not modern language; it is slightly old
language). He says: "the only way to rectify our reasonings is to make them as tangible
as those of the Mathematicians, so that we can find our error at a glance, and when there
are disputes among persons, we can simply say: Let us calculate, without further ado, to
see who is right".

The motivation behind devising logic is the same, and we will see that logic also evolved
from a similar background. This idea that arguments can be settled by calculation, so
that you do not have to fight, also goes back to those times. Of course, if we told this to
the UP government, they would not believe it. So, these are the general principles of his
logic and, in fact, of his whole philosophy.

They are, first, that all our ideas are compounded from a very small number of simple
ideas, which form the alphabet of human thought; a very significant step he is making.
He says that everything we think about is essentially a combination of some small set of
simple things. Remember, in those days physics, or science, had still not discovered the
notion of an atom; all those ideas came much later.

And Leibniz is talking about the fact that there is a small number of simple ideas which
combine to form complex ideas, new ideas. This is an idea which carries forward to
present-day knowledge representation as well: there have been approaches to knowledge
representation which say that we will work with a small number of primitive concepts
and derive all other concepts from those. So, it is a very significant idea: the alphabet of
human thought.

And secondly, that complex ideas proceed from these simple ideas by a uniform and
symmetrical combination, analogous to arithmetical multiplication. What does he mean
by uniform and symmetrical combination? That there is a well defined, mechanical way
of doing things: just as you have algorithms for adding, or for doing long division or
multiplication, you have algorithms for combining smaller ideas into more complex
ones. So, there are two things he is saying: one, that all ideas are made up of a finite set
of simple ideas; and two, that there is a uniform way of combining ideas to form more
complex ideas. And remember, this was in the 17th century.

(Refer Slide Time: 43:25)

The first commercial success of a mechanical calculator came in 1820. Thomas de
Colmar from France built this machine, and it was strong and steady enough to stand
daily use in an office environment. These are some images of those real machines, and
as you can see in the second paragraph, they were manufactured from 1851 to 1915. So,
not even 100 years ago you might have found people using such a machine to do
arithmetic.

The last line says that its sturdy design made it a key player in the move from human
computers to calculating machines that took place during the second half of the 19th
century. I want to draw your attention to the phrase 'human computers'. Before our
digital computers, or the mechanical computers that we will see in a moment, were
invented, the word computer was essentially applied to human beings: human beings
computed things, and they were called computers. This is a term that you will see again
later.

(Refer Slide Time: 44:48)

That brings us to the first computer. You are all familiar with Charles Babbage, I
presume; born in the 18th century, he was a mathematician, philosopher, inventor, and
mechanical engineer, best remembered for his idea of the programmable computer. All
the machines that we have seen so far were not programmable; they could do arithmetic,
for example, and that was about it.

Charles Babbage took this to the next step. As a child he was fascinated by the same
automata, the kinds of talking heads and moving figures that we talked about. Here is a
quote of what he said about some statues that he saw displayed by a man called Merlin.

He described them thus: there were two feminine figures; one walked, used an eye-glass
occasionally, and bowed frequently; her motions were singularly graceful. The other, he
says, was a dancer, full of imagination (I do not know how he got that) and irresistible.
So, this was the general fascination with talking, moving figures that was pushing all
these ideas forward.

So, in 1822 he began building what is called the difference engine; you must have heard
the term. It was designed to compute polynomial functions, going beyond mere addition
and multiplication. The first engine was composed of 25,000 parts, weighed about
13,600 kilograms, and was 8 feet tall.
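What a difference engine mechanizes is the method of finite differences: for a
polynomial of degree n, the n-th differences are constant, so once the initial differences
are set up, every further value of the polynomial is produced by additions alone. A
minimal sketch:

def tabulate(initial_differences, count):
    # diffs[0] is p(0); diffs[i] is the i-th difference at the start.
    diffs = list(initial_differences)
    values = []
    for _ in range(count):
        values.append(diffs[0])
        # Add each difference into the one above it; the last stays constant.
        for i in range(len(diffs) - 1):
            diffs[i] += diffs[i + 1]
    return values

# For p(x) = x^2 + x + 1: p(0) = 1, first difference p(1) - p(0) = 2,
# and the second difference is the constant 2.
print(tabulate([1, 2, 2], 8))   # [1, 3, 7, 13, 21, 31, 43, 57]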

(Refer Slide Time: 46:29).

Here is an image of a difference engine. On the bottom left you can see one which is in
the London Science Museum, and on its right a small enlargement of the gear system it
used. On the top is a part of a difference engine assembled after Babbage's death by his
son, using parts found in his lab; that is an original Babbage engine, whereas the one
below is a modern recreation of the difference engine. These are working machines.

(Refer Slide Time: 47:09)

One place where Babbage got some ideas was from the so-called Jacquard looms. Why
is the computer a different machine from the rest of the machines? Because it is
programmable. And how is it programmable? Because we have this idea of a stored
program: we can plug in a program and run that program, then plug in a different
program and run that one. So, the stored program is something which can be input.

The idea came from the Jacquard looms: the kind of punched cards that you see here on
the left, an enlargement of the figure on the right, which were used to create designs in
fabric. These looms were used to create patterned fabrics, and the pattern could be
controlled by the punched cards. This idea of punched cards is what Babbage took from
there.

(Refer Slide Time: 48:00)

And he designed what is known as the analytical engine. It was a proposed mechanical
general purpose computer (he did not actually build it), first described in 1837. It had an
arithmetic logic unit and control flow, all modern terms in computer science, in the form
of conditional branching and loops, and integrated memory. It was the first machine
which, in the modern sense, could be said to be Turing-complete, which means it is
equivalent to a Turing machine.

(Refer Slide Time: 48:37)

The first programmer was his collaborator, Augusta Ada King, a daughter of the poet
Lord Byron. Now known as Ada Lovelace, she was an English mathematician and
writer, chiefly known for the work she did along with Babbage. Her notes include what
can be called the first algorithm intended to be processed by a machine, and because of
this she is often called the first programmer. The programming language Ada, which
you might have heard about, promoted by the US Department of Defense, is named after
her.

(Refer Slide Time: 49:30)

So, she went beyond number crunching; she realized that the potential of the device
extended far beyond it. She wrote, and this is the quotation, that the analytical engine
"might act upon other things besides number, were objects found whose mutual
fundamental relations could be expressed by those of the abstract science of operations",
and so on.

And then, in particular, she talks about generating music. She says that supposing the
fundamental relations of pitched sounds in the science of harmony and of musical
composition were susceptible of such expression, the engine might compose elaborate
and scientific pieces of music of any degree of complexity.

So, even then, when Babbage was only designing this analytical engine, which was
never built, she was imagining that such a machine could compose music; and nowadays
we know it can be done.

(Refer Slide Time: 50:29)

And finally, in this study of mechanical history, we look at the first electronic machine
that was built, called ENIAC. You must be familiar with it. You can just see the size of
ENIAC: 17,000 vacuum tubes, 7,200 crystal diodes, and so on and so forth; 27 tons; it
occupied a full room. It was not even as powerful as the small computer that you have in
a smartphone nowadays. But that was the first electronic machine.

(Refer Slide Time: 51:00)

So, with that we come to the end of the first part, which is the mechanical side of things.
We will take a small break and come back with the second question, which is the notion
of mind in philosophy: how did the notion of mind come about in philosophy? So, we
will stop for a while and then start again in about 3 or 4 minutes.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Module - 01
Lecture - 03
AI Introduction Philosophy

We are back. Having looked at the mechanical side of this history of AI, we now want to
look at the philosophical side: this notion of the mind. What is the notion of the mind?
How did this notion of the mind ever come about? And how can we get it across to the
machines?

(Refer Slide Time: 00:37)

So, the medieval world view in Europe was basically a Christian adaptation of Greek
ideas. There was a big gap between the Greeks and medieval Europe, but the ideas that
the Greeks started off with eventually ended up in medieval Europe. It was, of course, a
very anthropocentric view of the world: humans were at the center of the universe, and
everything revolved around the earth. We will look at some of those ideas. What we are
trying to see is: how did human beings, as sentient beings, come up with the notion of
the mind? How would you even imagine that there is something called a mind? Of
course, you are there in the world; you are immersed in the world, and you are
interacting with the world. But how do you come to the conclusion that you have
thoughts and ideas which are, in some sense, existing independently?

We start with the Platonic view of the world, the idea of which came from Plato. He said
that in the perfect world there are the creator's ideas, God's ideas. Our ideas, the humans'
ideas, are derived from God's ideas, and the world itself is derived from God's ideas. The
world was a corruptible materialization of God's ideas; that is why things were not
perfect in the world. Even though God's ideas are perfect, the world is not perfect.
Likewise, our thoughts are true to the extent that they are accurate copies of God's ideas.
That was the first starting point: Plato.

When we move on to Aristotle, he did away with the idea of God and said that the world
is out there, and human ideas are human ideas, and they in some way correspond to the
world. So, our thoughts resemble the objects that they stand for: if I am thinking of an
apple, then my thought of an apple resembles the apple. This is known as the
correspondence theory of truth. One branch of philosophy took this up: Ludwig
Wittgenstein, in his early work, postulated something called the picture theory of
language; that behind every word there is a picture, an image, sitting out there. So, this
is how we saw the world: the earth was flat, at the center of the universe, with the Gods
and the heavens rotating around it. The sensible world, the world that we could sense,
was composed of five elements.

(Refer Slide Time: 03:43)

Quintessence was constant in the heavens; it did not change. The other four are the
elements fire, air, water and earth, listed in increasing order of weight. You can find
similar ideas all over the world about what the basic elements of the world are. If you
ignore quintessence, which is up in the heavens, we have fire, earth, water and air, all
jumbled up on the earth, striving to find their rightful place; rightful place meaning by
order of weight: earth should be the lowest, then water, then air, and then fire. Different
materials had different amounts of these four elements in them, and that is why they
behaved differently. For example, wood had more water, and some air, and therefore it
floated on water; whereas iron had more earth, and therefore it sank in water. So, they
could explain why wood floated whereas iron sank, and so on. If wood were to catch
fire, then it tries to escape into the air. So, there were these kinds of explanations about
the world.

(Refer Slide Time: 05:14)

The color inside this circle is supposed to represent the color of the sky: we have
morning, daytime, evening and night. It depends upon the position of the sun; as the sun
goes around, our day changes.

(Refer Slide Time: 05:38)

So, this is a small animation I created of what we thought the world to be like. This is
how the world was: all the heavens were rotating around the earth, and the earth was the
center of the universe. Now, in those days astronomy was, for many reasons, a very
important science, and a very empirical science; but it was difficult to explain the
motions of the planets. Those of you who are interested in astronomy would know that
the stars are always in the same position; the constellations that we see, Leo or Virgo or
any of these, always appear in the same fixed pattern throughout the year. It is only the
planets which move from one constellation to another, and it was very difficult to
explain how they were operating.

King Alfonso of Spain in the 13th century got so upset that he said: if God had consulted
me when creating the universe, he would have received good advice. You know: why
have these planets moving around in such an erratic fashion? It is a quotation I have
taken from a book; I should have mentioned it there.

(Refer Slide Time: 06:56)

Then, along came Copernicus. The first wedge between thought and reality, as
Haugeland says, was inserted by Copernicus, who said that what we see is not what
really is. Up till now, remember this picture theory of truth, the notion of
correspondence: that our thoughts are in the image of what we see around us; that our
thoughts reflect the world as it is. Copernicus was the first person to come along and say
otherwise; you must be familiar with his book On the Revolutions of the Celestial
Spheres. He said that our Earth is not at the center of the universe; in fact, the earth
revolves around the sun, and the earth rotates and creates the illusion of day and night,
and that kind of stuff. The important thing from our point of view is that what we see is
not what really is out there; hence the wedge, as Haugeland says, between thought and
reality.

(Refer Slide Time: 08:10)

So, all these are famous characters; you have encountered them in one way or another.
We know Galileo Galilei because of the equations of motion, for example, which we
always attribute to him: v = u + at, and that kind of stuff. But Galileo made this very
important observation: he said that perception is an internal process. This is attributed to
him: "I think that tastes, odors, colors, and so on, are no more than mere names so far as
the object in which we locate them are concerned." So, if you are smelling a rose and
you feel that the rose smells nice, the notion of the smell of the rose is not located in the
rose, but in our minds.

So, he says tastes, odors, colors and so on are no more than mere names, so far as the
objects in which we locate them are concerned, and that they reside in consciousness; in
our minds, in other words. Hence, he says, if the living creature were removed, if we as
the perceivers of these tastes and smells and odors were removed, all these qualities
would be wiped away. This notion of taste and smell and color is something that we
have in our heads; it is not a property of the object. He goes on to explain, for example,
that the notion of smell actually happens because there are particles impinging upon the
inside of our noses, which results in a certain sensation which we call smell; which is
quite accurate, as you can see. Galileo said this in the 17th century: that perception is an
internal process. So, we are exploring this notion of thinking, of how the notion of the
mind evolved. All of this is European history, because AI as we know it came out of
European thought. Other civilizations, like Indian philosophy, have a lot to say about
some of these concepts, like knowledge and so on, but AI did not come out of that.

Then, Galileo says that philosophy is written in this grand book, the Universe; it is
written in the language of mathematics, and its characters are triangles, circles and other
geometric figures. When Galileo was doing all this reasoning, algebra had not yet come
into use. In fact, his proofs of the equations that we attribute to him, like v = u + at, are
essentially geometric in nature. If you look at Haugeland's book, you will see some idea
of how he draws triangles and says: this side represents this, that side represents that,
and the area represents this, and that kind of thing. All this reasoning was done
geometrically; for him, mathematics was geometry, and he says that the whole world
can be described in the language of mathematics, whose characters are triangles, circles
and other geometric figures. So, you can say this is another step away from the view
that our ideas are reflections of the real world out there.

He is saying that you can think about motion, that the laws of motion are about moving
bodies, using the language of mathematics. So already, the representation has moved to
something which is different from the real world out there. So, Galileo showed that
geometry could be used to represent and reason about motion; this is what we just said.

(Refer Slide Time: 12:25)

Then we come to the person whom Haugeland calls the grandfather of AI. It was the
English philosopher Thomas Hobbes, 1588-1679, who first put forward the view that
thinking is the manipulation of symbols. This is fundamental to AI, because after all, we
are talking about representing symbols and manipulating them, and creating intelligence
out of that. Galileo had said that reality is mathematical, in the sense that everything is
made up of particles, and our sense of smell or taste is how we react to those particles.
Hobbes extended this notion to say that thought too was made up of, or expressed in,
particles which the thinker manipulated. So, Galileo was talking about external reality,
and how we can represent and think and talk about it; Hobbes is talking about the
internal process of thinking, and saying that even thinking is basically the manipulation
of something which he called particles, and which we now call symbols.

However, he had no answer to the question of how a symbol can mean anything; and, as
we will see, for us intelligence is manipulating symbols in a meaningful fashion. Hobbes
could never say how a symbol can mean anything. In fact, as Haugeland says, he could
not tell us how minds are different from books; in the sense that books are also
collections of symbols, and minds are also collections of symbols which we are
manipulating. How can the two be different? The idea of meaning is very elusive. So, if
I were to ask you: how do you know the meaning of a word, what would your answer
be? Just take any word. Let us say

Student: in terms (( ))

Prof: You could use examples.

In particular, I am talking about our standard source of meaning, which is a dictionary.
If you want to look up the meaning of a word, you go and look up a dictionary. But how
does a dictionary give us meanings? A dictionary only describes words in terms of other
words. When you give examples, you are also giving them in terms of other words.
Where does the meaning originate from? Is there a fundamental source of meaning?
This is the kind of difficulty which Hobbes faced: where does meaning come from? We
too are not able to see where meaning comes from. For people before him, the notion of
an apple is there because you see an apple, and that is what it means. But when you talk
about language and thought and symbols, we have this difficulty of saying where the
meaning comes from; it is a question that we have not yet answered.

(Refer Slide Time: 15:34)

Here is a picture of Thomas Hobbes, born in the 16th century. In his book De Corpore,
Hobbes first describes the view that reasoning is computation. "By reasoning," he says,
"I understand computation; and to compute is to collect the sum of many things added
together at the same time, or to know the remainder when one thing has been taken
from another. To reason, therefore, is the same as to add or to subtract." This is very
archaic language. Again, as I said, reasoning is similar to this kind of process of adding
and subtracting. This quote I have taken from the Stanford Encyclopedia of Philosophy.
As we have just mentioned, Hobbes was influenced by Galileo: just as geometry could
represent motion, thinking could be done by the manipulation of mental symbols. Does
the name Hobbes not ring a bell? Calvin and Hobbes: in fact, that Hobbes was named
after Thomas Hobbes; that is why he is such a philosophical character.

(Refer Slide Time: 17:07)

Now we come to Rene Descartes, another great thinker from those times, whom we
know for many things; Cartesian coordinates, for example, are named after Descartes.
Remember that all these things were going on, these talking statues, moving figures, and
so on and so forth, and it had become sort of acceptable in Europe to talk about these
machines as being lifelike, in some sense.

So, Descartes in fact goes on to say that animals are wonderful machines; he just takes
the next step: they are not like machines, they are machines. Then he says human beings
would be too, except that they possess something called a mind. We will come to this
problem of Descartes in a moment. Just as Galileo said that motion could be expressed
in geometry, Descartes said geometry could be expressed in algebra; Descartes is the one
who invented what we call coordinate geometry. But he went further: he said that even
thought can be expressed in the language of mathematics, and that thoughts themselves
are symbolic representations. So, you can see he is building upon what Hobbes said.
Hobbes said that thoughts are symbols, and now Descartes says that thoughts are
symbolic representations that we operate upon.

(Refer Slide Time: 18:38)

But here is something new which Descartes brings in: the notion of the mind and the
body. Descartes is what we call a dualist; he belongs to the school of thought which says
that mind and body are two separate things. This is often called mind-body dualism. As
opposed to dualism, there are schools of thought which are monist in nature, which
believe that there is only one kind of thing. For the first time, Descartes is saying that
there are two different kinds of things in this world: one is the material world, which he
calls body, and the other is the mental world, which he calls mind. He says they are two
different kinds of things. The material world, of course, would obey the laws of physics
and things like that; we will see later that philosophers said the mental world should also
obey such laws, and so on, but they are different worlds. The world of mind is separate,
and the world of body is separate.

This is opposed to other views in philosophy which hold that there is only one kind of
thing. There is idealism, which says that there is only the world of ideas; for example, in
India we say that everything is Maya: everything is the world of ideas, and matter is
basically a construct that comes out of our ideas. Very complicated to think about, but
maybe you can reflect upon it a little. As opposed to idealism, the other view is
materialism, which says that everything is matter; the whole world is matter, and matter
interacts in certain ways, and ideas and minds and all that kind of stuff somehow emerge
out of this. So, there are different viewpoints about what the world is like, and Descartes
is a dualist. He said that mind and body are two separate things; that a symbol and what
it symbolizes are two different things.

So, if I say 'chalk' is a symbol, it is a compound symbol made up of these letters, but it
is nevertheless a symbol. 'Chalk' is a symbol, and this thing that I am holding in my
hand is what it symbolizes; the symbol 'chalk' is separate from the thing. Now, a symbol
is amenable to algebraic manipulation: what we call thinking is basically symbol
manipulation; you can manipulate symbols. But the subject of thought is the world, the
real world out there, which it symbolizes. They are different things; the mind is different
and the body is different. Of course, he had to answer questions like these, because the
body, the material world, obeys the laws of physics, while for the mind it was not clear
how it operated, but it was separate: first, what makes a notation, a symbol notation,
suitable for symbolizing; and secondly, what makes a suitable notation actually
symbolize?

See, this problem has arisen because he has separated the world of the mind and the
body. He says the mind is one thing and the body is another; a symbol is one thing and
what it symbolizes is another. So, the first question is: what makes a notation suitable
for symbolizing? That is a question we are still addressing now. When we write
algorithms, or when we talk about knowledge representation, we are addressing this
issue: how do you represent? How do you create a domain model, for example; how do
you represent the world so that you can compute upon it? That is the easier part. The
more difficult part is what makes a suitable notation actually symbolize; which means
that if it is to be meaningful, then the world of symbols, the world of thought, should be
connected in some way to the material world, the world of matter. They cannot be
independent of each other; they are not different worlds operating independently. Our
world of thoughts is in close synchrony with the world of matter: if we think about
raising our hands, we actually raise our hands. So how does the interaction take place
between the world of thought and the world of matter? How can thought and matter
interact, when the world of thought is not made of matter, while matter obeys the laws
of physics? That is a question he could never answer: the mind-body problem, or the
mind-body dualism.

(Refer Slide Time: 23:53)

So, this brings us to what we can call the paradox of mechanical reason; the term is
John Haugeland's, from his book. The paradox says: if reasoning is the manipulation of
meaningful symbols according to rational rules (remember, we are talking about
manipulating symbols in well defined ways, not randomly), then who is manipulating
these symbols? The question of meaningful manipulation arises because our thoughts
are not independent of the real world; they have to be connected to it. If a fast bowler is
running up and bowling, thinking of bowling an in-swinger, he had better be able to
produce a real in-swinger if he is worth his salt. How are his thoughts related to the real
world? Who is manipulating the symbols? It is a difficult question to answer, because,
as Haugeland says, the manipulator can either be mechanical, working according to
some fixed set of rules, or it can be meaningful. How can it be both? You cannot have a
mechanical system being meaningful at the same time; and by meaningful, we mean
paying attention to the meaning of what is happening.

How can a mechanical manipulator pay attention to meaning? Remember that they were
not talking about AI or any such thing; they were talking about human cognition, about
how human minds operate; they were trying to analyze that. Descartes said that there is
a world of the mind, which does the symbol processing, and there is a world of the
body, which is the real world made up of physical matter; but they are closely tied
together. When I am thinking about the real world in a meaningful fashion; say I have
two things on a table, a cake and a sandwich, and I am thinking about them, deciding
whether I should pick up one of them; my thoughts about the cake and the sandwich are
about real things, and I am making decisions: should I eat this or should I not? So this
meaningfulness: where does it come from? How can a mechanical manipulator pay
attention to meaning? It is a question I would ask you to ponder over a little, and see
whether, maybe, as Penrose said, human beings are special and there is something
special happening in our brains which allows us to do this; or, as Dreyfus said, there are
some instincts that we have which we cannot automate; though of course, I would take
the opposite view.

So, this led to a lot of debate in his time; we are still talking about Rene Descartes and
his mind-body dualism. It is said that Descartes, who also gave us the phrase Cogito
ergo sum, 'I think, therefore I am', claimed to have a proof that God exists, and the
proof is tied to this difficulty about how symbols get manipulated in a meaningful
fashion. His contemporaries, of course, did not accept any such thing; they would, in
fact, mock him about this idea. We can imagine it a little like the Chinese Room, which
we have not discussed in detail. Just imagine that your brain is a Chinese room full of
symbols, and there is somebody manipulating those symbols according to some rules.
Who is that somebody? That is the question we are asking. People would mock
Descartes and say that there is a little man sitting in your head who is manipulating
these symbols. But the problem, as you can imagine, is that this explanation does not
work, because the next question you would ask is: how does the little man operate? The
little man has his little brain in his little body, with little symbols inside his head, and
who is manipulating those symbols? So there is an infinite regress, and people say this
is what led to Descartes's claim that he could prove that God exists.

As to this question of who is doing the manipulation, philosophers have tried various
kinds of explanations: something called the faculty of will, which we cannot quite
define, or the transcendental ego; or, as I said, the people who used to mock him would
say there is a homunculus (remember the homunculus of Paracelsus), a little man sitting
inside. So, that is a fundamental question one has to answer. In the modern day, if you
are going to write programs which operate according to the algorithms that you put into
them, how can they be doing meaningful things? It is roughly the equivalent question.
Or I might say: if I want to implement a neural network, I know the structure of a
neuron and how it operates, and so on, and I am just connecting together hundreds of
thousands of neurons. How can that evolve to do meaningful things like character
recognition? Of course, we know that it can be done; character recognition can be done;
but the fundamental question is: is that intelligent, or is it just doing something that we
have asked it to do?

In fact, it has been said that the computer can only do what it is instructed to do, nothing
more than that; which is, of course, true at a very fundamental level. On some recent
thoughts on who is doing this manipulation, this thinking, there are some very
interesting books which I would recommend to those of you who are interested. All
three have a common author, Douglas Hofstadter, who is at Indiana University. His
famous book was Godel, Escher, Bach. He and Dennett collected a series of articles
called The Mind's I. More recently, he has written a book called I Am a Strange Loop.
Hofstadter is also trying to answer this question; instead of asking who, he is asking:
what is this notion of 'I' that I have as a human being? If I talk of you as a person, what
do I really mean? What is that 'you'? I say my body, my mind, my hands, my eyes, my
feet; what is this 'I' which is saying 'my'? That is the question Hofstadter is trying to
answer, and he uses a combination of emergent behavior and self-referential loops,
which we do not have time to get into here; but I would recommend any of these books;
they are quite easy to read and quite interesting. So, let us move on from Descartes to
John Locke, known as the father of classical liberalism.

(Refer Slide Time: 31:43)

His theory of mind is often cited as the origin of the modern concept of identity and the self,
essentially. It influenced other philosophers like Hume, whom we will see, and Kant, whom we
will see in a moment. He postulated that the mind was a blank slate, as opposed to what
Chomsky says, that we are born with an inbuilt grammar, the universal grammar, in
our heads. Locke said that the mind was a blank slate, or tabula rasa as he called it, and
that we are born without innate ideas; and, as you can see in the last two lines, knowledge
is determined by experience derived from sense perception. Whatever we know in our
heads is the result of whatever we have experienced in the world, and experience leads to
knowledge, essentially.

(Refer Slide Time: 32:45)

One of his elaborators or followers, David Hume, a Scottish philosopher, is whom
Haugeland calls a mental mechanic. By this we mean a mechanic who is operating in
the mental domain. This was empiricism, and in his book called A Treatise of Human Nature,
he strove to create what he called the science of man, which examined the psychological
basis of human nature. He said that everything is tied up to human nature. If you can
understand human nature, you can understand how human beings behave, and what else
is there, essentially? Science and everything derives from there. He follows this idea of
experience and observation as the foundation of logical argument. He was an admirer of
Newton, and he proceeds in the manner in which Newton expressed the movement of heavenly
bodies, of planets and so on.

He says that impressions and ideas are like basic particles to which mental forces and
operations are applied. Just as Newton gave the laws of physics, Hume is saying that
there is a law of mental activity; the law of association, as he called it. Mental
ideas were like particles. He is not saying that they were particles; they were like
particles, to which mental forces and operations are applied. Further, like Newton, he does
not care as to how that is happening. So, Newton never explained how gravity
happens, or, you know, why gravity happens; there was no explanation there. He
just gives the laws of gravity, and says that this is how the planets are moving around the
sun, and it is explained by gravity. So, Hume does the same thing. He does not try to
explain how it is happening; he says that this is what is happening, and it can be
explained by these laws; do not ask me why it is happening like that, essentially. But he
could not explain, however, what made ideas, ideas. It is like this: once you say
these are particles which are obeying these laws, then why are these ideas,
essentially? What makes the interaction between different ideas count as thinking,
essentially? So, he has done away with meaning altogether.

(Refer Slide Time: 35:07)

So, the last person we will visit today is Immanuel Kant, a German philosopher widely
considered to be central to modern philosophy. In fact, when I was an undergraduate, we
had a whole course which did a comparative analysis of Kant's and Mill's philosophy. He
says, and this is very interesting; we have come a long way in this short period of time.
From this notion that the world is out there, and we are simply seeing the world, the
correspondence theory of knowledge, and then mind-body dualism, and so on, Kant has
come to the other extreme. He says the mind has a priori principles, which make things
outside conform to those principles. These are consistent with some very
modern ideas. For example, some very recent research in computer vision: the simple
view of computer vision would be like the correspondence theory of knowledge, that you
get the image of things, and you do image processing, pattern recognition, feature
extraction, and all this kind of stuff, and then you understand what is happening. It is a
forward process from the world to the mind.

Modern theory says that we have preconceived notions of what we are trying to see, and
what we see is already there in our mind to some extent; this is what Kant has said. The
mind has a priori principles, which make things outside conform to those principles.
Then, he says that the mind shapes and structures experience; it is the mind which shapes
and structures experience, so that, on an abstract level, all human experience shares
essential structural features. All our minds operate in the same way. That is why we are
able to communicate; that is a question that one could have asked: how can one human
being communicate ideas to another human being? He says that, fundamentally, the mind
has a similar structure. Then, he of course goes on to explain that the concepts of space
and time are integral to human experience, that you cannot operate without them, as are
the notions of cause and effect, essentially.

What causes? Causal theory is basically a mental theory. In the real world, I mean, we
have this cause-and-effect kind of notion: that if I turn a switch on, the light will come
on. But physics does not recognize any cause or causal theory. Physics only
recognizes the equations; it goes from one equilibrium state to another equilibrium state;
there is no causal thing, but these notions are fundamental to our thinking, essentially. So, the
second-last paragraph is very interesting. He says that we do not have direct experience
of things, and we will visit this in the next class, which we have on Wednesday. We will
come back to this question. What he called the noumenal world, the real world
outside: we do not have direct access to the real world, but what we do
experience is the phenomenal world, as conveyed by our senses. So, you
know, this is a very philosophical question. If you look at some Indian philosophies,
like Buddhism, they ask this same question again: what is there in the mind is what
we think, essentially. That is what Kant is saying, essentially. He says that human
concepts and categories structure the view of the world, as we see it, essentially.

(Refer Slide Time: 38:41)

So, the world is not as it is out there, but as we see it, essentially; the world as we know
it. This is known as the subject-object problem: a longstanding philosophical issue
concerned with the analysis of human experience. So, the question is that the world consists
of objects and entities, which are perceived, or otherwise presumed to exist as entities, by
subjects; there is the subject. So, we think that the world has these objects out there, and
how does that happen, essentially? There are some technical terms which we should be
familiar with. The subject-object problem has two primary aspects. The first is what is
known, what can exist out there, and this is something that we call ontology, which has
become very popular in current-day computer science. So, the field of ontology deals
with questions concerning what exists, or what can be said to exist, actually, and how
such entities are grouped together, essentially, related within a hierarchy, and that kind of
theory.

Nowadays, computer scientists talk a lot about ontology, in the context of the
semantic web. So, we have one computer talking to another; for one computer sitting here to
meaningfully talk to another computer, we have this whole machinery of ontologies and
taxonomies, which we may not have time to go through in this course. The second
standpoint is how one knows what we know; and this concerns epistemology,
questions as to how knowledge is acquired. So, ontology says what can exist, and
epistemology is concerned with how we get at the real facts of, for example, why
Durga was suspended; an epistemic question. How do we say that this is what really
happened out there, essentially? That is the question of knowledge acquisition, or
epistemology.

(Refer Slide Time: 40:48)

The bounds of our own mind: Kant says that he has done something like a
Copernican revolution in philosophy; what he calls critical philosophy. So, he says
two things. One is the epistemology of transcendental idealism, which says that we
are not able to transcend the bounds of our own mind. We can only perceive the world
through the prism of our mind, in some sense; through the spectacles, the glasses, of the mind,
and we cannot exceed that. So, we cannot access the real world out there; only what our
mind allows us to see. Already, the notion of mind has become prominent, essentially.
Moral philosophy in those days was not quite what we talk about as moral philosophy right now,
but something to do with the mental world. The moral philosophy of the autonomy of
practical reason: he says that practical reason can be automated.

Maybe this is the last thing I will leave you with. Conceptual unification and integration
is carried out by the mind through concepts, or the categories of understanding. So, this
is again where those terms, those ontologies, are coming up. We have concepts about things.
We know, we have categories of birds, and flowers, and apples, and fruits; all these kinds
of categories of things, operating on the perceptual manifold, which is built within
space and time. Space and time are something fundamental to our minds. Our minds think
in terms of space and time, and everything that we think about is located within our
notions of space and time. They are not concepts, but are forms that are a priori
necessary conditions for any possible experience. He says that without the notions of
space and time, you would not have been able to imagine the world, and think about the
world. Thus, the objective order of nature, and the causal necessity that operates within it,
are dependent upon the mind's processes, the product of rule-based
activity, which he called synthesis. So, the emphasis has totally shifted to the human
mind. It is the human mind which shapes the way we see the world, and the way we reason about
the world, and everything is dependent upon that. So, from a time when we did not
even have a notion of a mind, and then gradually we said thought and reality are different,
and then mind and body are different, Kant has taken us to a point which says that our
interaction with the world is controlled by our minds, essentially.

(Refer Slide Time: 43:25)

So, this is what we will do in the next class. Just to remind you of the goal from
Haugeland that we had stated: the goal of AI is to build machines with minds of their
own. So, in the next class, we will come back to this Kantian view of the mind,
discuss it a little bit more, and maybe wind up the introduction next
Wednesday.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 4
AI Introduction

(Refer Slide Time: 00:25)

We will begin. So, we are entering the last lecture of our introduction part, and I do not
know whether you can read the names on the slides. These are the people we have met
so far, starting with Nicolaus Copernicus, or even before that, Ramon Llull, and a host of
people including Galileo and Hobbes and Descartes and Pascal. Pascal, who said that
perception is in our minds, and if I smell the smell of a rose, then that is a reaction to
the particles that are ((Refer Time: 00:54)) upon our nose, and we are sort of
perceiving based on that, essentially.

So, we saw that there was this strand of thinking. Now, this is, you can say, the pre-history of
AI, and you can see from this diagram that it is more than 500 years of history.
And there are two strands to this. One is the physical side of trying to make talking
heads, walking statues, and statues which can, you know, nod their heads and that kind of
stuff: the engineering, the physical, the construction side of it. Because there was the
belief that if it can move autonomously, it must be able to think also, essentially.

So, that is a leap of faith that one has to make. The other strand was the emergence of the
notion of the mind. At some point, you know, creatures like us simply lived in the world
and saw the world, and believed that what we see is what the world is like. But then along
came somebody like Copernicus, and he said: you think that the sun is going around the
earth during the day, but that is not what is happening. What is happening is that the
earth is rotating, and it creates an illusion of the sun going around the earth.

So, the fact that what you see is not necessarily what is out there had already started
coming out, and gradually the distinction between what we see and what we
think started emerging. And at some point Descartes said there are two worlds out there:
one is the world of the mind, and the other is the world of the body. And he had this idea of
mind-body dualism. Then, as we moved along, we saw Kant, for example.

Immanuel Kant, one of the most influential philosophers from Europe, said that we
perceive the world in terms of a priori knowledge that we have in our heads, and we
mould the world that we see into those a priori knowledge structures that we have. Of
course, he did not use the term knowledge structures, which we use nowadays; concepts
that we have, essentially. And at the same time, the mechanical contraptions were becoming
more and more sophisticated. There was, for instance, Vaucanson's mechanical duck.

So, some of you watch BBC; every Sunday, if you see BBC, you get something for this
course. This last Sunday, BBC showed a news item in which, in the south of France,
they have opened a museum of all these talking, walking mechanical creatures,
essentially, which were apparently very popular. People used to keep them in shop windows
to attract shoppers, and things like that. And I could see there this Vaucanson duck also
amongst the displays. So, they were getting sophisticated.

And we see, from here on, these two strands merging together: you know, the moving
creatures and the thinking creatures, in some sense. So, Pascal, for example, we
recognize here for the fact that he was the first person to invent a calculating
machine. Of course, it could only do addition; but nevertheless it was a calculating
machine, which was sort of improved later by Leibniz into something which could do
more than addition; it could do multiplication and so on.

And it became more and more sophisticated, till we came to Charles Babbage, who
conceived a machine which could store a program and run the instructions in that
program, which is the notion of computing that we are still working with, essentially. We
also met Alan Turing, who sort of tried to settle this debate on what intelligence is and
whether machines can think, and who proposed the Turing test that we saw earlier. He did
many other things, and we will have a brief mention of him later, essentially.

So, let me remind you of this definition by Haugeland: AI is the quest to build
machines with minds of their own. And we had asked this question towards the end of the
last lecture: so, what are minds? We will come to that question later today, but a little bit
later. Before that, let us complete the history. So, we have seen the pre-digital-computer
era so far, the mechanical contraptions that people used to build. How did AI progress
after the digital computer came into being, which was just around the time that Alan
Turing was around?

(Refer Slide Time: 05:37)

So, let us first begin with this piece of information, which tells us how the field got the name
artificial intelligence, essentially. The name was devised by John McCarthy. I think
you all know John McCarthy in some form or another. Among other things, he invented the
language Lisp, which became very popular in AI for many years. So, the name is credited to
John McCarthy, who, along with Marvin Minsky and Claude Shannon, organized this
conference called the Dartmouth conference, at Dartmouth College in 1956, where
McCarthy is credited with having devised the name artificial intelligence.

Now, many people have said that it is not a nice name; you should have used
something like heuristic programming or machine intelligence or something else. But
somehow the name has stuck since that time, and we all know this area as artificial
intelligence. Haugeland, for example, suggests that you could call it synthetic
intelligence instead of artificial, and he makes a comparison with pearls, for example.
You have real pearls, and you have artificial pearls, which are kind of fake, essentially.

But you also have synthetic pearls, which are not fake, but which are pearls, in a sense,
made by humans. So, the conference was organized, and its charter was that it would be a two-
month, ten-man study of artificial intelligence (so the name was coined there), on the basis
of the conjecture that every aspect of learning, or any other feature of intelligence, can
in principle be so precisely described that a machine can be made to simulate it. So, the
focus, the emphasis, is that when we are talking about intelligent behavior,
it is something which we can describe up to the minutest detail; and if we can do that,
we can make a machine do it, essentially. That was the idea behind it. So, who were the
people who organized this? We will see them in a moment. For those of you interested in
reading the history, you should look at these two books, which I have mentioned earlier.
One is the book Machines Who Think, shown here, by ((Refer Time: 07:57)), which has a
full chapter on the Dartmouth conference. And the other book is John
Haugeland's AI: The Very Idea, which is on the more philosophical side of things,
essentially; we will come back to that a bit later.

(Refer Slide Time: 08:16)

So, who were the organizers of this conference? We have already said that there was John
McCarthy; he was then an assistant professor at Dartmouth. These are people who have
contributed so much to AI, in one way or the other, that it is not easy to list it all;
they were very active people. McCarthy invented Lisp; he
invented something called situation calculus, which we may or may not see in this
course. He is also credited with having invented the alpha-beta algorithm, which we will see a
bit later in this course, and he did a lot of work on logic and commonsense reasoning,
essentially, which we will see if we can come to later. Marvin Minsky, the only one of these
who is still alive, was a junior fellow at Harvard; he and McCarthy went on to set up the AI
lab at MIT. And as you will see in the history part today, most of this early work in AI
was concentrated in a few places in the US and a few places in Europe.

And it is not as if everybody everywhere was working on AI. So, the MIT lab was one place,
CMU was another place, Stanford another place, and, you know, a couple of places in Europe,
essentially. So, Minsky is very well known for his idea of frames, which is a way of
structuring knowledge into interconnected components, and which is basically the forerunner
of the object-oriented programming of nowadays, essentially. He also wrote a
very influential book called The Society of Mind.

And, more recently, a book called The Emotion Machine, essentially. Somebody once
pointed out, during one of my lectures, that machines cannot display emotions; so
maybe they should look at that book. Another person there was Nathaniel Rochester, who was
the designer of the IBM 701, which was the best machine around at that time, and who wrote
the first assembler for that machine.

He supervised Arthur Samuel in the writing of a program for playing the game of checkers,
which we will talk about a little bit, and which was one of the early successes of AI,
essentially. Now, it turns out that this program of Samuel's was a learning program, and
Samuel's goal was actually learning; he wanted to see how computers could learn, and his
program was a program which became better and better as it played more and more, essentially.

And this generated a kind of fear amongst people that machines will
become smarter than us, more powerful than us, and things like that; we will come back to
that point a little bit later, when we talk about his checkers program. And finally, Claude
Shannon: everybody knows Claude Shannon, because of information theory.
He was the person who had hired Minsky and McCarthy as interns when they
were graduate students, and it is there that they got this idea of floating this
conference, which would talk about this new field which was coming up, called artificial
intelligence.

(Refer Slide Time: 11:33)

But there were a couple of guys who were, in some sense, the show-stealers at that
conference, essentially. So, let us first see who they were. The names we might have
mentioned before: Herbert Simon and Allen Newell. ((Refer Time: 11:52)) writes
about them that they were two vaguely known persons working at Carnegie Tech at
that time, ((Refer Time: 11:59)) the university which later on became CMU.

At that point it was Carnegie Tech, and they were also invited to the Dartmouth
conference, ((Refer Time: 12:11)) as an afterthought. And it is these two people who really
created the big impact at the conference. Because they had, along with J. C. Shaw, who
was also at RAND, built this program called the Logic Theorist. It was a logical
reasoning machine, a theorem-proving machine; it could prove theorems in mathematics. LT,
it was called for short.

So, it is said about it that it was the first program deliberately engineered to mimic the problem-
solving skills of a human being, essentially. Simon and Newell were greatly
influenced by the way human beings solve problems. Because, after all, we are sort of
existence proofs of smart creatures. You know, we are thinking creatures, if
you do not want to call us machines, who operate very effectively in the world, you
know, solving problems and getting along and so on.

And they wrote a book called Human Problem Solving, which became very
influential later, essentially. Now, this program, the Logic Theorist, went on to prove several
theorems from Principia Mathematica. See, Russell and Whitehead had embarked
upon this grand exercise of formalizing all knowledge; the idea was that everything
you can do in mathematics could be put down on a piece of paper. That great dream was
shattered in 1931 by Kurt Godel, when he came along and proved that you cannot construct
reasoning systems which are expressive enough and consistent at the same time.

So, either you can be very powerful, in the sense of very expressive, so that you can talk
about all kinds of things, or you can be consistent, but not both at the same time. He
showed that this is something which will always follow if you try to build powerful enough
systems. And, as some of you might know, his arguments are basically centered
around self-reference and self-negating sentences.

So, sentences like 'I am lying', or the story about this barber that Russell was so
worried about: if there is a village, in which the rule is that the barber shaves everyone who
does not shave himself, then the question is, who shaves the barber,
essentially? Because if the barber shaves himself, then he is shaving himself, and
therefore he cannot shave himself. All these kinds of conundrums come from self-
referential sentences, and Godel ((Refer Time: 14:49)).

Godel showed that any formal system which is expressive enough will end up being
inconsistent, which means that our logics are never going to be consistent
and complete at the same time; we will come to these notions at some
later point of time. But the simpler logics, first-order logic or predicate logic, are good
enough for us, and first-order logic can be seen to capture everything we are doing in
programming, essentially.

So, whatever we can express in programs, we can do consistently, in
some sense. Now, this program LT produced proofs of some theorems that were shorter and
more elegant than the ones present in Principia Mathematica, the book by Russell and
Whitehead. And apparently, so the story goes, I do not know whether it is true or not,
the Journal of Symbolic Logic, or some journal of logic, refused to accept the
paper, because it was co-authored by a computer program. I do not know whether the
story is true or not, but you can find it in some places, essentially.

(Refer Slide Time: 16:03)

So, let us first talk a little bit about Simon and Newell. Simon was a multifaceted
person, as you can see from this quote from Wikipedia. He was a political scientist,
economist, sociologist, psychologist, and a professor, mostly at CMU, whose research
ranged across all these fields: cognitive psychology, cognitive science, computer science,
public administration, economics, and so on and so forth. He went on to get a Nobel Prize in
economics.

And his long-time associate was Allen Newell, about 10 years his junior; they did a
lot of collaborative work together, essentially. Allen Newell created the language
IPL, in which LT was implemented.

(Refer Slide Time: 16:40)

A little bit more on them, because they gave us something on which we base our work.
So, they became leading figures at CMU, and they wrote this program, the General
Problem Solver, which is based on human problem solving, on how human beings use
heuristics to solve problems. And we will visit this General Problem Solver idea, this idea
of means-ends analysis, which is the heuristic that it uses. We will see that sometime later
in the course, essentially.

Their work also brought into focus the information processing approach to AI, which says
that if you want to create intelligent systems, it is enough to do
information processing. This is as opposed to the other strand of effort, which was to say that
we will build systems from the bottom up: we will put together components which make up
intelligent systems, and so on, and eventually they will become intelligent. They said no,
we do not have to do that.

You can work at the information level of AI, what some people call the knowledge level,
and build intelligent systems, essentially. And one of the things which came out of CMU,
one of the many things which came out of CMU, was this cognitive architecture called Soar,
which you can even download and use to build applications. So, one of the things
they talked about was this architecture for AI: what you need for AI.

(Refer Slide Time: 18:11)

So, you have this idea of the physical symbol system. A symbol is, as far as we are
concerned, a perceptible something which stands for something else. So, a
symbol stands for something else. If you write the numeral seven, it stands for the number
seven. Of course, it is not the number 7; it just stands for the number 7. In a
different script, we would have written it differently. A symbol system is a collection of
symbols.

So, for example, a data structure, or the English language, or a musical tune,
essentially. You put symbols together and you have a symbol system. So, you have an
alphabet, which is made of symbols, and then you put together things out of the alphabet and
you have a symbol system. And a physical symbol system is something which obeys laws,
like the laws of physics. In some sense, if you can manipulate the symbols using well-defined
laws, or rules, then they are physical in that sense: in the sense that they can be manipulated
according to these laws. So, you can use algorithms, like the procedure for long division,
for example, and so on and so forth.

(Refer Slide Time: 19:15)

The important statement that they made is known as the physical symbol system
hypothesis. It says that a physical symbol system has the necessary and sufficient means
to generate intelligent action. So, what they are saying, in our
terminology, is that all you need to build intelligent systems is the ability to create data
structures and to write algorithms which will operate upon those data structures. You need
nothing else, essentially. That is the basic infrastructure you need.

So, this is unlike, for example, Roger Penrose, who feels that the human mind, the human
brain, has some kind of physics going on which we cannot replicate. The hypothesis says
nothing of the sort: if you can do information processing, which means if you can operate
on symbol systems using well-defined algorithms, you can create intelligent behavior. So,
this is known as symbolic AI, or classical AI. Classical AI follows this principle; it
is a top-down design approach to building intelligent systems.

So, you will create your data structures, and you will write algorithms, and you will
produce intelligent systems. Haugeland calls it good old-fashioned AI, essentially. As
against symbolic, classical AI, we have what are sometimes called sub-symbolic systems, or
signal-level systems: systems like the human brain, which operate at a level where
information is not encoded into symbols. So, if you look at a neural network, for
example, it is made of many nodes and many edges connecting the nodes.

And the edges have values, which are numbers, essentially, or weights as we call them.
And everything that the network knows is encoded in terms of those weights, essentially.
But a weight by itself does not stand for anything; it does not mean anything to us. In that
sense, it is not a symbol. A symbol should stand for something; whereas if I write, let us
say, a program with a variable called x, which stands for, let us say, the distance from
place A to place B, then it is a symbol, because it stands for something. So, symbolic AI,
or classical AI, is concerned with explicit representations, and with algorithms for
working on representations.
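
To make this concrete, here is a minimal sketch in Python; this is my own illustration,
not something from the lecture. The tokens stand for blocks-world facts, and a rule
rearranges these symbol structures by pure pattern matching, without the program in any
sense knowing what the symbols mean.

# Minimal sketch of a physical symbol system (illustrative only).
# Tokens like 'on' and 'A' stand for things in a tiny blocks world;
# the rule below manipulates the symbol structures purely formally.

facts = {("on", "A", "B"), ("on", "B", "table"), ("clear", "A")}

def move_to_table(block, facts):
    """Rule: a clear block sitting on another block may be moved to the table."""
    for fact in set(facts):
        if len(fact) == 3 and fact[0] == "on" and fact[1] == block \
                and fact[2] != "table" and ("clear", block) in facts:
            facts.discard(fact)                 # the block leaves its old place
            facts.add(("on", block, "table"))   # and lands on the table
            facts.add(("clear", fact[2]))       # the block underneath becomes clear
    return facts

print(move_to_table("A", facts))
# {('on', 'B', 'table'), ('clear', 'A'), ('on', 'A', 'table'), ('clear', 'B')}

The hypothesis says that, in principle, nothing more than this kind of machinery, symbols
plus rules that manipulate them, is needed to generate intelligent action.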

(Refer Slide Time: 21:35)

And that is what we are pursuing here. So, we have mentioned this checkers
program of Samuel's; he was also one of the participants in the Dartmouth conference, and his
contribution was this program to play checkers, which he wrote on the IBM 701.
As I said a bit earlier, his goal was to explore learning for computers. His idea
was that if computers could learn, then there would be no need to do all this painstaking
programming.

And when he says programming in the painstaking fashion, he really means that.
Because in those days they did not have this whole set of high-level languages that you
are so comfortable with nowadays; you had assembly language. And I am
not sure whether even Fortran had been devised at the time; Lisp was devised a little bit later
than the Dartmouth conference. So, programming was very painstaking in those days.

And Samuel said that if you can make machines learn, then why do you have to program
them, essentially? And, you know, that is such a fascinating idea: that you will build
some system and it will learn and become proficient. And of course, we now have a thriving
machine learning community ((Refer Time: 22:49)). So, his checkers program: Samuel
was not a great checkers player; he was a computer scientist. But the story goes that he wrote
this program, and the program became better and better and better.

And eventually it beat its own creator, as we said. You know, around the time
when Babbage was constructing his machine, we mentioned his collaborator, the
Countess of Lovelace, who was the daughter of Lord Byron. And Lord Byron had this
((Refer Time: 23:30)) Percy Shelley, and Percy Shelley's wife was Mary Shelley,
who wrote this novel called Frankenstein.

And this novel, I do not know ((Refer Time: 23:42)), it is about an artificial creature; you
know, people were fascinated by artificial creatures. So, this novel is about an artificial
creature made by a doctor, Victor Frankenstein, which eventually
became like a monster that would, you know, destroy its own doctor, essentially. So, that
whole idea, that machines would become smart and, you know, overpower us and
kill us, has been around for a long time. So, you have seen the movie The Matrix.

So, that is one of those films, and we will come to The Matrix again later for a different reason;
it is quite an interesting film. But one of the themes in The Matrix is that the machines have
learnt to control the human species, essentially. So, you do not always realize it when you are
watching the film.

(Refer Slide Time: 24:36)

So, sometime in the 1940s, Isaac Asimov, who was a science fiction writer, introduced
the so-called laws of robotics, which became very popular; they took the fancy of
all the people around. He wrote them as part of a short story called Runaround, which he
wrote in 1942. So, these are the three laws that Isaac Asimov stated. Laws in the sense that,
unlike Newton's laws, which we discover the physical world to be obeying,

they are more like the laws that human beings make; you know, like legislation by the
parliament and so on, essentially. So, for example, our parliament is enacting a law that
they cannot come under RTI, essentially; that kind of law. So, his laws were
basically these three. The first law is that a robot will not injure a human being; that is the first
law, you know, protecting the human species. So, a robot must always be built in such a
fashion that it never harms a human being, essentially.

Secondly, it will obey human beings at all times, unless that violates rule 1; so if I build
a robot and I tell it to harm another person, then the robot should not obey me. And
thirdly, it must protect its own existence, as long as that does not violate the first two laws.
So, all this is there in the popular imagination, because people are worried
about machines overtaking human beings. Already for 400 to 500 years we have been looking
at talking, moving heads, which people think are thinking.

So, chess, as we know, has for many reasons been a fascination for computer scientists
ever since computers got invented.

(Refer Slide Time: 26:29)

So, let us look at a quick history of chess first, and then we will look at the history of the
rest of the computing world, before we come back to the question of what minds are, and
so on. So, you see all these names here; all this material has been taken from
Wikipedia. If you just go to Wikipedia and look up computer chess, you will see this time
line, or part of this timeline. So, Claude Shannon first wrote a paper called Programming a
Computer for Playing Chess.

Turing developed, on paper, an algorithm for doing it. McCarthy invented the
alpha-beta algorithm that we will study, though other people are also said to have
invented it, including Samuel, essentially. Alex Bernstein, who worked at IBM,
was the person who wrote the first actual program which could play a complete
game of chess.

Then the program Mac Hack VI, by Greenblatt, was the first program to defeat a person in
tournament play, essentially. And we have already mentioned David Levy, who made this bet
in 1968 that no machine could beat him in 10 years. And in 2008 he wrote this book called Love
and Sex with Robots, essentially; so he swung from one end of the capability of machine
intelligence to the other end: that machines could be like human companions to us.

In 1970, the American computer chess championship started. In 1974, a program called
Kaissa, from the USSR, won the first world computer chess
championship, and small microcomputers started playing chess in 1977.

(Refer Slide Time: 28:19)

So, those were the initial years; then, in the late 70s, the programs started making progress. A
program called Chess 4.6 won a major chess tournament. In 1978, David Levy won his bet,
defeating a program called Chess 4.7. In 1980, the Fredkin Prize was instituted: 100,000
dollars for a program beating the reigning world champion. Then there was a program called
Cray Blitz. Now, the Cray was, you know, the supercomputer ((Refer Time: 28:59)),
and for a very long period of time the Crays were the best and the fastest
machines. Cray Blitz was a program running on those machines. It won a championship with
the score of 5-0, which means it won all the games, with a performance rating, for those of you
familiar with chess ratings, of 2258, which is quite good. And it was the first program to beat a
human master. Around that time, people started investing in special-purpose hardware to play
chess, essentially.

So, for example, HiTech was also developed around this time; in 1988 ((Refer Time:
29:38)) it had 64 processors, one processor dedicated to each square of the chess board,
and that kind of thing. Ken Thompson had a hardware chess machine. So, people were
investing in more and more hardware. And around the same time, a program called Deep
Thought shared first place with Tony Miles, ahead of the former world champion, not
the reigning world champion, Mikhail Tal. So, Deep Thought is a name; does it ring
a bell for anyone?

Student: ((Refer Time: 30:15))

The Hitchhiker's Guide ((Refer Time: 30:17)) to the Galaxy. So, the name has been taken from
there. It lost two games to Garry Kasparov in 1989.

(Refer Slide Time: 30:28)

And after that came the years of triumph for the machines. In 1992, a microcomputer program
won a computer chess championship, playing against mainframes and supercomputers.
A program called Chess Genius actually defeated Kasparov in a ((Refer Time: 30:46)) game.
In 1996, a program called Deep Blue, which was developed at IBM, lost to Kasparov; but in
1997 it beat Kasparov in a six-game match, and its creators, its inventors, were actually
awarded the Fredkin Prize that we mentioned some time ago.

Then afterwards it was a series of losses for the human players. Kramnik drew an eight-game
match with a program called Deep Fritz. Then a program called Hydra beat Adams 5.5
to 0.5, which means Adams could only draw one game. Then Kramnik lost to Deep Fritz later.
And nowadays it is routine for the human players, including Viswanathan Anand from India, to
take help from machines while preparing their own chess-playing strategies.
So, here we have a mention that in 2010, Topalov prepared by sparring against the super
computer Blue Gene, essentially.

(Refer Slide Time: 32:10)

So, let us move on to general AI and look at what has happened since the Dartmouth
conference, before we come back to the main question again, essentially. Let us just quickly
run through some of these things. We have already mentioned the General Problem Solver by
Simon and Newell. Then there was Chomsky's book Syntactic Structures, which was very
influential in natural language ((Refer Time: 32:40)). McCarthy introduced Lisp. ((Refer Time:
32:47)) wrote a program for proving theorems in geometry.

This was a part of a PhD ((Refer Time: 32:54)). So, you will see a list of people whose
PhD work appears among the landmarks of AI, which is a kind of motivation for some of
the PhD students sitting here, essentially. So, Minsky and McCarthy set up the AI lab in 59.
((Refer Time: 33:10)) built the perceptron. The perceptron was a single-layer neural
network, which had this learning capacity. So, that neural network was a program which
was basically pursuing this idea of learning, or training as we call it.

You show the system a sequence of patterns, and it will learn to recognize that pattern. Samuel's
program beat the best players. People also started talking about machine translation
around that time; but one of the early ((Refer Time: 33:38)) said that machine translation is
not something that we can do so easily, which is actually quite true. In the early 60s, the
first industrial robots started coming in. Hintikka wrote his influential book on knowledge and
belief, which talks about how you can reason formally about what people know.

So, 'I know that you know that something': this kind of reasoning. Kripke ((Refer Time:
34:08)) introduced a formal model for this kind of reasoning, called Kripke structures. There
was progress elsewhere also; Quillian wrote his program for semantic nets. You know, a
parser at MIT was tested with this very well-known sentence, 'time flies like an arrow',
essentially. So, I would urge you to look at this sentence and try to parse it. Now, the thing
about us humans is that we always ((Refer Time: 34:42)) to one parse of this sentence.

Because, as I said, we are predisposed to certain ideas about this world, essentially.
So, when we hear a sentence like 'time flies like an arrow', we do not even think that it
could have any other meaning than that time is passing very quickly, essentially. But
for a machine, which is not biased by these predisposed ideas, there could be other meanings as
well. So, this is an ambiguous sentence. If you look at it from a fundamental point
of view, it could have other meanings, and the words could have other meanings.

So, one of the problems in natural language processing is that our language is so rich
that we can say the same thing in many different ways; but also, conversely, something
we say can have many different meanings, essentially. Which is, of course, very
useful for politicians, because they say something and then they say that this is what I meant,
that it was taken out of context, or something like that, essentially. So, this sentence, 'time flies
like an arrow': I would urge you to look at the other meanings; the toy parsing sketch below
finds two of them.

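Here is a small sketch of the ambiguity, assuming the NLTK library is available; the toy
grammar is my own and deliberately tiny. Under it, the sentence gets two parses: the usual
one with 'flies' as the verb, and one with 'like' as the verb, in which 'time flies' are a
kind of fly.

# Toy grammar (illustrative) under which "time flies like an arrow"
# is ambiguous. Requires the nltk package: pip install nltk
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N | N N | Det N
VP -> V PP | V NP
PP -> P NP
Det -> 'an'
N -> 'time' | 'flies' | 'arrow'
V -> 'flies' | 'like'
P -> 'like'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("time flies like an arrow".split()):
    print(tree)   # prints two distinct parse trees
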
So, look it up on the web and try to see what the other parse structures are. When we say parse,
we mean, you know, subject, object, noun phrases, verb phrases, and this kind of thing.
So, is 'time' an adjective, is it a verb? You can look at some of those options,
essentially. A very influential book, Computers and Thought, by ((Refer Time: 36:23)),
appeared around that time. There were programs being written: we saw a program for geometry.

Then we saw a program for algebra by ((Refer Time: 36:32)), and a program to answer
questions posed by people. You can put all these things together and build a system which can,
you know,

(Refer Slide Time: 36:40)

teach somebody, let us say, algebra or geometry or something like that. In 1965, Alan Robinson
introduced this famous thing called the resolution method for theorem proving, which
revolutionized the whole idea of theorem proving. By theorem proving we mean the kind of
thing that LT, the Logic Theorist, was doing: given a set of axioms, a
set of premises, what are the things that you can prove from there, essentially?
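
To give a flavour of what a single resolution step does, here is a toy sketch of my own,
not the lecturer's: two clauses containing a complementary pair of literals are combined
into a new clause, and deriving the empty clause signals a contradiction, which is how
resolution refutation proves theorems.

# One propositional resolution step: from (P or Q) and (not-P or R),
# derive (Q or R). A clause is a set of literals; "-P" means not P.

def resolve(c1, c2):
    """Return all resolvents of two clauses."""
    resolvents = []
    for lit in c1:
        neg = lit[1:] if lit.startswith("-") else "-" + lit
        if neg in c2:
            resolvents.append((c1 - {lit}) | (c2 - {neg}))
    return resolvents

print(resolve({"P", "Q"}, {"-P", "R"}))   # [{'Q', 'R'}]
print(resolve({"P"}, {"-P"}))             # [set()], the empty clause: a contradiction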

And there were other developments. Herbert Simon predicted in 1965 that by 1985
machines would be capable of doing any work that a man can do, essentially. Very optimistic,
these people; we have seen that Alan Turing also said that by 2000 the Turing test would be
passed, essentially. None of that has really happened. Around the same time, Hubert Dreyfus
argued against AI ((Refer Time: 37:39)); we have seen ELIZA earlier. In 1967, Dreyfus was
beaten by a chess program at chess.

Also, there was this program called DENDRAL, which was one of the early successes of AI;
it could do chemical analysis for us. We will not go into the details now; later
on in the course we will see what DENDRAL did. But it was adjudged to function at the level
of an expert chemist. Then there was Moses and his work on symbolic reasoning in AI, which is
also common now, essentially; everybody uses some symbolic integration package, you know,
MATLAB, Maxima, and all these kinds of packages.

Shakey, the first robot, appeared at SRI, the Stanford Research Institute. Minsky and Papert
wrote this book on perceptrons. We mentioned perceptrons: they have a single layer of
neurons. What Minsky and Papert showed in 1969 was that the perceptron was limited to
recognizing only certain kinds of patterns, essentially. And the kind of patterns that a
perceptron could recognize were patterns that are linearly separable, which means that if
you were to plot them in some space,

let us say a two-dimensional or three-dimensional space, say a two-dimensional space,
then you could draw a straight line, such that one side of the line is class A and
the other side of the line is class B. Such patterns are called linearly separable patterns.
And what Minsky and Papert showed was that that is all a perceptron could do. And it
is said that this killed the research in neural networks for quite a while; it was not till the
mid 80s that people started getting interested in neural networks again.

That is when they realized that multi-layer neural networks do not have this limitation;
they are not limited to finding only linear separators. A small sketch of the perceptron's
limitation follows. So, that is what happened in 69, essentially.
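
As a hedged illustration of the Minsky and Papert point (my own sketch, not from the
lecture): a single-layer perceptron learns the linearly separable AND function easily, but
can never get all four cases of XOR right, because no straight line separates XOR's two
classes.

# The perceptron learning rule succeeds on AND (linearly separable)
# but cannot classify all four XOR cases correctly.

def train(samples, epochs=100, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out              # perceptron update rule
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in [("AND", AND), ("XOR", XOR)]:
    classify = train(data)
    correct = sum(classify(*x) == t for x, t in data)
    print(name, f"{correct}/4 correct")    # AND: 4/4; XOR: at most 3/4
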
Schank talked about his conceptual dependency theory. McCarthy talked about the frame
problem, essentially. The frame problem says: if you are representing the world, and you
are reasoning about change in the world, how do you figure out what has not changed,
essentially?

So, if something was true, say my watch is lying on the table here, and I look the other way
for a few minutes, and then I look at it again: is the watch still there? Of course, I can see
it is still there. But in a reasoning system, after some time, after 2 hours, will the watch still
be there? Of course, they have a different example about a loaded gun, which we will not
get into right now. So, let me say what the other stuff was: Nilsson and Fikes demonstrated
this planning system called STRIPS. We will see STRIPS later in this course.

And Winograd wrote the program called ((Refer Time: 40:46)), which could carry out natural
language conversation in a very limited domain, the domain of the blocks world. So, it could
converse with a person, taking instructions and doing things. You should look it up on the
net, and you will probably get a sample conversation. You can say things like 'pick up the
green block and put it down on top of the red one', and it could do that: understand what
you are saying, and do it.

In the sense that it would generate a plan for doing that, essentially; quite interesting. Around
that time, the language Prolog was devised by the Frenchman ((Refer Time: 41:20)), and
there was some more work on planning; for example, ABSTRIPS, essentially.

(Refer Slide Time: 41:23)

So, here is the picture of Shakey. This was the first autonomous robot that was built, which
could take its own decisions. It could roam the corridors of SRI, which, as I said, stands
for Stanford Research Institute, and do some limited things, like opening doors, and plugging
itself in for getting charged, and things like that, essentially. And Stanford was another hotspot
of activity, with people like Nilsson and many others there. Some of the names we
associate with Shakey: Lisp came from McCarthy, but algorithms like A star, and STRIPS,
and the visibility graph, and collision detection, were all developed around that time there.

(Refer Slide Time: 42:12)

So, let us move on to the early 70s. Schank and Abelson introduced the idea of a script.
If you want to understand this: remember that Kant says that we perceive the world in
terms of predefined a prioris, as he called them. What are those a prioris? Schank and
Abelson say that they are like scripts. So, it is like a movie script: you always
follow a script when you are acting. The standard example that they
used was the restaurant, essentially.

If you go to a restaurant, you are essentially following a script; this is how typically
things happen. So, if you hear a story about somebody going to a restaurant, then you can
understand the story because you have the script in your head, essentially. So, that is the idea of
scripts. We have already ((Refer Time: 43:08)) mentioned DENDRAL; another success was
MYCIN. It was the rule-based, what people called, expert system; and the 80s was the time
when people wanted to build lots of expert systems.
By expert systems they meant a system, which will capture the knowledge of an expert
typically, in a roof form we will see this later in the course. And performer to expert
level essentially. So, minimal the program, which this medical diagnose. Minsky wrote
about this names a advance version of Dendron called ((Refer Time: 43:42)) had some
learning capabilities. There were some more planning systems. There is more work on
visual representation.

The power of metal level reasoning for example, Randall Davis we see a another PhD
((Refer Time: 43:59)) program call A M, A M stood for applied mathematician. Created
quite it staff, when it was first published essentially. So, if you look at some those
papers, which talk about A M. It was the program, which len it claimed was doing
mathematical discovery. That you give it basic concepts and little learn you thinks. So,
concepts like prime numbers, the concept of prime number.

Not in algorithm to find prime or something. What is the notion that some there
something call prime numbers len at claimed his program could learn. And essentiality,
his thing was at it forward the heuristic that go towards the extremes of certain kind. And
one extreme is that number of devices pattern number can have. So, one extreme is two
essentially. You cannot have less than two devices except you know one and itself
counting one and itself. So, the smallest number is 2, so those numbers are interesting.

So, len it claimed that his program was doing that, but has it often happens, if will
discover that the lot of things for set of encoded into his learning program itself
essentially. So, that something that we have to be very about essentially. Now, we write a
program and we put in our knowledge some of which in embedded into the program.
And then we say this system is discovered that knowledge.

So, it talking about experts systems the system called prospector, which was built at the
SRI for prospecting the natural world. Found some unknown deposits of ((Refer Time:
45:39)) in Washington state. So, Mitchell events the concept of version spaces and so on

107
and so forth, which scripts some of these things.

(Refer Slide Time: 45:55)

And so you can them from the slides often the Wikipedia page essentially. I want to
point out the some of this, newer things, which were happening. So, one new thing we
was happening around this time was autonomous vocals essentially. Can you have a car
which will drive it selves. So, arise earlier 1980’s people were building car it should
navigate themselves on the road so in unique. The first car was done, then experts system
self started appearing in a market.

Another expert system call expand for configuring vat system so was developed. The
triple A I conference that A I conference started series started. In this design the
connection machine, which was maid suppose to be machine of thousands of processes
connected together little bit like our own brains are simple processing units connected
together. So, one interesting thing was this program call Aaron develop by professor
Harold Cohen whose still around, which could make drawing.

So, this drawing it is see there, painting that we see there is been created by a program.
And when you say created, you mean visualized and drawn. It is not somebody has told
the program draw man or something. The system has drawn this whole thing itself. And

108
if you look up Aaron on the web you will see a whole gallery of his spin, it is paintings
so essentially. As she had try to get Allen to draw the cover for my book. But, professor
co inside that know it not active any longer.

(Refer Slide Time: 47:39)

So, some of the work in the 1990s: this program called TD-Gammon, which played the game of
backgammon using this technique called reinforcement learning (there is a course on learning
running right now in the department), reached the level of the world champions in backgammon,
essentially. Then there were these two robot cars in 94-95; if you look at 94, 95, we have
descriptions of robotic cars which could navigate themselves. So, here is this car
which went from Munich to Copenhagen and came back, all by itself, essentially.

Doing the driving, braking, accelerating, avoiding vehicles, and so on and so forth. And we have
already seen that in 97 Deep Blue beat Kasparov. The RoboCup tournaments started: a
football tournament for robots. Robotic toys like this Furby that you see there, from Tiger
Electronics, came into the picture. So, these are toys; the Furby apparently could learn
your language, and you could teach it how to speak. Sony introduced the AIBO;
the AIBO is that small dog-like robot that you can see in the picture on the bottom right.

So, there are these two small dog-like creatures; they are these Sony AIBO robots, which
became very popular. In this picture, they are participating in the RoboCup football
tournament. You can see an orange colored ball, and you can see some humanoid robots of a
different group hanging around on the side lines.

(Refer Slide Time: 49:26)

So, you see this picture; it is a robot called Kismet, another PhD dissertation from MIT: a
robot that expresses emotion. Again, if you go to the web and look for Kismet, you will see
videos of Cynthia talking to this robot, and the robot smiling and, you know, making all kinds
of human-like expressions in response. The language OWL ((Refer Time:
49:55)), the web ontology language, was standardized. DARPA introduced the Grand Challenge,
with prize money.

Honda produced this robot called ASIMO; you can see it, the small robot which
could walk around at a human pace. It is not an easy problem to make a two-legged
creature walk, keep its balance and walk, and things like that; and that became very
popular. Recommendation systems started coming into play, like Amazon's
recommendations: if you go to Amazon or Flipkart and you look at a book, it will
immediately make some recommendations, saying that people who bought this also bought that,
things like that.

Collaborative filtering: for those of you who attended the talk yesterday, the speaker talked
about collaborative filtering recommendation systems. Then there was another Dartmouth
conference, in 2006, which asked: what is AI going to be in the next 50 years, essentially?
Checkers, the game of checkers, was solved. What do you mean by solved? By solved we
mean that if both the players are playing perfectly, we can tell you what the outcome will
be, essentially; a tiny sketch of what that means follows.
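
As a rough sketch of what solving means, here is a toy example of my own, not the
checkers computation itself: a trivial take-away game solved by exhaustive minimax.
Players alternately remove 1 or 2 sticks, and whoever takes the last stick wins.

# "Solving" a game: compute its value under perfect play by both sides.
# Toy game: take 1 or 2 sticks per turn; taking the last stick wins.
from functools import lru_cache

@lru_cache(maxsize=None)
def value(sticks):
    """+1 if the player to move wins with perfect play, -1 if they lose."""
    if sticks == 0:
        return -1           # the previous player took the last stick and won
    return max(-value(sticks - take) for take in (1, 2) if take <= sticks)

print(value(4))   # +1: take one stick, leaving the opponent a losing 3
print(value(9))   # -1: multiples of 3 are lost for the player to move

Solving checkers required exactly this kind of computation, only over an astronomically
larger space of positions, which is why it was done only recently and needed so much
computing.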

So, we can do that for things like smaller games, like noughts and crosses, or tic-tac-toe as it
is called; but for checkers it was done quite recently, using extensive computing,
essentially. So, we know that checkers is a drawn game. Then, much more recently, there is
this program called Watson, from IBM, which participated in this game called Jeopardy
(look it up on the web) and beat the world champions at this game, essentially.

So, Jeopardy is a game in which we need a lot of general knowledge,
geographic knowledge, and things like that. This program had access to all this
knowledge, and it could ((Refer Time: 52:01)) the game better than human beings, essentially.
And we mentioned last week this robot called Kirobo, which is what David Levy was
talking about: human companions to human beings, essentially. This robot was sent
into space last week by the Japanese space agency.

And it is meant to be a human companion to a human astronaut, who will follow in
November, essentially. So, this is a very brief history of what has been happening in the last
50 years; we will stop here.

(Refer Slide Time: 52:46)

And the next thing that we will do is come back to this question about what minds are.
So, that will be the last segment of our introduction, essentially.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 5
Introduction Philosophy

So, let us get back to this question on the philosophical side of our introduction: what
are minds, essentially? If you remember, the view from Aristotle said that our
thoughts are a reflection of what is out there. If I see an apple, or if I think of an apple, it
is because there is an apple out there, and my thought is in the image of that apple that is out
there. The diametrically opposite view was given to us by Kant, much much later; he said
that what is out there is a reflection of our thoughts: that what we think is out there is what
we think is out there.

Now you see there is a gap that you can never cross, because what you think is out there is only what you are thinking is out there; what is really out there? So, you see a person sitting here, and I think there is a person sitting here essentially. I have this concept of people; I have concepts of boys, girls, men, women, all kinds of concepts, trees, chairs. That is how I think about the world and what is out there. I also have concepts about clouds, and you see an image of something in the clouds; you see a dog in the clouds, or something else.

(Refer Slide Time: 01:52)

What is out there, really, is the question we want to start with. We have seen already that the notion of the mind came about gradually, over a period of time, as people realized that what they are thinking about is not in direct correspondence with what is out there. But now what we want to discuss is whether it is a necessity that, to have an intelligent agent, we have to have a mind which can represent the world in a way that lets it manage the representation effectively.

So, that is the question I am driving at today. Let us start with the basic question: what is reality? Of course, we live in a world; we are also part of this world, but what is the objective reality? If I wanted to create a model of the world, how would I do it? What is really out there; any suggestions? We are talking about reality. So, you are saying that reality as we see it is a collection of situations which we can manipulate. No, but take simpler things, like people; take this young man sitting over here: is he real, and what is the young man; what is a human being?

(Refer Slide Time: 03:31)

Okay, now let us take a scientific point of view, the physics point of view: everything in the physical world is made up of a small number of fundamental particles. When I say a small number of fundamental particles, I mean that the number of different kinds of particles is small; it is not that the number of particles is small. It depends on what theory you are following; the simplest one could follow the Bohr atom model, for example: an atom is made up of protons, neutrons and electrons, and everything else in the world is made of these three kinds of things essentially.

Physics, of course, has been struggling to figure out what the world is made up of, but let us take it for granted that it is some set of fundamental particles; they could be something smaller than the proton and the neutron, but everything is there, and these particles obey some laws of physics. So, the laws of physics are sufficient to explain what is happening in the world out there. That is the first assumption that we will work with, and it is not something that we can dispute, because disputing it would amount to saying that physics is wrong essentially. So, if, like Penrose, you were saying that we do not understand the physics of the brain, then you would be saying that we do not understand the physics of this world essentially.

So, physicists believe that, even though we do not yet know the final story (there is string theory, there are neutrinos and gluons and that kind of stuff), there is something out there, and it all behaves according to the laws of physics, which we are trying to discover. They behave according to these laws, and therefore everything can be explained according to these laws; that is enough to understand the world in some sense. So, when we are talking about people, what is a person? Even an adult is made up of about 10^27 atoms. I do not know whether you can visualize the number 10^27, but you should try; it is a huge number.

So, we can think of a person, not in the social sense but in the physical sense: a human body is made up of about 10^27 atoms. This young man sitting over there is just a collection of 10^27 atoms which for some reason decide to stick together, and the reason is given by the laws of physics; it can be explained by the laws of physics. Of course, we may describe it in different terms; we may talk about biology or chemistry and so on, but deep down they obey the same laws that the fundamental particles obey essentially. We have mentioned earlier that there are two views of the world: one is the materialist view, which says that there is only matter, and the other is the idealist view, which says that there is no matter essentially.

Which is why in the last line I had said 'if there are fundamental particles'; some people believe that it is all energy or something out there essentially, but let us not get into that kind of a thing. So, we are trying to understand the world; we are trying to model the world. Why do we want to do that? Because an intelligent agent, as we discussed in the early lectures, should be able to operate in a meaningful manner in the world: do something useful for itself, achieve its goals, learn and that kind of stuff. So, it needs to represent the world out there; but what is the world out there is the question we are asking.

(Refer Slide Time: 07:35)

So, a human being is about 10^27 atoms, and these 10^27 atoms are continuously interacting with zillions of other atoms out there: we breathe, we eat, we have sound waves impinging upon us, which are also made up of oscillating particles, and so on and so forth essentially. So, if we try to talk about reality in these terms, like a physicist, can we ever make sense of this world out there? And when I say make sense of this world, I mean in the practical sense, as an intelligent agent in some environment trying to do useful things essentially.

Can we ever hope to write down the equations for them and solve them, even if we know the equations, and what would we get if we solved them essentially? The trajectories of their locations; how does that help us? Knowing that atom number 259 is moving in this direction or something does not help. So, if we want to interact with the world, we need to have our own level of representation, and this is what I am going to be driving at today. The world that we interact with, as Kant said, is a world in our minds essentially.

(Refer Slide Time: 08:58)

So, we have already said that the world around us, including us, operates according to, and can in principle be explained by, the fundamental laws of physics; nothing else is really needed. But it is too big for us to work with, because the amount of information we would have to use would be too much essentially. What we do is that we create our own worlds in our minds, and it is only our creation that is meaningful to us essentially. So, coming back to the example of a person: I do not think of a person as an ensemble of 10^27 atoms behaving in some concerted fashion.

I think of it as a single entity, a person, who is sitting on a single entity called a chair, or eating a single entity which I call a dosa, a masala dosa for example; I think of it as one thing essentially. Of course, it is not one thing; it is made up of so many things and so many processes, and I do not want to get into that at all. So, we create levels of abstraction at which we represent things, and we reason at those levels of abstraction. We should also keep in mind Newell and Simon's physical symbol system hypothesis: that we can create symbol systems and manipulate those symbols, and that is enough for us to reason about the world in an intelligent fashion essentially.

Douglas Hofstadter, whom I will talk about again in a moment, has a different notion of a symbol which we will not pursue very much here. He talks about how the mind creates symbols, how the brain creates these same symbols that we are talking about, which stand for a person and so on and so forth. When I was talking about a symbol, I said it was something which is perceptible; for example, I write the name of a person and I can read it essentially, or I can type it in a word processor and things like that. But Hofstadter also talks about symbol processing in the human brain, saying that the human brain reads symbols; and what are these symbols? They are kinds of concerted patterns of activity in thousands of neurons.

We will not go into more detail than that, but somehow neurons act in concert with each other in a manner which, when we introspect, we think of as symbols that we are reasoning with. This kind of idea has been exploited in movies; of course, movies do not necessarily depict reality. So, again coming back to The Matrix: those of you who have seen The Matrix would remember its main character, called Neo. In Neo's mind, Neo was a software engineer working in New York City, and that is how the movie begins; but in reality, whatever that means, Neo is in some cell in some human battery which the machines have constructed to extract energy out of him.

There is a complicated sequence, which you have to watch carefully to understand, in which eventually something is pulled out of his brain, and he really finds himself in some very alien place, inside some cell, where he is just a unit of a large battery or something like that. The important thing that I am trying to say is that the movie uses this idea that the world we live in is actually in our minds essentially; which is why, of course, sometimes people can hallucinate. They imagine something which is not out there, because their minds are sort of not in sync with, let me again use the word, reality out there.

But there is a catch in that: we do not know what reality is; we only know what we know essentially. Inception is another film in which you would not know whether you are dreaming or whether you are in some real world. And is this not what the physical symbol system hypothesis says: that you can create a level of representation, and that level should be enough for creating intelligent systems?

(Refer Slide Time: 13:22)

There is a dense slide that I have put in; we are not going to go through this slide. It is just to illustrate the levels of scale at which our concepts exist. Remember that in the end everything is made up of ensembles of these fundamental particles, but then we talk of people, we talk of a football field, we talk of a planet; all these are at different levels of scale. There is a very nice movie, I do not know how many of you have seen it, called 'Powers of Ten', which is available on the web, and there is a link I have given here. It essentially zooms out from a level at which a couple is seen in a park, goes to the very topmost level, which you can see is 10^26 metres, then zooms back down and goes to the bottommost level, which is 10^-17 metres, and things like that.

So, at different levels you see the world with a different perspective. To suggest some examples: a mustard seed is about one millimetre across, whereas the distance from Chennai to Pune is about 1100 kilometres. In these powers of ten, you keep magnifying or reducing the image by a power of ten every time, and you sort of keep diving in deeper and so on and so forth. At the topmost level there are things we cannot grasp; I mean, our minds boggle to talk about something like ten billion light years.

First of all, you have to imagine what a light year is, or remember what a light year is: it is the distance covered by light in one year, and we do not even think of light as having speed. I mean, I just see you; you raise your hand and I see it instantaneously; where is the question of speed? We do not even have a notion of the speed of light; of course, nowadays we have, thanks to Einstein and all these people, but light travels at a finite speed. And we do not realize it, because it is so fast that for all of us here everything happens instantaneously.

We do not suffer from these effects of relativity and things like that. But ten billion light years: how far would light travel in ten billion years? We cannot even think of these kinds of things essentially. All right, and at the very extreme other end, at 10^-17 metres, you have quarks and gluons and so on and so forth. So, the world as we think of it nowadays exists at these very different scales of things; that is at one level. At another level, it is all just collections of fundamental particles, which we have already agreed we cannot deal with at that level; that is why we think at these different scales essentially, okay.

(Refer Slide Time: 16:30)

So, our perceptible universe is a small subset of these scales essentially. The largest thing we can see is maybe about a kilometre across, like the Golden Gate Bridge in San Francisco or something like that, and the smallest thing is maybe a mustard seed; or maybe some people can see pollen, you know, in a ray of light you can see dust particles, some of them maybe 0.1 millimetre across. So, you can perhaps think about things at this level of scale essentially. The point I am trying to get at is that the human mind has evolved to create concepts at these scales essentially.

So, we tend to think of objects at these scales; that is why we are comfortable with them. We are not even comfortable talking about how far the planet Jupiter is from here, because we cannot even imagine that kind of a thing; at least, we have not evolved to imagine that kind of a thing. And to reason at different levels, we have created these different disciplines; so we become specialists in biology or geography or astrophysics or anything.

(Refer Slide Time: 17:41)

But each discipline operates at its own level of scale; social science operates at some level where we are talking about collections of human beings. Remember, each human being is a collection of 10^27 particles, so social science in some sense is talking about collections of collections of 10^27 particles; but we obviously do not think in terms of fundamental particles any longer, and so we have these different disciplines essentially.

(Refer Slide Time: 18:06)

So, Hofstadter; remember this mind-body problem which Descartes was grappling with: if there is this world of the mind which is reasoning about the real physical world out there, the body, how do the two things interact essentially? In physics we say we have the laws of physics at the fundamental particle level, and if you know those, you will know how the rest of the system behaves essentially. But Hofstadter says that we have to introduce a notion of what he calls downward causality, which means causality from a higher level to a lower level.

Even though the laws of physics can explain going from the particle level to the ensemble level, he says that is not useful for us; we have to think about it the other way around. For example, if I want to drink a cup of tea, then I am thinking at a level about a cup of tea and so on and so forth. And this level of thinking, which happens with concepts at this level of abstraction, is eventually driving things at a lower level, you might say my muscles or my nerve cells, or at an even lower level the very fundamental particles which make up my hand, in such a manner that my hand eventually reaches out for that cup of tea, picks it up and takes a sip from it essentially.

So, the causality is from our level of reasoning down to the lower level where things are actually happening. Now physics, of course, does not have a notion of causality; that is why Kant, even when he was talking about human categories, said that space and causality are given to us, that we accept them and have to start working with those things essentially.

(Refer Slide Time: 20:09)

So, these things are called epiphenomena. Things like pressure, for example: in a balloon we talk of pressure, but what is really happening is lower level activity; molecules of the different kinds in air, nitrogen, oxygen, carbon dioxide, everything, are moving around randomly and impinging upon the inner surface of the balloon, and from this cumulative activity the epiphenomenon of pressure emerges essentially. Likewise, in our human brains there are these billions of neurons which are firing away in some fashion.

We do not tend to think of our brain in that fashion. I tend to think with my brain and say, oh, I want to have this cup of tea, which is operating at a much higher level essentially. So, can a machine operate at the level of epiphenomena like this? That, we feel, is necessary for machines to be intelligent. So, we have run through this: computers are man-made objects; we know how they operate, so it is easy for us to explain them. For example, if you type something in a word processor, in principle somebody can say that at the lowest level these are the kinds of micro level operations which were taking place. But we do not do that, of course; we as human beings tend to think of machines as doing higher level things.

So, how do we see a computer? As a music player or a web browser or a game or any of these many things that a machine can do. The important thing is that, as many people have said, starting with Turing, it is a universal machine; the computer, just like us, is a universal machine. You might even call yourself a universal machine. A simple machine can do only what it is designed to do, but a universal machine can imitate other machines and do what they were doing.

So, they are flexible in that nature. Can a machine be intelligent? Hofstadter, and I come back to Hofstadter here, says that if a machine can introspect and examine its own behaviour, then it is possible for it to become intelligent essentially. So, he is going one step beyond Newell and Simon. Simon and Newell said that if you can create symbolic representations and create algorithms which work on those representations, that is sufficient and necessary to create intelligent behaviour. Now we can see that that is at one level.

So, there are layers and layers that one has to talk about. In computers we have bit level representation, then machine code, assembly language, higher level languages, higher level data structures, representations, objects, all kinds of things, and you keep going higher; likewise in the real world out there. Newell and Simon said that one level of representation, which they call the symbol level, is enough; but Hofstadter goes one step further. He says that in addition to that, you need this capability to introspect essentially. If you read his book, 'I Am a Strange Loop', it is quite an interesting book to read.

He sort of goes through a long, detailed argument of how Gödel discovered this idea of self reference in Russell and Whitehead's Principia Mathematica, and this in spite of the fact that Russell and Whitehead had set out to formalize everything and wanted to keep away self referential structures. So, they had a layered logical representation, a typed logical representation, where self reference would not be possible: a sentence could not be an argument to a sentence in that same language essentially. But Gödel gave us a very elaborate mechanism of how to construct such a sentence.

So, there are two levels at which things are operating. One is this level of number theory, which Principia Mathematica is all about; but there is also this level of encoding things into that number theory, and then encoding sentences like 'I am lying', or 'this sentence is not true', and things like that. So, Hofstadter is saying that if a machine can have this capacity to introspect and reason about its own actions, which also means reasoning about other people's actions, then it can in principle be intelligent.

So, let us talk about intelligent agents for a moment; it is a very popular term nowadays. These are programs; we talk of intelligent agents as programs which are persistent, which means, like the operating system for example, that if we leave a machine on, they exist all the time essentially. They are autonomous, which means that nobody is telling them: run this program, run this routine, or call this subroutine or something like that. They are proactive: they sense the environment, and if they see an opportunity in the environment, they will go after it essentially.

And they are goal directed, which means that they have goals; of course, these goals may not be self generated; they could be given by the creator essentially. It is just like the secret agents that governments are supposed to have, who have all these properties: they are persistent, autonomous and proactive, but they carry out the bidding of the government essentially.

(Refer Slide Time: 26:20)

This is a rough diagram of what an intelligent agent should be like. The white figure is supposed to be the agent, and the thing inside it is what is in the head of the agent. What is in the head of the agent is a model of the world out there, and the model of the world should contain the agent itself, which means it can introspect on itself. And, obviously, you might ask the question: if the model has a world in which the agent is there, then in that agent's head also I should create a model of the world. So, there is an infinite level of nesting which is possible in principle; these are the kinds of very curious loops which can form, okay.

(Refer Slide Time: 27:13)

So, we are slowly moving away from history and coming to what we want to do. If you want to build an intelligent agent which interacts with the real world, then you have to have at least these layers of different kinds of reasoning. The outermost layer is what we can call signal processing; it means you are receiving signals, sound waves, light waves or whatever, from the world. The innermost layer is symbolic reasoning, which is what classical AI is all about: that you can create symbol systems and reason with them.

And you may have, and this is my interpretation of this whole thing, an intermediate layer of neuro-fuzzy systems, which serve the purpose of converting signals into symbols essentially. For example, when I am speaking, what I am creating is a signal, sound waves of a particular pattern, but your brain is converting these sound waves into linguistic entities essentially. You are recognizing words out of this sound wave; from these signals you are extracting symbols essentially.

So, if I say the word apple, it may be a certain sound wave which is meaningless in itself, just as neural activity in our head is meaningless in itself, but you can process it to understand it at a higher level as standing for the symbol apple; and neural networks are particularly good at doing this kind of thing. You must have heard about character recognition of handwritten characters; suppose I were to draw the letter A by hand.

(Refer Slide Time: 28:57)

If I write A like this, or like this, or like this, we have no difficulty in recognizing that these are all A, and a neural network is also very good at this sort of thing. But at some point you may start having a doubt about whether I am writing an A or whether I am writing an H. It is very difficult to describe rules that say this sequence of segments forms an A, this sequence of segments forms an H, and so on; whereas a learning system, which learns these characters in the context of the other letters around them, will eventually learn to recognize the character A, for example.

So, neural networks are very good at this sort of thing; but if you want to give an explanation of, let us say, the Pythagoras theorem, what the Pythagoras theorem is and how we prove it, then neural networks are not really very good at that kind of thing. For that we need this symbol manipulation ability, which everybody from Simon to Hofstadter says is necessary for intelligent behaviour essentially.

(Refer Slide Time: 30:06)

So, these are the topics that one can identify if you look at AI in general; these are not the topics of this course. If you look at the enterprise of AI, then we have all kinds of topics here: knowledge representation, semantics, ontology, models, search, memory, machine learning, problem solving, planning, adversarial reasoning, qualitative reasoning, natural language understanding, and all kinds of topics. On the left of this figure I have drawn the sensing kinds of activities, the signal to symbol kinds of activities: speech processing, image processing, video processing, computer vision, neural networks, pattern recognition, sensors and that kind of thing.

On the right hand side it is the opposite, from symbol to signal. So, you have motor control: if you want to build robots, you have to eventually make the robot do what the robot is thinking about doing. If the robot is thinking about going from place A to place B, it must do something to make that physical movement possible; so we need actuators and things like that essentially. These, then, are the topics of AI. This figure is taken from my book, and the circle roughly describes what is there in the book.

So, of course, there is this course on AI which we are doing here, but in our department there are a whole lot of courses which cover these areas, and I just want to give you some idea of the kind of courses that we offer. The first four courses that come up are courses which I am personally involved with, but the rest of the courses are mostly handled by my colleagues.

(Refer Slide Time: 31:56)

We will start with this course, which is AI, and which covers some of the material inside this circle; then Planning and Constraint Satisfaction, and Knowledge Representation and Reasoning as well; these are names of courses, and both will be offered next semester. Then there are other courses which my colleagues teach: Machine Learning, which is being offered now; Pattern Recognition, which is also being offered now, I think; and Natural Language Processing, which is likewise being offered at this moment.

Probabilistic Reasoning is not being offered at this moment; very often, I think, Dr. Ravindran offers it as a self study course. Then we have Computer Vision; I am not quite sure which semester, maybe this semester it is being offered. Speech Technology; Kernel Methods, this is, I think, next semester; Visual and Video Processing; Computer Graphics. We do not have so much on the output side; you can see our department is not very strong in things like robotics, so we do not really offer courses in that. I imagine the Mechanical department may be offering some courses. In terms of assignments, I might have mentioned this earlier: one assignment is going to be on game playing.

I will try to do game playing not in this order but a little earlier, maybe after heuristic search or something like that, so that you can get going. We will decide which game, and you will have to implement an algorithm for it; your programs will play against each other, that kind of stuff. And another assignment will be the implementation of some of these algorithms.

(Refer Slide Time: 33:44)

So, we will assign some algorithms and you should implement them; we will go into the details as we go along in the course, okay.

(Refer Slide Time: 33:55)

The textbooks: as I said, there is a textbook which I have just written, and everything that I am teaching is from there, and vice versa, in the sense that what is there is what I teach. So, we will use that as the textbook, and then there is a host of other reference books that I will point to as and when needed essentially. Already, from these reference books, we have in some sense finished with two of them, which are Pamela McCorduck's 'Machines Who Think' and John Haugeland's 'Artificial Intelligence: The Very Idea'. Some of the other books we will refer to as and when the time comes essentially.

So, we will stop here, and when we meet next on Friday, we will have a qualitative shift and start devising algorithms for the simple search that we just mentioned essentially.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 06
State Space Search Intro

(Refer Slide Time: 00:26)

So, today we start with the first technical topic, and that is state space search. Let me begin by first putting this whole thing into perspective.

(Refer Slide Time: 00:53)

I will just write these down ((Refer Time: 00:49)). So, we are looking at one aspect of AI which we will call problem solving. As we said, there are many other aspects that we will not cover in this course, for example learning, or using experience, or reasoning with knowledge. We will focus on problem solving, and by problem solving we mean that the agent is in some situation and wants to be in some desired situation. So, given the desired situation, the task of the agent is to make a series of decisions, or a series of moves, which will transform the given situation into the desired situation.

So, the task is to find these decisions, and that is the task we will be addressing. Now, in general, there are two approaches to this. One is the one that we are following in this course, which is search, a first principles based approach; this means that you have some kind of representation of the domain, you simulate what would happen in the domain as a consequence of your decisions, and then you try to find those decisions which will achieve the goal that you are aiming at. The other is knowledge based, in which you are trying to exploit knowledge which has been accrued by some means essentially.

In general, of course, knowledge is derived from experience, either your own experience or somebody else's experience conveyed in the form of books and lectures and things like that, or stories. But given the knowledge, we want to exploit it. The knowledge based techniques themselves can be classified into two kinds, which are memory based and rule based. Memory based techniques try to exploit stored experience directly essentially; this area is also known as case based reasoning.

And in fact, we have a separate course, which will be offered next semester, called Memory Based Reasoning in AI, which will focus entirely on this approach to problem solving. What the memory based approach does is store experience; and what do we mean by experience? Each case, as we call it, is a pair made up of a problem description and a solution that worked for that problem description. So, every time a problem is solved, it could be a human being solving the problem, you put the case into the memory and so on.

So, every time you solve a problem, you store the problem description and the solution description into this case base. And when a new problem occurs, you retrieve the best matching problem description and use, or reuse, the solution that was stored along with it essentially. So, that is one way ((Refer Time: 04:36)) of using experience; that is how we learn many things, for example how to make ((Refer Time: 04:41)) or something like that, or how to solve some kinds of integral equations, whatever the case may be. The rule based approach, on the other hand, does not use experiences directly, but relies on rules, or you might say nuggets of knowledge extracted from experiences.

And very often a rule based approach needs a human intermediary to convert experience into rules. The standard approach that was followed for expert systems, and we talked about DENDRAL and MYCIN and PROSPECTOR, which were rule based systems, was that so-called knowledge engineers went and spoke to domain experts and tried to elicit from them the knowledge that they use for solving their problems. It could be diagnosis or whatever the problem was, and they tried to put that knowledge in the form of rules essentially. So, we have knowledge based methods as opposed to search based methods.

And we will be largely exploring search based methods in this course essentially. Of course, it is not as if search based methods are devoid of knowledge, because we still have to model the domain; we still have to create the platform on which this search will happen. To that extent we do need knowledge for search based methods essentially. As we move along in the course, we will see that search by itself is not a very efficient means of solving problems, because we run into something called combinatorial explosion: the number of possibilities that we have to explore will be too high essentially.

So, very soon, in fact in the next week itself, we will introduce some element of knowledge into search. We will try to see how search can be guided by some kind of domain knowledge; we call this heuristic search, and the knowledge that we will use is called heuristic knowledge. After that, we will move towards standardizing the representation of the domain. To begin with, we will just make some ad hoc assumptions as to how the domain is going to be represented, because we will be focusing more on search; but towards the latter half of the course, we will also try to focus on standardized ways of representing domains, and you will see a logic based approach for doing that.

So, we will do a little bit about logic representation and reasoning as we go along. Now, search can of course be in many different spaces. We begin with what we call state space search; so that is our topic today, state space search essentially. Let me show you this puzzle, which I might have spoken about some time ago. You know this is a Rubik's cube; it has six faces, and you can move this face, or this face, or any of the six faces, by multiples of ninety degrees.

And so there is a problem that you have to solve; what is the problem? Unlike the faces that you can see here, which have mixed up colours, you want all the faces to have only one colour. So, every face should look something like this: this face is green in colour, and if I can somehow make the rest of the faces each have one colour, then I can say I have solved the problem. So, the situation is that I am given this Rubik's cube as it is, and the goal is to solve it, meaning get all the faces to have the same colours essentially.

Now, I am sure you must be familiar with this puzzle; it was devised in the mid to late seventies by an architect called Rubik, and it became quite a craze in those days; people would spend hours and days together trying to solve it essentially. Now, the first thing I want to do with this is to highlight the difference between these two approaches to problem solving, search based and knowledge based essentially. If I were to give this Rubik's cube to someone who has not seen it before and ask that person to solve it, she or he would essentially do trial and error; they would try to see what to do with this and what to do with that, and try to explore a sequence of moves which will solve the puzzle.

Now, this puzzle is interesting; it looks simple because, as you can see, if you look at the top face, it is solved: everything is green here, and the side cubelets of the top layer are each of one colour essentially; I need to solve the rest. The trouble is, once I have solved this top face, I can only move this bottom face, and there is no other movement I can make essentially. Remember, there are six faces; I could have moved any of those six faces, but if I am not going to disturb this, I can move only this one. If I move anything else, for example if I make this change, then you can see that I am disturbing these faces essentially.

And in the process of trying to solve the rest of the cube, I end up disturbing these cubelets; that is why this problem is an interesting problem and has been given quite a bit of study. Now, how many of you know how to solve the Rubik's cube? Only one, two, three; so some people at least. Notice that the question was how many of you know how to solve the Rubik's cube, which means that if I gave you this cube, you would, virtually without thinking, do a sequence of moves which ends in it being solved.

So, you would be exploiting those techniques which we call knowledge based techniques essentially. It could be that this knowledge has been given to you in the form of a set of macro moves: to get this cubelet into this corner, this is the sequence of moves, and that kind of thing essentially. So, that is the knowledge based approach; once you have the knowledge, it is a very simple way of solving a problem. The trouble is that the knowledge has to come from somewhere, and what happens when you have a new problem to solve?

Then there is no knowledge to fall back on; it is then that we need to take recourse to search based methods. So, we call search based methods first principles methods: you do not have any carry-over from the past; you are given a problem, you have to model the problem, and you have to find the solution by solving the problem essentially. So, it does not matter what the problem is; you can take any problem and we will try to solve it.

So, one of the things that we want to do in AI is not to solve the Rubik's cube by itself, but to find a problem solving strategy or mechanism which will solve virtually any problem that can be posed as a state space search problem. So, what do we mean by state space search? I am given the cube; this particular configuration of the cube is a state. The desired configuration of the cube is another state, and in between there are thousands of other states essentially.

(Refer Slide Time: 12:40)

And I need to find a mechanism. The state space is the set of states; let us say we draw it as a set like this, and I will use S for the start state, or the given state, and G for a goal state; there are other states as well. Now, how do we transform the start state into the goal state? We have to have a sequence of moves essentially. So, what are the moves in the Rubik's cube? If I call this the top face, I can say rotate the top by 90 degrees, or rotate the top by 180 degrees, or by 270 degrees.

So, these three moves are possible on this face; more than that and you will come back to the same state. Likewise, there are three moves for every face essentially. So, 6 into 3, I have 18 moves with which I can transform this state into a different state essentially. We will model this process as moves which transform one state into another; they take you from one state to another state. So, in this diagram I have drawn 6 states; if in some problem there are 6 moves you can make in a given state, then the graph would look like that.

Now, you can see what happens once you incorporate this notion of a move. What is a move? A move is an action which takes you from one state to a neighbouring state essentially. In any state there is a set of neighbouring states that you can move to, and that is defined by the domain essentially. So, we will assume that we have a function called the MoveGen function: it takes a state as an input and returns a set of neighbours.

So, what we are trying to do is to separate the domain from the problem solving strategy. What I am telling you at this moment is: choose a domain, and take this as a small exercise that you should try out yourself; choose some domain of interest that you have, define how to represent a state, and define the MoveGen function. The MoveGen function is a function in the programming sense: it takes as input a state and returns the set of neighbours of that state essentially.

So, that is one thing that we want; now let us take another small puzzle. If you look at AI textbooks, you will find that they are sort of sprinkled with puzzles, because puzzles are easy to describe, easy to represent, well defined, and you can work with them. So, there is, you might say, a younger cousin of this Rubik's cube, which is a flat, two dimensional puzzle called the 8-puzzle.

(Refer Slide Time: 16:06)

So, we have the 8, 15, 24, in fact any n squared minus 1 puzzle; 8 is 3 squared minus 1, 15 is 4 squared minus 1, and so on. The 8-puzzle in particular is a puzzle everyone must have seen, which has got 8 tiles, which is why it is called the 8-puzzle, and these tiles have labels on them: numbers 1, 2, 3, 4, 5, 6, 7, 8. This, let us say, is the start state, and the goal state could be something like 1, 2, 3, 4, 5, 6, 7, 8 arranged in order; so we have a start state and a goal state.

Now, in this puzzle I can have, as we can see, basically four different kinds of moves, and some people prefer to think of them as movements of the blank. What is the puzzle? You can slide tiles: there is a blank location here, and you can slide this tile into it, or this tile, or this tile essentially, and in the process you will create a new blank. So, for example, I could do something like this; this could be a move that I make, and you can give a label to this move.

You can either say that you have moved the four to the left, or you can say that you have moved the blank to the right; they are just equivalent ways of saying the same thing. So, let us say that we call this move Right, which means I have moved the blank to the right. And then you can see that I can also have an Up move and a Down move; so I can generate three successor states for this particular state essentially. The whole question is how you represent this; I will not go too much into that detail, and I will ask you to choose an interesting problem, describe the state representation, and construct the MoveGen function.

So, in this case the MoveGen function will return this state and this state; it will return a set of states. We will not give too much importance to the names of these moves at this moment; later, when we talk of planning, we will be more interested in that. For now, we are interested in the set of states; a small sketch of one possible 8-puzzle MoveGen follows.
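As an illustration, here is a minimal Python sketch of an 8-puzzle MoveGen. It assumes one particular representation, a 9-tuple read row by row with 0 standing for the blank; the lecture deliberately leaves the representation open, so treat this as just one possible choice.

    def move_gen_8puzzle(state):
        """Return the set of states reachable by one slide.
        A state is a 9-tuple read row by row; 0 is the blank."""
        neighbours = set()
        blank = state.index(0)
        row, col = divmod(blank, 3)
        # Try moving the blank up, down, left and right.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + dr, col + dc
            if 0 <= r < 3 and 0 <= c < 3:
                tile = 3 * r + c
                board = list(state)
                board[blank], board[tile] = board[tile], board[blank]
                neighbours.add(tuple(board))
        return neighbours

A blank on an edge cell yields three neighbours, matching the three successors generated on the slide; a centre blank yields four, and a corner blank only two.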
Let me take one more example: the well-known river crossing puzzles. You must have heard about these puzzles; in most of them there is a boat in which only two things can go at one time, there is a whole set of entities on one side of the river which have to be transported to the other side, and there are some constraints about what can go together and what cannot go together.

So, you must have heard about the missionaries and cannibals puzzle: there are three missionaries and three cannibals, they have to cross the river, and the boat can take only two people; how do they cross essentially? A simpler version of that is the man, goat, lion problem. What does it say? It says that I have some river, and on one side of the river there is a man, a goat, a lion and, I forgot one more thing, a cabbage; and there is a boat of size two, and only the man can row the boat.

And the man needs to take everything to the other side of the river, but he can take only one thing at a time. The difficulty is that if he leaves the goat alone with the cabbage, the goat will eat the cabbage, and if he leaves the lion alone with the goat, the lion will eat the goat. He does not want that situation; how can he get these three possessions across to the other side? You must have seen that recent movie in which a boy and a lion are stuck in a boat together crossing the ocean.

So, obviously this is a very simple puzzle; it is not hard to solve, but that is not the question we are addressing here. The question is: how can we pose this problem, or that problem, or the Rubik's cube, as something to which we can apply some general purpose methods essentially?

(Refer Slide Time: 21:46)

So, our goal is something we could state as: our search should be domain independent. The algorithms that we are going to look at should not depend upon the domain in which we are trying to solve the problem. It could be the Rubik's cube, or it could be one of these puzzles, or a scheduling problem, or something different essentially. If we can abstract away from the domain, then we do not need to look at the domain. And what do I mean by abstracting away from the domain?

In our case, the first thing is to design a MoveGen function for the domain, which, given a state, should tell me what the neighbouring states are, or rather return the set of neighbouring states essentially. And obviously, when I say given a state, I mean there is some representation of the state that you have to be concerned with essentially. So, how can we represent these? The 8-puzzle, for instance, can plausibly be represented as a 3 by 3 array or something like that, and it might be useful to think about it that way, but I will leave it for you to decide.

Let us talk about this problem: how do we represent the man, goat, lion, cabbage problem? The solution is simple, and I am sure you know it: we should first take the goat across, then come back with the boat, take the lion, bring the goat back, take the cabbage, come back, and finally take the goat; likewise with the lion and the cabbage interchanged. All of these puzzles have solutions which are not so hard to find; can we write a program to do that, is the question. Without having to write a program specifically for this puzzle, can we write a general purpose program that will solve puzzles like these essentially?

There are many other puzzles: the water jug problem, where you are given a 5 litre jug and a 4 litre jug, or a 4 litre jug and a 9 litre jug, and you have to measure out 6 litres; those kinds of problems essentially. So, how do we represent this? I want some input from the class, some suggestions. What is the state representation, and secondly, what is going to be the MoveGen function? The MoveGen function will obviously be something which operates upon the representation that we choose. A three cross two array, someone suggests.

Student: ((Refer Time: 24:24))

Why three, why not four.

Student: ((Refer Time: 24:31))

(Refer Slide Time: 24:40)

So, you are saying that I have an array like this, where this stands for the left shore, this stands for the boat, and this stands for the right shore; is that what you are saying? And it is an array of what? Because there is more than one thing which can be at each location. Let us say an array of lists then; so it is an array of lists essentially. But how do you write a MoveGen function? And what is the initial state? The initial state is that everything is in this square. Why do we have two rows; why do we have the second row?

Student: ((Refer Time: 25:21))

Then you should have four, I think. So, that is one possibility, but when you create a representation, you should also worry about the algorithm that you will write on top of that representation, because very often the representation that you choose helps in writing the algorithm; in our case, we are interested in the MoveGen function. So, let me make another suggestion, which is that I have a list of two lists: the first list contains things which are on the left bank, and the second list contains things which are on the right bank.

So, I could choose that essentially. If I choose that, then my given start state and my goal state would look like this. But the question is: how do I write the MoveGen function? That is the thing I am trying to drive at: which representations are good and which are not. Now, how would you write a MoveGen function for this representation? You would have to first inspect the state. So, this is the state representation. Also notice that we have made an implicit assumption here: we have not talked about where the boat is.

And we can afford to do that, because we make the implicit assumption that wherever the man is, the boat is there essentially, because the boat cannot go anywhere by itself. So, having made that assumption, which is hidden in here somewhere, how do I write a MoveGen function? I would first need to inspect this representation, this list of lists, find out where M is, and then cut something out from there along with M and take it to the other side; that something could also be nothing essentially.

Which means I can go from this state to this other state, a successor state essentially. The question is how I write out these successor states. I will not spend too much time here, but maybe we will look at one more possible representation. One thing that you would observe is that the second list is redundant; we do not really need the second list essentially. If we know that there are only these four characters around, the first list is enough to tell us what the state is; so that is one thing essentially.

And what does the first list represent? It represents the entities which are on the left bank, or something like that. So, let me suggest another thing: you do not represent the things which are on the left bank, but you represent the things which are on the side of the boat, and you also represent which side the boat is on essentially. Which means that instead of this, I would have something like the following. Let us say I do not represent the man; I am just trying to explore different possibilities. I just say: left, G, L, C, and this is my representation.

What I mean by this is that the boat is on the left hand side, which means the man is also on the left hand side, and along with the man there are the goat and the lion and the cabbage, all of them. The advantage I see in this representation is that my MoveGen function would be simpler, at least that is what I feel. Because it is going to be symmetric: whether I am going from the left hand side to the right hand side, or from the right hand side to the left, the operations that I have to do are identical, which means that I delete some elements from my list and create the complement set, which will be the new list essentially.

So, for example, from this I delete G and take it to the right hand side; that could be one move essentially. Or, to represent another move, I delete L, so I have taken the lion to the other side essentially. That is not a very good move, of course, because the goat will eat the cabbage, but the state space algorithm has to search through all the possibilities. So, we are trying to look at that essentially.

I would leave it as an exercise for you to try out different representations for this, and to actually write a program which will take a state as an input and return the set of neighbouring states as an output. So, there are three moves possible in this case; in fact, there are four moves possible: the man can go alone, or the man can take the lion, or the man can take the goat, or the man can take the cabbage. So, I would return these four states as my output of the function, and this MoveGen function is the function which will allow us to navigate the state space. A sketch under one such representation is given below.
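Here is a minimal Python sketch of such a MoveGen under the last representation discussed: a state is a pair of the boat's side and the set of items on that same side, with the man always accompanying the boat. Whether the 'eaten' states are pruned inside MoveGen, as the safe check does here, or generated anyway and discarded later by the search, is a design choice the lecture leaves open.

    ITEMS = frozenset({'goat', 'lion', 'cabbage'})

    def safe(unattended):
        # The bank without the man must not pair goat with cabbage,
        # nor lion with goat.
        return not ({'goat', 'cabbage'} <= unattended or
                    {'lion', 'goat'} <= unattended)

    def move_gen_river(state):
        """State: (boat_side, frozenset of items on the boat's side).
        Returns the set of safe successor states."""
        side, here = state
        other = 'right' if side == 'left' else 'left'
        across = ITEMS - here            # items on the far bank
        neighbours = set()
        # The man crosses alone, or takes exactly one item along.
        for cargo in [frozenset()] + [frozenset({x}) for x in here]:
            if safe(here - cargo):       # what he leaves behind
                neighbours.add((other, across | cargo))
        return neighbours

With this encoding the start state is ('left', ITEMS); on it the sketch returns the single safe successor ('right', frozenset({'goat'})), which is exactly the take-the-goat-first move of the solution described earlier.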

(Refer Slide Time: 31:25)

So, I will apply the MoveGen function to this state and get a set of neighbouring states. I will go to a neighbouring state and see if that is the state I was interested in; otherwise, I will again apply the MoveGen function to that state. In this process, I will navigate the state space, trying to reach the goal state that I am interested in essentially. This general strategy that we are going to follow has a generic name, which is generate and test: generate a candidate (and by candidate we mean a candidate state) and test whether it is a solution.

And we put this into a loop; this is the high level strategy that we are going to refine over the next few weeks: generate a candidate and see whether that is the goal state we are interested in. Our task is to explore this state space in search of the goal state essentially. So, we will keep trying out new states and seeing whether each one is the goal state or not. How do we see whether we are in the goal state or not? We need another domain function, and that domain function we will call the GoalTest function.

So, GoalTest will take a state as an input and output either yes or no; it will tell me whether the state that I am looking at is a goal state or not. These are the only two functions I need which know anything about the domain at all. Once I have written these two functions for a domain, I can practically forget about the domain; I do not have to worry about whether I am solving the Rubik's cube, or the 8-puzzle, or this river crossing puzzle; it does not matter. Sketches for our two running examples are given below.

I have a MoveGen function and a GoalTest function, and I want to find the sequence of moves which will take me to a state that passes the goal test; so this is a very abstract problem essentially. Incidentally, while we are at the Rubik's cube: only very recently, and by recently I mean in the last five to ten years, have search algorithms been able to search for optimal solutions of a problem like this essentially. And by optimal solutions I mean the shortest, the smallest number of moves that you need to make essentially.

If you follow the method that somebody has taught you, which says take the top layer first, then make the second layer, then get the bottom corners, get the bottom squares in place, and then fix the orientation, these kinds of macro moves will not necessarily give you the shortest solution essentially. They will solve the problem, but they will not give you the optimal solution. This is something that we had also mentioned about human problem solving: we do not necessarily always try to optimize things; as long as we can solve the problem reasonably well, we are happy with the solution essentially, which is where the knowledge based techniques are so important essentially.

So, writing the MoveGen function for this would be an interesting exercise. As you can see, there would be 18 possible moves for every given state, corresponding to the 3 moves for each face, there being 6 faces; any of those would be returned by the MoveGen function. So, you can see the state space for this is quite large essentially, and I will leave it as a small exercise for you to work out how many possible different configurations of this puzzle there are.

So, what we have done is abstract away from the domain, and we are only going to work with these two functions. The algorithms that we will write now will just use these functions and not worry about what the domain is essentially. And the idea is that once we have designed these algorithms, we can plug in any domain and they will solve problems in that domain for us. So, there is this generality about our approach essentially.

(Refer Slide Time: 36:07)

So, let us refine this algorithm a little bit, and we will call it Simple Search 1. Let us work with sets for the moment; when we generate these successors, or neighbours, we have to store them somewhere, and we will store them in a set which has traditionally been called the open list, or open set, essentially. So, we begin by saying OPEN gets the start state; we put the start state into OPEN and do the following in a while true loop. I am using the set union operation here to add the neighbours, and there is one step I had forgotten, which goes at the top of the loop: pick some node N from OPEN.

So, let us call this N now, because we are not only talking about the start state, but about any node that we pick from this set open. So, what is this refinement of the algorithm? I create this set called open, in which I put the start state to begin with. And in general, my algorithm says pick some node; we are calling it a node because we have already started thinking of these as a graph over which we are searching.

So, each state is a node. Take some node N from open, test whether it is a goal state; this is the generate and test idea. If it is true, then we return: we have found the solution. If it is not true, we remove this N from open, and add instead the successors of N, the neighbors of N. So, I will keep adding things to my open list and keep picking some element from there to test ((Refer Time: 39:24)). That is the simple basic idea we start off with; what this generates ((Refer Time: 39:37)) is a search tree. We will start criticizing this algorithm very shortly, because this is by no means an interesting algorithm.
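As a rough Python rendering of this Simple Search 1 (a sketch only, using the domain functions from above; note that it deliberately inherits the weaknesses we are about to discuss, including the possibility of revisiting states):

    def simple_search_1(start, move_gen, goal_test):
        # Simple Search 1: no memory of inspected states, so it can loop.
        open_set = {start}
        while open_set:
            n = open_set.pop()             # "pick some node N from open"
            if goal_test(n):
                return n                   # found a goal state
            open_set |= set(move_gen(n))   # replace N by its neighbours
        return None                        # open exhausted without a goal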

But, it gives us the basic idea of what we are trying to do; what this algorithm does is generate the search tree. So, one thing that I should clarify is that we see the state space as a graph, because with every move we can go from a state to its neighboring states. So, every edge is a move from one state to a neighboring state, and the whole state space is a graph. And essentially, what we are trying to do is to find a path from a start state to a goal state in this graph.

So, it is basically the graph search algorithm that you must have encountered somewhere or the other. The only difference here is that the graph is not given to you; nobody says here is the entire graph for the Rubik's cube, for example, now find the path. You have to generate the graph on the fly, and that is what generates the search tree essentially. So, we can visualize what is happening with this algorithm.

Now, notice that, I do not know whether you can read this, it says pick some node N from open; this word some is the key. A considerable amount of time will be spent on refining this notion of some, because it is this some that critically determines whether you are heading towards the goal or not. So, we will spend some time on that; but as of now we have not specified it, so use some criterion to pick a node from open.

(Refer Slide Time: 41:28)

And so, what is the search tree we are generating? We start with the start node S. We will use state versus node interchangeably, at least for now; search node equals state, the same representation. We will move away from this in a moment. So, the root node of the search tree is the start state or start node, and we inspect that root node. What do we mean by inspect? We apply the goal test function and we see whether it is the goal.

Because sometimes the problem may not need anything to be done; if somebody gives you a solved cube and says solve it, you will say here, it is solved already, you do not have to do anything. But you still test whether it is the goal essentially. So, we inspect the start state at the root node. And let us say that this double circle stands for deleting it from open. So, what I am drawing here is the open list: I had S in open, and then I delete it, but I replace it with the successors of S. So, my open has now got four successors.

So, for example, in this situation I would have added three successors, because of the 3 moves I can make; or in this situation, I would have added four successors. So, depending on the state, some number of neighbors would be generated and they would be added. Then my algorithm says pick some node from open; we have not specified which node. So, let us say we pick some node and we generate and add its successors too. This is the general process that our search algorithm is going to follow.

We have this set of open nodes; we will pick one from there and see whether it is a goal node or not essentially. This actually reminds me of an interesting story about Hercules and the hydra in Greek mythology. How many of you have heard this story? Hercules fights this many headed monster called the hydra, and the problem is that every time he cuts off one head of the hydra, many more appear. So, we can see that is what our search algorithm is facing.

Every time you cut one head, one node from the search tree, you get many more. So, as you can imagine, the open set keeps growing and growing, and we have to find a way of dealing with this. And it does not have to be this node; the next one could be, let us say, number 2, this is number 3, this could be number 4; we have not specified which node to take, we will do that in a moment.

Student: ((Refer Time: 44:41))

So, we will start criticizing this ((Refer Time: 44:51)) in a moment. His question is: what stops you from going around in a cycle? That is in fact the first problem with our search algorithm; I was about to ask you what is wrong with this algorithm, and that is the first thing which is wrong. So, if I take this Rubik's cube again, I can make this move: rotate this right face by 90 degrees. Then what are the moves available to you? Out of the 18 moves that are available to me, one of them is this reverse move; what if I choose this, then what if I choose this, then what if I choose this?

Then I have got into a cycle essentially, and of course you can imagine that there are hundreds of cycles hidden there; I can just keep rotating this, that is another cycle, and many other things are possible. That is one problem with this algorithm, so let us first address that. One way of thinking about these search algorithms is to imagine that you are in a maze; you know ((Refer Time: 45:56)) they had these mazes in olden times. You are at some corner and you can see four roads; you are inside some building, you can see four paths going, you go down one path, then you see another four paths and so on.

So, every junction is like a node and every passage is like an edge. You move from here to here, you see more nodes. How do you get out of a maze ((Refer Time: 46:24)) without getting lost, without getting into cycles? There is again another interesting story from Greek times; I do not remember who the person was who solved it by taking a thread along ((Refer Time: 46:41)). The idea is that he would mark every path; if he sees the thread again, then he knows that he is back at some state that he has visited before. So, in terms of looping, what do we want from our search algorithm?

So, if I am going to start from this particular state, I do not want to come back to this state by going through a sequence of moves; I have to somehow prevent that. Which means once I have made this move from here, I should never visit this state again. How do I do that?

Student: ((Refer Time: 47:17))

Mark that state as visited; that is the simplest way of doing it. And traditionally, what we do is we introduce another list; well, I have already started calling these lists ((Refer Time: 47:40)), so I will call it a list.

(Refer Slide Time: 47:47)

Another set, let us say, and we call it closed essentially. So, the algorithm is modified as follows; this is Simple Search 2. Open, as before, gets the start state; closed is the new set, which starts as the empty set; and we pick some node from open, the same as before, plus the new step, which is: add it to closed. I could write it in set notation, but anyway I am writing it in English. Then, let us say that node is N as before; if goal test N, then return N; that is what I am saying, we return the state. Else, as before, we add.

So, what should I add to my open now? Open union move gen N, minus anything in closed. And let me improve upon that a little bit and say minus anything in open as well, because it is possible that the same state may be generated as a successor of two different states essentially. So, for example, I can start with this state, then go to this state, then go to this state, and this state would have been generated; this state would also have been generated by the 180 degree move, where I could have gone here.

So, it would have been put into open. Or take this other move, let us say. Let me start again: at this point my open list contains the 270 degree move, which will bring the green face here. So, I have added it to open, but I have made this move instead; from here I can make a 180 degree move, which will again bring the green face here. But I do not want to add that, because I have already added this state to the open list essentially. So, just think a little bit about that: from the successors I will remove the set open union closed. So, I will not add to open any node that I have generated before or seen before; my open will get only new nodes essentially.
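A minimal Python sketch of this Simple Search 2, again with my own naming; the one new line is the filtering of successors against both closed and open:

    def simple_search_2(start, move_gen, goal_test):
        # Simple Search 2: a closed set remembers every inspected state.
        open_set = {start}
        closed = set()
        while open_set:                    # empty open will mean no solution
            n = open_set.pop()
            if goal_test(n):
                return n
            closed.add(n)
            # add only nodes never generated or seen before
            open_set |= set(move_gen(n)) - closed - open_set
        return None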

So, this will stop me from going into a loop essentially. Now, what if move gen does not add anything new? Then I am also stuck in a loop of some kind. So, I should have a test here: pick some node from open, and if open has become empty, then I should say there is no possible solution. I am not adding it right now, but we will add this check. How can open become empty? If we have seen all possible states and the solution state is not reachable. So, for example, in this 8 puzzle, all the permutations of these 8 tiles actually fall into two disjoint spaces.

So, there is one set of states within which you can navigate from any state to any other state, and there is a disjoint set within which you can also move from one state to another, but you cannot go from this set to that set essentially. And that can be obtained by simply flipping two tiles; once they are in ((Refer Time: 52:30)) that stage you can never get them back. Likewise with the Rubik's cube; there is this trick some people know, you can take it apart and reassemble the whole cube. But the thing is, you can reassemble the cube back in twelve different ways.

So, it is possible that this particular set of states contains my goal state, but another set of states does not contain the goal state. There is one more problem with this algorithm. I will stop here now; when we come back, we will look at what this other problem is. So, I want you to think about this in the meanwhile, and when we come back, we will look at what the second problem with this is and try to address it. And after that, we will look at the behavior of different strategies: when we say some, what are the options available to us, and how do they change the behavior of our search essentially. So, we will do that in the next class, which is after five minutes.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 07
Search - DFS and BFS

(Refer Slide Time: 00:23)

So let us get back to state space search and the algorithms that we were looking at, which we called uninformed. The algorithms which we are looking at today are uninformed in the sense that they do not exploit any knowledge of any kind from the domain. In the next class, when we meet on Wednesday, we will try to see how to get around this, how to exploit some knowledge from the domain essentially.

(Refer Slide Time: 00:53)

So, the last algorithm that we saw was this, Simple Search 2, in which we had two sets, open and closed: open contains the set of candidates that we want to inspect, and closed contains the set of candidates that we have already inspected. And the idea is to keep generating new candidates and adding them to open. How do we generate new candidates? By removing from the output of move gen anything which is already in open or in closed essentially. So, we will only get new candidates, and in this way, we will only add new candidates to inspect.

And if the state space is finite, you can see that eventually whatever states are reachable from the start state will be explored at some point or the other, and it will come out of this loop only if open becomes empty. So, there should be a check here: if open is not empty then do this; if open becomes empty, report failure essentially. But let us assume that there is a solution to the problem, which means the goal state that we are talking about is reachable from the start state, in which case we will find the solution at some point or the other.

So, let us take the missionaries and cannibals problem, or the man, goat and lion problem that we have been talking about. It is possible for the man to take the goat, the lion and the cabbage to the other side. What will our algorithm do, this algorithm that we have written? What are we interested in to start with? What should the man do: the man should first take the goat, then leave the goat on that side, come back with the empty boat, then take the lion, and all these kinds of steps. But what is the algorithm giving us? It is only giving us the state at the end which satisfies the goal test; that does not help us in any way to solve the problem.

(Refer Slide Time: 03:11)

So, that is the second problem with this solution, and we have to address it. But before we do that, let me also clarify that there are two kinds of problems: one kind is called configuration problems, and the other kind is planning problems. In planning problems, the solution is a path; in configuration problems, the solution is a state. So, we can actually distinguish between these two different kinds of problems. The river crossing problem is a planning problem: we want to know what sequence of moves the man must make to solve the problem.

But there are problems which are of the configuration kind; for example, the N queens problem, with which I presume you are familiar. The task is that, given an N by N chess board, you have to place N queens on the board in such a manner that no queen attacks any other queen essentially. That is the well known, very commonly studied N queens problem. There are variations of this, which say for example that every queen must attack exactly two other queens. So, that is another configuration problem.

So, in such a problem, the N queens problem for example, the solution is only the final state ((Refer Time: 04:41)); if you can show that such a state exists, then we have solved the problem. In fact, there is no notion of a path there; unless you say this is the first queen I place on the board, and this is the second queen I place on the board, which does not make sense. We are interested only in some configuration, and such problems are called configuration problems, for which this algorithm is fine, because it will return to us a state which satisfies the goal constraints.

Whereas for planning problems this does not work, because it only tells us what the final state is. How do we solve this? For planning problems, we have to modify the search node. So, what do I mean by the search node? The node generated by this algorithm: what we put into open, what we take out from open, and things like that essentially. Earlier the search node was just the state; now we have to modify it in such a way that it contains the path information essentially.

Now, one simple way of doing that is to store the entire path as the search node. So, let me use a small example: suppose this is the start state, and these are the successor states. Let us say we have some small state space, some random graph I have drawn; this is the start state, and let us say this is the goal state, and we have to find the path from the start state to the goal state essentially. So, the algorithm will begin with start, then the move gen function will return its neighbors, which are A and B.
And then the algorithm will inspect a whether a is a goal or not and that kind of things,
so it will do all that stuffs. But, I want to know the path that the algorithm finds, from
start state to goal state. So, I modify this search node, to store the entire path. So, what
does that mean, I start with this node itself then my successors; I call them as B S and A
S. So, I keep a list of nodes as a, so a search node is a list of states or list of state nodes.

157
So, this list is the list of two nodes B S essentially, with basically says, I came to B from
S, this says that I came to A from S essentially.

Then, if I were to expand A, let us assume that we are not going to add S to the successors of A, only C and E. So, I will write those as (C A S) and (E A S). This is one approach: I modify my search node to store the entire path. Of course, now I have to modify my algorithm a little bit, which I am not doing; I am leaving that as an exercise for you to do. When you pick a node from open, you pick a search node. So, these are the search nodes; this is the original node.

And then, in the new scheme of things, the double circle stands for a node that has gone into closed; instead of deleting nodes, because we are not deleting nodes anymore, we are putting them into closed. So, we put this into closed, I hope it is not too confusing, put this into closed and so on essentially. So, these two nodes are in open, and these three, let us say, we have inspected already; then I pick a node from open.

So, this is my search node; it is a list of states. I have to extract the first element from it, which is C or E depending on which node I pick. So, I extract C and apply the goal test to C, the same goal test function, and say it fails. Let us say C does not have any children; so let us say I pick E, I apply the goal test to E, and the goal test fails. What do I do? I generate the successors of E, again with the same move gen function, but I must append the path to them.

(Refer Slide Time: 09:22)

So, the successors that E returns to me in this graph are F and G; so I must take this F, I must take this G, and append them at the head of this list. That is, I must generate (F E A S) and (G E A S). So, I have to modify the algorithm a little bit, but notice that I can still use the same move gen function, and I am only going to apply it to a state. And I can use the same goal test function, because I will extract the first state from the list and apply the goal test to that, and then apply the move gen function essentially.
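A sketch of this first approach in Python, where each search node is the whole path with the newest state at the head; the names are mine, and the exercise of working it into the earlier algorithm is unchanged:

    def search_with_paths(start, move_gen, goal_test):
        # Each element of open is a full path, newest state first.
        open_list = [[start]]
        closed = set()
        while open_list:
            path = open_list.pop(0)
            n = path[0]                      # current state = head of the path
            if n in closed:
                continue
            if goal_test(n):
                return list(reversed(path))  # path from start to goal
            closed.add(n)
            # extend the path by every not yet inspected neighbour
            open_list += [[m] + path for m in move_gen(n) if m not in closed]
        return None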

So, I will leave this as a small exercise for you to do; we will instead use another approach. This first approach converts everything into a path; the second approach is to store only the parent essentially. So, a search node is a pair; we will call it a node pair, and the pair is (current, parent); let us say it is a list of two nodes, in which we store the current node. We will always be interested in the current node: we will apply the goal test to the current node, we will apply the move gen function to the current node and so on essentially.

But we will store only the parent. So, in the first approach we were storing the entire path up to the start node, which of course makes the task of returning the path much simpler: I just have to take the list, reverse it and return it as output. So, for example, when I generate this node and it satisfies the goal test function, then I just take this list and say the path is: go from S to A, A to E, and E to G, and that is the solution; I have just reversed the list and returned it as the path. Now I am saying that instead of storing this entire path as a search node, we uniformly store a search node as a pair, made up of the current node and the parent node.

And my root will look like (S, nil), where nil stands for no parent; and then this will again be (B S), this will be (A S) as before, but these will now be only (C A) and (E A), and when I generate the successors of E, I will get (F E) and (G E). So, all of these have gone into closed; and let us say, it does not matter whether we have seen this one or not, we pick up this node. Let me say this stands for the fact that we have found the goal node. We still have to return the path, but now we have to do a little bit of extra work to reconstruct the path.

(Refer Slide Time: 12:46)

So, we need an algorithm which we will call reconstruct path, and to this algorithm we will give a node pair as input. Which node pair? The one whose first element satisfies the goal test; so, for example, (G E): we give this as input. What do we want it to do with (G E)? In general, I will have a goal node and a parent node, which I will call P1. So, I am describing this reconstruct path algorithm: the input to it is a node pair, and this particular node pair is a goal node and its parent node. And so I start constructing the path.

So, initially path gets an empty list, and then at this point path gets cons(G, path). I am using the list cons function, because it is a convenient mechanism to add something to the head of a list; I have a list called path, and I am adding this node G at the head of the path. Then what do I do? How do I go to the parent's parent? I want to go to the grandparent ((Refer Time: 14:20)). Of course, I can find the parent from here, but from the parent I will have to go to the grandparent; where will I find the grandparent?

Student :(( Refer Time: 14:34))

But where do I find the corresponding parent?

Student: if we store the parent ((Refer Time: 14:39))

But how do I... I am looking for a node pair of the kind (P1 P2), correct? I know P1: I came to G from P1. I am trying to find out where I came to P1 from; let us call that P2. Where will I find this pair (P1 P2)?

Student :(( Refer Time: 15:01))

I will find it in the closed list. Why? Because I am putting all these node pairs into closed. So, for example, (G E) is what I have, and I am looking for something of the form E followed by something, which is (E A) in this case, and I will find it in closed. And likewise the parent of A I will find in closed, and so on. So, search in closed: I search in closed to find this node pair; everything is a node pair, open consists of node pairs and closed consists of node pairs essentially. I find this, then likewise I find (P2 P3), and I keep doing that. Till when will I stop doing this search?

Student: ((Refer Time: 16:00))

Till I find the root node, which is characterized by some (Pn, nil). And at each stage I will do this: cons(P1, path); here cons(P2, path), and so on and so forth. I will keep doing that, and by the time I have come here and found this, I come out of the loop. So, all this is happening inside one loop, going from parent to grandparent, till I find that there are no more grandparents, at which point my path will have been fully constructed, because I will have consed this Pn onto the path. So, I know that I start with Pn, go to Pn minus 1, and so on and so forth.
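One possible Python rendering of reconstruct path; it assumes, as in the lecture, that closed holds (node, parent) pairs and that the root pair has the form (Pn, nil), with nil written as None here:

    def reconstruct_path(node_pair, closed):
        # node_pair is (goal, parent); closed is a list of (node, parent) pairs.
        parent_of = dict(closed)          # look up a node's parent in closed
        goal, parent = node_pair
        path = [goal]                     # path <- cons(G, path)
        while parent is not None:         # stop at the root pair (Pn, None)
            path.append(parent)           # cons(P1, path), cons(P2, path), ...
            parent = parent_of[parent]    # go from parent to grandparent
        return list(reversed(path))       # start state first, goal last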

So, I need this extra bit of work to be done to return the path; but what I get in exchange is a uniform representation: everything is a node pair, open consists of node pairs, closed consists of node pairs and so on and so forth. So, this part, for example searching in closed, I will leave as a small exercise for you to work on; in fact, write this whole reconstruct path algorithm. So, we have this supporting algorithm called reconstruct path, and we also have a supporting function, let us call it remove seen, which removes nodes already in open or closed.

So, let us say that we have now moved completely to a list based representation, where our path is a list, a node pair is a list, everything is a list: open is a list, closed is also a list. So, remove seen: when we are generating the new nodes for open, the remove seen function should remove things which are either in the open list or in the closed list. Now, we must be careful here: our open and closed have been modified a little bit, they have become lists of pairs. So, you have to take care that you are looking at the correct element there; I will leave that exercise for you to do.

And we will assume that we have, for those of you who have studied Lisp or something similar, something like a mapcar function, which does the following. Remove seen will be given a list of successors: we apply the move gen function, which gives us all the neighbors; then remove seen removes from those neighbors the things which are already in closed or already in open, and gives us a new list. To this new list we apply the mapcar function, to say: take each of the successors and construct node pairs with the parent essentially. So that is what it is doing.

(Refer Slide Time: 19:42)

Let me draw it here. There is some node N; we call move gen, and it gives us some successors; let us call them A, B, C, D, E, F. Then remove seen will remove some of them, let us say this one and this one, so we are left with B, D and F. Then I want to call some function which will essentially give me the list of (B N), (D N), (F N). So, it should give me a list of three elements in this example, the three successors already converted into node pair form, so that I can just take this and append it to the open list essentially.
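In Python, this one expansion step, move gen followed by remove seen followed by make pairs, might be sketched as below; the list comprehension plays the role of mapcar, and all the names are illustrative:

    def expand(node_pair, move_gen, open_list, closed):
        # One expansion step: generate, filter, and pair up with the parent.
        n, _ = node_pair
        neighbours = move_gen(n)                              # e.g. A B C D E F
        seen = {node for node, _ in open_list} | {node for node, _ in closed}
        new_nodes = [m for m in neighbours if m not in seen]  # remove seen
        return [(m, n) for m in new_nodes]                    # make pairs: (child, parent)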

So, my algorithm is still very similar: open is a list; I extract some element from the list, which is a node pair; from the node pair I extract the first element, which is the current node I am interested in. I apply the goal test to that; if it returns true, then I call the reconstruct path function with that node pair, and it gives me the path. If it does not, I generate the successors of that element N, removing anything that I have already seen or already put on open.

And I construct node pairs from them. So, the successors of N are B, D and F, and I want (B N), (D N) and (F N); B, D and F are the successors, but I want to remember that N is the parent of B, N is the parent of D, and N is the parent of F. And I will add these to open, and the same cycle will continue. So, that is the algorithm that we have written. The only thing that remains is this: when we said pick some node from open, let us pin that down as well.

(Refer Slide Time: 22:06)

And we will do the following; node pair, I use node pair as a variable name: I will always pick the first node from open. Open is a list now, and the simplest thing to do with a list is to remove the head element; so I will just remove the head element. And what will this give me? It will give me a node pair; I will extract the first element from that, and do the goal test and everything with that. So, one thing I have pinned down is that we will always take the first element from the open list.

Open is now a list essentially; what should be the first element of the list, that is the next question, and that will really determine the behavior of our search algorithm. So, there are two possibilities. Somewhere down the line, I will set open to the following: append. I want to append two lists. What are the two lists? One is the old open minus the node pair that we have removed, and the second is the new elements that we have generated by this process. I have to append these two lists to form the new open.

Again, I can append them in one order or the other order, and that is what is really going to matter. So, let us choose one order first, which says append new first. This list of node pairs I am calling new; I generated this new when I inspected the state, found that it is not the goal state, generated its successors by calling the move gen function, removed duplicates from there, and made pairs; I call this new. So, it is in the form which can be added to open. And I will just use rest, or you can use tail, assuming we have not removed the head in that step from open, because we only read the head; here I will take the tail of open.

So now I have a completely deterministic algorithm; there are no ((Refer Time: 24:54)) statements about pick some node, or add this node to open somewhere, and so on. We have specified everything completely: we have said that the new nodes will come at the head of the open list, which means they will be inspected first, and I will always pick the head of the open list. So, we want now to look at the two options that we had. What is the other option? The other option was the opposite, to change the order. So, I have to choose between these two, either this or this; which option is better? How do we analyse this choice? This is the choice that we have to make in our algorithm: should we put new at the head of the list

and keep the old elements behind, or should we keep the old elements first and put the new ones behind ((Refer Time: 26:12))? You will have recognized the two data structures that we are kind of simulating here: this is a queue, and this is like a stack. So, we can maintain open either as a queue or as a stack; what the repercussions of these two choices are is what we want to inspect next. So, let us take this option first, where we are looking at the new nodes first, newest nodes first. What does that mean in terms of the search trees that we were talking about?
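The whole difference between the two strategies is the order of that single append; a sketch, reusing the expand and reconstruct_path helpers from above:

    def search(start, move_gen, goal_test, strategy="dfs"):
        open_list = [(start, None)]
        closed = []
        while open_list:
            node_pair = open_list[0]          # always inspect the head of open
            n, _ = node_pair
            if goal_test(n):
                return reconstruct_path(node_pair, closed + [node_pair])
            closed.append(node_pair)
            new = expand(node_pair, move_gen, open_list, closed)
            if strategy == "dfs":
                open_list = new + open_list[1:]   # stack: newest nodes first
            else:
                open_list = open_list[1:] + new   # queue: oldest nodes first
        return None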

(Refer Slide Time: 27:30)

Let us draw the search tree. When I draw the search tree, I will not draw the node pairs and everything; you can draw it either way, it does not matter. So, let us say (S nil), and (A S), (B S), (C S) in some graph. Now these come in some order; I am depending on the order in which my move gen function gives me A, B and C, they will come in some particular order, let us not worry about that. So, I always choose the left most first; I inspect this and add, let us call it, (D A); so the next one will be this one, and so on.

You can see what is happening: this algorithm, in which open is treated like a stack, has this greedy like behavior, in the sense that the latest nodes that have been generated are explored first. So, these are old, and these are the latest; out of them we have chosen one, and effectively, without loss of generality, we always chose the left one. It does not really matter, because they were generated by the move gen function, and we have not specified their order there essentially. So, we have effectively chosen the left most. These are the newest; these are a little older; these are the oldest.

The algorithm always picks the newest node first. In terms of the state space, what is this algorithm doing? It will go from the start node, generate some successors, choose one, generate its successors, go there, and so on. It dives into the state space, headlong, without looking left or right, till of course it hits a dead end. What do we mean by a dead end here? When there are no new successors along this path essentially. In which case, for example here, if this were to be a dead end, if there were no successors of this G,

then automatically this one would be inspected next, because that would be next in line in open; if this were also a dead end, then automatically this one would be inspected next essentially. And if this were a dead end, then automatically this one would be ((Refer Time: 30:29)). So, you can see the path going like this, and like this, and like this, and like this; it dives into the search space. There is a danger, of course, if the state space is infinite. If I say, find me three numbers A, B and C such that A cubed plus B cubed is equal to C cubed,

you will just keep generating newer and newer combinations, and you will keep diving into the search space. So, if the search space is infinite, there is the danger that this algorithm will get lost essentially. If we look at this search tree, we can also describe this as picking the deepest node first: going down in this direction, of all the candidates in open, all these single circle nodes (this is in open; this is in open; this is in open; these two are in open), it always picks the deepest node first essentially.

(Refer Slide Time: 31:46)

So, you should not be surprised that this algorithm is called depth first search ((Refer Time: 31:58)). As opposed to that, the other choice we had was to maintain open as a queue; and as you can imagine, this is what will happen with a queue.

(Refer Slide Time: 32:24)

I will write the order here: this is the first node to be inspected; the start node is always the first node to be inspected. This will be the second one; its children will be generated, but now it is a queue, which means these two will go behind these two. So, this will be the third node; this will be the fourth node; then this will be the fifth node; and this will be the sixth node; then this will be the seventh node; the eighth will be this child here; then the ninth ((Refer Time: 32:54)) child, and so on.

It has exactly the opposite effect: depth first search dives into the state space; this algorithm chooses the shallowest nodes first. Given a choice of nodes in this search tree, it always picks the one which is closest to the start node essentially. And you know the name for this algorithm: breadth first search. Breadth first in the sense of finishing this layer completely, then the second layer completely, then the third layer completely, and so on and so forth. In terms of the state space, what is this algorithm doing? It will come here; then it will go here; then it will go here and here and here, in this particular order; it will cover the state space.

Now, notice that both these algorithms are totally oblivious of the goal state. The goal state is somewhere here; they do not care. The only place where the goal state plays a role is in the goal test function. If you have reached the goal state, the algorithm can confirm whether you have reached it or not; but given a set of choices, it has no sense of direction, no sense that I should go in this direction or I should choose this successor. That will come when we look at heuristic search; but these two algorithms are blind, so we call them blind search algorithms or uninformed search algorithms.

One of them ((Refer Time: 34:48)) tends to stick as close to the source as possible; the other has the opposite tendency, of going as far away as possible essentially. So, let us do a comparison of these two algorithms, depth first search and breadth first search. I can remove this. What are the parameters on which we should compare them? We will look at four criteria. What is the most common criterion for comparing algorithms?

(Refer Slide Time: 35:45)

So, DFS versus BFS: the most commonly used criterion is time complexity. Let us assume a simple search space; let us take the N queens problem. In the N queens problem, let us say that the way you will proceed is: you place the first queen, say in the first row, then the second queen in the second row, and so on, up to the n th queen essentially. So, the search would take N steps; the solution will always be N steps long.

And the branching factor is constant. So, we have a search tree like this. Why do I draw it like this? Because the number of nodes is going to increase exponentially. If the branching factor is five, let us say, I will have 1 node at the first level; 5 nodes at the second level; 25 nodes at the third level; 125 at the level after that, and so on. It is going to multiply by a factor of the branching factor every time, and in such situations, where you multiply at every stage, things tend to grow exponentially.

So, this is how the search tree looks; let us forget about that five. In general, we are given a branching factor of b, and let us say the solution is at depth d. So, let us say it is a d queens problem: you place d queens and then you are done essentially. Let us say, for argument's sake, that there is only one goal node; in practice, of course, for the N queens problem for example, there are many solutions, so there would be many nodes which are goal states. But let us assume that there is only one goal node, in some problem which is similar to N queens. What would be the time complexity of depth first search?

Before we do that: in the last layer there are b^d nodes. And the internal nodes, how many are there in all the layers up to one level less? The total number of internal nodes is (b^d - 1)/(b - 1). Now, if we ignore that minus 1 on top, which you can for large b and large d, you can say this is roughly b^d divided by (b - 1); even the minus 1 in the denominator you can ignore for large b, but let us not bother about that. The first thing to observe is that the d th layer contains more nodes than all the previous layers combined.
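In symbols, the counting argument is just a geometric series (a small LaTeX rendering):

    \[ N_{\text{internal}} = 1 + b + b^{2} + \dots + b^{d-1} = \frac{b^{d}-1}{b-1} \approx \frac{b^{d}}{b-1}, \]

and the last layer alone has \(b^{d}\) nodes, which for any \(b \ge 2\) satisfies

    \[ b^{d} > \frac{b^{d}-1}{b-1}, \]

so the d th layer indeed outweighs all the earlier layers put together.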

So, this is the nature of exponential growth: every time you go one level further away, the amount of work you do at that level, and by amount of work we mean inspecting that many nodes, is greater than all the work that you have done before it essentially. That is a feature we will use; I think it will have to be in the next class, but we will see. Now, what is the time complexity of this? We will make some simplifying assumptions: the goal node can be anywhere from here to here. In the case of depth first search, if it is here, it will be inspected after d plus 1 inspections.

So, let us say d plus 1, plus seeing the entire tree ((Refer Time: 41:02)), and let us say divided by 2 as some ((Refer Time: 41:06)) approximation; which, if you work it out, will turn out to be roughly of the order of b^d, or b^d by 2 essentially. For the other one, it has to inspect the entire sub tree up to this point. Notice that when depth first search finds a left most goal, it only inspects these nodes; it just goes down this path and finds the goal node.

So, it only inspects d or so nodes on the way; but when breadth first search comes to this, it has to go through like this, and like this, and plod through this whole thing slowly, slowly. It has to inspect the entire internal tree before it comes to this node; and for the right most goal, of course, both of them inspect the entire tree. But surprisingly, and I will leave this for you to work out, the average is the internal tree plus the full tree divided by two, and then you can make simplifying assumptions, assuming that b and d are large.
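A sketch of that averaging argument, assuming the goal is equally likely to be the first or the last node reached: DFS sees between \(d+1\) nodes (leftmost goal) and the whole tree (rightmost goal), while BFS sees between the internal tree and the whole tree, so

    \[ T_{\text{DFS}} \approx \frac{1}{2}\left((d+1) + \frac{b^{d+1}-1}{b-1}\right) \approx \frac{b^{d}}{2}\cdot\frac{b}{b-1}, \qquad T_{\text{BFS}} \approx \frac{1}{2}\left(\frac{b^{d}-1}{b-1} + \frac{b^{d+1}-1}{b-1}\right) \approx \frac{b^{d}}{2}\cdot\frac{b+1}{b-1}, \]

which gives the ratio stated next.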

So, you can simplify the expression, and the time complexity of BFS divided by the time complexity of DFS, after we make the simplifying assumptions, is roughly (b + 1)/b. Breadth first search takes a little bit more time, but not significantly more if b is large; for example, if b is 10, then it is 11 by 10 times essentially. So, it is not too much; they are roughly equal in some sense, as far as time complexity goes. What are the other factors we can compare them on? Completeness. And what do we mean by completeness? The question we are asking is:

if there is a solution, which means if there is a path from the start state to the goal state, does the algorithm always find it? If it does, then we say the algorithm is complete. Now, breadth first search is obviously going to be complete, because, as you can see, it goes out slowly in circles, and if there is a path, it will find it. Whereas depth first search, if the search space is infinite, which means this boundary was not there, and the number theoretic problem was an example of that, may just go off in some direction; the goal may be here, but it may just go off in some other direction, and that danger always exists essentially. So, completeness for depth first search: if the search space is infinite, it is not complete, you can get lost.

Breadth first search will still find the solution, because it is not going to go off in any one direction; it is going to just gradually expand the set of nodes it has inspected, till eventually it hits the goal, and then it will find the solution. So, breadth first search will work even for infinite graphs. For finite graphs, both are complete, because eventually both will inspect all paths essentially. So, on completeness, breadth first search has a slight advantage: if the graph is infinite, it will still be complete essentially.

Quality of solution: the only measure ((Refer Time: 45:11)) we can talk of at this moment is the length of the path, the number of nodes in the solution path essentially. And we assume that shorter paths are better. What do you think about these two algorithms in terms of quality? Is either of these algorithms guaranteed to give you an optimal solution? You should speak a little bit louder.

Student: ((Refer Time: 45:58))

BFS? Now we have to ask what we mean ((Refer Time: 46:04)); even BFS might give a longer path? It depends on the order in which you generate the moves essentially. Let us take a graph: start, A, B, C, D; let us say we take a graph like this, and you have to go from S to G, and let us assume that, in the particular order, we choose the left most on the board first. What will depth first search do? It will go from S to C, C to D, D to F, F to G, and it will say I have found a path; but the path it could have found is this one. What will breadth first search do? It will go from S to C; then it will go to B; then it will go to A; then it will go to D; then it will go to E; then it will go to G.

G would be generated as the child of B, and you should work this out as an example yourself. The parent of G would be B, and the parent of B would be C; breadth first search will always find the shortest path. So, I want you to change this example and simulate it, to convince yourself about depth first search and breadth first search. Breadth first search always guarantees this ((Refer Time: 47:26)); in terms of this picture you can see that it is gradually moving away from the source, and when it hits the goal, it will always have found the shortest path to the goal. But we have to reason this out a little bit. On quality of solution, BFS scores heavily: it always guarantees a shortest path essentially.

Let us look at the last criterion, which is the size of open. Now, this time complexity, the way we are measuring it, is in some sense a measure of the size of closed, because the number of nodes that you have inspected will be the size of closed essentially. And we are assuming, for the time being, that it takes constant time to inspect nodes, to get nodes out of closed, and all that kind of thing; so time roughly corresponds to the size of closed. What about the size of open? What is the size of open for breadth first search, first? We have to visualize what the algorithm is doing.

(Refer Slide Time: 48:49)

This is my search space, and for breadth first search, what will the open list look like? You inspect this, inspect this, inspect this, and so on. So, let us say this is the closed list, what we have inspected so far; then open would be all these nodes, plus all these nodes here which are the children of those nodes essentially. This is what the open list looks like for breadth first search, and what is its size, how is it growing in terms of depth? Remember that there are b^d nodes in this row, and b^(d+1) in the next, if you like; so open has some nodes from the b^d row and some from the b^(d+1) row.

In general, it is growing exponentially with depth: the deeper you go, the size of open explodes, it becomes exponentially large ((Refer Time: 50:02)). What about depth first search? Depth first search would come here; so let me draw this separately here. Let us say there are three children here; this one has gone into closed, this one has gone into closed, and so on. What is happening with depth first search? As I go deeper into the search space, open is growing linearly. How is it growing linearly? Because as I go deeper, I am adding a constant number of nodes to open. Here I am basically adding two extra nodes each time: at this level there were 3 children, plus the 1 I am going to inspect; so two extra here, two extra here, two extra here.

In breadth first search, I am multiplying by the branching factor every time; that is why it grows exponentially. So, you should satisfy yourself that the size of open for depth first search is (b - 1)d + 1, because at every level I am going to add b minus 1 nodes to open, while one node goes off to be inspected; that is b minus 1 per level, plus one extra node, because at the last layer I will have one extra node essentially. So, on size of open, DFS wins hands down: it grows only linearly, as opposed to breadth first search, for which open grows exponentially.
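A quick numeric check of the two formulas, a back of the envelope computation assuming b = 10 and d = 10:

    b, d = 10, 10
    open_dfs = (b - 1) * d + 1   # 91 nodes: entirely manageable
    open_bfs = b ** d            # 10,000,000,000 nodes: hopeless to store
    print(open_dfs, open_bfs)    # prints: 91 10000000000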

So, we have seen these two algorithms: for finite graphs, both are complete, and both have roughly the same time complexity; but one of them wins on the quality of the solution, which may be important. If you are planning, let us say, trips from here to the moon or to Mars, then a smaller number of steps is going to save you a lot of money; or, let us say, from here to Bangalore, almost the same thing. On quality of solution, breadth first search guarantees you an optimal solution. And, as you must have seen in other contexts, depth first search requires only linear space for open, and that is a big, big plus.

Time complexity is bad for both; we will address that using heuristic methods, trying to force the search to go towards the goal, as we will see. This one sticks ((Refer Time: 52:45)) close to home; that one goes off in some random direction; neither of them is in any way looking towards the goal essentially. We will see that when we do heuristic search. But even with a blind or uninformed search, can I combine, can I devise an algorithm which will combine the two plus points of these: guarantee an optimal solution, yet require only linear space for open?

So, I will leave this as a small thinking exercise for you; when we meet next on Wednesday, we will look at an algorithm which does this. It would be nice if one of you can think of it without reading it from somewhere else; after we discuss it here, you can go and read it up. But there is a nice algorithm which combines these two things, and it is a blind algorithm. So, see you on Wednesday.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Module - 1
Lecture – 8

(Refer Slide Time: 00:14)

Let us continue with state space search; let me just do a very quick recap, and then we will continue from there. So, the state space is a space made up of many states, where each state represents a particular situation. And the states are connected together by the moves you can make, or the decisions you can make, so that if you are in a given state you can move to its neighbors; and we had defined this function called move gen,

which takes a node; n we will use for a node in the state space. So, the state space is implicitly a graph; I say implicitly because it is not available to us, it is not that the graph has been given to us. And we have to find a path in that graph: we are given a start state, and we are given either a goal state or a description of the goal state. For example, in the river crossing problems we are given a goal state, that everybody must be on the other side of the river; but in something like N queens we are given a description of the goal state, which says place N queens so that no queen attacks any other. Either way, we are given some idea about the goal state, and then the search algorithm explores the state space. And how does it search the state space? It applies the move gen function, which is a neighborhood function that gives you the neighbors of each state; and then you inspect one of the neighbors and see if that is the goal state that you are interested in. If not, then you generate more states, and so on ((Refer Time: 02:04)).

So, this state space search generates a search tree. The state space is a graph; at least implicitly it exists, even though we do not have it with us. And the state space search generates a search tree, and we saw how this search tree is characterized. We start with some search node, and then we go down searching the tree; we have seen this before. Obviously, every tree has two kinds of nodes: one kind of node we will call, as we have called it, open. Open is the set of leaves; the set of leaves we call open, and the set of internal nodes we call closed.

The open set of leaves is the search frontier, that is, the set of candidate nodes that you have generated but have not inspected; we can think of it as the search frontier. And the set of internal nodes is the memory of past nodes visited; I will just call them seen nodes. We started off by saying that open and closed are sets; then, because we want to implement this using some algorithms, we said let us call them lists essentially. And as we will see today, the list structure is not necessarily the best structure; we will come to that in a little while.

But basically the search tree is characterized by these two, call them sets or lists, whatever: open and closed essentially. And just to recap the algorithm: we had moved from storing only the states to storing a pair of states, and the pair consists of a given state and the parent state. The parent state is the state from which that given state was generated essentially.

So, we have this node pair; we said that initially open gets the pair (start, nil). I will just write the outline today. Then: while open is not empty; if it is empty, then we return failure. When will it be empty? It will be empty only for finite state spaces, when you have ended up inspecting all the states and none of them happens to be the goal state; in which case it will be empty, and then you can say that the goal state cannot be found.

For example, in the eight puzzle I had pointed out that the state space is actually partitioned into two sets of states, each connected within itself; one set is not reachable from the other set. If you are given the start state in one set, and the goal state in the other set, then it would not be reachable, and you would end up inspecting all the nodes reachable from the start state and saying that, no, we cannot solve this problem.

If open is empty, then you return failure; that happens at the end, so we will not write it here. Otherwise, we get the node pair at the head of open; I will not write the details, we did that in the last class, and otherwise you can look it up in the book. From this node pair we extract a node, let us call it n, which is the first element of the node pair, and we apply the goal test to n; we apply the goal test function. Remember, the goal test function takes a node and tells you whether it is a goal node or not; somehow you implement the goal test. So, move gen n and goal test n: these are the two domain functions that we have; the rest of the search algorithm that we are writing is independent of the domain. As long as somebody provides you the move gen function and the goal test function, you can use this algorithm to solve problems in any domain essentially.

You do the goal test; if it says yes, then you reconstruct the path. If it says no, you apply move gen to n, and you get the successors, the neighborhood of this node n; and to this you do some filtering, you remove the things that you have already seen. So, I will just say remove seen: there is some function which takes the output of this and filters out things which already exist in closed or in open essentially. We do not want to generate the same node again, because if it is in closed, we have already seen it, and if it is inspected again, we are likely to go into a loop; and if it is in open, we will see it sometime anyway, so there is no point keeping two copies of it in open. So, this remove seen basically removes from the successors of n anything already in open or closed.

And then we have a function called make pairs: whatever remains after this filtering, we make pairs out of. What are the pairs? We take this node n as the parent of each of these nodes, so the new nodes will look like (x, n), because these are all the children of n; n should be the parent of these nodes. And then we add these things to open, or to the tail of open if you want to be precise, because we have not actually removed the head here; so actually we should add ((Refer Time: 09:01)), but anyway, that is the basic idea. And then we saw that there are two ways of doing this: one is as a stack, and the other as a queue. Which means that you either add the new ones at the head of open, in which case it behaves like a stack, because they will be the first ones to be inspected, since we always extract the node from the head of open; and when it is a stack, we saw that it behaves like depth first, and when it is a queue, we saw that it behaves like breadth first.

What are the characteristics of depth first and breadth first? Depth first basically dives into the search tree, and breadth first is more cautious; it sort of plods through these things. These were the two characteristics of the two: you can say depth first just goes wherever its nose takes it, so to speak, and breadth first stays close to the start state essentially.

Then we had looked at the properties of these two algorithms. We want to compare their properties on four features; one is time complexity. Now, time is bad for both; by bad we mean in the worst case situation or the average case situation. In the best case they will find the goal quickly: for example, depth first search finding the goal state in the first branch itself, in which case it finds it in linear time; or breadth first search finding the goal state somewhere very close to the start state, in which case it will find it very quickly essentially; that is the best case.

In the average case and the worst case, the time complexity for both is of the order of b^d, where b is the branching factor and d is the depth at which the goal occurs essentially. There was a little bit of a difference: breadth first search had slightly more time complexity than depth first search, but it was only slightly more essentially.

This is something that we will start addressing today, a bit later; that is time complexity. We saw completeness, and by completeness we mean: will it find the goal state, or will it find a path to the goal state, if one exists? And the answer in the case of breadth first was an unqualified yes; in the case of depth first, it is guaranteed to find the goal state for finite state spaces, not for infinite state spaces, because in infinite state spaces it could go down some infinite branch. We will assume that we are working with finite state spaces, and so we will answer yes to this.

But keep in mind that this is only for finite state spaces. Then we saw two more properties. One is space complexity, and we found that space was good for depth first, because it adds only a constant number of nodes as it goes down: if the branching factor is b, it adds b nodes, inspects one of them, then again adds b nodes, inspects one of them, and so on. Which means the space required grows linearly.

Whereas breadth first search will first inspect an entire level, generating the children of all those nodes. So open becomes b times whatever the width of that level was; it multiplies by b as it goes deeper and deeper, and therefore space grows exponentially.

So, this was the plus point for depth first search. But quality was the plus point — I will just write a plus here — for breadth first search, because of the fact that breadth first search only spreads slowly into the search space. At whichever layer the goal node occurs, it will find the path to the goal node, which means it always finds the shortest path to the goal. I hope you have convinced yourself by constructing a few examples; if you have not, please go and do it essentially.

This is what we did last time. There are two things we want to do today: one is to try to see if we can find an algorithm which will combine these two plus points, and the other is to try to address this time complexity somehow. Because if you have an exponential time algorithm, nobody is going to buy it essentially; you can only solve very small problems with it, not problems of significant size.

Let us first look at this: is there an algorithm — did anybody give a thought to this — which will combine these two properties of depth first and breadth first, which means it requires linear space but guarantees an optimal solution? I take it you have not been reading my book yet. Let us look at some variations. All the algorithms that we have looked at so far are blind search algorithms, which means that they have no sense of direction.

Given a state space, if this is the start state, then wherever the goal state may be in this state space, the behavior of the algorithm would be the same. Depth first would go off in one direction, backtrack, try something else, backtrack, try something else, and so on. Breadth first will go level by level, down and down, till it finds the goal; the goal could be here, the goal could be there, or it could be on this side — it does not matter to them essentially.

So, let me introduce one new algorithm, or two new algorithms. One we will call depth bounded DFS. It is a variation on depth first search, and the variation is that we have put in a depth bound: do not go more than twenty steps, or do not go more than forty steps — whatever depth bound we have set. So what have we done? We have cut off the search at some level, and we have said: do depth first search on this truncated search space. What is the characteristic of this algorithm, depth bounded depth first search? It is linear in space — why? — because it is depth first search to start with. Is it complete?

Who said yes? Why is it yes? What is the meaning of complete? We said that the algorithm is complete if it finds a path whenever one exists. See, this depth bound is something that you have imposed; it does not come from the problem. The goal could have been somewhere beyond it. What does the depth bound say? It says that it is like a ((Refer Time: 17:23)): do not go beyond this line. So, if the goal lies within that bound, yes, you will get the solution. But if the goal happens to be outside it, like here, it will not find the solution.

So, obviously it is not complete. But it is space efficient, because it is depth first search. Now, let us do a variation. Let us have an algorithm in which we say depth bound is equal to zero. So, this is a new algorithm I am writing: we initialize the depth bound to zero, and then we say, while goal not found, do this algorithm — let us call it DBDFS, and let us say it takes a depth bound db. So, let me use db here also. This algorithm takes an argument; of course, it also takes a start node and everything, but that we will assume is hidden or global or whatever.

So, while goal not found, do depth bounded depth first search with the bound db. Zero means you just inspect the start node; one means you go one level deeper; two means you go two steps deeper; and so on. So we have put this in a loop. This is a new algorithm — what is it called? It is a very well known algorithm; I will write it here: iterative deepening, as the name suggests. In every cycle you increase the depth bound by one, and then do a depth first search; so iteratively you deepen the bound to which you search. And because we are doing depth first search, this is called depth first iterative deepening, which is popularly known as DFID. So, first we should understand what the algorithm is: we are doing a sequence of these depth bounded DFS runs, starting with depth zero, then going to depth one, depth two, depth three, and so on.
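
A rough sketch of DBDFS and DFID along these lines — again reusing moveGen and goalTest, with the recursion keeping only the current path, so that space stays linear in the depth:

def db_dfs(node, move_gen, goal_test, bound, path=()):
    # Depth bounded DFS: depth first search that never goes deeper than bound.
    if goal_test(node):
        return list(path) + [node]
    if bound == 0:
        return None                     # the bound cuts the search off here
    for child in move_gen(node):
        if child not in path:           # avoid cycles along the current path
            found = db_dfs(child, move_gen, goal_test, bound - 1, path + (node,))
            if found is not None:
                return found
    return None

def dfid(start, move_gen, goal_test, max_bound=100):
    # DFID: DBDFS with bound 0, then 1, then 2, ... until the goal is found.
    # max_bound is an arbitrary cut-off; see below for when to report failure.
    for bound in range(max_bound + 1):
        found = db_dfs(start, move_gen, goal_test, bound)
        if found is not None:
            return found
    return None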

What is the property of this algorithm? We will come to time complexity in a moment; let us talk about space, and let us talk about quality — these are the two quantities we are interested in here, because breadth first gave us a guarantee of the optimal solution, good on quality, and depth first was space efficient, requiring only linear space. What about DFID? Space complexity — when we say space complexity we mean the size of open; that is the convention we have been following.

Same as DFS. Why? For the simple reason that it is DFS. Of course, it is not one DFS, it is many DFSs — every time you do a different DFS with a different depth bound — but you are doing DFS. So, space is linear. Is anyone having a doubt about this? You should clarify this at this moment itself: it is just doing a series of depth first searches; in every cycle of this loop it is doing one depth first search, but it is doing depth first search, so it must require the space complexity of depth first search, which is linear essentially.

What about quality — not completeness: does it guarantee an optimal solution, does it guarantee the shortest path? I have given the algorithm completely, and I am asking: does this algorithm guarantee you an optimal solution? What is the argument behind this? Is anyone willing to talk about why and how it guarantees this? The answer lies in the behavior of this algorithm. What is this algorithm doing? It is going to re-inspect many nodes. So, let us say we have a search tree in which

(Refer Slide Time: 24:15)

we start at s. In the first round we look at only s. In the second round we look at s, a, b, c. In the third round we look at d, e, f as well, and so on. In which order? In the first cycle it inspects only s. In the second cycle it inspects, in this order, s, then a, b, c. In the third round it will inspect s, then a, then d, then e, then f, then b and then the children of b, and so on.

So, in the first cycle it inspects only s; in the second cycle it searches only till this depth — s, then a, then b, then c; in the third cycle it inspects s, a, then d, then e, then f, then the children of b, then the children of c, in that particular depth first order. I have not drawn the complete tree. But now, if you mark the order in which it visits a node for the first time, you can see that s is visited in the first cycle; then a, b and c are visited in the second cycle; then d, e and f are visited in the third cycle; and so on.

If you look at the order in which it first visits nodes — the order inside this red circle — you can see that it is s, a, b, c, d, e, f: this is the order of breadth first search. And if it finds the goal in this order, it must have found a shortest path, because we know that breadth first search — level order, as you might also call it — always finds the shortest path.

So, you have convinced yourself, I think, that this algorithm, DFID, combines both the things that we desired: space should be linear, which was depth first search's strength, and quality — the guarantee of an optimal solution — which was breadth first search's. And it is actually giving both to us. You might say that this is actually a sequence of depth first searches masquerading as a breadth first search.

Because the behavior, in terms of the path that it finds, would be the same as what breadth first search would have done. And because it terminates at the very first level at which the goal appears, we know that it has found the shortest path. Any questions…

Student: ((Refer Time: 27:48))

What I have written here is "while goal not found", and so I have glossed over some detail. If it is an infinite graph it will keep searching; so let us assume it is a finite graph. Then I will leave it as a small exercise for you to discover the point at which no new node has been added: if you inspect the next layer and there is no new node, then you have inspected the complete graph, and if you have inspected the complete graph it should report failure. But till that point it should keep deepening and searching till it finds the goal.
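
One way to work out that exercise — a sketch extending the DFID above: let DBDFS report whether the depth bound actually cut anything off; if a whole round finishes with nothing cut off, the finite space has been exhausted and we can report failure.

def db_dfs_flagged(node, move_gen, goal_test, bound, path=()):
    # As db_dfs, but also reports whether the bound pruned any node.
    if goal_test(node):
        return list(path) + [node], False
    if bound == 0:
        # cut off exactly when this node still had children to explore
        return None, bool(move_gen(node))
    cut_off = False
    for child in move_gen(node):
        if child not in path:
            found, cut = db_dfs_flagged(child, move_gen, goal_test,
                                        bound - 1, path + (node,))
            if found is not None:
                return found, False
            cut_off = cut_off or cut
    return None, cut_off

def dfid_complete(start, move_gen, goal_test):
    # Deepen until the goal is found, or until a round prunes nothing new.
    bound = 0
    while True:
        found, cut_off = db_dfs_flagged(start, move_gen, goal_test, bound)
        if found is not None:
            return found
        if not cut_off:
            return None        # the whole (finite) space seen: report failure
        bound += 1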

Student: Is it the height of the tree till the goal?

It is not a tree — we do not know where the goal is; that is the whole idea of searching. We are in some space, and we are exploring the space using the moveGen function, trying to find a path to the goal. So, first of all, we do not even know whether a goal exists; for example, in the eight puzzle I may give you a goal state which is not reachable. And secondly, we do not know where it exists, at what level it exists. So the whole idea is to search for the path.

You know, these iterative deepening algorithms were devised in a chess playing situation. We saw, when we were looking at the history of AI, that chess playing has long been a fascination of AI people, and they wanted to make programs play under tournament conditions. And tournament conditions, for those of you who play chess, mean that you are allotted a certain amount of time for making a certain number of moves. So, the time available to the player is fixed essentially.

Now, we will see game playing programs later in this course. But essentially they also explore a tree of some kind, and they can explore the tree to various levels of depth — the deeper, the better. Exploring the tree basically means you explore combinations: if I make this move, then the opponent will make this move, then I will make this move, then the opponent will make this move, and so on. Obviously, you could do this analysis to any level — till the end of the game, in fact — but that is not really possible. You do this analysis up to some depth and then try to judge which is a good move to make.

Now, in chess playing programs, if you are playing under tournament conditions, you have to be aware of how much time you have essentially. So, iterative deepening algorithms were devised for that situation: you run the algorithm and let it go deeper and deeper as long as time allows, and when the calling program knows that time is running out, it will say, tell me the best move, and it will play the best move. So, we will look at iterative deepening again — at least we will mention it again — when we look at game playing algorithms.

Now…

Student: ((Refer Time: 30:53))

So, you are asking, why do we not do breadth first search? The reason we are not doing breadth first search is that it needs to store this entire frontier. See, what is the open list of breadth first search? The open list cuts across this tree, and the search frontier of breadth first search looks like this: as it goes deeper and deeper it grows exponentially — we know that the number of nodes at depth d is b^d essentially. So, we do not want to store b^d nodes; that is the reason why we are not doing breadth first search. We are doing depth first search. So…

Student: ((Refer Time: 31:41))

Already visited — but what is the option available to you? If you think a little bit about it, you will see. I can see what you are trying to say: you are asking, what is the point of starting with s all over again? But where do you start otherwise? In breadth first search you would have inspected all these nodes, and then you would have inspected their children; the children would be in open. But here we do not have them stored anywhere. So, you need to give a little bit of thought to it; yes, that is a common doubt when one looks at DFID for the first time.

So, we are doing extra work. What extra work are we doing? We are doing a sequence of searches in which we inspect the complete tree at every level: first up to s, then this whole subtree, then this whole subtree, and so on. So obviously we are paying an extra cost. Is this cost worthwhile? What are the benefits we are getting? The benefits are that we get linear space and a guarantee of the optimal solution; the extra cost is the price we pay for inspecting all these nodes which are not inside this red circle again and again. So s we have seen here, again we have seen it here, again we have seen it here.

We have seen it here, and we are seeing it again here; b we have seen here. This whole count is the measure of the time complexity of DFID, because we are inspecting all these nodes. So, how much is this extra cost, and is it worthwhile? That is the question. What is your intuition? Let me repeat: in DFID, we search up to some level, say this level — we do DBDFS for some depth bound, and we come up to here — and then, to inspect the next level of nodes, we search this whole tree again, including this subtree. So, this shaded portion is the extra work we are doing for inspecting this new set of nodes. How much is this extra work — is it high or low? Let me ask a very simple question; we have to go back to our study of trees, which you might have done in data structures or some other course.

(Refer Slide Time: 34:15)

Let me take an arbitrary tree of branching factor b, and for this argument's sake we will assume that the tree is complete, which means that every internal node has exactly b children. This is for the sake of analysis, and it is not the case in general, as we know: for example, in the eight puzzle, when the blank is in the corner there are only two moves that you can make, whereas when the blank is in the center you can make four moves, so obviously the branching factor is not constant. But let us assume for the sake of analysis that the branching factor is b and it is constant, and this is the frontier that we are looking at.

So, the leaves of this tree form the search frontier, and the internal nodes of this tree are the nodes that we are visiting again for the sake of inspecting these leaf nodes. At any given stage, plain depth first search would have inspected just these l leaf nodes; what is DFID doing? It is inspecting i plus l nodes — that is the extra work it is doing. The whole question is how much that is, and it should give us insight into the nature of this monster that we ((Refer Time: 35:41)).

So, let me give you a nice argument. I am sure you have computed, in some course, the ratio of internal nodes to leaves in a complete tree; but I remember Professor K. Joshi, a mathematics professor from IIT Bombay, had given a very nice argument. You can visualize a tournament which is going on. Since the branching factor is b, we will assume that it is something like a 100 metre sprint, where at any node there are b children. Any such tree can be seen as one tournament — if it were binary I could have talked about a tennis tournament, but it is not binary, it has branching factor b. So, let us assume it is like a hundred metre sprint, and b people compete in a heat, and only one is selected from the top essentially. That is the nature of this competition.

So, there are totally l competitors, and every internal node is a heat: at every internal node, one out of b goes ahead and b minus 1 are eliminated. In the end, of course, there is only one winner, and all the rest are — should I use the word losers, or should we say also-rans? I think that is the better word. If you give some thought to this, you will see that the total number of participants, which is l, the number of leaves in such a tournament tree — these are also called winner trees; you might have studied them somewhere — is l = (b - 1) i + 1. What is the argument for this? Of course, you can give a proof by induction if you are more mathematically inclined.

But this is just a tournament argument: it says that at every internal node, b minus 1 players are eliminated. So, if i is the number of internal nodes, then (b - 1) i is the total number of players who are eliminated, which is of course l minus 1, since there is one winner who stands out. So the total number of competitors is given by l = (b - 1) i + 1, and this gives us a relationship between i and l.

So, you can write i = (l - 1)/(b - 1), and then you can compute (l + i)/l, which is the ratio that we are looking for — the amount of extra work DFID is doing as compared to breadth first search. Breadth first search would have inspected only the l boundary nodes; DFID is inspecting the entire tree, which is l + i nodes essentially. If you plug it in and do a little bit of simplification, you will see that for large l this is approximately b/(b - 1), give or take a small factor which I will leave out.
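
Spelling out that little bit of simplification:

    l = (b - 1) i + 1    =>    i = (l - 1)/(b - 1)

    (l + i)/l = 1 + i/l = 1 + (l - 1)/((b - 1) l)  ≈  1 + 1/(b - 1)  =  b/(b - 1)   for large l

For b = 2 the ratio is 2, and for b = 10 it is about 1.11 — the bushier the tree, the smaller the relative overhead.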

So, what are we saying? We are saying that the amount of work which DFID is doing, as compared to breadth first search, is negligibly more — just b/(b - 1) times more essentially. And that should not be surprising to you, because this is the nature of this breadthwise explosion: we have b^d nodes in the last layer, and all the internal nodes together number (b^d - 1)/(b - 1).

So, if you ignore that minus one — say for a large tree or a large branching factor you can ignore it — you can see that l is roughly (b - 1) times i, which is also what we have written here. The number of leaves, as you go deeper down the tree, is b minus 1 times the entire set of nodes that you have seen before. Anything you did before pales in comparison to what you are doing at this level. And everything you did before is the extra work which DFID is doing: it is just revisiting those nodes again and again essentially. If you go through this argument, you will see that the time complexity of DFID is not significantly more than that of breadth first search.

We had seen that breadth first search was a little bit more expensive than depth first search — the ratio of BFS to DFS was (b + 1)/b or something like that; I do not remember exactly, but I think it was (b + 1)/b. So breadth first search does a little more work than depth first search, and DFID does only a little bit more work than breadth first search. And what is the advantage we get? We are allowed to use linear space, and we are guaranteed the optimal solution. So, it is a very nice algorithm; think about it a little bit. Before we move on to the other question — how can we get around time complexity? — I want to address this thing that we did.

Remove seen — what is removeSeen? It says that for every new child that you generate, or every new node that you are generating, check whether it is already present in closed. Let us assume that we have a simple collection of nodes, and let us ignore the node pairs and so on for a moment. We have a collection, and we want to check whether a given node exists in closed. What is the cost of this, actually — what complexity? Are we paying a heavy cost for simply checking whether we have visited a node before? What would be an algorithm for doing this? So, what is the task? The task is: given a new node n,

and given a collection of nodes which we call closed — the nodes that we have seen before; well, I use the term list, but it does not matter whether it is a set or a list — check whether n exists in it. What would be the algorithm for doing that?

Student: ((Refer Time: 43:22)) a bit vector in which we can store it. So, it would be order one.

We will refine that in a moment. But if I treated it as a list, how would I do it? I would have to search sequentially, which would mean it would be linear in the size of closed — and how is closed growing? Closed, remember, is a measure of the search done so far; that is why, when we talked about time complexity, we approximated it with the size of closed, which is the number of nodes that we have already seen.

We had said we approximate it with the size of closed, and there we had made an assumption that checking in closed is not expensive. But if closed is growing exponentially, then checking in it each time will take an exponential amount of work, and that is going to be a tough thing, correct? So, I want to make a distinction: when we conceptually think of closed as a list, that is fine as far as the problem solving algorithm is concerned; but if you want to put on your software engineer or programmer hat, then you have to be more concerned about how to implement closed.

So, obviously a set as such is not an implementation — sets you have to implement in some way or the other — and a plain list is not a good idea, because you would have to scan the list. We will take a cue from what she said: can we do this faster? I am not saying that we should make an array of bit vectors; you should be a bit louder and a bit more confident when you give that answer.

Student: ((Refer Time: 45:37))

Well, for binary search we would assume that the given set is an ordered set, because you need to maintain that ordering. But a hash table is the perfect solution to this: you must implement closed as a hash table. Even though we say it is a list, that is only for discussion purposes; if you are going to implement the algorithm, then closed must be a hash table, and we know that on the average, if you design your hash table well and your hash function is nicely chosen, it will give you constant time lookups essentially.
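
As a small illustration of why this matters — a sketch using Python's built-in set as the hash table, against a plain list:

import time

closed_as_list = list(range(200_000))    # closed kept as a plain list
closed_as_set = set(closed_as_list)      # closed kept as a hash table
probe = -1                               # an unseen node: the worst case for a scan

t0 = time.perf_counter()
_ = probe in closed_as_list              # sequential scan: linear in |closed|
t1 = time.perf_counter()
_ = probe in closed_as_set               # hash lookup: constant time on average
t2 = time.perf_counter()

print(f"list: {t1 - t0:.6f}s   set: {t2 - t1:.6f}s")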
We will come to open again in a little while. So, I keep talking about this problem as a monster and a beast — what is the size of this problem? Let me give you some idea about this.

(Refer Slide Time: 46:38)

If you look at the Rubik's cube, remember that b is equal to 18; the Rubik's cube is a nice example where b is constant — at any given state you can make the same 18 possible moves, 3 for each face, with 6 faces. If you had to search up to a depth of ten, which means you want to explore the space up to depth ten, then you would have 18^10 nodes, and that turns out to be about 3.5 x 10^12. So if you are searching the Rubik's cube and you try all combinations of 10 moves, you have to inspect some 10^12 states essentially. And what is the typical length of a solution to a Rubik's cube problem — any idea? Is it less than 10 or more than 10? It is more than 10 most of the time. If you have to search to depth 20 — let me see, I have the number somewhere — 18^20 is about 1.27 x 10^25. How

long does it take to inspect 10^25 nodes? You know, we do not have a good idea of big numbers: 10^25 and 10^30 sound the same to us. So, let us do some very rough calculations. Let us say we can inspect a million states in a second; then we need 10^19 seconds. We do not want to multiply and divide by 60 and all that, so let us assume there are 100 seconds in a minute; then we have 10^17 minutes. Let us say there are 100 minutes in an hour; then we have 10^15 hours. And let us assume that there are 100 hours in a day.

Then we have 10^13 days. How many days is 10^13 days? Let us assume that there are 1000 days in a year; that is 10^10 years — about ten billion years. If you had a machine which could inspect a million moves per second, and it had to explore up to a depth of 20, by this rough calculation it would take you billions of years. And if you do the actual calculation — I did it at home — it takes about 40 billion centuries to inspect those 18^20, roughly 10^25, nodes.
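
A quick check of this back-of-the-envelope arithmetic — the million-states-per-second rate is of course just an assumption, and slower rates push the answer toward the tens of billions of centuries:

states = 18 ** 20                         # about 1.27e25 nodes at depth 20
rate = 10 ** 6                            # assumed: one million states per second
seconds = states / rate
years = seconds / (60 * 60 * 24 * 365)
centuries = years / 100
print(f"{states:.2e} states -> {years:.1e} years ({centuries:.1e} centuries)")
# at this rate: roughly 4e11 years, i.e. billions of centuries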

I am sure you are not willing to wait that long essentially. So, we will try and address this, and as I said, people have more recently written programs which will find you the optimal solution to a Rubik's cube. How do they do it? So, let us start answering this question: how do you attack this exponential time complexity — can you somehow improve upon it? We will do that in the next class; we will take a break now, and when we come back we will look at heuristic search essentially. So, we will take a break and come back in about five minutes.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Module No - 01
Lecture No - 09
Heuristic Search

(Refer Slide Time: 00:14)

So, let us begin. We continue our study of state space search, and what we have just seen is that the size of the state space can be significantly large even for the small toy problems that we talk about. In the Rubik's cube, where the branching factor is eighteen, if you want to search up to depth 20 you have to look at about 10^25 nodes; even the smaller puzzles, the 8 puzzle and the 15 puzzle, are large. The 15 puzzle, where the branching factor is capped at 4 — you can only move in one of four directions — has about 10^13 states, and the 24 puzzle about 10^24 states.

So, even these small puzzles can generate huge state spaces, and we have seen that 10^24 states is not a number you can brush aside; it will take you huge amounts of time. What is happening essentially is that the algorithms we have seen so far are blind. Given some state space and some start state, what are these algorithms doing with the goal that you want to achieve? They use it only to test whether a given state is the goal state or not; they do not use it to guide the search in any manner at all. Depth first search, as we said, goes off in some direction;

then it backtracks and tries another direction, irrespective of where the goal state is. Breadth first search always circles around the start state, gradually expanding its search frontier outwards. The goal state has no role to play here essentially. The next step that we would like to take is to somehow exploit what we know about the goal state to help guide the search. So, let us assume that this is a city map, and let us say this s stands for IIT and you have to go some place. Let us say the goal state is somewhere such that, if we are consistent with the geography of Chennai, this g could be Marina beach or something like that, somewhere in the north east.

If you want to go from IIT to Marina beach, you of course have to follow the roads: every junction is a node and every road segment is an edge. So in some sense being at a junction is a state: you are at IIT, so you are at this state; the Gandhi statue at Marina beach is a different state; and you can move along a road to a different junction, which is a different state, and so on. Now, imagine what these two algorithms are doing: they just go off in a predetermined manner, and eventually, of course, if somebody tells them we have reached Marina beach, they will say I am done.

What we really want is an algorithm which would somehow head in this direction. What do we mean by this? We mean that given a start state and given a set of successors, instead of choosing in a predetermined fashion which successor to inspect — or expand, as the term we use is — next, let us try to exploit whatever knowledge we can of the domain to help guide the search essentially.

(Refer slide Time: 04:29)

So, that is called heuristic search. The word heuristic has various roots that I have read about — Eurisko, or Heuriskein, and so on; these are words of Greek origin. And you might remember a related word, eureka: you know the story of Archimedes running naked through the streets shouting eureka. What does eureka mean? It means "I have found it", "I have discovered it", and this whole idea of heuristics is based on knowing something essentially. So how can we get a heuristic?

Let me first discuss how this heuristic knowledge would be used. We will assume that we have another function, called h(n), which takes a node, or a state, and gives a value which is a measure, in some sense, of how easy it is to go from that state to the goal state. Suppose we had such a function — the theoretical community would say we have an oracle. If somebody could tell you, of these successors, or in general of the nodes on open, which one to pick next, it would be helpful essentially, because it would just keep telling you: go ahead here, go ahead here, and you will reach the goal state. So, first let us get this mechanism out of the way — how we are going to exploit the heuristic function.

(Refer Slide Time: 07:03)

So, we have the same framework in this algorithm: open is initially made of the pair containing start — it should really be a list of pairs essentially — and then we extract the first pair, extract the node; all that is the same. At some point we generate the set of new nodes that we want to add to open. What we will do now, at least conceptually, is the following. Remember that so far we said that we can add new either at the head of the tail of open — the remaining part of open — or at the end of the tail of open: if it was at the head, open behaved like a stack; if it was at the tail, it behaved like a queue.

Now we are saying something like this: sort on h (append new (tail of open)). The append part is like depth first search — append takes this list and concatenates it with the other list, putting new towards the head. But now I am saying: after you have appended the two lists, sort them on the h value. What is the h value? It is the value of this heuristic function for the node essentially.

This simply means that the node with the best h value will come to the head of the list — we are still talking about a list — and the same algorithm will pick it from the head of the list. So, if we have such heuristic knowledge, we can exploit it by simply modifying our algorithm to keep open as a sorted list, so that when we remove the head element we always get the best element. What do we mean by best? By best we mean best according to what the heuristic function believes, or what the heuristic function thinks — only as far as that essentially.

If we have a heuristic function — this is called a heuristic function, I should write that down; it returns a number which is a measure of how easy or hard it is to reach the goal from that state — then this is the only change we need to make to our algorithm: we sort the open list every time we add new elements. Now, obviously the data structures person inside you must be rebelling against this idea of sorting the list again and again, because sorting is expensive. So you could do another thing — for example, merge (sort new) (tail open); that is the first improvement one thinks of. Sorting the whole list is not needed, because the number of new elements is going to be small and the rest of the tail is anyway sorted; so just sort the new elements on h, and then merge them with the tail of open. That is a little bit better in terms of computational work, but it is obviously not the last word on this.
that to make efficient use of close you must maintain it as table what about this open now
how do we efficiently implement open. So, of course, that efficiency will only be from
the computational point of view as far as the task is concerned of choosing the next
successor.

It does not matter in what manner we manage to sort it how much time is spend sorting it
essentially, but if you what if you were concern about that how should you manage open
at the risk of turning this into a data sectors class we will spend one minute here.

Student: Sir, we have to maintain it.

So, you must maintain open as a priority queue. Of course, there are many different mechanisms to implement priority queues; we will not go into that, we will just make the observation that you must somehow implement open efficiently. What is really happening? Just imagine that the tail of open is a big sorted list, and then we have this new, which is a smaller list essentially. The tail is already sorted, and we have got these new elements which we have to insert in the right places inside that sorted list. The standard way to do it is with a priority queue; one way to implement a priority queue
is to use heaps, but there are other ways of doing it. If you are implementing heuristic search, it is very important that you pay attention to how you manage open, because that is going to affect your running time. Things become a little bit more complicated when you remember that you also wanted to remove from the new list those things which are already on open — a task for which the priority queue is not really the best structure. So, I will leave this as a small data structures exercise for you: you want to do both these tasks. You want to check for membership in open — that is, is a new node already present in open — which is best done by a hash table; and you also want to maintain open so that the best node is always at the head, which is what a priority queue does.

Can you do both, and if yes, how? But it is not our primary concern here; our primary concern is to look at how heuristic search happens. This is basically what is happening essentially. Now let us pay attention to the heuristic function itself — how do we get this heuristic function?

(Refer Slide Time: 14:34)

One thing that one can do, as part of operationalizing this whole process, concerns the search node, which was originally a pair containing the current node and the parent node. Now we want to convert it into a triple, so it will become (current, parent, h value) for that node essentially.

So, we want to also store the h value with the node, which means that every time we generate a new node we will need to compute its h value — every time we generate a node n, we want to compute h(n). Of course, you must keep in mind that this h value is not just a property of that node; it also depends upon the goal node — for different goal nodes the same node may have a different h value — but we are sort of glossing over that here a little bit. So, how do we generate these heuristic functions?

(Refer Slide Time: 16:27)

There are two approaches to this. One is domain dependent, or static — by static we mean that it only looks at the current node, and maybe at the goal node in some manner (obviously it will have to do that), and gives us a value back essentially. Let us take an example, or a couple of examples. Say this is a city map, like we said: this is IIT and that is Marina beach, and so on and so forth. How can we give heuristic values to the nodes in a city map, in route finding problems? Nowadays, of course, we have all these applications which do route finding for you when you have to go from one place to another — you should think about how they do it. We can define h(n) as follows.

Let us assume that the goal node is given to us. What can we expect in a city map? We can expect to get coordinates: the start state is at (x_start, y_start) in the two dimensional map, and the goal state is at some (x_goal, y_goal). It is reasonable to expect that coordinates may be available in such a situation. How can we exploit this? We say: for each node, or each location in the map, we compute the distance to the goal node, and that is the heuristic estimate that we will use.

One choice is the Euclidean distance, h(n) = sqrt((x_n - x_goal)^2 + (y_n - y_goal)^2), and that is a good estimate. How are we going to use this Euclidean distance? We say: these are the different places that we can go to — remember that in a city map we are moving from one junction to the next; once you are on a road you have to go to the next junction. So you compute this distance for each of the successors, and in general for every node in open — to start with, that is all that open contains — and then choose the one which seems to be closest to the goal. Now, this is the heuristic.

When we say heuristic, we also use the term rule of thumb, and if you go to the Psychology Today website they will say that a heuristic is a mental shortcut which we use to solve problems quickly. Note the word quickly: we are trying to fight this combinatorial explosion, and trying to use some knowledge — in this case from the domain — to guide our search. So, when we are using a domain dependent heuristic function, we now have three domain functions: one is the moveGen function that we already talked about, one is the goalTest function,

and the third is now the heuristic function, which means that if you are going to create a new domain, you must write the program which computes h(n), so that the search algorithm can exploit it. That is as far as the search algorithm goes: it only uses the h value to maintain a sorted list, or a priority queue, of the candidates essentially; the domain must tell us what the h value is. In a moment we will try to see whether this can be done in a domain independent fashion at all, because, you see, the whole idea that we are trying to pursue is to write these algorithms in a domain independent fashion.

We want to write these search algorithms without saying that I want to solve the city route finding problem, or a Rubik's cube, or a robot movement problem, or a factory scheduling problem. The algorithm does not care; you have some general purpose algorithm, and we would like to plug in domains and just use those algorithms essentially.

Yesterday there was a talk about the Mars rover — did anybody attend? NASA, you know, had these Mars rovers, and they used a lot of this kind of planning algorithm for finding paths — and not just the path for the rover to follow, but also the movements to make.

For example, every move that a robot arm must make if it has to pick up something from the ground — let us say a piece of rock — has to be planned carefully, and there we use a lot of these techniques. Now, about the heuristic function: if the heuristic function is perfect — if it is like an oracle, as the theoretical computer scientist would say — then it will just tell you the correct thing to do, and of course you would find the solution in linear time; you would just walk along the path essentially. But in practice, of course, it is not so easy to find such a heuristic function.

(Refer Slide Time: 22:50)

What other function can one use? Another function that we sometimes use is simply h(n) = |x_n - x_goal| + |y_n - y_goal|. What is this doing? Instead of measuring the straight-line distance, as the Euclidean distance does, this is summing the difference between the x coordinates and the difference between the y coordinates. What is the name of this distance function? The Manhattan distance, also called the city block distance: if you have a nicely laid out city like Manhattan, on a grid, then you can only travel along the roads of the grid.

The distance that you cover would then be basically this: on a grid you can only move along the x direction or the y direction. And there are other distance measures that one can think of.

For example, one could take the max of the two coordinate differences instead of their sum, or there is something called the Minkowski norm, which we will not get into here. Essentially, what are we trying to do? We are trying to devise a domain function which takes two inputs — the given node, and the goal node, which we assume is constant for the duration of the search, so we do not mention it explicitly, but it is there in the background — and returns a value which gives some idea of how close one is to the goal node essentially.
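
In code these candidates are one-liners — a sketch, assuming a node carries (x, y) coordinates and the goal is fixed for the duration of the search:

import math

def euclidean_h(node, goal):
    # straight-line distance to the goal
    return math.hypot(node[0] - goal[0], node[1] - goal[1])

def manhattan_h(node, goal):
    # city block distance: movement only along x or y
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

def max_h(node, goal):
    # the "max of the two differences" variant mentioned above
    return max(abs(node[0] - goal[0]), abs(node[1] - goal[1]))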

So, in this context we can say that a heuristic function is an estimate of the distance to the goal — we are using the term distance explicitly in this case. In general it is an estimate of how hard it is to solve the problem from the given state; by hard we typically mean the number of moves that you have to make. Later on we will add a cost to each move — for example, on a road we might use the length of the road, or the congestion on the road, as a cost — but we will come to that later. So far, our idea of an optimal solution is simply the smallest number of moves essentially.

(Refer Slide Time: 25:36)

Let us look at another example, which is the eight (or fifteen) puzzle. Let us say this is my given state, and let me choose one with three successors. I have three successors from here: one is that I can move the 2 down, so I will get 4, 8, 2, 6, 7, 3, 1, 5; or I can move the 6 this side, giving 2, 4, 8, 6, 7, 3, 1, 5; or I can move the 3 up. So I have these three moves which I can make from here, and I have to decide which one of them to choose — and obviously I need a goal state. So, let me say that we have some goal state which looks like this; any state can be a goal state. We want to somehow rearrange the tiles so that we reach that state essentially. How can we talk about heuristic functions here — can you think of one?

What is a static heuristic function here? Let us say we are talking about this middle state, where we have moved the 6 to the left hand side: is this the move that we should make? That is the question we are asking — should we make this move, or this move, or this move? For each of these we want to compute the h value, as it is called, the heuristic value, which should basically look only at this state and the goal. That gives us a number, and let us adopt the convention that the smaller the number, the better for us — we are going with the notion of distance: the smaller the distance, the better. Can you think of a heuristic function?

One function — let us call it h1(n) — is the sum, over the tiles, of each tile's distance to its goal position. When I say distance to goal, I mean the distance to the goal position of that tile; and by distance let us mean the Manhattan distance, because in this problem that is the distance that makes sense — you can only move tiles horizontally or vertically. So, what would be the heuristic value of this state? We look at where 2 is here and where 2 is there: I add 1 for the 2. Then 4 is here and 4 is there, so I need 2 steps for it; for 8 I need 3 steps; for 6 I need 2 steps; for 7 I need 3 steps; for 3 I need 4 steps; for 1 I need 3 steps; and for 5 I need 0 steps. Basically, the sum of all these numbers — let me remove them, it is getting confusing here — the sum of all these numbers is the heuristic value for this state essentially. If you add them up you get a heuristic value; likewise you do it for this successor and for that one, and you would have

some way of making a guess as to which one is better. Can you think of any other heuristic function — one simpler than this? Number of misplaced tiles.

That is another one: h2(n), where you simply count how many tiles are out of place essentially. In this situation, 5 is in its place and nothing else is; if you want to count the blank as well, you can say the blank is in its place, but that just adds a constant essentially. So it is a simpler measure, and it is easier to compute. But is there a notion of better here — is one heuristic function better than another, supposing I tell you that these are the two heuristic functions you could use?
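
A sketch of the two functions, assuming a state is a tuple of nine entries read row by row, with 0 standing for the blank:

def h1(state, goal):
    # sum over the tiles of the Manhattan distance to each tile's goal position
    total = 0
    for tile in range(1, 9):                  # the blank (0) is not counted
        r, c = divmod(state.index(tile), 3)
        gr, gc = divmod(goal.index(tile), 3)
        total += abs(r - gr) + abs(c - gc)
    return total

def h2(state, goal):
    # number of misplaced tiles, again not counting the blank
    return sum(1 for s, g in zip(state, goal) if s != g and s != 0)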

What does your intuition say? See, there are two aspects to making this choice: one is the cost of computing the function, and the second is the benefit that it gives us. What is the benefit? The benefit we are looking for is that it should guide our search better. One way of quantifying this is: if you run, let us say, 10,000 experiments on this 8 puzzle, then you can compute a measure we call the effective branching factor, equal to the number of nodes seen divided by the length of the solution found.

So, the effective branching factor is a property of the heuristic function, and the way it is computed is that you run an experiment: you give some start state and some goal state, let the heuristic function guide the search according to the algorithm we have just written, then count how many nodes the algorithm sees and divide by the total length of the solution. What is the ideal value here? The ideal value is 1: if the heuristic function is perfect, you will only see the nodes which take you to the goal state; you will not see any other nodes.

So, the ideal value for this effective branching factor is 1. The branching factor of the problem itself is known to you, which here is 4 — or let us say bounded by 4; in the Rubik's cube it is constant, and here it is not constant, but let us say it is. So, the effective branching factor will be some value between 1 and 4; the smaller it is, the better the heuristic function. Of course, you will not do this for one experiment; you will run thousands of experiments, just to even out the statistical variations that come from choosing different start states and that kind of thing essentially.
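
Scoring such a batch of experiments is then just this (a sketch; instrumenting the search to count the nodes it sees is assumed):

def effective_branching_factor(runs):
    # runs: (nodes_seen, solution_length) pairs from many experiments;
    # the ideal value is 1, and the problem's own branching factor caps it
    return sum(seen / length for seen, length in runs) / len(runs)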

So, then you have a notion of one heuristic function being better than another; but at the same time you have the cost difference between the two — in this case, between these two functions. In general there may be a choice of more than two functions, so one has to choose the heuristic function essentially.

Now, if you go back to this algorithm again: remember that you have to compute the heuristic function for every node that you generate — every time you call the moveGen function and generate the children, after removing the duplicate ones, you have to compute the heuristic function for each node. So, the extra cost of computing the heuristic function must be much less than the total time saved by this algorithm — and you save time by seeing a fewer number of nodes.

Essentially, if the heuristic function is good, then you will not go down paths which are meaningless; you will only go down the solution path. So obviously it makes sense to use — shall we say — an inexpensive heuristic function. The simplest, or most inexpensive, ones are the static functions, which only look at the given state and tell you a value: given a state, they will say this is a good state or a bad state, in terms of some number computed using one of those measures. Why do we emphasize this? Let us go back to the city map example for a moment.

What can happen that the heuristic function cannot foresee? If you know the topology of Chennai: you go to the main gate — you have no other option — and from there you have to decide where to go. This is the starting place, and that is the goal location. What happens when there is a river running across — we have this Adyar river, and it even has water sometimes. A heuristic function which is only looking at the current state and the goal state will be oblivious of the fact that there is a river on the way.

In a mountainous situation it could be a pass — you need a pass to cross some mountain ridge; here you need a bridge. Let us say one bridge is here and another bridge is there, but there is no bridge along the as-the-crow-flies direction across the river. What will the heuristic function do? It will drive the search in this direction, because it will tell you this node is better than that node, and this node is better than that node. In actual practice, one of these two other nodes is likely to be better — why? — because this one might take you to this bridge and that one might take you to that bridge, whereas this one will take you to a dead end, and then you will have to go back down this path and then go like this.

The heuristic function is not able to see into the future; it will make a judgment based only on what information it has — in this case it is a static function, so that means the current location and the destination location — and based on that, this node appears to be the best. So, in that sense, heuristic functions are fallible: it is not necessary that they give you the best choice. In the next class we will look at another example of how a heuristic function can be misguided in some sense, and we will discuss this in a little more detail.

So, you have to devise some heuristic function which will give you a number for every node that you add to the open list, and we are assuming here that smaller is better, because we are talking about a notion of distance. Then we sort the open list according to the heuristic function, always choose the best node, and the search progresses essentially. What is the hope? The hope is that the search will go in the direction of the goal; but sometimes it may go in one direction, then have to backtrack and try something else. So one might say that you start the search, go down some path — and you can fill in some values such that it will go down this path here.

For example, this could be 40, this could be 60, and this could be 50, or something like that; so it chooses 40, and so on — remember that all these are in closed according to the algorithm, and everything else is in open essentially. It is possible that the heuristic function will take the search down this path, and suddenly it will discover that the heuristic value has shot up at this stage; so this other node becomes the next candidate. Since the search is guided by the heuristic function, it does not always explore the space in the same predetermined pattern; it depends on the goal that you have given it.

If the goal, instead of being here, was over here — let us say the railway station or something — then the search algorithm would have tried to go in that direction essentially. So its behavior changes with every goal; the basic idea is that it tries to go in the direction where the goal will be found. Likewise for the other problem essentially. I will come to dynamic functions in a moment; let us first discuss the properties of this algorithm on the four criteria that we have talked about.

Let us start with completeness. Let us also give this algorithm a name, by the way — we have not given it one. This algorithm is called best first: best first in the sense that, of all the nodes on open, it will pick the best one first, where by best we mean the one with the lowest heuristic value. So, best first — what are its properties? Completeness: let us assume that it is a finite space, for simplicity. Is it complete? It is complete — but how would you justify that? We did not actually give even a slightly formal argument for the earlier searches, but you can give one now. Looking at this algorithm, how can you say that it is complete essentially?

In every cycle it picks one node from open, inspects it, and either succeeds or puts it into closed. The state space is finite, which means open can contain at most the entire state space, and in the worst case the algorithm will explore the entire state space. In every cycle it inspects one node from open, so it will terminate either when open is empty or when it has found the goal state. It will always find the goal state if it exists, because the goal will at some point enter open. It always picks a node from open, and open is a finite list, especially given the fact that we are removing the duplicates — we never add the same node again.

So, open holds only a finite set to start with, and in the worst case the algorithm will pick the nodes one by one, each cycle removing one, and finally say it cannot find a solution. When a solution exists, at some point the goal will be picked from open by this step here, and it will say succeed. What is the difference between depth first and best first? The only difference is that best first is sorting — at least conceptually — the open list every time; that is the only change, but it is still going to pick every node at most once before it terminates, so completeness follows.

(Refer Slide Time: 44:45)

Let us talk about time and space in one breath here, very quickly. What would you say the time complexity is? It really depends on how good the heuristic function is. You can see that if the heuristic function is perfect, then the time complexity will be linear — you will always go to the correct node. Linear in what? Linear in the depth of the search tree: you will always make the correct choice at every level, and in linear time you will reach the goal.

In the worst case, it will be exponential essentially. The heuristic function could be bad; it could actually be malicious, like in some cities where you ask somebody how do I go there and they tell you the opposite direction. But even in the worst case you will only do an exponential amount of search essentially; so it really depends upon the heuristic function, and likewise for space it really depends upon the heuristic function. Now, in the search trees that we were drawing, the open list for breadth first search looked like this and the open list for depth first was like this. It is really difficult to give you a feel of this exponentially growing open list of breadth first search, but you must keep in mind what happens every time we go from one level to the next.

I am multiplying the length, if you want to call it that, by b essentially, where b is the branching factor, which is not really depicted in the diagram; this is just a kind of schematic diagram, but every time the list is becoming longer by a factor of b. So, therefore, it is growing exponentially, whereas for depth first search the open list is basically proportional to the depth, and it is linear, as we have argued. It turns out that for best first search, typically, the search frontier looks like this. Now, this is typical of course, which means it is not really linear, and it generally tends to be more towards exponential in nature in practice essentially.

There is one thing that you should consider: if you have solved these kinds of problems, or if you have solved Rubik's cube kind of puzzles, just try to imagine that sometimes you have to go against the heuristic function. We will come to this point later. In the sense that if you are counting, for example, the number of tiles in place or something like that, then at some point you have to disrupt something that you have already done. So, for example, in the Rubik's cube you have done the top face, and then to do the second layer you have to temporarily disrupt the top face; now a heuristic function would be very upset about such things.

Now, you have made the top face and then you are disrupting it; that will not reflect nicely in your search essentially. So, we will come to those issues later. Completeness, time, space, and what else? Quality. I will defer this discussion of quality until later; just think about whether you believe this will give you an optimal solution or not, and we will come to it later. I just want to spend a couple of minutes on the other side, which is the domain independent heuristic function, and without going into details I will just say this: a domain independent heuristic function solves what we call a relaxed problem. So, it solves a relaxed problem. And what is the key difference? The static function only looks at the given state and the goal state and gives you a value.

So, a relaxed problem, and when we look at planning, if we get time, we will look at this in a little bit more detail, but the relaxed problem is obtained by modifying the original problem so that it can be solved more easily essentially. So, to give you an example, when you look at the eight puzzle like this: suppose, let us say, seven has to go here in place of one. We want to move seven to one's place; now in the real eight puzzle you have to first move eight out of the way, then you have to push seven up.

Then you have to somehow create the gap by pushing, so we have pushed eight here, then you push 6 and then 5 and then 4 and then 3 and then 2 and then 1, and then you push seven up; essentially you have to do a lot of moves in the real puzzle. Now imagine an eight puzzle in which a tile can slide on top of other tiles or sit on top of other tiles, which means I have modified my problem so that I can make a move where seven can come here and then it can come here essentially; it does not have to be a blank square that it moves into, which is the real problem. In the relaxed problem, you can move a tile over and sit it on top of another tile.

Now, you can see that if I am working with this relaxed problem I can easily find how many steps it takes for me to move seven to one's place; it is two steps now. Remember that when we devised the static function, the distance to the goal gave the same value. The difference between the static function and this dynamic, domain independent function is that the static function was devised specifically for the eight puzzle, whereas, when we learn how to pose problems in a uniform manner, we can see that we can pose relaxed problems in a domain independent fashion, and they can typically be solved in polynomial time; that is the key thing.
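
To make this concrete, here is a minimal sketch of the relaxed-problem heuristic for the eight puzzle in Python. The tuple representation of the board is an assumption made for this sketch, not something from the lecture: in the relaxation where a tile may slide onto an occupied square, each tile can be moved independently, so the relaxed solution length is just the sum, over all tiles, of each tile's row and column offsets from its goal position.

```python
# A sketch of the relaxed-problem heuristic for the eight puzzle.
# Assumed representation: a state is a tuple of 9 entries listing the
# tiles row by row, with 0 standing for the blank square.

def relaxed_distance(state, goal):
    """In the relaxed puzzle a tile may slide onto an occupied square,
    so each tile needs exactly its row offset plus column offset moves."""
    total = 0
    for tile in range(1, 9):                 # the blank is not counted
        r1, c1 = divmod(state.index(tile), 3)
        r2, c2 = divmod(goal.index(tile), 3)
        total += abs(r1 - r2) + abs(c1 - c2)
    return total

goal  = (1, 2, 3, 4, 5, 6, 7, 8, 0)
start = (7, 2, 4, 5, 0, 6, 8, 3, 1)          # an arbitrary example state
print(relaxed_distance(start, goal))         # relaxed length, used as h(n)
```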

So, what is the difference between a static function and a dynamic, domain independent function? The static function, we assume, is computed in constant time, because it only looks at the given state and the goal state and gives you a value, whereas this domain independent function actually explores the space, but under different constraints, which are relaxed from the original problem. So, it may still search up to a certain depth essentially, but the problem has been relaxed to such an extent that solving the relaxed problem needs only polynomial time; that is the general idea. We will try and come back to that later, but it is done in a domain independent fashion; I have just illustrated it with the eight puzzle here essentially.

So, what happens now? You have a heuristic function which is computed whenever a node n is generated; with a static function this is computed in constant time, so obviously it is likely to be helpful. The question is, if I have a polynomial time function sitting here instead, is it going to help? What is attractive is that I can then do this in a domain independent fashion. I do not have to say, oh, this is a city map, so I can use the Euclidean distance, or this is the eight puzzle, or this is a Rubik's cube, so I can count the red tiles and the blue tiles.

I do not have to do any of that reasoning. I will say: in some well defined manner, I will change the problem definition in such and such a way that it can be solved in polynomial time, and then the length of that solution I will use as a heuristic value here. So, you have plugged in a polynomial time function inside this thing which is basically exponential in nature, and the question is, does it help? The answer is that yes, in practice it helps, and we will see that a little bit later. But to generalize, the idea of using a heuristic function is to use a function which, given the choices you have, given the open list that you have,

will tell you which node to inspect next, rather than doing it in a blind fashion like depth first or breadth first, or, for that matter, DFID. It is doing search in a more informed fashion, and that is why it is called best first search essentially. So, we will take up best first search again and then we will try to improve upon it a little bit; we will come back to this notion of quality and completeness and revisit them. Essentially, keep in mind that time and space complexity on the average are still exponential in nature; they may be better exponential functions than the original, but they still tend to be exponential.

Of course, in some domains the heuristic function is very nice and easy to build, in which case you will get a lot of improvement. For example, if you are in Manhattan, then you can find the path quite easily I think. So, we will come back to this in the next class.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture No - 10
Hill Climbing

(Refer Slide Time: 00:17)

So, we were looking at heuristic search, and you saw this algorithm called best first search. The way that this algorithm works is that it is the heuristic function h(n) which gives you an estimate of the distance to the goal, and what best first search essentially does is that it sorts the open list. So, let us say new is the set of nodes that is generated by MoveGen, and the basic difference between best first and the earlier algorithms we saw is that open is modified as follows.

If we did not have this sort on h, then you would simply have appended the new nodes to the tail of open, and its behavior would have been like depth first search. Now what we are doing is that we are sorting on this heuristic function h, so the best nodes come to the head of the list. We always pick the node from the head of open; that strategy does not change. Then we do all the processing that we do and eventually add the new nodes to open in sorted order.

Now, when I say sort, we essentially mean that conceptually we are sorting it, but of course from the computational point of view it would be quite silly to sort open like this every time; what we really do in practice is maintain open as a priority queue. This is an efficient way of maintaining a sorted list of elements, and you can keep adding new elements to it. Now, let us look at the behavior of best first search.
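
As a minimal sketch of the algorithm described so far, here is one way best first search might look in Python, with open maintained as a priority queue via heapq. The names move_gen, h, and goal_test stand for the MoveGen function, the heuristic function, and the goal test assumed to be supplied by the problem; this is an illustration, not the lecture's own code.

```python
# A minimal sketch of best first search with open as a priority queue.
import heapq
from itertools import count

def best_first_search(start, goal_test, move_gen, h):
    tie = count()                              # breaks ties on equal h
    open_list = [(h(start), next(tie), start)]
    parent = {start: None}                     # also records seen nodes
    while open_list:                           # stop when open is empty
        _, _, node = heapq.heappop(open_list)  # pick the best node first
        if goal_test(node):
            path = []                          # reconstruct via parents
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in move_gen(node):
            if child not in parent:            # never add duplicates
                parent[child] = node
                heapq.heappush(open_list, (h(child), next(tie), child))
    return None                                # no path to the goal
```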

(Refer Slide Time: 03:18)

What it does is it starts with some node, puts it into closed, and adds its successors to the open list. So, the single circles are open and the double circles are closed. Then it picks one of them, whichever has the best h value, and expands that; now everything else, everything in single circles, is in the open list.

It can take any one of them depending on the h value. So, let us say that it expands this node next, and I am assuming here a branching factor of 4, and the search proceeds in this fashion; maybe this is the next one that it expands. Since it is always looking at the heuristic function, it is not necessary that it will expand one of these four; any other node could suddenly turn out to be better. This is because this path may gradually become worse and worse in terms of heuristic value, and it is possible that after doing this the search suddenly goes off in a different direction.

So, it expands this, and then maybe it expands this; the search can jump around like this. This entire list of nodes is essentially the open list, if you can make it out. So, the open list is basically a list of all possible candidates that have been generated, which could be inspected at some later point of time. It is a global list, and best first search is a global search algorithm. So, what are the properties of this algorithm? We look at properties from four perspectives.

(Refer Slide Time: 05:24)

These are time, space, quality, and completeness. So, if you look at this algorithm from these four angles, we can say that it is complete, because the only thing it does differently from the earlier blind searches is that it sorts the open list; otherwise it always takes one node from open and expands it, and so on. So, at least for finite spaces it is complete. The quality of the solution, as we saw, is not necessarily an optimal path.

(Refer Slide Time: 06:11)

So, an example of that: if this is the start node with, let us say, some successors, and the goal is somewhere here, then it is likely that best first search will go along this path. So, maybe it will expand this node next and generate its successors, maybe it will then expand this one and generate its successors, and so on. So, it will find some path to the goal which will have some number of nodes, whereas it is possible that this node had a direct link to the goal, in which case there was a path of length two that it misses.

Remember that we are not allocating any edge costs; we are simply counting how many states there are in the solution path, and in this situation it will not find the shortest path, which is this one; it will go in this direction instead.

We also saw in the example that if you have a city map and suddenly there is a river on the way, then best first search will drive you towards the goal, and suddenly you see that there is no bridge, and it will have to backtrack and find a longer path. So, quality is not guaranteed; we do not necessarily get an optimal path. Now, as to time and space complexity: it really depends upon h. If the heuristic function is good, then it will drive the search towards the goal directly and you will find the goal in linear time, requiring linear space.

It would go directly, straight towards the goal; but in practice of course that does not happen; in practice it is very difficult to devise a heuristic function which is that good, and in practice the search tends to be exponential. So, this is something we want to avoid; we do not want algorithms which are exponential in nature. So, what can we do to devise algorithms which require less space and less time?

(Refer Slide Time: 08:43)

So, let us look at this variation of the algorithm, in which we modify it as follows: we simply say that open is the sorted version of the new nodes that we have just generated. So, what have we done here? We have thrown away all the nodes that we had generated earlier.

(Refer Slide Time: 09:10)

In the original algorithm, the open list includes all these nodes; it is a big list of nodes that we generated sometime in the past and nodes that we have just generated; everything is included in the open list, so it is a global search. Now what we are saying is: do not do that; just look at the newest nodes that were generated and pick the best amongst them.

(Refer Slide Time: 09:47)

So, what is the behavior of this algorithm? You start with some start node, and let us assume that we keep track of the parent pointers. You generate the children and you pick one of them, using the same heuristic function as before; we take this node and generate its children. Now, this algorithm is saying that the set of candidates is only these children of the node we have just expanded, and we have forgotten about the other nodes. So, in effect we have deleted them; those nodes no longer exist, so open, as we can see, is going to be a shorter list. Then we expand the next node, generate its children, and throw the rest away.

So, what has happened with this variation is that the search has access only to the latest nodes that have been generated, only to the neighbors of the current node. In fact, once we decide to do this, you do not even need to sort; we can simply modify the algorithm as follows: next, where next is the name of a node, is simply the best of the children of current. So, essentially what we are saying is that if you are at the current node, then you move to the next node, which is, let us say, this one. So, you are here and then you simply move to this one, and that is the move that you make.

So, that is current and this is next. You see, we do not want to waste time sorting, because we are not going to use those nodes anywhere later; once you have decided this is the best child of current, we can directly select it, and this can be done in linear time. We just put this in a loop: while next is better than current, we keep doing this; as long as we can see a better node, we move to the better node. So, one thing that we have done here is that we have changed the termination criterion. When we were doing best first search, the termination criterion was either the goal test or open becoming empty.

This means that either we have found a path to the goal, or there are no more candidates left, in which case there is no path to the goal, because the algorithm is a complete algorithm. You would terminate in the positive case only when the goal test function returns true. Now we have this criterion: while next is better than current.

(Refer Slide Time: 13:40)

So, what do we mean by better? In the case of the heuristic function that we were talking about, it simply means that the heuristic value of next is better than the heuristic value of current. If the heuristic value of the goal is 0, then the value of next should be lower than the value of the current node.

So, we have changed the termination criterion, and with that we have converted this into an optimization problem; we are saying: optimize the value of the heuristic function. So, instead of a state space search algorithm, we now have an optimization algorithm which says find the node with the best value, and we keep moving forward as long as we find a better value.

(Refer Slide Time: 14:39)

Now, consider a situation where you are blindfolded and you are standing on the slope of a hillside, and you have been told to go to the top of the hill; what is the algorithm that you would follow? So, this is you, standing blindfolded. The algorithm that you would follow is that you would possibly take a step in every direction, or in some set of directions, and then move forward in that direction which seems to be going up. Now suppose this is a two dimensional world, or a one dimensional world, in which you can only move left or right.

Then you can see that if you go right you will be going higher, and if you go left you will be going lower, so you go right; which is exactly what this algorithm is doing: it is sort of exploring the neighborhood of the current state.

Then it is saying that if there is a better neighbor, it moves to that state; next is the best of the neighbors, and you keep doing this while next is better than current. Actually, to be more precise, you should have the check at this stage itself: you should check whether next is better, and only then move to next. In principle, what you are doing is moving along the steepest gradient. In this example, we can say that we are doing steepest gradient ascent, and it is not surprising that this algorithm is actually called hill climbing. So, if I remove this sort, the version now becomes a little clearer to watch; this algorithm is called the hill climbing algorithm.
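
A minimal sketch of this loop in Python, assuming the same move_gen and h as in the earlier best first sketch; to match the climbing analogy, a larger h is treated as better here (for a distance-to-goal heuristic you would flip the comparison and minimize instead).

```python
# A sketch of hill climbing: keep only the current node and move to the
# best neighbour while it improves; constant space, no open list kept.
def hill_climbing(start, move_gen, h):
    current = start
    while True:
        neighbours = move_gen(current)       # only the newest nodes
        if not neighbours:
            return current
        next_node = max(neighbours, key=h)   # best of the children
        if h(next_node) <= h(current):       # no better neighbour: stop,
            return current                   # possibly a local maximum
        current = next_node
```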

Essentially, it is analogous to climbing a hill blindfolded; we just move in the direction of the steepest slope and hope to reach the maximum. So, this means that if you reach here, you terminate, and then you are allowed to open your eyes. But if we have reached a local maximum, then we have stopped at the local maximum; how will the algorithm ever reach the global maximum? That is exactly what I was about to illustrate here. So, this is the figure that I am trying to draw: you have done this climbing and you have reached a place where none of the neighbors is better than the current node.

So, initially of course you have a smile on your face, thinking that you have reached the maximum, but when you open your eyes the smile disappears, because you discover that actually this is only a local maximum. That is precisely the problem: this algorithm will take you to a local maximum, or a local minimum if you are minimizing, which is analogous. You will get stuck in this local maximum, and that problem arises because this algorithm is a local search algorithm. It only looks in the neighborhood of the current state to decide where to go next, unlike the best first search algorithm, which maintains a global list of open candidates.

Best first search could always move to a better candidate, but hill climbing never does. Still, much is to be said in favour of this algorithm, because of the complexity issues involved: time and space complexity, and completeness. As we have just observed, it is not complete. So, hill climbing cannot take you to the global optimum; it is not complete; it will get stuck at a local maximum. What about the quality of the solution?

Since it is not complete, you cannot even talk of quality. But what about time and space complexity? So, what is the space complexity of this algorithm? Is it log n, as if we were keeping a path in a binary tree? No. By space complexity we mean the size of the open list: how many nodes does it need to keep in memory? Only the neighbors of the current node, which the heuristic function examines to pick the best. As you go on, only one node and its neighbors are present at a time: one plus four in this case, 1 plus 4. So, it is constant space complexity, and this is the single major advantage of this algorithm: it requires constant space, as opposed to best first and the other algorithms that we saw.

We said that in general they require exponential space; well, depth first does not require exponential space, but best first search does, because it has no sense of direction. Hill climbing also gains on time, because it only moves along the gradient; it will stop moving once the gradient becomes negative. So, in some sense it requires linear time: if it takes n steps, it takes just about those n steps and terminates. So, the question is: what is this surface we keep talking about, this hill that we are climbing; where does this surface come from?

The answer is that this surface is defined by the heuristic function that we are using to guide the search. So, I want to take a couple of examples, one example to illustrate that you can have two different heuristic functions and they will define different surfaces. Now, if the surface of the space that you are searching were to be smooth, if the surface was like this, then you can see that this algorithm would have taken you to the global maximum; that is the global maximum there. So, it really depends on the nature of the problem: if the problem is such that the heuristic function defines a smooth and monotonic surface, then hill climbing will work; otherwise it will get stuck on a local maximum.

(Refer Slide Time: 21:30)

So, let us take an example from the blocks world domain, which is a domain often used to illustrate many ideas in AI; the domain consists of a set of children's blocks. So, for example, you may have a set of blocks which are arranged like this; you can keep only one block on another block. Here we have three blocks piled up one on top of the other, then there is the table, and then we have two more blocks piled up like this. So, this is the start state given to us, and the goal state is, let us say, a state which looks like this: we want A to sit on D, D to sit on B, B to sit on C, and E to be by itself like this.

So, this is the goal state. What are the moves available to you? We have only one move, which we will write like this: Move(x, source, destination). This move can only be done if you can pick up the block, which means it must be at the top of a stack or on the table, and you can put it down either on top of another block or on the table. So, the source can only be the top of a stack, and the destination also can only be a topmost block in a stack or the table. So, for example, in this situation we can do the following moves: we can move A onto the top of D.

So, you would get a situation like this; that is one possible move. Another move we can make is that we can put A down on the table; B and C remain like this, D is here, and A is here. A third move that you can make is to move D; only A and D are the two blocks we can move here, because they are the only ones on the tops of the stacks. So, you can put D on top of A, while E remains here, and the fourth possible move is that you can put D down on the table. So, these are the four possible moves; this is the neighborhood of this state, and you want to use the hill climbing algorithm to decide which state to move to.

So, now we need to design the heuristic function. What can be a heuristic function that you can use here? I want to discuss two functions. They are as follows: the first function says that if a block is sitting on its correct destination block, then you add one for that block. What are the destination blocks? A should be sitting on D, D should be sitting on B, B should be sitting on C, C should be on the table, and likewise E should be on the table.

(Refer Slide Time: 25:32)

So, let us call this h1(n). It says: add one if a block is on the correct block; I just use the term block, but in general we mean either a block or the table; and subtract 1 if it is on a wrong one. So, this gives us a function which only looks at a state; the idea of a heuristic function is that it should be computationally cheap: it just looks at the state and gets a value out of it. So, let us give heuristic values for all these states, and let us start with the goal. For the goal we will have plus one for this, plus one for this, plus one for everything, because everything is in the correct place. So, the heuristic value of the goal is plus 5. What about the start state?

For the start state we have plus 1 for this, plus 1 for this, minus 1 for this, because A should be on D but it is on B, plus 1 for this, and minus 1 for this, because D should be on B in the goal state but it is sitting on E. So, if we add the entire sum we get a value of plus one. Now what about the other states? Let us look at this value for the A, B, C stack: this adds to plus one here, plus one for this, and minus one for this; just let me know if I am making a mistake. For this state it is plus one here, minus one for this, minus one for this, plus one for this, plus one for this; these cancel, so this is also plus 1. For this one it is plus 1 for this, plus 1 for this, minus 1 for this, plus 1 for this, minus 1 for this; so this is also plus 1. And in this case it is plus 1 for this, plus 1 for this, plus 1 for this, plus 1 for this.

Notice, because A is sitting on D, and in the goal you want A to be sitting on D; and minus one for this. So, this value turns out to be plus 3. So, these are the four states: this is one state, this is another state, this is the third state, and this is the fourth state that the MoveGen function generates, and the heuristic function evaluates them. The algorithm says: look at the neighbors, and if at least one neighbor is better than the current state, then choose the best among the neighbors. In this example, three neighbors are equal to plus 1, so we do not consider those; we consider this one, which is better. So, the first move hill climbing will make with this heuristic function is this one. So, from here what are the options?

Again, we apply the MoveGen function. So, you can move either B or A. If you move A, one thing you can do is go back to the previous state: you can take A from here and put it on top of B; or you can take A from here and put it down on the table, which takes you to this state.

The other options are: you can take B and put it on top of A, so you will get this state, and you must tell me the heuristic value of this. So, plus 1 for this, minus 1 for this, plus 1 for this, minus 1 for this, plus 1 for this; so the whole thing comes to plus 1. That is one move from here, and one more move is that you can pick up B and put it down on the table. So, you would get B by itself, C by itself, and the stack E, D, A; this is minus 1 for this, plus 1 for this, plus 1, plus 1, and minus 1; so this is also plus 1. Is that correct?

So, here we are sitting on a state which has a heuristic value of plus 3, and it has four neighbors: this one has the value of plus 1, which we computed earlier, this one has the value of plus 1, which we also computed earlier, and these two also have a value of plus 1. So, you can see that this is a local maximum, and the algorithm will terminate here without reaching the goal state in which we are interested.

(Refer Slide Time: 30:49)

So, let us try a different heuristic function, which we will call h2(n). It says: add one for every block in a correct structure; so we are looking at the whole structure that the block is sitting on; and subtract one for every block in a wrong structure. The difference from h1 is that there you were only ever adding 1 or subtracting 1 per block.

In this case, you may add or subtract many things: you are adding one for every block in the correct structure. So, if the entire structure below a block is correct, then for every element in that structure you add one; if it is a wrong structure, for every element you subtract one. Let us again start with the goal. We will have plus 1 here, plus 2 here because B is on C and C is on the table, likewise plus 3 here because D is on B, B is on C, and C is on the table, so we count the things below it, plus 4 here, and plus 1 for this. So, this is 10 plus 1, which is 11. Now let us look at the start state: you add plus 1 for C, plus 2 for B, and minus 3 for A, because it is in a wrong structure; it should have been on D, B, C, but it is on B, C, and A on B on C is not a correct structure.

So, with the wrong structure below it we have minus three here; likewise we have minus 2 for this, because it is on a wrong structure, and plus one for this. So, the three and the three cancel out, these cancel out, and the start state comes to minus 1. You can see that the same four nodes are generated by the same MoveGen function. So, let us evaluate the values for these four states as before. In this case, as we have seen, these add up to 0: plus 1, plus 2, minus 3, so this is 0, this is plus one, and this will be minus 1, so the whole thing ends up at 0. In the next case these 3 add up to 0 as before, and this one will add another minus 4 to that, because it sits on a wrong structure of three blocks.

So, minus 4 here and plus 1 here, so this will be minus 3. For this one: plus 1, plus 2, minus 1 for this, minus 2 for this, and plus 1 for this, so this will be plus 1. And the last one: plus 3 for these two, plus 1 for this making 4, minus 2 for this making 2, and minus 3 for this, so it becomes minus 1. Is that correct? So now let us look at what this heuristic function is doing, how it is evaluating the situation. The first heuristic function rated this as plus 1, this as plus 3, and this again as plus 1; so it thought that was the good move to make, and it made that move essentially.

Now, when you look at the problem, you can see that the optimal solution is: pick up A and put it down somewhere, then pick up D and put it on B, then pick up A and put it on D; that is the optimal solution. So, the first correct move to make is to pick up A and put it down on the table, and that is what this second heuristic function h2 favors.

First of all, you must notice that it is more discriminating. The first function had only a very small set of values: plus 1 here, plus 3 here, and plus 5 there. This one has a value of minus one here, plus 1 for this, 0 for this, minus 3 for this, plus 1 for this, and minus one for this; the values are all different. So, it does not think that the states are equal; not only that, it feels that this is the good state, this is the next best state, and these two states come out to be bad, because their values are lower. Now, applying the algorithm, it has a better state to move to: it was at minus 1 and it can see plus 1.

So, it will make this move, which, you will notice, is the correct move to make. From here what can it do? It can either pick up A and put it on B, which is going back; that, you can see, is a move it will not make, because it would be going from plus 1 to minus 1. Or it can pick up A and put it onto D, and that is the state we have seen here; it could make this move, but that is going from plus 1 to minus 1 again, so it will not make that move either. So, it is not going to pick up A and do something with it: A could be put on D, which takes it to a bad state, or it could be put back on B, which also takes it to a bad state. So, what about the other possibilities? It can either pick up B or it can pick up D.

(Refer Slide Time: 36:41)

So, let us first take the B case. It can pick up B and put it on D, so the stack becomes B on D on E, with C and A on the table; that is one possibility. Or it can pick up B from here and put it on A; that is a second possibility. And the third possibility is that it can put B down on the table. So, these are all the B moves that we are looking at. Let us evaluate these values. For the first state we have minus 1, minus 2, and minus 3 for these, so that is minus 6; then C is plus 1 and E is plus 1, which cancels part of that.

So, minus 6 plus 2 gives minus 4; so obviously it is not going to make this move. It is sitting at plus one, just keep that in mind; it is not going to move to minus 1 here, or to minus 1 here, or to minus 4 here. What are the values for the other two states? This one is minus 3 and this one is minus 2; so obviously these states are worse, and it is not going to move to them. So, that leaves only the D moves; let us look at them: I can either pick up D and put it on B or on A, or put it down.

(Refer Slide Time: 38:41)

So, there are three possibilities. One gives the stack D on B on C, with E and A on the table. Or it can put D on A, which is minus 1 for A and minus 2 for D, plus 3 for C and B together, and plus 1 for E; so it will be plus 1, is that correct? So, remember it was sitting on plus 1, so it would not go to plus 1; it could go to the plus 2 state; but let us look at this one. This is plus 1, plus 2, plus 3, that is plus 6, plus 1 makes 7, minus 1 makes 6; so plus 6. So, you can see that this heuristic function will actually drive the hill climbing search to make the two correct moves: first it will pick up A and put it on the table, and then it will pick up D and put it on B, and you can see that the heuristic value goes to plus 6.

If you want to take this forward, you can see that this actually leads to the goal state in the next step: it will pick up A and put it on top of D, reaching a heuristic value of plus 11, and all other moves will be worse than that. So, what have we seen here? Given a problem to solve, and the problem is a blocks world problem, you are given some initial configuration, you have some desired configuration, and you are given a set of moves; you want to use hill climbing. Both these functions are static evaluation functions.

This means they only look at a state and give you a value for it; when we evaluated this as plus 1, or this as minus 3, or this as plus 6 or plus 1, we were not doing any search. We were only looking at that particular state and saying this is the value for this state; both are static. So, both require constant time in some sense, but one is more perceptive than the other; one is more detailed: it looks at the entire structure, while the other only looks at what is directly below the current block. You can see that the first heuristic function takes the search to a local maximum, while the second one takes it to the global maximum. So, what does that mean? It means that the surface that h1 defines has local maxima; it is something like this; whereas the surface that this one defines is smooth essentially.
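
As an illustration, here is a small sketch of the two evaluation functions in Python. The state representation is an assumption made for this sketch: a state maps each block to whatever it sits on, 'table' or another block, and a helper chases the chain of supports below a block.

```python
# A sketch of the two static evaluation functions for the blocks world.
def chain_below(state, b):
    """The blocks below b, from its immediate support down to the table."""
    chain = []
    while state[b] != 'table':
        b = state[b]
        chain.append(b)
    return chain

def h1(state, goal):
    """+1 for a block on its correct support, -1 otherwise."""
    return sum(1 if state[b] == goal[b] else -1 for b in state)

def h2(state, goal):
    """+(1 + blocks below) if the whole structure below a block matches
    the goal, -(1 + blocks below) if it does not."""
    score = 0
    for b in state:
        correct = chain_below(state, b) == chain_below(goal, b)
        score += (1 + len(chain_below(state, b))) * (1 if correct else -1)
    return score

goal  = {'A': 'D', 'D': 'B', 'B': 'C', 'C': 'table', 'E': 'table'}
start = {'A': 'B', 'B': 'C', 'C': 'table', 'D': 'E', 'E': 'table'}
print(h1(start, goal), h2(start, goal))   # +1 and -1, as computed above
print(h1(goal, goal), h2(goal, goal))     # +5 and +11 for the goal
```

Running hill climbing with h1 from this start state stalls at the plus 3 state, while h2 leads all the way to the goal, matching the hand calculation above.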

So, one thing that you can do when you are using an algorithm like hill climbing is to choose a heuristic function which will define a smooth monotonic surface like this. Then of course you are done, and you are done at a very inexpensive price: the space requirement is constant, the heuristic function takes constant time, and the time complexity is linear, because the algorithm will just keep taking one step in the positive direction and eventually come to a stop at the goal.

So, that is why hill climbing is such an attractive algorithm: it allows you to do all this in constant space and linear time. The catch is: can you find a heuristic function which will define a smooth surface for the search? If you cannot, then you have this problem of getting stuck on a local maximum, or a local minimum, as the case may be. Then we need to look for algorithms, variations of hill climbing, which can overcome this problem of getting stuck in a local maximum. So, we will look at that in the next class; we stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture No - 11
Solution Space Search, Beam Search

(Refer Slide Time: 00:15)

So, we are looking at local search, and you have seen that it leads us to the problem of local maxima, because the algorithm does not have access to the entire search space. It can only see some options, and not all options, and it could get stuck there essentially. So, we want to see how to get around this. We saw that the terrain which this algorithm navigates is determined by the heuristic function; it is also determined by the MoveGen function. As we will see in a short while, given a set of states, the MoveGen function tells you from which state you can go to which other states, and we will see that we can play with that as well essentially.

(Refer Slide Time: 01:35)

So, I want to introduce a problem today, a well known problem you are familiar with no doubt, called SAT; everybody is familiar with satisfiability. Just a quick recap: there is a formula, and we will assume for simplicity that this formula is in conjunctive normal form; it does not have to be, but it is convenient. So, it is a Boolean formula, something like (a or b) and (not a or not b) and so on, a formula like that, where a, b, c, d, e are all Boolean variables which can take two values, either true or false, and the semantics of and, or, and not is defined. We want to find valuations for a, b, c and all the variables such that the formula evaluates to true essentially.

So, that is the satisfiability problem, which asks: can you find values for all the Boolean variables such that the overall formula evaluates to true? And obviously you can see that this is a kind of search problem. You could say something like: you start by saying a equals true or a equals false; that is one way of looking at it, that you pick a variable, assign a value to it, then take another variable and assign a value to it. You would generate a search tree in the process, or you could branch on b first; you can formulate it as a search problem in various ways. What we want to do today is explore the space in a different way. What is the space? Let us say we have five variables.

So, we have some formula which is constructed over these 5 variables. What is the space? The space is, for each variable a, b, c, d, or e, a value of true or false. So, there are 2 raised to 5 possible states in this case; if you have n variables, then we have 2 raised to n states. What if n equals 100? It is a similar problem: if you are dealing with a large number of variables, then this problem becomes hard. In fact, you know that SAT was the first problem which was used by Cook to talk about NP-completeness. With n equal to 100 we have 2 raised to 100 states, which is about 10 raised to 30.

So, we have a number of the order of 10 raised to 30, and we have seen earlier that we cannot deal with numbers like 10 raised to 30; it is a huge number; it would take billions of years to search through all these states, even if you could inspect a billion states in a second. We have seen that argument earlier, so it is a difficult problem to solve. Of course, you must be familiar with the fact that there are variations or special classes of SAT problems which, for example, talk about the length of the clauses. So, this is called a clause, and each of these is called a literal; the number of literals in a clause is actually characteristic of how hard the problem is.

So, there is a variation called 2-SAT, which means there are only two literals per clause; we can have as many clauses as we want, but only two literals per clause. You can have as many variables as you want, but if there are only two literals per clause, then the problem is actually quite easy to solve, and some of you must know the well known Davis-Putnam method for solving 2-SAT. 3-SAT, however, is NP-complete: the moment you go from 2 to 3, that is, you allow 3 literals per clause, the problem becomes NP-complete essentially.

(Refer Slide Time: 06:36)

So, we want to explore something which we will call solution space search, which is not radically different from what we called state space search, except for the fact that in this search space every node will be a candidate solution essentially. So, for example, for the 5 variables my state could be something like this: 1, 0, 1, 1, 1, and it is a candidate solution. This means that I am asking whether, if I put a equal to 1, 1 meaning true, and b equal to 0, and c, d, e equal to one, the Boolean formula evaluates to true or not. So, this is a candidate solution, and we will look at a process of perturbation, which means we will take a candidate solution and perturb it to give us a new candidate solution.

It is not radically different from what you were doing earlier; the earlier method could be called a constructive mechanism, where you assemble a solution piece by piece. The small diagram that I have drawn there is illustrative of that: first you choose a value for variable a, then for variable b, then for variable c, and so on and so forth. We are not doing that here; we are saying that we take all the variables in one shot, take that as a candidate, and if it is not a solution we will do something with this candidate to generate more candidates. So, in that sense we call it solution space search.

Again, I would say it is not radically different; it is just a different way of looking at things essentially. Even in, for example, the city problem, you could ask: if I have to go from IIT Madras to Marina Beach, is the path from here to Adyar a solution or not? You can think of it like that also, as a candidate in some sense essentially, but here we are saying that we have values for all the variables that we care about. So, another example is the n-queens problem: you can have a constructive solution for the problem, where you say place the first queen, then place the second queen, then place the third queen, and you build the solution step by step; or you can say that I will have a candidate placement for all the queens.

So, for example, in n-queens, let us say 6 queens, I could say this is a candidate: 6, 1, 3, 4, 5, 2, and I would interpret this as saying that this is the first column, this is the second column, the third column, and so on. So, I am saying the first column queen is in the 6th row, the second column queen is in the first row, the third column queen is in the third row, and so on and so forth. This is my candidate solution, and I will do something with it; maybe I will do a perturbation of some sort to generate other candidate solutions. So, that is the solution space way of looking at things; not radically different, but it sort of helps us to visualize the space a little bit differently.
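
Here is a small sketch of this view for n-queens in Python. The function names are illustrative, not from the lecture: the candidate is the row assignment per column as above, conflicts counts attacking pairs (a natural quantity to minimize), and one possible perturbation swaps the rows of two columns.

```python
# A sketch of the solution space view for n-queens.
import random

def conflicts(cand):
    """Count attacking pairs; 0 means the candidate is a solution."""
    n = len(cand)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if cand[i] == cand[j] or abs(cand[i] - cand[j]) == j - i)

def perturb(cand):
    """Swap the rows of two random columns to get a new candidate."""
    i, j = random.sample(range(len(cand)), 2)
    new = list(cand)
    new[i], new[j] = new[j], new[i]
    return new

print(conflicts([6, 1, 3, 4, 5, 2]))   # evaluate the 6-queens candidate above
```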

Now, remember that for this problem there are 2 raised to n possible states essentially, and what a MoveGen function should give us is a way to navigate this space. We can devise several MoveGen functions for this. One simple one, which I will call N1, says change one bit. Essentially, if you look at this notation, you are allowed to change one bit. So, from 1, 0, 1, 1, 1 I have neighbors like 0, 0, 1, 1, 1, where I have changed the first bit; then 1, 1, 1, 1, 1, where I changed the second bit; then 1, 0, 0, 1, 1, where I changed the third bit; and so on. So, this is one MoveGen function, or neighborhood function, that I have.

That is why I use the name N; it is a neighborhood function: given a candidate solution, it gives the neighborhood of that solution, which is the same as what we called the MoveGen function at the start essentially. So, back to our problem of local maxima: why did we get into the hill climbing algorithm? Because we wanted to save on space essentially. Nowadays space is hardly a problem in computer systems; I remember that in 1990 I had a PC in the department which had 30 megabytes of hard disk, and the other day I was talking to someone and he said this machine has only 2 GB of RAM.

Essentially, space nowadays is not so much of a problem; you must have heard that Bill Gates is famously supposed to have said at some point, who will need more than 640 K of RAM? So, space of course has changed, and we should also adapt to the changing world essentially. Before we come to this, let us look at a very simple variation of hill climbing: why do you want to restrict yourself to only one successor? Why not more than one? I will illustrate that with a search tree as follows.

(Refer Slide Time: 13:27)

It starts with the start node; say there are five successors for every node, like this. So, you generate five successors. Hill climbing says move to the best one; but what is so special about moving to only one? Why not move to two? So, let us say I do not throw everything away; I keep the best two, generate five children for each of these two, from these again keep the best two, generate their five children, and move on. This algorithm is called beam search, and this illustration has beam width equal to 2. So, what have we done? Instead of hill climbing, which explores only one path in the space, we are saying we are allowed to explore more than one path in this space simultaneously.

So, in this example the beam width is said to be 2. This beam is like, you can think of it as a beam of torch light that you are shining in, and you can have more than one. So, at every level we keep two nodes; we keep going down the search space, but we keep more than one option available. This algorithm has been used very successfully in speech recognition, and you can imagine that it is useful in situations where there are a few options which are likely to be candidate solutions. So, what is the problem in speech recognition? You start with the speech signal, then convert it into phonemes, then syllables, then words, and so on.

Now, it turns out that very often different word combinations sound similar, so very often we ask, did you say this or did you say that; we know that such a thing is possible. In fact, in a chapter on natural language processing there is an example of a woman who goes to New York, or comes from New York, I do not remember, and says, everything here costs an arm and a leg. So, what was she saying, and what did the listener hear? The listener heard, costs a nominal egg; the listener thought the woman was saying that everything costs a nominal egg.

Essentially, what the woman intended to say was that everything costs an arm and a leg, but you know people speak with different accents, and I am sure you can see how that can make both of them sound similar. I cannot do it so easily, but that is the idea, and what the listener heard was, costs a nominal egg. We can see that this kind of problem is very much the case in speech recognition: people speak with different accents, and the listener cannot always make sense of what is said. At the same time, the number of variant options that one has is not too many, and an algorithm which can keep more than one option alive is likely to succeed in recognizing the correct sequence of words.

In signal processing circles this is called the Viterbi algorithm; it is basically beam search. Now, what is the complexity of beam search? Space complexity: you can consider it to be constant, because the beam width is fixed. Instead of one node, you are going to keep beam-width many nodes alive at every stage, and so the beam width times the number of neighbors is the maximum you have to deal with; it is a constant space algorithm.

It also makes some headway around the local maximum problem: one branch may not have a good successor, for example all of these may be bad, but this other one may have two good successors. So, maybe somewhere here a solution will be found which would not have been found otherwise: if this was better than that, and we were selective, only this branch would have been explored. Because we have kept two options, both these branches are there. So, that is one variation of how to get around the local maximum.
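
A minimal sketch of beam search under the same assumed move_gen and h as in the earlier sketches; the depth bound max_levels is an added assumption to keep the sketch terminating, and a larger h is again treated as better.

```python
# A sketch of beam search: at every level keep only the w best nodes
# among the children of the current beam, instead of just one.
def beam_search(start, move_gen, h, w=2, max_levels=100):
    beam = [start]
    best = start
    for _ in range(max_levels):              # resource bound on depth
        children = [c for node in beam for c in move_gen(node)]
        if not children:
            break
        beam = sorted(children, key=h, reverse=True)[:w]  # keep w best
        if h(beam[0]) > h(best):             # remember the best node seen
            best = beam[0]
    return best
```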

Now, let us come back to the solution space SAT problem; I want to talk about another algorithm. Remember what I said a MoveGen function is about: it connects a given state, in this case a given candidate solution, to other candidate solutions. But why should I choose this particular neighborhood function? What does the hill climbing algorithm do? It generates all these five neighbors and picks the best amongst them; if it is better, it moves; otherwise it stays where it is. So, the complexity of the algorithm will also depend on the MoveGen function: on how many neighbors we will have to deal with.

I can think of another MoveGen function, or neighborhood function; let us call it N2, and it says change any two bits. So, N1 says change one bit; for this small problem of five variables, my N2 will give me a different neighborhood. For example, one of the neighbors would be, changing the first two bits, 0, 1, 1, 1, 1; another neighbor could be, if I change the first and third bits, 0, 0, 0, 1, 1.

It gives me a different set of neighbors; not only does it give a different set of neighbors, it gives me more neighbors. So, this N1 gives me 5 neighbors, or n neighbors if there are n variables; this N2 gives me n-choose-2 neighbors, which in this example is ten neighbors. So, it is a different neighborhood function, and I can choose between different functions. How can I exploit this? Can I devise an algorithm which will try to make use of different neighborhood functions?
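
As a sketch, these two neighborhood functions might look as follows in Python; a candidate is taken to be a tuple of bits, and the helper flip is an illustrative name, not from the lecture.

```python
# Neighbourhood functions for the solution space of SAT candidates.
from itertools import combinations

def flip(cand, positions):
    """Return a copy of cand with the bits at the given positions flipped."""
    return tuple(b ^ 1 if i in positions else b for i, b in enumerate(cand))

def n1(cand):
    """Change one bit: n neighbours."""
    return [flip(cand, {i}) for i in range(len(cand))]

def n2(cand):
    """Change any two bits: n-choose-2 neighbours."""
    return [flip(cand, set(p)) for p in combinations(range(len(cand)), 2)]

cand = (1, 0, 1, 1, 1)
print(n1(cand))        # 5 neighbours, e.g. (0, 0, 1, 1, 1)
print(len(n2(cand)))   # 10 neighbours
```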

(Refer Slide Time: 22:11)

So, I can have N1, N2, then N3, and so on, up to Nn if I have n variables, in this case up to N5. Nn says change all five bits; of course, that will give me only one neighbor, and it moves me far from where I am. Or I can have combinations: I can devise N12, which says change 1 or 2 bits, that is, change up to 2 bits; I can say change up to 3 bits; and I can go up to N1..n, which in our example would be N12345.

So, the first thing I want you to observe is that if you go down this path the neighborhoods are more dense; the neighborhood becomes denser as you go down this set of neighborhood functions. So, I can choose a sequence of neighborhood functions which are increasingly dense, which means they give me denser and denser neighborhoods. The sparsest of them all is N1, which only allows the neighbors you can get by changing one bit; then N2 gives me, we said, 10; then N12, which has those 10 plus the 5, which is 15, and so on.

(Refer Slide Time: 24:19)

I can create a sequence of neighborhood functions N1, N2, and so on, of increasing density. How should I choose which neighborhood function to use? Remember, a neighborhood function is the same as a MoveGen function. So, we now have the option of choosing the MoveGen function, keeping in mind that we do not want to get stuck in local maxima. And what will be the heuristic function here? For a formula like this, what is h(n) for a given candidate solution; these are all candidate solutions, so how do you give them a value; any suggestions? The number of clauses it makes true; that is the simplest way of doing it.

So, we will keep that in mind; or we can modify this to a weighted count, so that if a clause has more literals we give it more weight. That is another possibility, and basically it gives you an idea of how much of the problem is solved in some sense: if I have twenty clauses and one candidate satisfies 7 and another one satisfies 12, then maybe the 12 is better than the 7. Again, remember this is not perfect; we will basically assume that the more clauses satisfied, the better the heuristic value. So, we want to devise an algorithm which will search in this solution space, and we want to devise it so that it will not get stuck in a local maximum.
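
A sketch of the simplest version of this heuristic follows; the clause encoding by signed integers is an assumption borrowed from common SAT conventions, not from the lecture.

```python
# h(candidate) = number of clauses the candidate makes true. A formula is
# a list of clauses, each clause a list of literals; literal +i means
# variable i is true, -i means variable i is false (numbered from 1).
def satisfied_clauses(formula, cand):
    def literal_true(lit):
        value = cand[abs(lit) - 1]           # 1 for true, 0 for false
        return value == 1 if lit > 0 else value == 0
    return sum(1 for clause in formula if any(literal_true(l) for l in clause))

formula = [[1, 2], [-1, -2], [3, 4, 5]]      # (a|b) & (~a|~b) & (c|d|e)
print(satisfied_clauses(formula, (1, 0, 1, 1, 1)))   # 3: all clauses true
```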

So, what criterion should one use for choosing a neighborhood function? Two things: one is that we also want to keep a lid on time complexity. Is the choice going to affect my time complexity, and in what manner will these functions affect my time complexity? That is why I am saying: remember the hill climbing algorithm; what does it do? It generates all the neighbors, picks the best amongst them, moves to that neighbor, then generates all the neighbors again, and so on. Now, what about this function, the densest one? Take this small example of five variables a, b, c, d, e; this function is saying:

you can change any number of bits you want: change 1 or 2 or 3, and when I say 1 or 2 I mean any one, any 2, any 3, any 4, any 5. What is the neighborhood then? The entire space is the neighborhood. Every node is connected directly to every other node; you can move from any state to any other state in one step. How many maxima will this have? Only one, the global maximum, which is the solution; well, there may be more than one global maximum. So, that is one side of the story; but what is going to be the complexity of the MoveGen function? Remember that we said it generates all the neighbors; choosing this MoveGen function and using hill climbing,

what are you doing? You are just doing brute force: generate all possible states and then pick the best amongst them. How many states are there? We have already made the observation that there are 2 raised to n states. If the neighborhood has 2 raised to n states, then you actually have a brute force search, which there is no point doing, because you have too many states to inspect anyway. So, this hill climbing algorithm, if you want to use this function, may work for 5 variables, but for 30 or 40 or 100 variables it would not even get started; it would spend its whole life waiting for the neighborhood function to finish.

So, what is the next best thing we can do? We cannot simply choose the densest function; that is what I am trying to say. Why are we interested in dense functions? Because dense functions will not have this difficulty of getting stuck in a local maximum. And what is the advantage of choosing a sparse neighborhood function? The sparser the neighborhood, the faster it can be generated; and remember the hill climbing algorithm: basically it says generate the neighbors, pick the best amongst them, if it is better move to it, then generate the neighbors again, and so on.

So, what do we do? Can we devise a nice algorithm which allows us to balance the two? The sparser the neighborhood function, the fewer states you can reach; it is true that, obviously, the smaller the neighborhood, the more likely it is that you are not connected to a better state. The denser the function, the more likely it is that for any given state there is always a better state; the extreme example is this densest function, because any state which is not an optimal solution, a global maximum, is connected to the global maximum.

Every state has a better state, and the only states that do not have a better state are the global maxima. So try to imagine this: choosing the MoveGen function is also devising the landscape. It is difficult for me to visualize it here, but that is what a MoveGen function is telling you. One function is saying: from here you can go to this state, or this one, or this one. Another is saying: from here you can go to some other set of states altogether. And what is the notion of a local maximum? A local maximum is a state that is better than all its neighbors. If the neighbors are the entire set of states, like in the example, then it is a global maximum.

Another way of looking at it is that with a denser neighborhood function, the likelihood of a state being a local maximum becomes smaller, because it has to be better than more neighbors. This is a very naive way of looking at it, but you can get some intuition from it. So we would like to use a denser function because it is less likely to have states which are local maxima, which means the surface it generates would be smoother, and a smoother surface is more amenable to hill climbing.

Essentially, if a surface has no local maxima, then you will reach the global maximum. A simple approach along these lines is given in this book, How to Solve It: Modern Heuristics; it is also given in my book, but I first saw it here. You should really go and look at this book, because at the beginning of every chapter it gives a nice, interesting puzzle that you can solve. Maybe one of these days I will give you one of those puzzles. The algorithm is simple: you have access to a set of neighborhood functions, and you do the following.

(Refer Slide Time: 33:37)

So, I will just write it very briefly. HC stands for hill climbing with a neighborhood function. You set i equal to 1, call HC with the i-th neighborhood function, then set i equal to i plus 1 and repeat. What is the meaning of this algorithm? You should work out the details; I am not going to write them all. The meaning is that, depending on what resources you are allowed, you keep going around this outer loop; once you run out of resources, you stop. But inside there is another loop, which is the hill climbing loop, and that is saying: use the i-th neighborhood function.

I am assuming that the functions are ordered according to increasing density. Use the i-th neighborhood function, and when you come out of hill climbing, remember that we no longer have the notion of a goal test. Of course, you can put in a goal test check; for example, in a SAT problem you can say, if all my clauses are satisfied, then stop. You can put in that extra check, but we are looking at optimization more generally now. We want to find the maximum value of this heuristic function. So what we are saying is: start with the most sparse neighborhood function and climb up as much as you can, using the hill climbing analogy.

Then switch to a different neighborhood function, the next one, and climb up again, and keep doing that with denser and denser neighborhood functions. If you want to visualize this somehow, think of a neighborhood function as giving you connections with ladders from one state to another. Each of them is a ladder; if the ladder is going up, you climb up, and if none of them goes up, you stop. And when you stop, you suddenly get a different set of ladders. You launch into those, see if they lead to a better state, and then you keep climbing the different set of ladders, and so on.

What is the advantage? The intention behind this is that you do most of the climbing in the early stages: the early, sparse functions will take you up quite a bit. Then, when you get stuck on some local maximum, suddenly you have a different ladder to take you onwards. This algorithm is called variable neighborhood descent, or VND; they talk about descent because they are minimizing, so imagine going down into a valley instead of climbing a hill. What VND says is: use a sequence of denser and denser neighborhood functions, and at each stage do hill climbing. And you can have variations on this; you can work out the variations.

For example, one could say that after having used N2, trying N1 again is allowed, because now I am in a different state and maybe N1 will help from there. These are variations you can think of, but the basic idea is simple: if you are stuck on a local maximum, use a denser function and try again. Essentially this means that, depending on how much resource you have, you will explore as many of these neighborhood functions as possible. A small sketch of this scheme is given below.
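
Here is a minimal sketch of this scheme in Python, assuming we are given an eval function to maximize and a list of neighborhood functions ordered by increasing density; all the names here are mine, and the termination test is left abstract, as in the lecture.

```python
def hill_climb(start, neighbors, evaluate):
    """Plain hill climbing with a single neighborhood function."""
    current = start
    while True:
        best = max(neighbors(current), key=evaluate)
        if evaluate(best) <= evaluate(current):
            return current          # stuck: a maximum for this neighborhood
        current = best

def vnd(start, neighborhood_functions, evaluate, out_of_resources):
    """Variable neighborhood descent: climb with the sparsest function
    first; when stuck, switch to the next denser function and climb again."""
    current = start
    for neighbors in neighborhood_functions:     # increasing density
        if out_of_resources():
            break
        current = hill_climb(current, neighbors, evaluate)
    return current

# Usage sketch, reusing the earlier helpers (note the k=k default to bind k):
# functions = [lambda c, k=k: neighbors_up_to_k(c, k) for k in (1, 2, 3)]
# best = vnd(start, functions, lambda c: num_satisfied(clauses, c), lambda: False)
```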

So, let me end with one example that we discussed earlier, which is the Rubik's cube, and I want you to visualize what the terrain will look like. Let us say we call in an expert, and by expert we mean not necessarily somebody who will find the optimal solution, but somebody who will solve it quickly according to a known solution. How will the terrain, how will that journey look?

(Refer Slide Time: 38:31)

So, the idea that I am trying to talk about is this. You know what the solution is: move 1 is this, then move 2, move 3, move 4 and so on; somebody else gave you the solution. How will the terrain look, given some heuristic function like the number of tiles in place, or rather the number of tiles out of place? Initially it will be some value, because some number of tiles are out of place, and finally of course it should be 0. But what does the journey look like? Remember that we made this observation: when you have partially solved the cube, to move further from there you have to disrupt some of the earlier partial solution, and that is reflected in the journey looking like this.

You go up like this and you have solved the top row, but now, to do the middle row, you have to disrupt this; then you do something better, then again there is some variation, and so on. So it will be something like this. What I am trying to illustrate here is that if an expert is solving the Rubik's cube, then at each state on the way, if you were to plot the heuristic value, it would look something like this. Maybe this is an exaggeration, maybe it is a gentler function than that, but in real world search problems it is often not easy to devise a heuristic function which generates a smooth surface. The Rubik's cube is just an example of that.

It means that I cannot even hope to solve the Rubik's cube using a hill climbing like algorithm, because I will find it extremely difficult to devise a heuristic function whose surface decreases monotonically towards the goal state. In practice, since I have to disrupt the cube on the way, I will be forced to go against the heuristic function. Going with the heuristic function is what we will call exploitation of the heuristic knowledge: somebody told you to follow this advice, so you do that. As opposed to exploitation there is the notion of exploration: exploring new areas of the space.

So far, our only attempt at exploration has been beam search, in which we said you keep many candidates alive. But in the next class, when we meet, we will see how to increase the component of exploration in search. The reason is that pure exploitation, which means purely following the heuristic function, is going to end up in a local maximum if the surface has local maxima. What kind of algorithm can we think of that will go beyond the local maxima to the global maximum? So I will stop here, and we will take that up next week.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 12
TSP Greedy Methods

So, we were looking at optimization. And remember that we came to optimization from state space search.

(Refer Slide Time: 00:20)

And the way this happened was that we looked at the heuristic function that was used in best first search, and said that it defines a terrain over which we can use an algorithm called hill climbing, which is a local search algorithm, as opposed to a global algorithm, which best first search was. Local means that the algorithm looks only at its immediate neighborhood in the state space, or in the solution space, and moves to one of those neighbors.

And then we saw that local search algorithms can get stuck in local maxima, or local minima as the case may be. So the theme for us at this moment is escaping from those optima: how can we improve upon hill climbing? So, what does hill climbing do? Hill climbing does what we will call exploitation, exploitation of the heuristic function; it basically follows the heuristic function.

It looks around its neighborhood, and wherever the heuristic function gets the maximum increase, or in other words in the direction of the steepest gradient, it makes one move. Then it repeats the process till it cannot find a better neighbor. For escaping local maxima, we said that one of the first algorithms that we looked at, in fact the only algorithm that we have seen, is variable neighborhood descent.

What this algorithm did was try out a variety of neighborhood functions. For us, the neighborhood function is given by the MoveGen function. We also said that we are moving from state space search to solution space search, which means that we are perturbing candidate solutions to generate new solutions. And we saw the example which said that you can flip 1 bit, or up to 2 bits, or up to 3 bits, and so on.

In this way you can generate a variety of neighborhood functions. What variable neighborhood descent does is a series of hill climbings: when it gets stuck at a maximum, and it does not know whether that is local or not, it increases the density of the neighborhood function and tries hill climbing all over again. Now we want to look at other approaches which will help us get around local maxima.

In the process, when we are talking of optimization, we will use the notion of an evaluation function, eval. This is just the terminology of the optimization community, but you must keep in mind that we are talking about the same function, whether it is h(n) or eval(n). It is basically a value that you get for a given candidate, and the task is to find the candidate which has the optimum value.

If you are thinking of the heuristic as a distance, then you want to find the state with the smallest distance, which is the goal state. Or, if you have posed it like the blocks world problem, with some heuristic function which has the maximum value at the goal state, then you are maximizing. So instead of calling it a heuristic function, we call it an eval function, but the process is still the same: we want to find the node with the maximum value, or the minimum value, as the case may be. So today let me introduce one of the most talked about problems in computer science, which is the travelling salesman problem.

(Refer Slide Time: 05:17)

I am sure you are all familiar with it. I will just write the acronym here: TSP stands for Travelling Salesman Problem. It is one of the simplest problems to state and one of the hardest to solve. The motivation is said to be to help a travelling salesman who has to visit many cities, let us say in one day or something like that, and come back to his home city, doing the whole thing with minimum cost.

We will say that we have n cities, city 1 to city n, which we will also denote by 1 to n, just as a shorthand that is easier to refer to. So there are n cities, and you have to go from one city to another without visiting the same city twice. Now in practice, of course, in some problems it may be necessary for you to visit the same city twice.

For example, if there is a small island near the coastline, you go to that island from one city, let us say Chennai, and you come back to Chennai and then go somewhere else. But we will ignore all those complications and assume the classical problem of visiting each city exactly once; that is what we want to solve. Now, in practice a city network may not be completely connected.

In the sense that if it is a road network, you may have roads from some cities to other cities and so on, but not, for example, a direct road from here to Nagpur. Instead we have indirect roads: maybe you go from here to Vijayawada, and from Vijayawada you go to Nagpur. But in general it has been found that it is easier to solve the TSP when the underlying graph is completely connected.

So we will assume that the graph is completely connected. We can always convert a non-complete graph to a completely connected graph by adding new edges and making sure those edges have very high weight, so that they never figure in the solution, but the solving process becomes easier. Now, the TSP occurs in many practical problems. For example, drilling circuit boards: if you want to manufacture circuit boards, then you have to drill many holes on each board.

If you think of each hole as a city, then you have to visit all these locations and keep drilling holes, so you have something like a TSP to solve. Such problems can have thousands of holes; it is not uncommon to have a problem with eight thousand holes to be drilled, and things like that. And you can see that the complexity of the problem grows very quickly. So what is the number of tours? If you have n cities, how many tours can there be?

We can have n factorial tours. One of the notations we will use for a tour is something like this: for example, if you have nine cities, then a tour could be 2 7 1 6 4 3 8 5 9. A list of city names, or in this case city indices, gives you a candidate tour. This tour says that you start at city 2, then go to city 7, then to city 1, then to city 6, and so on.

We can represent a tour in this fashion. And we can see that we can choose the first number in n ways, the second in n minus 1 ways, and so on; that is how we get the number n factorial. But many of these tours are duplicates of each other. In particular, if I rotate this sequence, so that I start with 7 and the 2 comes at the end, then it is the same tour, because remember that in the travelling salesman problem you have to come back to the starting city.

After 9 we are going to come back to 2. So I could start with 7 and return to 7, or start with 1 and return to 1, and it would be the same tour. So you should divide by n, because there are n rotations possible. In addition, if I write it in the reverse order, 9 5 8 3 4 6 1 7 2, then also we will assume it is the same tour, because we assume that the cost of going from city a to b is the same as the cost of going from b to a.

So you have to divide further by 2; altogether you divide by 2n, and what we get is (n-1)!/2. This is the number of distinct tours, which means no two of them are identical. Of course, in practice it may not be easy to recognize the distinct tours, so we can say that the size of the space is about n factorial; a quick check of these numbers is shown below. How bad is the factorial function?
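
These counts are easy to verify in a couple of lines of Python:

```python
from math import factorial

def distinct_tours(n):
    # n! orderings, divided by n rotations and 2 directions
    return factorial(n - 1) // 2

print(distinct_tours(9))          # 20160 distinct 9-city tours
print(len(str(factorial(100))))   # 158 digits: 100! is about 10^157
```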

Remember that when we were talking about SAT, we said that a 100 variable SAT problem has a space of size 2 raised to 100, which is about 10 raised to 30. And we have been talking about these large numbers and trying to imagine how big they really are. Whenever you get time, you must look at this book called Metamagical Themas.

Metamagical Themas, by Douglas Hofstadter, whom I keep talking about quite often. One of the chapters in that book is about how we cannot distinguish between large numbers. If I say 25 billion, or if I say 25 million, they appear basically the same to you; you do not have a sense for large numbers. If I say something takes 10 raised to 27 seconds, and if I say it takes 10 raised to 29 seconds, it sounds almost the same to us.

But in fact, of course, 10 raised to 29 is 100 times 10 raised to 27; we just tend to lose our sense of scale. So we have seen that 10 raised to 30 is an extremely large number: if you had to explore all those states, it would take billions of years. But a 100 city TSP has about 10 raised to 157 tours. The factorial function is much, much worse in terms of how fast it grows, as compared to the exponential function.

The SAT problem is exponentially hard; in fact, it is known to be NP-complete, and it was the first problem that was shown to be NP-complete. The TSP is worse than exponential: it is factorial, and the factorial grows much faster. We can see that the 100 city TSP has about 10 raised to 157 possible tours, and that is a number we cannot even begin to imagine.

As something to compare with, we can ask how many fundamental particles, let us say electrons for example, there are in this entire universe. Does anyone have an idea how many particles there are in the universe, at some level of detail?

Student: (Refer Time: 15:24)

Yes, it is in that region: somewhere between 10 raised to 75 and 10 raised to 80, depending on what level you are looking at. So the total number of particles in the universe is about 10 raised to 80. Even if each of them were a supercomputer examining billions of states per second, we would still need billions and billions of years. These are the very large numbers that you have to tackle, and that is why you cannot even hope to solve this exhaustively.

You cannot even hope to find optimal solutions for something like the TSP, and in that sense many people call the TSP the holy grail of computer science: finding good algorithms to solve it is a great motivator. Now, what exactly is the problem? The problem is to find the tour whose cost is minimal.

The cost of course depends on the individual distances, the distance from city i to city j, and much depends on what kind of distance function we are using. If we assume that the distance function is arbitrary, then of course the problem is hard to solve in full generality. But sometimes the problems are not so arbitrary; the distances are not necessarily arbitrary.

A simple example is the Euclidean distance. For instance, if you are manufacturing a circuit board, then you can think of the distances as Euclidean distances, and then at least intuitively you can imagine that you will be able to find a good solution. Supposing it was a grid and you had to drill many holes in the grid,

then you can at least hope to say: I will find a good solution, if not necessarily an optimal one. When the distance measure is the Euclidean distance, we call it the Euclidean TSP, and it turns out that if the problem is the Euclidean TSP, then you can solve it in polynomial time; not optimally, but with what the community calls approximate solutions.

We can find approximate solutions in polynomial time, and you can specify to what degree they are approximate: the cost is something like (1 + 1/c) times the optimal cost. So in polynomial time you can find very good solutions, and you can put a bound on how bad they can be. Apart from the Euclidean distance, we can ask only that the distance satisfy what we call the triangle inequality, which is a weaker condition.

The Euclidean distance also satisfies the triangle inequality. What the triangle inequality says is that if you imagine a triangle between three cities, then the length of one side is smaller than the sum of the lengths of the other two sides; that is, d(a, c) <= d(a, b) + d(b, c). If the distance function satisfies the triangle inequality, then also the problem is relatively easier to solve.

Easier to solve meaning you can find good solutions faster; you still cannot solve it optimally. Then we have something called the geographic TSP. Imagine that you are one of those globe-trotting executives who flies from here to Delhi, from Delhi to Tehran, Tehran to Budapest and so on, going all over the world. Then your domain is a spherical domain; the earth is a sphere, as we all know. There we cannot talk of Euclidean distance;

we have to talk about distance on a curved surface. That is a slightly different problem, and people have tried to solve these kinds of problems too. Now, interestingly, you must look up a website called TSPLIB. I do not remember the address, but you must look it up. It is a website maintained in one of the German universities, and it is a collection of very interesting TSP problems.

There is a problem of all European cities, for example, and some circuit board problems, and things like that. More interestingly, it has problems for which somebody has taken the trouble to find the optimal solution. If you have a problem of 15000 cities for which the optimal solution is known, then you can write some algorithm and see how close it comes.

You can compare your algorithm's result with the actual optimal solution. So the site has big examples with their optimal solutions given, and you could use those as benchmarks to see how good your algorithm is. We will probably give you one exercise along those lines. About those optimal solutions: the only way we can guarantee an optimal solution, and we will study optimality a bit later,

is to say, and this sounds like a circular argument, that no better solution is possible. If you can guarantee that there is no better solution, then you have an optimal solution. And it has taken them, if you look at some of the references, for example this book (Refer Slide Time: 21:49), something like thousands of years of computing time.

That is, many machines working in parallel, exhaustively enumerating tours and finding optimal solutions. So you must go and look at this site, which will give you optimal solutions. Today let us spend a little bit of time on this. I have brought in the TSP here because one of the things we want to do is to look at this variable neighborhood descent again,

and see how the TSP can be solved using it. Basically the idea being: what are the different neighborhood functions that we can construct? But before we come to that, remember that we had said that there are two ways of solving problems.

(Refer Slide Time: 22:38)

One is constructive and the other is perturbative. In a constructive method we construct a solution bit by bit, and that is the whole state space search that we started with. In perturbative methods, we take a candidate solution and perturb it to look at another candidate solution. We did this for SAT: we said that any bit string is a candidate solution, and you can change some number of bits, which is the perturbation you are doing.

For the TSP, a constructive method builds a solution edge by edge, if you want to put it that way. So let us first look at solving the TSP with constructive methods, and then we will come to perturbative methods, in which case we will look at variable neighborhood descent, and whatever other algorithms we are going to study after that. So, for example, given a set of cities, let us assume that we are working in Euclidean space.

The actual geometric distance is the distance. What kind of algorithm can you think of? You must have tried something at some time. The simplest constructive method is to try to simulate what you would do if you were doing this in the real world, so to speak. You start at some city, let us say this one, and then look at its neighborhood. Remember, the entire set of cities is the neighborhood, because we assume that our graph is completely connected.

When we come to perturbative methods we will look at smaller neighborhoods, but note that here the neighborhood is different: it is not a neighborhood of candidate solutions but of neighboring cities. So you go to the city which is nearest to you; let us say you go from here to here. Then you repeat this process: you go to the nearest unvisited city, from here to here, and so on, and you will get some tour.

This is one simple heuristic algorithm, a greedy algorithm. It says: start at some city and go to the nearest neighbor; from there go to the nearest neighbor, and so on. Obviously, in some situations it will give you a very good solution, if not the optimal solution. But you can imagine what will happen in a situation where I have a city somewhere far off here. What will my algorithm do, and ideally what should it do?

Ideally it should go from here to that far city, come back here, and then resume. But it will not do that, because my algorithm says go to the nearest neighbor. So I will go from here to here, then here to here, and so on, and only at the very end will I go to that far city and come back. This is just to illustrate that this algorithm will not always give you an optimal solution. A small sketch of this nearest neighbor construction is given below.
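
Here is a minimal sketch of nearest neighbor construction in Python, assuming the cities are given as 2-D points in the Euclidean setting discussed above; the function names are my own.

```python
from math import dist

def nearest_neighbor_tour(cities, start=0):
    """cities: list of (x, y) points. Returns a tour as a list of indices."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        here = cities[tour[-1]]
        nearest = min(unvisited, key=lambda j: dist(here, cities[j]))
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour

def tour_cost(cities, tour):
    # includes the closing edge from the last city back to the first
    return sum(dist(cities[tour[i - 1]], cities[tour[i]])
               for i in range(len(tour)))
```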

But depending on what kind of problem it is, it will give you a reasonably good solution. Now, a simple variation on this is that instead of only moving forward, you can think of extending the tour. At any given time you have a partial tour; let us say we have built only this much. Instead of treating it as a one directional thing, you can think of it as growing in both directions.

Then you can say: extend at either end of the tour. Instead of only extending in front, as you would if you were simulating a physical person moving around, you say: whichever end of the tour has a closer city, extend at that end. That is a simple extension. Now how do you handle cities like this one, which are far away from the rest? So there is another variation; I am not writing all these out, but if you look at any textbook on the TSP, it will talk about these algorithms.

You could say that I splice the city in. Supposing I have already constructed this tour, which is not a very good tour, but I have constructed it and I am only left with that city. Instead of going from the end of the tour to that city, what I will do is find out which point in the tour is closest to that city, break the edge there, and connect the city in. This will give you some improvement.

But obviously, every time we do that, the complexity increases, because notice that if you write a general algorithm for this, you will say: for every new city, find the closest point in the tour. You have to inspect the whole tour, so the complexity goes up by a factor of n. So this is one set of constructive algorithms for solving the TSP. Another popular one, known as the greedy heuristic, says: sort the edges.

Maintain a sorted set of edges and work with edges instead of working with cities. You can now imagine what the algorithm is. You have all the edges available to you; pick the shortest edge available and add it. In this example it could be this one; that could be my first edge, because it is the shortest. Then the next shortest could be this one, then this one, and so on.

So this is a different algorithm: it sorts the edges once and then keeps picking the shortest edges. The hope is that you will pick up all the small edges, because the more short edges a tour has, the more likely it is to be close to optimal. Obviously, when you are doing this greedy heuristic, you have to be careful that you do not form a loop along the way. A sketch of this edge-greedy construction is given below.
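
A sketch of this greedy edge heuristic in Python follows. It enforces the caveat discussed next (no premature loop, checked here with a small union-find) and never gives a city more than two edges; the structure is my own illustration of the idea, not a fixed algorithm from the lecture.

```python
from math import dist

def greedy_edge_tour(cities):
    """cities: list of (x, y) points. Returns the n chosen edges as index pairs."""
    n = len(cities)
    edges = sorted((dist(cities[i], cities[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))            # union-find to detect premature loops

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    degree = [0] * n
    chosen = []
    for d, i, j in edges:
        if degree[i] == 2 or degree[j] == 2:
            continue                   # a city can touch only two tour edges
        if find(i) == find(j) and len(chosen) < n - 1:
            continue                   # this edge would close a premature loop
        parent[find(i)] = find(j)
        degree[i] += 1
        degree[j] += 1
        chosen.append((i, j))
        if len(chosen) == n:
            break                      # the n-th edge closes the full tour
    return chosen
```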

That is: if you have connected some number of cities and the next shortest edge would form a premature loop, then you should not take that edge; that is one thing you should watch for. Then there is another heuristic, called the savings heuristic; all of these are available in many books. The savings heuristic says: first construct n minus 1 tours of length 2. Let me illustrate this with an example.

(Refer Slide Time: 30:58)

You take some arbitrary base city and you construct a tour with every other city. I will just take a smaller example: there is some base city, and from there you construct these tours. In this example there are five cities, so I have constructed four tours of length two. Then you merge two tours. How can you merge two tours? For example, I can say I will merge these two leftmost tours. I will take this one from here,

and I will take this one from here; I delete this edge and this edge, and I add a new edge here. The only question is which two tours to merge, and that is where the name savings heuristic comes from. It says: select that pair of tours which gives you the maximum savings. What is the savings? You had four edges to start with, so you add up their lengths, l1 + l2 + l3 + l4; after merging you have l1, l4 and a new one, call it l5.

The saving is how much l1 + l4 + l5 is better than l1 + l2 + l3 + l4. Choose the combination which gives you the maximum savings. Now that you have merged these two, you could merge this larger tour with the third one: for example, you could delete this edge, add this edge and add this one, and so on. We keep merging, and in n minus 2 merge operations you will finally get the full tour.

Again, it is a heuristic algorithm, not guaranteed to give you an optimal solution, but in general all these algorithms give you a reasonably good solution, for the Euclidean TSP at least. So let us now go to the perturbative approach. What does the perturbative approach say? You take some candidate and generate its neighborhood, which is quite different from the above: there you were constructing the tour gradually.

Here you are saying: I have one tour given to me, and I am going to produce a set of neighboring tours and choose one of them, exactly like what we did in SAT. I have a candidate solution; I perturb it by changing some number of bits to get new solutions, and move to one of those. So we are approaching a hill climbing like algorithm. The question is: what are the neighborhood functions we can use here?

Can we think of some? Given that tour, for example 2 7 1 6 4 3 8 5 9, how can I generate a neighborhood around it? What is the property that a tour must satisfy? It should be a permutation of those n numbers. So essentially you want to generate some new permutations and explore one of them.

Student: (Refer Time: 35:04)

Swapping adjacent numbers would give you a new tour. I can pick any place and swap the two numbers there; for example, I can replace 1 6 by 6 1, that is what you are saying. That is one neighborhood function. You can imagine how many such moves are possible: you can make n minus 1 such swaps, so a tour will have n minus 1 neighbors, swapping either the first two, or the second two, or the third two, and so on down the line.

You can generalize that to something which we call a two city exchange, which says: take any two cities in the tour and exchange their positions. The adjacent swap is a particular case of this. How many neighbors will this have? I can pick two cities in nC2 ways, and for each pair that I pick there is only one neighbor I can generate, because all I can do is exchange their positions. A small sketch of this neighborhood generator follows.

So I will have nC2 neighbors. Or I can have a three city exchange, which says: take out any three cities from the tour and put them back in some different order. I can pick them in nC3 ways, and then I can put them back in 3! - 1 ways, because I do not want to put them back in the same order. So I will generate nC3 times (3! - 1) neighbors. Now, what is the two city exchange doing geometrically?

If I have a tour like this, and I am going to exchange this city with this one, what it means is that I was going in this order, but now, instead of this, I am going to come here, and from here I am going to go there; this edge will go away, and instead of coming here I will go like this. You must visualize what is happening when I exchange this city with that city.

Originally it was a circular looking tour. Now, because I have exchanged the positions of these two cities, from here I am going to go there, and then continue, and you get this criss-cross figure. Now, it turns out that city exchanges are not the best way of thinking about things; edge exchanges are easier to think about.

For example, I have the two edge exchange. It says: remove some two edges from the tour and insert two new edges instead. Let us see what this means. Suppose I have a tour like this; I am just drawing the cities in a circle because it is easier to draw. Of course, in practice they are going to be distributed over some space; cities are never arranged in a nice circle, but it is easier for me.

One of the things we would want to do with the Euclidean TSP is to generate random problems, which means: take a two dimensional space, let us say a computer monitor's screen, randomly place cities on it, and then say, find the optimal tour for that. Now, if I have a tour which looks something like this, you can see that I have two very long edges in the tour.

If I could somehow remove them and replace them with two shorter edges, keeping it a complete tour, I could transform this tour into another one in which, instead of this edge and this edge, I add this edge and this edge. In effect I have deleted two edges from my tour and replaced them with two other edges, and there is only one way I can do that.

If I delete any two edges, there is only one new tour that I can create, in this example. How will I do this with the list representation? Let us say we have that tour, 2 7 6 5 4 1 3 9 8. Supposing I have a representation like this, how can I implement the two edge exchange? Once you know the answer, it is simple: you take a substring and reverse the substring.

So we get 2 7 4 5 6 1 3 9 8. Let us see whether this really does what we are saying it does. Which are the two edges that I have removed? The edges I have removed are the one from 7 to 6 and the one from 4 to 1. The other edges remain the same: instead of going from 7 to 6, I am going from 7 to 4, and 4 is still connected to 5. Maybe I should draw this. These are the cities: I go from 2 to 7, 7 to 6, 6 to 5, 5 to 4, 4 to 1, 1 to 3, 3 to 9, 9 to 8, and 8 back to 2.

That is the last step; this is the tour I started with. Now I am saying: just reverse this substring 6 5 4, and that gives me the tour 2 7 4 5 6 1 3 9 8. Let us follow that tour. From 2 I go to 7; from 7 I go to 4; 4 to 5 is part of the original tour, notice; from 5 I go to 6; from 6 I go to 1; 1 to 3 is also part of the original tour; then from 3 I go to 9, and 9 to 8.

So all the edges are intact except the 7 to 6 edge and the 4 to 1 edge; those I have replaced with the edge from 7 to 4 and the edge from 6 to 1, the two new edges that I have introduced. So reversing a substring will effectively do a two edge exchange for us. A small sketch of this move is given below.
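
Here is the move in Python, reproducing the example just worked out; the slicing details are my own, but the operation is exactly the substring reversal described above.

```python
def two_edge_exchange(tour, i, j):
    """Reverse the segment tour[i..j]; this removes the edge entering
    position i and the edge leaving position j, and inserts two new edges."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

# Reversing the substring 6 5 4 of 2 7 6 5 4 1 3 9 8:
print(two_edge_exchange([2, 7, 6, 5, 4, 1, 3, 9, 8], 2, 4))
# -> [2, 7, 4, 5, 6, 1, 3, 9, 8]: edges (7,6) and (4,1) are replaced
#    by (7,4) and (6,1); all other edges survive.
```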

Why is this intuitively more appealing than the city exchange? I hope it is, because it is the edge costs that add up to the total cost of the solution. What you would ideally want to do is inspect your solution, and of course between every two consecutive cities there is an edge cost; pick those edges which have very high cost and replace them.

In the original example that I drew, where the tour goes like this, these two are very long edges; if I remove them and replace them with these, I will get a shorter tour. In that sense manipulating edges is more appealing, because you can at least apply the thinking of removing very long edges. The two edge exchange is just one example; you can also look at the three edge exchange.

(Refer Slide Time: 45:10)

For the three edge exchange too, you should compute how many neighbors there are, and so on. For the two edge exchange you can pick two edges in about (n-1)C2 ways, and you get that many successors. In the three edge exchange, let us say this is the original tour given to me and I am removing these three edges: one is this, one is this, and one is this. Now we can see that we can put them back in different ways.

One way you can put them back is to connect this to this. You must be careful not to form a sub-cycle: from here I cannot go to this, that is not allowed. So from here I can go to, let us say, this one, and here I can go to this. No, something is wrong; there are three edges here, so I cannot do this. I can put this here, then from here I can go to this, and from there I can go to this, and I have a new tour.

It turns out that there are four different ways to put them back. Let us try one more, and I will leave the other two as an exercise for you. You understand this notation; it is a kind of shorthand. This is some tour with n cities, but we have drawn only the six cities touching the three edges that we are going to remove: this edge, this edge, and this edge.

The rest of the tour remains the same, but we put back three edges in slightly different places. Instead of going from here to here, I can say: go from here to here, then from here you go like this, then from here you go like this. And there are two more ways of doing it, which I will leave as a small exercise for you. So, in the three edge exchange, if you take out three edges, you can put them back in four different ways.

And you can take out three edges in about (n-1)C3 ways, times 4; that is how many neighbors you get. What I am trying to illustrate is that when you treat the TSP perturbatively, you take any candidate solution and you can generate a neighborhood of candidate solutions around it, using some number of city exchanges or some number of edge exchanges. These are neighborhood functions of different densities, which means that you can apply variable neighborhood descent.

I am not writing the algorithm again. Variable neighborhood descent basically says: you have a set of neighborhood functions arranged in order of increasing density, where by density we mean how many neighbors a given candidate has. If under one function a candidate has, say, 5 neighbors, and under another it has 12 or 15 neighbors, then the second neighborhood function is denser than the first. And why did we want denser neighborhood functions? Because under them it is less likely that a given node is a local maximum.

Or rather, it is less likely that a given node is a local maximum, which means a node that does not have a better node surrounding it. The more nodes surrounding a node, the higher the likelihood of a better node existing among them, if there are better nodes in the space at all. So the idea of variable neighborhood descent is to start with the sparsest neighborhood function. Why do we not want to work with the most dense function first?

Because the cost of making a move is proportional to the number of neighbors we have, since we have to inspect all the neighbors and then pick the best amongst them. Like in SAT: if we can change any number of bits, then it amounts to doing the complete brute force search. We do not want to do that; we want to start with sparse neighborhood functions and move on to denser functions, and so on.

So in today's class what we have basically done is look at the TSP. We have not really looked at a new method for escaping local maxima, which we will do in the next two or three classes, but when we do that, we will keep in mind how to solve the TSP. Just a quick comment about the two city exchange: remember, this was the two city exchange, I exchange this city and this city.

As a result, I removed four edges, the edges neighboring the two cities, and added four new ones. You can see that this is a particular case of the four edge exchange. In practice, of course, if you remove those four edges, you can put them back in many different ways; the two city exchange is just one of the cases of the four edge exchange, which is going to give us a denser function.

So, the more cities or edges we remove, the more ways there are to remove them and the more ways to put them back, and these give you denser functions. That is one mechanism for generating a family of neighborhood functions. I will just stop here. When you come back we will look at a new algorithm for trying to escape local maxima. How can we escape local maxima? That is going to be our objective in the next few lectures.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 13
Tabu Search

(Refer Slide Time: 00:14)

We continue with our objective of escaping local maxima, or local optima in general. Remember our algorithm for hill climbing. It basically says: let c be the current node and n the next node, and the question is whether you should move from the current node to the next node. The algorithm says: let n be the best node you get from MoveGen(c), that is, you look at the whole neighborhood of c and take the best amongst them, maximum or minimum value depending on what you are doing. If that n is better than c, then you move to n, and you put this in a loop.

So it basically looks at the neighborhood. Keep the TSP in mind, where we are looking at a candidate solution and the neighboring candidate solutions. Now, what was happening with this algorithm is that, along some, let us say, one dimensional problem, you would end up here and stop here, because all the neighbors are worse, when in practice the real optimum may be here, or somewhere else a bit further away.

What we want to look at now is: how do we explore this space more, so that we do not get stuck at a local optimum, in this case a local maximum? What hill climbing does is exploitation of the gradient; it basically follows the gradient. If there is a neighbor which is better, it goes to it; otherwise it gets stuck.

(Refer Slide Time: 03:28)

What this escape requires is the ability for exploration. So first let us look at a deterministic algorithm which will allow us to explore more of the space. What do we mean by explore? That we are allowed to go against the heuristic function: it is no longer the case that you can only go to better states; you can go to states which are not necessarily better. Of course, then we have to work out the termination criteria and things like that.

(Refer Slide Time: 04:24)

Let us look at a variation of this algorithm. As before, n is the best neighbor, but we introduce another feature, which is allowed, and we simply put this in the loop. We are not checking whether n is better or not. In hill climbing we had the condition: if the best neighbor is better than c, then move to it. Here we are simply saying: move to the best neighbor, but not just the best neighbor, the best allowed neighbor, which we will look at now.

Before we come to what allowed means: the change is that you always move to the best neighbor, which means that even if you are at a maximum, you can go to a neighbor which is not better. That is the first key thing you must remember. Now, supposing we allow this, that you always go to the best neighbor, with the criterion that it does not have to be better than the current node,

then how do we stop? That is one small problem. But there is a bigger problem, and it pertains to the fact that if I am stuck at this local maximum, how do I get to that other maximum? For the first problem, how do we stop, we will say: put in some other termination criterion. It can be simply time based, or it can be that we have not found a better solution for a certain amount of time, and so on. And you can always store the best.

Looking at it in general optimization terms: in this process, always keep track of the best solution that you have found so far, even if you have moved away from it; remember that it was the best solution. That you can always do. Given that, we are now exploring the state space, and this algorithm says that, for example, in a one dimensional world, there are two neighbors, here and here, and I am allowed to move to one of them.

In a larger space, of course, there are many neighbors, and you move to the best amongst them. But to illustrate the difficulty that this algorithm has: supposing I came from here to here; then I am allowed to move here, because I am no longer saying that it should be better than where I am. What will happen in the next step? In the next step, this node will have these two neighbors, this one and this one. What will the algorithm do?

Student: ((Refer Time: 07:56))

No, that is an independent matter; we are just now talking about the behavior of the algorithm: where will it go next?

Student: ((Refer Time: 08:06))

Unfortunately, it will go back to the earlier place. Even though it took one step down, the best criterion is still there: go to the best neighbor, and that one is still the best neighbor. So it will go back there. How do we get around this? That is where this allowed thing comes in. Remember we had the idea of a closed list when we started doing state space search, and what the closed list said was: do not go back to a node that you have already seen. So what could we do?

One way to do this is to maintain a circular queue of some finite size. You know what a circular queue is: you keep overwriting as you go round and round the circle, but some k number of elements will always be stored in the queue. This is like a short term memory. It is saying: these are the last k nodes that I visited, and I am not allowed to go back to them, but I am allowed to go to any other neighbor.

(Refer Slide Time: 09:40)

With this allowed factor, what we are really doing is: given a node c that we start with, we generate all the successors, all the neighbors, and from these neighbors we disallow some. We say this move is not allowed, this move is not allowed, and this move is not allowed; that is the allowed business we are implementing here. From the remaining, move to the best one. Another way of doing this is the following: keep track of what moves you made in the recent past. For example, if you are doing SAT, let us say for simplicity's sake that you are changing only one bit at a time.

Essentially a move says: flip the k-th bit, where k could be any bit. If a bit has been changed in the last t moves, then disallow changing it. Notice that this is slightly different from maintaining a closed queue. In a closed list we maintain candidates, and say we will not generate the same candidate again. In this example we are saying: if we change, let us say, the fifth bit now, then for the next t moves I am not allowed to change that fifth bit. How will this work? I can maintain, for example, a memory array called M, which starts with 0 everywhere; that is one way of doing it.

Let us say I have a 9 bit problem, a 9 bit SAT; let me add a couple more bits here so that I have 9 like this. I can change any one of those 9 bits and move to the best amongst the resulting neighbors. Now supposing that changing the fourth bit is the best move that I am allowed to make. What am I doing? I have an evaluation function for every node, every candidate solution, and we said that for SAT it could be the number of clauses satisfied, or something like that. So I have an evaluation function, I generate all the neighbors, and I just move to the best amongst them; it does not matter whether it is better than the current node or not.

This t is called the tabu tenure, and this algorithm is actually called tabu search. This particular spelling of tabu, I mean, we are more used to taboo, t-a-b-o-o, but the meaning is still the same. The word comes from, I think, Fiji or Tonga or somewhere, where they use this spelling. It basically means disallowed: it is a taboo move, you are not allowed to make that move. So suppose I decide on a tabu tenure of t, let us say in this example 3 or 4.

(Refer Slide Time: 14:43)

Then, if I have changed this bit, I set its entry to 4; everything else remains 0. So this is like an array which tells me whether I am allowed to change a bit or not: if the value is non-zero, I am not allowed to change it; if the value is 0, I am allowed. After the first move I am not allowed to change the fourth bit, but I am allowed to change any other bit. After the second move, this 4 will come down to 3 automatically, and some other bit, the one I change next, supposing it is this bit, will become 4; everything else remains the same.

You understand what I am saying: this is t equal to 1, this is t equal to 2, then t equal to 3, where this t is time, the first cycle, the second cycle, the third cycle; for the tenure we will use the term TT, which is the more standard term, and TT stands for tabu tenure. So my tabu tenure is 4. In the next cycle this entry will become 2, this one will become 3, and some other bit will become 4; eventually this will become 1 and then 0, which is when I am allowed to change that bit again.

Meanwhile this will go to 2, this will go to 3, and so on. So after having changed a bit, for 1, 2, 3, 4 cycles I am not allowed to change it, but once the value becomes 0, I am allowed to change it again. This is just one way of implementing it. You could instead, for every bit, keep a timestamp of when it was last changed, do an explicit comparison, and decide whether the current time is more than four units from the time when it was changed; you could do it either way, but this is the standard way of doing it. A sketch of the whole loop is given below.

So for SAT you maintain an array, which we traditionally call M, which stands for memory. For the TSP we could maintain a triangular matrix, 1 to 9 by 1 to 9 for example, if it is a nine city problem. Every square in it corresponds to an edge, so I can keep track of, let us say, the edge between 7 and 4 that I have removed, or something like that. So I can keep track of moves in a TSP like problem as well.

The basic idea in tabu search is this notion that you do not allow moves that were made recently, so that you do not go back to the same local maximum from which you are trying to escape; that is the basic motivation. Now, obviously, if you look at this way of doing tabu search, where you are controlling which bits may change, consider any two bits. For example, I start with two bits which are 1 1, and then this gets changed to 0 1.

That means I have changed the first bit; this is some substring of the candidate. Having changed this bit, I am not allowed to change it for four rounds. Then let us say I change the other one as well, so the substring goes from 1 1 to 0 1 to 0 0. Now I have lost the ability to move to one combination, which is the 1 0 combination, because of this tabu: I am not allowed to change either of these two bits for the next four rounds.

I cannot change this to 1 0; that will not be allowed. So, I might lose out on something, but in general it has been found experimentally that tabu search works well on these kinds of problems. Still, we just observed that you cannot move to this 1 0 from here: what if that 1 0 really happened to be the solution, or something like that? So, the more detailed tabu search algorithm actually allows you to make an exception to this barring of certain moves.

(Refer Slide Time: 19:47)

And the exception is made using what is called an aspiration criterion (a word whose spelling sometimes confuses me). The aspiration criterion says: if all allowed neighbors are bad, and by bad we mean worse than the current node, and a tabu move leads to a node which is better than the best seen so far, then allow the move. Obviously, our goal is to optimize the evaluation function, or the objective function, that we are working with, and if we are getting access to a good move then we should not lose it.

So, this aspiration criterion says that if all the allowed neighbors, the ones which are not crossed out, are worse than my current node c, and if one of those barred neighbors, one of the tabu neighbors, is better than the best (by best I mean the stored best solution that you have found so far), then you allow that exception. So, in tabu search, recent moves are in general tabu, but we can make an exception if, in a bad situation where all the other moves are bad, one of them gives us a much better solution.

So, for example, let us say I have a 50-clause SAT problem to solve, the value of this node is 27 because it satisfies 27 out of those 50 clauses, and all the allowed nodes score less than 27. If one of the disallowed nodes happens to be, let us say, 40, and the best that I have seen so far is only 35 or something like that, then I will allow this move; that is the aspiration criterion. So, again you can see that the design of such algorithms is a little bit of an art: you are trying to devise an algorithm which will give you good solutions, and so on.

And what tabu search does is that it gives you a deterministic mechanism to go past local maxima and explore the space further. It does this by having the tabu tenure, which says that for a certain period of time you do not make the same move again.
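
As a sketch, one cycle of tabu search with the aspiration criterion folded in might look as follows; it assumes that neighbors(c) yields each one-bit-flip neighbor together with the index of the flipped bit, reuses allowed and record_flip from the earlier sketch, and all names are illustrative.

    # One cycle of tabu search (maximization) with aspiration.
    def tabu_step(c, M, best, eval_fn, neighbors):
        candidates = []
        for n, bit in neighbors(c):
            if allowed(M, bit):                  # move is not tabu
                candidates.append((n, bit))
            elif eval_fn(n) > eval_fn(best):     # aspiration: tabu, but beats best
                candidates.append((n, bit))
        # Move to the best candidate even if it is worse than c.
        n, bit = max(candidates, key=lambda pair: eval_fn(pair[0]))
        record_flip(M, bit)
        if eval_fn(n) > eval_fn(best):
            best = n
        return n, best

A full implementation would also handle the corner case where every neighbor is tabu and none aspires; the sketch assumes at least one candidate survives.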

(Refer Slide Time: 23:25)

Another feature which is sometimes used is called the frequency-based method. So, I keep a frequency table over all these nine bits: let us say this one was changed 18 times, this one 7 times, this one 6 times and so on; how many times did I change each bit? With this, I can also bias the algorithm towards moves which have been made less often. So, for example, suppose somewhere here there is a bit which has been changed only twice in the whole run, while everything else has changed a large number of times.

Then I may want to say: try changing this bit and see if something good comes out of it. Remember what these numbers are: they are the frequencies of how many times you have changed each bit, whereas the earlier array was simply a counter which tells you whether you are allowed to change a bit or not. You may want to bias the algorithm towards those areas which the heuristic function is not taking it to. Why did this bit never move? Because whenever flipping it was considered, the node generated was not the best amongst the neighbors, and so it never got changed.

So, if you want to push the search in that direction, you can bias the tabu algorithm by modifying your evaluation function. Define eval'(n) = eval(n) - C * frequency, where eval is the function we were earlier calling the heuristic function but now call the evaluation function, because that is what the optimization community calls it, and C is some constant. This is not very good notation, because n is really a node, while the frequency is indexed by the bit that was flipped.

So, let me write it as frequency(b_n), where b_n is the bit whose flip leads you to the node n; as long as that is clear, we can use this notation. So, essentially what it says is that when you are considering a move from a node c to a node n, then while evaluating the node n you take into account how frequently this particular move has been made, that is, how often that bit has been changed, and you give a penalty to those bits which have been changed very often.
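
A minimal sketch of this penalty, assuming freq[bit] counts how often each bit has been flipped so far; the weight C is a tuning constant chosen empirically, and the names are illustrative.

    # Frequency-based penalty: discourage flipping over-used bits.
    C = 0.5   # illustrative penalty weight

    def eval_prime(n, bit, eval_fn, freq):
        return eval_fn(n) - C * freq[bit]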

So, if the frequency is very high, the evaluation value of the resulting node is decreased more; if the frequency is lower, it is decreased less. This is like a penalty, in our case a penalty for changing a bit too often: if you are changing that bit all the time, then this evaluation function penalizes it and says, no, no, you have changed this bit too many times, I will reduce your evaluation value by this amount. So, in the end you have the basic vanilla tabu search, to which you can add the aspiration criterion, which says that sometimes tabu moves are allowed after all.

And then you can add a general, increasing bias towards newer areas by saying that more frequent moves will be penalized. So, what we have seen so far is the deterministic approach to escaping from local maxima; let us now move towards stochastic, or randomized, methods.

(Refer Slide Time: 28:03)

Let us look at stochastic or randomized methods. We will look at a couple of them, and we will start with something very simple today. The focus is still on exploration: how can we make the search go into newer areas? You must keep in mind these two aspects of search. One is exploitation of the gradient, which is what hill climbing does: it just looks at the neighborhood and goes to the best neighbor. What tabu search did was to modify that a little bit and allow the search to move to the best allowed neighbor even if it was not better than the current node.

Exploration simply says that somehow you must allow the algorithm to go into different areas. And what stochastic or randomized methods say is: give some degree of randomness to the movement. So, imagine the algorithm searching through this search space; exploitation simply says follow the gradient, and now we are saying do not always follow the gradient, but do something different some of the time. Now, the extreme example of randomized movement is a random walk.

A random walk we can write simply as: generate n, a random neighbor of c, move to it, and put this in a loop. So, a random walk basically just says: take one step in some random direction. Of course, we can add other things to this algorithm, such as keeping track of the best node seen so far, but otherwise it is purely random. From a given node c it will just randomly choose one neighbor and go to it; no comparison of evaluation functions, nothing. But, of course, you can keep track of the best node seen so far.
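
A minimal sketch of such a random walk with the optional best-so-far bookkeeping, assuming the problem supplies random_neighbor and eval_fn (illustrative names):

    # Pure random walk: eval_fn is used only to remember the best node
    # seen, never to choose the next step.
    def random_walk(start, steps, random_neighbor, eval_fn):
        c = best = start
        for _ in range(steps):
            c = random_neighbor(c)
            if eval_fn(c) > eval_fn(best):
                best = c
        return best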

Now, obviously a random walk is not going to be a great way of solving an optimization problem, because first of all it is not even systematic. Remember that we started out by saying that some searches are complete or systematic, which means that they explore the entire space. Hill climbing and tabu search are not systematic; they do not guarantee that they will explore the entire space, and the random walk is an extreme example: it just goes off in some random direction. So, what we really need are methods which lie somewhere between hill climbing and the random walk: hill climbing is an extreme of exploitation, and the random walk is an extreme of exploration.

There is zero exploitation in the random walk, only exploration, whereas in hill climbing there is zero exploration, in the sense that it never wavers from the path shown to it, and there is complete exploitation. We want algorithms which are somewhere in between. So, today I will just give you the intuition of the algorithm that we are going to study, and in the next class we will look at it in more detail. The basic idea is: make a random move with a probability which increases with the improvement in eval(n).

So, what I was saying here: first of all, I am talking about a random move. I am no longer saying generate all the successors; I am saying make a random move, which means you are at some given node c and you somehow generate one random successor n. But you move to it with a probability which depends on how much better that candidate n is compared to c. The implication of this is the following: I am not saying that n should be better than c.

I am only saying that if n is better than c then there is a greater probability of making the move, and if n is worse than c there is a lesser probability of making that move; and secondly, it depends on the magnitude. So, look at this value, eval(n) minus eval(c). Let us say I am doing a maximization problem, which means the more positive this is, the better for me. I want to build an algorithm which says: the more positive this difference, the greater the probability of making the move; but allow a move even if the difference is negative, just to include the exploration feature, only with lesser probability.

So, I want to bias my search towards better moves, but I do not want to stop it from making bad moves, bad moves meaning moves which decrease the evaluation function value, which go against the gradient. We will look at the details in the next class, but let me ask you one question at this moment. When I say make a move with a probability, let us say this probability is p and I give you the value p: how will you make this move probabilistically? If you want to implement this algorithm, how do you make a move with probability p in a program that you are writing?

Student: ((Refer Time: 35:51))

You and

Student: ((Refer Time: 35:55))

So, you generate a random number uniformly in the range 0 to 1, and if that number happens to be less than p you make the move; if it happens to be greater than p you do not make the move. So, eventually the move is either made or not made, but it is made with probability p, and for that you have to generate a random number. So, what are we after? We are after an algorithm which has this stochastic behavior, which means that it will look at a neighbor and may or may not move to it, but it should be biased towards better moves while not being barred from making bad moves.
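
In code this is a one-liner; random.random() returns a uniform value in [0, 1), so the comparison succeeds with probability exactly p.

    import random

    def move_with_probability(p):
        # True with probability p, False with probability 1 - p.
        return random.random() < p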

So, somehow I have to devise a way of computing this p as a function of this difference in the evaluation function, in such a way that this behavior is manifested. Maybe I will ask you to think about this; it is not a very difficult thing to do. The only things you have to be careful about are that, this being a probability, it should lie in the range 0 to 1, and that it should satisfy the criterion that the larger the difference, the larger the probability.

So, if you plot this difference on the x axis, then the more you go to the right hand side, the more the probability should tend to 1, and the more you go to the left hand side, the more it should tend to 0. So, think about a function which will do that; we will specify it in the next class, along with the algorithm based on it. We will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 14
Optimization 1 (Simulated Annealing)

Let us get back to our local search. The key word that you must keep in your mind is optimization.

(Refer Slide Time: 00:21)

Remember that we converted the state space search problem into an optimization problem by saying that we want the best value for the heuristic function. And essentially we are developing these local search algorithms to explore the surface defined by the heuristic function, which we are now calling the evaluation function, because that is what the optimization community calls it, and looking for the minimum or maximum value, as the case may be.

And because we are exploring local search algorithms, which means that the algorithm will not necessarily search the entire space but will terminate using some criterion before that, we have the problem of local optima. Essentially we are exploring these different algorithms to try and get around this problem of local optima, because the simplest algorithm that we saw, hill climbing, is very efficient: it requires very little space.

In fact, it requires constant space, like beam search, which also requires constant space. And it will just go up the slope and stop; it will not take an exponential amount of time. But the trouble is that that is not what we want: we want a global optimum, which will in general not be the maximum that hill climbing finds. So, we have been looking at ways of getting around, or getting beyond, those local optima. One method that we saw in the last class was tabu search.

And we have said that hill climbing we will associate with exploitation, and by this we mean exploitation of the heuristic function, or exploitation of the local knowledge, the gradient: you can measure the gradient around you, and you always follow the steepest gradient path. This we will term the exploitation of the gradient. Now, optimization does not happen only in the computational world.

There is optimization happening in the physical world as well, and in nature as well; we will see some examples from both. In the physical world, for example, when we talk of materials, you want to produce materials of a certain quality, and very often this quality is dictated by the way the atoms are arranged. For example, if the atoms are arranged in a nice array-like structure, then you get good crystalline materials, or if the atoms are packed close to each other in a metal or in a ceramic, then you get a material which has good properties, like good strength. Now, the way to arrive at materials with good properties very often involves creating a liquid version of that material and then solidifying it. In casting, for example, you make a mould, and then you melt the material from which you want to make the cast.

Then you pour the material into the mould, and it solidifies and takes the shape of the cast. This is how many of the statues that you see are made, and there are many other things which are made by casting. Now, as I said, the properties of these materials are dictated by the way the atoms are arranged, and it is desirable to arrange the atoms in a systematic fashion, which in other words corresponds to low energy levels.

So, you want the final form to have low energy, and in that sense you have a minimization problem: materials want to be in a minimum energy state. You can imagine that if some material is in gaseous form, it has the maximum amount of random movement and the most energy; as it becomes liquid, the energy levels come down, and as it becomes solid, they come down further.

But even in solids, if we can arrange the matter into minimum energy levels, then we get good material properties.

(Refer Slide Time: 05:43)

Typically, the process that is used to reach minimum energy is called annealing, and by annealing we basically mean controlled cooling: you do not allow the material to cool at its own pace, but you make it cool slowly at a controlled rate, so that the atoms settle down into a minimum energy state. So, annealing is a physical process of minimization, which is the kind of optimization problem that we are also talking about, and we will take some inspiration from annealing in a few moments.

(Refer Slide Time: 06:24)

We also have a method called the random walk, which we associate with exploration. A random walk, as the name suggests, takes a step in any random direction, that is, makes a move to any random neighbor, and there is no criterion which tells you which neighbor to move to. In hill climbing you generate the neighbors of a given candidate, inspect their heuristic values (evaluation function values, objective function values), choose the best amongst them and move to it. A random walk simply takes a step in any random direction.

Now, obviously you can see that the random walk has the property of exploring the state space or the search space, whichever it is; it could be the solution space. But of course it will not be guided: it will have no tendency towards a state that maximizes the evaluation. It just keeps going on and on.

The deterministic way of going beyond maxima in hill climbing, which we saw, was tabu search. But today we want to look at a non-deterministic, that is, stochastic or randomized, approach to exploring the space. And our goal is to have a judicious mix of exploration and exploitation: exploration, because it is exploration which stops us from getting stuck at local optima, and exploitation, because it is exploitation which takes us to optima in the first place.

We want to reach the global optimum, as far as possible, so we do not want pure exploitation, because we could have started in the wrong place and would end up in a local optimum. As I mentioned at the end of the last class, we want to make a move with a certain probability, which is controlled in such a fashion that if the move is a good move then the probability is high, and if the move is not a good move then the probability is low.

(Refer Slide Time: 08:56)

So, we will use the notation c for the current node, as we did in the last class, and n for the next node. In hill climbing, n is the best amongst the neighbors; in a random walk, n is any random neighbor of c. So, there is a relation between them: this is c, there are many possible neighbors, and one of them is n. Associated with c and n we have their heuristic values or evaluation values, which we can call eval(c) for the node c.

For the node n we have eval(n). If we are maximizing, then we want eval(n) to be greater than eval(c); if we are minimizing, then we want it to be smaller. So, hill climbing would have chosen the neighbor with the maximum eval value around here. But now we are giving up this strategy of inspecting all of them and then choosing the best amongst them.

We are instead adopting a strategy in which we just take a random neighbor and either move to it or do not move to it. So, this is the choice we make: we will either move from c to n or remain at c, in which case we will generate another random neighbor at that point and again either move to it or not. It is a little bit different from what we have been doing so far.

In a random walk, we always move with 100 percent surety to the next neighbor. But we do not want to do that; we want to move with a certain probability, and the probability should be such that for good moves it is high. So, we will define the term delta E as eval(n) minus eval(c). The equation that I am going to write is for maximization; you can just flip the sign for minimization.

So, this delta E is something that we want our probability to be influenced by. Should we make this move from c to n? It depends on what delta E is. If delta E is positive, then I want to make the move with high probability. If delta E is negative, which means that n is worse than c, I still want to allow that move to be made, because our goal is to get past local maxima; we do not want to get stuck at a local maximum.

So, we will allow even bad moves, meaning moves with negative delta E, but with a lower probability. The function that we will use for computing this probability will be a function of delta E and one more parameter, which will be used to control how delta E influences the probability. So, we want two things: delta E should influence the probability, and I should be able to control how delta E influences it.

Did anyone give a thought to this function? You might have thought about it in some way. So, what is the nature of the function that I will use? We want a function of delta E whose range is limited: it should be between 0 and 1, because it is a probability measure that we want. Its domain should be practically infinite, because I am not going to put any constraint on how large the difference between the evaluation values of two nodes can be; it could be of any magnitude. So, we want a function which stays within these bounds. Let me draw the domain at least: this level is going to be 1, this is going to be 0; here, let us say, delta E is equal to 0, in this direction delta E is increasing, and in the opposite direction it is decreasing.

Delta E equal to 0 means the next node has the same value as the current node, so you can imagine that you do not care: you can move to it or not move to it, you are getting the same evaluation either way. So, we want a function which is monotonically increasing as you go from left to right, but stays within these bounds. You can imagine that there is more than one function you could choose to do that.

(Refer Slide Time: 14:21)

We will choose one, and that function is called the sigmoid function. We will write it as follows: P(c, n), interpreted as the probability of making the move from c to n, is

    P(c, n) = 1 / (1 + e^(-delta E / T))

This is basically the sigmoid function, and, as I said, it has a second parameter, which will allow us to control how much delta E influences the probability.

Because we should be able to operate, in some sense, anywhere in this spectrum. The spectrum starts with hill climbing, which means that the moment delta E is positive you accept the move with probability one, and the moment it is negative you accept it with probability zero. At the other extreme is the random walk, which does not care whether delta E is positive or negative: it will always accept with probability 0.5. But we want to be able to operate anywhere in this range, so we need the second parameter, and the second parameter is traditionally called T.

So, this is the simple algorithm: generate n, a random neighbor of c (let us assume you have a function that returns a random neighbor of c), then compute delta E according to this formula, eval(n) minus eval(c), and move with the probability given above. It is a little bit like the random walk, except, if I may use the analogy of somebody being inebriated, this person is a little too inebriated, so that sometimes he does not even make a move. Sometimes he makes a move and sometimes he does not; sometimes he goes from c to n and sometimes he just stays there, and after a while he makes another move, and so on.

This algorithm is called stochastic hill climbing (I will just write HC for hill climbing). It is not purely hill climbing; hill climbing would only go in the direction of the steepest gradient, but this algorithm has a tendency to go in the direction of better values. We are still talking about maximization, so it still has a tendency to go up. So, let us inspect some values, which I have taken from the book How to Solve It (by Michalewicz and Fogel); they give some examples, and I am just taking their example.
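
Putting the pieces together, here is a minimal sketch of stochastic hill climbing for maximization; random_neighbor and eval_fn are assumed to be supplied by the problem, and T is the fixed parameter just introduced.

    import math
    import random

    def accept_probability(delta_e, T):
        # The sigmoid: tends to 1 for large positive delta_e, to 0 for
        # large negative delta_e, and equals 0.5 when delta_e is 0.
        # (A production version would guard the exponent against overflow.)
        return 1.0 / (1.0 + math.exp(-delta_e / T))

    def stochastic_hill_climbing(start, steps, T, random_neighbor, eval_fn):
        c = start
        for _ in range(steps):
            n = random_neighbor(c)
            delta_e = eval_fn(n) - eval_fn(c)
            if random.random() < accept_probability(delta_e, T):
                c = n        # worse nodes are accepted too, just less often
        return c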

(Refer Slide Time: 18:15)

So, what is the effect of delta E? Let us assume that T equals 10, and that we are using that value 10 to compute the probability. And let us assume that eval(c) is 107; values better than 107 are good, and values less than 107 are bad. So, let us construct a small table of values and see how delta E affects the probability:

    eval(n)    -delta E    e^(-delta E / T)    P
    80          27          14.88               0.06
    100          7           2.01               0.33
    107          0           1.00               0.50
    120        -13           0.27               0.78
    150        -43           0.01               0.99

So, according to this formula, the sigmoid function: if eval(n) is 80, a bad value, then I will move to it with the very small probability 0.06; six times out of 100 I will move to that node, otherwise I will not. And that is what we want: the search should focus more towards better moves and less towards worse ones. If eval(n) is 100, which is a little better, -delta E becomes 7, the exponential becomes 2.01 and the probability becomes 0.33. This is still a bad value, because we are starting with eval(c) equal to 107 and are still going 7 points lower, so we move to it with about one-third probability. Then, just for the sake of completeness, if eval(n) equals 107 the exponential is 1.0 and the probability 0.50, which is nice, because it says that if eval(n) equals eval(c) you may or may not move, each with probability 0.5, which is what we would expect.

Now, let us look at a better value, 120: -delta E becomes -13, the exponential 0.27 and the probability 0.78. So, if you find a better value, you move to it with probability 0.78. How do you move with probability 0.78? As we discussed in the last class: you generate a random number in the range 0 to 1, and if that number is less than or equal to 0.78 you move; otherwise you do not. One more value, 150, is much better still, and the probability is close to 1.

So, this table, which I have taken from the book, illustrates how stochastic hill climbing responds to different values of delta E (note that what is written in the table is the negation of delta E). If delta E is high, if we get a good improvement, it makes the move with greater probability; otherwise it makes the move with lesser probability. All this is for the value T equal to 10.

How do you choose this value of T? That is the next question, and we will see the answer in a moment: what is the influence of T? Let us first finish this and then we will have the discussion. So, now let us assume that we are looking at this case, the better case: if eval(n) equals 120 and of course eval(c) is 107, how does temperature affect the probability?

So, let us look at different values of T; in each case we compute e^(-13/T) and then the probability P (you can work these values out yourself):

    T          e^(-13/T)    P
    1           ~0           ~1.00
    5           0.07          0.93
    10          0.27          0.78
    10^10       ~1.00         0.50

So, this is how temperature affects the probability. I have used the term temperature deliberately; in fact there is a connection with annealing, which we will shortly see.

So, we will call this parameter temperature, and delta E also stands for energy in some sense. Of course, in this case we want to maximize; otherwise you would have a formula with the sign flipped: instead of minus delta E you would have plus delta E if you wanted to minimize. So, how does this behave? At a very low temperature, a very low value of T, the probability of a good move is 1, and you can see that this is like hill climbing.

If you keep this parameter low, the algorithm tends to behave like hill climbing, in the sense that if it sees a better move it will make it. Then, as the temperature increases, and this is where the analogy with real-world materials comes into play, the energy levels increase, the entropy keeps increasing, the randomness keeps growing. And at very, very high temperatures, say T = 10^10...

...the probability is 0.5, and you can see that this is a bit like a random walk. So, we can control the behavior of our probability function by controlling the temperature. If we want it to be more random, we keep the temperature high: the probability approaches 0.5 irrespective of what delta E is, and the algorithm approaches a random walk, making moves essentially at random.
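
Both tables can be reproduced in a few lines with the same sigmoid; the printed values agree with the ones above up to rounding.

    import math

    def p(delta_e, T):
        return 1.0 / (1.0 + math.exp(-delta_e / T))

    # Effect of delta E at T = 10, with eval(c) = 107:
    for eval_n in (80, 100, 107, 120, 150):
        print(eval_n, round(p(eval_n - 107, 10), 2))

    # Effect of T for the fixed good move delta E = 13:
    for T in (1, 5, 10, 50, 10 ** 10):
        print(T, round(p(13, T), 2))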

So, if you want to explore more, you keep the temperature high, making the algorithm behave like a random walk; if you want to follow the gradient, you keep the temperature low, making it behave a little like hill climbing. Hill climbing in the sense that hill climbing always moves to a better spot; the actual algorithm moves to the best amongst the neighbors, but since we are generating only one neighbor, we take this as an approximation to hill climbing. So, at low temperature it would move only if the node n were better than node c; at high temperature it would move more or less irrespective of the comparison. As an exercise, I would ask you to take one of the worse values and construct the corresponding table for T.

And you will see a similar thing: at high temperature the probability will still converge to 0.5, and at low temperatures it will converge towards zero, so the algorithm will not move at all, since we would be considering a worse node. In the table, the first two values are worse than eval(c) and the last two are better. So far, we have seen the example of a better value.

So, how does this sigmoid function look if you plot it? It depends on the parameter temperature. Let us mark the value 0.5; all the curves pass through this point. Why? Because we have said that when delta E is equal to 0, the function gives us the value 0.5, and you can see that this holds irrespective of temperature, because the exponent becomes zero and the exponential term becomes 1.

So, irrespective of temperature, this value is going to be 0.5, and all the curves for different temperatures pass through this point. When the temperature is low, the curve looks like this.

(Refer Slide Time: 29:28)

It is basically a step function; this is T equal to 1. When the temperature is high, the curve looks like a flat straight line: this is T equal to 10^10, always 0.5. In between, it gives us the typical behavior: a typical sigmoid curve crosses 0.5 at some place, goes down towards zero on one side and up towards one on the other. That is the typical shape of a sigmoid curve.

And it is nice for us, because we want a function which has the range 0 to 1 and a domain which is infinite. As we change the value of the temperature, the shape of this curve changes: with a different temperature the curve may look like this. As the temperature goes down, the slope becomes sharper and sharper, and as the temperature goes up, it becomes flatter and flatter.

So, this is T equal to 10^10, and somewhere in between there will be a curve like this; there is a whole series of curves, and in the direction of increasing T the curves become flatter. This is just the nature of this function; we are looking at it to get some insight into what is happening. Now, instead of making a single choice of the value of T, we follow what is done in the physical world, which is to say: I will cool down the system gradually and hope that it settles into an optimal state.

(Refer Slide Time: 31:57)

So, we remove this fixed choice and put in a clause where we initialize T to some value, let us say very high, whatever that very high value is. And very often the algorithm has two loops. There is an inner loop which you execute a number of times at a constant T; people have various names for one pass through it, but that does not matter. In this inner loop you do this whole process. What is this process?

Generating a random neighbor and either moving to it or not, depending on how good that neighbor is, as given by this probability here. You keep doing this a certain number of times, and then you change T by some monotonically decreasing function. The simplest is T = T - 1, but that is not necessarily the best function; this is a very empirical process.

This function determines what people call the cooling rate, and it is most of the time determined empirically, which means that you take a domain, do some experiments, see what works and what does not, and then keep that value. And this is the outer loop: in the inner loop you do some number of steps, then you decrease the temperature and do some more steps.
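
A sketch of this two-loop structure, using the same sigmoid acceptance as before; the geometric schedule T = alpha * T is one common empirical choice rather than something fixed by the lecture, and all parameter names are illustrative.

    import math
    import random

    def simulated_annealing(start, T, alpha, inner_steps, T_min,
                            random_neighbor, eval_fn):
        c = best = start
        while T > T_min:                      # outer loop: cool down
            for _ in range(inner_steps):      # inner loop: constant T
                n = random_neighbor(c)
                delta_e = eval_fn(n) - eval_fn(c)
                if random.random() < 1.0 / (1.0 + math.exp(-delta_e / T)):
                    c = n
                if eval_fn(c) > eval_fn(best):
                    best = c
            T = alpha * T                     # e.g. alpha = 0.95
        return best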

So, what is the intuition behind this? The algorithm is called simulated annealing; it is a very well-known algorithm, very popular in the optimization community and used very often. And this is basically the algorithm: you start at a high temperature, which means you start with highly random movement, allowing the system to go to almost any candidate.

But you gradually bring down the temperature and make it behave more and more like hill climbing. What would be the intuition? Why should it work? What is the idea behind this decreasing of temperature? How does it help? Decreasing the temperature decreases the randomness, certainly, but what is the intuition for why that helps us find good values?

Student: ((Refer Time: 34:55)) local meaning.

But remember that towards the end, when T becomes low, it is going to behave more and more like hill climbing; how do you account for that? It is true that initially we want it to behave randomly, and later on we want it to follow the heuristic function. What is changing over this period of time?

Student: ((Refer Time: 35:31))

If the temperature is high initially, why do we want it high initially, and what is the effect of this cooling? You know, I heard somebody say that this is like one of those toys with concentric rings, which you might have seen, with small ball-bearing-like objects inside them, and you have to move them all towards the center.

And this person, who is a respected computer scientist, said it is a bit like this: you first shake it randomly, and then gradually you control the movement more and more, tilting the object just a little bit, or something like that. So, let me give you an intuition for this. What is the claim that we are making? The claim is that on many problems this gives us very good solutions, which means close to optimal solutions.

What are the kinds of problems where this works? Those where the surface is rugged, where hill climbing would have got stuck at the first local maximum or minimum it found; this algorithm can get beyond many local maxima and move towards better maxima. So, what is the intuition for saying that it moves towards better maxima? This is the basic idea that I now want you to think about. Let us assume that there is this big rugged surface.

(Refer Slide Time: 37:02)

So, let us say this is how your one-dimensional surface looks. It is rugged in the sense that, depending on where it started, hill climbing would have got stuck in one of these local maxima. But we are saying that something like simulated annealing will get very close to the global maximum. So, the idea is this: take a certain snapshot of the surface, which looks like this.

It is like one of these pieces here. Let us call this point A, the next one B and the next one C. We want to maximize, which means we want this value to be as high as possible: we want to go to C rather than to A, to the top of this. Now, in this one-dimensional world, these are the set of candidates, these are their heuristic values, and you can move either left or right.

You do not want to get stuck at A; you want to be at C. And the argument I am about to give applies all over this surface. Now, you can see that to move from A to B the search has to overcome an energy gap, which is this much. When I say overcome the energy gap, I basically mean that it has to go against the heuristic function: you want to maximize, but you have to go down here.

You have to go down here, maybe two or three steps depending on the granularity, and end up in B. And of course after that the search has a natural tendency to go up, because that is the natural tendency of this algorithm: if it sees a better move, as we have seen, the larger the delta E, the higher the probability of moving to it.

So, there is a natural tendency of going up from here. Let us call this gap E_AB, and likewise let us call this one E_CB. To go from A to C, the search has to overcome an energy gap of E_AB; to go from C to A, it has to overcome an energy gap of E_CB. Just imagine that there is somebody with a helium balloon tied to them, trying to pull them up: to get down into the valley they have to walk all the way down pulling that helium balloon, and the greater the depth, the harder that is.

So, a greater energy gap is harder to cross. To go from A to C, you have to overcome the gap E_AB, and then of course the algorithm will naturally take you up, because it has this tendency of going up. To go from C to A, you have to overcome the greater energy gap, which is E_CB. As you work through this, you can convince yourself of the idea that there is a greater chance of the search moving from A to C than from C to A.

In general, that is, not initially. Initially, when the temperature is high, going up and going down are all the same to this algorithm; it will just move randomly. But as the temperature goes lower and lower, while it is still not hill climbing, at intermediate temperatures, as you can see, the probability of making negative moves goes down. (We tabulated the probabilities for a positive move; you could do the same for negative moves and see this.) As the temperature becomes very low, the probability of negative moves tends to zero. So, as the temperature goes down, the probability of crossing some energy gap depends on the size of the gap, and since the gap E_CB is larger than E_AB, it is more likely that the algorithm will go from A to B and then to C.

And remember that going from B to C is kind of automatic, in the sense that it is the most likely behavior, because of this upward tendency, and it is less likely that the search comes down from C to B and then goes to A; this is especially the case as the temperature is brought down. So, you can associate temperature with the algorithm's ability to go down a slope: when the temperature is high, the algorithm has a higher ability to go down, or, you can say, it can go down a greater distance.

As we gradually reduce the temperature, its ability to go down decreases (remember, this is a maximization problem), which means that it remains easier for it to go from A to B than to come from C to B, which means that it is more likely to end up in C. Because, after the temperature has come right down, if it happens to be here it will go to C, and if it happens to be here it will go to A.

That is the hill climbing part of it. So, this is the kind of intuition behind simulated annealing: initially you allow the algorithm to explore many different parts, but you gradually bring down the temperature, in the process in some sense pushing it towards the global maximum, towards the higher peaks in this example. Why? Because at any given temperature which is not infinite (if you take 10^10 to be infinite), it is more likely to climb down from a low local peak than from a higher peak, which means it is more likely to climb from a local peak up to a higher peak; that is, it is more likely to go from A to C at any given temperature. And eventually you bring the temperature down, so that it just climbs to the nearest peak and stays there.

Actually, there is a simpler version of randomized algorithm which I have not mentioned, but this is probably a good time to mention it. That algorithm is called iterated hill climbing, and it works in the solution space: something like solving SAT or TSP, where you are looking for a solution, not for a path from a start state to a goal state.

In fact, there is no notion of a start state. There is no start, unless of course you think of the empty board as a start state, but that is not really critical to solving the problem; you just have to find the final solution somehow or other. And, as we saw when we looked at solution space search and perturbation methods, you could start at any random location.

So, again consider SAT as an example. Supposing you have n variables, you can choose any assignment for those n variables, and that is the starting point from which you apply the operators that we discussed earlier.
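
For SAT, a random starting point is just a random truth assignment; a minimal sketch (representing an assignment as a list of booleans is an assumption, not the lecture's notation):

    import random

    def random_sat_assignment(n_vars):
        # One random starting candidate: a truth value for each variable.
        return [random.random() < 0.5 for _ in range(n_vars)]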

(Refer Slide Time: 44:48)

So, what does iterated hill climbing say? Essentially this: choose a random state as a starting point; in this example, that would mean choose one here, choose one here, choose one here, and so on. Put it in some kind of loop: choose a random start and simply do hill climbing, nothing else. So, you can imagine that this starting point will take me to this maximum.

That starting point will take me to that maximum, and so on and so forth. And if you have chosen a sufficiently large number of random starting points, then one of them is likely to hit the global maximum; that is the hope. Of course, it depends on the nature of the surface: if it is very rugged, you may need many, many starting points, but if the surface is like this, then you can see that if we start anywhere in this range we will end up at the global maximum.

So, iterated hill climbing is a very simple algorithm. It says: do many hill climbing searches, but start at different randomly selected points. And because hill climbing is simple, computationally this algorithm is simple. You can also think of it as a parallel search: you start all these searches at the same time and they all run in parallel; at the end, you just find the best candidate that you got, and that is the answer.
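
A minimal sketch of the whole scheme, assuming the problem supplies random_start(), neighbors(c) and eval_fn (illustrative names):

    def hill_climb(c, neighbors, eval_fn):
        # Plain hill climbing: stop at the first local maximum.
        while True:
            n = max(neighbors(c), key=eval_fn)
            if eval_fn(n) <= eval_fn(c):
                return c
            c = n

    def iterated_hill_climbing(restarts, random_start, neighbors, eval_fn):
        # Independent climbs from random starts; return the best peak found.
        peaks = [hill_climb(random_start(), neighbors, eval_fn)
                 for _ in range(restarts)]
        return max(peaks, key=eval_fn)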

So, there is something to be said for searching with more than one candidate. We have already seen a couple of examples; beam search always selects the best few candidates at the next level, and so on, if you remember. This is a little bit like beam search, except that here these searches are not allowed to interact with each other.

So, this one will have its own trajectory, and this one will have its own trajectory, and so on, but they are separate parallel searches taking place. Does it make sense for two candidates to interact in some way? If you are doing a search in which there are many candidates, like this parallel search, does it make sense for two candidates to interact? We will address that in the next class, where we will look at a new algorithm which is based on working with a population of candidates that are allowed to interact with each other in some sense. We will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science
Indian Institute of Technology, Madras

Lecture No. - 15
Optimization II (Genetic Algorithms)

Let us go back to nature today, and remember we are doing optimization. So, does nature do optimization?

Student: Yes sir, yes sir, ant colony optimization.

Somebody said yes. Yes, but we are not going to say ant colony optimization; that is an algorithm that we may not get time to look at today, probably in the next class. What is the optimization happening there?

Student: Natural selection.

Natural selection, which is different from ant colony optimization.

Ant colonies are known to be very good at finding food: if you leave out anything edible, a biscuit or a lump of sugar, then you can be sure that within some time the ants will be there; somehow they find it. We will look at that in the next class, maybe; today, I think, we will go to natural selection.

So, many of us consider human beings to be the pinnacle of life forms, and we can say we have come through a process of natural selection, which of course is attributed, the theory is attributed, to Charles Darwin; you must have heard the well-known phrase, survival of the fittest. But what is nature doing?

So, let us step back and see what it is trying to do. We can say it is trying to design life forms. We have a whole wide spectrum, a diversity, of life on earth, starting with, not cockroaches, but, you know, amoebas, and ants, and birds, and animals, and we think that we are on top of it all. What would be the objective of this design process? What is it trying to design?

I mean, of course, it is not as if nature is literally trying, step by step, to design something. How many of you have read anything by Richard Dawkins? Which one, which book have you read?

Student: The God Delusion.

The God Delusion; that is one of his most recent books, I think, in which he is kind of against the institution of religion. But anyway, one of his earlier books was called The Blind Watchmaker; it was quite a well-known book, quite a hit in its time. The argument in Western society was whether or not there was a God who created the universe and the earth, with human beings at the center of it.

And the argument was, you know, that everything is so perfect: the world is so perfect, we have just the right amount of oxygen in our air, we have everything that we need, we have water flowing, we have day and night and all this kind of stuff, so there must be a watchmaker. If the universe functions as perfectly as a Swiss watch, for example (Switzerland is known for its mechanical watches), then it could not have come out of randomness.

So, this was an argument against Darwin. Darwin said that we came out of evolution, and people were aghast at the notion that we evolved from apes; sometimes I think that the apes might be more aghast at this notion. The idea was that such a perfectly functioning world must have a maker behind it, like a watchmaker.

But Dawkins and Darwin are on the same side; they say that there is a built-in ratchet mechanism, in the sense that if some improvement is found somehow, then that improvement sticks around. So, there is a ratchet mechanism: if by mistake or by some chance event something is designed that works, then it has a tendency to persist.

(Refer Slide Time: 05:21)

There is a person called Steve Grand, somewhere in the UK, who wrote this book called Creation, of which the institute has one copy, which is with me at this moment; if you want to have a look at it, you can come and borrow it. You may have heard about creatures, these computational artefacts that people make.

Life forms that you create in the so-called cyber world: artificial fish, artificial this, artificial that. If you just look up "creatures" on the net, you will find some sites, one of them maintained by Steve Grand. In this book that he wrote, he wrote something very profound, at least I found it profound.

Just imagine a random world where there is no watchmaker, there is no creator. He says, and this is a kind of tautology: whatever persists, persists. Obviously; and yet this is the fundamental thing behind how life has emerged out of chaos, I might say. After the big bang, I suppose, there was chaos for a long time, and if you follow what Steve Grand is saying, then you can make an equivalent statement about the goal of life.

So, if life has some driving force, it is to live, and by live we essentially mean persist. But one cannot ascribe some external high-level goal to life; it just happens that we happen to persist, so we are here. The goal of life is simply to be there. Now, you can say that nature is experimenting, though I have to be careful in saying that.

Again, when we say nature, we do not want to anthropomorphize: nature is not like a god or some devata or somebody doing something; nature is just a term we use for everything that is around us, a process creating different designs for life forms. At least we can view it like that; very often people, for the sake of explanation, ascribe a design motive to things which are simply happening.

So, why did we develop a neocortex? Because, you know, it helped us reason at a symbolic level, we like to think like that; but of course there is no motive behind it. It just so happens that we evolved a cortex which allowed us to do thinking, so we are thinking. This is what Steve Grand says: whatever persists, persists. And this is the basic mechanism behind nature's effort to design life forms, and the goal of these life forms is nothing but to live, to persist; that is our basic instinct, to persist.

So, if you walk out of this building and you see somebody with a gun pointing at you, the first thing you do is run; that is our inbuilt nature to persist. And if you think of it in combination with what Steve Grand says, just imagine that in the initial chaos there were only atoms floating around; then molecules formed, then molecules combined into bigger molecules.

And then eventually organic molecules came into being which had this tendency of surviving and sticking together, and then of course of making copies of themselves, and so on and so forth. Things which happened, happened, and those things which survived, survived, which is what Darwin called survival of the fittest. Of the many things that are tried out, some things survive, and they are the fittest.

So, when you say survival of the fittest, one has to look at it from a careful perspective: it is not as if you first determine fitness and then talk about survival. It is the other way around, in fact. When you say survival of the fittest, what we really mean is that the ones who survive are the fittest.

Now, these life forms compete with each other. What do they compete for? They compete for whatever resources they need: material resources, primarily, like food and so on. And there is a limited amount of food; remember that food provides the very matter for building our bodies. A child grows by eating food and having that food transformed into various things inside its system.

So, they compete for resources, which means that some life forms succeed and some life forms fail. Just imagine that there is a bunch of people in a room, let us say there are 50 people in the room and there are only, let us say, 20 cakes or something like that, and of course you cannot divide a cake. Then there is some mechanism by which people are given cakes: some 20 people will get cakes and 30 people will not. This is a very extreme picture, but something like this is what is happening in nature, except that the different life forms there are like different species.

(Refer Slide Time: 13:00)

And nature is experimenting; again, I would like to qualify this: when we say nature is experimenting, nature is not like a person setting up an experiment. These things are happening on their own, and we are viewing them as experiments.

So, how is nature doing this? Nature is doing generate and test; it is searching. Remember that we started off by saying that the basic paradigm of the search algorithms we looked at is generate and test: generate a candidate and test whether the candidate is a solution. For nature the situation is a little different: nature does not have a goal.

So, we do not have a goal test function; by testing we mean something else here. And what do we mean by generate? There was a French poet, whose name I unfortunately do not remember right now, who said of this process of evolution that one makes up the combinations and the other chooses.

So, there are two forces at play: one which makes up the combinations, which we will call generate, and the test, which is basically "the other chooses". These are the two things. What do we mean by "the other chooses"? There is a force, natural selection, which selects who will survive: some creatures will survive and some will not.

So, there is selection happening, which is like a test function, and there is a process of generating, of creating new creatures, happening. So, just imagine this world in which new creatures, and by creatures I mean individuals, are being created all the time. And each of them is competing with other creatures; they may be from the same species or they may be from different species.

But everybody is competing for food essentially. At the fundamental level, the strongest basic urge is to eat, and without that we cannot survive essentially. How do these combinations come about? Let us talk about the other chooses before we do that. So, just imagine that there is a species of rabbits, and there are these rabbits on a field, and they are being chased by foxes, let us say on the same field. So, there is a relation between them, the fox species and the rabbit species. And I will use this arrow to denote a positive influence on a population essentially.

So, this is what some people call an influence diagram. How does the population of rabbits influence the population of foxes? We say that there is a positive influence, because the more rabbits there are out there, the more the foxes will get to eat, and therefore the foxes will survive essentially.

And of course there is a negative influence of the population of foxes on the rabbits. Then of course there is this whole ecosystem out there. So, when you have foxes, then you have, let us say, tigers; when you have tigers, you have, let us say, bacteria.

And you are ((Refer Time: 17:24)) familiar with this, of necessity. The birds help pollinate the flowers, and there is this dependence across the whole ecosystem; it is a stable ecosystem, in which every population is having some influence on some other population essentially.

So, nowadays one hears that if suddenly bees vanish from this world, and you might have heard some news item about this, it will affect certain food chains, which means we stop getting things of a certain kind. So, we have these stable ecosystems, but we also have catastrophes on the way essentially.

So, something like 65 million years ago, when the predominant life form was the dinosaurs, mammals were small, smelly creatures hiding from them. But suddenly something happened, and we do not quite know what happened: the dinosaurs became extinct and mammals became prominent, and then humans, of course, the most prominent of them all essentially. So, we are talking about competition.

(Refer Slide Time: 18:45)

Thus there is competition between species. So, let us say we have rabbits and cows and goats and sheep, all trying to go for the same grass ((Refer Time: 19:03)). I do not know whether rabbits eat grass, but at least cows and goats and sheep are all competing for grass essentially; so, there is competition between species. And there is also competition within a species. We are going to focus on this, competition within the species. What do we mean by this? Just imagine again this bunch of rabbits on a field, and let us say 2 or 3 foxes come running, and the rabbits all run away.

Or more likely, if you have watched wildlife documentaries ((Refer Time: 19:39)), there is a bunch of deer, and one or two lions come running, and all the deer run away. And the lion catches, in some sense, the weakest one, the one which cannot run away fast. So, the lion catches the weakest one, and the faster deer escape essentially.

And the ones that escape live to run another day ((Refer Time: 20:02)). So, there is competition within the species. If there is a predator, and there is a species which is preyed upon, there is competition within each species, to escape or to catch prey ((Refer Time: 20:22)) essentially. So, again, the strongest lions will be able to catch deer; by strongest we mean fastest. And, whatever the good quality is, the fastest rabbits will be able to escape.

So, fastest is just a term that I am using; it is not necessarily fastest, it could be something else: cleverest, or something which can disguise itself very well, or any of the kinds of things that nature adapts towards. So, there is a constant battle between the predator and the prey essentially; both are evolving.

How are they evolving? Because the competition within the species is selecting better and better individuals, whose traits propagate in some fashion to the next generation essentially. So, that brings us to one makes up the combinations. What is the combination mechanism that nature has adopted? It is sexual reproduction. And what does it lead to? It leads to genetic mixing; this is the key. How does nature experiment with different life forms?

It mixes up the genes of individuals. So, every creature, at least on this earth, is born of two parents and inherits genes from both the parents. And of course there is this whole theory of which genes are the dominant ones and which are the recessive ones and that kind of stuff. But eventually there is an experimentation with newer and newer combinations of genes that life is doing.

So, we use the term genotype for this. Nature is experimenting with the genotype; the genotype is like the design of a creature. Of course, amongst human beings we say that this child has inherited the height from one parent or both parents, or the colour of hair or colour of eyes, all kinds of things that you inherit from parents.

So, you inherit some things from one parent and some from another parent, and within a human family one sibling may be brighter than another sibling and things like that, because they have inherited different things from different parents and so on. So, we have this whole churning of the genes which is happening with the genotype.

And it is at this other level, which we call the phenotype, that the competition is happening. It is the physical creature which is competing in the real world essentially. So, what does this have to do with optimization? We want to look at a method of optimization which takes inspiration from this world of natural selection.

So, the whole idea here is that of emergent behavior, and the idea of emergent behavior is that you put together simple parts to give you a more complex thing essentially. So, nature has these building blocks of life, which are called the genes, and it is trying out different combinations of genes and letting them loose in the world. And if they survive, they survive.

So, they are competing for food, they are competing for mates, they are competing for shelter; the ones who manage to win will pass on their genes to the next generation, and in the process we will have, in some sense, a fitter population. So, the population of rabbits becomes faster; again, I am using faster just because we were talking about it. As they evolve they become faster, and at the same time the foxes are becoming faster. So, there is a sort of war going on between them essentially. This approach to optimization rests on emergent behavior; it says let us try to do this so-called random mixing of genes.

So, again, this is a randomized process, and we see if we can somehow keep track of the best ones. What is the basic idea? You allow for random mixing of genes, and you have a mechanism for keeping the best. This is the thing: one makes up the combinations and the other chooses. So, there are these two processes of generate and test: one is combining things to produce new creatures, and the other is some mechanism which is judging which creatures are good and which creatures are bad, and allowing them to survive essentially. And in the process, we want to see if we can improve upon the designs and come up with the optimal design, whatever optimal means here.

(Refer Slide Time: 26:56)

So, this idea of optimization is known by the name of genetic algorithms, also called GAs. They were devised in the 1970s by John Holland, and one of his students, David Goldberg, is still very active in this area. In fact, he has written a book, Genetic Algorithms in Search, Optimization, and Machine Learning, which is a very popular book on this topic essentially. So, the idea of genetic algorithms is that they work with populations.

The first thing, of course, is that you have to encode. You are working in the solution space, so you have to encode a solution as a string; I have just used the term string, but if I were from this community, instead of string I would say chromosome. It is a chromosome which contains the design decisions that you are going to work with, but we will tend to think of it as a string essentially. So, the first thing is to encode the solution as a string. The simplest example that we can think of is the SAT problem, where the solution is just a bit string; or TSP can be encoded as some sort of string.

And for any kind of problem that you want to solve with GAs, you have to encode it as a string essentially. So, this population is a population of strings. So, we have this population P1, ..., Pn; let us say the population has n elements inside it essentially.

So, the first thing is to encode: candidate solutions are strings. Essentially, what are we trying to do? We will work with a population of initially randomly generated candidates, and then try to mix up the strings to produce new candidates which will hopefully be better. And how do we decide better?

So, we do not let loose these designs into the real world as nature does. We devise something called a fitness function, f(n), which is the same as what we called eval(n), which was the same as the h(n) we started with.

Remember that we came to this world of optimization by starting with a heuristic function, and we called it the evaluation function because that is what the optimization community calls it. The genetic algorithms community in particular calls it a fitness function essentially. So, with every candidate we have a fitness value, which tells us how good that candidate is. I have to see how much time we will have. So, there are 3 basic steps to a genetic algorithm, and we start with an initial population P.

(Refer Slide Time: 31:41)

So, this much is the preliminary, what we do when designing the system; but the algorithm itself, the GA, has 3 steps. The first step is called selection, which is proportional to fitness. So, unlike the real world, a genetic algorithm does cloning: it just creates copies of candidates.

What do we do? We have these candidates P1, P2, P3, ..., Pn, and for each of them we have a fitness value, and we reproduce this population, clone individuals in this population, in proportion to their fitness, which means we will make more copies of fit candidates and fewer copies of unfit candidates; fewer could mean 0 copies, and more could be 1, 2, 3, 4, anything.

How do we do this? It is a random process. We can think of it as a roulette wheel ((Refer Time: 33:34)) in which we assign sectors to the candidates P1, P2, P3, P4 and so on, with areas proportional to their fitness. The amount of angle each one gets is proportional to its fitness value; P4 is very fit, so it gets a large angle, P3 gets less, and so on and so forth. And we just rotate it randomly, and we have some pointer; wherever it stops, that candidate gets to reproduce once.

So, we rotate it, and let us say it stops at P4. Remember that this is something which can rotate, and we give it a random rotation ((Refer Time: 34:26)) and observe where the needle is. We do this n times and generate a population of n new elements; let us call it P1 prime, P2 prime, ..., Pn prime. So, this is the first step, selection. We are just making copies of the original, but with the difference that not everyone gets to make a copy.

Fit members have a greater chance of making more copies; you can see that it is very likely that the needle will stop at P4 more than once essentially. So, P4 may get more copies, and this thin sector here, whatever it is, will probably get 0 copies or something like that. But of course it depends upon chance; it is a randomized algorithm, and we cannot say deterministically what will come about. All we are saying is that we will spin this roulette wheel n times, and whichever candidate's name appears against the needle, we will create a copy of that candidate.
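As a minimal sketch of this roulette-wheel selection (the function and variable names here are my own, not from the lecture), one spin picks a point on the wheel and walks through the sectors until it finds the one the point lands in:

```python
import random

def roulette_select(population, fitness, n):
    # Clone n candidates, each chosen with probability
    # proportional to its fitness value.
    total = sum(fitness(p) for p in population)
    clones = []
    for _ in range(n):
        spin = random.uniform(0, total)   # where the needle stops
        cumulative = 0.0
        for p in population:
            cumulative += fitness(p)      # walk through the sectors
            if spin < cumulative:
                clones.append(p)
                break
    return clones
```

In recent Python versions the same effect can be had with random.choices(population, weights=..., k=n), but the explicit loop mirrors the wheel being spun n times.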

So, we will create a new population of n candidates, and you can see that in this population the fit members will have a greater chance of appearing more than once, and unfit members may get weeded out essentially. So, it is a little bit different from what happens in nature; we do not create the final individuals and send them out to compete.

We allow candidates to clone in proportion to how good they are, and they can get to clone more than once. After that, the second step, in the parlance of the GA community ((Refer Time: 36:18)), is called crossover. And what do we do here? We randomly pair up this population.

So, for example, we may pair P1 with some other one, P2 with somebody else, and so on and so forth. We randomly pair off the elements of this new population, and we randomly mix up their genes; that mixing up of genes is called crossover, and that is just the term that we use. The idea is something like this. For example, let us say P1 is made up of genes which we call x, and let us say that there are 9 components.

So, for example, if it is a SAT problem with 9 variables, then there will be 9 bits essentially. We are just calling them x1 to x9. And let us say P1 has been paired with P2, or, to keep it general, call them Pi and Pj, and let us call the genes of the second one y1, y2, y3, up to y9.

So, Pi and Pj are the two parents which have been mated together by this random process of pairing. We just randomly mate two elements, and then we want to mix up their genes. One way of doing it is to just draw a line somewhere and generate new children. So, for c1 we take x1, x2, x3, x4 from here, and then y5, y6, ..., y9 from the other. The other child gets the other genes, so c2 will get y1 to y4 and x5 to x9.

This process of mixing of genes is called crossover, and you can see why it is called crossover: some genes are crossing over from one candidate to the other, and vice versa. This particular crossover is called single-point crossover, but you can devise other crossovers; there is nothing sacrosanct ((Refer Time: 39:17)) about this single-point one, we have used it just to illustrate.

So, we randomly choose a point, and everything on one side goes to one child and the other side goes to the other child, exchanged likewise for the other one. So, we can imagine a SAT problem in which one child gets the first 4 bit values from one parent and the next 5 bits from the other parent, and likewise for the other child essentially.
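Here is a minimal sketch of single-point crossover on equal-length chromosomes (the names are my own; this is the generic operator, written for lists of genes):

```python
import random

def single_point_crossover(parent1, parent2):
    # Pick one random cut point, then swap the tails:
    # child1 = head of parent1 + tail of parent2, and vice versa.
    assert len(parent1) == len(parent2)
    point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

# For example, with 9-bit SAT candidates and the cut after bit 4,
# one child gets x1..x4 + y5..y9 and the other gets y1..y4 + x5..x9.
c1, c2 = single_point_crossover([0,1,1,0,1,0,1,1,0], [1,1,0,0,1,1,0,0,1])
```

Using random.randint(1, len - 1) keeps the cut strictly inside the string, so each child really does get genes from both parents.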

So, you can see the two components we talked about: one makes up the combinations is being done by the crossover part, and the other chooses is being done by the selection part essentially. And we basically put this into a loop: you take a population, generate a new population by selection ((Refer Time: 40:11)), do crossover, and go over it all again.

Now, for the sake of completeness: sometimes a population may be lacking a gene essentially. So, how do new kinds of species emerge out of this process? We have a third step. The first step is selection, the second step is crossover, and the third step is called mutation. And by mutation we mean a random, and not only random but rare, event. Every once in a while you change one gene randomly in one individual. So, we have this whole population of n elements, which is reproduced according to fitness; then we do crossover to mix up the genes; and then we make some random change in some candidate.

So, this is just to allow for new genes to come in. For example, suppose you have this SAT problem with 9 variables and a population of 50 candidates, and it so happens that all 50 of them have their 3rd bit as 0 essentially. Now, we can see that if all 50 of them have the 3rd bit as 0, then however much we do this churning, the 3rd bit will remain 0, because both parents are always going to have the 3rd bit as 0, and the crossover operation will then produce the 3rd bit as 0 in both the children essentially.

So, you have lost that 3rd bit in some sense. To cater to something like that, we have a random change somewhere, in mutation, but we do it very rarely essentially. It is there just for the sake of completeness. The primary processes are selection and crossover, but there is this process, applied once in a while, in which we make some random change in some candidate.
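Putting the three steps together, here is a minimal sketch of one generation of a GA over bit strings. It reuses the roulette_select and single_point_crossover sketches from above; the mutation rate of 0.01 is an arbitrary illustrative choice:

```python
import random

def mutate(chromosome, rate=0.01):
    # Rarely flip a bit; this can re-introduce a gene the whole
    # population has lost (e.g. the stuck 3rd bit above).
    return [(1 - bit) if random.random() < rate else bit
            for bit in chromosome]

def ga_generation(population, fitness):
    # Step 1: selection, cloning in proportion to fitness.
    clones = roulette_select(population, fitness, len(population))
    # Step 2: crossover over random pairs (assumes an even size).
    random.shuffle(clones)
    next_gen = []
    for p1, p2 in zip(clones[::2], clones[1::2]):
        c1, c2 = single_point_crossover(p1, p2)
        # Step 3: occasional random mutation.
        next_gen.extend([mutate(c1), mutate(c2)])
    return next_gen
```

The loop the lecture describes is then just repeated calls, population = ga_generation(population, fitness), until some stopping criterion.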

So, when we talk about SAT we are talking about bits, but otherwise we have to do this encoding in a more careful fashion. One bit per component is nice, but for most problems you cannot have a component represented by 1 bit; you will have a chunk of bits. Then you have to devise operators which move around chunks of bits rather than just single bits, and you have to be careful with mutation, so that you mutate something into something meaningful and not into something which is not meaningful ((Refer Time: 43:12)).

So, from this we get a new population; let us call it P1 prime prime, P2 prime prime, ..., Pn prime prime. The first P prime population we got by cloning, and this P prime prime population we got by crossover, and let us say mutation has been taken into account here as well essentially.

Now, there are variations in what people do at this stage. If I assume that these are sorted on fitness values, just for the sake of presentation, then the most fit members are at one end and the least fit members are at the other. So, what one might do is take the most fit k elements, where k is a little bit smaller than n, and use them to replace the least fit k elements.

So, what are we doing here? We started with this population of n elements, and let us assume that they are sorted according to fitness, just to allow me to draw this diagram. We clone the entire population to get a new population, and when I say the entire population, it means that each candidate gets representation in proportion to its fitness value.

A very fit member will get more copies, and a very unfit member will get fewer copies or 0 copies; some will have to get 0 copies. Then you do this crossover, mix up the genes, to arrive at this population, which has now mixed up the genes of the old ones. Now, some people would argue: what if there was a very good candidate here? This P1, let us say its fitness value is 98 or something like that, and everything else, let us say, is around 67 and 65 and so on and so forth. Are you not losing your candidate with fitness value 98 essentially?

To cater to this, one way is to take the most fit k candidates in this new population and substitute them for the least fit k candidates of the original population, thereby keeping those few top candidates from the original population alive. It is just a design question.
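A minimal sketch of this kind of elitist replacement (the name elitist_replacement and the exact policy are my own rendering of the idea, not the lecture's notation):

```python
def elitist_replacement(old_pop, new_pop, fitness, k):
    # Next generation = the old population with its k least fit
    # members replaced by the k fittest of the new population.
    best_new = sorted(new_pop, key=fitness, reverse=True)[:k]
    keep_old = sorted(old_pop, key=fitness, reverse=True)[:len(old_pop) - k]
    return keep_old + best_new
```

With k = n, the new population replaces the old one entirely, which, as noted next, is what really happens in nature.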

You can say k equal to n, and that is what really happens in nature essentially; nature does not allow one to live indefinitely. If you read some of this work by Goldberg, in fact this book by Goldberg, he goes on to study how we should design these encoding schemes so that good genes, in some sense, have a tendency to stick together and carry forward essentially.

So, let us say we have this SAT example with 9 genes or 9 variables, and let us say that for some genes in some candidate, let us say x3 and x4, we have found what the values should be in the ultimate solution. How do we make sure that they stick around together? So, Goldberg talks about things like clusters of genes and so on; he has some notion that says that if you have genes which are by design kept together, then they will stay together: they are more likely not to be split, broken up essentially.

Essentially, one of the things this genetic programming or genetic algorithms community is concerned with is: is there some way we can ensure that when a candidate has good genes, those good genes stay together, and do not get broken up and divided? Because in this case, as we can see, it is a very clear-cut exchange of genes essentially.

For example, if x3 is a good gene, it will go to exactly one child in this scheme of things, because c1 will get x3 but c2 will get y3 essentially. So, it is not as it happens in the real world, where a good gene can go to both the children; in this case it will go to only one child.

So, if x3 and x5, for example, happen to be good genes, how can we ensure that good genes stay together, meaning they stay in one candidate? This is the kind of stuff that Goldberg has looked at. We will not go into those details here, but you can see that this scheme of things, where we select the k best and replace the k worst with them, is in some sense trying to keep the good candidates alive essentially.

Of course, this P prime prime population consists of totally new candidates, because they got generated by crossover; P1 crossed over with some other element could be totally different, and it could in fact be better than 98, that is quite possible essentially. That is the hope: as we mix up genes, the candidates with the good genes will propagate their genes more often, and we are arranging this by the process of selection, which is proportional to fitness essentially.

So, I think I will stop close to here; if there are any questions I can answer them. In the next class, when we meet, I will take up an example, again from Goldberg's book, to try and give you the intuition ((Refer Time: 49:55)) behind genetic algorithms. But I also want you to think about the community's favorite problem, which is the travelling salesman problem. How can we use genetic algorithms for solving the travelling salesman problem? Before we go, let me say why this is tricky, given the way that we have encoded TSP so far. Let us say we have 9 cities.

(Refer Slide Time: 50:29)

So, let us say parent 1 begins 2, 3, 7, 1, ..., some permutation of the nine cities, and parent 2 is, let us say, 1, 2, 3, 4, 5, 6, 7, 8, 9. I have these two parents, which means I have these two tours. Here you go from 2 to 3, 3 to 7, 7 to 1, and so on; there you go from 1 to 2 and 2 to 3 and so on and so forth. You can see that the single-point crossover will not work here, because if I just take a random point, let us say after 4 cities, then what am I doing? I am going from 2 to 3 to 7 to 1 to 5 to 6 to 7; you can see I am revisiting a city, closing a loop even before visiting all the cities, and I am not allowed to do that.

In a travelling salesman problem, the solution must be a permutation of the cities; but if I do this single-point crossover and take these cities from here and those cities from there, what I get is not really a candidate tour, because this tour has a loop inside it.

So, again: from 2 to 3, 3 to 7, 7 to 1, then you go to 5 and then to 6 and then you come back to 7. And you are never visiting some cities; you are never visiting 4, for example. So, that is not a candidate solution. What we want is a population of candidate solutions, and what is a candidate solution? One in which every city is visited exactly once, which basically means it is a permutation of the numbers 1 to 9; and this sequence of numbers is not a permutation, because 7 is repeated and 4 is missing essentially. So, it is not a permutation. How can we devise crossover operators for the TSP?

So, this question is in two parts. Can we think of a different representation of the candidate solution, and can we think of different operators, different crossover operators? And there has been a considerable amount of work done in trying to experiment with different kinds of crossover operators.
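To make the failure concrete, here is a small script completing the lecture's partial example (the full parent 1 tour is a hypothetical permutation I chose; only its first four cities, 2, 3, 7, 1, come from the lecture):

```python
def is_valid_tour(tour, n):
    # A candidate TSP solution must be a permutation of cities 1..n.
    return sorted(tour) == list(range(1, n + 1))

parent1 = [2, 3, 7, 1, 4, 5, 6, 8, 9]
parent2 = [1, 2, 3, 4, 5, 6, 7, 8, 9]

child = parent1[:4] + parent2[4:]   # naive single-point crossover
print(child)                        # [2, 3, 7, 1, 5, 6, 7, 8, 9]
print(is_valid_tour(child, 9))      # False: 7 is repeated, 4 is missing
```

Any crossover for TSP therefore has to be permutation-preserving; designing such operators is exactly the question deferred to the next class.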

So, we generalize the notion of crossover to say: you take two parents, and you must somehow get two children which are generated from those two parents essentially. So, these two things we will do in the next class to start with: look at an example of genetic algorithms for a simple problem, and then look at how TSP can be solved using GAs. I want you to give some thought to it before you come to the next class, because we will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 16
Population Based Methods for Optimization

(Refer Slide Time: 00:18)

So, we continue looking at optimization, and we are looking at methods which we have called population based. Basically, when we say population based methods, and we will spend some time on this today, we mean that instead of working with a single candidate solution, as we do for example in hill climbing or in simulated annealing, we work with a population of candidates. Now, the benefits that we get by working with populations are not just speedup, because it is not as if you are simply running the same algorithm k times, or n times if n is the size of the population. You also benefit from the fact that different candidates can contribute jointly to exploring new solutions. So, essentially we are exploring the solution space, which for real problems may not be a smooth surface, and we are looking for methods which are robust, which will do reasonably well even when the terrain is jagged essentially.

So, in population based methods, the first method that we are looking at we have called genetic algorithms; some people also call them evolutionary algorithms. And they have a flavor of bottom-up problem solving, which we have also called emergent. By emergent systems we mean that we put together a collection of simple elements, and more complex behavior emerges out of that. As we might have said before, the human brain is a perfect example of that: every neuron in our brain is computationally a simple device, but it is just that we have billions of them in our head, and it is the connections, the way that they are connected to each other, that really gives rise to our individual brains essentially. So, the idea in genetic algorithms is that we have candidate solutions, rather a population of candidate solutions, and each candidate is made up of components.

So, you can think of some kind of a breakup into components. Now, in the simple case that we saw of solving SAT, every component is the value of one variable, one bit, and a SAT solution candidate is a bit vector; but in general, problems may be more complex, and components may themselves be more complex in nature, in which case, of course, we have to be very careful. Because the general strategy in solution space search, or perturbation search, that we are exploring is that we perturb candidates to produce new candidates. So, if we had this as a candidate, then simulated annealing might try to generate a neighborhood around it by changing one or 2 components, as we saw in the case of SAT. So, one has to do this a little more carefully essentially.

So, when we talk about genetic algorithms, we have to talk about encoding a candidate solution as a string; in terms of genetic algorithms we would call this a chromosome, borrowing from the biological world. And essentially there is an interplay between chromosomes of different parents, which basically modifies the design of the candidate, or the design of the solution essentially. So, we had mentioned in the last class that there are these 2 things: one is the genotype, which is made up of the genes, and that influences, or decides, the behavior of the phenotype, which is the individual, the creature, or whatever is participating in the real world essentially. So, in the genotype we look at recombinations, and the phenotype basically competes and participates in selection. These are the main 2 components of the problem solving strategy that you might say nature employs, which is trial and error with its designs.

And the design is expressed as a chromosome, which is some kind of a string that we are going to work with; and we said we use the word recombination. The idea in genetic algorithms is that you draw some components from one parent and some components from another parent, in the hope that you will inherit the good components of both parents. So, going back to the rabbits and foxes example, let us say there are 2 desirable properties amongst rabbits: one is that they can run fast, and the other is that they are smart, whatever that means. For example, you must have, as a child, heard a story about the rabbit who has to go to the lion in the jungle as food; and he is a smart rabbit, and he tells the lion of another lion, and that kind of stuff. So, either they are fast or they are smart, both of which contribute to survival essentially.

So, the whole idea of evolution is that in a population of candidates, the phenotypes, the fitter ones are the ones who survive the competition. The competition is for resources: it could be for food, it could be for mates, it could be for shelter, which is safety from the predator essentially. If you are good at doing these things, then you are likely to survive, which means you are likely to pass on your genes to the next generation, and so on and so forth. Now, imagine 2 parents of a rabbit: one of them is smart and the other one is fast essentially. The hope in genetic algorithms is that if these 2 parents have, let us say, 2 children, which is the model that we are following here, at least one of them will inherit both these properties, of being smart and being fast at the same time, which means it will be like a super rabbit essentially; and super rabbits will give rise to more super rabbits, and so on.

So, in general, the rabbit population will become smarter and faster essentially. That is the whole idea behind genetic algorithms: like we said, one makes up the combinations. So, there is one process, one force, which makes up the combinations, which churns the genetic pool, and the other, which is the process of selection, which is that they compete for survival essentially. So, survival of the fittest; and when we say survival of the fittest, at least in the case of genetic algorithms, we are going to impose a fitness function on top of the algorithm, because it is based on the fitness function that a candidate will survive. In the real world it is the other way round: basically, whoever survives is the one who is fit essentially. So, I want to do one example to illustrate genetic algorithms today, but before that I want to spend a few minutes on this idea of emergent systems, because it is an idea which has a lot of followers essentially.

(Refer Slide Time: 09:32)

So, you must have heard about Conway's game of life, something called the game of life. Have you heard about the game of life? It appears as a screen saver sometimes, and the world is a grid, a cellular automaton. So, we have rows and columns and so on, and in the simplest form of this game, in every cell, every square, there can be one creature, and it is a very simple creature. There are rules for surviving, there are rules for being born, and even in this very simple world there are rules for dying essentially. Each cell is 1 or 0; 1 means it is alive, and 0 means it is dead, or the creature inside is dead.

So, the rules in Conway's game of life are these: if a 1 is surrounded by more than three 1s, then it is overcrowded and it dies, it becomes 0; if a 1 is surrounded by fewer than two 1s, it dies of isolation; if a 1 is surrounded by two or three 1s, it stays alive; and if a 0 is surrounded by exactly three 1s, then it comes alive. So, you have these very simple rules which control how a cell evolves over time. It is an automaton which goes through a sequence of time steps: at every time step, every cell decides whether in the next time step it will be alive or dead essentially. Now, it turns out that these very simple rules of life, if you want to call it life, give rise to patterns in this grid world which have a tendency to survive. Some patterns survive, some patterns oscillate between 2 or 3 shapes, and some patterns even move forward.

So, if I remember correctly, there is a pattern which looks like this: these five cells are 1, and maybe you should try the rules that I just mentioned on these cells. Then, over the next few cycles, this pattern will essentially shift a little bit, re-forming as it goes, and it will keep doing that. So, as time progresses, this pattern will appear to be moving in this direction. Now, this sort of takes me back a little bit to the introduction, when we were asking: what is reality? What is really out there? In this simple world, what is really out there is these cells which become 1 or 0 essentially, and that is about it; this whole world is full of cells, and each cell can be 1 or 0, and they follow certain rules about whether to be 1 or 0 in the next time cycle.
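Here is a minimal sketch of one time step of this world, using the standard rules stated above (the representation choice, a set of live coordinates, is my own):

```python
from collections import Counter

def life_step(live):
    # `live` is the set of (row, col) cells that are currently 1.
    # Count the live neighbours of every cell next to a live cell.
    counts = Counter((r + dr, c + dc)
                     for (r, c) in live
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    # Birth: exactly 3 live neighbours. Survival: 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# The five-cell moving pattern discussed next (the glider): after
# 4 steps it reappears as the same shape shifted one cell diagonally.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
for _ in range(4):
    glider = life_step(glider)
print(sorted(glider))   # [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
```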

But in this world we certainly see patterns like this; this one is called the glider. And this is a very old game: the game of life was invented by John Conway around 1970. So, if you just search for game of life or glider on the web, you will probably get to see an animation of this. It looks like a creature of this shape which is moving around essentially. Of course, we are also like that in some sense, made up of those 10 to the power 28 or whatever simple units I had mentioned, moving around in unison, pretending to be a creature. But that is the whole idea of emergent systems: you put together simple cells, simple components, which obey certain local rules; each cell only looks at its own neighborhood, it does not see anything beyond its immediate neighborhood, and its fate is decided based on that. You put in such local rules and build a world like this.

And you will find that patterns, because we tend to think of them as patterns, emerge essentially. So, basically, combinations of simple things tend to behave in a more complex way. More recently there has been a lot of interest in the following: imagine that instead of being the very simple creatures that these cells are, which only look at the neighborhood and decide whether to die or not to die, they were slightly more complex creatures. For example, they could have a little bit of memory of their own, and they could have a little bit of sensing. Then we call them populations of agents, or some people call them multi-agent systems, and we are talking about simulations here. So, a lot of people, for example social scientists, are looking at these multi-agent worlds, in which they put together simple agents and study their behavior. Of course, they put together the rules and so on and so forth, for example rules governing interaction or things like that, and look at more sophisticated implementations of this.

So, there is a language called NetLogo. You must have heard about Logo, of course, devised by Papert a long time ago to teach programming to children: there is a turtle which you can tell to move north or move west and so on, by so many steps, that kind of thing. NetLogo is a more sophisticated language of that lineage. It has been implemented by some people at Northwestern University in the US, and it is available freely online. I have some friends who are working with this and trying to implement these agent based simulations. So, this whole idea of exploration is to put together simple elements, impose rules upon them, and then study the behavior. People have studied, for example, crowd behavior; one of my friends who works in TCS in Pune has been doing this sort of thing, and he has simulated the behavior of crowds.

So, there was this fire in some restaurant in Brazil or somewhere, and he could replicate the crowd behavior seen there by imposing very simple rules on the agents essentially. What do you do if you are in a restaurant or a bar and there is a fire in one corner of the room? What would you do? So, impose simple rules upon the agents, and then you can actually study the behavior of a crowd essentially. And nowadays, of course, crowd behavior is of great interest, especially in situations like this Uttarakhand tragedy, or any tragedy that befalls us; you want to see how the crowd would behave essentially. So, one direction of this population based thinking is agent based simulation: you create simple agents, put them together in a world, and then simulate, or run, the world to see what kind of behavior emerges essentially.

(Refer Slide Time: 18:25)

But we will come back to our genetic algorithms, which are also a little bit like that, except that there is no spatial connotation: there we had a space made up of grids and cells in which the agents live, but in GAs we just have a population, and we do not really talk about space essentially. Though there have been people who have also started speaking about spatial distribution of populations. So, just as human populations are distributed across different continents and different countries and so on and so forth, they have tried to see whether the whole population can be partitioned into sub-populations, with evolution, basically interaction and mating, happening only within a sub-population and not outside it. And then they try to impose rules like migration: when is an agent allowed to migrate to a new country, if I may use the word; under what conditions will this migration be successful and under what conditions will it fail, and that kind of thing essentially. So, for those of you who have been following Australian news, elections are coming up there, and they have been taking a very strict stand against immigration.

So, they have said that anybody who arrives by boat will never be given Australian citizenship; this has been in the news essentially. You can have rules like this, and you can see how they will affect the population. But we work with simpler notions, which is that we just have a population. And what is the algorithm that we had talked about? The first step is selection. So, we have this population of some elements, and they are allowed to reproduce themselves based on some fitness value; that is the first step, which is selection. And we had said that we will implement the roulette wheel kind of mechanism, and of course you have to work out how to do that, which will tell us which element gets to reproduce. So, if there are n elements, we will spin the wheel n times, and whichever element comes in front of, let us say, the pointer will get to reproduce essentially. Then crossover, which basically means that we randomly pair up the new population that we have. So, let me draw this with squares; we randomly pair them. For example, we could pair this one with this one, this one with this one, this one with this one; maybe I should have an even number.

But still, we do random pairings and apply the crossover operator, which is basically mixing up the genes of the 2 parents and producing 2 new children. So, you can essentially see this as a parallel search in the same solution space, except that a move is made by somehow selecting 2 parents and then generating 2 new children out of them; instead of one parent giving rise to one child, as in, for example, hill climbing or simulated annealing, here 2 parents give rise to 2 children essentially. And then we had said mutation, which is a kind of rare event: once in a while you change some gene randomly essentially. This is meant to take care of the fact that some genes may get lost, and so on and so forth. So, let me take an example to illustrate how genetic algorithms work, and then we will come back to discussing the issues that you have to decide upon when you are using genetic algorithms for optimization. This is an example which I am taking from the book by Goldberg.

(Refer Slide Time: 23:00)

So, remember I had told you that genetic algorithms were devised by somebody called John Holland, and his student David Goldberg has written this very popular book on genetic algorithms. This example is basically from the first chapter, the introduction chapter, of his book. It is a nice example and I like it, because it illustrates the whole idea of GAs quite nicely. It uses a very small population, of size 4, so we have these 4 elements. Remember that the first thing we want to do is to encode. When we want to use a GA to solve some optimization problem, we have to first encode the candidate solution. Then we have to devise an evaluation function which will look at a candidate and give us a number. And ideally the evaluation function should be in sync with what we really want to do, which means that for the optimal solution the evaluation function must have the highest value essentially.

So, let us assume that we have done that, and then the genetic algorithm machinery comes into play. Now, in this very simple world I have 5-bit vectors, and I have a small population of size 4; these are the 4 candidates represented as bit strings: 01101, 11000, 01000, and 10011. And let us say that we interpret them as binary numbers, which means that the numbers we are talking about are 13, 24, 8, and 19. So, essentially these numbers are the candidates, but the bit strings are the representation. We have these 4 numbers, 13, 24, 8, and 19, and our optimization task is to find the largest number that you can represent with 5 bits. It may sound trivial to you, but we want to see how genetic algorithms work, and it is a problem that we understand quite clearly essentially.

And so we have to have this evaluation function. Let us say we decide that f is the fitness function which will take any candidate and give us a value, and let us simply assume it is x squared, a simple fitness function essentially. So, if we evaluate these 4 candidates, the values that we get are 169, 576, 64, and 361. Now, what is the role of fitness in our algorithm? Fitness influences the selection process: the chances of a candidate reproducing itself are proportional to its fitness. So, in our roulette wheel we will have 4 sectors with areas proportional to these numbers. The second candidate will have the largest area, the fourth candidate will have the next largest area, and so on and so forth.

So, we will have a roulette wheel which looks something like this: this is 2, this is 1, this is 3, and this is 4. We just rotate it, and whichever sector comes against the pointer, we select that candidate essentially. So, the probability of a candidate reproducing is proportional to the fraction its fitness is of the sum of the whole thing. This probability value is 0.14 for the first one. How did I get these values? You take the sum of these 4 numbers, which is 1170; then 169 divided by the sum is 0.14, and 576 divided by the sum is 0.49. That is how you get the probability values, which means that every time you spin this roulette wheel, the second candidate has about a 50 percent chance of being reproduced, the fourth candidate has about a 30 percent chance, and the other 2 candidates have smaller chances; that is how the areas are divided essentially.

So, the expected values, let us call them e, are basically these probability values multiplied by 4, because we are going to spin the wheel 4 times in our case. The number of instances of each candidate that we expect to see is basically 4 times its probability. This turns out to be 0.58, 1.97, 0.22, and 1.23. What are these numbers? These numbers are saying that after I have done the selection phase, and the selection phase is: I spin this wheel 4 times, and whichever candidate the arrow points to, I make a copy of that candidate; after I spin 4 times, I expect to see about 2 copies of the second candidate, about half a copy of the first, about one-fifth of the third, and about 1.2 of the fourth, roughly. But of course, in the real world these fractions do not mean anything; in the end I will see 4 whole candidates. So, let us assume that when we actually carry out this experiment, the counts that we get are 1, 2, 0, and 1. It is quite possible that when I spin the wheel 4 times,

I get 2 copies of the second candidate, one copy of the first, one copy of the fourth, and none of the third one essentially. So, let us say this is what happens; this is the first place where the random process shows its effect in producing the next population. So, the population after the selection phase is this: we started with the original population, and we end up with this one. Let us redraw that population: one copy of 01101, two copies of 11000, and one copy of 10011. This is my new population. The selection phase is over, and now begins the crossover phase. So, let us say we pair the first two, and then we pair the other two, and we pick some random crossover points. We are doing single-point crossover, which means that at some point we will break one chromosome into two parts, break the other one at the same point, and then exchange the parts essentially.

So, let us say that our crossover points are like this: one is here, after the fourth bit, and the other one is here, after the second bit. This will give us a new population. From the first pair, 0110 will come here, followed by a 0, because that 0 comes from the other parent; likewise, 1100 will come here, followed by the 1, which comes from the other parent. And a similar thing will happen with the second pair: we get 11011 and 10000. So, just to emphasize: I take this part from one parent and that part from the other, and I get this new candidate, and likewise for the others. So, this is my new population at the end of one round, one cycle. Let us see whether we have a better population than the one we started with. If you look at the average fitness of the original population, it happens to be 293. People who work with genetic algorithms tend to talk about average fitness, about whether in general the population is becoming fitter; but of course we are also interested in the most fit element, which is a different thing essentially.

So, let us see how the new population looks. These strings correspond to the numbers 12, 25, 27, and 16, and their f(x) = x squared values are 144, 625, 729, and 256. The first thing to note is that the average fitness for this new population is 439. So, starting with an average of 293, we have gone through one cycle of reproduction and mating, or crossover, and we have got a new population whose average fitness is 439 essentially. This is the kind of thing GAs are looking for: can we get fitter and fitter populations? The whole idea is that if you have some function which you are trying to optimize, then initially your population may be spread over the whole domain, say a one dimensional domain; but after you have run the GA a sufficient number of times, you will find that the population is concentrated towards the peaks.

And one of them, hopefully, is the solution that you are looking for essentially. So, that is the general idea behind genetic algorithms.
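A quick script to reproduce the arithmetic of this example (the numbers are from Goldberg's example as quoted in the lecture; only the variable names are mine):

```python
pop = [0b01101, 0b11000, 0b01000, 0b10011]   # 13, 24, 8, 19
f = lambda x: x * x                          # the fitness function

fits = [f(x) for x in pop]                   # [169, 576, 64, 361]
total = sum(fits)                            # 1170
probs = [v / total for v in fits]            # ~[0.14, 0.49, 0.05, 0.31]
expected = [4 * p for p in probs]            # ~[0.58, 1.97, 0.22, 1.23]
print(sum(fits) / len(fits))                 # 292.5, i.e. ~293

# After selection (1, 2, 0, 1 copies) and the two crossovers:
new_pop = [0b01100, 0b11001, 0b11011, 0b10000]    # 12, 25, 27, 16
print(sum(f(x) for x in new_pop) / len(new_pop))  # 438.5, i.e. ~439
```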
So, let us look at this next population: what would happen in the second cycle? We will not repeat the whole thing, but let us just start thinking about it. We look at the probabilities again; if you remember, the probability is obtained by dividing by the sum of all 4 fitness values. So, 144 divided by the sum gives me the probability of the first one showing up on the roulette wheel, and the roulette wheel has changed now. The probabilities work out to about 0.08, 0.36, 0.42, and 0.15, and corresponding to them the expected values, which are basically 4 times these numbers, are about 0.33, 1.43, 1.66, and 0.58. So, I want you to now inspect this new population. We started with a population whose characteristics are given here: average fitness 293, and expected numbers of clonings 0.58, 1.97, 0.22, and 1.23. And here the expected numbers are 0.33, 1.43, 1.66, and 0.58. So, look at this new population and its fitness values and make some observations. Well, the average fitness, as you can see, has become 439. The maximum fitness there was 576, and here it is 729; so that has also gone up. So, is there any problem? What will happen if I continue this cycle? Will I get the solution that I am looking for?

Student: All ones.

Five ones. So, if I run this algorithm, let us say, for a hundred cycles, what does your intuition say? Will I get that or not?

Student: Probably not; the middle bit is 0.

Exactly. Now, if you look at the expected values of these 4 elements, they are 0.33 for the first one, 1.43 for the second one, 1.66 for the third one, and 0.58 for the fourth one. So, it is very likely that when I do this random spinning of the roulette wheel and produce 4 new candidates out of these 4, the first candidate is going to be left out. Why? Because its probability is only 0.08, and if I spin 4 times, its expected count is only 0.33 essentially.

Notice that if I were to spin it 100 times, its expected count would go up, but that would amount to saying that these 4 candidates are being cloned into 100; we are not doing that. We are assuming here that the population size remains constant: we start with these 4 candidates and clone 4 new candidates out of them. And if I do that based on their fitness values, it is very likely that the first candidate will get left out; and, as was pointed out, this is the only candidate which has a one in the middle place, on the third bit essentially. So, if I lose this one, let us say I take 2 copies of this, one copy of this, and one copy of this, then no matter what future churning I do with these, I will never be able to get the middle bit as one. I like this example because it illustrates exactly this point. Now, what is the remedy for this? Why is this happening? How can we try to see that this does not happen? Go back to nature if you want to.

Student: The writing in the genetic the writing (refer time: 39:16)

When will they become more and more uniform? What you are saying has a point: yes, it is possible for a given species. So, let us talk of species, since we are talking about nature as well. Take, for example, the cheetah. The cheetah, if biologists want to look at it that way, and if you want to call it a machine, is a perfectly designed machine for hunting; in fact, you get to see car ads which try to portray the cars as cheetahs and stuff like that. And the reason for that is that cheetahs can accelerate very fast and can attain very high speeds, which is good for hunting essentially. Now, the cheetah was such a good predator, such a good hunting machine, that they all became similar; in a sense, this capacity to run very fast and catch prey came to all of them. So, the genetic diversity in the cheetah population became less and less essentially. And if the diversity becomes less, it means your tendency to explore the space decreases. This is what is happening with us here: if we lose the one candidate with the middle gene, then we can never look at solutions which have the middle gene essentially.

So, likewise, if you lose some things, then you become rigid in your genetic makeup, which is what happened to the cheetah. And that is one of the reasons that, since humankind has changed the world so much that cheetahs do not have the open spaces in which they can go and catch their prey, they are in danger of becoming extinct essentially; just like this particular creature, or the gene which this creature is carrying, is likely to become extinct in the next round. But the question that we want to ask is: what lesson should we learn when we devise GAs? When we devise genetic algorithms, what are the parameters that we have to select? The first, of course, is: how do we encode the solution? So, encoding followed by evaluation, these are the first 2 steps: how do you encode your candidate solution, and, secondly, what is the fitness function that you devise essentially?

Now, remember that this fitness function will have to be computed for the entire population. In one of the places where I was reading about genetic algorithms, they said: imagine that you are trying to devise a neural network for pattern recognition. You want to recognize some patterns, let us say handwritten characters or something like that, and you have a training set of 1000 patterns given to you, based on which you are training the neural network. Your task is to identify the parameters of the neural network, which basically means the number of nodes and the edge weights and that kind of stuff. What is the fitness function? One fitness function you could use is to test the candidate, and the candidate in this case is the design of a neural network, or a neural network itself, on the thousand training patterns that are given to you; and the fitness value is the number of patterns this network correctly classifies.
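As a sketch of what such an expensive fitness function might look like (build_network and classify are hypothetical placeholders, not a real library API; the chromosome is assumed to encode the network's design):

```python
def fitness(chromosome, training_set):
    # Decode the chromosome into an actual network (placeholder),
    # then count how many of the training patterns it gets right.
    network = build_network(chromosome)   # hypothetical decoder
    return sum(1 for pattern, label in training_set
               if classify(network, pattern) == label)
```

With a population of, say, 50 networks and 1000 patterns, every generation already costs 50,000 classifications just to assign fitness values, which is the point made next.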

Now, if you devise such a fitness function, you can imagine the amount of work the genetic algorithm has to do simply to evaluate the fitness value of one candidate. Our example is very simple: we just compute f = x squared. It is easy to do on the board, but then it is not a problem that is hard to solve anyway essentially. If you want to look at more difficult problems, then you have to be very careful in how you encode the candidates first. In the next class, we will come back to TSP, to how to solve TSPs using genetic algorithms. Then, how do we devise good fitness functions? The fitness function should reflect what you want to achieve, because, in the way that we have built our algorithm, it is the fitness function which dominates the selection process, this process of cloning one population into the next; the number of copies you make depends directly upon the fitness function. So, if the fitness function is good, it will do a good job for you. But there is still one more parameter to be decided, and that is...

Student: Randomization.

No, no, randomization is a built-in process. Randomization takes place here in the selection, in the spinning of the roulette wheel, and the other place it takes place is in deciding this crossover point: where do we do a crossover? Maybe it can be done in some more informed fashion, and people have tried out all kinds of things, but we want to look at a general purpose algorithm in this one.

Student: Sir

The third thing.

Student: How do you decide the crossover, I mean?

Sure, so I will take your question to mean: how do you decide what the crossover operator is essentially? In this case, if you think about a simple situation like this, where the solution is a bit string, then crossover will mix up bit strings, and without loss of generality a single point crossover will do what any other crossover might have done. Though, as I said in the last class, Goldberg has investigated what happens if there is a sequence of characters which somehow provides a good feature: how can we try to see whether they stay together? But we will not get into that. So, we will assume that a simple single point crossover does the job for us, and this point is decided randomly at run time; whether it is done here or whether it is done there is decided randomly at run time. So, that is the second aspect of random behavior that is happening here: one is in the roulette wheel spinning, and the second is in deciding where the crossover happens. But looking at this example, there is a third thing which should come to mind, which is crucial for the success of genetic algorithms.
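
A minimal sketch of these two random steps, the roulette wheel spin and the run-time choice of crossover point, assuming bit-string candidates:

import random

def roulette_select(population, fitnesses):
    # Spin the roulette wheel: each candidate gets a slice of the
    # wheel proportional to its fitness value.
    spin = random.uniform(0, sum(fitnesses))
    running = 0.0
    for candidate, f in zip(population, fitnesses):
        running += f
        if running >= spin:
            return candidate
    return population[-1]

def single_point_crossover(p1, p2):
    # The crossover point is decided randomly at run time.
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

For example, single_point_crossover('110100', '001011') might cut after position 2 and return ('111011', '000100').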

Student: We should take care of extinction

We should take care of extinction. So, how do you do that? Somebody had said diversity; how do you increase diversity?

Student: Keep the best.

No, you are looking at it from the wrong perspective.

Student: All the features, what if we maintain them externally?

No, no, listen: we are trying to design a genetic algorithm; we are not trying to design the selection or the mating process right now. What is the third parameter in designing a genetic algorithm which is going to play a critical role here?

Student: Size of the population

The size of the population. Genetic algorithms will work if the population size is large. If the population is large, then it will have all kinds of candidates inside, carrying all kinds of genes, and they will all have a chance to reproduce in some sense essentially. This is also one of the reasons why we have this third step, which is mutation, for the sake of completeness: there is hope that this mutation process will sometimes toggle the middle bit here and we will get going essentially. But in general you have to choose a large population size for these to work essentially. One more aspect that we discussed in the last class, but have not mentioned now, is as follows.

(Refer Slide Time: 47:44)

So, some people have done it as follows: you have a population of some size, and you take the most fit members, let us say the k most fit members. So, this is k and this is n: if you have n members, you take the k most fit members, only allow those k fit members to reproduce, and mix up the genes through both the processes, selection and crossover. So, you get a new population of k elements. Now, take this whole population: we are saying that these k elements get to reproduce, which means they die in some sense in the process of reproduction, that these 2 parents vanish from the scene and instead the new children come in essentially. So, we get k new children here, plus we take all of these; so this is k and this is n, and we have k plus n elements. Then you mix them up and take the best n out of them. What we had done here earlier is that we were not keeping the original population at all; we had said we just replace it with the new one.
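
A minimal sketch of this scheme, assuming the selection and crossover machinery from earlier is wrapped up in a make_children function:

def elitist_step(population, fitness, k, make_children):
    # Let only the k most fit members reproduce, then merge their
    # k children with the whole old population of n members and
    # keep the best n, so good old solutions can still survive.
    n = len(population)
    parents = sorted(population, key=fitness, reverse=True)[:k]
    children = make_children(parents, k)   # k children built via
                                           # selection + crossover
    merged = population + children         # n + k candidates
    return sorted(merged, key=fitness, reverse=True)[:n]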

So, you started with this population and you ended up with this population; in the process we lost... no, sorry, this is the population after crossover. So, we started with this one and we ended up with this one, and none of the originals survive essentially. So, we ended up with a new population, which means we replaced the entire old population with the new population. But I had said earlier also that it is not necessary to replace the entire old population; you may want to keep some of the best ones from the old population in the new one. And this is the approach that people take here: only allow the best members to reproduce, then take the new children plus the old population, and from that choose the best n, and that is the new population. This gives a chance for good solutions to survive essentially. So, I will stop here, and in the next class we want to come back to genetic algorithms and look at how to solve the travelling salesman problem: how to devise crossover operations for the travelling salesman problem, and what representation you can choose for representing a TSP candidate essentially. So, we will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science
Indian Institute of Technology, Madras

Lecture - 17
Population Based Methods II

(Refer Slide Time: 00:14)

We were looking at genetic algorithms, and we want to look at GAs for solving TSP. So, as you know, in the TSP, the travelling salesman problem, you have to find a Hamiltonian cycle in which you visit every city exactly once and come back to the original city essentially. So, for example, if you have 9 cities, you may have a representation which says that you start with 2, and you go to 4, then to 5, then to 6, then to 9, then to 3, 1, 7 and 8. So, this could be one tour, and since I am doing GAs, I may want to take 2 tours and try to see how I can generate new tours after that. So, let us take another tour; let us say it is 3 7 1 9 4 2 5 6 8. I had mentioned this problem earlier: how do you devise crossover operators for the TSP problem? You cannot do a single point or multi point crossover, for the simple reason that supposing I were to do a crossover at some point, then a child tour would be something like 2 4 5 6 4. Now, I have come back to 4, which means I close the loop early, so I cannot do a simple thing like that.

So there are 2 questions we want to look at. One is: given this representation, and this representation is called the path representation, because the whole tour is represented by a path, and it is implicit that from the last city in the path you come back to the first city. It means you start with 2 4 5 6 9 3 1 7 8 and come back to 2; if I rotated this by a certain amount, I would still have the same tour. So, the 2 questions we want to ask are: what representation to choose, and secondly, what crossover function to devise for it. The TSP problem, as we said, is of interest to very many people, because it has many practical applications as well, and people would like to get good solutions to TSP problems. What about the evaluation function? There is a simple function in this case: given a tour, you can compute its cost, and you can use that as the evaluation function, keeping in mind that it is a minimization problem, that you want to minimize total cost. So, you have to work out the fitness used for selection carefully, as some fraction which is inversely proportional to the cost. And how easy is it to solve the problem?
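
As a sketch, with a cost matrix indexed by city labels (however your instance stores it), the evaluation and an inverse cost fitness could be:

def tour_cost(tour, cost):
    # Sum the edge costs along the path, including the implicit
    # closing edge from the last city back to the first.
    n = len(tour)
    return sum(cost[tour[i]][tour[(i + 1) % n]] for i in range(n))

def tour_fitness(tour, cost):
    # Minimization problem: fitness inversely proportional to the
    # cost, so the roulette wheel favours shorter tours.
    return 1.0 / tour_cost(tour, cost)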

Because very often you can find a representation which helps you solve the problem very quickly essentially. So, let me try out a small game with you. The game is that, say, you and I are playing, and we have to divide these 9 cities between us. We get to choose city by city; forget about the costs, the costs have nothing to do with it, this is a different game. And the game is the following: you choose a city and I choose a city, then you choose a city and I choose a city, and so on. The first person to pick up 3 cities which add up to exactly 15 wins the game. So it has nothing to do with your costs, just with the city labels, and the labels are 1 to 9. So, how would you play this game? Let us say you choose something, say 8, and I choose 5, and so on.

Student: I have chosen 5 and 4. You want a sum of 3 cities equal to 15, not 2 cities?

Any 3 cities; among the cities you pick up, any 3 should add up to 15.

Student: Then you cannot choose 7. I suppose we can start the game again; you start.

I choose 6

I choose 2; I have already chosen 6.

(Refer Slide Time: 06:28)

So you choose 4, I choose 6, and you choose 5.

Student: No, not 5.

You said pick 8; if you choose 8, then you are forcing me to choose 3. Anyway, the point is not to spend the whole class on this game, but to discuss this: I do not know what kind of reasoning you were doing, but the kind of reasoning I was doing is shown by this figure. I created this; you recognize it, it is a magic square whose every row, column and diagonal sums to 15. And what was I doing? I was playing noughts and crosses, while you were doing something else. So, the point of the whole thing is that the representation often helps solve the problem.
Especially if, in some situations, you have knowledge related to some mapping that you can create with the representation. So, we will look at representations here also. Ideally, we would like to have a representation in which I can do single point crossover for TSP. The path representation is not that one, but there is one representation in which we can do single point crossover; we will come to that later. So, let us look at the various crossovers that people have tried while working on TSP with GAs; if you just look at the literature you will find quite a few. Now, while devising a crossover, one thing one should keep in mind is that it would be nice if you can devise a crossover which would allow us to carry forward good features of the 2 tours.

So, for example, 2 tours may have features such that some cities which are close to each other are covered one after the other in that order, and it would be a nice feature if it is carried forward from a parent to a child. So, let us look at some crossovers. One crossover, which is called partially mapped crossover, also called PMX, works as follows. You choose 2 random cut points, let us say this one and this one. Then, first, you copy whatever is there between those 2 points into the respective child: so you have the segment 6 9 3 1 from one parent and 9 4 2 5 from the other. This is parent 1 and parent 2, and these are child 1 and child 2 that you are trying to build. So, first, some part of the solution is copied into the respective child.

Now, for the remaining part of the solution, I would really like to get something from the other parent. So, in partially mapped crossover, the segments define a mapping between cities, 6 and 9, 9 and 4, 3 and 2, and 1 and 5, and we use it as follows. Remember that child c1 has got part of the tour from the first parent, and you want the rest from the other parent. So, you try to copy the rest; what is remaining in p2? 3, 7, 1 and 6, 8. So you try to copy 3 into c1. We are filling c1, and since c1 took its segment from p1, we try to copy things into c1 from p2. We try to copy 3, but we cannot, because 3 already exists in the child. So, what is appropriate is to follow the mapping from there: from 3 go to 2, and copy 2 here, so we get 2 here. This is the crux of this operator. Let me repeat: you try to copy this 3 here, but you cannot copy 3, because 3 already exists in this child, so you follow the mapping from here and take 2, so you copy 2 here. Then you want to copy 7 here from there, and 7 you can copy. Then 1 you cannot copy, so you follow the mapping to 5 and you get 5 here. Then 6 you cannot copy, because 6 is here, so follow the mapping to 9; 9 you cannot copy, because 9 is here; follow the mapping to 4, and 4 can be copied here. And then 8, which anyway does not clash. So, this is the partially mapped crossover; I have just illustrated it with this example, and the algorithm is for you to write. So, let me repeat the example: this part of the tour is common with the parent, c1 gets it from p1 and c2 gets it from p2. The remaining part of the tour, which is 2 7 5 4 8, is got from this process: we tried to copy 3 here, but cannot copy 3, because 3 is already in my tour, so we follow the mapping from 3 to 2. This is the mapping that we are talking about in the partially mapped crossover.

So, the mapping is between these sets of cities: instead of 3 you can place 2, and so on; and this applies to both children, so when you do the other child you follow a similar process. Because I could not copy 3, as it was already there, I go to 2, and I am allowed to copy 2 here. Then 7 I can copy directly, because it does not interfere with anything. 1 I cannot copy, because it is here, so I go to 5 and copy that. Then 6 I cannot copy, so from 6 I go to 9, but 9 is already here, so from 9 I go to 4 and get 4 here, and then 8 here. So, let us do the other one, just to complete the example. I want to copy 2 here, but I cannot copy 2, so I go to 3, and I am allowed to copy 3 here. Then 4 I cannot copy here, because it clashes, so I go to 9; I cannot copy 9, so I go to 6, and I can copy 6. Then 5 I cannot copy, because 5 is here, so I copy 1 here from the mapping. Then 7 I can copy directly, and 8 I can copy directly. So this process has given us these 2 children from these 2 parents. The key thing in devising crossover operators is to see that the resultant strings you get are valid tours; the whole idea is to try and devise crossovers which produce valid tours.
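
A sketch of PMX on the path representation, using 0-based cut indices; running it on the two parents above, with cuts at positions 3 and 7, reproduces the children worked out on the board:

def pmx(p1, p2, cut1, cut2):
    # The child keeps p1's segment between the cuts; every other
    # position is copied from p2, following the segment mapping
    # whenever that city is already present in the child.
    n = len(p1)
    child = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]
    in_segment = set(p1[cut1:cut2])
    mapping = {p1[i]: p2[i] for i in range(cut1, cut2)}
    for i in list(range(cut1)) + list(range(cut2, n)):
        city = p2[i]
        while city in in_segment:    # e.g. 3 -> 2, or 6 -> 9 -> 4
            city = mapping[city]
        child[i] = city
    return child

p1 = [2, 4, 5, 6, 9, 3, 1, 7, 8]
p2 = [3, 7, 1, 9, 4, 2, 5, 6, 8]
print(pmx(p1, p2, 3, 7))   # [2, 7, 5, 6, 9, 3, 1, 4, 8]
print(pmx(p2, p1, 3, 7))   # [3, 6, 1, 9, 4, 2, 5, 7, 8]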

Another popular crossover that people use is the order crossover, and it works as follows. Like in the previous crossover, you first copy the segments, 6 9 3 1 and 9 4 2 5. And for the remaining part, you take the cities in the order in which they appear in the other parent. What do I mean by that? Basically, I want all the cities to appear in my tour, but I want to select the missing ones from the second parent. I got 6 9 3 1 from the first parent, so I already have 6, 9, 3 and 1. What the order crossover says is that for the remaining cities which I want, which are 7, 4, 2, 5 and 8, I simply copy them in the order in which they appear in the other parent. Where do they occur? 7 is here, 4 is here, 2 is here, 5 is here, 8 is here; in the other parent they occur in the order 7 4 2 5 8, and in that order I copy them into the new child. So, this child has got part of the tour from its first parent, and the remaining part of the tour it has got, in some sense, from the other parent. Is this by itself a good crossover or not? You should think about it ((refer time: 16:38)). Can you think of another representation for the travelling salesman problem? This path representation is the simplest one: it tells you from which city to go to which city, and so on.
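
A sketch of the order crossover as just described; note that this variant fills the free positions from left to right, whereas many textbook versions start filling just after the second cut point:

def order_crossover(p1, p2, cut1, cut2):
    # Keep p1's segment; write the missing cities into the free
    # positions in the order in which they occur in p2.
    n = len(p1)
    child = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]
    kept = set(p1[cut1:cut2])
    fill = (city for city in p2 if city not in kept)
    return [c if c is not None else next(fill) for c in child]

print(order_crossover([2, 4, 5, 6, 9, 3, 1, 7, 8],
                      [3, 7, 1, 9, 4, 2, 5, 6, 8], 3, 7))
# [7, 4, 2, 6, 9, 3, 1, 5, 8]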

(Refer Slide Time: 17:09)

So, there is one representation which is called the ordinal representation, and another one which is called the adjacency representation. Let us do the adjacency one first; it works as follows. I will create the representation for the second tour, 3 7 1 9 4 2 5 6 8. We have an index in the representation, and the value stored in each cell is the city I would go to: if i is the index, then the value stored there is the city that I will go to from city i. So, if I just look at this tour, 3 7 1 9 4 2 5 6 8: from 1 I am going to 9, so I have 9 here; from 2 I am going to 5, so I have 5 here; from 3 I am going to 7; from 4 I am going to 2; from 5 I am going to 6; from 6 I am going to 8; from 7 I am going to 1; from 8 I am going to 3, which is the first city; and from 9 I am going to 4. So, this representation, 9 5 7 2 6 8 1 3 4, is called the adjacency representation; it is a different representation of a tour ((refer time: 19:33)). Have you got how we got this representation? It is a representation of the same tour; it is just that the representation is different. It says that I am going from 3 to 7 to 1 to 9 to 4 to 2 to 5 to 6 to 8.

This representation says: you tell me any city, and I will tell you where to go from that city. So, if you tell me city 3, where are you going from 3? Look at index 3; you say, going to 7. Where are you going from 6? From 6 you are going to 8. So, this is a different representation, called the adjacency representation. Now, not every permutation is a legal adjacency representation, so let me give an example. Supposing I want a representation which says: you go to 2 from 1, and you go to 1 from 2, and so on. Now, obviously this is a permutation of the numbers 1 to 9, and I do not care what the rest of it is; I am saying this is not a valid tour. Because from 1 you go to 2, according to this index, and from 2 you go back to 1, so you have already closed a cycle, and it cannot be part of a tour. So, here is my poser to you: in the path representation every permutation is a valid tour, whereas in the adjacency representation not every permutation is a valid tour. So is there an inconsistency here; am I losing out on some tours?

You will remember that in the path representation, if I were to rotate the representation by a cycle of any length, for example if I start with 4 and write 4 5 6 9 3 1 7 8 2, then I get the same tour back: it is a different permutation, but the same tour essentially. In a 9-city tour you could start from any of the 9 cities and it would still be the same tour, whereas every tour has a unique representation in the adjacency representation ((refer time: 28:17)). Now, given the adjacency representation, again with this as the index, can you think of a crossover? Given 2 tours in the adjacency representation, can you exploit the fact that you know, from each city, where each parent goes? So, let us do the other one too: in the other tour you go from 1 to 7, from 2 to 4, from 3 to 1, from 4 to 5, from 5 to 6, from 6 to 9, from 7 to 8, from 8 to 2, and from 9 to 3. So, I have these 2 tours, the first one p1 and this one p2, in the adjacency representation; what kind of crossover can you think of?
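
A sketch of the conversion from path to adjacency form, and of the validity check that the poser above is about (cities are 1-based, stored in a 0-indexed list):

def path_to_adjacency(tour):
    # adjacency[i - 1] = the city visited immediately after city i.
    n = len(tour)
    adj = [0] * n
    for k in range(n):
        adj[tour[k] - 1] = tour[(k + 1) % n]
    return adj

def is_valid_adjacency(adj):
    # A permutation is a valid tour only if following successors
    # from city 1 visits all n cities before returning to city 1.
    seen, city = set(), 1
    for _ in range(len(adj)):
        if city in seen:
            return False          # closed a loop too early
        seen.add(city)
        city = adj[city - 1]
    return city == 1

print(path_to_adjacency([3, 7, 1, 9, 4, 2, 5, 6, 8]))
# [9, 5, 7, 2, 6, 8, 1, 3, 4]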

Especially, can you ensure that one of the children is at least a better child, better than the parents? Let us look at one simple crossover operator first; it is called the alternating crossover, and it basically says that you pick the next city from alternating parents. So, suppose we are constructing a child, let us say c3, and we choose the first move from p1: it says you go to 7 from 1, so we go to 7. Then from 7, where are we going to go? One parent says go to 1, and the other parent says go to 8. Here we cannot choose 1, because if you go from 1 to 7 and from 7 to 1, you are already in a cycle, so you cannot choose 1. So, at this point the alternating parent tells you to choose 1, but you are not allowed to choose 1, because it would form a cycle; you can either choose from the other parent or you can make a random choice, and people have tried both.

So, the basic idea in alternating crossover is to try to choose, as you construct the child, the next city from the alternating parent, and if you cannot, then you resort to some kind of random process. So, let us say we choose 8 here. Then, for the next city, from 8 you choose from the other parent, which says go to 3, so you put 3 here. And then from 3, choosing from the first parent, you want to go to 1, or from the other you want to go to 7, so you choose something random. You can see that neither of them can come here: if you choose 1 here, then you are saying from 1 go to 7, from 7 go to 8, from 8 go to 3, and from 3 go to 1, and that is a cycle that you do not want to form; and 7 you cannot choose, for the simple reason that you cannot go to 7 from 2 cities. So you would put some random value here. You can extend this idea: instead of alternating cities you can alternate subtours, in what we may call a subtour crossover.

So, there a subtour is selected: in both these examples, you can choose a subtour from one parent, then choose the next subtour from the other parent, and then the remaining from the first, and so on. So, instead of alternating cities, you can alternate subtours; you can extend the idea in this way. But the most interesting one here is called the heuristic crossover, so let me write it here. What the heuristic crossover says is that you first randomly choose some city; let us say you start at city 6, or city 1; let us say city 6, it does not matter. Then, for deciding where to go from that city, you look at the 2 tours. If you have chosen 6, for example, as the starting point, one parent says go to 8 and the other parent says go to 9 from 6. The heuristic crossover says: take the shorter of the 2 edges. One says go from 6 to 8, the other says go from 6 to 9; this one says choose the shorter of the 2. So, depending on what the values are, we will choose 8 or 9, and then construct the tour onward from 8 or 9 in a similar manner. At every stage we try to choose where to go from that city based on the distance to the next city, and that is easy to do with this representation, because you can index into it directly. Try to do the heuristic crossover with the path representation, and it would be quite difficult, because the adjacency representation has the index: it tells you exactly, if you want to know where to go from city 5, what the 2 parents do. With the path representation you would have to scan the whole thing to see that from 5 you go to 6 in this one, and from 5 you go to 6 in this one also.
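
A combined sketch of the alternating and heuristic crossovers on the adjacency representation: pass a cost matrix (assumed 1-indexed here) to get the heuristic variant, or none to alternate between parents, with a random legal city as the fallback in both cases:

import random

def adjacency_crossover(adj1, adj2, cost=None, start=1):
    # Build one child tour city by city from two adjacency parents.
    n = len(adj1)
    visited, tour, city = {start}, [start], start
    for step in range(n - 1):
        a, b = adj1[city - 1], adj2[city - 1]
        if cost is not None:
            # Heuristic crossover: prefer the shorter of the 2 edges.
            options = sorted({a, b}, key=lambda c: cost[city][c])
        else:
            # Alternating crossover: take turns between the parents.
            options = [a, b] if step % 2 == 0 else [b, a]
        nxt = next((c for c in options if c not in visited), None)
        if nxt is None:
            # Both proposals close a loop early: pick a random city.
            nxt = random.choice([c for c in range(1, n + 1)
                                 if c not in visited])
        visited.add(nxt)
        tour.append(nxt)
        city = nxt
    return tour   # the child, in path representation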

(Refer Slide Time: 30:09)

So, the last one that we want to look at is called the ordinal representation, and it works as follows. Let us take the first tour and copy it here: 2 4 5 6 9 3 1 7 8. In the ordinal representation, what do you do? Listen carefully to this process. You look at positions; think of this index, 1, 2 and so on, as positions: this is the first city in the tour, this is the second city, the third city, the fourth city, and so on. You also have a reference list, and let us assume the reference list is simply 1 to 9, the cities in order. The first city in your tour is 2, and that is second in the reference list, so you write 2 here; then you remove city 2, and the new reference list is the old one without city 2. The next city in your tour is 4, and its position is now 1, 2, 3, so you write 3 here, and remove 4 from the list. The next city in your tour is 5, and its position is 3, so we write 3 again and remove it. The next city is 6, and its position is 1, 2, 3: again 3. The next city is 9; its position is 1, 2, 3, 4, 5, so 5 here, and we remove it. The next is 3, which is at position 2 here, and we remove it; then 1, which is at 1, and we remove 1; then 7, which is at 1, and we remove 7; and 8, at 1. This representation of the tour, 2 3 3 3 5 2 1 1 1, is called the ordinal representation.

So, just to repeat what we are doing here: we have a reference list which is dynamic, which is the index over only the remaining cities. Initially we have all of 1 to 9; 2 is in position 2 here, so we write 2; then we remove 2, and 4 comes to position 3; then we remove 4 from there, and 5 comes to position 3; we remove 5, and so on, until we get this tour. As an exercise, I will ask you to construct the ordinal representation for the second tour as well. Let us start it now: 3 is in position 3; when we remove 3, we get 7, which will be in position 6; and so on. I leave it as an exercise for you to do. The interesting thing, and this is the exercise that I want you to try, is that if you represent 2 tours using the ordinal representation, then you can just do a single point crossover, and you get 2 valid tours. So why do we like the single point crossover? We like it simply because it is very easy to implement: once you have the representations, all you need to do is cut the two strings at one place, put together the cut pieces, and you have new tours. If you try to imagine what is happening here, these few different schemes of representation allow you different sorts of crossover.

For example, this adjacency representation, which is one of the most popular, allows you to use some form of heuristic knowledge: it says that from a given city you can make some informed choice as to which city to go to next. So, whether it is the alternating crossover, or the subtour crossover, or the heuristic crossover, if you are not able to fill in from the other parent, you can, say, heuristically choose the closest city which is allowed, and try to build shorter tours, since your ultimate goal is to build shorter tours. The path representation does not allow any such thing at all, and the ordinal representation does not allow it either, but the ordinal one does allow you to do a simple crossover essentially. And remember that crossover has to be done in every cycle, for every pair of parents that we choose.
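
A sketch of the ordinal encoding and decoding; because the i-th ordinal entry is always a legal position in the shrunken reference list, any single point crossover of two ordinal strings decodes back to a valid tour:

def path_to_ordinal(tour):
    # Record each city's position in a shrinking reference list
    # 1..n, removing the city once it has been placed.
    ref = list(range(1, len(tour) + 1))
    ordinal = []
    for city in tour:
        ordinal.append(ref.index(city) + 1)   # 1-based position
        ref.remove(city)
    return ordinal

def ordinal_to_path(ordinal):
    # Inverse: repeatedly pop the city at the stored position.
    ref = list(range(1, len(ordinal) + 1))
    return [ref.pop(pos - 1) for pos in ordinal]

print(path_to_ordinal([2, 4, 5, 6, 9, 3, 1, 7, 8]))
# [2, 3, 3, 3, 5, 2, 1, 1, 1]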

(Refer Slide Time: 36:20)

So, continuing our study of population based methods, we look at nature again, to get inspiration from a different way in which the members of a population interact with each other. Remember that we said that in genetic algorithms the action is at the genetic level, which is the process of recombination at the genotype level. So, recombination happens at the genotype level, whereas selection ((refer time; 37:28)) happens at the phenotype level. The search is happening at a micro level, the mixing of genes from 2 different parents to come up with new candidates, and the fitness of those candidates decides, in some sense, whether they survive essentially. Now, there is another class of methods, which some people call cultural algorithms, because they involve high level interaction between the agents, at what you may say is a cultural level.

(Refer Slide Time: 38:18)

And there are various names by which these algorithms go. One is swarm intelligence, and it has been quite popular as an optimization method in the recent past. We will use the term ant colony optimization, and we will continue looking at the TSP problem for ant colony optimization. So, you must be familiar with ant colonies; the ant colony was a term which was used by ((refer time; 39:06)) in one of his books ((refer time; 39:07)), where he shows that an individual ant cannot be a very effective creature in some sense, but a colony of ants, a society of ants, is very successful at its task. What is the main task that ants are trying to do? They are building the nest, where the queen is sitting, and there are workers whose job is to go out into the world, forage for food, and bring some food back essentially. Now, in doing that, ants interact with each other through a process of signalling, through signs and symbols, and this is very common in the animal world; for example, you know animals use signals to mark territory: dogs and other predators mark territory.

But these ants are using it in a kind of cooperative fashion: they want to convey where they have been to the rest of the ant community, and they do this by a process of pheromone trails. Pheromone is a chemical substance which an ant drops wherever it goes essentially. So, you can imagine a situation in the real world where the ants are out exploring some area and dropping pheromone on the way. Now, how do they use this pheromone, whether dropped by themselves or by somebody else? Ants have a tendency to go in a direction which has pheromone trails essentially. So, it is like marking out paths for each other and following those paths essentially. You can imagine a situation where there is an ant nest, and say 4 hypothetical ants go off in 4 different directions in search of food, and let us say this one finds some sugar here, some food at the end of its trail essentially. Now, while this happens, after some time some more ants come out, and what do they do? Initially 4 ants went out; after some time, let us say 4 more, just for the sake of argument, 4 more ants come out, and they are likely to follow the trails: some go this way, some go this way, some go that way.

Now, this ant which has found this piece of food, what does it do? Very unselfishly, it brings the food back to the nest essentially: once it has found this food, it just follows its own trail back, goes back to the nest, and deposits the food. But in the process, as it retraces the trail, it has deposited more pheromone on the way back, so the amount of pheromone on this trail becomes more essentially. Now, if you assume that an ant has a tendency to move along pheromone trails, then the more pheromone there is on a trail, the more an ant is likely to follow it. So, the next ant which comes out is more likely to choose this direction than any of the other directions, and follow that trail. And notice that as more and more ants go along the trail, more and more pheromone is deposited along the way, and the trail becomes stronger and stronger. Eventually, after a period of time, you will find this entire ant colony travelling to the food and back essentially.

It is a common phenomenon; you must have observed it in the real world essentially. Not only that: if you were to put an obstacle in the path of these ants, if you do the experiment, then of course they cannot follow the trail. Where the path is cut off, they will start diverting in different directions, but eventually some of them will come back to this trail, or some of them may find different paths to the trail. And the interesting thing is that the ants which come back to this trail sooner will do it faster essentially: they will re-establish the pheromone trail faster than an ant which goes off wandering on some long way back. And this fact, that ants which happen to do things faster lay their trails earlier, and more ants follow on that trail, can be seen as a process of optimization, because shorter paths are found first. Assuming there are enough ants in this environment, the shortest paths will be found first, and they will be followed first, and so on. In that sense, you will see that ants, over time, find good routes to the source of food and back essentially.

(Refer Slide Time: 44:43)

Now, there was a group of researchers, led by Dorigo, who devised an algorithm which has come to be called ant colony optimization, and we will look at it with the example of the TSP problem. The way this algorithm works is as follows. Let there be N cities and let there be M ants, and we assume that the ants are distributed randomly across these cities. The basic process that we will follow is that each ant constructs a tour in a greedy fashion, and this will take N time steps, because the tour will have to have N cities. And each ant also deposits pheromone, we just use that term again, on the tour it constructs. So, you can imagine that all these M ants are there, each trying to solve the travelling salesman problem, trying to find the shortest tour: each ant constructs a tour in a greedy fashion and deposits a certain amount of pheromone on the tour it has created. And the way we will look at it is that we will look at the amount of pheromone at the end of N time units, after each ant constructs a tour.

So, we will use tau(i, j) to stand for the pheromone on the link from city i to city j. So, this is city i, this is city j, and there is an edge between them; how much pheromone is on it? We will denote that by tau(i, j), and we will update it every N time units. So, if you know the pheromone tau at time t, you can compute the pheromone at time t + N, after each ant has constructed a tour. And it will be updated as the sum of the old pheromone that was left from earlier times, plus the new pheromone which has been deposited by ants in the last tour. The old pheromone that was deposited in earlier times has a tendency to evaporate, so we will assume that some of it evaporates: whatever the value was at time t gets multiplied by some value which is (1 - lambda), where lambda is what we will call the coefficient of evaporation. So, lambda tells you how fast the pheromone evaporates. If lambda is high, close to 1, then you can see that none of the old pheromone will remain.

If lambda is low, then you can see that some of the old pheromone will remain. To this we add the new pheromone that is deposited, and we will call the incremental amount delta-tau(i, j) at time t + N. So the update is: tau(i, j)(t + N) = (1 - lambda) * tau(i, j)(t) + delta-tau(i, j)(t + N), where the first term is the evaporation of the old pheromone and the second is the incremental amount that ants deposited in the last cycle essentially. And how much is this amount? It is basically the sum, for k equal to 1 to M, over each of the ants, of the amount of pheromone deposited on edge (i, j) by the k-th ant: delta-tau(i, j) = sum over k of delta-tau-k(i, j). So, the total pheromone deposited in the last cycle is the sum of the pheromone deposited by all the ants, and for each ant this term delta-tau-k(i, j) would be either 0 or some value: it is 0 if that ant does not have the edge (i, j) in its tour, and it is non-zero if its tour does contain (i, j).

So, there are 2 cases: it is 0 if the ant's tour does not use edge (i, j); otherwise it is a value denoted by Q divided by L(k), where Q is some constant that we use to control this computation, and L(k) is the cost of the tour found by the k-th ant essentially. So, the important thing here, the way this departs from the real world situation, is that the amount of pheromone an ant deposits on an edge is inversely proportional to the length of the tour it has found. Which means that the ants which find shorter tours will end up depositing more pheromone on those edges (i, j), and the ants which found long tours will end up depositing less pheromone. So, again, we want to reinforce the idea that if ants find shorter tours, they will encourage other ants to follow those trails, and that is why they deposit more pheromone essentially. So, this is how pheromone is updated at the end of every cycle. Now, that leaves the part about the greedy tour construction: how does an ant actually construct a tour? It is basically a simple stochastic process, in which an ant looks at the set of cities it can move to, and moves to one of them in a probabilistic fashion.

So, let us assume that an ant at city i moves to city j with a probability, a little bit like what we did in simulated annealing ((refer time; 53:01)), which is proportional to the following: the amount of pheromone on the edge that it is considering, raised to some power alpha, multiplied by a factor which is called eta(i, j), raised to a power beta. Again, alpha and beta are two constants with which you control the influence of these 2 factors. So, there are 2 factors: tau(i, j) is the amount of pheromone on that edge, and eta(i, j) is called the visibility; basically, it is inversely proportional to the cost of that edge. In other words, if the edge is very short, if city j is very close to city i, then this eta factor will shoot up, because it is inversely proportional to cost. So this factor says that the ant is likely to choose a city which is close to the city i at which it is.

The other factor says that the ant is likely to go to that city towards which the pheromone trail is stronger, and the two are controlled by these 2 parameters. This is divided by the sum over all cities h that it can go to: so the probability of moving from i to j is proportional to tau(i, j)^alpha * eta(i, j)^beta, divided by the sum over allowed cities h of tau(i, h)^alpha * eta(i, h)^beta. And what do we mean by allowed cities? Cities which will not close a loop. Remember that the task we are trying to do is to find a complete tour: the ant is set the task of constructing a complete tour in a greedy fashion. So, it goes from city to city, and at any point it only considers as prospective destinations those cities which will not close a loop ((refer time; 55:24)). So, essentially, what this ant is doing is sitting at this location i, and, since we are assuming a completely connected graph here, as we do in the TSP, it can move to any other city.

But we will only choose those cities which, first of all, it has not already visited, because you do not want to close the loop again, so it cannot go to a city which it has already visited essentially. So, that is the set of allowed cities, and from the allowed cities it chooses one of them, call it j, to move to, probabilistically. And the probability is proportional to 2 things: one is how strong the pheromone trail on that edge is, and secondly how close the new city is to the current city, because eventually you want to find short tours, and short tours are always built out of edges which are short essentially. So, this is the ant colony optimization algorithm.

You have this set of ants; each ant constructs a tour; then, based on the tour it has constructed, each ant deposits a certain amount of pheromone on every edge that it has moved on; and at the end of this whole process, we update the amount of pheromone on every edge in the graph. So, edges on which ants travel will have pheromone, and edges on which ants travel to find short tours will have more pheromone, and they will influence more ants to follow, because of this factor on those edges. So, in the end, ideally, just as in the real world situation all the ants end up moving along like one caravan, in the TSP also one would expect all the ants to be travelling along one optimal tour essentially. And it would be a good exercise for you to implement this ant colony optimization and see how the ants end up converging to the shortest tour; at least for small problems you can see it quite easily. So, we will stop here, and in the next class we will move away from this optimization, and we will go back to state space search and try to see deterministic algorithms for finding shortest paths essentially.
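
Taking up that exercise, here is a minimal sketch of the whole algorithm for small instances; the parameter values are arbitrary defaults, and cost is assumed to be an n x n matrix over cities 0..n-1 with positive entries:

import random

def aco_tsp(cost, n_ants=20, n_iters=200,
            alpha=1.0, beta=2.0, lam=0.5, q=100.0):
    # tau[i][j]: pheromone on edge (i, j); visibility eta = 1/cost.
    # lam is the evaporation coefficient called lambda above.
    n = len(cost)
    tau = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            city = random.randrange(n)          # ants start anywhere
            tour, visited = [city], {city}
            while len(tour) < n:
                allowed = [j for j in range(n) if j not in visited]
                weights = [tau[city][j] ** alpha *
                           (1.0 / cost[city][j]) ** beta
                           for j in allowed]
                city = random.choices(allowed, weights)[0]
                tour.append(city)
                visited.add(city)
            length = sum(cost[tour[k]][tour[(k + 1) % n]]
                         for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Pheromone update: evaporate the old, then each ant
        # deposits q / L_k on every edge of the tour it built.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - lam
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len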

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science
Indian Institute of Technology, Madras

Lecture - 18
Branch and Bound, Dijkstra’s Algorithm

Today, we are going to shift focus, a little bit to see, where we are going. Let us just do a
quick recap of what we have done so far.

(Refer Slide Time: 00:24)

We started with state space search and then, we went on to heuristic search and from
there, we went to optimization. The idea was, that the idea of optimization came in,
because we said that instead of looking for the goal state, we will try to optimize the
heuristic value of the evaluation function, or the objective function, as the case maybe.
The idea of using heuristics was to find solutions faster, because we discovered that
search spaces tend to grow exponentially. We were looking for a function or some kind
of heuristic knowledge, which comes from the domain, which will guide the search
towards the solution, and not go off in other directions, in the hope that we will find it
faster, essentially, because we do not want to run into exponential times.

Today, we want to shift focus and look at the other aspects. So, once you find a solution,
let us say, you are solving problems in some domain, or let us say it is a logistics domain, or you are running a courier company, and you have to send people all over the place to
collect parcels and deliver parcels, and things like that. The solution that you find has to
be executed, and has an associated cost, essentially. Today, we will shift focus to finding solutions which are optimal, or least cost, solutions, essentially. That
is critical in many domains. To take an extreme example; if 10 years from now, you start
a company in which you are setting up a colony on Mars, then you can imagine, that the
cost of each trip is going to be significant, and you would like to have a solution in
which, there are minimum number of trips, essentially. So, maybe, on the way you stop
on the Moon, or something like that, or more coming down to Earth. If you are traveling
like a traveling salesman, going through a few cities; you may want to optimize the cost
of your tour, or if you are using some vehicle, let say, daily in a city, because of your job.
Unless you happen to be a politician in the government or a bureaucrat in the
government, who get apparently unlimited fuel supplies. You would be worried about
the cost of fuel, and you would want to find solutions, which are optimal, essentially.

So, our focus is going to be on finding optimal solutions, now. Of course, one might ask
as to how is this different, from the optimization that we have been studying so far
because, for example, we said that we want to solve the TSP, when we were looking at
example of optimization. So, it is not really different; it is a same process; it is just that I
want to be clear on the motivation. Earlier, we went for this process of devising the idea
of a heuristic function, and trying to find optimal solutions for that, or trying to find
optimum values for the heuristic function. In the process, we sort of wandered into optimization, which is what we were interested in, but we never looked at solutions which guarantee optimality. We only looked at randomized methods like simulated annealing, genetic algorithms, and ant colony optimization, which are likely to work most of the
time, but not necessarily guarantee optimal solutions. So, what we want to do now is to
look at methods, deterministic methods, which will guarantee optimal solutions.

So, these are the two different aspects of problem solving; one is, how long do you take
to solve the problem. Heuristic function is devised to speed up that process. The second
aspect is, how good a solution you find, and that is the aspect that we are going to focus
on today. How to look at optimal, how to find optimal solutions? So far, we have not had
a notion of cost, in the solution finding process and whenever, we spoke about quality of
a solution, we said the number of steps or the length of the solution. So, now, let us introduce cost, which means we will introduce a weight for every edge in the search
space and then, we want to find solutions, which are optimal cost. So, it is like finding
the shortest path, and that is an abstraction of all these problems that we are looking at,
essentially; finding shortest path in a graph. Let us start with a small example graph, just
to illustrate the algorithm set that we are looking at.

(Refer Slide Time: 06:02)

So, let us say, this is a start node and then, you go to some node called A. The cost of
going to that node is 4, or you can go to B. The cost of going to B is 5, and the cost of this
edge is, let us say, 3. Then, you have, let us say, a few more nodes; C, D, E. Let us give
some cost to these. So, let us say, this is 6 and let us say, this is 8 and let us say, this is 4
and let us say, this is also, 4. Let us say, this is an expensive edge, costing 15 and this is
2. Let us say, this is goal node that you want to reach, and the cost of going to goal node
from here, is 2. Let us say, there are some more edges that I am not drawing, because we
do not want to have an exploding search space. Let us say, this is an example, you want
to start from S, and you want to find the path to G, which is of the shortest paths,
actually.

(Refer Slide Time: 07:45)

So, the general algorithm that we will follow can be abstracted as follows: refine or extend the least cost, and I must emphasize least estimated cost, partial solution, because we do not
know the actual cost. So, the main problem solving step, which is basically, very similar
to what we were doing earlier. We have a move gen function, which generates
successors and you have to choose one of them, and so on. But, we can view this now, a
little bit more, abstractly, in the sense that we are working in the space of possible
solutions. We do not have all the solutions; we have some of them, partially defined,
essentially with every partial solution, we have an estimated cost. When we say
estimated cost, we mean them; estimation of the cost of the total solution, if that was to
be completely refined. So, we will use a term; refine, to say that if we have a partial
solution, and we had a little bit more information about that solution.

For example, if you are solving a TSP, if you have put in five edges, and if you had one
more sixth edge, then that is a little bit more refined. So, in the process of you keep
refining partial solutions, till you have a complete solution and then, you would stop. The
algorithm that we will use is that; extend the least estimated cost partial solution, till such
a solution is fully refined because, in the end, we are interested in a solution, complete
solution. So, actually, I must clarify what I mean by this 'till such', and we will see this when the example comes out: it means till the least cost solution is fully refined. Instead of writing that, I have just written 'till such a solution', but you must read it as: till a least cost solution is fully refined. Just imagine the situation where you have some partial solutions and at least one fully refined solution, or more than one fully refined solution, and you have estimated costs for them. Now, for fully refined solutions, there is no notion of estimate; you actually know the actual cost, whereas only for partial solutions do you have to estimate the cost.

Now, if one fully refined solution has a lowest cost, then we will say; we can terminate.
That is going to be the basic idea. We will discuss when this idea is sound, or when this idea is guaranteed to give you the shortest solution. This is the loop in
which we will operate, in the next few classes. It is just that our notion of partial solution
may change, as you go along little bit. So, let us look at this from the state space
prospective, as we have done earlier. When I look at this algorithm, you must keep in
mind the similarity with Dijkstra's algorithm, essentially. So, I take it that everybody is familiar with Dijkstra's shortest path algorithm, which takes a single source and finds shortest paths to the rest of the graph, essentially. So, we are in some
sense going to mimic that algorithm, but our goal is not to find solutions to all the nodes;
but we will be doing something similar, essentially.

(Refer Slide Time: 11:42)

The algorithm that we are looking at is called Branch and Bound. By branching, we mean this process of refinement, or extension, and by bounding, we mean excluding some solutions. So, remember what I said: at the point at which we terminate, we will not bother about some solutions which are not fully refined, because we would be able to exclude them; they would have gone beyond some bound that we are working with, and we do not need to refine them any further. So, this general idea is called Branch and Bound. Let us first simulate this algorithm on this graph.

(Refer Slide Time: 12:56)

So, you start with a search node and it is very much like the other algorithm that we have
seen. We generate its children, in this case; A and B. In this stimulation, we will assume
that we are not going to remove duplicates, essentially, because it is possible that we may
later find a shorter path, to a given node, essentially. So, we do not want to say if you
have seen that node, do not generate it again, essentially. So, you can say this is
searching over paths. That instead of saying, that we are visiting a node, for example, D
or C, we will say, we are inspecting a path from source to D or source to E, and we are
looking at different possible paths, and we will. So, if there is one path S, A, D, then this
is a different candidate from another path, which is S, B, D, essentially. So, we will treat
them as separate. So far, we are saying, if you have come to D before, do not bother. You
have already; so, do not put it into open. That is what you use to say.

Now, the cost of s is 0. The cost of this path is 4, and the cost of this path is 5, so the cost
of that edges. The algorithm is simple; refine or extend the least estimated cost partial
solution. So, instead of saying that, we are taking the cost of the complete partial solution
and estimate of the complete partial solution. We will work with the known cost that we
have, which is the cost up to A, essentially. So, as far as we are concerned, we have two partial solutions here: one goes from S to A, and the other goes from S to B.
This one cost 4 and this one cost 5, and we will say that this is better than S. So, we do
not have any sense of direction. So, this is a simple process, we will follow. We say that
this has been inspected. In Dijkstra's algorithm, you would do something very similar.

I will just do the two algorithms in parallel. In Dijkstra's algorithm, you would
initialize S to cost 0, and everything else to cost infinity, and maybe, in some versions of
the algorithm, some description of the algorithm, you will color them white, and you will
color this; or color everything white to start with, which is like putting them on open, or
something like that, and pick one node from there. So, there is a stage of relaxation in the
rest of the algorithm, which says that once you relax this node, or once you inspect this
node S, you relax all edges going out of that, which means this cost originally was
infinity, and now, it is reduced to 4. Because, we know that you can get from S to A is 4.
So, we revise this cost to 4 and revise this cost to 5; exactly, what we are doing here,
except that in the Dijikistra’s algorithm, you would do it on the graph itself. Then, you
would pick the lowest cost note. So, the Dijikistra’s would also pick this and color it
black, for example, and then relax these three edges, which I am also saying, that we will
generate these three edges. So, one simplifying assumption while drawing, we will make
is, that we are not going to go back; we are not going to allow loops, because we know
that loops are only going to increase the cost. So, if I go back, go from S to A and A to S,
it is not going to help me in any way. One assumption that we are working with, like Dijkstra does, is that costs are all positive here. So, pick the least cost node, S A. So, let us say, this also, I am coloring black here. Dijkstra also would color it black, and there would be a pointer pointing back to this. Then Dijkstra would color this one,
black, and try to; when you relax this edge, you find that 4 plus 3, 7. So, it remains like
that, whereas, this becomes 8 this, in arrow, which comes here, this becomes 19 and an
arrow comes here.

So, there are these costs associated with nodes, which I am not writing there; which I am
writing here. From A, you can go to B, or you can go to D, or you can go to E, and the
costs are, as we said, 19 for this; D is 12 and B is 5. So, very similar to what we were
doing in heuristic search, always pick the lowest cost node, except that when you were doing heuristic search, it used to be the lowest heuristic value node; here, it is the lowest known partial solution cost. So, this is, sorry, this is not 5; this is 7: from S to A to B, if you come like this, it is 7, essentially. Then, you pick this B, and from B, you can
go to C, or you can go to A, or you can go to D. If you go to C, then you have a cost of 9.
If you go to A, then you have a cost of 8. If you go to D, then you have a cost of 11. So,
this process continues; we will do a couple of more rounds and then we will stop. The lowest cost node now is B, so we expand it. As I said, we will not allow loops, so from this B we will not go back to A or to S, because they are on the path here. So, the only thing you can do is go
to C or to D; C D. So, 7 plus 4; this is 11 and 7 plus 6 is 13. So, you can see, we have
found two paths to D; one path costing 11, which goes from S to B to D, which is cost
11; the other is from S to A to B to D, which cost 13.

Let us say, there is another edge, which I have not drawn earlier; forgot to draw earlier.
This is costing 6, 7 units, let us say. Now, this is, A is the lowest node that I can expand.
From A, I can go to; I cannot go to S; I cannot go to B; I can go to D or to E, and the cost
of going to D is 16, here, and cost of going to E is 23, here. I have done away with it.
Now, something interesting is happening that I am going to expand C. C and D is
interesting, because they are the two nodes, which lead us to the goal, essentially. So, at
this point, this node D has become the lowest cost node. So, this is 9, 16, 23, 11, 12, 13,
and 19. I expand this C and from C, I can go either to D, along this path, or I can go to G.
If I go to D from here, it is going to be 9 plus 4, 13. If I go to G from here, it is going to
be 9 plus 7, 16.

So, I have found one path to the goal. So, let me highlight that path. I am going from S to
B, B to C and C to G, and that is the path represented by this node G here: S to B, B to C, and C to G. So, this has gone into closed. Should I terminate? Go back to what Dijkstra would have done. What have we done? We have colored A, we have colored
B, we have colored C, and we have not yet, colored D, or E. Now, if you look at this
graph, there is a path going from S to B to D, which is 11 plus 2, 13. Now, that path is
better than this path, which is of length 16. If I want my algorithm to find the optimal
path, I cannot stop at this stage, which is why, we have this condition, till such a solution
is fully refined, and such, I mean, the least cost solution is fully refined. Now, in this
case, the least cost solution is this D and this C. So, let us say, without loss of generality,
we pick D from here. Once we pick D from here, we will add this; we have already seen C. So, from D, you can go to G, to C, and to E; let us not worry about C and E. The cost from S to B is 5, and then another 6, and then 2, and we have this cost of 13. Now, we have added, in some sense, to open, if you want to say, this path S, B, D, G, to open,
and we also have another path S, B, C, G to open, but none of them is at the head of the
queue. Remember that we will use something like a priority queue to implement this.
Before we come to this G, we will exhaust this option of C, we will exhaust this option
of D, and we will exhaust; yes, only these two options, you have to exhaust. This one,
because this is lower cost, and this is lower cost; once we expand them or refine them,
we will get more expensive solutions, and at that point, this G will become the least cost
and then, we can terminate.

So, this is the idea of Branch and Bound, essentially; that what do we mean by bound
here? That the moment, when we expand this node with 13, we are going to bound; we
are not going to be interested in this node with 19, or this node with 16, or this node with
16, or this node with 23. Even though, they are not complete solutions, we know that if
we are to refine them, their cost would be more than 13, and we have found this complete solution of path cost 13 while the partial solutions are of higher cost. Therefore, I
can terminate at this stage. So, at the moment, when you pick the goal node or the path to
the goal node, you can terminate.
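
A sketch of this search over paths, with edge costs reconstructed from the numbers read out in the walkthrough (treat the graph as illustrative, since not every edge on the board is recoverable from the transcript):

import heapq

def branch_and_bound(graph, start, goal):
    # Always extend the least cost partial path; the first fully
    # refined solution popped from the priority queue is optimal,
    # since every remaining partial path already costs at least
    # as much and refining it can only add cost.
    frontier = [(0, [start])]           # (cost so far, path)
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        for nbr, w in graph[node]:
            if nbr not in path:         # no loops: they only add cost
                heapq.heappush(frontier, (cost + w, path + [nbr]))
    return None

graph = {
    "S": [("A", 4), ("B", 5)],
    "A": [("B", 3), ("D", 8), ("E", 15)],
    "B": [("A", 3), ("C", 4), ("D", 6)],
    "C": [("D", 4), ("G", 7)],
    "D": [("C", 4), ("E", 4), ("G", 2)],
    "E": [("G", 2)],
    "G": [],
}
print(branch_and_bound(graph, "S", "G"))   # (13, ['S', 'B', 'D', 'G'])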

Let us look at another example, which is the travelling salesman problem that we are so interested in; but before I do that, I want to ask a question or two. I used the term estimated cost here, and when I say estimated cost, I mean the estimated cost of the full solution, not the partial solution. Here, we are working only with partial solutions. So, when we say the cost of this path is 12, we mean that to go from S to A and A to D is 12. It is not telling you how much it would cost if you went through to the goal along this path. Now, I am talking about that: instead of using this exact known cost of partial solutions, let us work with estimated costs of full solutions.

Let us say the estimated cost of any solution is C, and for the actual cost of the same solution, and by actual cost I mean the cost if I were to refine that solution completely, let me use the term C star. The question that I want to ask is about the mechanism used for estimation; it can do three things. One is, it can give me a perfect estimate, in which case I would use an equality sign, C equal to C star; but that is only a wishful hope, that you will find an estimate which is perfect. So let us grant that we cannot find an estimate which is perfect, which leaves us two choices: either greater than or less than. We have eliminated the equality, because we have granted that we will never find such an estimate. Now, suppose I had a choice between an estimating function which underestimates the cost, and a different estimating function which overestimates the cost. Let us say C 1 and C 2 are such functions: C 1 is always less than C star, and C 2 is always greater than C star. Which one should I use? In other words, while devising an estimating function, what property should I take care to satisfy? Should I use C 1 or should I use C 2? How many people here feel it is C 2, and how many feel it is C 1? The rest are undecided or not awake. Anyway, just keep this in mind; I will ask this question in a little while again, essentially. Now, let us spend a little bit of time on the travelling salesman problem again. This time, we are looking for an exact solution, even though it is an NP-hard problem. We want to look at methods which will give us exact solutions. Maybe we cannot solve very big problems, but at least for the smaller problems we want that, essentially.

(Refer Slide Time: 27:58)

We look at something called refinement search. By refinement search, I mean the following: consider the set of all possible tours; that will be the root node of our search tree. Then, by some operator, I will partition the set into smaller sets, essentially. What is an operator? One operator could be choosing an edge, for example. In the process, I want to refine this set, essentially. I also want to talk about estimated cost. So, one thing that you should think about, while I draw a small example, is: given a problem, which means given a set of cities and the cost of edges between those cities, how can you estimate the cost of a tour? In other words, can you find a lower bound for a tour, or maybe, if you are interested, can you find an upper bound for any tour? In other words, all tours are going to cost more than some lower bound; can we find such a lower bound? So, let me write a small example. Let us say we have these five cities: Chennai, Bangalore, Hyderabad, Bombay and Delhi. So, Chennai, Bangalore, Hyderabad, Mumbai, I should say, otherwise there is always a danger, and Delhi. I need the edge costs between these cities, so I am just drawing the edge matrix to capture those costs. The diagonal elements are 0s, because you are already in that city. Let us take some simple values. Let us say, between Chennai and Bangalore, the cost is 300 kilometers; between Chennai and Hyderabad, it is 600 kilometers; Chennai and Mumbai, let us say, is 1000 kilometers; and Delhi is 2000 kilometers. These are not the correct figures, but they will do for us. So, given these figures, can you find an estimated cost? I am asking you a specific question now: can you find a lower bound on the tour cost, which means that no tour can be cheaper than that cost?

Let us put in some more values. Let us say, Bangalore and Hyderabad is 500; Bangalore and Mumbai is, let us say, 900; and Delhi is the farthest from everything, so let us say 2100 here, and 1500 here; just some close to random figures. That leaves me with the entry between Mumbai and Hyderabad; let us say that is 700. So, this is my matrix of edge costs, given to me, and I want to find the solution for the travelling salesman. My refinement search is going to do the following: my root is going to be S; I will just call it S, and it is the set of all tours. What I am asking you is: for this root S, which consists of all possible tours, what would be a cost that I would want to associate with it, the best lower bound that I can think of, essentially? Now, obviously, you can say 0 is a lower bound, because definitely every tour will have cost greater than 0, but I am not interested in such a trivial lower bound. You can even say 300 is a lower bound, for example, but I am not interested in that either. The reason is that if I am going to do this Branch and Bound, I am interested in excluding candidates from my search space, and I can only exclude candidates if their estimated cost is higher than the actual cost of some known solution.
To see an example here: this known solution has cost 13, and this estimated cost is 19. Now, this 19 is actually the real cost of going from S to A to E, but I can be overly optimistic and say that it is the estimated cost of going to G along this path; the rest of the edges I take to have cost 0, and I can be overly optimistic about that. So, I can treat this actual cost of going to E as the estimated cost of going to G via this path, essentially. But even if that were the case, I know that the estimated cost is 19 and this actual cost is 13, and the 19 is only going to increase as I refine the solution further. So, it can never become better than 13, and I do not really need to refine it. That is the bounding; I am just not looking at that thing. For such reasons, that I should be able to exclude bad candidates as quickly as possible, I need estimated costs which are as high as possible. As I said, you can always give me 0 as an estimated cost, or 300 as an estimated cost, but I am not interested in those, because they will not exclude solutions from the search space, essentially. You were saying something?

Student: (( ))

Prof: Yes. But I am not interested in higher bounds, so much. So, let us talk about lower
bounds, essentially.

Student: The sum of the four shortest edges.

Prof: The four shortest edges; why not five?

Student: (( ))

Prof: Yes, but that would give me what? Even assuming that I look only at the triangular part of the matrix, it would be 300 plus 500 plus 600 plus 700; something like that, essentially.

But all my desire is to get as high an estimate as possible. So, the general idea is that I want as high an estimate as possible, but it should be a lower bound; remember these two things, essentially. There is, of course, more than one way of doing it; we will just use one method here, which is that we will take the lowest two elements from every row. So, I will take 300 plus 600, and get 900 from this row. I will take 300 plus 500, and get 800 from this row. Then these two, 500 plus 600, give me 1100 from this row; 700 plus 900 is 1600 from this row; and 1200 plus 1500 is 2700 from this row. I will sum them up and divide the answer by 2.

What is the rationale behind taking the two lowest from every row? Yes: in a tour, every city will have two edges, one incoming and one outgoing, if you want to distinguish between them. We are sort of being optimistic here, and saying that those two edges are the two lowest cost edges, because they cannot be better than that. So, if the lowest cost two edges are taken from each row, and we sum that up and divide by 2, because we do not want 10 edges, we want only 5 edges, we will get an estimate which is a lower bound, essentially. Then we follow this same high level algorithm: refine the cheapest cost partial solution, till the solution is fully refined; in this case, we have only one at this moment, so we will refine that. Let us say we use some heuristic, and we say that we will add a cheap edge to the tour, essentially. So, the cheapest edge here is, for example, between Chennai and Bangalore.
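
As a rough sketch, the two-cheapest-edges bound just described can be computed like this. The matrix is the one on the board; the one entry not stated explicitly, Mumbai to Delhi, is taken as 1200 from the "1200 plus 1500" reading of the Delhi row, so treat it as an assumption.

```python
def tour_lower_bound(cost):
    """Lower bound on any tour: every city contributes its two cheapest
    incident edges; dividing by 2 corrects for counting each edge twice."""
    total = 0
    for i, row in enumerate(cost):
        two_cheapest = sorted(c for j, c in enumerate(row) if j != i)[:2]
        total += sum(two_cheapest)
    return total / 2

# Rows/columns: Chennai, Bangalore, Hyderabad, Mumbai, Delhi
matrix = [
    [0,    300,  600,  1000, 2000],
    [300,  0,    500,  900,  2100],
    [600,  500,  0,    700,  1500],
    [1000, 900,  700,  0,    1200],   # Mumbai-Delhi = 1200 is an assumption
    [2000, 2100, 1500, 1200, 0],
]

print(tour_lower_bound(matrix))   # 3550.0, the estimate derived in class
```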

(Refer Slide Time: 37:49)

So, I will just draw it a little bit to this side, because I want to write things here. I have two successors. This one I will call C B, and the other I will call C B bar. By this I mean that I have partitioned the whole set into two subsets: in one subset, which I call C B, the Chennai Bangalore segment will always be present; and in the other subset, it will always be absent, essentially.

Now, I want estimated costs for these two. For every node in my search tree, I need an estimated cost, exactly like what is happening here. So let me write it: C B here, and C B bar here. Now, you can see that the estimate for C B will not change, because I am going to include this lowest cost edge, and I have already counted it in my original estimate; whatever was my original estimate will continue there, essentially, so I can just copy these figures from here. But for C B bar I cannot do that. Why? Because I have excluded the edge C B from that set, essentially. So, I have to revise; I cannot choose 300 for that row. Can somebody tell me the value for the root? Summing the rows and dividing by 2: 3550. So, this cost is 3550, and this cost here for C B will also be 3550. This cost for C B bar is not going to be that, because I cannot use this 300. I have to use the next two edges in the Chennai row, which means 600 and 1000: instead of 900, that row becomes 1600, which is plus 700 here. Similarly, in the Bangalore row: instead of 800, it becomes 1400, which is plus 600. The rest will not change, because I am not using that edge there, essentially.

So, the increase is 1300, and when I divide by 2, I get 650; so this will be plus 650. Thus I have a way of revising estimates. What have I done? I have said: this is Chennai, this is Bangalore, this is Hyderabad, on some scale this is Mumbai, and this is Delhi; I am going to add this edge, and that gives my set C B. So, all tours in which this segment is there I will call C B, and all tours in which this segment is not there I will call C B bar. Out of these, C B seems to be better, so I will refine that. Let us say I follow this heuristic; it is not necessary to follow it, but let us say we will: always pick the cheapest available edge. It is like a greedy algorithm, which tries to build the cheapest tour, essentially. So, I look around this graph, and the cheapest edge I can find next is 500, which is between Hyderabad and Bangalore. So, what am I doing now? I am saying: add this edge, and I am refining this node. One side will be called H B, and the other side will be called its complement, H B bar. The way to interpret the H B bar node is: all those tours which contain the Chennai Bangalore segment but exclude the Hyderabad Bangalore segment; and the way to interpret the H B node is: all those tours which contain both the Chennai Bangalore and the Hyderabad Bangalore segments, essentially.

So, that is what I have drawn here: all tours which have these two edges make up this set. Now, let us find an estimate for it. How do I do this? At this point, I should mention, before we do this, that this process of computing the estimates is a computationally intensive process, in the sense that you have to spend some computation time looking at this matrix and doing something. But you can also do a certain amount of reasoning, something that we would call constraint propagation, in the process of computing estimates. Let me illustrate that. First of all, because we have chosen the cheapest edges, for this node H B I am anyway going to include both of them, so it is not going to affect my cost; the cost for H B is not going to change. You should just convince yourself that it is going to be 3550 also. But this cost for H B bar, that is going to change, so let us look at the changes, essentially. One thing, of course, is that you cannot use this Hyderabad Bangalore link; just as we did for C B bar, for this node also we cannot use this link. Which means that this row will now become 300 plus 900, which is 1200. Originally I had 800; now it is 1200. So, if you compare with this as a basis, I get plus 400 here.

Likewise, here, I cannot choose this 500, so it becomes 600 plus 700. Originally it was 1100; now it is 1300, so it is plus 200 here. I am just adding the incremental costs; only these two rows will change, plus 400 and plus 200. But coming back to this: we have this desire, and we will see this in the next class, or if not then the class after that, that having an accurate estimate helps; the higher the estimate, the better for us. Intuitively, it means that the higher my estimate for a node is, the more likely it is that the node will get excluded from the search space if I find a cheaper solution. Now, if you look at the estimate for this H B node, I have this Chennai Hyderabad sector in it; Chennai Hyderabad is this one, this 600, and this 600. If I do a little bit of reasoning, call it constraint propagation, the constraint I am propagating is that I want to find a complete tour, which means that I cannot have a cycle which is smaller than length 5 in this example; in particular, I cannot have a cycle of length 3. So, for that set H B, it actually means that this estimate is not correct, even though I wrote it there. Why is it not correct? Because I cannot include in my estimate this Hyderabad Chennai sector. Why? Because I have already included C B in that path, and H B too. So, I am talking of this node here, and I am saying that even the estimate of this node will go up. The reason it will go up is that, having included C B, and having included H B, I am forced to exclude H C from there, because otherwise I would have a cycle of length 3. Which means I cannot use this value, 600, for computing the estimate of that node. Instead, I will be forced to use the next value, which makes that row 300 plus 1000, that is 1300, essentially; and that is, of course, more reasoning than what I did so far. What I did so far was only: if I am excluding an edge, as in those bar kind of nodes, then I cannot count it.
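
The revision being done by hand here follows a simple rule: an excluded edge may not be counted, and an included edge must be counted. A sketch of that rule, assuming edges are represented as frozensets of city indices:

```python
def constrained_lower_bound(cost, included, excluded):
    """Two-cheapest-edges bound, respecting the edges a refinement node
    has committed to (included) and ruled out (excluded)."""
    n = len(cost)
    total = 0
    for i in range(n):
        must = [cost[i][j] for j in range(n)
                if frozenset((i, j)) in included]
        if len(must) > 2:
            return float("inf")        # city i cannot carry three tour edges
        free = sorted(cost[i][j] for j in range(n)
                      if j != i
                      and frozenset((i, j)) not in included
                      and frozenset((i, j)) not in excluded)
        total += sum(must) + sum(free[:2 - len(must)])
    return total / 2
```

With the matrix above, `constrained_lower_bound(matrix, set(), {frozenset((0, 1))})` gives 4200.0, which is the 3550 plus 650 computed for the C B bar node.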

But now we see that there is a certain propagation which takes place: if I am including this edge, and I am including this edge, then I cannot count this other edge. So, let me just draw a zigzag line to say that I cannot count that line. Why? Because then I would have a cycle. I can go further and do more reasoning, to get better estimates; and what do I mean by better estimates? Higher estimates, which means: if you cannot include something, then do not count it. That something, in this example, is the 600 here, which is a low cost in this row, and also in this row, but I cannot use it in my estimates. So, I must use something else, which will give me a higher estimate. There are more edges here that I cannot include: after I have included Bangalore Chennai and Bangalore Hyderabad, there are two more edges that I cannot add, because I want a tour, and a tour has this property, that every city is visited exactly once, which implies that every city has exactly two edges incident on it. I already have Bangalore with two edges incident on it in this set. So, I cannot have this edge, and neither can I have this edge.

So, you can see that problem solving is not necessarily a one-pronged strategy, where you just do search and only search, essentially. Later on, we will see that it is often useful to combine search with reasoning: some amount of reasoning and some amount of search, and in the process you try to cut down on the search space more and more, essentially. What do we gain by excluding these edges? We get more accurate estimates. For this node called H B, we can now describe the set more specifically: H B stands for the set of all tours which include the segments C B and H B, and which exclude H C, B D and B M, essentially. So, already we have narrowed down our choices and made better estimates.
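
The two propagation rules just used, that a city with two committed edges accepts no more, and that the edge closing a short cycle is forbidden, can be stated as a small routine. This is a sketch of the reasoning, not the exact procedure on the board:

```python
def forced_exclusions(n, included):
    """Edges ruled out by constraint propagation, given the committed
    edges of a refinement node (each edge a frozenset of two cities)."""
    adj = {i: [] for i in range(n)}
    for e in included:
        i, j = tuple(e)
        adj[i].append(j)
        adj[j].append(i)
    out = set()
    # Rule 1: every city has exactly two tour edges, so a city that
    # already has two committed edges can accept no other edge.
    for i in range(n):
        if len(adj[i]) == 2:
            for j in range(n):
                e = frozenset((i, j))
                if j != i and e not in included:
                    out.add(e)
    # Rule 2: no subtours; joining the two ends of a committed path
    # would close a cycle shorter than n, so that edge is forbidden.
    seen = set()
    for i in range(n):
        if len(adj[i]) == 1 and i not in seen:   # an endpoint of a path
            path = [i]
            seen.add(i)
            while True:
                nxt = [j for j in adj[path[-1]] if j not in seen]
                if not nxt:
                    break
                seen.add(nxt[0])
                path.append(nxt[0])
            if 3 <= len(path) < n:
                out.add(frozenset((path[0], path[-1])))
    return out
```

For included = {C B, H B}, with Chennai, Bangalore, Hyderabad numbered 0, 1, 2, this returns exactly {H C, B M, B D}: Bangalore is saturated, and Chennai Hyderabad would close a cycle of length three.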

So, this is the same thing that we were doing here: refining the least cost partial solution. We started with 3550, and then we got two candidate sets; one was 3550, and the other one was a bit more than that. We refine this, and we get these two sets. We have not computed the full estimate for one of them, but once we do that, we will refine them. At some point, we will get a complete solution. For example, in this problem, if you add one more edge, let us say this edge, Hyderabad Mumbai, then you have actually solved the problem completely, because after that you do not have any more choices left. If you are going to include these three edges, then you have to visit Delhi: once from Mumbai, and once from Chennai. So, the rest is forced, essentially.

So, after this much search, we have found a complete tour: below H B, if we add M H, then this node M H, which includes these three edges, determines a complete tour. If it happens to be the lowest cost node in my search space, then I can terminate; that is the shortest tour. You must convince yourself that this termination criterion is sound; by that I mean that it will guarantee an optimal solution, essentially. One of the reasons behind that is that we are using lower bounding estimates for estimating the cost of a solution.

You can see that there is some similarity of Branch and Bound with breadth first search, essentially. You can view breadth first search as doing Branch and Bound with the condition that all edge costs are equal. If all edge costs are equal, then you will just go down level by level, because at the first level the cost is 1, at the second level the cost is 2, at the third level the cost is 3, and so on and so forth. So, breadth first search is a special case of Branch and Bound where all the edge costs are equal; when the edge costs are not equal, Branch and Bound is a generalization of breadth first search. So, you have to convince yourself that this will give you an optimal solution. But one thing it does not have, even though we use a heuristic that extends the minimum cost partial solution, is a sense of direction, essentially.

(Refer Slide Time: 57:22)

So, let me illustrate that with a very small example. Suppose you are doing this city map kind of thing, and this is your start node, and this is the map that you are looking at; and let us say this is to scale, which means the length of an edge, as I draw it, is actually the cost of the edge. So, let us say you have some such place: you are at this start node, and then, of course, there is one node here, let us say, and then some nodes here, and let us say this happens to be the goal node. What will Branch and Bound do? What is the behaviour that Branch and Bound will exhibit? It will explore all of this part of the graph, because that is what its mandate is: extend the cheapest partial solution, and all the cheapest partial solutions are on this part of the graph. So, it will explore this, and this, and this, and all possible combinations, which, from an intelligent point of view, you would see is not a very bright thing to do, essentially.

Eventually, it is guaranteed to find me the optimal solution. For example, if there is another path from here which goes like this, which is longer, it will still find me the shortest path, but only after doing a lot of unnecessary and useless search in this part of the map, essentially. It does not have a sense of direction. We have focused on this aspect, finding optimal solutions, and in the process we have forgotten about the other aspect, finding solutions faster. So, in the next class, we will combine the two together. We will see how we can combine Branch and Bound with, this was best first, if you remember, best first search.

We will introduce a heuristic function back again, and try to use this Dijkstra-style frame of working on a graph to look at a very well known algorithm called the A star algorithm. So, I will stop here, and we will take this A star algorithm up in the next class.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science
Indian Institute of Technology, Madras

Lecture - 19
A * Algorithm

So, today we want to look at this well known A star algorithm, and this was devised by Hart, Nilsson and Raphael, essentially.

(Refer Slide Time: 00:27)

So, out of these, we have met these characters very briefly in the introduction. Nilsson is one of the founding fathers of AI, along with Simon and Newell and Minsky and McCarthy. In fact, the first textbook on AI, written around 1972, was also written by Nilsson, called Problem-Solving Methods in Artificial Intelligence, and it discusses most of the search algorithms that we have seen, except the randomized algorithms and things like that. Best first search and heuristic functions and the effective branching factor and all this kind of stuff was written up in that book, essentially.

Nilsson was teaching at Stanford. Hart, some of you may know, because Duda and Hart wrote a very well known book on pattern recognition. And Raphael we had mentioned very briefly in passing; he was one of the first people to write a program to do automated theorem proving, essentially. Anyway, today we are interested, of course, in this algorithm A star, but for the moment we will just call it A, and then we will see under what condition we can call it A star.

Generally, when we look at some of these algorithms, the star implies that the algorithm is admissible, and by admissible we mean that it is guaranteed to find an optimal solution. So, branch and bound, in that sense, is admissible, essentially. Before I move on: our librarian informs me that my book, which I do not know how many of you have seen, is now available in the department library. There are about 5 copies, and we are in chapter 5 of the book at this moment, so you should catch up with that, essentially.

(Refer Slide Time: 02:44)

So, let us first get the motivation into place, which we have just discussed: you are at some start node S, and you want to go to some goal node G, assuming this is a city map or something like that. And you have some kind of an open list, and there are nodes on this open list from which you have to pick a node for expansion.

This is the old style of algorithm in which we wrote depth first search, breadth first search, heuristic best first search and so on. We did not use this terminology in branch and bound, but you can see that branch and bound also always picks the lowest cost leaf in the tree that we were drawing, which is like the open list that you can maintain.

Now, we saw that what best first search used was a function which estimated the distance of every node to the goal node. We called that function h of n, and best first search basically maintained a priority queue on the heuristic value, and always picked the node which appears to be closest to the goal, in the hope of finding the solution faster, essentially.

Branch and bound, on the other hand, keeps track of the partial costs found so far, which are these costs. These are actual costs, in the sense that they are costs of actual partial solutions found. They may not necessarily be optimal, though for Dijkstra's algorithm one can argue and prove that every time it picks a node, it has found an optimal cost for it; we will come back to that argument a little bit later. So, this is the actual cost that we have found to a given node so far, essentially.

So, if you try to visualize this as some sort of a physical system, you can see that branch and bound, like breadth first search, which is a simpler case of branch and bound, tries to stick as close to the source as possible: it always picks a node which has the lowest known cost, essentially. So, it is like a pull on this thing, which is pulling it back towards the source. Best first search, on the other hand, always tries to go as close to the goal as possible, essentially.

Because it is using the heuristic function, which is an estimate from a given node to the goal, in some sense that is a pull in the goal's direction, essentially. What we want to do in this algorithm A star is to use a combination of these two pulls, and it is as simple as that; we just need to fill in the details, essentially.

So, in A star terminology, this value is called g of n, and it is the actual known cost of a path from S to n; the cost that we were using in branch and bound, that label we will call g of n. And h of n is the estimated cost of the segment from n to G, which we used in best first search; h of n tells you how far the node is from the goal node, essentially.

(Refer Slide Time: 07:02)

So, in this algorithm A, we use a function called f of n, which is basically the sum g of n plus h of n, and essentially we keep the priority queue sorted on this value of f of n. Now, notice what is happening in this search algorithm: we are starting with the source node, or the start state, and we are gradually making this implicit graph explicit, by a kind of search tree that we are generating; and whatever we have made explicit is divided into two parts. One is called the closed set, or closed list, which is the internal nodes, and the other is the leaves, which we call the open set.

As we build this explicit graph, the g values are computed: as we find partial paths, we can compute their costs, and those are the g values. The h value is independent of what we are doing; the h value is simply a property of a given node. It is like asking how far Mylapore is from Chennai Central; we are just asking that question, and there is no question of paths here. For the measure, we said the ((Refer Time: 08:12)) distance can be used, or the Manhattan distance can be used.

A heuristic function is just a property of the node. So, in the algorithm that will implement this A star, every time we generate a node, you can simply compute its h value, which is the heuristic value, once and for all, and you are done, essentially. Whereas this g value may change. Why? Because, as we saw in the example, first you may find a path of cost 15 to a node, and then you may find another path of cost 14 to that node; so g of that node may change from 15 to 14.

So, it is a quantity which keeps changing. It is the actual known cost of a known path, I should say, because there may be some path that you have not yet explored which could be cheaper. Now, we use the term f star of n is equal to g star of n plus h star of n, and these are optimal costs. Whenever we use the star, it just denotes the optimal cost; we may or may not know these values, essentially. In fact, we do not know them most of the time, essentially.
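
The g revision just described, 15 becoming 14 when a cheaper path turns up, is the familiar relaxation step. A minimal sketch, assuming g and parent are dictionaries maintained by the search:

```python
import math

def relax(n, m, edge_cost, g, parent):
    """Adopt the path to m through n if it beats the best known g(m),
    e.g. revising g(m) from 15 to 14 when a cheaper path is found."""
    if g[n] + edge_cost < g.get(m, math.inf):
        g[m] = g[n] + edge_cost
        parent[m] = n
        return True
    return False
```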

Even if we have found a path from some node to some intermediate node, we do not know the actual optimal cost. So, what is the relation between g of n and g star of n: is it equal, lesser than, or greater than? Greater than? How many people feel it is greater than? Only one. Greater than or equal to; why greater than or equal to? Because g of n is the cost of a path that you have found to that node n so far, and it may not be the best path; so its cost could be greater than g star of n.

So, this f star of n is the cost of the optimal path that starts at the start node, goes through node n, and then goes to the goal node G. For any node n, f star of n is the optimal path cost which passes through that node n, essentially. And there is a distinction between these two notions: here, we are asking, for any node n, what is the optimal cost going through that node n; and here, we are talking about the optimal path from S to G. If the optimal path is S, n 1, n 2, up to n k, G, then we can write f star of S is equal to f star of n 1, and so on, up to f star of G, because it is the same path, remember. When we say this is the optimal path, then whether we take f star of any node on that path, or f star of the start, or f star of the goal, they all have the same cost, because there is only one path that we are talking about. So, for an optimal path these are all equal; we will use this at some later point.

(Refer Slide Time: 12:37)

A star is admissible if; now we come to the same question again: the relation between h of n and h star of n, what should it be? I am giving you two options here; since we do not have strict equality, we have included an equality on both sides. Which one is better, the first option or the second option? What should my heuristic function do? What is this one doing? This one underestimates, and this one overestimates. And I am making a claim, and this claim we will show more formally, probably in the next class: that if I choose the right condition here, then the algorithm A star becomes admissible, and by admissible we mean it will find you the optimal cost path, essentially.

So, should my h of n be an underestimating function, which means h of n should be less than h star of n? Remember, h star of n is the optimal cost of going from n to the goal. Or should it overestimate the cost of going from n to the goal? Which one will make my algorithm admissible? Should I take a vote on this, or is somebody going to volunteer an answer?

So, let us take a vote: how many people feel it should overestimate the cost? Only one, two; can I use negation by failure? No. How many people feel it should underestimate the cost? A very small number, essentially. So, we will come back to this. Let me take an example to illustrate this whole point, essentially, and then you will see; or let me put it this way.

It is an analogy that I often tend to use. Suppose you want to buy a new mobile phone, the most popular object nowadays, and you have decided which phone you want to buy, but there are three or four shops which are selling it. So, you go to the first shop, and, let us say, I do not know what a good price is for you people, so let us say it is 10000 rupees; this fellow says, I am giving you this phone for 10000 rupees.

Now, you have some estimate of how much the other shops will charge. Will you go to them if you think they charge less, or will you go to them if you think they charge more than 10000? If they charge less, essentially; which means that you go when you think that the value at that node underestimates the cost, essentially. So, let us then look at this example.

(Refer Slide Time: 16:57)

Let us say I have only two nodes in open; let us call them A and B. Let us say this is the actual path found to them, and let us say that they are just one step away from the goal, just to illustrate the idea. That is the edge that they are just about to explore. Let us say that this cost is 100, just to simplify matters; let us say this cost is also 100; and let us say that this cost is 40 and this cost is 50; that is, the cost of edge A G is 40 and the cost of edge B G is 50.

So, what do we have? We have this 100 equal to g of B, and this also equal to g of A, and those are actual costs. You might say that h star of A is 40 and h star of B is 50; in this case, of course, since there is only one edge, it simply falls onto the edge cost, though in general it could be more than one edge. So, let us look at examples of both these functions.

(Refer Slide Time: 18:26)

So, let us say h 1 underestimates the cost and h 2 overestimates the cost. And let us pick functions which are not very good, in the sense that both of them are giving you wrong information: both of them are telling you that B is closer to the goal than A. So, let us work with this.

So, for h 2: h 2 of B, it is actually 50, but let us say this is 70; and h 2 of A is actually 40, but let us say it is 80. So, what does this mean for this function h 2, which is a heuristic function? Remember, we did this example with the blocks world, where we tried to define heuristic functions, and we saw that one of them was better than the other. Here we are looking at another aspect of heuristic functions, which is that one of them is underestimating and the other one is overestimating.

So, first we are looking at a function which overestimates the costs, which means that the actual cost for this is 50, but this function thinks it is 70; the actual cost for this is 40, but this function thinks it is 80. In actual practice, A is closer to the goal, which means this path should have been preferred, but this function will think that B is closer to the goal, because of the estimates it has: h 2 of B is 70, which is less than 80.

So, f 2 of B will be g of B, which is the same as g of A, plus 70: 100 plus 70, equal to 170; and f 2 of A is equal to 100 plus 80, equal to 180. Remember, we said that what this algorithm A star basically does is maintain a priority queue on the f value, essentially. In this small example, there are only two nodes in the open list, one is A and one is B, and to decide which one to pick it uses the f values: f 2 of A is 180 and f 2 of B is 170.

So, it will pick this node B, which means it will now do what Dijkstra would have said: relax the edge here, which means it will compute g of G. So, g 2 of G is equal to g 2 of B plus the actual edge cost; which one have we picked? We have picked B, so that cost is 50, correct? So it is equal to 100 plus 50, which is 150; and h 2 of G is equal to 0. We assume that our heuristic function can at least tell us if we have reached the goal, so the heuristic value at the goal is 0, which means f 2 of G is equal to 150 plus 0, which is 150. So, what has happened in our search space?

(Refer Slide Time: 22:17)

We started with S; then we had these two paths, this is A, this is B; then our algorithm using h 2 has picked B and found a path to G; then we have these two nodes on open, A and G. A has an estimated cost; by estimated cost we mean the complete cost of going to the goal via this node A, estimated to be 180, because the heuristic estimate from A is 80 and the cost up to A is 100; so 100 plus 80 is estimated to be 180. The estimated cost of G is the cost found so far, which is the g value, which is 150.

So, here we have this node sitting with 150 and this node sitting with 180. The algorithm will pick G and terminate; it is the same criterion that we talked about in the last class: when the completely refined solution becomes the cheapest, it will automatically get picked in this priority queue, and then the algorithm will terminate. So, it will find this path of cost 150, essentially, and it has missed the cheaper path of cost 140 through A.

(Refer Slide Time: 23:33)

Let us try the other option, which is h 1. It underestimates. So, h 1 of B: it is actually 50, but now we are underestimating, so let us say it is 20. And h 1 of A is actually 40.

So, let me say this function thinks it is 30; it is actually 40, but let us say this function thinks it is 30. In this manner h 2 and h 1 are similar: both of them think that B is closer to the goal; the only difference is that h 2 overestimates the costs and h 1 underestimates them, essentially. In which case, as you can see, f 1 of B is equal to 20 plus 100, which is 120, and f 1 of A is equal to 130.

(Refer Slide Time: 24:43)

Again, this is a very similar situation. Originally this was 180 and this was 170, and this became 150; with this new function f 1, this is 130 and this is 120. So, again this algorithm will explore B first. But what will happen now? After picking B, it will relax this edge from B to G, whose actual cost is 50. So, g 1 of G is equal to 150, which means f 1 of G is equal to 150.

See, after it has found the path going from S to B to G, either algorithm knows the actual cost; there is no longer any estimation done by h, because h is 0 at the goal. So, f 1 of G is 150, just as f 2 of G was 150. But now this algorithm, let us call it A 1, has to choose between G, with the cost 150, and A. It turns out that A has the lower estimated cost, 130. So, this algorithm will pick A, and when it picks A, it will find a cheaper cost to G, exactly like Dijkstra's algorithm would have done; and this cheaper cost is 100 plus 40, which is 140.

So, this will revise this cost to 140. In the first case, the estimate was originally 170 and it got revised to 150; the other one was 180 and it never got revised. In the second case, it was 120 and it got revised to 150, but when we expanded A, it got revised again to 140, essentially. And at that time, G is the only node left in our open list, because we have finished with both A and B; only G is left, and G will be picked with the cost of 140, which is the optimal solution.

So, you can see that using h 1 we could find the optimal solution, and using h 2 we could not, even though both of them thought that B was the better choice. Using h 2, we picked B and then we picked the goal; using h 1, we picked B, but then we were forced to pick A, and only then did we pick the goal, essentially. So, we found the optimal cost.

(Refer Slide Time: 27:31)

So, this is the criterion we are looking for: h of n less than or equal to h star of n. The algorithm is admissible if the heuristic function underestimates the cost to the goal, essentially. So, let us quickly describe the algorithm completely, because there are one or two things which were not there in the best first search algorithm. I will do a quick description, and then we will take it up again in the next class, essentially. The algorithm is very similar, except that we keep track of these costs explicitly: g of a node n, h of a node n, and so on.

(Refer Slide Time: 28:44)

So, the steps are very similar; I will just write the outline here, and you can fill in the details. Very briefly: open gets S, which means open gets the start node, but we also want to keep track of parents and so on.

So, let us write here: parent of S is nil, and h of S is computed; we can always compute the heuristic value of a node, so as a matter of principle we compute it. And we initialize closed to nil, as before; except that here we talk of this parent pointer, rather than the node pairs that we had earlier. You can use those, it does not really matter; this is simply easier to describe. As before: while open is not equal to nil, pick the best node n and add it to closed. I will just write this as an outline; I want to focus on the part about what you do with this node n, essentially.

So, to use the older terminology: if goalTest of n succeeds, then reconstruct the path; we assume that we can trace back the parent pointers and reconstruct the path. All that is very similar to what we did in the earlier search algorithms. Else, successors gets moveGen of n; I am using the same function that we defined earlier, and for each m in successors we do the following, essentially. Let me also draw the situation at the point where this algorithm is working.

(Refer Slide Time: 31:26)

So, this is S, and then somewhere we have this open list; this is the set of open nodes, and let us say this node n is here, essentially. This line that I have drawn represents the open list, and we pick the node with the lowest f value; have I said that here? Ok, I have said best node n, and by best I mean the lowest f value, where f is g plus h. And it was not the goal node, so we generate its successors.

So, let us say these are the successors of n: all the neighbours, if you want to say, or whatever is reachable from there. Now, you can see that there are three kinds of nodes here. One kind is already on the open list; these two, I mean to draw, are already on open. There are some which are already in closed, and there are some which are new nodes, essentially; so this one represents the new nodes, essentially.

(Refer Slide Time: 32:43)

So, for each m: case one, m does not belong to open and m does not belong to closed, which means it is this node here that I am talking about; it is a new node that has not been seen earlier, essentially. Basically, we want to add it to our search space. So, what do we do? We compute h of m.

We have some heuristic function that we use. Then we say parent of m is n, and g of m is equal to g of n plus k of n, m; I will use k as the cost function, so that k of n, m is the cost of the edge going from n to m. So, we compute g of m from the g value of the parent, exactly as we were doing in the branch and bound kind of setting; and then we compute f of m, and we add m to open.

So, there are three cases, and the first case is when m is a new node. We compute its h value, we compute its g value, which means we can compute its f value; we mark the parent pointer of m to the node it came from, and we add it to open, essentially. The next case that we want to look at is when it is already on open.

(Refer Slide Time: 35:14)

So, case 2: m belongs to open, which means what? That we have found this edge here. We have dealt with the other two cases; so now, if one of these two was m, it means it is already on open, which means it already has a g value, an h value and a parent pointer. This n has just been expanded; before that, this node was pointing to, let us say, this one, for argument's sake, and this one was pointing to this. So, they already had parent pointers. What do we do for such nodes, or what do we want to do?

We want to see whether we have found a cheaper path to that node or not, correct? We already have a path to this node m; it is coming through some sequence of nodes; remember, each node has a parent pointer. So, m already has a g value, and an h value as well. If m belongs to open, then we do a check: is g of n plus this value k of n, m smaller? What is this value?

This value is the new cost that we have found to this node m, and this node m was already on open. If this is less than the cost that was already stored for m, remember it is on open, so it must be having a g value; if this new cost that we have found is better than the old cost, then we have to do some readjusting of pointers, which is that parent of m becomes n and g of m becomes this new value. And f, of course, changes accordingly. In all of this, h of m is not changing at all, because it is just a property of that node; it is not a property of the path we have found to this node.

So, these are the two cases. In the first case, for new nodes, we simply add them to the graph, create the parent pointer, compute the g value, compute the h value, compute the f value, and we are done. If we have found a new path to a node on open, then we need to check whether this path is better, which is what we are doing here; if it is better, then we readjust the parent pointer and readjust the g value, essentially. So, this is where we are readjusting the g value, this is where we are readjusting the parent pointer, and this, of course, takes into account the new g value.

(Refer Slide Time: 38:27)

Case 3: m belongs to closed, which means we had already visited m, expanded it, and generated the children of m, and so on and so forth. So, let us take this as a case.

(Refer Slide Time: 38:47)

So, let us say now this is m, which means it was already generated before, which means it could have other children. Let me use a different colour here. This node could have been a child of m, and maybe this node could have been a child of m, which could have its own child here; anything is possible, essentially. The difference between a node on open and a node on closed is that a node on closed may have children of its own, essentially. So, these purple nodes represent children of m. If m is on closed, first, of course, we have to do what we did for case 2.

(Refer Slide Time: 39:41)

So, first I will say: do as in case 2, which means we check whether we have found a better path to this node m.

(Refer Slide Time: 39:56)

Originally, this m had some other pointer, let us say this pointer; remember, these point to parent nodes. So, m was pointing to its parent here, and if we find a better path to m, we must shift this pointer and say that this is the new parent, exactly like what we did for nodes on open. Originally, this node was pointing to this, but now we have removed that and said this is the new pointer, because we have found a better path; likewise for closed nodes, essentially. So, do as in case 2.

(Refer Slide Time: 40:39)

So, if a better path is found, which basically means this condition, that g of n plus k of n, m is less than the old value of g of m, then, as in case 2, we readjust the parent pointer and readjust the g value; but now the g value of a closed node has changed, essentially.

(Refer Slide Time: 41:32)

So, just to take an example: suppose the old g of m is 50, and g of m gets a new value, which is g of n plus the edge cost; let us say g of n is 20 and the edge cost is 10, so the new value is 20 plus 10, equal to 30.

So, if the old value was 50 and the new value is 30, then we have found a new path to this m which is 20 units cheaper than the old path, which means that for all paths going from m to other nodes, we must pass on this improvement, essentially. So, you must reduce the cost of this by 20, reduce the cost of this by 20, and then reduce the cost of this by 20, and so on and so forth.

(Refer Slide Time: 42:34)

So, I will just write this: propagate the improved cost to the subtree below m. How can we do this propagation? You can just do a depth first traversal or something like…

(Refer Slide Time: 43:13)

that, of this purple tree which is rooted at this m; which means that to this node, to this node, and to this node in that part of the tree, the improved cost must be propagated.
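
A sketch of that depth first propagation, assuming we also keep, for every closed node, the list of nodes whose parent pointer currently points back to it; this is the lecture's own bookkeeping stated as code, and a full implementation would also re-check each child in case its parent pointer lies on a different path:

```python
def propagate_improvement(m, delta, children, g):
    """Pass an improvement of `delta` in g(m) down the parent-pointer
    subtree rooted at m, exactly as described for case 3."""
    stack = [m]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            g[child] -= delta          # e.g. every descendant gets 20 cheaper
            stack.append(child)
```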

So, that is the third case: if you have found a new path, a better path, to a node in closed, then that improvement has to be passed on to the children of that node, essentially, which was not the case in Dijkstra's algorithm. I mentioned, when we were talking about Dijkstra's algorithm, that when it picks a node, it has already found an optimal path to it. But in A star that is not the case, essentially; like we saw in the example, if you have found a path to G through the node B, which was a more expensive path, you can later find a cheaper path to that node, essentially.

And the same is possible for nodes on closed; you can easily construct an example. So, we need this extra step: if we have found a better path to a node that we have already expanded earlier, we need to propagate this improvement to its children, essentially, which is a little bit more work than case 2. So, under the condition that the heuristic function underestimates the cost to the goal, this algorithm is called A star, and it is admissible, essentially.
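
Putting the three cases together, here is a compact runnable sketch, assuming the implicit graph is reached through a moveGen-style adjacency function and h is a table of heuristic values defined for every node generated. One common shortcut, used here instead of the explicit subtree walk described above, is to simply re-insert an improved node into the priority queue; the case 3 propagation then happens lazily as the node is re-expanded.

```python
import heapq

def a_star(move_gen, h, start, goal):
    """move_gen(n) -> list of (m, k(n, m)) pairs; h: dict of heuristic values."""
    g = {start: 0}
    parent = {start: None}
    open_list = [(h[start], start)]          # priority queue on f = g + h
    while open_list:
        f, n = heapq.heappop(open_list)
        if f > g[n] + h[n]:
            continue                         # stale entry; n was improved later
        if n == goal:                        # goal picked, so reconstruct path
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return list(reversed(path)), g[goal]
        for m, k in move_gen(n):
            if g[n] + k < g.get(m, float("inf")):
                # covers case 1 (new node), case 2 (better path to an open
                # node) and case 3 (better path to a closed node: the fresh
                # queue entry re-expands m, propagating the improvement)
                g[m] = g[n] + k
                parent[m] = n
                heapq.heappush(open_list, (g[m] + h[m], m))
    return None
```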

In the next class, when we meet, we will prove that it is admissible. This is one of the few cases where we will give a formal proof of admissibility, because A star is a very well known and often used algorithm. But this is not the only condition for admissibility; I mean, this was the condition only with respect to the heuristic function.

So, what we have shown with an example today is that if the heuristic function underestimates the cost to the goal, then the algorithm is admissible. And you can see that branch and bound is a special case of A star where h of n is always 0. That is why we said that you can think of its cost as an estimated cost: if you just assume h of n is 0 for every node, then this algorithm becomes branch and bound, essentially. But this is not the only condition.
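
Running the sketch above on the two-node example from earlier in the lecture shows both behaviours: the underestimating h 1 recovers the optimal cost of 140, while the overestimating h 2 settles for 150. The graph and costs are the ones from the board; the h value at S, which is never compared, is set to 0 here as a harmless assumption.

```python
graph = {
    "S": [("A", 100), ("B", 100)],
    "A": [("G", 40)],
    "B": [("G", 50)],
    "G": [],
}
move_gen = lambda n: graph[n]

h1 = {"S": 0, "A": 30, "B": 20, "G": 0}   # underestimates (h <= h*)
h2 = {"S": 0, "A": 80, "B": 70, "G": 0}   # overestimates

print(a_star(move_gen, h1, "S", "G"))     # (['S', 'A', 'G'], 140), optimal
print(a_star(move_gen, h2, "S", "G"))     # (['S', 'B', 'G'], 150), not optimal
```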

So, what I want you to do is to think about this; just imagine the setting. A star, some people just call it graph search: it is a weighted graph, and you have to find the least cost path from one node to another node. We make some simplifying assumptions, but it is different from Dijkstra's algorithm in one respect: Dijkstra's algorithm assumes that the complete graph is available to you, and it is exhaustive, in the sense that it finds the shortest path to every other node from a single source, essentially.

We are not interested in paths to every other node; we are only interested in finding a path to a given goal node. And we do not have the graph available to us; the graph is generated on the fly as we go along, essentially, for which we have this function, the moveGen function. The graph is generated as we go along using the moveGen function: we start with some node, we keep adding more nodes to the graph, and then we work with that. Under these conditions, A star is a little bit different from Dijkstra's algorithm.

It also uses this idea of a heuristic function. Dijkstra's algorithm did not need to do this, because it was anyway going to find the shortest paths to all nodes in the graph, essentially; it did not have a goal in mind. A star has a goal in mind, if you want to say so, and therefore it benefits from the use of a heuristic function; and if the heuristic function is an underestimating function, then that is one of the conditions for admissibility.

So, I want you to think of what other conditions you might require for admissibility. Just imagine that this is some arbitrary graph search problem: some graph is given to you, and under what conditions will this algorithm find an optimal path? We will take this up in the next class, where we will do a formal proof of this, essentially. After we do that, we will compare heuristic functions, again formally, and show that heuristic functions which are more informed do less search.

So, remember that when we were talking about branch and bound, we said that this estimate must be as high as possible; it must be a lower bound, and even though it is a lower bound, it must be as high as possible. We will formally show that the higher the estimate, the smaller the number of nodes that this algorithm will search before termination, essentially. We will do these two formal things in the next class.

So, we will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 20
Admissibility of A*

(Refer Slide Time: 00:19)

So let us begin. We are looking at the A star algorithm, and if you remember, what A star does is that it uses a function f of n equal to g of n plus h of n, where g of n is the known cost up to node n from the start node, because it has explored that part of the space, and h of n is the estimated cost from node n to the goal node, essentially. So, it uses the combination of these two functions, and basically picks the node with the lowest f of n at every point of time from the open list, essentially. If you remember, g of n was the function which we sort of inherited from branch and bound, which tries to keep the search as close to the start as possible, and h of n is the function we have inherited from best first search, which tries to pull it towards the goal node, essentially. So, with the combination of these two functions, the idea is that the algorithm will find optimal paths from the start to the goal, essentially, and today we will show formally how this is done; but before we do that, just a quick recap.

(Refer Slide Time: 01:55)

Any time it expands a node n, it gets different kinds of children - three different kinds of children.

(Refer Slide Time: 02:04)

So, we had said that when we call moveGen, it gives new nodes, it gives nodes on open, and it gives nodes which are already on closed. When you expand a node, which means you call the moveGen function with this node n, it can generate three different kinds of nodes: either they are completely new, or they are already on open, or they are on closed, essentially.

(Refer Slide Time: 02:38)

So, completely new nodes are like this, and what we do is essentially compute the g values for these nodes, which is the g value of this node plus the cost of this arc, put them into open, and mark a parent pointer from these nodes to this node n; these are the steps we do. For nodes which are on open and on closed, we need to revise g of m, where m is a child, because we might have found a cheaper path to a node.

(Refer Slide Time: 02:57)

So, for example, these two nodes are on open. Since they are on open, it means that each already had some parent in closed, of which it was a child; this could have been this one, and this could have been this one; remember that this double circle we will use for closed nodes. So, they already have some parent, but we have found a new path to these nodes on open; it is possible that we have found a cheaper path, and we want to record the cheaper path. So, essentially, we revise the g values of such nodes. And lastly, you may get nodes which are on closed, for example, these two nodes, in which case not only do you have to update the g values of those two nodes which are on closed, but also of any nodes which might be pointing to them, essentially.

So, for example, if the g value of this node is, let us say, 50, and this edge is, let us say, 10, and if this g value was 70 for some reason, then we need to update the g value for that node, because now we have a cost of 50 plus 10, which is 60; so we want to make this 60. We have gained an amount of 10, which must be passed on below: whatever the values of the children were, they must each be reduced by 10, and so must any children they might have, and so on. So, you need to propagate that.

Now, why do we need to do this updating of the open and closed lists? Why do we bother to update these g values? Because we are interested in finding optimal solutions, or the shortest paths, essentially.

(Refer Slide Time: 05:21)

So, let us take this example that we have been looking at. If this is a river, and you are at this place, which is the start node, and your goal is somewhere here on the other side of the river; we have looked at this example before, when we were looking at best first search. You may generate some children, and you may follow some path, because the heuristic function is driving you in this direction; at that point, you can see that there is the river on the way, and let us assume that the bridge is somewhere here, or there is another bridge somewhere here; then the search will drive you towards that path, essentially. So, it may go from here to here, and then it may go from here to here.

Now, best first search would have done the same thing, but it would have remembered this as the path which takes you to the goal node, and that is why we said that best first search does not guarantee optimal paths. Whereas here, this is the path that you have just found for this node; this is the new parent that you have just found. It is possible that, as the search progresses, you may generate a node here, and you will find that the shorter path to this node is via here, and not like this. So, this is the kind of situation we are trying to depict here: you may want to revise, you may want to say this is not the parent, but this is the parent, because there is a shorter path here. So, maybe you found a path like this.

So, the path that you are looking for may be something like this: go like this. Instead,
the search would have first gone in this direction, because that is what the heuristic
function is saying - the goal is here, go towards the goal - and only then moved away
from that path essentially. But what A star does is that when something like this
happens, it reallocates parents to give shorter paths to any node along the way. This is
why we say that whenever we expand this node n, for nodes on open like these two, or for
children on closed like these two, if we have found a shorter path, we must update it
essentially; and that is one reason why it ends up finding the shortest paths essentially.

(Refer Slide Time: 07:46)

So, today we want to prove that A star is admissible, and by this we mean that it will
guarantee an optimal solution, or a shortest path if you think of it as a path in the
graph; the graph may be an abstraction for some other problem where the cost could be
something else, but basically every edge has a cost associated with it. So, under what
conditions will this algorithm find optimal paths? Just think of it purely as an
algorithm. What does the algorithm say? It says: pick the node with the lowest f value
from open, expand it - this is what we are doing here - and treat the three different
kinds of nodes, new, open and closed, separately. For new nodes just set up the graph;
for nodes on open check if you have found a cheaper path; for nodes on closed also check
whether you have found a cheaper path, but in addition propagate the cheaper path, like
in that example, to the children of the closed node, because the node is on closed and so
it may have some children.

(Refer Slide Time: 09:23)

So, under what conditions would A star guarantee an optimal path? We have stated one
condition earlier. Let me write these conditions for optimality. The first condition is a
finite branching factor: every node will have a finite number of children essentially.
Now obviously you can imagine some continuous problems where there may be infinitely many
neighbours; if there are infinitely many neighbours then we cannot use this algorithm, so
we will assume that every node has a finite number of children. So, the branching factor
is finite essentially; that is one condition.

We will allow infinite graphs, which means that we will allow an infinite number of nodes
to be present in the graph, but the branching factor cannot be infinite, which means that
you can always move away from a node; if the branching factor were infinite, then you
would spend an infinite amount of time generating the children of that node essentially;
so obviously, it would not work. There is also a condition on the cost; we will use k for
the cost function. So, k(m, n) stands for the cost of the edge going from node m to node
n; we will assume this is a symmetric cost. What condition should the cost function
satisfy for the algorithm to guarantee optimal solutions?

So, can I allow negative costs? Why not?

Sorry?

If there is a cycle whose sum of edge costs is negative, then the algorithm will just
keep repeating the cycle and its total cost will keep going down essentially; so this is
a condition the algorithm demands - the algorithm works only for positive costs. There
are other algorithms which handle negative costs, but we are not looking at those kinds
of things. So, for the moment, we will assume that the cost should be greater than 0;
we will not even allow zero cost essentially. That is for the moment, but we will come
back to that a little bit later.

(Refer Slide Time: 12:02)

A third condition, which I have already mentioned and of which we have seen an example,
says that h(n) should be less than or equal to h*(n), which means that the heuristic
function must be an underestimating function: it must always underestimate the distance
to the goal, never overestimate it. And the intuition behind that is that if you
overestimate the distance of some node from the goal, then you will never explore that
node further, because you will think it is too far from the goal essentially. In general,
if you think a node is closer to the goal than it actually is, then you are likely to
explore that node, and that is a key factor for A star being admissible.

So, if you remember, we saw an example with two heuristic functions h1 and h2, one of
them underestimating the distance and the other overestimating it, and both of them had
a wrong notion of which of two candidate nodes was closer to the goal; but the
underestimating function did find the optimal path and the other one did not. Today we
show this a little bit formally. So, if these three conditions are true then A star is
admissible, and by admissible we mean it always finds an optimal path: if there is a path
from the start to the goal, then it will find an optimal path from the start to the goal,
and it will terminate. We will do this through a series of small lemmas, through a series
of statements, and eventually conclude with this property of admissibility; and with a
couple of the properties we will compare heuristic functions, to see how we can say that
one heuristic function is better than another heuristic function.
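For reference, the three conditions can be stated compactly (this is just the list above, written out):

    1. Finite branching: every node has a finite number of children.
    2. Positive edge costs: k(m, n) > 0 for every edge
       (later refined to k(m, n) >= epsilon for some fixed epsilon > 0).
    3. Underestimation: h(n) <= h*(n) for every node n.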

(Refer Slide Time: 13:57)

So, we will have a series of lemmas. L1 simply says that the algorithm terminates for
finite graphs; that is the only statement we want to assert at this moment. What is the
justification, or what is the reasoning, to say that it will always terminate for a
finite graph?

()

Yes, but that is not the justification that we are looking for. You should look at the
algorithm and comment on what the algorithm is doing.

()

If the graph is finite, then the algorithm in every cycle moves one node from open to
closed. It starts by putting the start node on the open list, with closed empty, and in
every cycle it inspects one node from open - the one which has the lowest f value - and
if it is not the goal, it puts that node into closed, generates its children, and puts
those children into open, which is a priority queue on f. So, if the number of nodes in
the graph is finite, then it can do this only a finite number of times; it will exhaust
the whole graph. Now supposing, in this problem that we have here, this city-travel
problem, these bridges were not there essentially, and there was a small hamlet here
which was all you could explore; then A star would end up exploring all those nodes and
say, I have nothing else to explore, because open has become empty.

(Refer Slide Time: 15:54)

So, remember there are two conditions for exit: one is, when open is empty, exit and
report failure; the other condition is, if the node that you have picked is the goal
node, then construct the path and return it. So, in this case, if these bridges are not
there and the goal is over there, it would never find a path, but it would terminate;
that is all we are saying here: if the graph is finite, it will terminate.

(Refer Slide Time: 16:38)

The second lemma says that at all times before termination, there exists a node - we call
this node n prime - which is on open and which is on an optimal path. I say an optimal
path, because there could be more than one optimal path. So, the next statement we are
making is that before the algorithm terminates, it will always have in its open list one
node, which we will call n prime, which is on an optimal path from the start to the goal
essentially.

(Refer Slide Time: 18:00)

And the argument for that is the following. Initially we put s onto open. So, let us say
this is the optimal path; let us say there is only one optimal path, to start with: s,
n1, n2, n3, up to the goal. What we are claiming is that from this path, before the
algorithm terminates, there will always be one node in the open list, which means the
optimal path will always be in the sights of this algorithm in some sense. Now, when we
start the algorithm, we put s onto the open list; so at the beginning s is on open. If
the algorithm has terminated by finding a path, then it has terminated; we are not
talking of that condition. We are talking about the situation before termination: we
assume there is a path to the goal and the algorithm has not yet found one; and we will
show that when it finds a path, it always finds an optimal path.

So, initially we put s onto open, and the condition is true in the initial situation.
When we remove s from open, we put it into closed, and we end up putting n1 onto open
essentially; if we remove n1 from open, then we put it into closed and n2 goes onto open,
and this process continues. If we remove g from open, which is the last node in this
line, that means the algorithm has terminated by finding a path essentially. So, this
condition says that at any time, one of these nodes is always on open, available for A
star to inspect next.

(Refer Slide Time: 19:57)

Further, we say the f value of this node is always less than or equal to the optimal cost
from the start. So, s is the start node, and remember, f*(s) is the optimal cost of going
from the start to the goal node essentially. So, this is part of the statement: we are
saying that not only does such a node exist, it will always have an f value which is
lower than or equal to the optimal cost.

(Refer Slide Time: 20:36)

So, what is the argument behind this? You can say that f(n') is equal to g(n') plus
h(n'); that is by definition of what the f value is. Now, because n' has been reached
along the known optimal path, we know that this is g*(n') plus h(n'); remember that we
are talking of the optimal path. This will be less than or equal to g*(n') plus h*(n'),
because of condition number 3, which said that the heuristic value is always less than
or equal to the optimal value; which means this is less than or equal to f*(n'), where
f*(n') is the cost of the optimal path which passes through n' - not necessarily the
globally optimal path, but the optimal path passing through n'. But this n' that we have
chosen is on the optimal path itself; so this must be equal to f*(s), because after all
it is the same path that we are talking about.

So, whether we say the path is passing through s or n1 or n2 or n3 or g, it does not
matter; the cost is the same. So, this is the condition that we are looking for; and this
is less than or equal to f*(s), which is what we have written.
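The same chain of steps, written in one line of reasoning (LaTeX notation):

    f(n') = g(n') + h(n')
          = g^*(n') + h(n')      % n' reached along the optimal path
          \le g^*(n') + h^*(n')  % condition 3: h \le h^*
          = f^*(n')              % cost of the optimal path through n'
          = f^*(s)               % n' lies on an optimal path from s to goal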

(Refer Slide Time: 22:20)

Then the third thing we want to say is that the algorithm terminates if there is a path,
and it should do so even for infinite graphs; remember that an infinite graph here means
one which has an infinite number of nodes, but a finite branching factor. So, the third
statement we are making is that the algorithm terminates with a path: if there is a path
from the start to the goal state, the algorithm will always terminate with a path from
the start to the goal state. We have not yet talked about optimal paths; we are saying it
will find a path. And we are making this claim even if the graph is infinite, in the
sense of having an infinite number of nodes: the algorithm will still find a path to the
goal. What would be an argument for this? In other words, we are saying that the
algorithm will terminate by picking the goal node: of the two termination criteria that
we have, one is that open is empty, and for infinite graphs open can never be empty. So,
for infinite graphs it can only terminate by picking a goal node essentially.

So, we want to argue that it will necessarily pick the goal node at some point of time.
Just imagine this graph - imagine the algorithm doing a search in a city or something
like that - and how the open frontier will move essentially. You remember that it always
picks the node with the lowest f value; for a goal node the f value is equal to the g
value of that node, because h is 0 - you are already at the goal. So, we want to argue
that this node, which is the goal node, will at some point come to the head of this
sorted list, or to the root, whichever way we maintain the priority queue.

In other words, we are saying that all other nodes on open will at some point become more
expensive than this node g. And why is that?

(( ))

No, no. What I am saying is: let us say this is my start node, and this is my path to the
goal; it has some cost essentially. What I am saying is that the search cannot ignore
this goal node forever - eventually, it may end up looking at a node here.

(Refer Slide Time: 25:41)

It may end up looking at a node there, in this direction or that direction, but it cannot
go off indefinitely in any direction. And the reason is that whichever direction it goes,
think of the g values. Suppose there is a node here - let us call it n1 - which is on the
open list. So, let us say open is looking something like this, refusing to go towards the
goal node, but there is this node n1. What is the g value of this node going to be? It is
going to be the length of the path from here to here. Let us say it picks this node n1;
it will then have some other node, call it n2, and then let us say it picks n3. Can it
keep going indefinitely along some other direction? No, because the paths are becoming
longer and longer in those other directions; the g value, remember, is the length of the
path found up to that point. And at some point the f value of the goal will become the
smallest - this is where we use the condition that the edges are positive in cost
essentially. So, if you go off in some direction, at some point your cost will exceed
g(g), the g value of the goal; at some point, all other nodes that you have on open will
have paths which are longer than this. Or, to think of it in another manner: there are
only a finite number of paths which are shorter than this length g(g), and at worst the
algorithm will explore all those paths and then it will pick the goal g. There seems to
be a general silence about this essentially.

All we are saying is that if you go off in some other direction, your path cost will
become more than this g value; and remember that the heuristic function underestimates,
so it can never overestimate the cost. In any case, g will come to the head at some
point, because at some point its f value will become the smallest; so the algorithm will
have to pick the goal node. Now, as it turns out, many years ago, when I was teaching
this very point in class, one of the students - he is now a professor in Princeton -
raised his hand and said, this is not correct; and it has something to do with this
condition. That is why I said, for the moment let us assume this condition, that the cost
of each edge is greater than 0.

So, it is possible to assign edge costs as follows: for example, I will assign 1 to the
first edge, half to the second edge, one-fourth to the third edge, one-eighth to the
fourth edge, and keep dividing by 2. What will happen? I will have an infinite path whose
total path length is finite essentially, and that finite length could be less than this
g value, which means the algorithm could actually get trapped in that infinite path,
which has a finite total cost. So, this is like that well-known paradox - Zeno's paradox
of the tortoise and the hare. The tortoise tells the rabbit: if you give me a lead of,
let us say, 100 meters, then you can never beat me in the race; and the argument is the
following.

So, let us say the rabbit is at the starting point and the tortoise is 100 meters away.
The tortoise says: by the time you come to this 100-meter point, I will have moved to
some point, call it x1; and by the time you come to x1, I will have moved to x2; and by
the time you come to x2, I will have moved to x3; and this will of course keep happening
infinitely, and so you can never overtake me essentially. So, this is one of the
paradoxes which comes into the picture, and something similar would happen if you were
to allow arbitrarily small edge costs essentially. So, if you do not allow arbitrarily
small edge costs, then this will not happen.

(Refer Slide Time: 31:04)

So, the correct condition is not merely greater than 0, but greater than or equal to some
fixed positive value; and as long as that is the case, edge costs cannot become
infinitesimally small, every path will at some point become longer than this path to g,
and then g will come to the head of the queue. So, I will take it that this point has
been accepted.
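In symbols (LaTeX notation), the counterexample and the repaired condition - the series is the one just described, and the bound is one standard way of stating why a fixed epsilon works:

    % An infinite path with finite total cost: edge costs 1, 1/2, 1/4, ...
    \sum_{i=0}^{\infty} \frac{1}{2^{i}} = 2

    % Repaired condition: k(m,n) \ge \epsilon > 0 for every edge. Then any
    % path of more than f^*(s)/\epsilon edges costs more than f^*(s), so only
    % finitely many paths can remain cheaper than the path to the goal.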

(Refer Slide Time: 31:30)

So what have we done so far? Between statements 1 and 3: statement 1 said that the
algorithm always terminates for finite graphs; statement 3 said that if there is a path
to the goal, it will always find a path to the goal. Now we are saying that it will
always find an optimal path to the goal essentially. And it has a very simple argument;
the argument is as follows, and the proof for this I will write here.

(Refer Slide Time: 32:13)

Let A star terminate with a non-optimal path, which means it has found some path to the
goal whose g value - let me just call it g(g1) - is greater than f*(s); remember, f*(s)
is the cost of the optimal path. Whenever we use the star, it refers to the optimal
value, whether we know that value or not essentially. So, we are setting up a proof by
contradiction: suppose A star terminates by picking a goal node g1 whose g value is more
than the optimal cost. Now I want to make the statement that this cannot happen. Let me
ask you: why can it not terminate by finding a non-optimal path?

If it has picked a goal node, it will terminate; that is the algorithm - you must
remember the algorithm. If it picks a goal node from open, that means it will terminate;
it will say, I have found the goal, and so on. Now we are saying: let it terminate by
picking the goal node g1 such that its cost is more than the optimal cost. Remember,
f*(s) is equal to f*(g), and in fact equal to f* of any node on this optimal path; all
of these values are the same - it is the cost of this optimal path. So, we are saying:
assume that A star terminates by picking a node whose value is more than the optimal.

Now what I am trying to say is that we cannot make this assumption. It is not a long
proof or anything; it is just one statement, which says that it can never pick such a
node and terminate. Look at statement number 2 that we have written here - there is a
reason for every statement. That statement says that at all points before termination
there exists a node n prime which is on open, which is on the optimal path, and whose f
value is less than or equal to f*(s). So, the simple answer is: it cannot. Why? Because
it would pick n prime instead - as simple as that; the algorithm simply says, pick the
node with the lowest f value.

So, just imagine that it is about to terminate by picking this goal g1, and this goal g1
has cost more than f*(s). Then g1 can never be at the head of the priority queue,
because we have said that at all points before termination there is always a node n
prime which is on the optimal path and which is on open; and this n prime is on open
with a value less than or equal to f*(s).

(Refer Slide Time: 35:21)

So, if A star has to choose between these two nodes - one whose value is less than or
equal to f*(s), and one whose value is definitely greater than f*(s) - it will pick the
former. Remember, I have written g(g1); I could have written f(g1), it is the same thing,
because f(g1) is equal to g(g1) plus 0, since h is 0 at a goal. So, you can write f(g1)
or g(g1); it does not matter, it is the same thing.

(Refer Slide Time: 35:40)

So, if A star had to pick between these two nodes n prime and g1, it would pick n prime,
which means it cannot terminate by picking g1 like this essentially. So, the only time it
can pick a goal node is if it is at least as good as the optimal cost essentially. So, it
will always terminate only with an optimal path; is that okay? So, now we want to look at
one more property.

(Refer Slide Time: 36:31)

So, one more statement before that property. The statement is that for every node n that
A star picks, f(n) is always less than or equal to the optimal cost. We are saying that
A star will only pick nodes whose f value is at most the optimal; all this is coming from
the fact that the heuristic function is underestimating. Because the heuristic function
underestimates, we have shown that there will always be such a node on open whose f value
is at most the optimal cost. And now, by the same argument as in statement 4: if A star
is going to pick this node n, its f value must be at most that of n prime, because it
picked n over n prime; and since f(n') is less than or equal to f*(s), we get f(n) less
than or equal to f*(s) essentially. So, it only picks nodes whose f value is at most the
optimal cost.

L6 is the next one. Let there be two heuristic functions. We say h2 is more informed than
h1 if for all nodes n, h2(n) is greater than h1(n). So, if the heuristic value h2(n) is
greater than h1(n), then we say that h2 is more informed. That is just a definition. We
then claim that if this is the case, every node seen by A2 star - where A2 star is the
version of A star which uses this h2 function - is also seen by A1 star.

So, let us say we have these two versions of the A star algorithm: one uses the h2
heuristic function, the other uses the h1 heuristic function, and h2(n) is always greater
than h1(n); we say that h2 is more informed than h1. Take the extreme case when h1 is
equal to 0: it always thinks that the cost to the goal is 0, which is what branch and
bound does, whereas h2 thinks that it is at least some value. We are making the claim
that if h2 is more informed than h1, then every node seen by A2 star is also seen by A1
star, which means A2 star will in general see a smaller number of nodes; it will explore
a smaller part of the search space essentially - it will be more focused in its search.

We will do this proof by induction, and you can see that for the start node the property
is true. So, what is the property we are trying to prove? That if A2 star sees some node,
then A1 star also sees that node; let us call this property P. Why the star? Because both
versions are admissible. So, what is the situation? For any given node n, there is a band
of possible heuristic values, starting from 0 up to h*(n).

(Refer Slide Time: 40:46)

What we are saying is that h2 is somewhere here and h1 is somewhere here; this is what we
mean by more informed - closer to the optimal value h*. And we want to make the claim
that if this is the situation, then the search space explored by the algorithm using h2
will be smaller than the search space explored by the algorithm using h1. We will do this
proof by induction. So, we call the statement P; P is true for s, which is the start
node. We are just saying that if A2 star sees the start node then A1 star will also see
the start node, which is trivially true, because both algorithms - being A star - always
start by picking the start node essentially. The second part is the hypothesis, which
says the following; I will just use this shorthand for the whole statement.

(Refer Slide Time: 41:51)

So, instead of 'every' let me say 'any' node, which is the same thing essentially. Let P
be true up to depth k. Now we want to show that it is also true for depth k plus 1 - that
is the induction step - and this we will do by contradiction. So, let N be a node at
depth k plus 1 which is picked by A2 star, and let A1 star terminate without picking N.
We are making the assumption that there is some node, which we call capital N, at depth
k plus 1, and we are saying A2 star has picked that node, but A1 star terminated without
ever picking that node - which means A1 star found a path to the goal. Now look at the
value f2(N); f2, for the second algorithm, which is using h2, is g2(N) plus h2(N).

This is less than or equal to f*(s); remember, by statement 5, any node that A2 star
picks will have its f value at most f*(s). We just write f star - it does not matter
whether it is f1 or f2; f star stands for the optimal cost. So, we can rewrite this as:
h2(N) is less than or equal to f*(s) minus g2(N).

Now, A1 star has terminated without picking this node N, which means that f1(N) is
greater than or equal to f*(s). It cannot be less than f*(s), because A1 star has
terminated, and it terminates by picking a node with the optimal cost value, and it has
not picked N. So, this inequality must hold: f1(N) can at best be equal to f*(s), but it
cannot be less than that essentially. We can rewrite this as follows: f*(s) is less than
or equal to g1(N) plus h1(N); I am just replacing f1(N) by g1(N) plus h1(N) and turning
the inequality around.

(Refer Slide Time: 46:39)

Now, the question I want to ask is: what is the relation between g1(N) and g2(N)? If I
have to put an inequality, or an equality, between them, what would I put? What symbol
should I put in between here? g2 is greater than or equal to g1 - you are the only one
who is saying that.

Why is that the case?

(( )) The path found so far by A2 need not be the optimal one - you are not getting it
exactly - but you must remember the statement we made: let this condition be true up to
depth k, the induction hypothesis, which means that every node that has been seen by A2
up to depth k has also been seen by A1. So, whichever path A2 found, A1 would also have
seen that path; but A1 may have seen some other path as well. The two could be equal,
but we are saying they do not have to be equal.

(Refer Slide Time: 47:47)

So, g1(N) is less than or equal to g2(N): it is possible that A1 might find a shorter
path to N, because it explores more nodes on the way to N essentially.

(Refer Slide Time: 48:09)

So, the point of this whole exercise is that we can replace g1(N) by g2(N): since g1(N)
is less than or equal to g2(N), we can rewrite the statement as f*(s) less than or equal
to g2(N) plus h1(N), which means f*(s) minus g2(N) is less than or equal to h1(N). I have
just moved g2 to the left-hand side and rewritten the statement here. So, I have: f*(s)
minus g2(N) is less than or equal to h1(N); N is that node which A1 star did not pick,
and that is why we could write this inequality.

(Refer Slide Time: 49:05)

And here we have said that h2(N) is less than or equal to f*(s) minus g2(N); therefore,
simply by transitivity, h2(N) is less than or equal to h1(N), which is a contradiction,
because we assumed that h2(N) is always greater than h1(N). The only assumption we made
was that A1 star terminates without picking N, and that led to a statement which
contradicts our original assumption essentially. So, this cannot be the case.
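Collecting the steps of the induction argument into one chain (LaTeX notation):

    h_2(N) \le f^*(s) - g_2(N)    % statement 5, applied to A_2^*
    f^*(s) \le g_1(N) + h_1(N)    % A_1^* terminated without picking N
    g_1(N) \le g_2(N)             % induction hypothesis up to depth k
    \Rightarrow\ f^*(s) - g_2(N) \le h_1(N)
    \Rightarrow\ h_2(N) \le h_1(N)   % contradicting h_2(N) > h_1(N)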

(Refer Slide Time: 49:35)

And therefore, A1 star cannot terminate without picking N, which means that if A2 star
has picked this node N, A1 star must also pick this node N; and that is the statement we
are making - every node seen by A2 star will also be seen by A1 star.

(Refer Slide Time: 50:00)

So, the effect of this is that the space explored by A2 star is contained - not
necessarily strictly contained, but contained - in the space explored by A1 star
essentially. So, it always pays to have a better heuristic function; the higher the
heuristic value, the better (( )). Any questions here? There is one more property which
needs to be looked at, but we will do that in the next class. So, I will stop here; we
will come back to this A star and its behaviour, do a quick recap in the next class, and
then start again. So, I will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science
Indian Institute of Technology, Madras

Lecture - 21
A* Monotone Property, Iterative Deepening A*
(Refer Slide Time: 00:13)

We are looking at the A star algorithm. In the last class, we saw that the algorithm is admissible,
provided the heuristic function underestimates the distance to the goal, provided every edge has
a cost which is greater than some small value epsilon, and provided there is finite branching. We
also saw that if there are two heuristic functions, one of them more informed than the other -
which means h2(n) is greater than h1(n) for every n - then the algorithm A2, or A2 star, which
uses h2, will explore a smaller search space. We can depict this as follows: this is the start node
and this is the goal node. Then, one algorithm will search something like this, and the other
algorithm will search something like this. So, this would be h2 and this would be h1; this is what
we expect - that the more informed heuristic function will be more focused towards the goal, so it
will search less away from the goal; whereas what branch and bound would search would have been a
full circle of that radius.

So, the point is that using a heuristic function focuses the search towards the goal, and the more
informed the heuristic function is, the more it focuses towards the goal; which means the search
becomes narrower and narrower. With a perfect heuristic function, the search would be just the one
path that gets explored, essentially. Unfortunately, we do not have perfect heuristic functions.
So, we try to look for as good a function as possible, which means that the heuristic value must
be as high as possible, but it should not cross the optimal value. For example, in the 8-puzzle,
consider two tiles, let us say 8 and 5, which should be reversed: in the current situation 8 and 5
are here, but they should be 5 and 8, instead of 8 and 5. Then one of the functions that we saw
was: simply count how many tiles are out of place, in which case two tiles are out of place - both
8 and 5 are out of place. The other function that we saw was the Manhattan distance function,
which says that you need one step to take 8 to its position, and one step to get 5 to its
position. So, the cost would be 2 essentially; the contribution of these two tiles would be two.
Now, if you have solved this 8-puzzle kind of thing, you know that you cannot simply exchange
them, essentially.

If you want to interchange the positions of two tiles, you have to do some roundabout movement.
So, you need to add a certain value to the heuristic for that. About 7 to 8 years ago, somebody
enhanced the heuristic function to account for such things - tiles which are in the correct row
but in the wrong order. You add a certain amount for them, and that search was able to find
optimal solutions much faster for bigger problems, which means for the 15-puzzle and the 24-puzzle
as well, essentially.
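Here is a minimal Python sketch of these two heuristic functions; the representation of the board as a tuple of 9 entries read row by row, with 0 for the blank, is an illustrative assumption, not something fixed by the lecture.

    def misplaced_tiles(state, goal):
        # h1: count the tiles (ignoring the blank) that are out of place.
        return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

    def manhattan(state, goal):
        # h2: sum over tiles of |row difference| + |column difference|.
        where = {tile: (i // 3, i % 3) for i, tile in enumerate(goal)}
        total = 0
        for i, tile in enumerate(state):
            if tile == 0:
                continue
            r, c = i // 3, i % 3
            gr, gc = where[tile]
            total += abs(r - gr) + abs(c - gc)
        return total

Note that manhattan(n) is at least misplaced_tiles(n) for every state, so it is the more informed of the two in the sense of L6; and for two adjacent tiles that are merely swapped, both report 2 even though more moves are needed - which is what the "correct row, wrong order" refinement above adds a penalty for.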
Now, today we want to look at the other property, which is this: consider some path to the goal
node, and two nodes m and n on that path (Refer Time: 04:48).

(Refer Slide Time: 04:50)

If you take any two such nodes in the search space, and they satisfy this property - that h(m)
minus h(n) is less than or equal to k(m, n) - then in a sense you can think of this as saying that
the heuristic underestimates the cost of every edge on the path to the goal. h(m) is an estimate
of the distance from m to the goal; h(n) is the estimate from n to the goal; and h(m) minus h(n)
is, in some sense, an estimate of the edge cost from m to n. If this is less than the actual edge
cost, we are saying that the heuristic underestimates every edge cost essentially. This condition
is called the monotone or consistency condition, and what we want to show is that if this
condition is satisfied - it is a property of the heuristic function; the heuristic function is
such that this property holds for any two nodes connected like this - then whenever A star picks
some node n, it has already found an optimal path to n. We will show that in a moment. First, let
us look at what this is saying: what is the implication of this? We are saying that under a
certain condition, namely that the heuristic function satisfies this monotone or consistency
condition, the following holds.

If the heuristic function satisfies this property, then for the A star algorithm which is using
that heuristic function, whenever it picks a node n - picks meaning it picks it from open - it has
already found an optimal path to n. What is the implication of this for our algorithm? If you
think about the A star algorithm, what does it do after it picks a node? It tests whether it is a
goal or not; if it is not a goal, it generates its children, and the children are of three kinds:
new nodes, nodes on open, or nodes on closed. For nodes on open and closed, it checks whether it
has found a better path. For nodes on closed, if it has found a better path, it has to propagate
that improvement to their children essentially. So, the implication of this statement, if it were
true, is that the moment it puts a node into closed, it has already found the best path - the
optimal path - to it, which means we never have to revise that path; so the third stage of that
revision process, the propagation, we can do away with, essentially.

It can be shown that if a node is in closed, you already have the optimal cost to it, which of
course saves a lot in terms of this propagation that you would otherwise have to do; and in the
next class you will see that this allows us to design algorithms which can save on space, but
that we will do in the next class. Now, let us try to prove this property. What are we saying? We
are saying that when A star picks node n from the open list, at that point g(n) is equal to
g*(n), the optimal value. We can rewrite the monotone condition as follows: h(m) is less than or
equal to k(m, n) plus h(n) - I just take h(n) to the other side. Then I can add g values: I add
g(m) to both sides; adding the same value to both sides does not change things. But if you look
at the figure, g(m) plus k(m, n) is equal to g(n), when n is reached via m. Let me rewrite it in
this order. In other words, f(m) is less than or equal to f(n); that is the first observation we
make.
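The rearrangement done on the board, written out (LaTeX notation):

    h(m) - h(n) \le k(m,n)                 % monotone condition
    \Rightarrow\ h(m) \le k(m,n) + h(n)
    \Rightarrow\ g(m) + h(m) \le g(m) + k(m,n) + h(n)
                              = g(n) + h(n)  % when n is reached via m
    \Rightarrow\ f(m) \le f(n)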

What are these values saying here? When we look at this criterion, we are saying that the farther
we are from the goal, the less accurate our estimate is, and the closer we go towards the goal,
the more accurate our estimate becomes. As we move from node m to node n, our f value increases,
or at least does not decrease. Now, remember the f value is an estimate of the cost of a path
going through the node. Since the path goes through both m and n, the estimate should ideally be
the same; but seen from m's perspective it is actually less than seen from n's perspective. What
is the difference? That n is a little bit closer to the goal. So, that is one of the reasons this
is called the monotone condition: as you move closer to the goal, the f value monotonically
increases. By transitivity, this holds for any two nodes on a path to the goal essentially. Now,
let us assume - let us remove this; this depicts the search space that we are exploring - that
there is a node n which A star is about to pick, which means it must be on open; this is open at
that stage of the algorithm, and A star is about to pick node n. We want to make the claim that
if A star is about to pick this node n, it must have found an optimal path to n. We will do this
proof by contradiction. We assume that when it picks this node n, it has not found an optimal
path; so assume that g(n) is greater than g*(n) at the point where it is about to pick n, and we
will show that this leads to a contradiction.

(Refer Slide Time: 13:51)

Let this be the optimal path to the node n, and let us say that this part is in closed. Let us
say that A star has found some other path, so this is on open, and that A star is about to pick
this node n; and let us assume that at this point g(n) is more than the optimal cost. Let n_l be
the last node that A star has inspected on that optimal path, and n_{l+1} the node which is on
open, which is a child of n_l, but which A star has not expanded. So, what we are saying is that
A star has not found this path, which is the optimal path, but A star has found this other path,
which may not be optimal. So, n_l is the last node seen on the optimal path to n, and n_{l+1} is
the first node on the optimal path which has not been seen. So, I am saying that this is the
optimal path; this, of course, is one edge, and then there are some more; that is the optimal
path to n.

We are assuming A star has found some other path to n, which is not the optimal path. Now we can
apply the monotone criterion here: assuming the heuristic function satisfies this property, then
on any path to the goal, as you go closer towards the goal, the f value increases, or at least
does not decrease. As we move from this last node n_l to n_{l+1}, and so on to n, the f value
will increase, or at least not decrease. In particular, we can write: g(n_{l+1}) plus h(n_{l+1})
is less than or equal to g(n) plus h(n) - I am skipping the transitive steps - and because these
nodes are on the optimal path, we can replace g with g star. So, we can write: g*(n_{l+1}) plus
h(n_{l+1}) is less than or equal to g*(n) plus h(n); we can put the star because these two nodes,
n_l and n_{l+1}, are on the optimal path to n, and on this path these properties hold. What is
really important now is that f(n) is less than or equal to f(n_{l+1}) - why? Because we have said
that A star is about to expand this node n, rather than n_{l+1}. The whole claim we want to
establish is that when it is about to expand n, it must have found an optimal path to it. So, we
can write: g(n) plus h(n) is less than or equal to g(n_{l+1}) plus h(n_{l+1}). Now I want to
combine this one and that one - basically I want g(n) less than or equal to g*(n) - but I have a
star sitting there.

So basically, I can put the star here too, because we have assumed that in this graph this is the
optimal path; g(n_{l+1}) is equal to g*(n_{l+1}) - this is what I did there, and I can do it here
as well. Now, of course, the two sides match up: combining the inequalities and removing h(n)
from both sides, we get that g(n) is less than or equal to g*(n). What is this saying? That g(n)
is at most the optimal cost; but that can only be the case if g(n) equals g*(n). By definition,
it cannot be less than the optimal cost; it must be the optimal cost.
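The contradiction, assembled into a single chain (LaTeX notation):

    f(n) = g(n) + h(n) \le g(n_{l+1}) + h(n_{l+1})  % A* is about to pick n, not n_{l+1}
         = g^*(n_{l+1}) + h(n_{l+1})                 % n_{l+1} reached along the optimal path
         \le g^*(n) + h(n)                           % monotonicity along the optimal path
    \Rightarrow\ g(n) \le g^*(n) \Rightarrow\ g(n) = g^*(n)

contradicting the assumption that g(n) > g*(n).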

So, what have we said? There is a contradiction. In fact, even without assuming the contrary, you
can simply show that g(n) is less than or equal to g*(n), which you can interpret the same way:
g(n) is equal to g*(n), which is the statement we are making here. Whenever A star picks this
node n, it has found an optimal path to that node, which means g(n) is equal to g*(n). The
implication of this, as I said, is that you do not have to revise the cost of those nodes which
have been put into closed. Once you put a node into closed, you know that you have the optimal
cost essentially. We have now shown a series of properties about this A star algorithm. The
conditions are that the heuristic function must underestimate the cost to the goal, and the edge
cost must be more than some specified small amount; then A star will always find optimal paths to
the goal node - that is what we showed in lemmas 1 to 6, I think, and this last statement is
lemma 7. In addition, if this monotone condition is satisfied, then you do not have to keep
revising the costs of the nodes in closed; the moment you pick a node from open and put it in
closed, at that moment you have found the optimal cost to the node.

If you take this figure seriously, then essentially what you are saying is that both these costs
are the same - whether this path or this path, it is the same cost; the g value must equal the
optimal g value. So, let us compare this with the earlier algorithms we have seen, in terms of
the four parameters we talked about: time complexity, space complexity, quality of solution and
completeness. As far as the last two of these are concerned - quality of solution and
completeness - we have proved today that A star will find a path to the goal if there is one; in
fact, it will always find an optimal path. That leaves the question of time and space complexity.
Again, as we said in the case of best first search, time and space complexity both depend upon
the quality of the heuristic function: the better the heuristic function, the smaller the amount
of space that your algorithm will explore, and that is what we said here.

Now, unfortunately, both space and time requirements have been observed to be large in practice.
There are examples where the space grows quadratically: for instance, if you imagine a
2-dimensional city, then the farther you go away, the area that you have to explore grows as a
square essentially. But in combinatorial problems the growth is exponential. And we have seen the
property that the higher the value of the heuristic function we are using, the better for us;
but if we want to guarantee admissibility, the heuristic value must not exceed the optimal value,
essentially.

(Refer Slide Time: 27:09)

So, what people have often done is to use a function like this: f(n) equal to g(n) plus k times
h(n); this is known as weighted A star essentially. You use a parameter k to decide how much
influence the heuristic function has. Notice that there are two influences on the A star
algorithm, since we always pick the node with the lowest f value: low g means the node is close
to the source; low h means it is close to the goal - or at least the heuristic thinks it is close
to the goal. The effect of g is to keep the algorithm, like branch and bound, as close to the
source as possible. The effect of h is, like best first search, to pull it towards the goal,
without any regard to the cost spent in reaching that node n. If you use a parameter like k, you
can control the effect of the heuristic function versus g(n). If you put k equal to 1, then you
have the A star algorithm just described; if you put k equal to 0, then you have simply branch
and bound - you do not even look at the effect of h. If you put k very high - which is almost
like saying that g is 0 - then it is like best first search: it only looks forward; it does not
look behind. But if you put a value of k greater than 1, then you are giving more emphasis to the
heuristic function, which means the search will become narrower and narrower essentially. It will
be driven more by the heuristic function, and less by the tendency to keep track of whether you
are finding optimal paths or not.

So, obviously, the moment you put k greater than 1, you are losing admissibility; but you can
expect that your solution will be close to the optimal solution, and depending on the trade-off
that you may have to make, you can choose a value of k higher than 1 essentially. Very often,
people try values of k like 4, 5 or 6 and find that their algorithm runs much faster; this is
simply to get a faster running time.
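As a tiny sketch of this weighted evaluation (the node attribute g is the same illustrative assumption as in the earlier sketch):

    def weighted_f(node, h, k=1.0):
        # k = 0 gives branch and bound behaviour, k = 1 plain A*,
        # and k > 1 leans on the heuristic: faster, but no longer admissible.
        return node.g + k * h(node)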
So, what I want to do now is focus on saving space; this is what people have done. After A star
was discovered - it is quite a well-known algorithm - people looked at how to save on space. One
of the first ideas that was explored is an extension of an idea that we have seen earlier, which
is: let us do depth first search instead of best first search. Now, if you remember the algorithm
DFID, what DFID does is a sequence of depth first searches with an increasing depth bound; but
there we had the notion of a level, because we had no costs associated with edges - all edges
were supposed to be of equal cost. Since we have graduated to edge costs, and we have looked at
algorithms like branch and bound and so on, we need a variation of that, and that variation is
called IDA star. It was given by Richard Korf. I do not remember whether we have discussed Korf's
work earlier, but he has done a lot of work in search essentially. His PhD thesis, which was in
the early 80s, was about finding macro operators; in fact, his PhD thesis was also available as a
book at some point of time. If you remember puzzles like the Rubik's cube - in fact, he was
working on the Rubik's cube and the eight puzzle - we have seen that it is very difficult to
devise a heuristic function which will drive hill climbing like algorithms to solutions. So, what
we tend to do is to have a set of macro moves, which says: if you have done the top layer, and
the next objective is, let us say, to get one small cube into place in the middle layer, then you
do this sequence of moves - left, right, up, down, whatever; we have some notation for that. That
is the macro move.

A macro move is a sequence of moves packaged into one abstract move. The question is: how do you
get macro moves? Of course, most of us learn them from friends. What Korf did for his PhD work
was to write an algorithm which would search in the Rubik's cube problem and the eight puzzle
problem and try to learn macro moves essentially. His algorithm actually built a macro table; and
of course, once you have a macro table - as when we solve the Rubik's cube - you do not do any
search; you say, okay, I will do the top layer first, then the second layer, and then the third
layer. Korf's PhD thesis was to build that table, which listed all the macro moves essentially.
But this IDA star is also by Richard Korf, and we will see a bit more of him as we go along. This
was in 1985. IDA star stands for iterative deepening A star. It is still A star in the sense that
it guarantees the optimal solution, but it is iterative deepening like DFID essentially.

(Refer Slide Time: 33:44)

The idea of the algorithm is very simple. What it says is that from the start node, you create a
boundary. First, let us see what DFID would do. It ignores the costs of the edges; it basically
sees the graph level by level. So, it would first do depth first search up to some level, then up
to the next level, then the next; we are assuming that edges are roughly similar in cost.

Even when the edges are of equal cost, IDA star has some value. DFID would basically have
expanded its search space gradually, and if this is the goal node here, then eventually, in one
of the expansions, it would have found a path to the goal; and since it did not find it in the
previous round, the new path must be the shortest path, because it is only in this last iteration
that you extended the depth by 1. And it uses linear space, because it does depth first search
essentially. IDA star is basically a variation of this. It says that you draw a boundary, which
you use for controlling the search, and the boundary is essentially those nodes - well, it should
not come here; it falls short of this, something like this; let me just depict it. This is the
curve defined by f(s): the locus of nodes whose f value is equal to that value. Sorry, not f*(s);
f(s) - the estimated cost, because you do not know f*(s); you know f(s). That is the estimated
cost from start to goal, which is equal to h(s), because g(s) is 0. So, these are the nodes whose
f value is equal to h(s); nodes inside will have a lesser value, and nodes outside a greater
value. Now, one thing you should observe is that this boundary is elongated towards the goal, and
that is because it is using the heuristic function essentially: nodes which are closer to the
goal will have a lower heuristic value, whereas nodes which are farther away will have a higher
heuristic value. So, what IDA star does is that in the first iteration you set a bound - well,
depth is not the quantity here.

(Refer Slide Time: 37:02)

Let us say a bound: it starts by setting bound equal to h(s), and it does DFS as long as the f
value is within the bound. So, it is like a boundary that is drawn, and it says: within this
boundary I will do depth first search. So, let us say it explores a node here, which has some
children here; another node it explores here, somewhere here, which may have some child; another
node is here, which may have some child, and so on. For all of them, the f values will have been
computed. If it does not find the goal in this depth first search - and this is depth first
search with a bound: it cannot go off in this direction, and it cannot go off in that direction;
it stays within this boundary - it augments this boundary with another boundary which is a little
bit bigger: it takes the lowest f value of a node which it has not picked, which means one beyond
the boundary, increments the bound to that value, and then simply loops essentially.

So, you can see the similarity with DFID. In DFID, if you did not find a path to the goal, you
would increment the level by one and then try the DFS again. What this is doing is incrementing
the bound to the next value: of all the nodes which were generated but not explored - this one is
explored, that one is explored, but these were not - it keeps track of the lowest f value,
increments the bound to that in the next round, and does another depth first search. It repeats
this process till it finds a goal. What do you have to say about this algorithm? First, we should
convince ourselves that it is admissible - that it is worthy of that star on top of its name,
that it will guarantee an optimal solution. Can you argue for that? Well, when it starts, it
starts with a bound which is equal to h(s), and given that h is an underestimating function, no
node within this bound can be more expensive than the optimal cost.

If the actual cost of going to the goal is h*(s), then h(s) is at most h*(s), so this boundary
will never take you to a node which is more expensive than the optimal cost. If h(s) were
perfect, then within the first iteration itself it would have found a path to the goal. But if
h(s) is not perfect, it will fall just short of the goal, and then you are going to increment the
bound. So - I have not written the word clearly here, but this word must be 'unexplored' - the
bound becomes the lowest unexplored f(n'), where n' ranges over all those nodes which have not
been taken essentially. Because it is only incrementing the boundary to the lowest unseen f
value, if it finds a goal at that value, then it will be at the optimal cost. Because it is
making very conservative increments, it is guaranteed to find the optimal cost. But you can
imagine that in a search space like this, the number of iterations that it will have to do is
going to be very many essentially.
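Here is a minimal runnable sketch of IDA star along these lines; as before, the helper names h, move_gen, k and is_goal are illustrative assumptions, and states are assumed comparable with ==.

    import math

    def ida_star(start, h, move_gen, k, is_goal):
        bound = h(start)          # first contour: f(s) = g(s) + h(s) = h(s)
        path = [start]
        while True:
            t = _bounded_dfs(path, 0, bound, h, move_gen, k, is_goal)
            if t == "FOUND":
                return path       # optimal, by the conservative-increment argument
            if t == math.inf:
                return None       # no goal reachable at any bound
            bound = t             # lowest f value beyond the old boundary
            # (the delta variant discussed next would use: bound = t + delta)

    def _bounded_dfs(path, g, bound, h, move_gen, k, is_goal):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f              # beyond the boundary: report f for the next round
        if is_goal(node):
            return "FOUND"
        lowest = math.inf
        for child in move_gen(node):
            if child in path:     # avoid cycles along the current path
                continue
            path.append(child)
            t = _bounded_dfs(path, g + k(node, child), bound, h, move_gen, k, is_goal)
            if t == "FOUND":
                return t
            lowest = min(lowest, t)
            path.pop()
        return lowest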

So, it is going to be optimal; it is going to save on space. Why? Because it is a depth first search, or rather a sequence of depth first searches, which only requires linear space. But its time complexity is going to go up by many times, because it will do many iterations, essentially. So, one variation that people have tried is that instead of incrementing the bound to the next lowest unseen value, you increment it by a fixed cost that you are willing to bear, essentially; some delta. So, instead of this, I write bound plus delta, which means that I am making a bigger jump. That means, let us say, this is my next value; so, let us say this is plus delta. So, this was the original bound, and this is the bound plus delta. What is the danger here? The danger is that there may be a goal node just beyond the old bound, and there may be another goal node here, further out. Because it is doing depth first search, sweeping the space like this, it will find the farther goal node, but it will not find the nearer one; and the nearer node is cheaper than that node, if you just assume that the figure is drawn roughly to scale in some sense. But this delta, of course, allows you to control how much suboptimality you are willing to accept, essentially. So, this distance is the delta by which you have increased the boundary. Basically, it says that in the worst case you may pay the optimal cost plus delta, but in this big jump you would cover many nodes, so the number of iterations that you do would drop, essentially. So, it is nice in the sense that it takes linear space, and it gives nearly optimal cost; it is an extension of DFID, in that sense, which uses the fact that there are edge costs and there is a heuristic function and things like that. What is not nice about this algorithm is what was not nice about the original algorithm: it is a blind algorithm, essentially. Except, of course, that the boundary it draws is determined by the heuristic function, it is uninformed; it does not move towards the goal, which is why we started with best first search in the first place. This is why we used the heuristic function in the first place, but this algorithm is not exploiting the heuristic function; it uses the heuristic function only to determine the boundary.
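
To make the mechanics concrete, here is a minimal sketch of the idea in Python (my own illustration, not code from the lecture). It assumes a successor function succ(n) that yields (child, edge_cost) pairs, an underestimating heuristic h, and a goal test; with delta = 0 it increments the bound exactly to the lowest unseen f value, and with delta > 0 it makes the bigger jumps described above.

    import math

    def ida_star(start, succ, h, is_goal, delta=0.0):
        # Iterative deepening A*: repeated depth first searches bounded by f = g + h.
        bound = h(start)
        while True:
            next_bound = math.inf            # lowest f value seen beyond the current bound
            stack = [(start, 0.0, [start])]
            while stack:
                node, g, path = stack.pop()
                f = g + h(node)
                if f > bound:
                    next_bound = min(next_bound, f)   # candidate for the next bound
                    continue
                if is_goal(node):
                    return path, g
                for child, cost in succ(node):
                    if child not in path:             # avoid looping along the current path
                        stack.append((child, g + cost, path + [child]))
            if next_bound == math.inf:
                return None, math.inf                 # no goal is reachable at all
            bound = next_bound + delta                # delta > 0 trades optimality for fewer iterations

Note that the only memory used is the stack holding the current path and its siblings, which is linear in the depth; that is the whole point of the algorithm.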

In the next class, we will look at another variation which exploits the heuristic function, and which is also a linear space algorithm. That algorithm was also given by Richard Korf. If you want to think about it a little bit, it is a little like saying that it is hill climbing with backtracking, if we can think of it along those lines. The algorithm is like hill climbing, but allowed to backtrack, essentially, and then try another path; which means it will not need exponential space; it will have only one path in memory at any time. We will look at that algorithm in the next class, and then we will look at some more recent algorithms, which have come in this century, I should say, which are space saving algorithms and which are quite interesting, essentially.

So, I will stop here for today.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 22
Recursive Best First Search, Sequence Alignment

We have been looking at algorithm A star, which is a very well known algorithm for
finding optimal solutions in a state space.

(Refer Slide Time: 00:27)

And at the moment, we are trying to look at variations of A star which are space saving. So, before I start on the algorithms, let me ask you one small problem.

(Refer Slide Time: 00:59)

Let us say you are in a ((Refer Time: 00:53)) like city, and there is a set of cities on a grid like this, and you have to get to this last city here. Let us say the cities are numbered 0, 1, 2 up to m and 0, 1, 2 up to n. So, it is an m by n grid, and by this I mean m edges in this direction and n edges in this direction; there are m plus 1 cities horizontally and n plus 1 cities vertically, essentially. It is a complete grid; I have not drawn the complete grid. The question I want to ask is: how many paths are there, if you want to go from this city 0, 0 to the city m, n there?

And we are assuming that you can only travel either to the right or down, essentially. So, it is a directed graph; you can only move either to the right or down. But the question is how many paths there are; you have to express it using the two parameters m and n, essentially, where there are m edges here and n edges in this direction. So, if there is only one city, there is only one path. If there are four cities, then you can see there are two paths. As the number of cities increases, the number of paths increases quite dramatically. Does anyone know off the cuff how many?

Student: x plus y C x.

x plus y…

Student: C x; I mean, of the total number of moves, if one direction is fixed then the other direction is also fixed; from all the moves you choose the positions after fixing one direction, and only a limited number are left.

Well, you can think of this as a sequence of right moves; you always have to make m right moves and you have to put in n down moves. Soumya?

Student: m plus n factorial ((Refer Time: 03:34)) divided by m factorial n factorial.

Divided by m factorial into n factorial; that is the correct answer. You can think of it as saying that you have to make m right moves, and somewhere along those m right moves you have to put in those n down moves, and the number of ways you can do that is given by this, essentially. So, I will come back to this problem; let me increase the complexity of this problem: what if I allowed diagonal moves as well? Now, that is a little bit harder; it does not have such a nice closed form solution, it is a summation of many things, but I would like you to think about that.

So, you can think of this new problem as follows: one diagonal move replaces one horizontal move and one vertical move. And out of the m plus n horizontal and vertical moves that you are making, whichever of m and n is smaller, you can replace up to that many pairs with diagonal moves, optionally; you do not have to, but you can replace them with that many diagonal moves. So, the horizontal and vertical moves become fewer and the diagonal moves become more, essentially. This is a little bit more difficult to analyze, but it is a good exercise.
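
As an aside, both counts are easy to verify with a small dynamic program (my own illustration, not from the lecture). Without diagonal moves the count matches the closed form (m plus n) choose n; with diagonal moves the numbers it produces are what combinatorics calls the Delannoy numbers.

    from math import comb

    def count_paths(m, n, diagonals=False):
        # ways[i][j] = number of paths from city (0, 0) to city (i, j)
        ways = [[0] * (n + 1) for _ in range(m + 1)]
        ways[0][0] = 1
        for i in range(m + 1):
            for j in range(n + 1):
                if i > 0:
                    ways[i][j] += ways[i - 1][j]        # arrive by a right move
                if j > 0:
                    ways[i][j] += ways[i][j - 1]        # arrive by a down move
                if diagonals and i > 0 and j > 0:
                    ways[i][j] += ways[i - 1][j - 1]    # arrive by a diagonal move
        return ways[m][n]

    assert count_paths(3, 2) == comb(3 + 2, 2)    # (m + n)! / (m! n!) = 10
    print(count_paths(3, 2, diagonals=True))      # 25, the Delannoy number D(3, 2)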

(Refer Slide Time: 05:18)

So, let us first continue with the variations that we were looking at. In the last class we saw I D A star, and if you recall, what I D A star does is that, essentially, from the start node it creates a boundary; this is the goal. And this boundary is the heuristic value of the start node, or in other words how far the algorithm thinks the goal is from the start node. It draws that boundary and does a depth first search within it. It iterates, ((Refer Time: 05:54)) relying on the fact that after it has failed to find a goal there, it would have some nodes just beyond the boundary which it has not expanded.

It looks at those nodes, picks the minimum f value from there, and extends the boundary to that value, essentially. And it keeps doing that till it has found the goal node, essentially. So, I think we had started discussing the properties of this algorithm, but let us recap. Maybe I will ask you to tell me: what do you think of the I D A star algorithm? It was given by Korf, Richard Korf, in the 1980s sometime; I do not remember the exact date. So, what is good about this algorithm and what is not good about it? The first question we want to ask is: does it deserve to be a substitute for A star; which means, does it guarantee an optimal path or not? So, what is the argument you would give?

Student: ((Refer Time: 07:18))

Why? How can you say that? ((Refer Time: 07:21))

Student: If I do not find the optimal path at a certain value of the bound, then what I am doing is finding the node with the value of f just higher than that. So, I would not miss out on anything. So, the moment the value of f reaches the ((Refer Time: 07:37)) goal, I will find ((Refer Time: 07:39)).

So, that is one criterion it satisfies for us: it finds the optimal path. Why is it better than A star; why would you prefer it to A star, or when would you prefer it to A star?

Student: Open list does not grow too large.

The open list does not grow too large; in fact, it grows only linearly. And this matters especially if you are looking at problems in which the search space is very huge. You can imagine, if you are generating something like a hundred thousand nodes a second in the search space, how quickly memory will fill up. Remember that each node is a representation of a state, which means that whatever you have described about the state would be there; and then you apply a move gen function, generate a new state, and put all those things into some kind of a data structure, essentially.

So, the advantage of I D A star is that it needs less space, and this is the theme that we are going to follow today: how can we devise space saving algorithms? But what is a disadvantage that you can think of?

Student: ((Refer Time: 08:57)) time complexities.

So, what do you mean by saying better paths may be left unexplored? We have already argued, and it was said, that we guarantee that it will give you the optimal solution; so that is what we want, essentially. Time complexity: like it was in the case of D F I D, time complexity is an issue, but unlike in D F I D. In D F I D we assumed that the cost of each edge was uniform, which meant that as you went deeper, the number of nodes in the next layer was increasing exponentially, or was being multiplied by the branching factor b.

And therefore, every time we increased the depth by one, you encountered many more new nodes; in fact, more new nodes than all the old nodes that you had seen so far. Unfortunately, for I D A star that is not the case, because now we have edge costs involved and, as we discussed, we increase the bound only to the next lowest f value, which means that it may do a significantly large number of searches before it actually converges. This is why we observed towards the end of the last class that instead of incrementing it to the next lowest f value, you increment it by some predetermined amount delta, which is the loss of optimality you are willing to bear, essentially.

So, if you increase the bound further by a value delta, this is delta. Then, the next time it searches, it may find a node somewhere here which is a goal node. I have drawn only one goal node, but in practice search problems may have more than one goal node, essentially. So, for example, if you are looking for a Chinese restaurant in a city, there may be many Chinese restaurants; if you are looking for a particular film which is showing in some theatre, you may have more than one solution, essentially. So, it depends on what your search algorithm finds.

So, it might find a goal node here, as opposed to a goal node here; this one is close to the previous bound that we had, but this one is farther from it. So, it has an error of up to delta. If you are willing to tolerate an error of delta, then you can increment by this delta, and you can control how much you are willing to tolerate, essentially. So, that is one problem with I D A star, that it may do too many iterations, essentially. Another problem with I D A star is that it does not quite cater to our aesthetic sense, if you want to call it that, in the sense that it does not have a sense of direction.

We started search with blind search algorithms, and then we said we are introducing heuristic functions to guide search towards the goal. Here, the only role the heuristic function is playing is in defining this boundary. Of course, the boundary, as you can see, is ((Refer Time: 12:05)) sided, it leans towards the goal, but that is about all it does, essentially. So, the next algorithm that we want to look at, which is also by Richard Korf, is called recursive best first search, popularly known as RBFS.

So, R B F S is a little different from I D A star, and no doubt Korf devised this algorithm because he saw the drawbacks of I D A star, which were that it was doing too many iterations. In particular, if you do not maintain a closed list and you just let D F S, in some sense, run wild, would that be a good idea or a bad idea? Let us say, D F I D or I D A star without maintaining a closed list. The question is, one of the things that closed does for us is that it stops us from going into an infinite loop, essentially.

So if, for example, this graph, which I told you was directed, were not directed, then you could have gone into a loop and kept going around like this, essentially. And closed allows us to avoid such a possibility. But given the fact that we are working with bounds on the distance from the source that you are willing to go to, can I work without closed? In other words, if you go back to the D F I D algorithm, which is basically a simpler version of I D A star in that it does not have edge costs, will D F I D work if I implement the D F S without a closed list?

So, I will leave that as a small thought experiment for you to do. Let us get to this algorithm, RBFS. What R B F S does is that it maintains a linear amount of memory, exactly like I D A star, but it is not blind. What I D A star is doing is blind search: wherever the goal is, it will go off in one direction, backtrack, try something else, backtrack, try something else, and so on. It is basically a depth first traversal of the space with no sense of direction. Now, let us try to simulate what R B F S would do.

So, let us say we start with this node, start, and let us say we have these four children, and their heuristic values, for the sake of argument: let us say this is 40, and 70, and 71. Initially, recursive best first search behaves like best first search, in that it picks the one with the lowest f value and expands it, which means, in this example, this node. But what it also does is that it keeps a pointer to the next best node that is in the open list. So, in this example, this is the expanded node, and then it does the search, essentially.

So, this is the second best node, and that is its node ((Refer Time: 16:06)). So, let us say it generates this; and if you remember the monotone criterion or consistency condition, we said that in general you expect the heuristic values to become more accurate as you go closer to the goal, and a consequence of the monotone criterion was that f values increase as you go forward, as you go towards the goal, essentially. So, you would generally expect f values to increase in a search space which is consistent. So, let us say this becomes 45, this becomes 50, and this becomes 61, for argument's sake.

So, what R B F S will immediately do is, it first removes this pointer from here and points it to this, because now this is the best node and this is the next best node in the open list, and it keeps a pointer to the next best node, essentially. Then it goes like this, and let us say this is what happens: let us say this becomes 48, and this becomes 70, and this 60, just for argument's sake. Then it expands this one; it remains 48, let us say, and just for argument's sake, let us say these are all 70, and it expands this. So, it is diving into the search space, using the heuristic function as a guide.

Let us say at some point these nodes start looking a little bit worse than that node; so this becomes 55, this becomes 53, and this becomes 57, or something like that. So far, its behaviour and the best first search behaviour are identical. But at this point, what best first search would have done is simply pick this node as the next one from the open list and start exploring the tree from there, essentially. What recursive best first search, which is trying to save on space, does is that it rolls back this search all the way, so that it can move to the next sibling, essentially.

So, it basically deletes all these nodes from open, closed, whatever it is maintaining. So, it rolls back; you know, there was a thing said about the Indian government at one point, that it is a roll back government: they would increase the petrol prices and after two days roll them back, or something; it is a little bit like that. But it has a rule which it follows, called the backup rule, which determines a revised value for a node; let us call it the f prime value:

f prime of n = f of n, if n is a leaf, which means n is on open;
f prime of n = min over the children c of n of f prime of c, otherwise.

So, either it is the f value itself, or it is the minimum of the f prime values of its children, essentially. What it really does is that it maintains this f prime value. So, for all these nodes 70, 70, 70, 60, 57, 53, 55, the f prime value is the same as the f value, because each is a leaf node. But when it rolls back, it applies the backup rule to back up the values. So, for example, from these three nodes it will back up the value 53 here.

(Refer Slide Time: 20:48)

So, let me use a different chalk to illustrate that. So, it backs up this value 53 to this node; now this becomes 53, and it deletes everything below, essentially. So, this is 53, 70, 70; so again it backs up 53 here. This is 53, 70, 60; so it backs up this value here, changes this to 53, which is the backed up value, and then it retreats along this direction. So, you can see that once it has revised this value to 53, if you just look at the snapshot of the search at that point, all these nodes are not there, they have all been deleted; only that much of the tree remains. And this is on open; this is put back on to open with a new value of 53; this is 50, this is 61, this is 70.

So, it naturally goes off in that direction. You can see that at any given point, recursive best first search maintains only one path down the search tree, which means its space requirement is going to be linear, because that is what depth first search also does. And yet it mimics the behaviour of the A star algorithm, in the sense that it is best first: it always goes down the best looking path. It is just that, on the way, it has revised the value of this node; the heuristic value as measured by the heuristic function was 45, but after it has done this search, it realizes that it is not 45, it is 53.

So, it leaves it on open with the value of 53 and goes down this path, essentially. So, it will generate these children now, depending on their values; when this is 50, this will become the next best node, essentially. So, if, going down this path, there is no node better than 53, it will roll back from here and go down this path again. So, you can sense a problem similar to I D A star, in that it may do this many times: it may go down this path, it may go down this path, it may come back here; eventually it may finish this and it may go down this path.
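
Here is a minimal sketch of R B F S in Python (my own illustration of the standard formulation, not code from the lecture). It assumes succ(n) yields (child, edge_cost) pairs and h is the heuristic; each recursive call explores the best child under an f limit given by the best alternative sibling, and on rolling back it stores the backed up f prime value, exactly as the backup rule above prescribes.

    import math

    def rbfs(node, g, f_limit, succ, h, is_goal):
        # Returns (solution path or None, backed up f prime value of node).
        if is_goal(node):
            return [node], g + h(node)
        # Each child entry is [f prime value, g value, state].
        children = [[g + c + h(s), g + c, s] for s, c in succ(node)]
        if not children:
            return None, math.inf
        while True:
            children.sort(key=lambda entry: entry[0])
            best = children[0]
            if best[0] > f_limit:
                return None, best[0]          # roll back, reporting the backed up value
            alternative = children[1][0] if len(children) > 1 else math.inf
            result, best[0] = rbfs(best[2], best[1],
                                   min(f_limit, alternative), succ, h, is_goal)
            if result is not None:
                return [node] + result, best[0]

    def recursive_best_first_search(start, succ, h, is_goal):
        path, _ = rbfs(start, 0.0, math.inf, succ, h, is_goal)
        return path

The thrashing described next shows up here as the same subtree being deleted and regenerated every time the best and second best f prime values leapfrog each other.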

And then maybe this will become worse and then it will go down this path. So, R B F S runs a danger: it can exhibit the behaviour of what we call thrashing. That is a danger with R B F S, that it may thrash, something like this. I do not know whether we discussed thrashing when we were looking at hill climbing, but imagine a hill which is like a ridge.

(Refer Slide Time: 23:38)

So, its cross section looks a bit like this, if you can visualize it. It is like a ridge, essentially, which is slowly rising in one direction. Then hill climbing would have a tendency to go across in this direction, because that is the steepest gradient direction, essentially. And if the granularity of your move gen function is such that once it makes a step in this direction it may overshoot, then at some point it may come back, and it may keep going like this. So, hill climbing can also behave in a similar fashion, essentially.

So, this is one problem with search algorithms which are local in some sense. R B F S is local in the sense that it is always going down one path, but it is not completely local, in the sense that it is keeping updated, backed up values for the nodes it has seen, essentially. So, R B F S, which came around 1990 or so, was an improvement on I D A star, because it had a sense of direction, but it has this problem that it could do ((Refer Time: 24:46)). And actually, when students have implemented these algorithms for the course here, I have observed this behaviour: these algorithms just keep spending what seems like an infinite amount of time, just oscillating between a few nodes, essentially.

So, what you would really be interested in is space saving algorithms which behave more like A star in terms of the nodes that they pick for expansion as well. Whereas these two algorithms behave like A star in terms of the solutions they produce, the way they pick nodes is different, and consequently both these algorithms have, of course, a much larger time complexity than A star, because they will explore the same space again and again, many times, essentially.

Now, let me ask a question: if you were designing a search algorithm and you had to look at different options, of whether you can save on the closed list or whether you can save on the open list, what would be your choice? If somebody says, I will give you an algorithm in which the open list size is minimized, made constant or something like that, or made linear like I D A star; or if somebody says, I will give you an algorithm in which the closed list size is ((Refer Time: 26:14)) reduced; which one would you choose? Would you rather prune the open list or would you rather prune the closed list?

Student: ((Refer Time: 26:26)) the size depends on the size of the grid, the number of nodes.

Well, I take your answer to mean it depends on the topology of the problem, essentially. But look at the general case, where every node has b successors, that is, the branching factor is b. Very often the community, even when you have a graph over which you are searching, tends to think of that space as a space of searching over paths; you know, if you remember the branch and bound we started off with, in which duplicates were not removed, the same node would appear in different parts of the tree, and each node in a different part of the tree depicts a different path, because from the root it is reached by a different path. Then, of course, the space tends to grow very rapidly.

So, when we studied D F I D, we had argued that the number of nodes in the last layer of breadth first search was by itself much larger than all the internal nodes seen. And then we argued that, because that is the only extra work we are doing, and you are getting linear space in D F I D as opposed to the exponential space of breadth first search, it is worth it. There you are saving on the open list largely; that is what D F I D was doing, saving on the open list, essentially. But there are problems, sometimes, where you want to save on the closed list, where the closed list can become a greater problem than the open list.

(Refer Slide Time: 28:48)

And this problem that we are looking at, this city route finding problem, is such a problem, essentially. Now, visualize how search will progress in this space. This will be the start node; it has three successors, and it would generate those three successors; it will take one of them, generate its three successors, and so on and so forth; that is how search will progress. What will the search frontier, or the boundary of the search, look like? It would be something which looks like this. At some stage, the search frontier would have gone some steps to the right and some steps down at different parts of the space, and it would look like this.

So, for such a problem, how does the open list grow in terms of m and n? The search frontier is the same as the open list. The distance from the source is i plus j, if you want to put it that way: if you have gone i steps down and j steps to the right, it is i plus j. The open list is only growing linearly as you go further and further away. In the worst case, when you are just about to pick the last node, the open list would be m nodes there and n nodes here, essentially; m plus n, essentially. Whereas the closed list, which is all the nodes that we have traversed inside this area, is growing quadratically, essentially.

Because it is, in a sense, like the area enclosed, as opposed to the line that you are drawing. So here is one problem where the closed list is growing faster than the open list. The difficulty in solving this problem comes from the combinations you have: if you want to go from this node to some node here, you can go like this; or you can go like this; or you can go like this; or you can go like this. So, there are many different combinations, and that is what gives rise to the exploding search space. So, there is some ((Refer Time: 30:35)) in trying to save on the closed list, essentially.
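
To put illustrative numbers on this (my own figures, not from the lecture): on a grid with m = n = 1000, the open list never holds more than about m plus n = 2000 nodes, while the closed list can grow to about m times n = 1,000,000 nodes, five hundred times as many. That gap is what makes pruning closed attractive here.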

(Refer Slide Time: 30:54)

So, let me introduce a problem which in the last 10, or maybe 15 or 20, years has become very important, and that is the problem of sequence alignment. It is a problem which has spurred work in search; search had become a kind of dormant area in research, but because of these new problems which are coming out, people were sort of motivated into devising better algorithms, essentially. So, look at, for example, D N A sequence alignment; that is a problem within bioinformatics that you want to solve very often, essentially.

So, if you want to do, for example, genome sequencing and things like that: the way they sequence a genome is that they get bits and pieces of the sequence from different parts, and then they have to assemble the whole genome sequence by aligning them; you know, if there are two sequences which have a partial overlap, and if they can align them, then they can reconstruct ((Refer Time: 31:54)). We will take a simpler version of this problem. So, the alphabet here is, let us say, C A G T, which are the four chemicals which make up D N A, and you have sequences of these characters.

So, let us say there is one sequence which is like this, A C G T C, some arbitrary sequence I have written here. And let us say there is another sequence; if you look at it carefully, they are not identical; I made some small changes in the two sequences. If they were identical, then the problem of alignment is straightforward: you just put one sequence against the other. If the two sequences are not identical, and that includes the possibility that they are of different lengths also; I may give you this sequence, and I may give you another sequence like T A C G; then if I say, can you align this sequence with that one, you might say that yes, I can take this part here, align it with this part here, and have an alignment, essentially.

But what about sequences like this? You can see that I can match A with A, C with C, G with G; but now I have a T here and a C here, essentially; what do I do? So, in sequence alignment you are allowed to insert gaps: if inserting a gap improves the rest of the alignment, then you are allowed to insert the gap. So, you can see that once I align A C G with A C G here, then if I insert the gap here, which means I will align this T with a gap, then I can again align C A G here with C A G here, T C G with T C G; everything is getting aligned, up to this T A.

And then suddenly I have to insert another gap; so I insert one gap here, and then I can ((Refer Time: 34:34)). So, if you can make out this diagram, I am inserting a gap here in this sequence and I am inserting a gap here in that sequence, and by doing so I am improving the alignment, essentially. So, the problem of sequence alignment is to find some alignment which is optimal according to some criterion, which we will define in a moment; which is the best, essentially. Observe that I could have simply said: take this whole sequence and place it against gaps all the way, and then start the other sequence.

So, I could have had something like this: A C G T C A, up to C G here, and then started the next sequence from here, A C G C A and so on, and these are all gaps. That is, of course, a simple algorithm: take the first sequence, align it with gaps, then take the second sequence and insert gaps in the first sequence in its place; obviously, that is not a good alignment, essentially. So, how do we differentiate between good and bad? We give some cost to every operation. Let us say we do the following; in practice people probably follow a more discriminative cost function, but we will follow a simple cost function, which says the following about matching.

So, if there is a match, then the cost is 0; if I am aligning an A with an A, I am paying 0 cost. A mismatch costs 1; if I am aligning a T with a C, for example here, then I am paying a cost of 1. And for a gap, I have to pay a cost of 2. So, let us say, for example, that there is only one character which is different; let us say only this T, and instead of this C there was a... let us say another C here or something like that, or an A or a G, which is not T, and the rest was the same. Then I could get away by paying the cost of a mismatch, which is 1, rather than inserting two gaps to, you know, take care of those things, essentially.

But in a situation like this, where you can see that inserting one gap produces a match for so many more characters, simply by inserting one gap here, the cost I am paying by inserting this gap I am regaining by getting 0 cost for all the rest, essentially. So, obviously there is a notion of optimality here, and we want to find a sequence alignment which has optimal cost based on these three costs. In practice, of course, you may have a cost matrix which would say that aligning a C with an A has a certain cost, aligning a C with a G has a certain cost, and so on and so forth; or mismatches may have, you know, different costs.

We are just assuming that all mismatches have cost 1 and each gap costs 2. So, how does one solve this problem; what would be a good algorithm to do sequence alignment?

Student: Dynamic programming.

Dynamic programming. So, people used to use dynamic programming, till these bioinformatics people came in; and then suddenly we had sequences of hundreds of thousands of characters, and those algorithms, good as they were, essentially failed. If you draw some kind of an analogy, you would see that dynamic programming is like the branch and bound that we were studying, in some sense, essentially. So, you are looking at this graph.

Student: You can use the words as the rows and columns, and the cost will be on the diagonal ((Refer Time: 38:59)); so if you move along the diagonal ((Refer Time: 39:02)) you are matching it.

(Refer Slide Time: 39:11)

So, let us try that. This is my first sequence here, A C G T C; I will just use a few characters, A C G T C. So, I will draw this here, and I will draw this here. And let me take five characters from here, A C G C A; or maybe I should have started labelling from the second position, but anyway, let us assume that we are working here. So, now you can see what a diagonal move means in this. If I look at this alignment: matching A with A, C with C, G with G and T with C.

So, let us forget the first ones; I should really have drawn this like that, but this move amounts to saying that I am matching this C with this C, then I am matching this G with this G, but then I am stuck with T. So, I want to insert a gap in the second sequence, which means I am not going to traverse a character there, which means I am not going to move down ((Refer Time: 40:28)). So, the next move would be like this, and then I would be in this node, and then I would continue like this.

So, can you see that the first move, this move, aligns C with C; the second move aligns G, going from here: I am going from C to G here and C to G here, so it aligns those. In the third move, I am not going from G to C; I am staying at G in one string, but I am going from G to T in the other, which is like inserting a gap in this one, essentially. Of course, I have not shown mismatches here, but you can see that the cost of a diagonal move would be either 0 or 1, depending on whether the two characters that you are moving over are the same or different.

So, if I move from A to A, the cost is 0; if I move from C to C, the cost is 0; if I move from G to G, the cost is 0; and if I were to align T with C and move diagonally, then the cost would be 1. But instead I am saying I am going to insert a blank here, which is the horizontal move, and the cost is 2. So, essentially, you can actually forget about alignment at this point and just remember that the cost of every horizontal or vertical move is 2, and the cost of a diagonal move is either 0 or 1, depending on whether you are moving over the same character.

So, remember that moving down means moving in this string; moving horizontally means moving in the other string; and moving diagonally means moving in both strings, and if you are moving over the same character then the cost is 0, and if you are moving over different characters, the cost is 1. So, we have transformed this problem of sequence alignment into a graph search problem, where you have to go from this corner of the graph to the other corner of the graph with optimal solution cost, essentially. So, any questions about this?
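
As a concrete sketch of the dynamic programming the student suggested (my own illustration, with the lecture's costs of 0 for a match, 1 for a mismatch and 2 for a gap), the table below computes exactly the cheapest corner to corner path cost in this graph:

    def alignment_cost(a, b, mismatch=1, gap=2):
        m, n = len(a), len(b)
        # cost[i][j] = cheapest alignment of a[:i] with b[:j]
        cost = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            cost[i][0] = i * gap                       # a[:i] against gaps only
        for j in range(1, n + 1):
            cost[0][j] = j * gap
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                diag = 0 if a[i - 1] == b[j - 1] else mismatch
                cost[i][j] = min(cost[i - 1][j - 1] + diag,   # diagonal move
                                 cost[i - 1][j] + gap,        # gap in b (vertical move)
                                 cost[i][j - 1] + gap)        # gap in a (horizontal move)
        return cost[m][n]

    print(alignment_cost("ACGT", "ACGC"))   # 1: one mismatch beats two gaps

Notice that the table itself occupies m times n cells, which is precisely the quadratic closed list that the rest of this lecture is trying to avoid for very long sequences.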

So, this has been a motivating problem for the A I search community to look for newer and better ways of solving search problems. And what do we mean by newer and better ways? We should be able to handle graphs which have, let us say, a hundred thousand nodes in this direction and a hundred thousand nodes in the vertical direction; and then you can imagine that an A star like algorithm will quickly run out of space, essentially. So, the motivation here is to save on space, essentially.

And here I have just shown two strings; you can extend this to multiple string alignments, and you can imagine a third dimension coming in here, and then a fourth dimension and then a fifth dimension, so you can align multiple strings. The graph would become multidimensional in nature, and then you would have to find a path from one corner to the opposite corner, essentially. So, the problem has been transformed into a graph search problem, and the goal is to save on space, essentially.

So, we will begin by talking about saving on the closed list, and again the motivation is the same graph. Because, as you can imagine, if this were to be my search frontier, which means this is on open, this is on open, this is on open, and so on; if these are the nodes which are on open, then the size of open can be at most m plus n. So, it is going to grow linearly; at the least it will be the smaller of m and n as it goes down.

Initially, of course, it will be very small; but the size of closed is growing quadratically, because it is like the area which is enclosed by this curve, which is m into n in the worst case at the point when you reach there, if you have explored the entire graph. So, this is the motivating problem, where we would be happy to save on closed, essentially. So, let us just ask the question, what is closed doing for us in search, before we move on to the algorithms. So, this question: no closed; supposing I did not have closed, what would happen to my problem?

(Refer Slide Time: 45:16)

So, what is the purpose, what is the functionality, of the closed list in a search algorithm? One is to avoid looping. Now, this motivating example may not allow for looping, because we have said the edges are directed, but in general, of course, looping is possible; so we will take the more general view. Two is to find better paths. So, for example, in the ((Refer Time: 45:51)) algorithm or in the A star algorithm, you keep the nodes in closed, and you may find a better path to them, in which case you update that path, and that kind of thing.

And the third thing, one more thing closed does for us: in general, if you look at search in a planning problem, what do you do if the goal test is true?

Student: ((Refer Time: 46:30))

Closed has all the information about parent pointers, which allows us to reconstruct the path, essentially. So, the third thing it does is that it allows us to reconstruct the path. So, if we are going to think of having an algorithm without closed, then we would have to worry about these three issues, these three things that closed is doing for us, essentially. So, let me first begin with the second one: how can we get around this problem, that I do not want to maintain in closed one path and then find a better path later; I do not want to do all this.

Student: ((Refer Time: 47: 23))

The ((Refer Time: 47:26)) algorithm maintains a value with the closed, or the coloured, nodes as they call them; but may it find a better path? No, it may not find a better path. In that algorithm, once it colours a node, or once it puts it in closed, it has already found the best path, essentially. But in A star also we can say that, and we had discussed it: when, in A star, can you say that if you have put a node in closed, you have already found the best path to that node?

And that you would not find a better path later; when does that happen? Come on, we just did it; I mean, not just now, but in the recent past. When the heuristic function satisfies the monotone condition, we proved that whenever a node is picked for inspection, which means it is put into closed, A star has already found the optimal cost of that node, essentially. So, this updating of paths, we do not have to worry about that; what we do have to worry about is avoiding looping and reconstructing the path.

So, we will do that in the next class: we will look at some algorithms which drastically prune the closed list. After that, we will look at some algorithms which prune the open list; of course, we have seen I D A star is one such example, which prunes the open list, but we will look at other variations. And then we will try to see whether we can have an algorithm which prunes both the lists, essentially. That shall take us to the state of the art till about 2005, when the last paper I have in mind was published, I think. So, I will stop here, and then we will take up those algorithms in the next class, essentially.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 23
Pruning the Open and Closed Lists

(Refer Slide Time: 00:11)

We are looking at space saving versions of A star, and today we want to first look at pruning the closed list. So, just to recap: assuming that you are working with a heuristic function which satisfies the monotone criterion or the consistency criterion, we are worried about two things. One is what the community now calls stopping the search from leaking back. This means, basically, just imagine the search frontier being pushed out into the search space; the open nodes should not come back towards the inside of the search space, into whatever would have been in the closed list.

Essentially, that is the first objective, and the second objective would be, of course, to reconstruct the path. So, let us first worry about the first objective: how can you stop the search from going backwards? So, in 2000 or so, Korf, our old friend who has been working on this for a while, and his student Zhang.

They devised an algorithm, which I will name in a little while, but it works as follows. Imagine that you have this node x and you generate its children; let us only look at the forward children. So, let us say we are expanding the search frontier here, and let us just call them a, b, c, essentially. Now, what Korf and Zhang suggested was that you store something along with these nodes. So, this is open, this will be the new open; with every node in open you store a list of its parents. With a, for example, we will store x here; with b also we will store x; with c also we will store x. What is the idea? The idea is that this list forms a kind of a taboo list, essentially.

This means that when we, in turn, pick a or b or c for expansion, we would not generate those children which are listed in this list. So, it is a very simple mechanism for the search to be pushed only in the forward direction, essentially. Now, for example, if there was a y here which was connected to x, and let us say this y was also connected to this c and b, then at the point when y was expanded, this list would become x comma y, and this also would become x comma y. So, with every node in open, we maintain a list of nodes which will not be generated when that node is expanded, essentially. So, it is like a taboo list with every node, essentially.

So, with this simple mechanism, if you think about it, the effect is that every edge is traversed only once while searching. So, this edge from x to a is only traversed in this direction; or, in other words, a back pointer is put only in this direction: x can never become a child of a, x can never become a child of b, x can never become a child of c. So, when a is expanded, some new nodes will be generated, maybe even b may be generated, but x would not be generated; the children of a are going to be these nodes. So, this is one mechanism for stopping the search from leaking back; which means that, let us say, the source is somewhere here and the search frontier is expanding.

So, when we generate children of a, they will only be the forward looking children of a; if we generate b, then a will not be generated, but some other children of b would be generated, and the search will only push in the forward direction. So, one task, that of not getting into loops, is taken care of by modifying the nodes in the open list, tagging them with extra information as to which nodes should not be generated.
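
A small sketch of this bookkeeping in Python (my own illustration; Korf and Zhang's actual frontier search has more machinery): open maps each state to its g value and a set of neighbours that must not be regenerated as its children.

    def expand(node, open_list, succ):
        # open_list: dict mapping state -> (g, used), where used is the set of
        # neighbours this state must never generate; succ yields (child, cost).
        g, used = open_list.pop(node)
        for child, cost in succ(node):
            if child in used:
                continue                         # this edge was already traversed
            if child in open_list:
                old_g, child_used = open_list[child]
                child_used.add(node)             # child must never regenerate node
                open_list[child] = (min(old_g, g + cost), child_used)
            else:
                open_list[child] = (g + cost, {node})

Because every edge deposits the parent into the child's forbidden set, the frontier can only be pushed forward; no closed list is consulted at all.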

(Refer Slide Time: 05:31)

A little bit later, 3 or 4 years later, another Chinese student, Zhou, working with his supervisor Hansen, produced a different algorithm, in which they had a slightly different mechanism for stopping the search from leaking back. Their mechanism is as follows: the set closed is partitioned into two sets; one is called the kernel and the other is called the boundary, and the way to distinguish between the kernel and the boundary is that the kernel has no children on open. The boundary is the negation of this, which means at least one child on open; and remember, together this is the closed list, essentially. So, if I want to draw the closed list: if this is the start node, then everything inside this is on the closed list, and everything on this frontier is open.

So, the boundary nodes, which, if I can draw them in this colour, would be these nodes, are the ones which have at least one child on open; and all the other nodes, let me draw them with these numbers, whose children are all on closed, would be the kernel, essentially. So, they distinguished between the kernel and the boundary, and the idea here is that when you generate children of open, you only need to look at the boundary; the boundary serves the old function of closed, which is to stop the search from looping, essentially. So, if a child is present in the boundary, then you do not generate it; otherwise you generate it.

It has the same function of pushing the search in the forward direction. So, this boundary layer, this intermediate layer of nodes, basically stops the search from coming back: no node will generate a child that is in the boundary list.
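
As a sketch (my own, under the obvious assumptions about data structures), the partition can be computed directly from its definition:

    def split_closed(closed, open_set, succ):
        # boundary: closed nodes with at least one successor still on open;
        # kernel: all the other closed nodes. Only the boundary is needed to
        # stop the search from leaking back.
        boundary = {n for n in closed if any(s in open_set for s, _ in succ(n))}
        kernel = closed - boundary
        return kernel, boundary

The payoff, as we will see next, is that the kernel need not be stored at all.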

(Refer Slide Time: 08:29)

So, now let us address the second function of closed, which is to reconstruct the path; it allows us to reconstruct the path from the start to the goal. Now, how do Korf and Zhang handle this? I hope this mechanism for pushing the search only in the forward direction is clear; the first part is taken care of, because there is no looping that is going to take place: taken care of in one case by modifying the open list to allow only successors which are not forbidden, and in the other case by actually storing a pruned closed list. We will see that the kernel is not necessarily stored, but the boundary has to be stored, and the boundary is going to be only the edge of the closed, essentially, in some sense. Hence, if you only store those, you can stop the search from coming back. So, let us look at the other problem, of how to reconstruct the path. Korf and Zhang's algorithm actually maintains only the open list; it does not maintain the closed list at all, essentially.

(Refer Slide Time: 09:52)

So, the search front that this algorithm generates is something like this: I have this start node and I have this open list; I have a goal node here, and that is all I have, only nodes on open. Observe that these nodes are modified: they store information about what nodes I do not want to generate. So, basically, with every node there is some extra information, a list of a few nodes which are taboo for it to generate as children. Now, instead of closed, what they maintain is another layer of nodes, which is roughly like this, and this is called the relay layer; the relay layer is a list of nodes.

So, let me draw the relay with these numbers this time, and we will keep them for the relay nodes only. Now, remember that in A star we maintain the parent pointer: every node had a parent, which we would update if necessary, if we found a better path. Now we are saying we will never find a better path anyway, so we do not need to worry about that; but the function that the parent pointer did for us was to allow us to reconstruct the path when we found the goal. In this algorithm by Zhang and Korf, every node on open maintains a pointer to an ancestor which is in the relay layer.

So, every node will have one ancestor in the relay layer, and so on, essentially. So, of course, it is not pruning closed completely; it is replacing closed by another layer, essentially, which is the relay layer.

It has pruned everything else, essentially. Now, what it tries to do is keep the relay layer roughly at the half way mark; I will just write half way mark here. I am not writing the details here; we will give pointers to the papers, and you can also refer to my book, where I have described the algorithm. Initially, when the search starts, you do not maintain any pointer; or you can say, figuratively, that you maintain a pointer to the source node, an ancestor pointer, which you do not really have to. But at some point it decides that a given node is at the half way mark from the start to the goal, and it says, I will make this node a relay node. So, the first question is: how do you decide that a node is at the half way mark? It does not have to be exact; roughly at the half way mark, you look at its values. What would happen at the half way mark? Of the f value, the components g and h should be roughly equal. So, I will say g of n is roughly equal to h of n. If your heuristic function is good, then they will be close to equal; if the function is very conservative, then it will end up setting up the relay layer a little bit earlier than actually required; so maybe you can have a correction factor or something, but let us not get into those details, essentially.

So, this is how the algorithm works: it maintains one search frontier, or the open list, and after a certain point in the search it maintains a relay layer, essentially. Initially, you can imagine, when open is here, there is no need for a relay; only when it has pushed beyond roughly the half way mark, which is here, does it start constructing a relay layer, and then it pushes forward, essentially. So, that is the basic search algorithm: no closed, but this relay layer.
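
The ancestor bookkeeping might look like this as a sketch (my own, with hypothetical helper names):

    def relay_pointer(node, parent_relay, g, h):
        # A node joins the relay layer around the half way mark, where the cost
        # so far (g) roughly matches the estimated cost to go (h); otherwise it
        # just inherits its parent's relay ancestor.
        if parent_relay is None and g >= h(node):
            return node            # this node becomes a relay node
        return parent_relay        # pass the parent's relay ancestor forward

Every open node thus carries one pointer to an on path ancestor instead of a full chain of parent pointers.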

So, you can imagine that when this node is expanded into these two nodes, then this node will be deleted, and the pointer would be passed to these, essentially. So, it will pass on the pointer to its children, and so on. At some point the goal is picked, and the goal will have some pointer to some relay node here. So, when you pick that goal, you know the cost of reaching it, and it is the optimal cost of reaching the goal, because you know its g value, and because we have shown that A star finds an optimal path. So, when you pick the goal node, you know the cost of the optimal path to the goal, but you do not know the path; all you know is the following.

There is one relay node, let us call it r, which is an ancestor of the goal node on the path; it is on the path, and it is an ancestor of the goal node. Now, what do you do? You want the whole path; you want all the nodes which take you from the start node to the goal node, essentially. All we have is one node. It is like somebody tells you that if you are going from here to Delhi, then Bhopal is a relay node, or something like that; I do not know what the distances are. Let us say that you have to first go to Bhopal, and then you go to Delhi, and you will get the optimal path. So, let us first reveal the name of this algorithm: it is called D C F S.

(Refer Slide Time: 17:20)

It expands to divide and conquer frontier search, and this will give you a clue as to how you reconstruct the path.

(Refer Slide Time: 17:34)

So, to reconstruct the path, you make two recursive calls to divide and conquer frontier search: one call goes from s to r, and the second call goes from r to g. You make two recursive calls; and what would that give you?

(Refer Slide Time: 18:07)

That would give you two more relay nodes, somewhere here and somewhere here; then you make four recursive calls, from here to here and here to here, and so on, and you keep doing that till the problem has just become an edge, that is, till the next node is just a child of the first node; that is the base case where you terminate the recursion, essentially. So, remember that once you have made the first call to divide and conquer frontier search, you are done with all its memory requirements. All you know is that there is a start node, there is a relay node and there is a goal node, and then you are making a fresh call, to a smaller problem of roughly half the size, provided this holds, that g is roughly equal to h. Otherwise, the two halves may be of unequal size, which means, as you can imagine, it is like working with an unbalanced binary tree rather than a balanced one: you may do more work in one half and less in the other half.

But as long as you can divide it roughly in half, you will split the work half and half, and you will keep doing that till you have eventually reconstructed the full path. So, that is why this name, divide and conquer frontier search. The space requirement of the algorithm is only to maintain the open list, or the frontier, and a relay layer; that is all it needs, essentially. So, we have thrown away most of the closed list; and in the kind of problems that we discussed, the sequence alignment problems, it is closed which is growing faster: open is only growing linearly, while closed is growing as a quadratic of the size of the problem. What would be the complexity of this?
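
The reconstruction skeleton, as a sketch (my own; frontier_search and is_edge are assumed helpers, where frontier_search(s, g) runs the memory light search and returns one relay node lying on an optimal s to g path):

    def dcfs_path(s, g, frontier_search, is_edge):
        if is_edge(s, g):                  # base case: g is a direct successor of s
            return [s, g]
        r = frontier_search(s, g)          # one on path relay node, no full path
        # solve the two roughly half size problems and splice the results
        return (dcfs_path(s, r, frontier_search, is_edge)[:-1]
                + dcfs_path(r, g, frontier_search, is_edge))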

(Refer Slide Time: 20:20)

So, you can say that if the original problem of depth d could be solved with time complexity T of d, whatever T is; it depends on the problem really, an exponential function in general; then the total work is T of d plus the extra work that you are doing. What is the extra work? 2 into T of d by 2, plus 4 into T of d by 4, for the calls of depth d by 2 and depth d by 4, and so on and so on, till you are solving the small problems of depth 1, some number x of calls of cost T of 1. So, all that extra work you are doing, all the extra work, is done to reconstruct the path, essentially. How much is the extra work?

So, you can bound this sum: it is at most T of d multiplied by log to the base 2 of T of d; you take the complexity T of d and multiply by the log of that. So, if T of d were to be exponential in nature, b to the d, then this log factor would be d times log b; so, if you are finding a path of length d, essentially you are doing about d times extra work. So, if the path is of length 50, then you are doing about 50 times extra work to reconstruct the path, but in the process you are saving on space, essentially. Let us see what Zhou and Hansen do.

What they say is: why do we break the problem up into half; what is the rationale for breaking it up into half? Of course, we know that the divide and conquer strategy says that if you break it up into half, then you can solve it with this complexity. But in this era of increasing available memory, they say that you should do this pruning of closed only if you are running out of memory, essentially.

(Refer Slide Time: 22:51)

So, their algorithm is called SMGS, and it expands to smart memory graph search. What do Zhou and Hansen do? They say: you just run it like A star; do not worry about pruning or anything. You keep track, somehow, of how much memory your algorithm is using, and if at some point you realize that you are running out of memory, then you prune, essentially. What do you prune? You prune the kernel; you keep the boundary, because you need it. In fact, when you prune the kernel, at that very same time you convert the boundary into a relay layer. So, initially this algorithm is working with the boundary layer and the open layer going neck to neck: the open layer is moving forward, and the boundary layer is just following it; and then at some point.

(Refer Slide Time: 24:09)

So, let us say this is the situation: this is the boundary layer and this is the open layer. The outside one, sorry, the outside one is the open layer and the inner one is the boundary layer, and whatever is inside the boundary layer is the rest of closed, the kernel. Now, your counter or something tells you that you are running out of memory. So, what do you do? You prune the entire kernel and convert the boundary into a relay layer, and search progresses from there as before. So, let us say it has gone on here; it has got another boundary layer following it, essentially. So, at all points a boundary layer just follows the search frontier, because it needs to keep the search from leaking back.

Essentially, every time you generate children of open, you check on the boundary layer whether they are present there or not; and the area between this curve and this curve is the kernel, which you have not yet pruned, essentially. Then again somebody tells you that you are running out of memory, so again you convert this into another relay layer. So, that is why it is called smart memory, in the sense that it is aware of how much memory it is using and whether it is running out of memory, essentially. So, unlike divide and conquer frontier search, which maintains one relay layer roughly along the half way mark, smart memory graph search maintains as many layers as required: it could be 0, it could be 1, it could be 2, it could be 4, depending on how big the problem is and how much memory is available to you.
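
The pruning policy, sketched (my own illustration, reusing the split_closed helper from the earlier sketch; memory_low is an assumed predicate):

    def maybe_prune(closed, open_set, relay_layers, succ, memory_low):
        # Behave exactly like A star until memory pressure is detected; only
        # then throw the kernel away and keep the boundary as a relay layer.
        if not memory_low():
            return
        kernel, boundary = split_closed(closed, open_set, succ)
        closed -= kernel                    # prune the kernel
        relay_layers.append(boundary)       # the boundary becomes a relay layer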

At some point, when it finds the goal, it would have an explicit path back up to some relay layer, which they call a dense path; and from that layer back, through a mechanism you can work out, it would have a series of ancestor pointers, which they call a sparse path. So, in their terminology, divide and conquer frontier search has two sparse paths: one from start to relay and one from relay to goal. In smart memory graph search, if you are solving a very large problem, you may have a longer sparse path; then, of course, you have to make that many recursive calls to solve each of the segments. So, it may be the case that the first time around you make 5 relay layers.

So, from start to 1, 1 to 2, 2 to 3, 3 to 4, and 4 to 5; so you will make 5 recursive calls. But within each of the recursive calls it is possible, because it is a smart algorithm, that you may not need many more recursive calls, because you can imagine that each piece is close to what the memory can tolerate, essentially. So, it is a little bit different from D C F S in that sense: it is aware of how much memory it can use, and it only prunes the kernel when it is running out of memory. Remember that pruning the kernel incurs this cost of reconstructing the path, because once you have thrown away all those nodes, you have to do all these recursive calls to reconstruct the path. But here, where you have not thrown them away,

you just have to follow the back pointers, and you can get the path from there. So, depending on how much memory is available, this behaves in a more smart way, essentially. Now, let us move on to pruning the open list, which is work, in fact, carried out by the same two groups all over again. So, how does one prune the open list? What Zhou and Hansen showed, and this was around 2004, so it is not so far back in time.

(Refer Slide Time: 29:07)

They gave an idea which is called by what sounds like a curious name: breadth first heuristic search; it sounds like a contradiction in terms. So, the basic idea (and we are talking about pruning open now) is that most algorithms which prune the open list rely on getting some upper bound on the cost. So, compute U, an upper bound on f for the problem that you are trying to solve: what is the maximum possible value that the cost of the solution can be? How do you compute an upper bound? One way is to use some greedy algorithm to try to find a solution.

So, you do some beam search with a very thick beam, and hopefully you will get some solution; it may not be optimal, but it will give you an upper bound. Essentially, if you know one solution, that solution's cost can be made the upper bound; and that is a theme which runs through many variations of these algorithms that we are going to see: as and when you find better solutions, you reduce the upper bound. And then what they say is the following: this is your start node.

This is a goal node, and you have a boundary given by the upper bound; well, boundary is
not quite the right word. The upper bound serves as a boundary in the sense that any node
with f value greater than this value U you will not expand. So, if you generate a child
here, for example, then you will never expand these two children; that is the purpose that
this boundary is serving.

So, we are only going to search within this hypothetical boundary, which is determined by
the f value. Before expanding a node from open, check whether its f value is less than U;
only then do you expand it. Now, if you want to do an A star like search, then the open at
some point would look like this; whereas if you do blind breadth first search, your search
frontier would look like this, assuming that the costs are roughly equal for every edge.
That is one way to visualise the problem. But if you are doing this breadth first
heuristic search, which means breadth first search guided by this upper bound, which you
have generated by some heuristic algorithm,

then your open is only going to be this much. So, this is the open for this algorithm, and
empirically one can observe that the size of open for this breadth first heuristic search
is smaller than the size of open for A star, which is roughly like this. This should give
a visual intuition; do not take it at face value, it is just meant to give you an
intuition, essentially.
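As a hedged sketch of the idea, assuming unit edge costs and a given upper bound U (for brevity this version keeps a full parent map, that is, a closed list; the divide and conquer variant described later replaces it with boundary and relay layers):

    def breadth_first_heuristic_search(start, goal_test, successors, h, U):
        # Plain breadth-first layers, but a child is pruned whenever its
        # f = g + h exceeds the upper bound U obtained heuristically.
        frontier = [start]
        g = 0                        # with unit edges, depth doubles as cost
        parent = {start: None}
        while frontier:
            next_layer = []
            for node in frontier:
                if goal_test(node):
                    path = []
                    while node is not None:
                        path.append(node)
                        node = parent[node]
                    return list(reversed(path))
                for child in successors(node):
                    if child in parent:
                        continue                 # duplicate detection
                    if g + 1 + h(child) > U:
                        continue                 # pruned by the bound U
                    parent[child] = node
                    next_layer.append(child)
            frontier = next_layer
            g += 1
        return None

As and when a cheaper solution is found, U can be lowered, which prunes even more of open.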

(Refer Slide Time: 33:43)

Another variation is to prune open even further, by keeping it of constant width, and you
can imagine that the algorithm is beam search. We have explored beam search earlier; this
is a variation in the sense that you keep searching till you find the goal, rather than
the hill climbing like beam search where you stop if you do not find a better node. Here,
the meaning of beam search is that you maintain an open list of constant width, as in the
sketch below.
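A minimal sketch of this run-to-completion beam search, with the beam width and the heuristic h as parameters; duplicate detection is omitted to keep the sketch short:

    import heapq

    def beam_search(start, goal_test, successors, h, width):
        # Keep only the `width` best nodes at every level, by heuristic
        # value, and keep going until a goal turns up or the beam dies out.
        beam = [start]
        while beam:
            children = []
            for node in beam:
                if goal_test(node):
                    return node
                children.extend(successors(node))
            beam = heapq.nsmallest(width, children, key=h)
        return None

Because everything outside the beam is discarded, the only path to the goal may be pruned away; that is the incompleteness discussed next.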

Then, you search towards the goal. Now, obviously, it is not complete, because you are
throwing away other nodes; breadth first heuristic search, by contrast, is complete and
will find a path to the goal. Beam search may go off in this direction, away from the
goal, and never actually give you a path. So, it is not complete. How can we make beam
search complete? Before we do that, Zhou and Hansen also gave us the following.

(Refer Slide Time: 35:01)

So, this is breadth first heuristic search, and you can convert it into divide and conquer
breadth first heuristic search by using the divide and conquer mechanism with which we are
by now familiar, which means the following.

(Refer Slide Time: 35:27)

So, in general, if you want to draw this graph, you would have an open which looks like
this, followed not by a closed list but by a boundary layer, which is just behind it. You
can imagine the search progressing in this direction: this open will slowly get converted
into a boundary, the new open will move forward, and there is also 1 layer somewhere in
between. So, this is a relay, this is a boundary, and this is open. If you maintain these
3 layers of nodes, you can convert breadth first heuristic search into divide and conquer
breadth first heuristic search, and you will have to use the same mechanism of
reconstructing the path.

So, we have an algorithm which not only saves on open, it also saves on closed, because
you are no longer keeping the closed list; you are only keeping the boundary and relay
layers, exactly like what smart memory graph search would have done. Now, you can of
course do the same thing with beam search: beam stack search. We will visualise this
algorithm on a search tree, because it is easier to do it like that. So, just imagine that
this is the search tree that the same search algorithm generates, and we saw this mapping.

Earlier, we assumed that the search tree is ordered, with the lowest heuristic values on
the left and the highest on the right, so that each layer is ordered by increasing h. Just
for the sake of visualization, we assume that this tree is ordered by increasing h values.

In this case, the boundary given by the upper bound U would look something like this. You
have to think a little bit about this and convince yourself that this is how the U
boundary would look. Remember that this side has values less than U and that side has
values greater than U; and because U is an upper bound on cost that we have somehow
figured out, we know that we do not have to search that part of the tree. We only have to
search this part of the tree. Since we are assuming, in this visualization, that the tree
is drawn in such a way that heuristic values increase from left to right, you can imagine
the heuristic search also progressing from left to right, essentially.

So, beam stack search is essentially beam search in this space; first of all, it is a beam
search. Now, the beam of course starts searching from the left, though I have drawn it in
the middle just to show what happens. Beam search is incomplete, as we have observed, so
you can make it complete by introducing backtracking; that is a little bit like what
recursive best first search would have done, except that we are not talking about backed
up values, just backtracking and retrying. So, if it runs into this upper bound, it
backtracks and tries something else.

So, it is something like backtracking behaviour that we want to simulate, except that,
keep in mind, in this visualization, which is actually my own idea, you would not find it
in that paper, the heuristic values are ordered from left to right. So, on this space it
is like doing depth first search from left to right. But how do we do this backtracking;
how do we implement it in practice?

(Refer Slide Time: 41:10)

We do it by maintaining another data structure called the beam stack, where for each layer
we store 2 values, f min and f max; the interval is closed on the left hand side and open
on the right hand side, that is, [f min, f max). These two values tell us where in this
space we are searching: the value of f min is here, and the value of f max is here. So,
the beam stack tells the algorithm which part of the search space it is searching, and it
does this for every layer.

What does that mean? If you are backtracking and you come back to this layer, it is
because you have not found the solution, and you want to start a second search here. Let
us say the values at this level are f min 100 and f max 150 in some domain. You go back to
this layer and reset these values: you replace them with 150 and some other value,
whatever it turns out to be, because you regenerate the open list.

(Refer Slide Time: 43:15)

So, you go back to the parent here, generate the open list, take only those children whose
f values are greater than or equal to this 150, the old f max, and construct a new beam
layer; depending on the children, the new f max could be something like 180. What is the
initial value in the beam stack? Every layer starts with [0, U). So, what the beam stack
is doing is helping with the backtracking process: once you backtrack to a layer, you go
to these nodes and generate their children again. Which of those children do you want to
explore now? The beam stack tells you that you have already seen f values up to 150, so
look only at f values which are greater than or equal to 150.

So, depending on how many children there are, of course, the new f max could be 180 or 190
or whatever, essentially. You must convince yourself that maintaining this beam stack
allows us to search the space completely. It is difficult to visualize in the original
space, but in this space, which is ordered on h values, it is easy to see that the search
will progress from left to right; and if you can search this entire space inside the U
boundary, your algorithm is going to be complete. The beam stack allows us to do exactly
that, so this is called BSS, beam stack search.

The next step, as you can imagine, is divide and conquer beam stack search, and you are by
now used to this idea. What does DCBSS maintain? It maintains only 3 layers, like divide
and conquer breadth first heuristic search: one open layer, one boundary layer just behind
it, and some relay layer in between. Beam stack search, by contrast, maintains everything
which is inside the beam. So, the question I want to ask, which I hope has occurred to
you, is this: in beam stack search you have the parents of every node, so you could go
back to a parent, regenerate the parent's children, and take the next set of children; in
divide and conquer beam stack search, you do not have all those earlier layers.

You only have the boundary layer and the relay layer; how can you backtrack? Now, I hope
you see the problem. Maybe I am going a bit fast here, just because I am running out of
time. How can the search go here and retry this space, let us say at a relay, it does not
matter which one, when we do not have the parent of this relay node? What is more, we do
not have the parent of the boundary node either; we cannot go back to its parent, and then
to that parent's parent. This was a paper published by Zhou and Hansen at ICAPS 2005.

ICAPS is the International Conference on Automated Planning and Scheduling, and for this
paper they got the best paper award at the conference. So, it is possible. Let me ask: I
want to simulate the behaviour of beam stack search, which means I will go down searching
along the beam, and if I hit the boundary, I will come back and try something else. The
trouble is, if I have thrown away the closed list, that is, the parents, how can I come
back and try something else? We have 1 minute to answer this question. The answer is that
you do not talk of going back.

You regenerate from the source again. Suppose you are at the ninth layer and you want to
backtrack to the eighth layer. What do you do? You go to the start and generate 8 layers.
Which children should you pick? The beam stack will tell you, of all the children that you
are generating, which ones are the ones inside the beam; the beam stack has this
information. So, you get to the eighth layer and then you can generate from there. Of
course, there is extra work: backtracking 1 step would otherwise have been just going to
the parent and retrying, as sketched below.
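A minimal sketch of that regeneration, under the same assumed interfaces as before:

    def rebuild_layer(start, successors, f, beam_stack, target_depth, width):
        # DCBSS keeps no parents, so 'backtracking' to a layer means
        # replaying the search from the start; the beam stack intervals
        # re-select exactly the children that were kept the first time.
        layer = [start]
        for depth in range(target_depth):
            f_min, f_max = beam_stack[depth]
            children = sorted(
                (c for n in layer for c in successors(n)
                 if f_min <= f(c) < f_max),
                key=f)
            layer = children[:width]
        return layer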

Here, you have to come from the source all the way to the parent, and from this parent to
the next parent you again have to go all the way, and so on. So, if you have to backtrack,
you have this extra work; but as a result, the memory required, apart from the beam stack
itself, which grows linearly with depth but is small, is bounded. What is the space
complexity of divide and conquer beam stack search? It is constant: you are just keeping
three layers of constant width, the open layer, the boundary layer and the relay layer.
So, starting from an algorithm, A star, which required an exponential amount of space, we
have arrived at an algorithm which practically requires a constant amount of space.

In this era of huge memory sizes, you can keep the beam width as large as you can, and it
will work. So, I will stop here; I believe Professor Shri Chaudary is waiting outside, and
with this we will end the search part of the course. We will look at problem solving from
a slightly different perspective in the next class, on Wednesday: looking from the goal
towards the problem, essentially, how you can move from the goal to the problem.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 24
Problem Decomposition with Goal Trees

Today, we take a slightly different approach to problem solving. So far, our approach has
been that we are in some given state and we want to find a path to the goal state, or
something like that; or we are given a description of the goal state and we want to search
for a state which satisfies that description. Today, we want to look at a slightly
different approach, which has a flavor of backward reasoning, in the sense that you reason
from the goal that you want to achieve, and try to break down the problem into smaller
problems, and so on, essentially.

(Refer Slide Time: 01:06)

So, the idea that we want to explore is called problem decomposition. To motivate why we
should use this approach, let me take up a problem and solve it with the approaches that
we have seen so far. Let us say that you have a friend whose birthday it is. You have all
decided to go out and, let us say, give a treat to your friend; I know that is not the
normal thing, but let us say that is what you are doing. Anyhow, you have decided that you
will go out somewhere in the evening; then maybe you will go to a movie or a music show or
something, and then you will have dinner somewhere. So, that is the idea of planning an
evening, and let us say it is the friend whose birthday it is who is yet to decide what is
acceptable. So, you start querying, essentially.

(Refer Slide Time: 02:28)

So, let us try to visualize this as a search in a simple search space, where you give
different options to your friend. Let us call this problem designing an outing, and the
goal state is when you have decided upon the three things that you will do: what you will
do first, what you will do next, and what you will do after that. So, let us say that you
start off by saying that you will go to a mall; a very popular thing nowadays. Then maybe
you will go and see a movie; let us say one of the Harry Potter films, and I will use HP
for Harry Potter. Then you will go and eat somewhere; let us say Pizza Hut, and I will use
PH for Pizza Hut. So, you present this option to your friend, and he or she says no; that
is not acceptable. Let us say that you are working in a depth first fashion, or something
similar to depth first, which may be heuristically guided or something like that. So, you
try something else, and let us say you give another dinner option.

Let us say this well-known restaurant in Chennai, Saravana Bhavan, and your friend again
says no. Can you think of another restaurant name? Let us say Cafe Coffee Day, which I
will write as CC here; and your friend says no again. Let us say you have run out of your
budget and your restaurants and so on, so you go back and try another option. Just to
clarify, this is the evening level, so to speak; this is the movie or entertainment level;
and this is the eating out level. So, there are three decisions you have to make. Let us
say you offer Bhuvan's Home, which is a very nice movie, as most of you might know; and of
course you cannot ask for a verdict at this level yet. In the approach that we have
followed so far, we consider every node and ask whether it is a goal node or not. This
cannot be a goal node, because it has only two things decided; the third thing is not
there. There are other approaches: for example, when you do constraint satisfaction, then
at each stage you can inspect the partial solution and ask whether it is consistent with
what you are looking for; but we will not get into that at this moment. So, again, you
make these offers: Pizza Hut, the answer is no; Saravana Bhavan, the answer is no; Cafe
Coffee Day, the answer is no. Then you try one more movie; let us say A I, the movie. You
must be knowing there was a film called A I. Then you try the same three dinner options
again, and your friend again says no, essentially.

Let us say at this stage you have run out of movie options as well. So, you backtrack from
here and say, shall we go to the beach? Then you try all the same things again; this
entire tree below mall, you are going to explore below beach again. So, let us say: go to
the beach and see Harry Potter, then go to Pizza Hut, then Saravana Bhavan, then Cafe
Coffee Day; then Bhuvan's Home, then Pizza Hut; and suddenly your friend says yes. You
have found the solution: your friend is agreeable to the idea of going to the beach, then
going and watching Bhuvan's Home, and then going to Pizza Hut for dinner. Now, I want you
to look at this search tree and make some observations: is it a good way of searching, or
what is happening in this tree?

Many nodes are replicated. Or let me ask a more specific question: what is wrong in the
left side of the tree? What is it that is not working there? Where is the fault in the
solution that you are constructing, or which part of the solution is not working? Inspect
the tree.

You explored the entire tree below mall, and I have drawn only three movies and three
restaurants; there could have been many more, and a greater depth. We explored the entire
tree below mall, and it did not work. If mall was the problem, then you can see that in
the first solution you considered, which is mall, Harry Potter and Pizza Hut, if somehow
you could figure out that mall is the problem, then you would not do all this wasteful
work here: trying, if not Pizza Hut, then Saravana Bhavan; if not Saravana Bhavan, then
Cafe Coffee Day; and if not Harry Potter, then Bhuvan's Home; if not Bhuvan's Home, then
A I, the movie. All this work is wasted, and the reason is that the culprit is here:
keeping that choice fixed and searching the tree below it is not going to help any longer.
Anything you try below this is not going to work, if mall is creating the problem. It
turns out that when you go to the other side, you get a solution. So, the issue that we
want to address is this: you tried one thing, and then you are doing useless work,
exploring a part of the tree which is doomed and not going to work; only when you change
this choice is it going to work. There are basically two approaches to addressing this
difficulty. One is called dependency directed backtracking.

Now, the backtracking that we are doing here, in most of the algorithms that we have seen,
is chronological in nature: you undo the last choice that you made. You tried this
solution; it did not work, so you undo the last choice, which is Pizza Hut, and try the
next one at this level, which is Saravana Bhavan; undo this and try this. When all three
fail, then you undo this and try this. This form of backtracking is called chronological,
which means that you are undoing the last choice that you made. What we try to do in
dependency directed backtracking is to identify the culprit, that is, what is wrong with
the solution; in other words, the attribute without changing which you are not going to
get a solution. If you can identify, after coming here, or after coming somewhere, that
the culprit is the choice at the evening level, which went in favour of the mall, then the
algorithm should directly jump back to that choice and try the next option, essentially.

So, we will not really explore this option here, unless we get time later on; it is
something that is studied when you look at constraint satisfaction methods, but we are not
going to do too much constraint processing here. Constraint processing is just a different
formulation of such problems, in which you formulate the entire problem in terms of
variables and the values that the variables can take. The other approach, which we want to
consider today, is to construct what I call AND OR graphs or trees. We could have
converted this tree into a graph: instead of generating these three nodes afresh, I could
have connected back to these three, and then it would have been a graph. So, there is not
really too much difference between a graph and a tree here; it is just a way of looking at
it. These are also known as goal trees.

(Refer Slide Time: 13:33)

We want to look at problem decomposition with goal trees. The idea of problem
decomposition is to use some mechanism to break up a problem into smaller parts. In this
example, it is a little bit like using a context free grammar to say that an outing is
made up of an evening out, a movie, and dinner, and so on and so forth; but it does not
have to be context free. In this case, it could be context dependent. So, how do we do
this? The approach is to view the problem differently: you have outing as the goal that
you are trying to solve, and you are going to break it up into three separate sub goals.
One is the evening out, or evening plan; then the movie; and then eating out, or dinner.
So, this is a different representation. To distinguish between this sort of tree and that
sort of tree, we put an arc here, for our benefit.

Such an arc is called an AND arc, and such a node is called an AND node. An AND node is a
node from which, in this directed graph, every one of the nodes that it points to has to
be solved; as opposed to such a node, which would be called an OR node, with OR arcs below
it. The meaning of an OR node is that you have to do one of the things below it. For
example, you have to either go to the mall or go to the beach; or, if you have decided
upon the mall, then you have to choose either Harry Potter or Bhuvan's Home or A I, the
movie. So, we have all those choices here as well: mall and beach, and maybe some other
choices, it does not matter; then the three choices for the movie, Harry Potter, Bhuvan's
Home and A I, the movie; and then the three choices for the restaurant. Such a tree is
called an AND OR tree, because it has AND nodes as well as OR nodes in it: the AND node is
at the top level, and the OR nodes are below it. A solution in an OR graph, the kind that
we have been studying so far, is a path; a solution in an AND OR tree is a sub tree. So, a
solution will have, for example, this and this and this and this: the solution for the
same problem is a sub tree here, whereas in the earlier formulation it was a path. The
important thing is that you have broken the problem down into three smaller problems,
which are solved independently of each other. This means that when you have to choose
between mall and beach, you do it independent of the other things; whereas there, once you
chose the mall, you were asking: given the mall, what else should I search for?

Now, that does not really make sense, because the mall is independent of what you are
going to do after it, and this formulation as an AND OR tree allows you to break the
problem up into smaller parts and solve them. So, we want to study algorithms that work on
such formulations and solve problems like that; but first I want to give some motivation
about where this has been used. Now, obviously, there will be costs: if you are going to
find some kind of optimal solution, there will be some cost associated with it. In this
example, mall may have some cost associated with it; maybe the distance you have to
travel, or, some malls nowadays charge fees to some people, I do not know. Obviously,
movies also have charges; eating out has costs; and you may have some criteria: you want
to find a solution within this cost, or within this budget, or of optimal budget, and
things like that. So, we will worry about cost as we go along.
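As a hedged sketch, the outing example can be written down as an AND OR tree like this; the class layout and all the numbers are made up purely for illustration:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        # kind is 'AND' (all children must be solved) or 'OR' (any one
        # child suffices); edge_cost is the cost of the arc to each child.
        name: str
        kind: str = 'OR'
        children: List['Node'] = field(default_factory=list)
        h: float = 0.0              # heuristic estimate at a leaf
        edge_cost: float = 10.0

    def cost(node: Node) -> float:
        # Sum over an AND node's children, minimum over an OR node's.
        if not node.children:
            return node.h
        options = [node.edge_cost + cost(c) for c in node.children]
        return sum(options) if node.kind == 'AND' else min(options)

    outing = Node('outing', 'AND', [
        Node('venue', 'OR', [Node('mall', h=5), Node('beach', h=3)]),
        Node('movie', 'OR', [Node('HP', h=8), Node("Bhuvan's Home", h=6)]),
        Node('dinner', 'OR', [Node('PH', h=4), Node('CC', h=7)]),
    ])
    print(cost(outing))   # aggregates the three independent best choices

A solution is then the sub tree picked out by the minimum choices, rather than a path.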

(Refer Slide Time: 19:30)

Let me take another example. Suppose you want to construct a house; you can pose this
problem also as an AND OR problem. To take some simple variations, you might say single
storey or duplex; those might be the choices. For a single storey house, you may say that
you have to make a kitchen, followed by, let us say, room 1, which I will just write as
R1, followed by R2 and so on, and let us say a balcony. Now this, of course, would be an
AND node here, and I could have edges coming from elsewhere too; for example, a duplex
house will also have a kitchen and so on, so I can draw it as a graph or as a tree, it
does not really matter. Then room 1 may say: wall 1 up to wall 4, since, let us say, we
have rectangular rooms; and then a door, let us say we have only 1 door in this; and let
us say we have a window.

This would be an AND node, and if you look at a wall, for example, you may have choices
like stone or brick or mud. In this manner, the choices available to you to construct a
house can be organized into an AND OR tree where, at the top level, you have the high
level goal, which is to construct a house; these here are just choices, and at lower
levels these are sub goals. Then you keep breaking down. To what extent do you break down
a problem? You break it down till the problem is so simple that you have a solution
already available, essentially. I will illustrate that with the next example, which is
from one of the first systems to use AND OR graphs.

Now, keep in mind that there are two kinds of cost involved. One is the cost of
transforming a problem into sub problems; let us assume there is a cost of doing that. The
second is the cost of solving the sub problems themselves. At the end of this process, the
leaf nodes are actually called solved nodes; so the leaf nodes in the tree would be solved
nodes, and a solved node may have an associated cost. It depends on the domain: we will
start by looking at symbolic integration, for example, which was one of the first
applications to use AND OR graphs, and there the solved nodes may not have any cost
associated with them. For example, you know that the integral of x dx is something for
which you can just pick up the solution, or for x squared dx you can just pick up the
solution, and things like that. But if you look at a problem like building a house, then
if you are going to build 4 walls, each of those four walls, when you build it, will have
an associated cost, and you will have to aggregate all those costs into the cost of
building the house. So, solved nodes may or may not have costs, depending on the nature of
the problem, but transformations will always have some cost associated with them. That is
the way we will look at the problem, because we want to control the amount of search that
we do.

(Refer Slide Time: 23:55)

One of the first programs to use AND OR trees was a program called SAINT, implemented by
Slagle in 1961. In the introductory lectures, when we were listing the achievements of A I
over the years, we had mentioned symbolic integration at that time, in 1961. He wrote it
at MIT, I think; it was his PhD thesis on how to do symbolic integration. So, what do we
mean by symbolic integration? You are given some expression, an indefinite integral, and
you want to produce an expression which is the integral of the given expression. My
mathematics is a bit weak, so let me use this example, which is from that thesis. Let us
say that you want to integrate x^4 / (1 - x^2)^(5/2) dx. I am sure that you people are
more familiar with integration than I am.

You know how to solve such problems: basically, you apply some transformations, and maybe
you break up the problem into smaller parts; something that fits very nicely into the AND
OR setting, because you have a choice of transformations to make. So, first let us look at
what the solution looks like. This is the node, and you can transform it into the integral
of sin^4 y / cos^4 y dy; do not ask me how, but you can verify that we can. The
transformation is x = sin y. Eventually, when you break it down into something which is
simple, which is accessible to you, you will have to, sort of, invert these
transformations as you construct the solution. At the risk of converting this into a maths
class, let me try one or two more steps.

You do not have to really write this down. Notice that here we have applied another
substitution, z = tan w, let me write it here, and transformed into this; and likewise,
this can get transformed into the integral of dw. So, now we are assuming that this node
is a solved node, in the sense that you can do it without any difficulty; this node is a
solved node, and this node is a solved node. So, you have the solution now; this is an AND
node, of course. The solution to this integration problem is a series of transformations,
followed by breaking up the problem into smaller parts, because we have this addition
here; and then we can solve the integral of dw and the integral of z^2 dz very trivially.
We will treat those nodes as solved nodes. That is what I meant by saying that the cost of
such nodes, the cost of their solutions, can be considered small, but the cost of the
transformations can be counted, essentially.

Of course, this is not the only way of doing it, and there could have been other choices.
For example, you could have transformed this into something involving cos raised to the
minus 4, y dy, and maybe that does not look so promising; or, even worse, you could try
something like this. So, to solve a problem like this, there are many options. Those of
you who remember doing your maths for some entrance exam or the other will remember that
you have to break your head trying to recall what those transformations are; that is why
this needs a lot of practice. But this program SAINT did it; SAINT stands for Symbolic
Automatic INTegrator.

Later, a guy called Moses developed a program which he called SIN, playing on this one
being called SAINT; SIN stands for Symbolic INtegrator. SIN was eventually transformed
into a product called MACSYMA. Nowadays, of course, you must be familiar with mathematical
packages which can do symbolic mathematics for you; these are all descendants of SIN and
SAINT. MATLAB, for example, will also allow you to do symbolic mathematics. So, what we
want to show here is that symbolic integration can be seen as solving a problem in this
manner, where you reason from your higher level problem, breaking it down into sub goals,
till the sub goals are simple enough to be solved trivially; and then you have a solution
for the whole problem. Of course, for the actual answer, you have to undo all the
transformations that you have done, which I have not written here. Let me give another
example where this idea was used effectively. We had mentioned that the 70s was the era of
what we call expert systems.

(Refer Slide Time: 33:06)

One of the first expert systems was a program called DENDRAL. The name DENDRAL, some
people say, comes from something like dendritic algorithm, because it too generated an AND
OR graph which looked like that. The idea of DENDRAL was that it was an assistant; it was
originally developed as an assistant to a chemist, and the task the chemist was doing was
determining the structural formula of a compound. Now, again, I presume your chemistry is
better than mine. You know that you can be given a molecular formula for a compound; take
benzene, for example. What is the formula for benzene? C6 H6. And you must have heard the
story of Kekule, and how people were trying to figure out the structure of benzene.

So, at the molecular level, of course, it has 6 carbon atoms and 6 hydrogen atoms, but the
question to ask is: how are these atoms arranged in a structure? That is a very important
question, because the physical and chemical properties of a substance depend on the
structural formula behind it; and finding the structural formula is not such a
straightforward task. Apparently, Kekule was dreaming, or day dreaming or sleeping, or
something, and he saw a snake biting its own tail, and that gave the benzene ring, and so
on. So, the idea behind DENDRAL was to help a chemist find structural formulas; and
apparently, people who do PhDs in chemistry spend a considerable amount of their time
trying to find such structural formulas. The problem is not so simple, because a given
molecular formula may have millions of different structural formulas associated with it,
for larger compounds, obviously.

At a much lower level, you know that structure plays an important role in materials:
whether you are holding a diamond in your hand or a lump of coal is just a matter of
structure. You are basically holding carbon in your hand, but it is really its structure
which gives it the properties you are looking for.

(Refer Slide Time: 36:17)

The way DENDRAL was implemented, the idea was that it would explore the space of possible
structures, and this can be done by breaking a molecular formula into smaller parts and
exploring the structure of the smaller parts. We will see a small sample example. This was
aided by a process of generating a synthetic spectrogram: there was an algorithm that,
given a structural formula, would predict what the spectrogram of that material would look
like.

Once you can do that, you can compare it with a real spectrogram of the material, and if
the spectrograms match, then you can say that you have found the structural formula for
that material; that is what the chemists were doing laboriously. DENDRAL was developed at
Stanford, in 1971 or something like that; it came a bit after SAINT. It was touted as the
first expert system ever built, and the 70s and 80s were the era of expert systems, where
people said that they would build expert systems; by this they meant that they would
elicit the knowledge of experts, put it into a program, and the program would then perform
at the level of an expert. Apart from DENDRAL, in the introduction we might have mentioned
things like PROSPECTOR, which was an expert system to find prospects for oil or minerals,
and so on.

There was a program called R1, which was also at Stanford, I think, which was used to
configure VAX systems. In those years, of course, buying a computer was not as
straightforward as saying I want an Apple MacBook or something like that; you had to
actually configure various components together, and R1 was an expert system which helped
people configure computer systems. We will look at how R1 and such programs were built a
little later in the course; but today we want to focus on this AO star kind of algorithm,
that is, algorithms which search over AND OR trees. So, where did the expertise come in?

The expertise came in as follows: there was first a program called CONGEN. DENDRAL was
made up of a program called CONGEN, which stands for CONstrained GENerator, and this had
expert knowledge inside it. Of the many possible ways that you could conjure a structure
for a given molecular formula, CONGEN was guided by knowledge gleaned from chemists as to
which structures are feasible and which are not. For example, given a simple thing like
C2 H12 NO2, CONGEN would produce a structure like this, or it would produce another
structure; so this is one, and this is another one, and there are more, which I am not
drawing here.

Again, since you are better at chemistry than I am, you know that this is a double bond
and this is a single bond, and so on; we have not drawn the hydrogen atoms, and you can
fill them in based on the remaining valence: 3 here, for example, and 2 here. These are
different structures that CONGEN generates, and what was interesting about CONGEN was that
it generated only feasible structures; that is where the expert knowledge came in.

So, after CONGEN, you produce a synthetic spectrogram, compare it with the real one, and
then you can decide whether you have found the structure of the material or not. What
DENDRAL would do is explore this huge space of all possible structures, generate the
synthetic spectrograms, and compare them with that of the original material, and
eventually do the job for the chemist. It turns out that this worked much faster than real
chemists, as you can imagine, while doing equally well, or almost equally well; and real
chemists here means people who have PhDs, essentially. So, let me just end with a small
example of the kind of space that DENDRAL generated, and in the next class we will see the
algorithms which are used to explore this space.

(Refer Slide Time: 42:37)

Let us say that we are looking at C5 H12. This is a molecular formula, and any box which
has a molecular formula in it is an unsolved node, or one which may be partially solved,
as we will see. A solved node is one where the structure is entirely given to you. For
example, it could be like this: C, C, C, C, C, and then the hydrogen atoms; this would be
a solved node, a simple way of organizing these 5 carbon atoms and 12 hydrogen atoms. But
there could be other options. For example, these are directed graphs: you could have a
partially elicited structure, where part of the structure is known and part is not known;
3 carbon atoms are here, and 2 are inside here. This is another partially elicited
structure, and it would now have three children, which are this part, this part and this
part. So, you solve for C2 H5 with a bond; then you have another C2 H5 with a bond; and
then you have C, H, H, with two bonds, and this is solved; and this can be solved by this,
and this also can be solved by this. And this one will have, for example, two components:
one of them will be the C2 H5 with a bond, which will be solved by this, and the remaining
will be whatever remains here: C, C, C, and so on.

So, in this very quick chemistry lesson, you can see three possible structures for C5 H12.
One is this structure; one is a structure where the middle part has a C and 2 hydrogen
atoms, and the two edges contain this structure; and likewise, this is another structure
where the middle part has the C2 H5 component and the two edges have the other components;
obviously, the arrangement matters. So, one solution is this one; this is an AND node;
another solution is here, and the third solution is on that side. Now, what DENDRAL did is
navigate this space intelligently, in the sense that CONGEN, the component of DENDRAL,
only generates those nodes which are feasible in practice. How does it do that? It has got
a lot of expert knowledge built into it, and that is why this system is called the first
expert system ever built.

Then, of course, it generates the synthetic spectrogram; for example, you take this
structure, generate the spectrogram, and compare it with the spectrogram of your real
material, and if it matches, then you know that this is the structural formula, and so on.
Today, we have just introduced the idea of AND OR trees, or AND OR graphs; we will not
distinguish too much between them. We have also very briefly talked about the costs
associated with solved nodes, which have to be aggregated into the root node. In the next
class, we will look at an algorithm which explores an AND OR tree, or goal tree, to find
an optimal solution.

So, we will stop here, now.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 25
AO* Algorithm

(Refer Slide Time: 00:05)

We are looking at problem decomposition with goal trees, and we want to look at an
algorithm which will solve such problems. What should the algorithm do? Break the problem
down into simpler problems, and keep doing this till the problems are trivially small, or
primitives that you have access to in some system. One thing we could do, of course, is
search the entire tree, but we do not want to do that. We will use a heuristic function
h(n), which is the estimated cost of solving node n, to guide our search.

(Refer Slide Time: 01:43)

Let us say that we have some problem which we are trying to solve. We compute the
heuristic value, and let us say it is 50, just to start with; when we start solving the
problem, our estimate is that it is going to cost us 50 units of whatever. Let us assume,
for the sake of illustration, that every edge has a cost associated with it, which is 10.
Let us say that I get something like this out of it: 2 edges, one of them, you might say,
a hyper edge, and the other one a simple edge. So, this is a tree like this. Let us say
the costs associated with these are 15 and 20, and let us say this is 40. Now, in this
example, I have mixed up the kinds of edges emanating from a node, and if you think about
it, this does not really cause a problem.

We can think of it as two choices from the top level, with one of the choices already
broken down into two parts. You could have easily transformed this into a node which says
50, and then some node x; this side is 40, and the x has got 2 nodes of cost 15 and 20
under it. The two formulations are equivalent in nature, and if necessary we can always
transform one into the other; of course, you have to keep in mind the cost of the edges,
so we have to take care of that somehow. For illustration purposes here, we will assume
the cost of every edge is 10. So, we have expanded, we will use the same term, expanded,
this node, and we have got these three nodes. Let me call them A, B and C, and this is the
start node S; we will use the same terms as before. Between A, B and C, which is the next
one I should expand?

A; why is it?

Yes. The cost of solving the original problem, if I take this option, would be the cost of
solving this node plus the edge cost: 40 plus 10, which I can write here as 50. And the
cost of solving it from the other side is 15 plus 20, which is 35, plus 20 for the two
edges, which is 55. So, obviously, this side is better, and I must choose it. Then, let us
say I get these values, and this is an AND node. Then I can revise the estimated cost of
solving A: instead of 40, this side has become 30 plus 20, that is 50, and this side has
become 40 plus 20, that is 60. So, 20 plus 20 plus 10 plus 10, that is 60; and likewise,
this is 15 plus 15 plus 10 for each arc, that is 50. So, I have to revise this cost now:
this becomes 60, because this is 50, and 50 plus 10 is 60. At this moment, my algorithm
must shift attention to the other side. You can see it has a best first nature about it:
we are using a heuristic function to guide search, and the heuristic function estimates
the cost of solving a node. It is just that, because this is an AND OR graph or AND OR
tree, we cannot make a decision based simply on the value of a node; we have to look at
the cost of the full solution of which this node is a part. Now, you can see this is
consistent with what we were doing earlier.
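To make the backup arithmetic concrete, here is a hedged restatement of the numbers just computed (edge cost 10 throughout; the variable names are mine):

    EDGE = 10

    # OR choice at S: take the cheaper of the two options.
    cost_via_A = EDGE + 40                         # 50
    cost_via_B_and_C = (EDGE + 15) + (EDGE + 20)   # 55: AND arc costs add up
    assert min(cost_via_A, cost_via_B_and_C) == 50

    # After expanding A, its revised cost is the cheaper of its AND options.
    cost_DE = (EDGE + 15) + (EDGE + 15)            # 50
    cost_FG = (EDGE + 20) + (EDGE + 20)            # 60
    revised_A = min(cost_DE, cost_FG)              # 50
    assert EDGE + revised_A == 60                  # S via A is now 60 > 55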

We said, for example, when we were looking at the travelling salesman problem and the
branch and bound algorithm, that the estimate is an estimate of the full tour, and not
just of the edges we have found so far. So, now my attention must shift here, and maybe I
will expand one of these two nodes. Let us say I expand this one, and let us say, for
argument's sake, that it can be solved like this, or by another node; this is 20, and this
is 25; this is an OR node. So, I have to revise this cost to 25 now, which means I have to
revise this cost to 65, and then again my attention must shift to the other side. So,
basically, what we need to do is formalize this process, make it a little bit clearer, and
put it down as an algorithm. This algorithm is called AO star, and it essentially finds an
optimal solution for a problem like this. When do we terminate this process?

(Refer Slide Time: 08:15)

Let us see what has happened: this has now gone up to 65, so we will shift our attention
to this 60, and come down here. If I just name the nodes, let us say D, E, F, G and H,
what is the next node I should expand? I have to pick a leaf: C is a leaf, H is a leaf,
and D, E, F, G are leaves too; the others I have already expanded. So, among D, E, F, G, H
and C, which one should be the next node I expand, looking at the heuristic values? Just
think about this; I will come back to this question in a moment. Let us write the
algorithm. AO star works with a graph G, which is initialized to a single node s.

Then it says: compute h(s) and initialize. Now, at any stage of the algorithm, I will have
a graph which looks like that, and I will go into some sort of a loop. We have this notion
of a solved node, and of a node which is not completely solved. For example, in the
integration example, something like the integral of dw is a solved node; or, in the
chemistry example, we saw that once you know the complete structure of all the atoms we
are talking about, it is a solved node. So, we will use a label called SOLVED.
Eventually, we want to say that we have solved the original problem, which is the root s;
and as and when we find solutions, we want to percolate this label of SOLVED upwards,
essentially.

So, there is the problem decomposition process, and there is going to be a secondary
process in which some labels travel upwards; one of those labels is the SOLVED label, and
once it reaches the root node, we can say we have solved the root node. So: while s is not
solved, meaning it does not have the label SOLVED. If you look at, for example, Rich and
Knight, the book which describes the algorithm I am presenting here, they use a value
called FUTILITY, which says that basically we do not want a solution more expensive than
this value; we are only going to look for a solution whose cost is less than this number
called FUTILITY. That is just an additional device we are using. Then: do the following.
We have to pick a node to expand. Of course, in the initial stage the graph is only one
node, but at a stage like this one, the graph has evolved; so let me get back to that
question.

I started by expanding the root node s, and I got these three nodes; then, because this
was cheaper than these two combined, I expanded this next, and got this option and this
option. Let us call them the DE option and the FG option. The DE option costs 50, the FG
option costs 60; add another 10 for the edge, and this becomes 50 plus 10, that is 60,
which is worse than 55. So, I came down to this side, where I choose between B and C. I
expanded B and got this 20 and 25; when I backed up these values, I got 25 here, and that
became 65. So, I now have to shift my attention to the first side again, because going
down that path is estimated to cost 65, while along this one it is 60. At this point, I
must choose this option, because its estimated cost is 50, while here the estimated cost
is 60. So, I must choose one of these two nodes for expansion next.

How do I do that? In the graph, I maintain at every choice point a marker, which marks the
best choice at that point, at that stage of the graph. So, for this graph, at this stage,
I would have a marker here; here I would have a marker here, because this is the better
choice; and here I would have a marker here. This algorithm is going to be a two phase
process. In the first phase, I will identify the nodes to be expanded, and I will do that
by starting from the root every time: in every cycle of this while loop, we start from the
root and follow the markers, and the markers will take us to a set of nodes. So, the
forward phase is as follows: follow the markers to a set of nodes, let us call it N, and
then pick some n belonging to N. So, we have picked a node. In this example, I have to
pick either D or E. Does it matter which one I pick next, whether I pick D or E? In this
example, of course, both costs are 15, so obviously there is no way of telling which one I
should pick.

But let us say, in some other example, that D had a cost of 20 and E had a cost of 90;
then, of course, they look different. Will it matter if I pick D or if I pick E? You have
to think of it from the perspective of reaching a solution, and the perspective of saving
some work, of course. In an OR graph, searching with something like A star, if you have to
choose between two nodes, you choose one, and then you are committed to finding a solution
through that node if it happens to be the cheapest. In an AND OR graph, if you are at an
AND arc like this, and the solution lies along this selection, you will eventually have to
solve D, and you will have to solve E also; you will have to solve both of them. From that
point of view, it does not matter whether you pick D or E, because you have to solve both
if the solution sub tree lies here. That is one perspective. But supposing these are the
values, 20 and 90; is there some other reason for trying one of them first? You can think
of this as solving a CNF kind of formula, where you are exploring the sub formulae.

Let us say the formula has f1 and f2: you have to solve f1 and you have to solve f2. It is
not necessarily CNF; it is some formula, and of course f1 and f2 are themselves compound
formulae. Looking at the sizes of those formulae, does it make sense to choose the one
which looks harder, or the one which looks cheaper? It is the same problem here: D looks
cheap, E looks expensive. Should the estimated cost influence my choice, given that I have
to solve both anyway? How will it help? No, no, no; you are not listening to what I am
saying. If you solve D, you still have to solve E, because this is an AND node here.
But if you fail to solve E, then you do not have to solve the other one at all. And which
one is less likely to give a solution? E, the more expensive one, I think; especially if
you keep in mind the fact that we have some bound on the cost of the solution we are
looking for. So, if you have to do two things anyway, and your success depends on doing
both, and one of them is harder, then you should see whether you can do the harder thing
first, because that tells you early whether the whole option is hopeless, and spares you
solving the easier thing in vain. The idea is that if you solve D first and then, while
solving E, the solution cost shoots up, we will anyway have to shift attention to the
other side, and that extra work could be wasted; whereas if you try the more expensive
thing first, you get to know early whether it is too expensive or not. That is the only
consideration; but if you have to solve both, you have to solve both.

We will just proceed: pick some node n from N; generate the children C of n. We have
something like the MoveGen function, not the old MoveGen function, but something which
decomposes a problem into different sub problems and gives you that set. Now, what if this
node n does not have any children; what if my system does not give a child, which means it
is not telling me that I can decompose n in any way? What should I do? Let us say I have
picked this node n, for illustration purposes, and I say: generate the children of n, so
that it gets further expanded. What if there are no children? What if it cannot be
expanded? What if it is a dead end of some sort? What should my search do, if I cannot
expand this and I do not find any solution? You have to speak up a bit; I cannot hear.
What does backtracking mean here? Should I go and look at E then? No. I should somehow
abandon the whole solution of which this node is a part, because if I cannot solve D, then
I cannot compose the solution which has D and E as its parts. So, I should abandon the
whole thing; how do I do that?

I simply say that the cost of solving n is FUTILITY; since my algorithm is looking for
something which costs less than FUTILITY, this node will never be considered henceforth.
So: generate the children C of n; if none, then do this. Then: remove loops. What do I
mean by remove loops? I am talking from the perspective of AND OR graphs. In an OR graph,
of course, we have the usual notion of looping: you could just go round the same path.
Here, what does looping mean? Remember that the AND OR formulation is about decomposing a
problem into simpler problems. If you think of, for example, the symbolic integration that
we looked at, can we have looping there? What would looping mean when we are decomposing a
problem? You nodded your head.

Student: In integration, it could happen that when we apply a formula, we get the same
integral back again.

So, it is possible: you do one transformation, you take sin raised to 4 divided by cos
raised to 4, and you get tan raised to 4; and then you apply another transformation which,
let us say, takes you back to the same thing. So, looping is possible, because
transformations are not one way; transformations can be two way in many situations, and
you want to avoid those extra transformations. That is what we mean by removing loops.
Then, otherwise: for each child c belonging to this set C, compute h(c). This is the
forward phase of the algorithm: in this situation, I follow the markers and come to a set
of nodes N; in this example, N is these two nodes, D and E. Then I pick some node n from
this set; I have picked D. Then I generate its children; this is my set C, and these are
the values. So, let us say I have two choices for solving D; and let us say, for
argument's sake, that one option is 10 and 10, and the other is 5 and 5.

So, what I am saying is that for each child c belonging to this set of children C, you compute the h values. That completes the forward phase, essentially. Now, in the backward phase, I have to propagate the new costs up, and I have to readjust the pointers, the markers, as I go along. So, that is the second phase. The forward phase goes from the root to the leaves; the backward phase goes from this set C, a subset of the leaves, back towards the root. So, what do I want to do? I have found these heuristic values: 10, 10 and 5, 5.

Now, I want to first change the heuristic value of n. So, this side is 10 and 10, which is 20, plus another 20 for the connecting edges; this becomes 40. And this side is 5 and 5, which is 10, plus 20; this becomes 30. I now have to say that the revised value of this node D is 30, and that this is the best path to follow if you are solving node D. So, I have to put this marker here, and I have to revise the value to 30; this is the new value. These two things I have to do, correct. This was my node n; I expanded it; I got these children; I evaluated the heuristic values, and now I have to propagate this value up. Till what stage does this propagation go? It should go on as long as a node changes: if this node changes, then I must propagate its value to its parents as well. So, let me take another situation, this one. Let us say this is a node that I have. Let us say this is 10, 5, or let us say this is 30, 5, 5, and this is 5, and this is 10.

Let us say this was my node n; I just want to illustrate the idea. If this was my node n, and these are its two children, I will evaluate these children. What is the best way of solving this n? I will mark it with the pointer and compute the cost as 5 plus 10, which is 15; the other is 10 plus 10, which is 20. So, that one is not the best; this one is the best. Because this value has changed from 5 to 15, I must update its parent here. So, that will become 15 plus 5, plus 20, which is 40. This was originally also 40, because that side is 30 plus 10, which is 40. Let us say this one was 80 to start with; otherwise, I would not have gone here at all. Now, after I revise this, I must revise its parent, because the parent is getting affected; its contribution has gone from 20 to 40. And because I am revising this node, I must revise both of its parents. I must, because I know that the cost of this node has changed, and this cost must be propagated to the other parent as well. Whether I came from this path or from that path does not matter; I must revise both the parents. So, the process of propagation is basically to go up to all the parents. If you have got a feel for that, just write it down.
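Written out in symbols (my notation, not the lecturer's), the revision rule being applied at each node n is:

    h'(n) = \min_{e \in \mathrm{connectors}(n)} \Big( c(e) + \sum_{m \in e} h(m) \Big)

where each connector (hyper edge) e groups the AND-children m that must all be solved, c(e) is the cost of the connector arcs, and the minimising connector is the one that receives the marker.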

Let M be the set of nodes that have changed; this is just a comment. So, initialize M: the node that I expanded, I put into the set M, essentially. M is going to be the set of nodes whose value has changed. I could have done the next step before this, but it does not matter; let me write it here.

It says: compute the best cost of n. Pick some node; I could call it p or something like that, but to stay consistent let me use n, because initially I will only put n into this set M. So, I just use the same name; of course, as M grows, it may have more elements. How will it grow? Because parents will get added to this set M, essentially. But initially there is only n inside M, and I am simply saying, using the same name: pick some node n from M. Compute the best cost for n. How do I compute the best cost for n? For example, if this was n, then I can see that from this side the cost is 40, and from this side the cost is 30; so the best cost for n is 30, essentially. And anyway, you are going to revise the cost, irrespective of whether it improves or not. So: compute the best cost for n. I must do another thing before that. Next to the step which says, for each child c, compute h of c, I must have another step which says: if c is primitive, label c solved. If the child that I have just generated is a terminal node, primitive or trivial or whatever you want to call it, then put the label solved on this node c. For example, if this node was solved, then I could just put a label saying solved; the others may still have to be expanded. So, for every child, you compute the h value, and if it is a primitive node, you label it as solved. Then, n is the node that we just expanded, and we have added it to the set M.

So, we are picking some node from M; in this case, it is only n to start with. Compute the best cost for n, which has changed. If the best solution below it is labeled solved, label n solved; I should not say 'a solution', but the best solution: if the best solution below it is labeled solved, you label it solved. If I take this example again, on the other side, supposing this was n; this is the better choice. If this is solved (I will use a double circle to denote a solved node), what do I need to do? I need to update n from 5 to 15; that is the new cost, the best cost for solving n. Initially, I thought it was going to cost 5, but after expanding it, I know that I have either a choice of 15 or a choice of 20. The best cost is 15, and that choice is labeled solved, so I must label this node solved as well. That is the step here.

If the best solution below a node is labeled solved, then label that node solved. If I have to label this node solved, then on the better side both of its AND-children must be labeled solved; if both of them are labeled solved, then I can push the solved label up here. In this fashion, the solved label will percolate up, and I want to keep doing this until the solved label reaches the root node, essentially. This is the mechanism for the backward phase: check whether the best solution below n is solved.
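That labelling rule is small enough to state as code; a sketch, assuming node objects that carry a marked best connector and a solved flag (the attribute names are mine):

    def percolate_solved(n):
        # n becomes solved exactly when every node on its best (marked)
        # connector is already labeled solved.
        if n.marked is not None and all(c.solved for c in n.marked.children):
            n.solved = True
        return n.solved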

Then, you put the label solved on it. So, we have done two things: computed the best cost, and possibly labeled the node solved. If either of these things happened, we will say that node n has changed. In this example, its value has changed from 5 to 15, and it has also got the label solved, essentially. So, either a change of heuristic value or the application of the solved label means that the node has changed. 'Changed' I am using as a kind of technical term here, for which you have to have a test. If n has changed, then add all parents of n to M.

So, all parents; that is what I was trying to illustrate here: if this has changed, then this will change. If a node has changed, you must propagate the change of cost not only in the direction you came from, but also to any other parents, because at a future date that parent might become relevant, essentially. So, I must put all of this inside the loop: while M is not empty, I have this full loop. Pick some node from M; compute its revised value, which is the best value over all the sub-solutions it has.

If the best solution below it is labeled solved, label that node solved. If the node has changed, add its parents to M. What will happen? Initially, I add only this node n to M; because it has changed, I will add this parent, and because that has changed, I will add its parent. In this way, the revised value will propagate up. In this example, if this node has changed, it gets a new label solved, so I will add its parent to M; but that parent cannot be labeled solved, because, you know, the other branch is still unsolved. So, I will not label it solved, but I will change its value: it was originally 20, and it has become 15 plus 5, plus 20, which is 40. So, from 20 it has gone to 40; it has changed. So, I must add both of its parents to M; they will change, and I will add the root to M.

After I process the root, nothing more gets added to M, and eventually all the nodes in M get revised; the loop will terminate, which means the backward phase will terminate. So, the algorithm works in two phases; in the forward phase, you move forward following the markers. I have missed out one step here, somewhere: I must mark the best sub-solution, or I should say the best hyper edge, because each of these is a hyper edge; just mark the best ones. So, whenever you are looking at a node in this set M, you must recompute its value, which it gets from its children. You must mark the best path, because next time, when you are going in the forward phase, you must know which direction to go; and you must also percolate the solved label back, if possible, essentially. It is possible if the entire best hyper edge has nodes which are labeled solved, essentially. Just to repeat: if this was labeled solved and this was labeled solved, then this would get labeled solved; if this was labeled solved and this was labeled solved, then this would get labeled solved. The moment it gets labeled solved, if it is the best one, if the arrow is still pointing to it, then its parent will get labeled solved, and then you will stop. In this example, this has got labeled solved; this will get labeled solved, but this one will not get labeled solved unless that one gets labeled solved and it is the better choice. So, I must have a marker here, and a marker here.

Once this flips, if this has to get labeled solved and the marker is still here, then I can label this solved, and so on, essentially. So, in the forward phase, you follow the markers and end up with some set of unsolved nodes, which we have called N. You pick some node from N, maybe the most expensive one, as we discussed a little earlier; that is not really so important. We pick some node from N and generate its children. If there are none, we call it futility; we assign it the cost futility. Otherwise, we remove loops, and for each child we compute the h values; that is this set of nodes here, or it could be this set of nodes here, depending on what n is. There, the forward phase ends. Then, you add this node n that you just expanded to the set M; that is, initialize M to contain n.

Then, one by one, you pick nodes from M. Actually, saying 'pick some node' is not quite correct; I should pick the lowest one, and you can construct examples to show that you have to get this right. If M is a collection of nodes, pick the one which is lowest in terms of the graph. Pick the lowest node; compute its best cost from below. If the best cost comes through nodes labeled solved, then label this node solved; and if it has either newly acquired the solved label or its cost has changed, you must add its parents to M.

This whole loop here works under the condition that M is not empty: you pick some node and do this process, essentially. This process is essentially backing up the values from the leaves towards the root node. Once this ends, the algorithm goes back to the forward phase; again it follows the marked path, again it expands some node, and then it goes into the backward phase, backing up the values from there. These forward and backward phases keep alternating until one of two things happens: either the label solved is percolated to the root, or the value futility is percolated to the root. In the first case it will give you a solution, where a solution is a sub-graph or a sub-tree; in the second case it will say that we cannot find a solution within this cost.
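Putting the backward phase together in code: this is a minimal sketch under my own data-structure assumptions (Node and Connector classes, a per-connector arc cost, and a plain pop instead of the 'lowest node first' ordering just discussed); it is not the lecturer's exact formulation of AO star.

    from dataclasses import dataclass, field

    FUTILITY = 10**9   # stands for: costlier than any acceptable solution

    @dataclass
    class Connector:                # one hyper edge: a set of AND-children
        children: list
        cost: int                   # cost of the connector arcs themselves

    @dataclass
    class Node:
        h: int                                   # current (revised) estimate
        connectors: list = field(default_factory=list)
        parents: list = field(default_factory=list)
        solved: bool = False
        marked: Connector = None                 # best connector, used by the forward phase

    def backward_phase(n):
        # Back up revised costs and solved labels from the just-expanded node n.
        M = [n]                                  # nodes whose value may have changed
        while M:
            m = M.pop()                          # sketch only: should pick the lowest node
            if not m.connectors:                 # leaves keep their h and solved status
                continue
            best = min(m.connectors,
                       key=lambda e: e.cost + sum(c.h for c in e.children))
            new_h = best.cost + sum(c.h for c in best.children)
            new_solved = all(c.solved for c in best.children)
            changed = (new_h != m.h) or (new_solved and not m.solved)
            m.h, m.marked = new_h, best          # revise the value, move the marker
            m.solved = m.solved or new_solved
            if changed:                          # a change must reach *all* parents
                M.extend(p for p in m.parents if p not in M)

A node that could not be expanded would simply have been assigned FUTILITY as its h before this phase, and the same propagation then prices it out of every solution it appears in.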

So, there is only one more point left. For this algorithm to be called AO star, for it to deserve the suffix star, we need a condition on the heuristic function. Can you guess what the condition is? Sorry?

Student: (( ))

What does it mean? The algorithm is designed to find the optimal cost solution, and what we are saying here is that as long as the heuristic function underestimates the optimal cost, the algorithm will give you an optimal solution. So, as an exercise, I will ask you to construct a small problem like this, with some leaf nodes, solved nodes and so on. You can basically manipulate the edge costs to make the function underestimating or overestimating. First work with an overestimating function, and you will see that it gives you some solution, which may not necessarily be the optimal solution. Then try it out with an underestimating function. You could also go and look up the past test papers; you will see one such problem there, where an AND OR graph is given and you are asked to find its solution. So, do try it out at home.

In the next class, we will take a diversion from here. We will move to games, primarily because I want to give the games assignment in early October, so that you can finish it by the end of October, essentially. Then, we will come back to this idea of expert systems and rule based systems. I mentioned programs like R1, which is called an expert system, and PROSPECTOR. How are they implemented? What is a rule based system, and how does it work? We will come back to that after we finish with games.

So, we will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 26
Game Playing

(Refer Slide Time: 00:20)

Today, we want to move on to a different topic, which is games. Now, game playing has fascinated computer scientists ever since the beginning of computing. Von Neumann was among the first to talk about chess playing programs, and we discussed in the introduction that Alex Bernstein wrote a chess program in the early 50's, Samuel wrote a program to play checkers, and so on and so forth. Essentially, why do we like games? We like games because they have well defined rules. If you solve problems in the real world, say computer vision or natural language processing, or let us say TSP and things like that; TSP is well defined, but some of the other real world problems you are trying to solve are not so well defined, essentially. Whereas games have clear cut rules; the rules tell you these are the moves you can make and so on. They are also easy to evaluate; it is very easy to decide, because the rules tell you when a player wins and when a player loses.

So, it is easy to judge. Incidentally, the kinds of games we are talking about are not the games which are popular amongst students, where you are shooting somebody or killing somebody. We are talking of simpler games like chess and checkers, and so on and so forth. The fact is that games, because of these two features, being well defined and easy to evaluate, have always been a good medium for evaluating your reasoning algorithms or search algorithms, whatever technique you are using. And not only for us; even in the real world, you can imagine that if you look at our country, you will find that politicians' children become politicians, and that is their only merit.

A film star has children who become film stars; again, that seems to be their only merit. Whereas, if you look at the world of sports, children of sportsmen do not necessarily make it big in the world of sports, essentially. Look at Sunil Gavaskar, one of the greatest batsmen we had; his son Rohan Gavaskar could barely make it to the Indian team, essentially. Whereas, had he been a politician's son, he would have been some big shot by now. That is because games are easy to evaluate; you do not have bias, you do not have judgment, with somebody saying this is not good and that is good. There are no such things; either you win or you lose, essentially.

(Refer Slide Time: 03:45)

Now, game playing derives from game theory. So, what is the name associated with game theory? Who is credited with inventing this field of game theory? Not Nash, essentially. Von Neumann is the person we credit with inventing game theory. And what do we mean by game theory? We mean rational choice in a multi agent scenario. Essentially, game theory is the first place where we have stepped out of a single agent situation. So far, in the algorithms that we have looked at, search or whatever else, there was a single agent trying to solve a problem, and we were looking at approaches to solve that problem. In game theory, for the first time, we introduce more than one player, and therefore we have to consider the actions of other players, essentially, and that has to be part of the reasoning process that we do, essentially.

(Refer Slide Time: 05:07)

Now, I wonder if you have heard about this game called the Prisoner's Dilemma. Let me introduce this game to you in a more familiar setting. There are two players as far as this game is concerned, and I am going to write the payoffs here. Let us say, in some hypothetical country, obviously not our country, there are two students who have been caught cheating, and the Dean has put them into separate rooms, and each is being asked to confess whether they helped each other cheat or not. The payoffs are like this. Let us say this is you; you meaning not you personally, but the player for whom you are designing this game; and this is him, the other person, essentially. You have two choices: you can confess or you can deny. Likewise, the other player has two choices: confess or deny.

When the game was originally defined, it was defined in terms of two crooks who were locked up in two different rooms, and the police were trying to interrogate them. They had to confess or deny, but the principle is the same, essentially. So, what is the payoff here? Let us say these are your choices: confess or deny. If you confess and the other person also confesses, let us say you get an E grade. This is symmetric for both, so both people will get E grades. If both of you deny, then let us say the payoff is this: you both get D grades, because nothing is proved, something like that. Now, if you confess, and the opponent, if you want to call the person that, denies, then let us say you get a B grade and the opponent gets an F grade; likewise, if you deny and the opponent confesses, then you get an F and the opponent gets a B grade. So, the top right corner of each cell is your grade, and the bottom left corner is the other person's grade. These are the four possible situations: you confess or you deny, and the opponent confesses or the opponent denies. You can also construct a game tree out of this, to represent the choice. Now, remember that this is a game played where you have to decide sitting alone in a room, being interrogated by somebody. You have to either confess or deny. You know the payoff matrix; both players know the payoff matrix. What is the correct strategy to use here? Game theory is concerned with questions like this: if you know the consequences of your actions, in the context of other people's actions, what are the choices that you will make? So, in this situation, do you think it is good to confess, or is it good to deny? Rationally speaking; sorry, you have to speak a bit louder; deny is better? Why is it?
louder; deny is better; why is it?

See, let me explain this matrix again. Look at the first square here, on the left hand side. It says that you are confessing, and the other person is also confessing. You do not know what the other person is doing, but if the other person also confesses, then you get an E and the other person gets an E. This square says that if you deny and the other person confesses, then you get an F; the F stands for fail; and the other person gets a B, essentially. If you confess and he denies, then it is symmetrically opposite: you get a B and he gets an F. If both of you deny, then you get a D each. Essentially, what should be your strategy?

Student: Confess.

How many people say confess, and how many say deny? So, the others do not have an opinion.

Student: If you confess and the other denies, it should be the opposite.

Sorry.

Student: If you confess and the other denies, it should be the opposite; the other person should get a B, because in this case deny appears in both the cases, irrespective of whatever he says.

Yes.

If you look at the other scenario, let us say two people have robbed a bank or something like that. I do not know which is worse, cheating in a class or robbing a bank; anyway, let us say they have robbed the bank, and the police are interrogating the two people separately. They say: if you confess, we will let you off lightly, but the other guy will get punished. It is in that sense that you help them establish that there was a crime, essentially. That is why, if you confess and the other person denies, you are let off with a B, but the other person gets an F. Conversely, if you deny and the other person confesses, it is the other way around, essentially. So, many people said confess; why is confess good? We should give a rational argument. Game theory is about rational behaviour, about what is logically the correct thing to do, essentially. You want to maximize your grade in this case; in the case of those robbers, they want to minimize their punishment, but the problems are equivalent, essentially.

Student: Irrespective of what the other guy does, it is better for you to confess.

Yes; let us say your answer is correct, but I do not want just the answer; I want an explanation.

Student: (( )), but if you confess, you will get a B, which is better than an F, if the other person is denying; and you will get a better grade if the other person is confessing also.

I will encourage you to draw a small tree out of it. For example, first you confess or deny, and then the other person confesses or denies; or you can construct it with the other person confessing or denying first, and then you confess or deny. You can construct the tree both ways and then explore it, essentially.

Student: In either case, regardless of what the other person does, if you look at the corresponding row, your choice is between confessing and denying; in one case it is between E and F, and in the other case it is between B and D. So, it is always better to take the choice of confessing.

Let us argue; let us just expand upon the argument. Say, if you confess, what will happen? That means you are looking at this branch, essentially; you are looking at the first column. If the other person confesses, then you get an E; if the other person denies, then you get a B, which is this one. So, looking at this is looking at the tree structure: first you look at your decision, then you look at the other person's two decisions, and you see which outcome is better. You will see that in this case this is better; in that case this is better and this is bad. If you deny: if the other person confesses, you get an F, and if the other person denies, you get a D. So, maybe, let us look at the other side of the argument. If the other person confesses, what should you do? You are looking at a row now; we are looking at this tree. You can argue with the other tree also, but let us look at this one; maybe it is simpler. If the other person confesses, you are looking at this row: if you confess, you get an E; if you deny, you will get an F. So, if the other person confesses, it is better for you to confess, correct; you are looking only at that row. If the other person denies: if you confess, you will get a B; if you deny, you will get a D.

So, even in this case, it is better for you to confess; it is better for you to confess in both cases, essentially. If you construct the tree the other way, where you consider the other person's choice first and then your choice, you will see among your choices that this one is better and this one is better; in this case these two are better and these two are the worst, essentially. Either way, it is easy to evaluate. Yes, the rational choice is to confess, and assuming that the other person is also rational, the other person will also confess. So, your expected grade would be an E, essentially. This E, E cell is called the Nash equilibrium of this game, which means that neither you nor the other person can deviate from it without spoiling the game for himself, essentially. Such an equilibrium is called a Nash equilibrium. But is this the best outcome? It is not the best outcome in this particular matrix, because if both of them are die hard people; I am not saying that students can be die hard criminals, we are talking about those robbers now; if both of them are die hard criminals, as is eulogized in many films, essentially.

So, there are these two criminals, who are really good guys, you know; they will never confess to anything. Now, if you have two such people, and they have trust in each other in some sense, then they will both deny. In this matrix, they will both end up in this lower right hand corner, which is a better situation than that one. This, of course, is called Pareto optimal, but it is not a stable optimum in this problem situation. If I were to exchange the grades in these two corners, this E, E and D, D, then the Pareto optimal point would coincide with the Nash equilibrium, and then it would really be the best solution, essentially. Why is it called an equilibrium?

Because, in some sense, if you confess, the opponent is forced to confess; if he does not confess, he will lose out; likewise, if he confesses, then you are forced to confess, essentially. Now, obviously, you cannot get into complicated kinds of reasoning, trying to figure out what the other person will really do. So, this is the basic idea of game theory; it puts you in a situation like this. It is a basic mechanism for studying economics, where people try to evaluate things like what should be the pricing mechanism, how much you should advertise, and things like that; people use game theoretic concepts to reason there, essentially. So: rational choice in a multi agent scenario, essentially. You can think of price wars, for example, as a game between two players. So, let us first characterize the games a little bit. The kind of games that we are going to be interested in are much simpler in nature.
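That rational-choice calculation can be checked mechanically; a tiny sketch, with the grades ranked numerically (the encoding and the names are mine, not from the lecture):

    # Your grade for (your move, his move); B is better than D, then E, then F.
    grade = {('confess', 'confess'): 'E', ('confess', 'deny'): 'B',
             ('deny',    'confess'): 'F', ('deny',    'deny'):  'D'}
    rank = {'B': 3, 'D': 2, 'E': 1, 'F': 0}          # higher is better

    for his_move in ('confess', 'deny'):
        best = max(('confess', 'deny'),
                   key=lambda mine: rank[grade[(mine, his_move)]])
        print('if he plays', his_move, 'your best reply is', best)
    # Both lines print 'confess': confessing dominates, so the
    # (confess, confess) cell, E and E, is the Nash equilibrium.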

(Refer Slide Time: 18:15)

These are called board games, and as the name suggests, they are played on a board, like chess, for example. In particular, the games that we will be interested in have the following characteristics: they are two player games, which means that there are only two players in the game, though there can be multi player games; maybe I will give an example. Zero sum.

So, we will be interested in games which satisfy these properties: two player, zero sum, alternate move, deterministic games. I might even say discrete games, in the sense that in things like chess and so on you have to choose from a discrete set of choices, essentially. Two player basically says that there are two players, but you can have multi player games, essentially. I was just talking about price wars. If you think of a price war as a two player game; let us say Pepsi and Coke are getting into a price war, decreasing prices to get more market share; then you can think of it as a two player game, and you can also think of it as a negative sum game, which means that both players are going to lose out on the money that they get, essentially.

So, a zero sum game is a game in which the sum of the payoffs of the different players is 0. In chess, for example, one person wins and the other person loses. So, if 0 stands for a draw, minus 1 stands for a loss, and plus 1 stands for a win, then the total sum is always 0, essentially; one person's gain is another person's loss. These are the games where you have opponents, essentially. But there can be games where the sum is negative. For example, this Prisoner's Dilemma game is a negative sum game, assuming that E and D are negative things which you do not particularly like.

The price war game is a negative sum game, because the revenue of both companies is going to suffer a loss, which is seen as negative, essentially. But you can also have positive sum games; for example, cooperation between people: if two people sit down and study together for an exam, then both of them gain from it, essentially. So, you can have different kinds of games. A price war, as I said, can be seen as a two player negative sum game, but if you include a third player, which is the consumer, then it becomes a three player game. Let us say there is one figurative consumer; then it becomes a zero sum game, because whatever the companies lose, the consumer gains; we are talking in terms of price here, essentially. I missed out one very important characteristic, which is complete information.
characteristic; which is complete information.

In complete information games, each player knows what moves are available to the other players; essentially, what choices are available to the other players. Obviously, the games that we have been talking about, like chess and checkers and Noughts and Crosses, or Go, and also this game called Othello, in which you are going to do an assignment; these are board games on which you can see the board. Both sides can see the board, and therefore they are complete information games, essentially. But there are many incomplete information games; card games, for example, games like Poker, are incomplete information games. And then there is also the distinction between deterministic and nondeterministic games.

For example, if you have to throw a die in a game, in games like Backgammon or Ludo or something like that, then that is not a deterministic game; you cannot predict the outcome of a move, essentially. Backgammon, for example, is a two player, zero sum, alternate move, stochastic, incomplete information game. The moment a game becomes stochastic, it becomes incomplete information, because we do not know what will happen when you roll a die or when the opponent rolls a die, essentially. Of course, in the real world, in many situations where you have to do rational decision making, you are mostly operating in an incomplete information world, essentially. An extreme example is two generals who are fighting a war: you do not really know what the other's army is doing, or which side they are moving. Of course, you try to have spies on the other side, and you try to have intelligence; maybe you take satellite images and all kinds of stuff; but that is an effort to move towards complete information, essentially.

But in practice, of course, you have to work when the information is not complete, essentially. Alternate move games are essentially the simpler kind of games, in which I make a move, then you make a move, then I make a move, then you make a move; these are easier for us to model, essentially. Now, the moment I talk of alternate moves, you can see that we are moving away from games like the one above, where there is only one decision to be made and you make that one decision. Here, typically, you have a sequence of decisions: I make a decision, then you make a decision, then I make a decision, and so on, essentially. Such games can be represented by a game tree. A game tree is a layered tree. Let me draw one; typically, because we are talking about two person games, there are two kinds of nodes, one kind for one player and the other kind for the other player.

(Refer Slide Time: 25:37)

Typically, we draw them differently. So, this is the root node, at which the first player has to make a move, and let us say this is one move that this player makes, out of the many possible moves. Then the other player makes a move; then the first player makes a move again; then the other player makes a move, and so on. So, this path is an actual game played out: one player made this move, the second player made this move, the first player made this move, and so on. The square is one player and the circle is the other player, essentially. Of course, there are many choices that they each have, so in practice it is a tree, something like this. Such a tree is called a game tree; it is a layered tree where each layer represents the choices of one player. This is some tree I have drawn randomly. Traditionally, this player is called max, and this player is called min, and you can see that the tree basically represents a sequence of choices for both players. We are assuming that max plays first; so max has to choose among these three moves. Then, depending on what max chooses, min has to choose between two moves, and so on, essentially. Such a tree is called a game tree.

(Refer Slide Time: 27:39)

In a game tree, the leaves are finished games. So, leaves represent the end of the game, and a leaf may be labeled with one of three labels: win, draw or loss; and these words are from max's perspective. When we say win, it means max has won the game, and when we say loss, it means max has lost the game, essentially. Everything that we do is from max's point of view; the labels are for max, and so on. We can abbreviate these to W, D and L, essentially.

(Refer Slide Time: 28:58)

Let us label this arbitrarily. Let us say this is L, this is D, this is D and this is W; then L, W, D, D, L. Every leaf must be labeled; every leaf has a label. So, basically, the game tree represents all possible games that can be played, given the rules of that game. When we draw the game tree, we do not talk about the rules; we just say this is the game tree. You can imagine that if you were to draw the game tree for chess, with white starting first, white has a set of moves it can make, which is 8 plus 8, sixteen pawn moves, plus 4 knight moves. So, those of you who know chess: at the top level, white can make 20 moves, essentially. You could just draw the tree, and then forget about the rules of the game, essentially.

So, a game tree is basically a representation of the rules of the game. Once we represent a game in the form of a game tree, we can just reason with the game tree, essentially. Now, every game has an outcome, which is what the result will be if both players play perfectly. We are always interested in rational choice, remember; we do not want to rely on an opponent who can make mistakes and things like that. We assume the opponent is perfect, and our analysis is based on that, essentially. So, if both players are perfect, then every game has an outcome which can be computed; and a game tree is a finite tree, because it ends in these leaves, essentially. Likewise, you can think of the Prisoner's Dilemma as a game tree with only one node, essentially, whose outcome is determined by what choices the players make. But at the same time, that game is simultaneous; it is not alternate move, essentially: both players are making their moves at the same time. Whereas in the games we are talking about, first max makes a move, then min makes a move, then max makes a move, like chess, for example, and so on.

(Refer Slide Time: 31:29)

How do we reason about this? The outcome is called the minimax value; it is equal to the outcome of perfect play. One more thing: by perfect play, we mean that each player is making the correct choice, essentially. Let me ask you, for example: you are familiar with Noughts and Crosses, right, this game which children play, Tic Tac Toe as it is also called? You put one cross, somebody puts a nought, you put another cross, and so on and so forth; you are familiar with this game. So, let me ask you: what is the value of this game? What is the minimax value of this game? It is a draw, right; which means that everybody knows that if we play perfectly, then the game is drawn. So, how would you compute the minimax value of a game? The idea is that we have a backup rule: we back up values from the bottom to the top, and the backup rule basically looks at the kind of node it is. I will not write the rule formally; I will just state it here. For min, it means that if any choice is L, then back up L. If no choice is L, then if you have a D, back up D. Otherwise, you back up W; you are forced to back up W, in some sense. The desired outcome for min is a leaf labeled L, because min wants to win the game, which means max should lose the game; therefore, min is trying to drive the game towards L. So, you can see what is happening here. The first move is made by max; let us say max makes this move. Then it is up to min, and min will try to drive the game towards L. For example, there are some Ls here, but if min drives the game here; unfortunately, I have drawn both of them alike, so let us say this one is L; if min drives the game here, you can see that max can choose W and end the game there. So, it is better for min to play this other move. If min plays this move, then max will not choose that, of course, and max will choose this; and at this stage, min can win the game. You can see that if min makes this move, then min can win the game, which means max should not really make this move in the first place, because perfect play would have gone here, essentially. So, this kind of analysis is simplified by this backup rule, which says that you back up from the leaves to the root, and at a min node, min chooses accordingly.

(Refer Slide Time: 34:23)

So, for min, the order of preference is this; for max, it is this. It means min will prefer an L to a D; if it can see both L and D, it will take L. In this ordering, min will always choose the right-most label, and max will choose the left-most. As we will see, it helps to think of this as a supply chain kind of thing, where max is supplying some values to min, and min will choose; min's preference is always L first, then D, then W. Likewise, min is making offers to max, and max will choose, and so on and so forth. Before I fill up the tree, let me also say that we can represent the outcomes by 1, 0 and minus 1 as well; just a numerical way of representing the outcome, which of course gives us a clue as to why max is called max and min is called min: max is driving the game towards 1, trying to maximize the value of the game, and min is trying to drive it towards minus 1, trying to minimize the value of the game. That is why the players are called max and min. In terms of numerical values, it is easier to state the backup rules: max chooses the maximum of the values available to it, and min chooses the minimum of the values available to it. What do we mean by 'available to it'? Whatever is being supplied to it from the next layer, essentially. So, let us fill up this tree. This is L; this is D; so min will choose L here.

(Refer Slide Time: 36:01)

This is D; this is W; min will choose the D here. This is L; this is D; max will choose the D here. Both of these are Ds, so min does not have a real choice. Here, there is only one node, an L, so this L gets backed up. Max will choose W here; and we analyzed this part of the tree earlier, right: min will choose L here; max does not have a choice, so max has to choose L here; min will choose L here; here, max will choose W between W and L; and min will choose D from these. So, at the top level, you can see that max has three choices. One of them leads to a node which will eventually evaluate to a draw; this move will eventually lead to a loss; and this move will eventually lead to a draw. So, rationally speaking, max should choose one of those two moves, and the minimax value of the game is therefore D.
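The backup computation we just performed is easy to write down; a minimal sketch, with leaves as numbers (W, D, L encoded as 1, 0, minus 1) and internal nodes as lists of subtrees; the little example tree is illustrative, not the exact tree on the board:

    W, D, L = 1, 0, -1     # win, draw, loss, always from max's perspective

    def minimax(node, max_to_move):
        # A leaf is a finished game; an internal node is a list of subtrees.
        if isinstance(node, int):
            return node
        values = [minimax(child, not max_to_move) for child in node]
        return max(values) if max_to_move else min(values)

    tree = [[L, D], [D, W]]                  # max to move, then min
    print(minimax(tree, max_to_move=True))   # prints 0: the game is a draw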

So, this is a game which has a minimax value of D, essentially. In some sense, solving a game amounts to finding the minimax value of the game. It is called minimax because at alternate layers you are minimizing and maximizing, essentially. Another way to put it is that we are trying to find a strategy for max. Obviously, at the first level we now know that max should make this move or this move, but on what basis are we saying that? That basis can be formalized in the notion of a strategy. A strategy (we are talking about max at this moment, but we can talk similarly about min) is a sub-tree which is constructed as follows: one choice for max, and all choices for min. So, by a strategy, we mean that max is freezing max's choices and saying: this is my strategy; this is how I am going to play this game, essentially.

It means that at every ply in the game, max says: this is the choice I will make. That is why the strategy is the sub-tree that we construct by making one choice for max and taking all choices for min. Why all choices for min? Because it is max's strategy, and max does not know what min will play, essentially. So, max has to account for, or cater to, all of min's choices; the strategy must consider all of min's choices. So, we can draw a strategy for max. One strategy is, let us say, this one. Now, we have chosen this; but below it we must take both branches, because we take all choices for min.

Then, we can have this choice for max, and let us say this choice for min. So, the thick lines represent one strategy for max. Of course, max has access to many different strategies, and you can say that the task is to find the best strategy for max, essentially. Let us draw another strategy for max, on the left side of the tree. Max could have chosen this move, and then of course you have to take both these moves for min. Then max could have chosen, let us say, this move, which is the better move, and then min would have chosen. So, that is how we construct a strategy: at a max level, you choose one choice for max; at a min level, you take all choices for min. What a strategy gives you is a set of leaves; every strategy has a set of leaf nodes, which tell you what will happen if max chooses this strategy. So, the question I want to ask you is: given a strategy, how do we evaluate it? Let us work with the numerical representation.

Let us say the leaves are labeled with 1, 0 and minus 1. Given a strategy S, I can talk about the value of the strategy; how can I compute the value of a strategy?

Student: Worst possible one.

Worst possible one; and by that, you mean the lowest value, essentially.

So, if you look at this first strategy, its leaves are W, D, D, L; four leaf nodes. The worst of them is L, or minus 1; so that is the value of this strategy. If max chooses this strategy, he will end up losing the game if min is perfect, because the choices that are left are only min's, essentially. If min is a perfect player and max chooses the strategy represented by this tree, then max will end up losing the game; the value of a strategy is the lowest value among its leaf nodes, essentially.

If you look at this other strategy, you can see that there are three leaf nodes here: D, D and W. The worst of them is D, and therefore the value of the strategy is D, or 0 in this case; in terms of minus 1, 0 and 1, the value of the strategy is 0. Now, neither of them is a winning strategy; a winning strategy would be a strategy whose value is 1, essentially. Ideally, of course, you want to find a winning strategy for a player, max in this case, because we are writing the program for max, let us say, and somebody else is handling min; or, if you cannot find a winning strategy, then at least you should find a drawing one, essentially. So, I will leave you with this question here.
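In code, evaluating a strategy is just a minimum over its leaves; a sketch, reusing the 1, 0, minus 1 encoding:

    W, D, L = 1, 0, -1

    def strategy_value(leaf_values):
        # Once max's moves are frozen, only min's choices remain, and a
        # perfect min steers the game to the worst leaf for max.
        return min(leaf_values)

    print(strategy_value([W, D, D, L]))   # -1: this strategy loses
    print(strategy_value([D, D, W]))      #  0: this strategy draws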

Can we think of this as an AND OR tree, and can we use the AO star algorithm to find an optimal strategy? You can see that this is like an OR choice, and this is like an AND choice; this is like an OR choice, and this is like an AND choice. So, can we think of it as an AND OR tree, and can we use the AO star algorithm? I will stop here, and in the next class we will take it up from here, essentially.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 27
Game Playing Minimax Search

(Refer Slide Time: 00:14)

So, let us continue with our study of game playing. We saw in the last class the notion of a strategy: a strategy is constructed by making one choice for max, considering all choices for min, then one choice for max, then all choices for min, and so on. You can say that solving a game tree means discovering an optimal strategy. We said that the value of a strategy is the minimum of the values of its leaves, assuming we are talking in this minus 1, 0, 1 valuation, where 1 stands for a win for max, 0 stands for a draw, and minus 1 stands for a loss for max. Once max has chosen a strategy, the value of the strategy is the lowest value among the leaves in the strategy; remember, the strategy is a sub-tree, and whatever values its leaves have, the minimum of those is what the value of the game will be, assuming that the opponent is perfect. Now, let us look at AND OR graphs again.

We viewed a solution in an AND OR graph; if you look at what a solution is, at an OR level you have to choose one, you have to make one choice, and at an AND level you have to make all choices. So, for example, if you had three choices here, then at an OR level you would choose one choice, and the solution would be like this; if this was an AND node, you would have to solve for all three; and then, if this was an OR node, you would again make one choice, essentially.

So, we can see a similarity between AND OR trees and game trees. In AND OR trees, if you had an AND node, you had to solve all three, all the sub-problems emanating from it; if you had an OR choice, you could choose any one of them, essentially. So, a max node is like an OR choice, because max can choose any one, essentially. But max has to cater for all of min's choices, so a min node is like an AND node, essentially. The difference is that in the AND OR tree the value of the solution is the sum of the costs of solving those nodes.

Here, it is the minimum of the values of these nodes: with a min node sitting here, if this was minus 1, this was 0, this was minus 1, and this was 1, then min would select minus 1; so it is the minimum of these values, essentially. You can see that this is also consistent with our logical notion of AND, which always chooses the minimum: for A and B, if A is false, the whole thing becomes false, essentially. So AND, in some sense, also chooses the minimum value, and OR chooses the maximum value, essentially.

So, there is clearly an analogy between AND OR trees and game trees; of course, game trees have a very well defined layered structure, which we can also impose on some AND OR problems. But we have an algorithm for solving AND OR trees, which is the AO star algorithm. So, my question is: can we use the AO star algorithm to solve a game tree? What do we want? We want a strategy of optimal value; in this case there are only three possible values, 1, 0 and minus 1. So, if there is a winning strategy, we want it; otherwise we want a strategy that will draw the game; otherwise we are forced to accept a losing strategy.

So, can we use the AO star algorithm here? There is also an analogy in the labels: in the AND OR graph, the solved nodes have a label attached to them, and leaves also have a label attached to them. In some sense, a similar backup procedure takes place in AO star, if you remember the algorithm: in the forward phase we expand the search a little bit, and in the backward phase we back up the values, which is very similar to the backing up of values that we are using here; except that, of course, here we are choosing the min and the max, and there we were summing up the values, essentially. So, the answer to this is actually yes, we can use the AO star algorithm, provided we have access to the complete game tree.

(Refer Slide Time: 05:37)

And why is that not always feasible? Let us look at the game of chess, which has been the game that has fascinated most programmers and computer scientists, essentially. Now, count the size of a tree as the number of leaves, which means each leaf is a different game that you can play: every path that you can take in the game tree is a game, and the leaf is the outcome of that game, essentially. So, the number of leaves is in some sense a measure of the size of the game tree.

So, let us try an estimate. Tic Tac Toe ((Refer Time: 06:22)) is computationally a simple game: at the first level you have 9 choices (assuming that you do not bother to distinguish between symmetric situations), then 8 choices, at the next level 7 choices, and so on. It is not a very large game tree, essentially, which is not surprising, because we have figured out by now that the value of the tree is a draw, essentially. What about the chess game tree?

As I said, at the first level there are 20 choices; those of you who know chess will agree with me: eight pawn moves of one step, another eight pawn moves of two steps, and four knight moves for the two knights, essentially. So, 20 at the first level; and as you start playing the game, the board opens up and more moves become possible. For example, bishops can start moving, rooks can start moving, the queen can start moving, and the branching factor in some sense increases as you go towards the middle game, essentially.

So, chess players tend to categorize the game into the opening, the middle game and the end game, and the middle game is the most complex part of the game. People say that on the average b equals 35; that is, on the average a chess position offers 35 moves that a player has to choose from, essentially. And a typical game is about 50 moves long, on the average; of course, some games get over earlier and some games last longer, and so on. But just to get an idea of the size of this game tree, we can make one rough estimate of the number of games possible.

If the branching factor is a constant 35, then you can see: 35 moves at the top level, and another 35 moves in reply to each of those moves, and so on. A game 50 moves long means about 100 plies, since each player moves 50 times, and 35 raised to 100 comes to more than 10 raised to 120. Which means, of course, just to drive home the point, that if I started writing this number out and labelling the zeros, this is the first 0, this is the second 0, this is the third 0, I would have to keep writing to the end of the board till I wrote the one hundred and twentieth 0, essentially.
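As a quick sanity check on that order of magnitude (the arithmetic here is mine, not from the lecture):

    35^{100} = 10^{100 \log_{10} 35} \approx 10^{100 \times 1.544} \approx 10^{154}

which is comfortably more than 10^{120}; the figure of 10^{120} itself is Shannon's classic estimate of the size of the chess game tree.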

We have already discussed these kinds of numbers before; this is not a number you can really deal with, and there is no hope ever of solving the chess game tree completely. In fact, we have not been able to solve it even now, even though people have put massive computing power towards solving game trees, using very sophisticated techniques like analyzing the end games, creating look up tables for them, and then using those to solve parts of the game tree, that kind of stuff.

Checkers, incidentally; I do not know if you remember, when we went through the progress of computer science, that at some point in this century, 2005 or something, I do not remember exactly, the game of checkers was solved. So, it is known what the outcome of that game is, and this was done through the use of a massive amount of computing power. Checkers is a much smaller game, because the choices available to each piece are very limited, essentially.

If you know the game, a piece can only move in one or two directions, one step, or sometimes you can jump over another piece, and so on; so it is a much smaller game in terms of the size of the game tree. And as we have just seen ((Refer Time: 10:38)), chess is practically impossible; remember the kind of numbers we are talking about: 10 raised to 75 is the number of fundamental particles in the universe, and even if each of them were a super computer, we could run that argument again, and you would see that you still could not solve the game at all.

So, we do not know what the outcome of perfect play in chess is, whether it is always a win or not, which is why the game is still interesting, as opposed to Noughts and Crosses, which we do not want to play because we know that it is always a drawn game, essentially. So, if we do not have access to the leaves, then how can we use this algorithm? We cannot use the algorithm. We have to make some other approximation, and that is what we will look at, essentially: variations of this kind ((Refer Time: 11:24)).

While I am talking about games: the most complex game in terms of the size of the game tree is the game called Go. I do not know how many of you have heard about it; it is a game which is extremely popular in Japan, and a feel for the complexity of the game can be got from the fact that it is played on a nineteen by nineteen board. It is a game in which you place coins, not one on top of another; there are specified locations, cells, where you can place coins, and you can imagine that in the first move you can place a coin at any of those nineteen by nineteen locations, and so on; so it is a huge game in those terms.

Go is something that we have not been able to program well; in my view, the whole computing community has not been able to produce good Go playing programs. The people who talk about Go say that you have to use other techniques, like pattern recognition; by pattern recognition I mean trying to make out which board positions are good and which board positions are not good, essentially. Some people use the word zen with Go, and so on. So, games can be quite complex; Go is much more complex than chess in terms of the size of the game tree, and we really cannot do too much with that, essentially.

(Refer Slide Time: 13:10)

One game that, is of interest to us, is a game call othello, also called as reversi, and we
have been using that for the last few years, for the game competition and will do so this
year also. So, maybe you should go and look up the game, the idea of the game is that, it
is played on a chess board, like board, except that it is a single color board, and we start
of by playing, two kinds of coins. So, which let me say, I am representing by knots and
crosses like that game, but it practices like, red coin and white coin, and things like that.

So, let us say cross stands for a red coin, and this stands for a white coin; this is the
initial position of the game, in the center of the board. A move is made by capturing
opponent pieces, which means that if I have a coin here, I can place a coin here, and in
the process I will capture this piece essentially, which means this piece actually
becomes my piece. So, I have made one move, and this being a game, it is then the
opponent's turn, so the opponent can also do something similar. So, the opponent can,
for example, put a coin here, and then this is captured back in some sense; then I can
place a coin here, and capture this back again and again, and the game sort of goes on.

So, if the opponent were to put a coin here, then you would simultaneously capture this
and this essentially; both will get captured, because by placing a coin here you are
enclosing a line at each end: at this end you have this, and at that end you have this. So,
in this example you can capture in four directions, and you can make multiple captures
in one move, but you are allowed to place only one coin on the board.
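
To make the capture rule concrete, here is a minimal sketch in Python, assuming an 8 by
8 board stored as a list of lists with 'X', 'O' or None in each cell; note that in the full
game of reversi, captures can run along rows, columns and diagonals, so the sketch
scans all eight compass directions:

    # Hedged sketch of the othello/reversi capture rule.
    DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]

    def captures(board, row, col, player):
        # Return the opponent coins flipped by placing 'player' at (row, col).
        opponent = 'O' if player == 'X' else 'X'
        flipped = []
        for dr, dc in DIRECTIONS:
            run = []
            r, c = row + dr, col + dc
            # walk over a run of opponent coins...
            while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
                run.append((r, c))
                r, c = r + dr, c + dc
            # ...and keep it only if it is enclosed by one of our own coins
            if run and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
                flipped.extend(run)
        return flipped  # a legal move must flip at least one coin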

And essentially, the game ends when either one side cannot make a move or all the
coins have been placed on the board. So, it is not uncommon for the board to be almost
completely filled, and you win the game if you have more coins on the board at the end
of it; but we will also give you a score as to how many points you have won by
essentially. So, if it is a massive win, let us say on the eight by eight board you have
sixty coins and the opponent has four coins, then you have a big win essentially.

Whereas, if you have 30 and the opponent has 34, then the opponent has a smaller win.
So, that is the game that we will use for the competition; you will have to write a
program which will compete against other people's programs. So, how do we play a
game whose tree we cannot access essentially? We do what humans do, which is a
partial look ahead.

(Refer Slide Time: 16:35)

So, in most game playing programs, you do a limited look ahead, which means that if
my tree looks like this, with max at the root, then, since I have drawn this as a
practically infinite tree growing exponentially, we cannot explore these trees. So, what
do we do? We cut off at some level, which some people call a horizon; we cannot see
beyond that, and the depth is counted in a number of plies. So, this is k ply search
((Refer Time: 17:15)). So, instead of searching the entire game tree; I mean, if you had
the complete game tree, then you could have just backed up the values, and you would
have found the minimax value, and the best move for max in the process.

But we do not; we cannot search the full game tree, because we have seen that it can be
really huge. So, we decide that we will do a limited look ahead, and try to decide on the
move from that. Now, incidentally, that is how human beings also tend to play, at least
beginners; they do a limited look ahead, maybe two moves ahead or three moves ahead,
but they do not do a complete look ahead, in the sense that they do not look at all
possibilities.

So, most chess players, for example, would not even consider all the twenty moves that
you can make at the starting point; they would have a fancy for two or three different
possible moves, either this pawn opening, or that pawn opening, or something like that,
and they would only explore those. So, we do a limited search, in the sense that we do
not look at the complete branching factor; we only look at a few possible moves, a few
possible replies by the opponent, and a few possible moves that I can make then, and in
some sense our search is incomplete essentially.

Of course, expert chess players tend to do more exhaustive search; they also tend to be
better able to judge positions. So, let me introduce that idea at this moment. If you are
doing a limited look ahead, what is the use? Because the nodes that you are going to
look at at this level are not completed games; they are sort of half way played games,
where you have made a few moves and the opponent has made a few moves, and based
on this look ahead you want to decide what you want to play essentially. So, what do we
do? We have to write algorithms to assign values to these nodes essentially; so what do
we do? We apply an evaluation function.

So, again, those of you who might have played chess sometimes, or watched other
people play chess, you can hear comments like: this is a good position for white, or
something like that, or white is winning. White has not yet won, but you look at the
position and say it looks like white is winning. Instead of making such qualitative
statements, we want to give quantitative values. So, we want to define a function, which
is an evaluation function, which will give us a value for the board position. Now,
instead of 1, 0 and minus 1, those three values which are available to us when the game
ends, we break it up into a larger range.

So, typically it is, let us say, minus 1000 to plus 1000; the evaluation function returns a
value in this range, with the understanding that plus 1000 corresponds to the original
plus 1, minus 1000 corresponds to the original minus 1, and values around 0 correspond
to the original 0, which means both sides are roughly equal; but the actual number tells
you how good or how bad it is for max. So, if it is plus 70, it is a little good for max; if it
is plus 700, it is much better for max; and if it is minus 600, it is quite bad for max. So,
we try to look at a board position and give it a value, give it a number.

Now, obviously, this requires a different kind of reasoning, and in some sense it
involves the use of knowledge about the game essentially. So, you should be able to
look at a board position and give it a number essentially. If we can now apply the
evaluation function to each of these nodes on the horizon, which is wherever search
ends, then we can apply the same minimax back up rule to evaluate the value of this
game.

The backup rule, as we said, is that at a min level you choose the lowest value from its
successors, and at a max level you choose the highest value from its successors. So, we
can back up these values, determine the value for the root, and in the process also
decide what the best move to make is essentially. Now, clearly the performance of this
algorithm will depend upon how good your evaluation function is, essentially, because
if your evaluation function is good, then it will tell you which of the board positions are
better.

(Refer Slide Time: 23:02)

Now, ideally we would like to have a perfect evaluation function, which, in a game like
go, people have tried to do; then you would just look at all the choices and apply the
evaluation function there. So, we will call this function e, and let us say e(J), where J is
a node. We apply e(J) here; ideally we say: just look at all the options and see which one
leads to a better position essentially.

And then make that move. But in practice, evaluation functions are never perfect; they
are like heuristic functions: you are making some judgment and arriving at some
number that may not be accurate. So, in practice it has been observed that a
combination of evaluation and look ahead does reasonably well for games. So, this is
the situation where there is no look ahead, except that at this one level you just look at
the choices you have and pick the best based on the evaluation function.
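
As a tiny sketch of this pure evaluation-based play, assuming the one-move successor
positions have already been generated as hypothetical (move, resulting board) pairs:

    # No look ahead at all: pick the move whose resulting board scores best
    # under the evaluation function e. 'successors' is a hypothetical list
    # of (move, board_after_move) pairs for the side to move.
    def greedy_move(successors, e):
        best_move, _best_board = max(successors, key=lambda pair: e(pair[1]))
        return best_move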

(Refer Slide Time: 23:59)

Here, you are saying: I will look at my choices first, and I will look at what the
opponent can play in each resulting position, and so on. And I will keep doing that, to
some k plies, depending on how much computing power I have; go up to the k-th ply,
apply the evaluation function there, and then back up the values essentially. So, in some
sense, what will happen is that some of the dynamics will get captured in the look
ahead.

So, again, if I use the analogy of chess, then if, in the look ahead, you are capturing the
opponent's rook and the opponent is capturing your bishop or something, then at least
those things are known, that these pieces are going away essentially. So, the
shortcomings of the evaluation function are compensated by doing more look ahead
essentially. Typical chess playing programs that you get on laptops, for example, would
do something like eight ply or so. And generally, you can imagine that with this kind of
branching factor, the number of games that you have to look at is growing
exponentially.

And people have surmised that if you do sixteen ply look ahead for chess, with a
reasonably good evaluation function, then you can play at grandmaster level essentially.
In practice, of course, we do not do such uniform searches; we will see that sometimes
we do a little more search in some areas, and so on and so forth. So, what is the game
playing algorithm we want to write? We still want to win the game; we do not just want
to make some move and say it is a good move essentially. So, the game playing
algorithm that we will use is as follows; let us just call this k ply search.

(Refer Slide Time: 26:10)

We will have a loop in which we will say: do k ply search, then make a move, then get
the opponent's move; and we will put this into a loop till the game ends. So, what are we
doing now? If the game tree were something that we could explore completely, we
would have analyzed the whole game tree once and for all, and said: this is the strategy,
and it is a winning strategy; and then you just play according to that strategy. But the
game tree we are not able to access, and what we are doing now, in this algorithm, is
that at every move you are doing a search; every time you have to make a move, you do
some search, a limited k ply search, and then decide based on that.
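
A minimal sketch of this loop, assuming some hypothetical game-interface helpers
(game_over, apply_move, get_opponent_move) and the k ply search routine that is
sketched a little later:

    # Do k ply search, make a move, get the opponent's move; loop till the end.
    def play(board, k):
        while not game_over(board):
            board = k_ply_move(board, k)        # search, then commit to a move
            if game_over(board):
                break
            board = apply_move(board, get_opponent_move())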

So, what does this amount to? It amounts to saying that if you make a choice here; let
us say this is your game tree, and you do a search, let us say up to this ply depth, which
means you make a move based on this much search; so this is your move. Then you
wait for the opponent's move, so the opponent makes a move; let us say the opponent
makes this move here, somewhere inside that original search that you did. Now, at this
stage, you do another search, a limited search, again k ply search, and then make your
second move. So, let us say this is your second move; then the opponent makes a move;
then again you do a search.

So, in this manner, you can see that for every move that you are making, you are doing
a limited search essentially. What is the advantage of that? It is that as the game unfolds
you are looking at those parts of the tree which you had not seen originally. Originally
you had seen only till this part of the tree, and then after two moves you can see two
plies deeper, because you have made one move and the opponent has made one move;
your search will now look a little bit deeper, then again here, another two plies deeper,
and so on. So, as it goes along, you are looking at different parts of the tree, and all that
remains is to write this k ply search algorithm; that is the simplest of game playing
algorithms.

(Refer Slide Time: 29:12)

We will see how to improve upon that in the following classes. So, let me first write the
algorithm; it is a very simple algorithm called minimax, and it takes a node J as an
argument, and it returns the minimax value essentially. So, let us say this algorithm only
computes the minimax value, and in the process we can put in a small routine which
will tell you what the best move is; that is the secondary thing which comes out of it
essentially. Roughly, the algorithm is as follows. Let us assume that you have a way of
testing whether you are on the horizon or not essentially; you can keep some kind of
count as you go searching into the tree essentially.

So, I am assuming that you will somehow figure out how to do that, and then we do the
following. If J is terminal; so J is a node, and by terminal we mean a test which tells you
whether you are on the horizon or not; if you are on the horizon, then we set V(J) equal
to e(J). So, you simply apply the evaluation function, and you get the value for that
node. Else, if it is not terminal, then for i equal to 1 to b, where b is the branching factor,
generate J_i, the i-th child of J; then if i equals 1, which means you are looking at the
first child, then, let me use something slightly simpler here.

Let us say something called val is the minimax value returned for the child; if i is equal
to 1, then V(J) gets val, as this is the first child that you have looked at. Otherwise, you
update it if a child is better: else, if J is max, then V(J) gets the max. I just use this
device because I do not want to write the recursive call again and again; notice it is a
recursive call, with the next node. So, I am writing it once: I make a recursive call, and
if it is the first child, of course, V(J) gets that value; if it is not the first child, then you
compare the current value with this new value that you are getting.

So, if it is max, it means that you have got some value here, which is V(J), and you are
looking at this child, and you are getting a value, let us say val, which is returned by this
recursive call; the recursive call returns the minimax value of the child, and you are
going to return V(J), which is the minimax value of this node J. And then, as I scan from
left to right, going from 1 to b, I keep seeing if I am getting a better value from the next
call and so on, and wherever I get a better value, I store that.

So, it is a very simple algorithm, which will look ahead k plies deep and compute the
minimax value of that game based on the evaluation function, because at the terminal
level you are applying the evaluation function. And I am assuming that you will
augment this to be able to select what move max should make, because that is really the
task you are doing when you do this much search; this algorithm is basically doing the
search here in this area, but you want to make the move also. So, you must keep track of
where the best value came from essentially.
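
Putting the verbal description together, here is a minimal Python sketch of this
depth-limited minimax, assuming two game-specific helpers that are not defined here:
children(j), which generates the successor positions of j, and e(j), the evaluation
function applied on the horizon:

    # Hedged sketch of k ply minimax with the best-move bookkeeping included.
    def minimax(j, depth, is_max):
        if depth == 0:                  # j is on the horizon: apply e
            return e(j)
        values = [minimax(c, depth - 1, not is_max) for c in children(j)]
        return max(values) if is_max else min(values)

    def k_ply_move(j, k):
        # The child the best backed-up value came from is the move to play.
        return max(children(j), key=lambda c: minimax(c, k - 1, False))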

So, you must keep track of that; then you will make the move, wait for the opponent's
move, and then again make a call to minimax to decide what the next move is. ((Refer
Time: 34:58)) So, this is the simplest of all algorithms essentially. What is the nature of
this search; can you tell me what kind of search this is doing?

Student: Depth first.

It is searching this part of the game tree; or, if you look at this diagram, it is searching
this part of the game tree. But in what manner is it searching?

Student: Depth bounded, depth first search.

Depth bounded, yes, because we are doing k ply; but within that bound, how is it
searching?

Student: Depth first search

It is doing depth first search; so you should figure out that this is really doing depth first
search, which of course ((Refer Time: 35:46)) may not be the best way of looking at
things essentially. So, in the next class, of course, we will try to improve upon this, over
maybe one or two classes. But in the remaining time today, which is about five minutes
or so, I want to just spend a little bit of time on this evaluation function: how do you
write an evaluation function for a board position?

Because the performance of the algorithm really depends on how good the evaluation
function is. If it could judge accurately the value of a board, and by accurately we mean
how close to winning it is essentially, then just one ply of search would be enough
essentially, if you had a very good evaluation function; but in practice it is not so easy to
get a very good evaluation function.

(Refer Slide Time: 36:50)

So, what goes into an evaluation function? Essentially you want to look at a board
position; let us say we are talking about chess, though in practice, of course, you will be
doing othello. When you look at a board position, you want to give it a value between
minus 1000 and plus 1000 essentially, or minus large and plus large. Let us see, any
ideas how you could give this number?

Student: From many games played in practice; wherever it ends, backtrack and keep
assigning ((Refer Time: 37:24)).

But you see, the whole point of this exercise is that our search problem is so huge that
we cannot search the game tree.

Student: ((Refer Time: 37:39))

How do you play many games?

Student: Maybe initially begin with a random assignment ((Refer Time: 37:48)) and,
based on that, once you reach the end, backtrack and ((Refer Time: 37:53))

So, I get your point; you are a machine learning enthusiast, and you are saying that you
want to learn the evaluation function. That will come. In fact, Samuel's checkers
playing program improved its game because it improved its evaluation function along
the way, but that comes afterwards essentially. Before that, what are the components of
the evaluation, I mean?

Student: ((Refer Time: 38:24))

Or, in machine learning terms, what is the structure of an evaluation function? What am
I learning: am I learning parameters or weights? Weights of what, essentially? So, if
you look at the first chapter of Tom Mitchell's book, he actually describes how a game
playing program can learn an evaluation function, but there he gives it as a linear
combination, I think it is done for checkers, of the number of pieces I have, the number
of pieces the opponent has, the number of kings I have, the number of kings the
opponent has; and a linear combination of them is the function. So, my question is more
fundamental: if you were to write the evaluation function yourself; let us say you are a
chess expert, or, all right, let us say you have Viswanathan Anand sitting next to you,
and you say: help me write this evaluation function; what would he say?

Student: The pieces will have particular values, and then ((Refer Time: 39:22)) there
would be additional advantages, piece advantage.

So, typically an evaluation function will have two components: one is called material
and the other is called positional. So, chess players will say white is winning because
white has a material advantage, which means, you know, white has got, let us say, one
rook and one bishop extra. And then any good chess player will say: if you have that
much more fighting power, I am not going to play against you; or rather they would
resign essentially. So, one thing is material, the number of pieces. Some beginning
chess players might say that, you know, a queen has value 9, a bishop has value 3 or 4
or whatever, I do not know, and a rook has value 5.

And then you count how many pieces I have and what their values are, and let us say
you give negative values to the opponent's pieces, so from the total you subtract what
the opponent has. So, if I have more pieces, or more valuable pieces, than the opponent,
then this sum will become positive essentially. Initially, as you can guess, both sides
have the same number of pieces, so the material value is 0; but as you capture some
pieces, your material value goes up essentially.

Now, in practice, of course, chess playing people have a much finer gradation of values.
So, they compute in hundreds; for example, let us say a bishop is 200 and a knight is
220 or something like that; it really depends on your perspective of the game essentially.
So, one component is the material value: how many pieces I have and how many pieces
the opponent has. The other is positional, which says how the pieces are arranged. Now,
this is of course the trickier part essentially; this is the more difficult part, because it is
looking at structure; it is not just a matter of counting how many pieces I have, but of
how they are arranged essentially.
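
As a hedged sketch of just the material term, using the rough beginner values mentioned
above (queen 9, rook 5, bishop or knight around 3, pawn 1), which are illustrative rather
than tuned weights:

    # Positive totals favour max (say, white); the king carries no material
    # weight since it can never actually be captured.
    PIECE_VALUE = {'Q': 9, 'R': 5, 'B': 3, 'N': 3, 'P': 1, 'K': 0}

    def material(pieces):
        # pieces: iterable of (piece_letter, colour) pairs still on the board
        score = 0
        for piece, colour in pieces:
            value = PIECE_VALUE[piece]
            score += value if colour == 'white' else -value
        return score

A full evaluation would add a weighted positional term on top of this, which is the part
discussed next.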

So, chess players will say things like: if there are two rooks in the same column; so, if
this is the chess board, for example, then if I have a rook here and another rook here, in
the same column, then chess players say that it is a stronger position; two rooks in the
same column are very powerful, and you should try to arrive at such a position
essentially. Or they might say that, you know, some pawn structures, linked pawn
structures where pawns are supporting each other, are better than pawns scattered
around the place and not protecting each other essentially.

Then, you know that if there is a knight, if N stands for a knight, and the opponent has a
queen here and a rook here, then you can see that the knight is attacking both the queen
and the rook at the same time. So, chess players call this a fork, and a fork is obviously
a good thing to have essentially, because it amounts to saying that in the next move I am
going to capture either the queen or the rook. So, the opponent is going to lose material
in the process essentially, because the queen and rook are more important, and the
knight is not so important in ((Refer Time: 43:24)) material value essentially. And there
are other things like controlling the center, and attacking the center, and things like that.

Now, this program Deep Blue; when I read about it, and this was in 2002 or something
like that, it had a 1000 components to its positional evaluation. So, just as we said, you
know, rooks in the same column, or connected pawns, a protected king, or mobile
pieces; you know, a bishop that is trapped is not very useful, and things like that. It had
a 1000 different components which were used to evaluate the board position, in the
positional part. So, that obviously is the key to the whole thing: you can look at a board
position and give it a value, and this is just an attempt to give an accurate value; it is
looking at it piece by piece ((Refer Time: 44:19)) and saying this pattern is good, this
pattern is good, this pattern is good.

And of course, if the opponent has that pattern, you are going to subtract it essentially.
So, really the secret is in devising a good evaluation function; if you have a good
evaluation function, then you do not have to search very deep in the game, and your
evaluation function itself will tell you what is a good position and what is not
essentially. So, what you will need to do for your othello game is to look at the game on
the web, or read about it, and try to devise an evaluation function, because that is going
to be the critical part of your program essentially.

So today, we have seen that we cannot search the entire game tree; we have to do a
limited look ahead, and we have a program to do k ply search. And we will repeatedly
call this program: for the first move, for the second move, for the third move; for every
move that we make we will call the program. And this simple version of k ply look
ahead is essentially doing depth first search, left to right, and we want to improve upon
that essentially. So, we will do that in the next class.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 28
Game Playing Alpha Beta

(Refer Slide Time: 00:15)

We are looking at game playing, and in the last class we saw the minimax algorithm. If
you remember, what the game playing algorithm basically does is that there are two
kinds of players: one is max, and the other is min. Max is trying to maximize the board
value and min is trying to minimize the board value. The game tree consists of alternate
layers of max and min: starting with a max node, there are some min children, and then
there are max children, and so on. This is the tree which the minimax algorithm
explores, and we saw that the exploration minimax does is depth first search, left to
right.

Now, today we want to look at an improvement upon minimax, which does not inspect
this entire tree up to the k-th ply. So, this is k ply search that we are doing; can we do
without looking at the entire tree? What minimax is doing is going down all the way to
this level, and at this level the evaluation function e(j) is applied, and then the values of
these leaf nodes are backed up using the minimax rule. The minimax rule says that if
you are backing up to a min level, you back up the smallest of the values of all the
children; if you are backing up to a max level, you back up the largest of the values of
all the children. So, at alternate levels we choose the minimum, the maximum, the
minimum, the maximum, and so on; that is why the algorithm is called the minimax
algorithm. But the question is: do we really have to inspect the entire tree, essentially?

To consider that, let us first look at a small example. Let us say you are playing this
game of Tic Tac Toe, and for some reason this is how the game proceeds. Let us say
you play this; we are not drawing the game tree, we will just draw the board. The
opponent plays this, let us say; so the opponent is mirroring your moves. You play this,
and then the opponent plays this. Now, let us say you are doing some search from this
point: one ply search. If you want to now consider this move, so you play this move
here, or you are looking at this move, then you can see at this moment that there is no
need to look at all the other children. Why? Because this is a winning position, and you
have won the game; so why consider the other moves at all, essentially. This is the idea
that we will explore, up to a greater depth essentially, and the algorithm that we want to
look at today is called Alpha Beta.

A little bit of nomenclature before we continue: max nodes are also called alpha nodes,
and min nodes are also called beta nodes. Max nodes store alpha values, and min nodes
store beta values. What are these alpha and beta values? These are the values of
partially computed nodes, essentially. Let us see how they arise.

(Refer Slide Time: 04:49)

Let us say this is the root node, which is a max node. We always play the game for max,
and at some point of the game, let us say max is trying to evaluate this particular min
child, which is not the first min child, because it has already seen, remember that we are
going from left to right, some min children and the sub trees below them. So, it has
already explored the sub trees on this side, and it is now trying to evaluate this min
child. What will it do? If the value supplied by this min child is higher than the values
supplied by all those earlier children, then this would be adopted; otherwise, those
values would be adopted. Let us say this value is 10, to begin with. So, we say that
alpha becomes 10; after this node is completely evaluated, which means the sub tree
below it is completely searched, alpha becomes 10, because that is the value this beta
node is giving to it.

Let us say then we evaluate the second sub tree below it, and this happens to be 15.
Now, we change this alpha to 15. So, this alpha value for this max node is the best value
it has seen so far, and the value comes from the left hand side of the tree, essentially.
Let us look at a slightly deeper example of this Tic Tac Toe game.

(Refer Slide Time: 06:33)

Let us assume that we are using the following evaluation function: e(j) is equal to the
count of the number of rows, columns, or diagonals available to max, minus the number
of the same available to min. So, essentially we will evaluate a board position by saying
how many lines are available to each side. For example, if this is the board position that
we are looking at, then we can see that max has this row; one, two, three and four: two
rows and two columns are accessible to max. For min, it is this column, this diagonal,
and this row; so 3 for min essentially. So, the value of this board position would be 4
minus 3, which is 1, essentially. So, we will use this evaluation function to illustrate this
idea of cut offs, essentially.
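
Since this e(j) is completely concrete, a small sketch may help, assuming the board is a
3 by 3 list of lists holding 'X' (for max), 'O' (for min) or None:

    # The eight winning lines of Tic Tac Toe: 3 rows, 3 columns, 2 diagonals.
    LINES = ([[(r, c) for c in range(3)] for r in range(3)] +
             [[(r, c) for r in range(3)] for c in range(3)] +
             [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])

    def e(board):
        open_to_max, open_to_min = 0, 0
        for line in LINES:
            cells = {board[r][c] for r, c in line}
            if 'O' not in cells:
                open_to_max += 1    # no min coin blocks this line
            if 'X' not in cells:
                open_to_min += 1    # no max coin blocks this line
        return open_to_max - open_to_min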

(Refer Slide Time: 08:07)

Let us say we are starting the game from the beginning, and then first we explore max
playing at this corner here. Now, remember, game playing algorithms use a combination
of search, or look ahead, and an evaluation function. So, they do not evaluate this board
position at all; they would look ahead a little bit, evaluate the board positions at that
deeper level, and then back up the values essentially. So, let us say we are doing a two
ply search, which is easy to depict here. So, we look at one more level. At this level, let
us assume that we are going to look at all the moves for min; let us say we look at this
move for min first.

We evaluate this board position. Now, if you look at this board position and count
carefully, you will see that there are six rows, columns, or diagonals available for max:
1, 2, 3, 4, 5 and 6; and if we look at min, then there are 5. So, it happens to be 6 minus
5; the value of this board position is 1, essentially. Now, this is a max node at the top, so
we are going to compute alpha values for it. So, let me draw this as the max node, and
this is the min node, because it is min's play here, and these are max nodes, but it does
not matter. At this point, you can see that beta will become 1.

The moment we evaluate this position, this node knows that its beta is 1. Now, what are
beta values? Beta values are values of min nodes, and they are upper bounds on the
values that those nodes can take. So, beta values are upper bounds, and likewise, alpha
values are lower bounds. What do we mean by this? We mean that once this node has
seen one value from a left child, which is 1, it is not going to accept any value which is
higher than this value; from the remaining children, it is only looking for lower values.
So, this beta value, which is the partial value it has got from here, is an upper bound on
the value of this node. It can only be 1 or less essentially; less, if one of the children
evaluates to less, essentially. So far, we do not have an alpha value, because none of the
root's children is completely evaluated. So, we look at the second child. Let us say this
is the second option we are looking at. This, we can see, is symmetric in nature, in that
both coins are at two corners, so the number must be equal to 0; I will leave that for you
to verify. And let us see this one and this one: now, if we count, you will see that max
has 1, 2, 3, 4, 5, and min has 1, 2, 3, 4, 5. So, let me know if it is wrong; 5 minus 5 is
equal to 0.

One thing we should have done: the moment we saw this 0, we should have changed
this value to 0, because beta has gone down to 0; now 0 is the upper bound on this
value. Then we look at this, or maybe this one is 1: 6 minus 5 is equal to 1. This one is
again symmetric, so it must be 0, whatever the count is. Finally, we look at one more
move for min, which is this. Now, this turns out as: max has 1, 2, 3 and 4; only 4
available, and min has 1, 2, 3, 4 and 5; 4 minus 5 is equal to minus 1. Now, we have a
new value for beta. So, beta becomes minus 1, and now this node is, of course,
completely evaluated, which means we can think of these nodes as suppliers.

So, beta nodes are suppliers to alpha nodes, and below them, alpha nodes are suppliers
to beta nodes, and so on and so forth. So, a beta node always chooses the smallest value,
and an alpha node always chooses the highest value, from what its suppliers have given.
So, here we can see that alpha is equal to minus 1; that is the value this first beta node is
supplying to it, which we should read as saying that alpha is going to be greater than or
equal to minus 1, which is the characteristic of an alpha node essentially. So, from the
other children that it is going to look at, max is only going to be interested in a node if
its value is going to be greater than minus 1.

Let us try the second option. Let us say max tries this option, and we try the first option
for min; this is the first option which I have for min; we always begin from the top left
hand corner, let us say. Now, we can see that we have already seen this position; this is
the opposite of this position, and the value of this is 5 minus 6, equal to minus 1. The
moment we see this, we know that this is minus 1, and by this, you remember, beta is
less than or equal to minus 1. Now, here we already have alpha equal to minus 1, and
this beta says: I am going to be minus 1 or less. So this alpha, in some sense, will tell
this beta: I am not going to be interested in you any more; you do not need to evaluate
yourself any further; which means all these other children that beta was considering,
like the five children we had here; here also we would have five children; will not be
evaluated.

So, this is the cut off which takes place, and this is called an alpha cut off. So, an alpha
cut off appears at a beta node, which is a descendant, or in this case a child, of an alpha
node. It happens when the beta node promises to be worse than, or lower than, or rather
not higher than, what alpha has already seen, essentially. So, after alpha has seen one
side, it will now proceed in a very controlled fashion, saying: only as long as you
promise better than minus 1 am I going to be interested in you; otherwise, do not
explore the sub tree below that, essentially. So, this will get cut off, and then alpha will
try the third option. So, if we take into account symmetries, then you can see that max
has only three moves to start with: either a corner, or on a side, or in the center. That
small exercise I will leave for you to explore. The idea is, basically, that a cut off takes
place when there is enough information essentially. If we were doing 10 ply search, for
example, then the entire trees, 8 ply trees below this, would be cut off. So, the saving
would be considerable in amount, essentially.
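
For a feel of how considerable, the standard analysis (Knuth and Moore, 1975) says that
for a uniform tree with branching factor b searched to depth d, plain minimax looks at
b^d leaf nodes, while alpha beta with the best possible move ordering looks at only
about

    b^ceil(d/2) + b^floor(d/2) - 1

leaves, roughly the square root of the minimax count; for instance, with b = 10 and
d = 10, that is about 200,000 leaves instead of 10 billion.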

The algorithm that we are looking at today, this alpha beta algorithm, is essentially like
minimax, in the sense that it searches from left to right, but it does cut offs along the
way. We have illustrated one cut off, which is the alpha cut off, which you can also
think of as an alpha induced cut off, and it happens at a beta node. So, a beta node stops
evaluating itself if the parent alpha node tells it to stop evaluating itself; then it is an
alpha cut off. In a similar fashion, a beta cut off will take place at an alpha node, or it
could be thought of as a beta induced cut off, essentially. Now, these cut offs do not
necessarily have to be at the immediate level essentially; they can happen at a much
deeper level. So, let me illustrate that with a diagram. Let us say that you are evaluating
this deep game tree.

(Refer Slide Time: 18:02)

This is the root node, an alpha node, and in this diagram I am basically repeating the
earlier picture. We are evaluating this beta node, and you have evaluated some part of
the tree; as always, the left side of the tree you have finished evaluating. So, we have
got some value from here, alpha 1, from this side of the tree, and essentially what this
node is trying to do is to see if it can get a better value, better than this alpha 1 value.
What is this alpha 1? Alpha 1 is the best amongst these here, and it is trying to see if this
node will supply it a better value, essentially. Likewise, this node may be looking at an
alpha child, and it may have got some value, which we will call beta 1, from the left tree
that it has explored on the left side, essentially. So, this process continues. This has got
some value alpha 2 from here.

Then, this has got some value beta 2 from here. Then, let us say this is a node; let us say
that this is a beta node that we are trying to evaluate, essentially. Now, this node we will
call j, which means we are evaluating it slowly, by looking at its children; and this has
alpha 3 here. So, I hope this diagram is clear: just imagine the depth first search
sweeping from left to right, and all these values coming from the left side of the tree;
alpha 1 coming from the left children of this node while it is trying to evaluate this.
This, in turn, is trying to evaluate this; it has got some partial value beta 1. This one is
trying to evaluate this; it has got some partial value alpha 2, and this has got some
partial value beta 2. This has got some alpha 3, and this j is what we are trying to
evaluate. The question we want to ask is: when will this j value reach the root, or, in
other words, when will this j value influence the game, essentially? Is the node worth
exploring? You can see that this j value will influence this node only if it is better than
alpha 3. So, that means, let us call it v(j), the value of this node j, because we may be
searching deeper, must be greater than alpha 3; otherwise, this node will take alpha 3
from here. Likewise, it must be better than alpha 2 as well, because otherwise this will
take alpha 2, and also alpha 1.

Only if it is better than this alpha 3 value, because these are max nodes, remember, and
they are looking for higher values; only if this node supplies a value which is higher
than alpha 3, and higher than alpha 2, and higher than alpha 1, will it influence the root
node essentially. Likewise, it must be less than this beta 2, because this is a beta node
and it is going to take only lower values, and also less than beta 1. So, we can generalize
this and say that we need to evaluate this node j only if v(j) is less than beta and greater
than alpha, where alpha is equal to the max of alpha 1, alpha 2, alpha 3, or, in general,
all the alpha ancestors of this node; so this value must be higher than the max of those.
Beta is the min of beta 1, beta 2, and all the beta ancestors of the node, essentially. I just
repeat: this node is worth evaluating only if it is higher than this alpha 3, and this alpha
2, and this alpha 1, and at the same time lower than this beta 2, and this beta 1, and all
the beta ancestors essentially. So, this alpha and beta can be seen as the bounds within
which we want the algorithm to search; otherwise, it should abort the search, or prune
the tree below that, essentially.

Let me write the algorithm first, and then we will look at a slightly more detailed
example, essentially. So, I hope this is clear. We can think of this alpha beta pair as a
window: if these are the values of the game tree, then this alpha beta is a window; beta
is an upper limit and alpha is a lower limit. The absolute maximum possible is plus
large and the absolute minimum is minus large. So, remember, we had said that the
evaluation function can return something in a range of minus 1000 to plus 1000 or
something like that; instead of 1000, I am writing plus large, some suitably large
number.

What the alpha beta algorithm does is that for evaluating any node, it passes a window
and says: only if you can get me a value inside this window am I going to be interested
in it. So, again, look at this node: the window is defined by this alpha and this beta,
where alpha is the maximum over all the alpha ancestors, and beta is the minimum over
all the beta ancestors. What will the node try to do? In this example it is a beta node, so
it is going to try to pull down this beta and say: I want a value slightly lower than this
beta. An alpha node would try to raise the alpha value. So, as we sweep from left to
right, this window gets smaller and smaller, and search progresses only inside this
window; otherwise, the tree is pruned off. Let us write this algorithm, or at least I will
write a part of it, and you can write the rest.

(Refer Slide Time: 25:01)

Alpha beta takes an argument j, which is the node, and a value alpha, and a value beta,
which are parameters essentially. These parameters are basically the window bounds
that are passed on to the node. Let me first ask you: what should be the values of alpha
and beta when we call the game playing algorithm at the root? So, alpha should be
minus large and beta should be plus large the first time we call it; that means the
window is completely open. Then, the algorithm is simple; it is basically a small
variation of the minimax algorithm that we have seen. So, assume that we have a
function which tells us whether j is on the horizon or not. You can do this by keeping
some kind of a counter as you search deeper, so that every time you make a recursive
call, you also decrement the counter by 1. Let us say you are doing eight ply search:
when you make this call, the count is 8; when you make the next call, the count is 7,
and so on. When the counter becomes 0, it is a terminal node on the horizon. The
implication of being terminal is that you have to apply the evaluation function and stop
searching further, essentially. So, if j is terminal, then we can say that v(j) is e(j), where
e(j) is the evaluation function that we are using in the game. Else, we still have to
evaluate it, which means you are looking at a node which is not a terminal node.

So, you have to evaluate all the children essentially, one by one. So, for i going from 1
to b; let us assume that b is the branching factor, or that every node has b children; we
are going to evaluate them from left to right essentially. If j is a max node, what do we
want to do here? If it is a max node, like this one, for example, it has got some alpha
and beta bounds given to it, and we want to evaluate its children from left to right. So,
this is j 1, this is j 2, and so on, up to j b; we want to evaluate them from left to right,
and we want to see if we can get a higher value than what alpha has. Remember, it has
to operate within this window, and seek a higher value essentially. I will write the part
for max, and you write the part for min. So, if j is a max node, then what you do is:
alpha gets the maximum of alpha and a recursive call for j i, the i-th child, with the
bounds alpha and beta; the bounds keep getting propagated. So, what does this step say?
We are going with i from 1 to b; so, for every i, we evaluate the i-th child, and if it is
providing a better value, which means a higher value, because we are using the max
function here, then we update alpha to that, essentially.

If alpha becomes greater than or equal to beta, then we return beta. What does this
mean? Look at this alpha value here: it has got this beta bound coming from the top,
which is beta 1 and beta 2. If the value that we are trying to compute for this alpha node
becomes higher than this beta, or this beta, it does not matter; then they are going to say
that there is no need to evaluate anything more here. So, we make a return statement
there, and return whatever the beta bound is; that is the best you can do here. So, this
amounts to a beta cut off; because it is happening at an alpha node and is dictated by the
beta value, it is a beta cut off. Otherwise, if i equals b, which means you have looked at
the last child and evaluated it, and you have already been choosing the best of these
values, you can return alpha. So, this takes care of the alpha side of things. The other
option is the else branch, which means j is min; I will not write this completely, but you
can fill it up yourself. This would be similar: if i equals b, return beta; the update would
be analogous, which says beta is the min of beta and the recursive call; the test will
remain the same, alpha greater than or equal to beta, because this test signifies that the
window has closed in some sense; the alpha value has reached the beta value, so there is
nothing left to explore; but the return value would be alpha, essentially. So, you can fill
up those details. Let us now look at a slightly more detailed example of this algorithm.
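
For reference, here is a hedged Python sketch of the whole routine, with the min branch
filled in since it is symmetric; children(j) and e(j) are again the assumed game-specific
helpers, and the first call at the root is alphabeta(root, k, -LARGE, +LARGE, True):

    LARGE = 1000                        # stands in for 'plus large'

    def alphabeta(j, depth, alpha, beta, is_max):
        if depth == 0:                  # j is on the horizon
            return e(j)
        if is_max:                      # alpha node: try to raise the lower bound
            for c in children(j):
                alpha = max(alpha, alphabeta(c, depth - 1, alpha, beta, False))
                if alpha >= beta:       # window closed: beta cut off
                    return beta
            return alpha
        else:                           # beta node: try to pull down the upper bound
            for c in children(j):
                beta = min(beta, alphabeta(c, depth - 1, alpha, beta, True))
                if alpha >= beta:       # window closed: alpha cut off
                    return alpha
            return beta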

(Refer Slide Time: 31:43)

Let us assume that we have a binary game tree, and it is 4 ply deep. Let us draw the tree
first: we will have 16 leaf nodes, and since it is a binary tree, there would be beta nodes
sitting here, then alpha nodes sitting here, and beta nodes here. This is the root; this is
an alpha node. So, I have just drawn the space that is going to be searched, the space of
nodes; we have not drawn the tree. What we want to see is how the alpha beta algorithm
will explore this tree essentially. Now, observe that the minimax algorithm would
explore this entire tree from left to right, and back up the values: these beta nodes will
take the minimum of what they get from below; then the alpha nodes will take the
maximum of their beta children; this beta will take the minimum of these alpha
children, and so on. To see what alpha beta does, we will fill in some random values,
just to explore the algorithm here. It does depth first search, left to right. So, it first goes
and evaluates the left most node in the tree. Let us, for argument's sake, say that this
value is 50, some random value essentially. Then, it evaluates the next child; let us say
that is 40. So, at this point, this beta node is completely evaluated; its value is 40, and
since it is completely evaluated, this alpha is equal to 40, which, remember, you must
read as saying that alpha is greater than or equal to 40 essentially. Then, it starts the next
round: it looks at this, and looks at this; let us say this happens to be 30, which means
this beta is equal to 30. Now, you can see the relation between this beta and this alpha
node: this alpha is saying, I am at least 40; this beta is saying, I am at most 30. So, this
alpha node is not going to be interested in this beta node, and we have an alpha cut off. I
will just write alpha here, to signify it is an alpha cut off.

So, we are not going to look at this node essentially. Then, we continue in the depth first
fashion: go here, go here, go here, and let us say that this is 70; so this beta becomes 70
at this moment. Then explore this, and this is 60; so this beta is 60 now, and this alpha is
also 60. This beta is 40, because it is getting this 40 value from here, and it is essentially
asserting: I am 40 or less. This one is getting 60 from here, and we say it is 60 or more,
essentially. Which means this beta will induce a cut off here; so this will be cut off, and
this is the beta cut off, or beta induced cut off, essentially. So, we have not looked at
those two leaf nodes; in fact, we have not looked at the entire sub tree there, essentially.
At this point, this alpha is equal to 40, and again we do depth first search, and let us say
that this value is 30.

Now, you remember this: the bound a node gets is influenced by all its ancestors. In this
case, there is only one ancestor which has a value, which is this root, whose value is 40.
Look at this beta node: this beta node is saying it is 30; of course, it could go lower than
30, but at the moment it is 30. So, this now induces what we call a deep cut off.
Because this is upper bounded by 30, this node is never going to be interesting to what
is happening above; so we might as well cut it off, and this is an alpha cut off. So, an
alpha cut off does not have to be induced by a parent node; it can be induced by some
ancestor, which is what we have said here explicitly, when we said that the alpha bound
is actually the maximum of all the alpha bounds of the ancestors, essentially.

So, this alpha is 30; it is saying, I am going to be 30 or more. Let us say we come here,
and come here; let us say this happens to be 70, and this happens to be 80. Now, this
becomes 70, and this value also becomes 70, and this beta is 70. So, what is happening
now? When we explore this child, it is getting the bounds beta equal to 70 and alpha
equal to 40. The window is still open, so we must explore the tree below here, and we
do that in the depth first fashion. Let us say that this is also 30; then again we would
introduce a cut off here, very similar to the same values before. Or rather, let us say this
is 80, so we do not get this cut off; we do investigate this node. Let us say this happens
to be 90, and this happens to be 80; this says alpha is equal to 80. Now, look at this beta
and this alpha: this beta says, I am at most 70; this alpha is saying, I am at least 80. So,
we have a cut off here.

We can see that we did not look at this node, we did not look at this node, and we did a
fair amount of cutting off. So, out of the 16 leaf nodes, this alpha beta algorithm has not
seen 6 nodes; it has seen only 10 nodes, essentially. As an exercise, I will ask you to fill
in values so that the number of cut offs is maximum. As a different exercise, fill in the
values so that there are no cut offs at all. Now, it is possible that the values are such that
there are no cut offs essentially. That happens because the algorithm is searching from
left to right. What does left and right mean? It basically means in what order you are
generating the moves essentially. If this is some game in which you have two moves, let
us call them a and b, you are generating a first, and then b, essentially.

Now, what this exercise will reveal to you is this: what is the minimax value of this
game tree? This is 70, this is 40, and this is an alpha node. So, this value is actually 70.
Where is it coming from? It is coming from this node here: this 70 is coming here, this
70 is going here, this 70 is coming here; which means that the game that will be played,
based only on this analysis, would be that max would make this move; min would make
this move, because that is the best min can do, it is getting 70 here and 80 there; max
would make this move, and min would make this move.

If you flip this tree left to right, which means this game value would come in the left
part of the tree, then you would notice that the number of cut offs is more, essentially. In
other words, if somehow the best moves are generated earlier, if they are in the left part
of the tree, then the number of cut offs will be more. This you can, sort of, understand
by constructing an example in which the best moves are on the left hand side. In fact, if
you try to construct a tree in which you fill in values so that the number of cut offs is
maximum, you will see that the game value comes from the left side of the tree
essentially. Now, that has an implication for game playing programs. If you are writing
a game playing program, for example the Othello program, where you have some set of
moves to generate, you would like to generate the moves, as far as possible, in such a
manner that the best moves are considered first, and then the worse moves essentially.
So, the question is: how can you order moves, essentially? Remember, we are
discussing this in a domain independent fashion. Of course, you can apply some domain
knowledge to say these moves are to be considered first, and so on, but in a domain
independent fashion: any suggestions?

Student: Heuristically

Heuristically, but how do you choose that heuristically?

Student: Evaluation.

So, here is what many people do. Remember that when you are running a game playing
program, you are doing some search up to some k ply depth here.

(Refer Slide Time: 43:59)

Let us say up to this level, or whatever, and you decide on a move. Let us say this
happens to be the move that you are making essentially; this is the move that you made
after you did this whole game tree search essentially. Then what happens? Then it is the
opponent's turn to make a move, because the opponent is going to play the next move.
You have made a move, and the opponent will play a move, but what you have access
to is your search tree, the part below this max node. Now, you will have to make your
next move at this node, assuming that this happens to be the best move for min, which
will be the case, because in your analysis you were considering the best moves for min;
min is taking the minimum value from all these. Let us assume that this is the minimum
value that min can get. Next time, when your turn comes, you will have to play from
here, which means you will be searching the tree starting from here and going down,
essentially.

Now, you can exploit the search that you did in the previous round below this node,
because you got values from all the children. You order the children so that the most
promising ones, according to the values backed up last round, come first at each level;
you can order the sub tree so that the best nodes come on the left side. If you do that,
you are likely to get more cut offs, which, in a real tournament environment, will mean
that your move will be made faster, and you will have more time for the subsequent
moves essentially. You can also do some amount of analysis in the opponent's time.
What do we mean by the opponent's time? It is when the opponent is thinking. So, you
have made a move, and the opponent is thinking what to move; that time, in the real
world, is available to you. You can think about what the opponent might move, and
what you would respond; you can do this analysis in the opponent's time.

The only thing is, of course, in the game playing assignment that we are going to give
you, you will not get this opponent's time, because you will have a separate thread
running, which will be invoked only after the opponent has made their move. But in a
real world game playing situation, where you take your own computer to play, you do
have this time to do some further analysis. One piece of analysis that we can do is to try
to order the moves in such a way that the best moves come first, essentially.
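
A small sketch of this ordering idea, assuming a hypothetical table prev_value that
maps positions searched in the previous round to the values backed up for them:

    # Search the most promising children first: highest first at a max node,
    # lowest first at a min node; unseen positions get a neutral score of 0.
    def ordered_children(j, prev_value, is_max):
        kids = list(children(j))
        kids.sort(key=lambda c: prev_value.get(c, 0), reverse=is_max)
        return kids

Feeding alpha beta its children in this order does not change the value it computes, only
how early the cut offs occur.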

(Refer Slide Time: 46:36)

Now, this algorithm suffers from the common trait that we have been observing
throughout this course, which is that it is a blind, or uninformed, algorithm; it searches
from left to right. Of course, this desire to order the moves is, in some sense, a desire to
give it a direction; but, given a fixed order of moves, can we have an algorithm which
has a sense of direction, which will be like a best first search algorithm? Indeed, there is
such an algorithm, called the SSS star algorithm, but we will take that up in the next
class.

We will stop here now, with alpha beta.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture No. - 29
Game Playing SSS

So, we are looking at game playing algorithms.

(Refer Slide Time: 00:16)

And we saw in the last class the alpha beta algorithm, and the tree that was explored by
alpha beta we see here. It is a binary game tree, 4 ply deep, and the shaded nodes are the
ones which alpha beta did not explore essentially. Now, the alpha beta algorithm
searches from left to right, and we now want to look for an algorithm which is not
uninformed, which is not blind in this manner, but has a sense of direction. So, just as
we moved from depth first search to best first search by introducing a heuristic function,
we want to look at an algorithm which will go towards what it thinks is a good solution
essentially. And indeed such an algorithm was given by Stockman in 1979, and it is
called SSS star.
(Refer Slide Time: 01:06)

Now, to understand this algorithm, first we must understand what the space is in which
this best first search will happen. Look at what max does. Max is playing a game; of the
many choices that it has available, it makes one choice. So, of course, on the surface
level it is selecting a move, but it is making the choice on the basis of having searched
the tree up to some depth; let us say it is k ply search we are doing.

And it is on the basis of this look ahead that max has made this move, and on the basis
of having applied an evaluation function on the horizon essentially. So, in effect, max
has looked ahead at, or foreseen, what min's responses would be. In other words, max is
comfortable with all of min's responses, and for each of those responses max has
thought ahead: I will make one particular move; and then max has again taken into
account all of min's responses to that. So, let us say this is also a 4 ply search tree. Max
has looked at this entire set of possibilities, and on the basis of this has made the move
that max has made essentially.

So, you can say that this sub tree that we are seeing is what max has concluded as the
result of this search. It says: if max makes this move, whichever move min makes, max
has an answer to that; and then whichever move min makes next, max has taken that
into consideration while evaluating this essentially. Now, you will remember that such a
sub tree is called a strategy. So, max has chosen a strategy. Of course, the alpha beta
algorithm does not search in the space of strategies; it does not even consider strategies
as such. But now we are going to look at a different algorithm, which is SSS star, and
SSS star searches in the space of strategies.

So, let us say max has chosen this strategy, and let us call it S; we have discussed earlier what the value of a strategy is, a value in terms of the leaf nodes that max has taken into account. So, let us say this is 10, 20, 40, and let us say all these are bigger values, 60, 70. Let us say this is 5, 10, 15, 30. I am not filling in all the values, but let us say we have some values 10, 20, 40, 60, 70, and there is a 5 here, 10, 15, 30. What is going to be the minimax value when max uses this strategy?

It is going to be the minimum of all these values. Because once max has frozen max's strategy, it is only up to min to choose, and min will do some analysis and say: if I make this move, and then this move, I will get a value of 5, which is what min is going to drive the game towards. And max has no more choices left, because max's strategy is frozen essentially. So, the value of a strategy is the minimum of the values of the leaves in the strategy S. I just use this notation to say that these are the leaves of the strategy S, and the value of the strategy is the minimum of the values of the leaves of the strategy.
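
In symbols (our notation, not from the lecture): if $e(\ell)$ denotes the evaluation of a leaf $\ell$, then the value of the strategy is

    $V(S) = \min_{\ell \in \text{leaves}(S)} e(\ell)$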

Now, as a corollary, suppose we were to choose a random leaf in a strategy. So, if I choose some leaf L, it does not matter which one, what relation does its value have with the value of S? It is greater than or equal to the value of the strategy, which means the value of a leaf is an upper bound on the value of any strategy which contains that leaf. Now, let us look at this binary game tree again: if I were to select this leaf 50, which has the value 50, how many strategies will it be a part of?

You can see that this, of course, is min's choice, and here max has made a choice, and that choice has 50 as its leaf. So one strategy would be this choice for max, both these choices for min, this particular choice for max, and both these choices for min essentially. Now, since max has to consider both these choices for min, this one as well as this one: when max makes this move, max is either selecting a strategy in which max makes this move here and this move here, or a strategy in which max makes that move and this move, and either strategy may contain this leaf.

Alternatively, this leaf will also be part of a strategy where max makes that move, this move here, but the other move at this place here; that strategy also will contain this particular leaf. So, you need to visualize this a little bit: in this particular game tree, every leaf belongs to 2 strategies. One strategy is when max makes this choice here, and the other strategy is when max makes the other choice here, because max has to take into account this choice of min here. So, there are 16 leaves; and how many strategies does max have? 2 choices here, 2 choices here, and 2 choices here, so max has 8 strategies available.

(Refer Slide Time: 09:19)

So, I will just write it here: the number of strategies equals 8 for a 4 ply binary game tree, and 8 is 2 raised to 3. As an exercise you can see that for 5 or 6 ply it is 2 raised to 7, for 7 or 8 it is 2 raised to 15, for 9 or 10 it is 2 raised to 31, and you can extrapolate from here. I will just give these as figures; you can work them out yourselves. As you go deeper, the number of strategies increases considerably. The 2 here is the branching factor that is playing a role; as an exercise, if you replace 2 with b, what would the figures look like? A small sketch of the underlying recurrence is given below.
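
As a rough check on these counts, here is a minimal Python sketch, under the assumption of a uniform game tree with branching factor b and depth d plies, rooted at a max node; a strategy fixes one move at max nodes and keeps all replies at min nodes.

    def num_strategies(b, d, max_to_move=True):
        if d == 0:
            return 1                 # a leaf is a trivial strategy
        if max_to_move:              # choose one of the b moves
            return b * num_strategies(b, d - 1, False)
        else:                        # keep all b replies
            return num_strategies(b, d - 1, True) ** b

    # Matches the figures quoted in the lecture:
    # num_strategies(2, 4) == 8 == 2**3
    # num_strategies(2, 5) == num_strategies(2, 6) == 2**7
    # num_strategies(2, 7) == num_strategies(2, 8) == 2**15
    # num_strategies(2, 9) == num_strategies(2, 10) == 2**31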

It is an interesting exercise, but you can do that later. For this particular tree, there are 8 strategies, and if I choose any arbitrary leaf, I will get 2 of those 8 strategies; choose any leaf and you will get 2 strategies. So, what does SSS star do? It searches in the space of strategies, and it must be exhaustive in the sense that it must not miss out on the best move. So, it must consider all strategies to begin with and then choose one from them. That, of course, is brute force, and we do not do brute force here essentially.

(Refer Slide Time: 10:57)

Now, a leaf node, you can say, represents a partial strategy. In other words, it is a strategy which is not completely refined; if I have seen only one leaf in a strategy, I have seen only part of the strategy. So, for example, if I have seen this leaf 20, whose value is 20, I know that it is part of this particular strategy, and it may be part of other strategies as well. But whichever strategy it is a part of, it has this property that its value is an upper bound on the value of that strategy.

A leaf node also represents a cluster of strategies, which is what we were discussing a moment ago: if I were to choose this node 50, the left most leaf, then it represents a cluster of 2 strategies. And the 2 strategies are these: max makes the left choice at the top level; if min were to make this choice, then max has decided on the left choice; but if min were to make the other choice, max could either choose this or choose that. So, those are the 2 strategies.

This particular node 50 represents them, and this 50 would be an upper bound on both those strategies. So, what we need to do is to select leaves to cover all strategies. That is the first point of the SSS star algorithm: you select the leaves in such a manner that you have a representative from all possible strategies, which means you have covered all strategies. How do you do that?

It is very simple: you extract a sub tree from the game tree in the following fashion; at the max level, choose all branches. Why should we do that? Because we want to cover all strategies; we do not want to miss out on any strategy. So, for example, at the root level we must consider all possible moves that max can make, and likewise at deeper levels essentially.

So, at max level choose all branches; at min level choose 1. If I do this, I will construct a set of leaves which covers all strategies essentially. So, let us do that for the game tree that we had considered for the alpha beta algorithm. We want to now explore how SSS star will work on it; it starts off by selecting a set of leaves, where each leaf represents a cluster of 2 strategies in this case, and you must select the leaves so that all strategies are covered, so that we do not miss out on any strategy.

(Refer Slide Time: 14:48)

And the way to do that is to select all choices for max; one choice, let us say arbitrarily the left most choice, for min; then all choices for max, and one choice for min. Likewise here: one choice for min, all choices for max, one choice for min. SSS star, this algorithm, starts off by selecting this set of leaves.

So, let me just underline these leaves here: we are looking at this one, this one, this one, and this one. And I will ask you to think about this and convince yourself that, of the 8 strategies we talked about, these 4 leaves cover 2 each; the clusters are all disjoint, and they cover all the 8 strategies essentially. Now, remember the best first algorithm that we had talked about, and the travelling salesman problem that we also looked at; the high level algorithm there was: refine the

(Refer Slide Time: 16:28)

best looking partial strategy, a partial solution in that case (in our case the solution is the strategy), till the best solution is fully refined. This is the high level algorithm for best first search: A star fits into this, branch and bound fits into this, best first fits into this. At all points, we maintain some sort of a priority queue from which we are able to pick the best looking partial strategy.

So, in A star the best is defined in terms of the f value, which is the h value plus the g value. If you remember the travelling salesman problem, we said that at the top most, root level, the set of cities represents all possible tours. And then we partitioned this set into 2 sets: one which contained a particular edge, and the other which did not contain that edge.

So, this was a partial tour; both of these are partial tours, they are not fully specified. We had figured out a way to evaluate a lower bound on the cost of those tours, and we always picked up the best looking tour and refined it further essentially. We want to do something similar here in SSS star: we first cover all the possible solutions, and that is what we have done by choosing the set of leaves to form the clusters which cover all strategies.

So, in this case we have 4 clusters, and in each cluster we have one representative leaf: 50, 30, 30, and 70. Each leaf's value is an upper bound on the value of the strategies in the cluster it represents; each leaf belongs to 2 strategies. So, we will follow this: we refine the best looking partial solution until the best solution is fully refined, that is, until the strategy is completely defined. This algorithm, SSS star, uses a priority queue of clusters, but the clusters are not represented explicitly; each is represented by its representative node essentially. And each element in the priority queue is a triple: the name of a node, a status which is either live or solved, and a bound.

So, initially when we start the algorithm, we start with the root and we say it is live; by live we mean not yet solved essentially. You can now also make a comparison with the AO star algorithm that we had seen. The AO star algorithm keeps refining a solution till the root gets labelled solved, which means that the solution is completely refined.

And at some point I had also made the observation that solving a game tree is similar to solving an AND-OR problem, because at the max level we have to make one choice, so it is like an OR node. At the min level you have to consider all choices, so it is like an AND node, except that you do not sum up the costs of the partial solutions as we did in AO star; we take the minimum of the values of the partial solutions here essentially.

So, it is very much like the AO star algorithm, and this is the terminology we used in AO star also: we had nodes labelled solved or not labelled solved. In this case we are explicitly calling the unsolved ones live. The algorithm will proceed till the root gets labelled solved; that is a similarity with AO star you should observe. AO star was a best first algorithm, if you remember, and this is also going to be a best first algorithm.

So, we start with this root and this bound. Initially we insert into the priority queue the triple with the root node, labelled live, and a bound of plus large, effectively plus infinity. The highest value will always be at the head of the priority queue, which is a max queue; the algorithm picks the head element, which is the entry with the largest bound, or heuristic value if you want to call it that, and does the following.

So, these are the cases when you pick the head node from the queue. If it is a max node and it is live, then you add all its children to the queue with the same bound and call them live essentially. I should in fact say non terminal; somewhere there is a test, because remember the terminal nodes are those on the horizon, and at the horizon we are allowed to, in fact we have to, apply the evaluation function and get a value for that node.

If it is a max node, and it is a live node, and it is non terminal, then add all its children to the queue. You will see that this is the step: at max level choose all branches; at the max level we are adding all the children. So, initially we would have added the root; then we will add both its children to the queue and call them live essentially. If it is a min node, and live, and non terminal, then add one child.

So, you can see that these two steps correspond to the initial formation of the clusters essentially. And since this is a priority queue and these entries have the plus large values, all of this will be taken care of first, and this whole tree would be constructed here, the tree that you see in this pinkish colour. At the max level we will remove the max node and add both its children; at the min level we will remove the min node and add one child.

In this case we arbitrarily choose the left side: at the max level we add both the children, at the min level we add one child. And all these internal entries will vanish, so only 4 entries will remain in my priority queue, with values initially plus large; but the moment we reach a terminal node, if it is terminal, you label it solved. So, these 4 nodes will be terminal; we will label them solved and we will put in the values that we have, which are 50, 30, 30, and 70. Now, if you remember, the AO star algorithm had these 2 phases, the forward phase and the backup phase essentially.

So, in some sense this is like the forward phase: whenever we are looking at a live node, it is like the forward phase, because we are adding the children, either all children if it is a max node, or one child if it is a min node; we are moving forward, or moving down, in the tree. But once we have solved nodes, we go into the backward phase essentially. So, now we have these 4 nodes, and all 4 will get labelled solved, because while they have the plus large value they will always be at the head of the queue.

Even when one of them gets labelled solved, it will go towards the rear of the queue, because the others will still have plus large. Eventually all 4 will get their values, and they get sorted as 70, 50, 30, and 30 in the priority queue. We always pick, it is a priority queue, I do not need to repeat that, the node at the head of the queue. So, once we have these 4 solved nodes in the priority queue, the next node that will get picked is this one, 70.

So, these represent 4 clusters; let us call them A, B, C, D. Each of these clusters represents 2 strategies, and each of these values represents an upper bound on those 2 strategies. The best looking cluster is D, and D comes to the head of the priority queue, and so SSS star picks it; it is a max node. So, if it is a max node and solved, there are two cases: one is that it has a sibling left in the tree, and the other is when it does not have a sibling left in the tree. Both these nodes, 30 and 70, and in fact all 4, have a sibling still left in the tree.

So, they fall in the first case essentially. In case the node has a sibling, you replace it with the sibling and the lower value; whichever is the lower of the 2 values, you give that value to the cluster. Why? Because remember that the value of a strategy is always the minimum of the values of its leaves essentially. And if you are looking at one more leaf in the strategy, and it has a lower value, we must keep the lower value; if it has a higher value, we must keep the lower value from the first one. It does not matter where the lower value comes from.

So, keep the sibling as the representative: replace with the sibling and with the lower value. If there is no sibling, I will write else here, and this else means that there is no sibling, replace with the parent. We will come to this; let us follow these steps. So, this 70 would be at the head of the queue, and it will get removed from the queue, and its sibling will be added. This amounts to saying that we are exploring this node, and we are now looking at this node; the value of this cluster would still be 70, because that is the lower of the two values 70 and 80.

So, this would still be at the head of the queue because it has the value 70. It has no more siblings left, so this will get added to the queue and will get labelled solved. Let me say this represents the label solved, as we had done in the AO star algorithm. If it has no sibling, we replace it with the parent with that value. So, this cluster D is still represented by this node here, and its value is 70.

At this point, observe that because this is a max node here, this one is never going to play a role any further, because it has an upper bound of 30, and max is getting 70 from here; max knows it is getting 70 from here. This is like an alpha cut off taking place at this point; either you do it explicitly, or it will languish at the end of the queue, never coming to the fore; it does not matter.

That step of alpha cut off is captured by this step here. Now, if it is min and solved, just think carefully about this: if it is a min node and it is solved, then implicitly, though we are not saying it here, it is at the head of the queue. So, this node is solved and it is at the head of the queue. It has a sibling, but the sibling is never going to play a role. Why?

Because it is a min node which has got a value of 70, and its sibling has an upper bound of 30; that sibling is never going to influence the max node above, which is not going to consider it at all. So, this is implemented by saying that if it is a min node and it is a solved node, then just replace it with its parent, and the parent gets labelled solved.

So, I should say: the parent gets labelled solved in both cases, whether it is a max node or a min node. For a max node, if there are no siblings left, then the parent gets labelled solved and gets the value. For a min node, it does not matter whether there are siblings or not; its parent will get labelled solved, and it will get a value of 70 in this case. Now, we go back to this step: this node is solved and it is a max node; its sibling is not yet solved, and it is not a terminal node, we are somewhere up in the tree.

So, we replace it with the sibling, the same process: if a max node is solved, replace with the sibling and the lower value. In this case the value is 70. So, basically this node becomes live with a value of 70, which amounts to saying that I want to evaluate this subtree with a bound of 70 essentially. Now, since it is a live node, it is like a recursive call henceforth: we follow the live cases, add all children for max, one child for min.

So, we first look at this node, and all these get added to the queue. This one comes to the head of the queue because it has a value of 80, and it is a terminal node; the moment we see a terminal node, we evaluate it. It comes to the head of the queue, gets replaced by its solved entry with a value of 80, and there are no more siblings left, so it is still at the head of the queue. So, this gets added to the queue with the value of 80, solved. The moment this gets labelled solved with a value of 80, its parent gets labelled solved with a value of 80, when it is a max node.

So, remember that we are always considering the node at the head of the queue; we are looking at the highest value node at all times. This is at the head of the queue; sorry, that is not correct, not with the value of 80, but with the value of 70 essentially, the lower of the 2 values. Now, there are no more siblings left for this max node; at the head is the max node with the label solved, and the other copy has already gone away, actually it has been thrown away, only one copy is kept. So, no more siblings are left, and this min node gets labelled solved with the value of 80.

Now, since it is a min node and it is solved, and it is at the head of the queue, it does not care about its sibling, because being at the head means whatever bounds are coming from elsewhere in the priority queue will be lower than this. So, max is getting a value of 80 from here, and it is getting a value of at most 50 or 30 from there, depending on what happens in those other nodes which we have not seen; at most a value of 50.

So, they are behind in the queue. The moment a min solved node comes to the head of the queue, as we have said here, we just replace it with its parent, labelled solved. In this case we replace it with a value of 70. So, what have we done? This SSS star algorithm has looked at 1, 2, 3, 4, 5, 6, 7 nodes essentially before it terminated. Notice that it terminates with the same value of 70 which alpha beta found, and which minimax would have found essentially.

So, all three algorithms, the minimax algorithm, the alpha beta algorithm, and the SSS star algorithm, find the same minimax value, obviously; otherwise something would be wrong. They find the same move with the same minimax value. Minimax would have seen all 16 leaf nodes; alpha beta sees only the 10 unshaded nodes that we have drawn here; and SSS star sees only the 7 underlined pink nodes that we have seen here.

What is more important, observe that its attention has always been focused towards the side of the tree where the best moves lie for both the players; it has basically solved this part and ignored that part altogether essentially. It has not seen this node, for example, which was seen by alpha beta, or this node, which was seen by alpha beta, and so on. In fact, it was shown by Stockman that if any node is seen by SSS star, then alpha beta will also see that node. Where? Yes, because this was 70, this also should be 70 and solved.
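
The cases just described can be put together in a compact Python sketch; this is our rendering of Stockman's SSS star, under the assumption that the game tree is given as nested lists (an internal node is the list of its children, a leaf is its evaluation), with the root a max node and levels alternating. A node is named by the tuple of child indices on the path from the root.

    LIVE, SOLVED = "live", "solved"

    def sss_star(tree):
        def subtree(name):                    # walk down from the root
            t = tree
            for i in name:
                t = t[i]
            return t

        def is_max(name):                     # even depth: max to move
            return len(name) % 2 == 0

        queue = [((), LIVE, float("inf"))]    # entries: (node, status, bound)
        while True:
            entry = max(queue, key=lambda e: e[2])   # max-priority queue
            queue.remove(entry)
            node, status, bound = entry
            t = subtree(node)
            if status == LIVE:
                if not isinstance(t, list):           # terminal: evaluate it
                    queue.append((node, SOLVED, min(bound, t)))
                elif is_max(node):                    # max: add all children
                    queue += [(node + (i,), LIVE, bound) for i in range(len(t))]
                else:                                 # min: add first child
                    queue.append((node + (0,), LIVE, bound))
            else:
                if node == ():                        # root solved: done
                    return bound
                parent, i = node[:-1], node[-1]
                if is_max(node) and i + 1 < len(subtree(parent)):
                    # solved max node with a sibling left: the min parent
                    # must still check the sibling, carrying the bound down
                    queue.append((parent + (i + 1,), LIVE, bound))
                elif is_max(node):                    # no sibling left
                    queue.append((parent, SOLVED, bound))
                else:
                    # solved min node at the head: its max parent is solved;
                    # purge the parent's other descendants (the alpha cut off)
                    queue = [e for e in queue if e[0][:len(parent)] != parent]
                    queue.append((parent, SOLVED, bound))

For example, sss_star([[3, 5], [2, 9]]) returns 3 without ever evaluating the leaf 9; the pruning falls out of the purge step and of the bound carried down to siblings.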

So, SSS star is the best first variation of alpha beta, you can see. The only difference from what we said earlier, when we talked about best first search, is this: there we used the notion of a heuristic function, which was domain dependent. Here we do not use a heuristic function in that sense, but we use an estimate of the solution. A heuristic function also gives an estimate of a solution, but here we have a different mechanism to arrive at an estimate, which is by sampling.

By sampling all strategies we get upper bounds on all possible strategies, and in the process we form these clusters. Then we always refine the best looking cluster; that is where the best first nature of the algorithm comes into play. I want to end with a couple of things. One is about real world game playing programs.

(Refer Slide Time: 40:10)

There, you do a certain amount of search; you do a k ply look ahead, but there is a danger lurking here. Of course, the farther you can see the better, but because the tree grows exponentially, you cannot do too much search, and most programs do not go beyond 10 ply or so essentially.

Now, there is something called the horizon effect, and it is as follows. Suppose there is one particular line of play which is of interest, which is being evaluated by some mechanism, as happens naturally in this search. Let us say this sequence of moves has a role to play in determining whether this move or something else is better; it does not matter, it is an important sequence of moves. So, each of these is a move: max move, min move, and so on.

Now, supposing into this you were to insert a set of arbitrary moves which are pointless: let us say max makes a move, min makes a move, max makes a move, min makes a move. And let us, for argument's sake, say that this state is equal to this state. It is possible: if you are looking at a game like chess, for example, max might make a knight move, min might make a knight move, max might take the knight move back, and min might take its knight move back.

So, both have made a move and undone the move, and they are in the same state here. But what happens to the search? Supposing we insert this whole sequence here; then this part gets pushed out of the horizon. Because of these arbitrary moves, this part gets pushed beyond the horizon, which means that something important which was happening in those moves is no longer noticed by the algorithm, because its search goes only till the horizon.

So, this effect is called the horizon effect, and what people have done is to add a secondary search: before making a move, they do a little bit of secondary search to verify that there is indeed no catastrophe lurking behind that move, or something like that. Very often people do this. Obviously, the secondary search will only do a small amount of search, which is far less than actually searching the whole tree to that extra depth; that would have been much more work essentially.

And finally, I want to end with an example which shows that there can be a limitation to search. This is a well known example which was fed to the Hitech program; Hitech was a multi processor, 64 processor chess playing machine developed by Berliner and his team at Carnegie Mellon. This position was given to Hitech, and it illustrates that search, unless it is full search of course, is not always capable of doing something which other forms of reasoning can do. The position is as follows.

So, let us say these are pawns: P, P, P, P, P. White has this position. It is a very contrived position; somebody invented it just to show that search can have its drawbacks. So, these are pawns, 8 pawns white has, one in each column, and white has only the king left besides essentially. Let us say the opponent also has 8 pawns, which are head to head with these 8 pawns, in some sense creating a kind of deadlock; but the opponent has other pieces also.

So, the opponent has, for example, a bishop here, a rook here, a rook here, and a king here. The opponent is strong in terms of material. The white player has 8 pawns and the king; the red player, or black player, whatever you want to call it, has 8 pawns and a king, but also has 2 rooks and a bishop essentially. So, there is a significant material advantage essentially. What happens if red moves this rook here? Black makes this move, and now it is white's turn to move; this position was given to Hitech essentially. This is called the poisoned rook.

If you just look up poisoned rook on the web, you will get this position and the story behind it essentially. So, what is the story? Red moves the rook here, in some sense offering it to this pawn, right? White can capture this rook like this, if you know the chess rules, and white, running the game playing algorithm that we have been talking about, will basically do this: it will do some k ply search and apply the evaluation function. What is the evaluation function? We have seen it is a combination of material advantage and positional advantage.

And then it chooses the best move based on this evaluation function. What will this poor white program do? It will see that it is getting to capture the rook, as a result of which it makes the move of capturing the rook. But this rook, as the title says, is a poisoned rook, because what happens? Once this pawn moves away from here, it opens up the impregnable fortress that white had. Observe that the rooks cannot otherwise attack any of the pawns, because rooks can move only along these lines, and the only bishop that black has is one which moves on squares of its own colour.

So, if you know chess, you will see that these will be, for example, all black squares and these all white squares, and black has a black squared bishop; it can never attack them, it can never break in. Left to itself, black can never break into this fortress that white has constructed. But the moment white moves one pawn from this chain, it opens its territory to black, and then black can actually, as you might say, invade from here and win the game essentially.

Most players will look at this position and see this. The only thing that white can do is to move the king around, and then black cannot do anything. But this chess program, which was not able to reason at what you might call a higher level or a meta level, or see further, could only do this tree search, and thought that it was going to get some material advantage.

So, it captured the rook and actually lost the game essentially. Thus the lesson here is that to build intelligent systems, you need not just one form of reasoning, but many forms of reasoning working together. The other form of reasoning is the kind we are talking about here: analysing the structure in some way, or making some inference about it, which this program was not able to do.

So, we will stop here with games. There are other game playing algorithms that we will not consider; for example, Berliner had an algorithm called B star, which is also a kind of heuristic search algorithm with a sense of direction, but we will limit ourselves to SSS star, which is a much simpler algorithm to talk about essentially. So, we will end with games here and move on from this place essentially.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science & Engineering
Indian Institute of Technology, Madras

Lecture - 30
Rule Based Systems

So, shall we begin? The last few lectures that we had on games were a bit of a diversion, intended to get you ready for the assignment, which we will soon give you, possibly next week. We come back to this idea of problem decomposition.

(Refer Slide Time: 00:35)

And today we look at this thing called rule based systems, also known as production systems, and also, somewhat ambitiously, known as expert systems. So far in our discussions we have already mentioned 2 systems, one is Dendral and the other one is R 1, also known as XCON, which were supposed to be 2 of the first so called expert systems. By this we mean programs which harnessed the knowledge of human experts and used that for solving problems essentially. This knowledge that we get from human experts, in the form that we are talking about, was generally in the form of rules, or productions, essentially.

(Refer Slide Time: 01:58)

And the idea is that if you have a state description, then instead of devising a MoveGen function which takes one state and gives you another state, what we do is look at some part of the state, and based on this part of the state we devise, let us say, action one; for another part of the state we devise another action.

So, in some sense we are not looking at the complete state, but at what we call a pattern in the state. These things are patterns. Now, a pattern is typically described in the form of a production, or rule, and I will use the syntax of a language called OPS5, which I will shortly tell you about. The basic idea is that the structure is as follows.

So, you have a rule name, and you have pattern 1, pattern 2, and so on, some number of patterns, and I will shortly describe what these are; followed by action 1, action 2, some number of actions essentially. This is the structure of a rule, also known as a production. Because of the arrow here, this side is called the left hand side and that the right hand side; the left hand side is also called the antecedents, and we can think of the right hand side as the consequents.

So, a rule is basically a set of antecedents followed by a set of actions, or consequents, and it is a very modular form of knowledge representation. We have so far not used the words knowledge representation at all in this course; we have assumed that somehow you will have a state representation, and somehow you will have a MoveGen function which will move from one state to another. Now, with this, for the first time, we are talking about how we actually represent things.

And the idea here is that knowledge is represented in the form of rules: the left hand side followed by the right hand side. The left hand side that we are talking about here is essentially the pattern that we are looking at, and if the pattern matches, then the action can be done. So, for example, the system R 1 had rules like the following; you should look up the actual rules in XCON, but maybe I can just read out a rule from here. This is a rule from R 1; its rule name is distribute (Refer Time: 05:36).

So, remember, R 1 was the system which was used to configure VAX machines, DEC VAX machines, which were some of the most advanced computing systems at that time, in the late 70s. And this rule reads as follows. If (that is the left hand side) the most current active context is distributing massbus devices, and there is a single port disk drive that has not been assigned to a massbus, and there are no unassigned dual port disk drives, and so on and so forth.

Then, the action is: assign the disk drive to the massbus. So, it has a set of conditions on the left hand side and, in this example, only one action on the right hand side essentially. The idea is to capture the knowledge of a human in this form and use this knowledge to solve problems. Now, how does that happen?

(Refer Slide Time: 06:45)

So, there is this idea of an inference engine. The inference engine takes, on the one hand, data, and on the other hand, rules, and it generates, to use a loose term, the solution, or a sequence of actions. The idea behind rule based systems, which were also for some reason called expert systems and production systems, was that the problem solver, the human expert or the domain expert, will only provide the rules for solving problems.

So, rules like this: if some conditions are true then some action happens; if some other conditions are true then some other action happens; and so on. You might have a rule, for example, which decides whether a particular customer is worthy of being given a loan. A bank manager may have a rule of a certain kind: if this person's income is so much, and if he has been a customer for so long, and if he has no outstanding loans, then you can give him a loan of, whatever, 10 lacs or something like that.
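
Sketched in the style of the OPS5 notation that we will introduce later in this lecture (every class name, attribute, and threshold here is invented purely for illustration), such a rule might look like this:

    (p give-loan
       (customer ^name <c> ^income > 50000 ^years-as-customer > 2)
      -(loan ^customer <c> ^status outstanding)
    -->
       (make loan-offer ^customer <c> ^amount 1000000))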

So, you can have a rule like this. Now, the thing is that if we allow the domain expert to express rules like this, then they do not have to worry about the program which works on these rules. They only write the rules, and somebody else, in this case the inference engine, does the work for them essentially.

So, the inference engine is the one which will actually pick rules from the collection of rules when a new problem comes, decide which rules to apply, and then generate the solution essentially. Alternatively, the railways might have a rule base which says who gets what kind of concession: if you are a war veteran you get a certain kind of concession, if you are a sportsman you get a certain kind of concession, if you are a student going home you get a certain kind of concession, and that kind of stuff.

So, all these, what people nowadays call business rules, are described by the domain people. They are not worried about computers, computing, and programs and things like that; their task is only to provide the rules essentially. We could have a rule, for example, for a good student award, which might say that if the student has done all the assignments, and the student has done well in courses, and there are no proxies in the attendance, then he or she is a good student and we can give him or her an award. Of course, we do not do that, so you do not have to worry about it.

(Refer Slide Time: 09:40)

So, the basic idea behind rule based systems is this: a rule is a pattern-action combination, and there is a collection of rules, a whole set which the domain expert provides. The data pertains to the current problem that you are trying to solve, and the inference engine, which we will look at in some detail today, is the one which does the job of picking rules, applying them to the data, and arriving at the solutions essentially. This is one reason why some people have called these pattern directed inference systems; in fact, there is a book by this name, though it is a little old.

Now, the thing for you to observe, and we will do this over this lecture and the next, is that rule based programming can be seen as a different paradigm of programming in itself essentially. In imperative programming, which is what most people are used to, with languages like C and Pascal and so on, the task of the programmer is to give a control flow: do this action, then do this action, then do this action.

And of course, you may have more complicated things like conditional statements and branches and loops and all that kind of stuff, but it is the programmer who specifies what actions have to be done essentially. Now, you can in some sense port a rule based system to an imperative language. Remember that a rule based system basically consists of a large set of rules that the domain expert writes.

You could convert it into an imperative program: you could have a sequence of if then statements, if this then this, else if this then this, and so on and so forth. The control flow is fixed and rigid in an imperative program. In rule based systems, on the other hand, you can think of it as follows: you have this data...

(Refer Slide Time: 12:00)

in some pool and there are a set of rules hanging above this. And when I say hanging
above this, this has to sort of make you think about it slightly differently.

In the sense that there is no control relation between the different rules: you are not saying apply this rule first, then that rule, then that rule, and so on. You are simply giving, in each rule, a little modular piece of knowledge which says: if you see this pattern, then this is the action that you have to do with it essentially. And all the rules are kind of floating around.

So, this is what we would call a rule base, and what the inference engine does is that it will pick some rule, apply it to the relevant data, and do the relevant action; then it may pick another rule and apply it, and so on and so forth, and it will keep doing that. It is the inference engine, and we will look at this in a little more detail, which decides which actions are actually done, as opposed to imperative programming, where the programmer says do this action, then do this action, and so on and so forth essentially.
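
To make this match-act cycle concrete, here is a toy forward chaining loop in Python; it is our sketch, not OPS5's actual engine, and certainly not the Rete algorithm that a real inference engine uses. Each rule is assumed to be a pair (match, act), where match inspects the working memory and returns bindings or None, and act updates the working memory using those bindings.

    def forward_chain(rules, wm):
        while True:
            for match, act in rules:
                bindings = match(wm)
                if bindings is not None:
                    act(wm, bindings)     # fire the rule on the matched data
                    break                 # naive conflict resolution: restart
            else:
                return wm                 # quiescence: no rule matches

The loop keeps firing rules as long as some rule matches; deciding which matching rule gets to fire, the conflict resolution, is exactly where real engines are cleverer than this sketch.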

So, before we get into the details, I also want to highlight the difference between 2 kinds of reasoning; we have slowly been edging towards this, but we have not really done it so far.

(Refer Slide Time: 13:40)

One is goal directed, or backward, reasoning, and by this we mean that you reason from goals to sub goals in some sense. We have already seen an example of this when we were talking about planning an evening out: you say that the evening out is planned if you have an outing plan, and if you have a movie plan, and if you have a dinner plan. Those are the sub goals essentially.

So, you start with a major goal and then you come to the sub goals essentially. And there are a lot of people who feel that human beings, by and large, are goal directed reasoners; they are backward reasoners, they do not reason forward from the data. So, this is goal directed versus forward, or you might say data driven, reasoning. Data driven reasoning is some kind of an algorithm which looks at the data and says: if I see this pattern, I will do this action, and so on and so forth.

So, the decision of what to do next is driven by, is decided by, the data that is available to you; that is the form we will be looking at today. Goal directed reasoning asks: if I have to achieve a certain goal, what sub goals should I achieve essentially? So, for example, you might say that if you want to get a scholarship into a good US university for doing post graduate studies, then what do you need to do?

You need to write the GRE and do well in it, you need to get recommendation letters, you have to write an excellent statement of purpose, all that kind of stuff. These are sub goals, and to achieve them, to get good grades, what do you have to do? You have to attend classes and study hard and write exams well and so on. So, if you go from goals towards sub goals, and eventually towards actions, then you can decide what are the actions that you need to do essentially.

Again, let us say that I have to plan a trip from here to Mandi. In goal directed reasoning I would say that first maybe I need to go to Delhi, and then from Delhi I need to fly to Chandigarh, or take a train to Chandigarh, and then take a bus or a car or something, and work out the details later essentially. How do I go from here to the airport? How do I book my ticket? The lower level details come later. In goal directed reasoning, you reason from the high level goals to the low level goals, and at the lowest level you have actions essentially. Now, there is a corresponding implementational comparison that you can make.

(Refer Slide Time: 16:45)

It is called backward chaining versus forward chaining. By backward chaining and forward chaining we are essentially talking about how the programs are implemented, whereas with backward reasoning and forward reasoning we are talking about what kind of reasoning you are doing. By and large, of course, backward chaining goes well with backward reasoning, and forward chaining goes well with forward reasoning.

But it is possible to do forward chaining and implement backward reasoning, and it is also possible to do backward chaining and implement forward reasoning essentially. Now, the combination of backward reasoning with backward chaining is one that some of you will recognize.

For those of you who have written programs in Prolog, you would recognize that in Prolog you are doing essentially backward reasoning; and if you view the ':-' sign in Prolog as a backward pointing arrow, which we will do when we look at logic a little later in this course, then you can see that you are essentially chaining from one side to the other. So, in backward chaining you are chaining from the right hand side to the left hand side, which means you match the right hand side. This is actually the right hand side, because the arrow is pointing this way; I could have drawn it like that, but in Prolog you write the goal first and then you write the sub goals. If you remember Prolog: if you want to sort an array, then you have to have a combination of sub goals from which the sorted array follows, and so on and so forth. So, it really depends on which side you match. If you look at this rule format, LHS and RHS, then in backward chaining you are matching the RHS and moving to the LHS; you are asking: if this has to be true, can I show that that is true?

(Refer Slide Time: 19:08)

In forward chaining, I match the LHS and go to the right hand side: if I see this pattern, then I do the actions. We will be focusing today on the forward chaining mechanism, because it is a very widely used technique in the industry. I gave you a couple of examples about banking and giving loans and so on and so forth.

(Refer Slide Time: 19:31)

But there is this whole field called business rule management systems. If you just look up this thing, BRMS, there is a whole community out there which is developing software for people in business, meaning it could be industry, manufacturing, banking, or anything.

The basic mechanism by which people who are domain experts exploit computing power is this: all you ask them to do is write the rules, and you implement the inference engine, which is a general purpose inference engine. Just as a search engine is a general purpose program, in a similar manner the inference engine is a general purpose program, which looks at their rules and the corresponding data and performs all the actions which are necessary to be done essentially.

Now, let us get down to some examples. The language that I am using, we do not really have to use it, I mean you may not have to implement in it, is called OPS5. This language was devised, I think in the 70s, at Carnegie Mellon University, which is one of the centres where a lot of AI related work was done in the middle of the last century.

(Refer Slide Time: 21:05)

And some people say that the O stands for official: it is the Official Production System language, and its version 5 became popular. It was developed by Charles Forgy, who implemented the algorithm that we are going to look at for his PhD thesis; eventually, of course, it became something commercial in nature. So, what we need to do now is describe this language, what its syntax is, and we will first focus on the patterns. So, how are patterns described in OPS5?

They follow something like an object oriented way of representing things, with what are like classes. The syntax that this language uses for a rule is a p, standing for production, followed by a name; but that is the syntax of a rule, and we will come to rules a bit later. First we want to talk about patterns. A pattern has a class name followed by attribute names, very much like a class. Remember that patterns are the things which constitute the left hand side of a rule, and each pattern is made up of a class name followed by a set of attribute names.

So, this is the data structure that we are using in this language OPS5. For example, if I am implementing a program to play cards, I might have a class name called card; then I might have the name of a card, for example ace or queen or 10; the suit of the card, for those of you who are familiar with cards, spades and diamonds and clubs and so on; the name of the player who is holding that card; and the rank of the card. By rank I mean the current rank.

So, in most games, for example, ace is rank 1 and king is rank 2 and so on, but as the game progresses these ranks might change, so I might want to store that essentially. This is a class description, and corresponding to this class description I may have several patterns.
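
In actual OPS5 such a class description is declared with literalize; for the card class it might look something like this (the comments are ours):

    (literalize card
        name        ; ace, king, queen, ..., 10, ...
        suit        ; spades, hearts, diamonds, clubs
        player      ; who currently holds the card
        rank)       ; the card's current rank in the game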

(Refer Slide Time: 24:31)

So, the patterns are the left hand side of a rule, and a rule is signified by an opening bracket followed by p, which stands for production, and then the production name, for example play any card. Let us say I just want to write a rule which says: play any card that you have essentially. In this rule I might have a pattern which says that you have a card, and I might just specify that much essentially. I will come back to this; next, let us just talk about the different kinds of patterns that we can have.

(Refer Slide Time: 25:18)

So, we can have just the name of the class as a pattern, which means that if the inference engine can see one data element in your data with the class name card, then that will match. Our task is eventually going to be to match the patterns and see which rules match; of course, there may be more than one rule which matches, and we will come to that later, and then decide what action is to be done.

So, for matching a rule we have to match its patterns, and each pattern should match some data element of its class description. So, if my class name is card, and its attributes are name, suit, player, and rank, then the pattern must be an instance of that essentially.

(Refer Slide Time: 26:12)

So, I can have, for example: card, name ace, suit spades. The class declaration we saw earlier is the metadata which describes the data; the data itself is written like this.

(Refer Slide Time: 26:48)

So, in OPS5 the data is in the working memory, which is called the WM.

(Refer Slide Time: 27:06)

And each of these is called a working memory element; this is just the terminology that they use. So, this is the working memory, and inside it, of course, there are these working memory elements, where each element is an instance of some class name; a pattern referring to it may be incompletely specified.

(Refer Slide Time: 27:29)

So, for example, here I have only said that there is an element which belongs to the class name card. There I am saying there is an element which belongs to the class name card, and the attribute name has value ace, and the attribute suit has value spades essentially.

Of course, I can specify other things also; any partial description can be there in the pattern (we are talking about patterns in rules essentially). I can say things like: card, name <n>. These angular brackets are used to distinguish variables from constants: ace is a constant, but anything within angular brackets is a variable, and a variable will match anything essentially. And I can have other kinds of relations; for example, I can say rank greater than 1, and so on. So, these are the patterns, and the left hand side of a rule is made up of a collection of such patterns.
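
Rendered in OPS5's actual syntax, where attribute names carry a ^ prefix, the kinds of patterns just described would look something like this (our rendering):

    (card)                              ; matches any card element
    (card ^name ace ^suit spades)       ; constants must match exactly
    (card ^suit <s> ^player <p>)        ; <s> and <p> are variables
    (card ^rank > 1)                    ; a predicate test on an attribute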

(Refer Slide Time: 29:06)

You can also have a pattern with a negative sign in front. Such a pattern should be interpreted differently: it says that there is no such data element in your working memory; for example, that there is no element which says that player <x>, where x is a variable, has a card of spades.

So, such a pattern would be used to describe the fact that a given player does not have any spade cards in his or her hand. This negation sign is basically the opposite of the positive pattern: the positive pattern says that a matching element must be present in the database, or the working memory; the negative pattern says that it must be absent from the database, and only then will the rule match essentially.

We will look at some examples. Now, actions: remember, we have patterns and we have actions. The 2 most important actions in OPS5 are these. One is called make, and after it you describe a working memory element; you can say, for example: card, suit spades, name ace, and so on and so forth. What does this action do?

This make action creates a working memory element and puts it into the working memory; or, if you want to call it a database, it creates a data element and puts it in the database. Corresponding to this, there is also a remove action. A remove action takes an argument, like 2, and this 2, as you will see in the example, refers to the second pattern in that particular rule. Remember, these actions are on the right hand side of a rule, and this action is saying: remove the data element which matched the second pattern. So, pattern number 2 matches some data.

That pattern will match something, some working memory element, let us call it x, and this action is saying: remove that from the working memory, or remove that from the data essentially. If you combine these, you get an action called modify. For example, as the game progresses you may want to change the rank of a card: let us say somebody has played the ace of spades, and now you want to say that the king of spades has rank 1; then you could say modify that particular working memory element and change the rank to the value 1, that is all.

So basically we have these 2 actions, make and remove, and these can be combined into a third action called modify: any modify action can be seen as a combination of remove and make, remove the old one and put in a new one. Apart from that, you have the standard actions that you may need, for example reading from a file and so on, which we will not go into here, because we want to focus more on how reasoning is done.

(Refer Slide Time: 32:54)

So, there are actions like read, write, print. There is even an action called halt, which says stop the system and exit essentially. We will not look into those details here. Let me now give you examples of a few rules, and then we will see how the system operates with these rules essentially. I will write these rules for a card game, and I assume that everybody has some inkling of what card games are like essentially.

(Refer Slide Time: 33:35)

So, let us say we have a rule called play any card, and the first pattern is, let us say, of a class name called turn, which I have not described; I have only described one class name here, which is card. But obviously, you will have many different class names in your system.

So, let us say that you are carefully keeping track of whose turn it is to play next in one data element, which belongs to this class, type turn, and it says: turn to play of some variable; we have not specified who, some player p essentially. And let us say that already somebody has started playing a particular suit; in many games you have to play a card of the same suit that somebody else has played. So, the next pattern is saying that the suit that is in play is s, some variable, and then: there is some player p, and this player has a card of suit s.

Notice that there is no sanctity of order here essentially. I may write name, suit, player, rank in any order; that does not matter, because we are pairing the attribute names with the attribute values. You can specify the attribute names in any order; you do not have to specify them in the order of the class description. So, here I said card with suit s, player p; that is all I am saying.

So, what I am saying here is: if it is some player's turn to play, and the player could match anything, and if the suit in play is s, and if there is a card which is of suit s and being held by player p, then that player p should play that card essentially. Let us say I use this make action for that. So, this is one rule, and it has 3 patterns on the left hand side, which involve 3 different class names.

Of course, we are interpreting it as a card game: if it is p's turn to play, and if the suit in play is s, and if p has a card of this suit s whose name is n, then make a working memory element called play, saying player p plays that card essentially. And then, of course, maybe we will have another rule which will take it and print it onto the screen or something like that; let us not bother about that. The whole rule is sketched below.
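
Put together in OPS5-like syntax, the rule might read as follows; the exact class and attribute spellings (turn, suit-in-play, play, and so on) are our guesses at what is on the slide:

    (p play-any-card
       (turn ^to-play <p>)
       (suit-in-play ^suit <s>)
       (card ^player <p> ^suit <s> ^name <n>)
    -->
       (make play ^player <p> ^suit <s> ^name <n>))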

(Refer Slide Time: 37:27)

So, let me give you another rule: if you want to play not just any card, but the highest card essentially, then what should the rule look like? The first 2 patterns are still the same as in this rule, but the other patterns will change.

Now, you want to specify that the card that you play must be the highest card of your suit, and we can do that by saying the following: next, card, player p, suit s. So, first of course we are looking for a card being held by this player, in this suit; let us say name n and rank r. Because we are interested in picking the highest card, we should talk about the rank, and we should say that there is no higher card essentially, which means we can say that there is no card of the same player p (the name we do not care about) of the same suit with a smaller rank. So, this is that negative pattern that I was telling you about. Between these 2 patterns we are looking at a set of data elements and picking the one with the highest rank, which means the lowest number: rank 1 if available, or rank 2, whichever is the lowest available.

So, we are saying: card, player p, suit s, name n, rank r, and there is no card with this player p of this same suit s whose rank is smaller than this r. So obviously, this r and this r must match to the same value. And you can write more rules depending on what kind of game you are playing and what strategy you want to adopt and that kind of stuff, but we will not get into those details here essentially.
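To make the effect of this positive and negated pattern pair concrete, here is a small Python sketch of what the two patterns compute together. The tuple representation and the data are illustrative only.

    # A sketch of what the positive pattern plus the negated pattern compute
    # together: pick a card for which no other card of the same player and
    # suit has a smaller rank number (rank 1 being the highest card).

    cards = [  # (player, suit, name, rank) -- illustrative data
        ("south", "clubs", "7", 8),
        ("south", "clubs", "queen", 3),
        ("south", "clubs", "jack", 4),
    ]

    def highest(cards, p, s):
        for (pl, su, name, r) in cards:
            if pl == p and su == s:
                # the negated pattern: no card of this player and suit
                # whose rank number is smaller than r
                if not any(pl2 == p and su2 == s and r2 < r
                           for (pl2, su2, _, r2) in cards):
                    return name
        return None

    print(highest(cards, "south", "clubs"))  # -> 'queen'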

But you should observe this combination: the positive pattern and the negative pattern together can be used to pick the highest rank. So, for example, I could use a variation of this rule for grading, and say let me pick the student with the highest marks and do something, and that kind of stuff. So, this can be done. To illustrate the power of this, let me show you another rule. Supposing that I have an array of numbers, where each element is signified by an index and a value.

(Refer Slide Time: 41:12)

And if I have a rule of this kind, with this pattern, which says array, index i, value n, and array, index j greater than i. So, I can do it like this: I can put this in brackets and say that the index value is j, and this value j is greater than this value i that we are talking about here, which means if I am talking about the 5th element here, I could be talking about the 7th element there, or something like that. Let us say value m which is greater than value n. So, that is the rule.

Then I do the following: modify 1, value m, and modify 2, value n. So, I have 2 actions and 2 patterns. The first pattern, let me read it out, says that the index value is i and the value is n. The second pattern says the index value is j, which is greater than i, and the value is m, which is greater than n. Then I am saying: modify the first pattern, that is, the data matching that first pattern, and change its value attribute to m, which is what I have taken from here, and modify the second pattern and change its value attribute to n essentially.

So, what am I doing here? I am swapping 2 elements if they satisfy this condition, right? Now, this single rule, if I let it loose on the data, will end up sorting the entire array essentially. So, just think of it: you have written algorithms for sorting elements; you have looked at all kinds of sorting algorithms and you wrote them.

(Refer Slide Time: 43:53)

But here is one single rule, this much, and this will do the job of sorting for you essentially. It will repeatedly apply; we will come to this repetition part in the next class, but it will repeatedly apply: if any 2 elements are out of place it will swap them, and it will keep doing that as long as this rule matches. Eventually of course, everything will be in place, and then this rule will no longer apply and your array will be sorted essentially.
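Here is a minimal Python sketch of that behaviour; the dictionary working memory is illustrative. Note that with the patterns written as above (the larger value at the larger index triggers a swap), the array ends up sorted in descending order.

    # A sketch of the one-rule sort: working memory holds index-value pairs,
    # and the single rule swaps the values of any pair where the larger index
    # holds the larger value. Each swap strictly reduces the number of such
    # out-of-place pairs, so repeating until no instance matches terminates.

    def rule_instance(wm):
        """Find one (i, j) with j > i and value(j) > value(i), or None."""
        for i in wm:
            for j in wm:
                if j > i and wm[j] > wm[i]:
                    return (i, j)
        return None

    wm = {1: 2, 2: 9, 3: 1, 4: 7}            # index -> value
    while (inst := rule_instance(wm)) is not None:
        i, j = inst
        wm[i], wm[j] = wm[j], wm[i]          # the two "modify" actions
    print(wm)                                 # -> {1: 9, 2: 7, 3: 2, 4: 1}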

So, this also gives you something to think about; you can write a program. In fact, OPS5 is a complete programming language like any other programming language, like Prolog or C++ or Java or Python or whatever. And you can write any program in any language, but in some languages some programs are simpler to write. So, this program of course is very easy to write, but what can you say about its performance? It will depend on so many other things, which we will address in the next class.

So, supposing you have many elements which are out of place. This inference engine here has to decide which instance of the rule to execute next, and therein of course lies the key to efficiency. If you exchange the correct elements first then you will do things faster, and so on and so forth. But of course, there is no way of knowing that here; still, it is a very simple program for sorting essentially.

So, the idea of rule based systems is that somebody sits and writes down these rules, and then we give them to an inference engine which basically does the inferencing for us. Which means basically it picks a rule, applies it, then picks the next rule and applies it, and keeps doing that till it either runs out of rules to apply or it runs into something like a halt statement, which we have explicitly given.

So, maybe if you have a certain goal in mind, and if that goal is achieved, if I can see the goal being achieved, then halt essentially. You do not have to keep running the program essentially. So, in the next class we will look inside this inference engine; it is a very well known algorithm which is very popular and has extensive use in the industry for developing these systems.

So, basically the idea in the industry is that we will write business rules; we will say when to give a loan, when to do this, when to do that, but your program must do the computations for us. Your program must accept the rules and find the solutions that we expect the program to produce. So, in the next class we will look inside this inference engine and see how it works essentially.

And we are going to focus on the forward chaining inference engine; with backward chaining engines you are already familiar if you know something called Prolog, and we might just look at that a little bit later on in the course. But for now we will focus on forward chaining essentially. So, I will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 31
Inference Engines

So, we are looking at rule based systems, forward chaining systems, and now we want to look at the inference engine.

(Refer Slide Time: 00:22)

So, the inference engine is basically a program which works in three phases. The first phase is called match, and what it takes as input is a set of rules and some data, and it produces a set of matching rules, along with the matching data, which in their terminology is called the conflict set. So, let us first see what match is doing.

(Refer Slide Time: 01:30)

So, if you have these rules; notice that we had written these two rules in the last class. This rule has three conditions: 1, 2, 3. This rule has four conditions: this is 1, this is 2, and 3, and 4. One of them is a negative condition. So, essentially a rule is a collection of patterns and actions. As far as matching is concerned, we are only interested in patterns. So, we can think of this as P 11, P 12, P 13; that is for the first rule. Then P 21, P 22, P 23, P 24 for this rule, for example, and so on.

So, up to P k r or whatever, depending on how many rules you have and how many patterns each rule has essentially. So, that is rule memory, and when Newell and Simon talk about these kinds of problems, they associate it with long term memory. So basically, rules capture the problem solver's knowledge essentially, which is always there with the problem solver, so you can think of it as long term memory. Whereas the actual problem that you are trying to solve sits in a memory which we call working memory or WM, which is also ordered. So, you have these data elements which we call working memory elements: WME 1, which is data item 1, then 2, and so on essentially.

Now, remember that the actions on the right hand side of rules are allowed to delete data, which means they might delete some working memory element. So, some things might vanish from the working memory. But initially let us assume that we have some capital N number of working memory elements. Now, this number can be thought of as a time stamp, which says when this working memory element was created, and as far as we are concerned time is just an ordering. So, this is the first element, second element, third element.

We will see that this plays a role in the control of the execution essentially. We can also think of this as short term memory, if you want to think of it in terms of how we solve problems essentially. So, we have these two memories: one is the working memory, which contains the working memory elements, and we saw examples of that; the other is the set of rules, where each rule consists of a set of patterns. So, this rule has 3 patterns, this rule has 4 patterns, and so on and so forth. And these are examples of such rules essentially.

Now, the first task is to match, which means look at all the rules, look at all the data, and compute which rule matches with which data. So, if you look at this first rule for example, it says play any card; it does not specify which card to play. Now, suppose you are just starting the game, and let us say one of the working memory elements is this here: turn to play S. S is the name of a player, so it is S's turn to play. So, there is a piece of data, and here there is a pattern which says turn to play of some variable name. So, this pattern will match with this data essentially.

And let us say this element is suit in play, let us say clubs. So, this is a working memory element which will match that second pattern, suit in play. This is a constant. The data will only have constants; variables are only in the patterns, and if a pattern has a variable it means it can match anything essentially. Whether this was clubs or diamonds or hearts, that rule would still match essentially. And then look at this third element: that player P, in this case P is S, has some cards of clubs; let us say this player has 6 cards of clubs. So, there would be 6 card elements, each saying this player is S and the suit is clubs.

Let us say the name is 3. So, he has the 3 of clubs, and the 7 of clubs, and the 9 of clubs, and the 8 of clubs, and the jack of clubs, and the queen of clubs. So, let us say this player has got 1, 2, 3, 4, 5, 6 cards of clubs essentially. That is effectively 6 working memory elements, and each element is saying something like: there is a 3 of clubs and it is held by S. Let us say S stands for south essentially. Now, you can see that the first rule will have 6 instances; for each of these 6 cards, 1 instance of that rule will match.

So, 1 instance will say: yes, it is south's turn to play, the suit in play is clubs, and he has got the 3 of clubs. Another instance of that rule will say: yes, it is south's turn to play, the suit in play is clubs, and he has got the 7 of clubs. So, like this, for each of these cards 1 instance of the rule will match, and there may of course be many rules in the system, each with some instances ready to fire essentially.

So, the task of match, this first part of the inference engine, is to compute this set of instances of rules and the corresponding matching data, and put it into this set which they call the conflict set. So, let us say I call this rule 1 and this rule 2; then my conflict set will have something like the following.

(Refer Slide Time: 08:53)

Rule 1 matches with the first, second and third working memory elements: first, second, third. Another entry in my conflict set would be: rule 2 matches with 1, 2 and, let us say, 4. Another one would be: rule 1 matches with 1, 2 and 5. Let us say 3, 4 and 5 are these elements.

So, this is the third one, the fourth one, and the fifth one. This is saying that this is an instance of rule 1; these are the time stamps we are using from the working memory. It is matching working memory elements 1 and 2 and 3. This one is matching the same rule with 1 and 2 and 4. This one is matching 1 and 2 and 5. And likewise for rule 2 and rule 3 and whatever rules we have. This set is called the conflict set, and the objective of the first phase of the algorithm, which is the match phase, is to compute this conflict set. Why do we call it a conflict set?

It is like each rule is clamouring to say: I will execute, I will execute, and so on. So, there is a conflict between all these rules. But you can execute only one rule at a time, assuming that we are talking of a sequential system here. People do talk of parallel systems in which parallel rule firing takes place, but we will not get into that here. What is the complexity of this task? How difficult is it to compute this conflict set, in terms of how many operations, how many comparisons you have to do?

So, the brute force algorithm would take the first pattern, which is P 1 1, which means the first pattern of the first rule, and try matching it against all these working memory elements. Then it will take the second pattern of the first rule and try matching it with all these things, then the third pattern of the first rule and try matching it with all of this, and then it will go to the second rule. In fact, here you do not have to distinguish between rules; we have just a set of patterns. So, it will try to match each pattern in this set with each of these elements, and at the end of it, it will compute the conflict set. Let us keep this in mind.
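The brute force computation can be sketched as follows in Python, reusing match_pattern from the earlier sketch. Negated patterns are omitted for brevity, and all names here are illustrative, not the actual OPS5 matcher.

    # A brute force match sketch: try every rule against every ordered
    # combination of working memory elements and collect every matching
    # instance of every rule.

    from itertools import permutations

    def match_rule(patterns, wm):
        """wm maps time stamp -> wme. Yield (bindings, stamps) for every
        way the rule's patterns match distinct elements."""
        for combo in permutations(wm.items(), len(patterns)):
            bindings = {}
            for pattern, (_, wme) in zip(patterns, combo):
                bindings = match_pattern(pattern, wme, bindings)
                if bindings is None:
                    break
            else:
                yield bindings, tuple(stamp for stamp, _ in combo)

    def conflict_set(rules, wm):
        # rules maps a rule name to its list of patterns
        return [(name, stamps, bindings)
                for name, patterns in rules.items()
                for bindings, stamps in match_rule(patterns, wm)]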

So, this conflict set is a set of rules which are ready to execute; we use the terms execute or fire, so a rule is ready to fire essentially. We give this conflict set to a step which is called resolve, and the work it does is to select a rule. So, there is another module of the program which looks at this conflict set and picks one of them to fire essentially. Now obviously, what this resolve needs is a strategy. What is your problem solving strategy: given that many rules are ready to execute, which rule should be selected to execute essentially?

(Refer Slide Time: 12:29)

Now, there are different kinds of strategies. So, one strategy, which is called LEX in ((Refer Time: 12:33)) terminology, or the lexicographic strategy, says: choose a rule that makes the maximum number of tests. And what do I mean by tests? By a test I mean each of these individual comparisons.

(Refer Slide Time: 13:02)

So, for example, I will match the class name with the working memory element; that is one test. Then I will match the value of this attribute to play with the working memory element; that is the second test. Third, fourth, fifth, sixth, seventh, eighth. So, this rule has eight tests that it is doing, and this rule does a few more tests. So, the LEX strategy says: choose a rule that makes the maximum number of tests essentially. So, what is this strategy essentially doing?

(Refer Slide Time: 13:37)

It is saying specificity. In other words, it is saying: choose the most specific rule which matches. Now, look at these two examples that we have.

(Refer Slide Time: 13:52)

In these two rules, one rule is saying that if you have to play a card of suit S, and if you have a card of suit S, play that. The other rule is saying: if you have to play a card of suit S, and if you have many cards of suit S, then pick the one which is the highest rank card and play that card. Now, given the data that we have; for example, I said that we have these 6 cards, so this will have rank 3, this will have rank 4, and so on and so forth; both rules are matching: 6 instances of this rule will match and 1 instance of this rule will match.

So, there are 7 rule instances which are trying to tell me which card to play essentially. 6 instances are saying any of these cards, and the seventh is specifically saying this card, the queen, which is of the highest rank. Why? Because we have said that the rank of this card is r and there is no card held by this player whose rank is higher than r, which means whose rank number is lower than r. So, this rule is more specific than this rule. So, this conflict resolution strategy, as it is called, which is specificity, will choose the second rule, whenever the second rule matches.

So, you can see that this strategy allows you to implement what we sometimes call default reasoning, which you would like in many rule bases essentially. It is like saying: one rule says if you are hungry, go to the mess and eat essentially. The other rule says if you are hungry, and you have lots of money, and you do not have any exam tomorrow, then go out to a restaurant and eat essentially. Now, the first rule is less specific than the second rule.

The second rule requires many conditions to be true: you should have money, you should be free, and that kind of stuff. If both rules match then the second rule will be selected; only if the second rule does not match, that means you do not have money, or you have an exam tomorrow, will the first rule alone match, and that will execute. So, you can see that the more specific the rule, the more we would like it to be selected, and specificity says exactly that: choose the rule which is making the greater number of tests out of the competing set of rules essentially.
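In Python, the choice can be sketched as follows, on top of the illustrative conflict set representation used earlier. Counting one test per class name and one per attribute is a deliberate simplification for the sketch.

    # A sketch of specificity: among the competing instances, pick the one
    # whose rule makes the most tests.

    def test_count(patterns):
        return sum(len(p) for p in patterns)   # "class" counts as one test

    def resolve_by_specificity(conflict, rules):
        # conflict entries are (rule_name, stamps, bindings) triples
        return max(conflict, key=lambda inst: test_count(rules[inst[0]]))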

(Refer Slide Time: 16:48)

Another strategy looks at the time stamps and is called recency. It says: choose a rule that matches the most recent data. And what do we mean by most recent data? It is basically the time stamp that we have set: the instance whose matched data has the highest time stamps. So, every rule instance will match a certain number of elements; remember, for example, this one matches 1, 2 and 3, this one matches 1, 2 and 4, and so on and so forth.

Whichever rule is matching the latest data, choose that essentially. What is the intention behind this strategy? It is to kind of maintain a flow of reasoning essentially. So, just imagine you are doing theorem proving: you have proved some lemma 1, then you want to use lemma 1 to prove lemma 2, and this strategy will allow you to do that, because lemma 1, whatever that lemma is, could be the latest entry into the database, and if a rule is matching that, it will automatically get selected. So, it sort of helps in maintaining a chain of reasoning.
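A sketch of this choice, using the same illustrative representation as before, where the second component of each instance is the tuple of time stamps of its matched data:

    # A sketch of recency: compare the time stamps of the matched data,
    # newest first, and pick the instance with the most recent data.

    def resolve_by_recency(conflict):
        # sorting each instance's stamps in descending order and comparing
        # the lists lexicographically makes the newest data win
        return max(conflict, key=lambda inst: sorted(inst[1], reverse=True))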

(Refer Slide Time: 18:10)

There is a further strategy which is called MEA, which stands for means ends analysis (maybe at some point we will discuss this), which is due to Newell and Simon, and it essentially combines these two by saying: recency of pattern one, and if there is still a conflict, then specificity. So, I should write else. By this I mean that you first look only at the recency of pattern one, and pattern one is the first pattern in every rule essentially. So, this is the first pattern, this is the first pattern, this is the first pattern, and so on.

So, look at the recency of the first element which follows after the rule name; as you can see, here all these are the same, but some of the rules may have higher recency there, and you use that essentially. If there is still a conflict, that is, more than one rule is in contention, then use specificity to choose between them essentially. That is why I have written else specificity. So, what is the intention behind this? Essentially, you can partition your rule set into groups which solve particular problems essentially. So, let us say you are making dinner. Then you might have one set of rules, say, for how to make sambar, and another set of rules for how to make subjee, and so on and so forth.

Now, if in each of these rules the first element says something like making sambar, or making subjee, or making chapati, or making rice, then the moment you create a context by saying, okay, now I am making rice, you will add that data element saying making rice. That will become the most recent, and all the rules which are concerned with making rice will get priority essentially.

And that is achieved by looking at the recency of the first pattern, the first pattern setting the context: making rice, making tea, or whatever essentially. The moment you have said this is my task, only those rules which have that as the first pattern will be in contention; the rest will not come into play. So, you can see it is a combination of these two things. It helps you keep focus on what you are doing, but it also tries to see which rules are best, and so on.

(Refer Slide Time: 21:10)

Now, one more thing is refractoriness. This simply says that the same rule cannot fire with the same data again essentially. This comes from some neurobiological studies which say that, you know, when neurons are firing, once a neuron gets a certain set of inputs it can only fire once with that input, and then there is a period of refractoriness in which it is not active at all essentially. But from our point of view, we can see that we do not want the same rule to fire again and again and again essentially.

So, let us say you are doing a classification task; say the task is to classify students into grades essentially. You have said: if a student has got more than 95 and he has done all the assignments, then give an A grade. And let us say this rule gets selected and it fires. So, you are already done with the task of giving the A grade, of classifying the student as an A student. You do not want that rule to fire again with the same piece of data.

And that is what refractoriness says: every rule can only fire once with that piece of data, which means, of course, with the same time stamps in the working memory. If you were to delete some data and add the same data again with a new time stamp, then it could fire; that is a different story.
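This bookkeeping can be sketched in a few lines, again on the illustrative representation where an instance is a (rule name, stamps, bindings) triple:

    # A sketch of refractoriness: remember every (rule, stamps) pair that
    # has fired and never fire the identical instance again. Deleted and
    # re-added data gets a fresh time stamp, so the rule may fire afresh.

    fired = set()

    def not_refractory(conflict):
        return [inst for inst in conflict if (inst[0], inst[1]) not in fired]

    def record_firing(inst):
        fired.add((inst[0], inst[1]))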

(Refer Slide Time: 22:54)

And then resolve selects a rule and gives it to a module called execute, and what execute does is apply the actions of the selected rule. So, remember that this resolve module resolves the conflict between all these rules and says: this is the rule that is going to execute. And this execute module basically takes the rule, takes its right hand side, which is the actions, and executes those actions.

So, remember what the actions were: things like make or delete and so on essentially. So, what is this doing? It changes the working memory. The effect of execute is to change the working memory. It may add some new data and it may delete some old data from the working memory.

So, now you have a new working memory, and now you have to go back all the way and do the match all over again. So, this is the algorithm; this is in some sense a high level algorithm. Actually, you do not implement it quite like this, and we will see how it is done, but you can think of it as working like this. First is the match phase, which looks at all the rules and all the data, and by that I mean everything in the working memory, and creates this set or this list of rules and the corresponding time stamps of the data elements they are matching.

Then resolve looks at this set and picks one element out of that and says: this is the rule that we will execute. And then, when execution happens, the right hand side of the rule may make some changes in the working memory. So, some elements are deleted, some may be added, and then we go back to matching again and selecting a rule, and so on. We keep doing this till some termination criterion, which could be that either the conflict set becomes empty at some point, which means no rules are matching, or we have an explicit halt statement somewhere in the program which says: if you see a certain pattern, then halt.
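Putting the three phases together, the cycle can be sketched as follows. conflict_set, not_refractory, record_firing and the resolve functions are the illustrative helpers sketched earlier; execute_actions stands in for make, delete and halt, and is not part of the lecture's notation.

    # A sketch of the match-resolve-execute cycle.

    def inference_engine(rules, wm, execute_actions, resolve):
        while True:
            conflict = not_refractory(conflict_set(rules, wm))   # match
            if not conflict:
                break                             # no rule applies any more
            chosen = resolve(conflict)            # resolve: pick one instance
            record_firing(chosen)
            if execute_actions(chosen, wm) == "halt":
                break                             # explicit halt statement

    # e.g. inference_engine(rules, wm, my_actions,
    #                       lambda c: resolve_by_specificity(c, rules))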

So, let us now discuss the complexity of this whole process. What can we do to improve it essentially? Now, empirically people have found which is the hardest part of this in terms of complexity; which one will take the most amount of computing time: match, or resolve, or execute? Look at the complexity of each of the tasks. Execute: what is it doing? In this example, the right hand side has only one action, which is to add one working memory element with this particular content. So, you can see it is only a little bit of work.

What does resolve have to do? Resolve has to look at this entire set of matching rules, the conflict set, and apply the chosen strategy, whichever strategy we have chosen, to select one of them. And what does match have to do? Match has to look at the entire set of rules, or the entire set of patterns we have, compare it with the entire working memory, and find out what the conflict set is. So obviously, this is the largest task. Think of the number of comparisons you have to do: if you have k rules, and each rule has let us say p patterns, that is k into p patterns totally, and each pattern may have a certain number of tests essentially; and then there are all these working memory elements, with each of which you have to match.

Remember that you want to find every rule that can possibly execute, every rule that matches. It is not that if you find one or two rules that match you are done. All the rules that match must be found; this conflict set must be exhaustive. It must contain all possible rules which are matching. So, this is really the most time consuming part of the algorithm, and people have observed that match takes up something like 80 percent of the computation time. So, just as they say in industry that most of the computing time is spent in sorting, for example, most of the time in a forward chaining inference engine is spent in matching essentially.

You could have other strategies which I have not mentioned here. If you know how Prolog works: Prolog will take the first rule that matches. It sort of goes down the program, and the first rule which matches, it applies essentially. It does not compute all possible matches and then select one. So, that is a different strategy; it is something like order of rules. Here we are considering rules to be floating in some space; there is no order, and any rule is as good as any other rule. So, look up some of these terms, either ((Refer Time: 29:17)) Charles Forgy or this algorithm which he developed, which is called the Rete algorithm.

(Refer Slide Time: 29:23)

This is what we want to study, but I think we do not have enough time today, so we will do it in the next class. Charles Forgy devised the Rete algorithm in 1979, in his PhD thesis at CMU, and subsequently it became a commercial product. Any of these business rule management engines you talk about uses some variation of Rete for matching rules, because that is really the hardest part of the job essentially. The original version is of course available to us: there is a paper written by Charles Forgy, and you can find it on the net, which talks about this Rete algorithm.

And then he went on to develop an algorithm called Rete 2, and from there onwards they stopped disclosing their algorithms to the public, because they had commercial interests. They had this company which was selling that algorithm and did not want to divulge the trade secrets, as they say.

(Refer Slide Time: 30:38)

And eventually he wrote an algorithm called Rete NT. Those of you who are familiar with Windows will see the influence of Windows: just around that time Windows NT came, it was a new thing, and so this Rete NT was the new thing. This Rete NT is supposed to be 500 times faster than the original Rete algorithm, which means it takes 500 times less time to do the same match essentially. And you can see what a great improvement that is: instead of having to wait a few minutes, you might have to wait just a second for your match to be done essentially. Unfortunately we do not know that algorithm; we will only look at the Rete algorithm.

Rete itself: the word rete is a Latin word which means net essentially. And what this Rete algorithm does is to compile the rules into a network, which we will see in the next class, to improve upon the efficiency of match. So, what we want to do today is to observe where the inefficiency comes from in matching these things. There are two sources of inefficiency; can you think of them? One is the following: we have many rules, and each rule has some number of patterns, and those patterns have to be matched with data. Now, look at the two rules that I have written.

(Refer Slide Time: 32:18)

We have this pattern, turn to play P. It will match a working memory element whose class name is turn and whose to play attribute can be anything; there must just be something in that field. It could be south, north, east or west, to use the terminology from card games. This rule also has the same pattern; so this one could be here, for example, and this one could be here. Likewise, as you can see, the second pattern is also the same: suit in play S, suit in play S. So, this and this are the same.

Why should we spend computation time doing this match separately? Why not do it once and for all: save the fact that this has matched, then tell this rule that it is matching and also tell this rule that its pattern is matching. If we can reduce that, we will save some time essentially. Not only that: in this third pattern certain things are common, card, suit, player. This one also has card, player, suit, name. It has an additional thing called rank, but at least for those 3 or 4 tests, we can share the tests.

So, the first thing that we would like to do is to share the tests that different rules are doing and different patterns are doing, so that as far as possible each test is made only once, or each pattern is matched only once. If we match this pattern once and then both the rules come to know about it, then we are saving time essentially. So, that is one source of inefficiency, or one avenue for increasing the efficiency of your match algorithm.

But there is another, which will occur to you if you look at what this execute is doing. What is execute doing? It is making some new working memory elements, or it is deleting a few working memory elements. So, consider a large working memory of this kind.

(Refer Slide Time: 34:43)

Let me rub this out. So, what might execute do? It might send a signal that this is to be deleted; I will use a minus sign for deletion here. It might say this is to be deleted, and it might say a new one is to be added. So, typically when a rule executes it will delete maybe a couple of elements from the working memory, and it may add a couple of elements to the working memory.

Now, what happens? After it executes, it goes back to the match, and at least if you look at it from the brute force point of view, it is going to match all the rules with all the data all over again. Does it need to do that? If you look at this first rule here, which says play any card, it is adding one working memory element essentially. And let us say I had 100 rules for playing cards, and for all of them I have done the match in the previous cycle, and all of them are matching with, say, 2 or 3 instances. So I have 3 or 400 instances of rules which I matched in the last cycle. What have I done when I executed this rule?

Let us say this one was executed for some reason, and I have added 1 working memory element to my working memory. So, why should I match those other 99 rules again? In this particular example, I am only adding 1 working memory element, and assuming that those 99 rules do not have negative clauses, and even if they have, why should I do that match all over again essentially? Rule number 2, rule number 3, rule number 7, they were matching with some data which has nothing to do with these 4 data elements that we are talking about.

Maybe some rule was matching a data element here, and here, and here. If it was matching in the last cycle, it will match in this cycle as well. So, essentially, what does the Rete net do? First, what is match doing? Match is taking rules and data, which is the working memory, and producing the conflict set.

(Refer Slide Time: 37:49)

The Rete algorithm is an algorithm which uses a structure called the Rete net; let me just say algorithm here. It takes as input changes in the working memory, which is what this execute action produces. It is changing the working memory: it is deleting a few elements and adding a few elements. The Rete algorithm takes changes in the working memory and gives you changes in the conflict set.

So, compare this with this match here. What match is doing is taking the full working memory, taking all the rules, and producing the conflict set, and it is doing this every time we go through this cycle; every time a rule fires it does the match all over again. What the Rete algorithm does is say: tell me what the changes in the working memory are, and it computes, as a result, what the changes in the conflict set are. So obviously, as you can imagine, this is much less work as compared to doing the full match all over again essentially. We only have to see the effect of these changes on the conflict set.

Maybe, because of adding this or because of deleting this, some rule which was matching earlier will not match now; I should be able to know that essentially. Likewise, if I have added this, some new rules may come into the conflict set, and maybe some rules might even go out of the conflict set, if there was a negative clause somewhere. I should be able to capture that essentially.
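The interface of this idea can be sketched as follows. The body below is a naive stand-in (it still calls the full conflict_set from the earlier sketch, and it ignores negated patterns, which can also flip matches); the real algorithm instead propagates the tokens through a compiled network, as the next class describes.

    # A sketch of the Rete idea at the interface level: consume tokens
    # describing changes to working memory and emit changes to the
    # conflict set, instead of rematching everything.

    def process_tokens(tokens, rules, wm, conflict):
        for sign, stamp, wme in tokens:   # sign is '+' (make) or '-' (delete)
            if sign == '+':
                wm[stamp] = wme
                # only instances that use the new element can enter the set
                for inst in conflict_set(rules, wm):
                    if stamp in inst[1] and inst not in conflict:
                        conflict.append(inst)
            else:
                del wm[stamp]
                # only instances that used the deleted element can leave
                conflict[:] = [inst for inst in conflict if stamp not in inst[1]]
        return conflict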

So, in the next class we will look at this Rete net, which is basically a sort of discrimination net; some people use other terms like many sorted network, or call it a generalization of the Trie structure, but basically it is a network which discriminates between different kinds of data essentially. You might even think of it as an extension of the binary search tree, which discriminates between only one kind of data, which is numbers; it sends you down one branch or the other.

The same principle holds, except that this is for multifaceted data, and we have different kinds of tests which are being done. We even have tests which say that the rank is less than r, or the rank is not equal to r, or things like that essentially. So, all those tests should be handled together. We will take this up in the next class, which is on Friday, and complete this part of the unit.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 32
Rete Algorithm

Let us begin; we are looking at rule based systems. As we discussed earlier, in a rule based system the program part, if you want to call it that, is written as a set of rules. Now, these rules can do all kinds of things. They can do what you would do in normal programming, for example, sort an array and things like that, but they are also instrumental in capturing what we would call expert knowledge, essentially. So, we catch a domain expert and try to get knowledge out of the expert, and put it in the form of rules. So, you could mix up rules of different kinds into one system. That is why, in the 70s and 80s, this was known as the domain of expert systems. It was quite an exercise to get to experts and elicit their knowledge from them. There was a whole exercise of what they used to call knowledge acquisition, where they had protocols for how to talk to an expert, and how to extract knowledge from them, and then put it in the form of rules, essentially. Eventually, as I said in my last class, all this technology has stabilized into something which is more popular among business users, because users in banks and other such organizations are able to express what they call business rules in a very simple language, and the rest of the program takes care of everything else, essentially.

(Refer Slide Time: 01:59)

If you remember, a rule based system consists of a set of rules, what we call a working memory, which basically holds the data, and an inference engine. The inference engine is a program whose job is to select rules, apply them to data, and repeatedly do that, essentially. So, we saw that what this does is a three stage process. We have match; what does match do? It takes inputs. This is the most important part of a rule based system: it takes the working memory as an input, and the set of rules as input, and produces what we call a conflict set. So, what is this doing? If you remember, each rule is of the form pattern 1, pattern 2, up to pattern n, and some actions on the other side. The actions were, as we discussed earlier, either making new data elements or deleting some. So, if you look at these patterns, this is what is used for matching the data. If you collect together all the patterns from all the rules, then you would have pattern 11, pattern 12, pattern, let us say, 31, pattern 32. So, we will have a whole list of patterns, and the working memory is an ordered set; ordered in the sense of time stamps. So, all these are working memory elements.

In the match phase, because we want to be complete, every pattern must be compared with every working memory element, because you want to produce all instances of rules which are matching, not just one instance. So, every pattern has to be compared with every working memory element, and that is what has to be done here. It is a humongous amount of work to be done. What this produces is a conflict set. The conflict set is a list of rules, or instances of rules, along with time stamps, like this 1, 2, 3 and so on; the time stamps of the data. Then, the task is to select which of those matching rules we have to fire. We discussed a few conflict resolution strategies, like specificity, and recency, and means ends analysis, and so on. So, this is given to a step called resolve, which selects one rule from the conflict set, along with its data; and then this rule plus data, where by data we mean the working memory elements, is given to execute.

If you remember, the effect of execute is either to make, which means to create new data elements, or to delete, or remove. These are the two principal effects we are interested in. Others, like read, print and halt, are immaterial from this point of view; and then this is repeated in a cycle, again and again. So, we have observed that this brute force match has two deficiencies. One is that some of the patterns that different rules are using may be shared, essentially. So, this pattern 1 may be there in many rules; pattern 2 may be there in many rules, and so on; but here we have listed them as, you know, patterns of rule 1, patterns of rule 2, patterns of rule 3, and so on. So, they are matched multiple times. They may even be shared partially, as we will see in an example today; even if they are shared partially, we should be able to minimize the number of matches that we do. Remember that these patterns are made up of class names and attribute value pairs. For every attribute there is a test condition, which could be equality, that it must match the data, or it could be something like greater than a value, or not equal to a value, and things like that.

So, that is one thing that we would like to do. The other thing is this: let us say some rule 3 is selected in the resolve phase, and let us say this rule 3 matches these pieces of data, say 3 pieces of data. It is possible that when this rule 3 executes, it may say delete this data; so let me put a cross here, delete this data, and maybe it might say add new data, or let me use the term make here. So, let us just take this simple example. We are deleting two elements; let us call these elements A and B, and we are adding a third element, which let us call C, essentially. That is the only change we are making to the working memory. Two elements have been removed and one has been added. So, what we expect in such a situation is that most of the other rules which were matching with some data in the last cycle will continue to match in the next cycle. We would like to avoid the trouble of matching them all over again.

(Refer Slide Time: 09:23)

So, these two deficiencies of this particular match algorithm are done away with by the algorithm that we are looking at today, which is called the Rete algorithm. It was, as I mentioned earlier, given by Charles Forgy in 1979 as a part of his PhD work. Based on this, he devised the language called OPS5 which, some people say, stands for Official Production System language. Remember that we also call rules productions, and all this was happening in CMU at that time. Today, we want to look at the details of this algorithm, the Rete algorithm, and its structure. So, let me also rewrite the rules that I wrote in the last class. Remember these card playing rules we have written.

(Refer Slide Time: 10:18)

This is OPS5 syntax, something like this. If you remember, the first element in this list is the class name of the data structure; the second element, the one with this symbol, is the attribute name; the third element is the value of that attribute; and you can have multiple attributes and their values. When we enclose something in angular brackets like this, it represents a variable essentially, which means it will match anything. There are no data types involved here. Let us say we also have a working memory element which says who has to play. So, this one says it is the turn of some player P; of course, this is a variable, so it could apply to any player. And card, suit s, name X: we are saying in this third pattern that there is a card. Whenever we have the same variable name in two different patterns in the same left hand side of a rule, they must match identically, to the same value essentially; they cannot match different values. So, if there is a card of the suit being played, whose name is X and which is held by player P, then this player can play this card.

I will not write the right hand side, because today we are only interested in the match part of the algorithm. Then, we had written another rule, p highest card. So, instead of playing any card, now I have a different rule which says, you know, in that suit, play the highest card. So, patterns 1 and 2 will be the same; I will not repeat them here, but this is 1 and this is 2; the numbering is just for our sake, not part of the language syntax. So, anyway, let me rub it off from here. The first was the same; the third one is a similar card: it must be of this suit s; it must have some name, because you want to use that name in the output, to say play this card; it must be held by this player whose turn it is to play; and this card has a rank, let us say R. Now, we are interested in playing the highest card. So, we have this extra bit of information in this particular pattern. Notice that patterns can select any subset of the class information.

I put this in brackets just to show that this is the condition we are looking at, and this is the negation sign. This should be read as saying: the first two conditions are the same, that the suit being played is s and it is the turn of player P; player P has a card in this suit s whose rank is R; and there is no card held by player P in this suit s whose rank is smaller than R, which means higher than R, we assume that. Then we will say play this card. So, in both instances, we will play this card of name X of this suit; play the jack of spades, something like that. So, let me just give you a third rule. Those of you who are card literate will recognize this as playing a trump card, when you do not have a card of the suit in play. So, the first condition is the same; the second condition is the same; we have a third condition, which is saying that the trump suit is t. Those of you who have played cards would know that there is something called a trump suit, and essentially the effect of calling a suit the trump suit is that its cards are always higher than the cards of other suits. But they can only be played when you do not have cards of the suit in play. Again, we are talking about a card here. So, I will not write the name and all, but what I will write here is that the suit is the trump suit. Everything else will be the same: there is a card held by player P, whose name is X, and the rest we are not bothered about here; but the suit that I am selecting is this trump suit T, and not the suit s which I have here.

To be explicit, I can even say, instead of writing just t, I can write t not equal to s; but it turns out that in this particular example that does not really matter, so I have not written it like that. But those are the kinds of things you can write as condition checks essentially; and then, no card. So, this is the other rule, which says that if the suit in play is s, if it is the turn of player P, if the trump suit is T, and this player P has a card of the trump suit, and he does not have any card in the suit which is in play, then we can play the trump card. So, we have just three different left hand sides that we are interested in, and we will see how the Rete net handles them. The Rete net is the structure which is maintained by the Rete algorithm. Rete is a Latin word which actually means net essentially.

(Refer Slide Time: 17:20)

The Rete net is made up of two layers; you might think of it as one network. On the top is a discrimination network, and we will see what its structure is. Below is an assimilative network. So, the top part of the network is what some people call a many sorted decision tree, or a discrimination network. The bottom part collects things together, and this is what gets updated. What do we mean by assimilative? That for every rule I need a set of patterns to match, and the bottom part will see that it gets those patterns matched together, essentially. How does this work? It basically works by inserting, at the top, these three tokens. So, when you say you want to delete this working memory element, we generate a token; let us say we call it minus a, where minus stands for the fact that we want to delete it; and one for this element b, which is minus b; and one for this element c, which is plus c.

So, as the algorithm progresses, every time it fires a rule it will generate some positive tokens and some negative tokens. Those tokens are inserted here, at the top of the network; these are, as I said in the last class, the changes in the working memory. These tokens capture what has changed in the working memory: which things have been removed from the working memory, and which things have been added to the working memory. What the Rete algorithm does is take changes in the working memory and produce changes in the conflict set. So, here are the different rules R1, R2, up to Rk; and the network, which you will see in a little bit more detail in a moment, is a compilation of the rules essentially. This network is defined by the rules that we have in the system, and it is basically a different representation of the same rules in a network form. The working memory sits in locations which are nodes in this network, as we will see in detail. So, the working memory is, kind of, distributed over this network, and you can imagine that tokens are flowing from top to bottom, just like, for example, in the very simple case of a binary search tree: you want to find out whether there is a record whose key value is 17; you put it in at the top, and you check whether the root node is greater than 17 or less than 17, and it sends it down one branch; you keep doing that, and eventually it filters down to the node whose value is 17. In our case also we are looking for tokens of a specific kind, but what we are looking for is tokens which match our rules, essentially.

So, our tokens will flow down this path; for example, for the first rule, we need three tokens. So, three different tokens, if they come there: this upper part is only to make the search efficient, just like in a search tree, and this lower part is to collect together the different patterns a rule needs. So, if we can somehow collect three tokens for this rule one, and the three tokens arrive here, then we will say rule one is matching. Once it is matching, and as long as these tokens are sitting there, it will continue to match essentially. Only when we make these changes, if something happens, may it go out of the conflict set, or something like that.

So, this gives us the conflict set, here somewhere. Remember, the conflict set is a collection of instances of these rules; each rule may have more than one instance with different pieces of data. It is a collection of instances of rules, along with the identifiers of the data elements they are matching, which are the time stamps that we use here. So, these are the identifiers; they basically tell you when this token was made. To start with, you can imagine that whatever data you have, you just put the whole thing into the net. Instead of just having a few changes, the whole working memory is the starting point, and it will go and figure out the different sets of rules; you will get the conflict set. Then, the resolve phase will select one of those rules in the conflict set, execute that, and generate a few tokens like this. Those few tokens will be put in here, and they will again trickle down this network and make some changes on the way; but otherwise, the rest of the match is, sort of, static in these networks. So, let us see what this network looks like. As I said, the top part is discriminative in nature, which tries to separate tokens of different kinds, so that you do not have to do a sequential match; that is the basic idea in many of these search algorithms. The first thing we test for is the class name. So, we have one class name here, suit.

(Refer Slide Time: 23:16)

So, let us say this is the root and there is a node; these first nodes are called alpha nodes. This is just some nomenclature, nothing to do with the game playing algorithms that we have been looking at. What alpha nodes do is this: they have exactly one parent, and they do a certain test, and the test is what we are writing in the box; at the first level it is a class name test, that the class name is suit here. They may have more than one child, depending on what the rules are saying. For example, here for the card we have some different rules, so you may have more children essentially. The second level tests some attribute of the class. Now, in our case the attribute is in play and our value is only a variable, so we do not really have much; but you can imagine that if we had one rule for spades and another rule for hearts, then you would have one branch for spades here and another branch for hearts here, and this test would be more specific. You could say, if the suit being played is spades, or the suit being played is hearts, then the rest of the network would be different.

In our example we have only a variable, so it does not matter; we have only one. Then, let us finish up the easier path first, which is the second class name, which is turn, and it also has only one value; see, in all the rules we have only one value, turn of P, turn of P. Again, you can imagine that if we discriminated between the different players, if we said turn of south, or turn of north, or turn of east, where north, east, south, west are the names of the players, then we would have different branches coming out of here. So again, we have only one branch, because there is only one pattern of this kind. Every path in this network represents a pattern. If we have patterns which are slightly different, then they will diverge, you know; one pattern will go this way and another pattern will go that way. So, it is a structure which is similar to the Trie structure; I do not know if you have studied the Trie structure.

It is a little bit like that essentially. So, all these are alpha nodes. Alpha nodes have one parent, and every node has a memory associated with it in which tokens can sit essentially. So, let us assume the memories are behind it, where we cannot see them properly. Then, the other kind of nodes are beta nodes. I will draw the beta nodes with circles, and what they do is pull together tokens from different patterns; tokens which are of different patterns essentially. In our example, there is no test to be done here; these are kind of independent, and all we are saying is that we should have one token of this kind and one token of this kind. Then we have these two patterns, at least; our first rule needs three patterns, and we have collected two. So, these are beta nodes. Beta nodes may have, or in the implementation that we describe here do have, exactly two parents. One could have more than two, but let us assume that this is like a join of two structures; we are only joining two patterns at a time essentially. So, if we have three patterns, then we will first join two, and then the third one, and so on. A beta node also may do a test, as you will see in a moment. So, let me do this ruff rule first, at this end. Obviously, one thing we need is card, because that is one of our class names, and the last one that we need is trump. It has one attribute called suit, and it has the value t. So, this last rule needs five patterns: one and two, which we have here; then one pattern will come down this path, which is the trump path; and then two patterns will go down this path. So, two patterns are talking about cards essentially.
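The two kinds of nodes can be sketched in Python as follows. An alpha node tests a single token against one condition; a beta node joins the memories of exactly two parents under an inter-pattern test, such as equality of a shared variable. All names and the dictionary token format are illustrative.

    class AlphaNode:
        def __init__(self, test):                # test: wme -> bool
            self.test, self.memory = test, []
        def insert(self, wme):
            if self.test(wme):
                self.memory.append(wme)          # tokens sit in the memory
                return True
            return False

    class BetaNode:
        def __init__(self, join_test):           # join_test: (left, right) -> bool
            self.join_test = join_test
            self.left_memory, self.right_memory = [], []
            self.matches = []                    # joined tokens collect here
        def insert_left(self, token):
            self.left_memory.append(token)
            self.matches += [(token, r) for r in self.right_memory
                             if self.join_test(token, r)]
        def insert_right(self, token):
            self.right_memory.append(token)
            self.matches += [(left, token) for left in self.left_memory
                             if self.join_test(left, token)]

    # Joining "turn of player P" with "card held by player P":
    turn_alpha = AlphaNode(lambda w: w["class"] == "turn")
    card_alpha = AlphaNode(lambda w: w["class"] == "card")
    join = BetaNode(lambda t, c: t["to_play"] == c["player"])

    for w in [{"class": "turn", "to_play": "south"},
              {"class": "card", "player": "south", "suit": "clubs", "name": 3}]:
        if turn_alpha.insert(w):
            join.insert_left(w)
        if card_alpha.insert(w):
            join.insert_right(w)
    print(join.matches)   # one joined pair: the turn wme with the card wme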

So, one of them says player; I have not written that there, but it is a part of that pattern, and this P must match this P, as you will see. So, card, player P, suit t. All we are saying now is that if this player has a card of this suit t, then he is allowed to play it. So, we join together these two patterns by a beta node, and we put a condition that this t, which is coming from this side, and this t, which is coming from this side, are equal. So, we are using the equality condition, because we are just identifying the two; see, I have used the same variable names, but they do not have to be the same. There could have been different variable names, but this test is saying that in this case the value is the same. So, let us see.

631
Let us say I use the variable s here; then I would have to say this s equal to t, but let me just keep to the simple thing; see, this is t. We will look at the other one in a moment. So, it has got four of these things, but it still needs one pattern, which says that the player does not have a card of this suit s. Now, we want to do a match which says that he does not have such a card, or that such a data element does not exist. That can be done in a number of ways. Remember that we said every node is associated with a memory essentially. Now, I will just use some symbol which, I will claim, is a symbol for negation. The way to read this is that the token will flow down this path only if the count, and this is a bit weird, only if the count of the number of tokens passing down this path with this value P and with this value S is 0, essentially. I am just trying to use a simple way of talking about it essentially. What you really want to say is that if a token of this kind were to actually come here, then its effect on the rule will be that the rule will not match essentially. You can think of it like that. You can think of it as counting the number of tokens. What are these tokens? This is the pattern which says there is a card held by player P of suit S. Now, suppose the player had three cards of this suit; let us say we are talking about spades.

If the player had three cards of spades, then three tokens would come down here essentially, because they would be in the working memory. What this test is doing is counting how many tokens are coming here, and if the count is greater than 0, then it will say: no, this rule cannot fire. The other way to look at it is that if a token comes here, it will immediately disable the rule to which it is pointing. So, in any case, we need a join here, and the join should say that this S is the same as this S, so S equal to S, I will just write, and this P is the same as this P. By this point, we have looked at three patterns: two positive patterns and one negative pattern. In the other branch, we have looked at two patterns, both of them positive patterns. If all the five patterns match, which means the appropriate tokens come down here, then we can join them together, and we have to be careful here; so P is equal to P, for example, and that is the only thing you have to worry about; and here is the rule called ruff.

So far, we have drawn the network for only one rule, which is the third rule. The third rule has five patterns: one and two are the same as before, that the suit in play is s and that it is the turn of player P. The third one says that the trump suit is t. The fourth one says that the player has a card of t, and the fifth one says that the player has no cards of the suit which is being played, S, essentially. So, this no card condition is captured here, because of this symbol.

This captures the first two patterns, and this is the third pattern, the trump suit,
which defines the trump suit. This is the fourth pattern, which says that the player has a
card of this trump suit, because of the T equal to T test; and here, all the five patterns come
and sit together, and we will say this rule has now gone into the conflict set. Let us take the
first rule, which is the simple rule. All it says is: there is a card of suit S, name X, held by
player P. So, I could simply say it here. I need one more thing, which is the name. So, if
a pattern wants to come down this path, it would say that it is the card held by player P
(variable P), of some suit (variable S), with some name (variable X), and I
can join it with this, and put in all the conditions: that S equal to S, which means the S coming
from this side is the same as the S coming from this side, and likewise for the P coming
from this side, P equal to P; and I would have this first rule, for playing any such card.

Now, you can observe that the only thing I did at this moment, for rule number one,
was to add this box here and this beta node here (this is the alpha node and this is the
beta node), and nothing else. The rest of the match, the first pattern (the
suit being played is S) and the second pattern (it is the turn of player P), is already there in the
network. I have to just take a lead from there, and I have already joined them together.
Then, I take this third token from here, and at this point I have three tokens coming
together, which is what I need for the first rule. So this is intra-cycle saving,
so to speak: in the same cycle, I am matching this rule and this rule, but I am sharing,
in this network, the matching work which is required for those rules. Only some things
are different, which I do separately. If you take the middle rule, as you can
see, it also takes something from here, and it needs a card of suit S, name X, player P,
rank R. So we can take this from here. Let me use a different color. So, we can take this
from here and add another alpha node for rank R. I can take still one more from here,
and add another alpha node, which will say: rank less than R. Remember, R is still a
variable at this point of time. Then, I can combine them by saying R equal to R. So,
what is happening, if you look at this purple part, is that we are talking about two tokens
flowing down this path up to here. Actually, we do not need this name here, sorry. So, this
should actually come from here.

The card held by player P of suit S, whose name is X and whose rank is R; and somebody
should have pointed this out: I need this negation node here. So, in this rule, the third
pattern is flowing from here (card held by player P, suit S, name X, rank R), and it is coming
here. The fourth pattern is coming from here (card held by player P, suit S), then it is
coming here, whose rank is less than R; but then there is a negation here, which means
the player does not have a card of higher rank. And this I can combine with this.

With all that, P equal to P and S equal to S, I will get the second rule, which is
highest card. In this network, see, we started off by creating the network for the third rule.
Then, we added a few more edges and nodes for the first rule, and then a few more edges
and nodes for the second rule. Notice that there is only one branch going down for the
cards being held by players, which means that for every card that every player holds, a token
will go down this branch. The same token, if you look at this structure here: all you are
saying is that this is of suit S, and this is of suit T, where S and T are variables. So,
obviously, take every card; let us say it is spades. If there is a spade card held by this
player P, the token will come here, and it will also go here; but only if it matches
this will it match this rule number four; otherwise, it will not match rule number
four. And only if it matches the suit which is in play will the player play this card,
essentially.

The Rete network is a compilation of the rules. If you have this
network, then you do not need the rules, because it is just a different way of
writing those rules. The task of the software engineer here is to be able to view the set of
rules and construct the corresponding Rete network. That is the first
part, building the network, or compiling the program, as you might say; and then you put
in all the data from the top. This acts like a discrimination network: if the token, or working
memory element, is of class name card, it will follow this path; if it is of class name
trump, it will follow that path; and at each stage there is a test that it has to satisfy for it
to move on. This one (the negation node) has the opposite effect: if such a token exists, the
rule will not fire; if there is no such token coming here, then the rule will fire,
essentially.
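To make the compilation idea concrete, here is a tiny sketch of the discrimination behaviour (ours, in Python; the class names and attributes are illustrative assumptions, not OPS5 syntax): working memory elements are routed by class name into alpha memories, and a join node pairs tokens under an equality test like the t = t test above.

# working memory elements: (class name, attributes)
wmes = [
    ('card',  {'player': 'P1', 'suit': 'spades', 'name': 'ace'}),
    ('card',  {'player': 'P2', 'suit': 'hearts', 'name': 'king'}),
    ('trump', {'suit': 'spades'}),
]

alpha_memories = {'card': [], 'trump': []}

for cls, attrs in wmes:
    if cls in alpha_memories:              # the root discriminates on class name
        alpha_memories[cls].append(attrs)  # the token sits in this node's memory

# a beta (join) node pairs tokens from two memories under a test,
# here "the card's suit equals the trump suit":
joined = [(c, t) for c in alpha_memories['card']
                 for t in alpha_memories['trump']
                 if c['suit'] == t['suit']]
print(joined)   # only P1's spade joins with the trump declaration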

So, obviously, you can see that this rule needs four positive tokens, and one
negative pattern, which stands for the fact that there must be no such token. If a token
comes down this path, that rule will get disabled. Now, we can also understand what is
going to happen in each cycle. Whenever we execute a rule, we add the new
tokens and send them down the network; sort of, we push them in from here, and they will
travel some way down.

(Refer Slide Time: 42:10)

So, it is possible that we have created only these; let us say these two tokens, and
this token comes in later. Or let us say we have these four tokens, and let
us say that the suit being played is spades. In the last round, if the program played a
spade, then we will have a negative token saying that that card is no longer held; and if
that player had only one card of the suit, this negative token will
come here. It will cancel the positive token which was sitting there earlier, and then this rule
will certainly come into the conflict set. Essentially, the effect of throwing these tokens
down the network is either to activate some new rules or to deactivate some existing
rules.

Any questions at this point? I expect that if I give you a set of rules for some
domain that I define (you can look at some past papers, for example), you should be able
to construct the network and show where the tokens are. Of course, here we have not
created the tokens, so we do not know where the tokens will sit; and here, every test is a
test on variables, so it will not stop tokens here. But supposing this was specifically
spades; let us say this rule was only for spades. If I had spades here, then if a
token with hearts or diamonds or clubs comes, it will just sit here. It will never be able to
reach this place, because there is no rule to handle those kinds of tokens.
So, you should be able to say where the tokens are sitting. Now, as the last bit: we
started off by saying that this whole idea of rule based systems was motivated by a
cognitive approach to problem solving.

(Refer Slide Time: 44:40)

We also said that the rules, for example, are the long term memory of the problem
solver, because that is the knowledge which is acquired over a period of time, whereas
the working memory elements are the short term memory of the problem solver, because
that is the data that the problem solver is interacting with. There are cognitive models of
the brain which say that this is where you store short term data, and this is where you
store long term data. So, in some sense, this is the model for doing this kind of reasoning.
There are variations of this we can think of. For example, you can think of a language in
which you can create rules on the fly. That would be difficult to handle in
this, because the network that we have described is a static network, and we have
said it is a compilation of the rules that we have. How do we handle a situation where the
rules can themselves be added to or deleted from a knowledge base?

(Refer Slide Time: 45:42)

Finally, I should add that you should look up a system called Soar, which is a successor of
this language. Apart from this whole idea of using a Rete network: the
Rete network has become like a fixture in all these Business Rule Management Systems,
BRMS, that we mentioned last time. There is lots of software which is commercially
available for managing business rules, built on Rete networks or improvements of it. I also
said that Forgy created a newer version of the Rete network, called Rete NT, which is said
to be 500 times faster than the network that we have just described; but unfortunately, we
do not have descriptions of that network, because it is a trade secret and it has not been
revealed. Still, several improvements in terms of speed have been made to the Rete
network, and it is a big part of business rule systems in general. Soar, on the other
hand, was continued in the university environment, again at CMU, by a guy called John
Laird.

So, if you look up John Laird, or if you look up Soar, you will get more information
about it. John Laird also made an observation about four or five years ago. He said that
we all know that the computing industry is virtually driven by games; I mean,
most of the faster processors, for example, are developed because people want faster
games; you should be able to do things faster. But what John Laird observed was that
games, or at least the graphic capabilities of games (you know, the ability to render
scenes which are realistic, and the ability to have characters moving around which look like
realistic characters) have kind of saturated; there is not much improvement you can do
on that front. What he says is that the next level of improvement in games
will be in giving, quote unquote, intelligence to the characters. So, whatever avatars or
characters we have in a game: if they are intelligent, which means if they have
some knowledge of some kind, and they can do some useful, what you may call,
thinking, then that is going to be the next level of interest in games. He
is the same guy who is maintaining this Soar architecture.

So, those of you who are interested in games (I am sure most of you are), I would encourage
you to look at Soar. If you want to build a game which has intelligent characters
inside it, then this is a mechanism, a language, a kind of descendant of OPS5, which
allows you to express the kind of knowledge that you want to put into an agent inside a
game.

So, we will stop here with this. In the next class, I want to move to a different topic,
which is that of planning. We have talked of planning on and off, but now we will take a
closer look at it.

So, we stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 33
Planning

(Refer Slide Time: 00:15)

So, today we move on to this topic of planning. In the next 6 or 7 lectures, I will cover a
little bit of planning and a little bit of constraint satisfaction, which I think is enough for an
introductory course on AI. But next semester, Dr. Narayanaswamy and I are offering a course on
planning and constraint satisfaction, which will go into much more detail on these topics. So,
what we are doing here is just the basics; the more recent advancements we will probably do in
that course.

There is a nice book on planning, which we are not following here, by Malik Ghallab and
Dana Nau, called Automated Planning. They start the book with this definition of planning:
they say it is the reasoning side of acting. So, planning is concerned with actions. We have
dealt with actions implicitly on and off; for example, we looked at a heuristic for the
blocks world and talked about moving blocks around, but we did not really explicitly model
domains, and we did not explicitly model actions. We will do that in the planning part that
we are doing now.

Now, just imagine an agent. So, this is the world, and the agent is inside the world, but we will draw it
here. It perceives the world, deliberates, and then acts. This is the model of an agent in an
environment that we have. The agent perceives the environment or the world, does some
deliberation or some reasoning, as Ghallab and colleagues call it. Then, based on that reasoning, the
agent acts in the world. Of course, we as human beings do it all the time. We are
planning, consciously or unconsciously. Whatever we do, these are not random actions.
Almost everything we do has a certain goal in mind. We think about how to
achieve that goal, and our actions are oriented towards achieving that goal.

Goals, of course, can be long term or short term and things like that, but all the time in the
world we are planning. Now, in the world of computing, planning is becoming
more and more important as systems are becoming more and more autonomous. For
example, you must have heard about autonomous cars nowadays: there is the Stanford car,
and there are rallies for autonomous cars. You have to let the car loose, in some sense,
and it should be able to go to a destination.

Now, think about what is involved here. It is not just route finding, which is what we
have discussed to some extent; to drive a car there are many, many actions that you have
to do. You have to look out for other cars; you may have to slow down, you may have to brake, you
may have to accelerate. All these are actions, and somebody, some agent or
some program, has to decide which actions are to be done at what time. That decision
making, this reasoning side of acting, is what planning is all about.

So, we imagine that we have a planning system, and that interacts with what some people call an
executive, which interacts with the world. I am using the term executive because it was
used by NASA in one of their early systems, the Remote Agent architecture. You should
look up the Remote Agent on the web. It was an experiment done by NASA on a mission
called Deep Space 1.

Sometime in the late 1990s, they had this spacecraft flying somewhere in
space, going to some asteroid or planet or something. In between, for 2 or 3 days,
they removed ground station control from that spacecraft, which means they let it
function autonomously, deciding for itself what it would do in the next 2 or 3 days. During
that period they also deliberately introduced a fault in the spacecraft, which was used
to test their automatic fault diagnosis system (which, of course, we are not doing here); but
apart from fault diagnosis, planning was an experiment that they did.

Then, you must have heard about the Mars rovers, including the more recent rovers that have been sent
to Mars. Given the fact that it takes of the order of 20 minutes (I do not know the exact figure)
for a signal to travel from Earth to Mars, you cannot control a vehicle on Mars as you would be
controlling a vehicle, let us say, in your lab with a remote or something like that,
because the signal takes that much time. In such situations, it is inevitable that you have
autonomous planning built into the system.

So, both in the Remote Agent architecture and in the Mars rovers, what the scientists do is
give high-level commands: go and explore that area, or go and collect samples from
this area. Then the rest of the activity is planned by the system on board. These are, of
course, some extreme space examples, but planning happens all the time:
you know, scheduling, holiday planning; you go to some flight reservation
system, and it will tell you that if you want to go from here to some destination, these are the flights you
should take. All this comes under planning in some way or the other.

So, let us look at this model of what is happening in the world, and see to what degree of
complexity we can capture it. In planning systems we are going to do the
following: we are going to talk of a domain description, action specifications or action
descriptions (describing what actions are available to you), and of course
goal states.

(Refer Slide Time: 07:44)

We will look at this in a little while. The early planning systems were kind of domain-specific.
People would say, I want to build a system for planning in this domain or planning in
that domain, and so on. But following the theme that we have had in this course, which says
that a problem solver must be decoupled from the domain, we want to find general-purpose
planning systems, or, as this community calls it, domain-independent planning,
which means that we will devise a language to express all this.

This is like the domain functions that we talked about in search: the planner would be
independent of this. You would write the planner assuming that it would read a domain
description and a problem description or goal description, and the planner would
compile and run automatically. One of the earliest planners that was built was
called STRIPS.

(Refer Slide Time: 09:17)

We had possibly talked about it at least in the context of this robot called Shakey. If you
remember, Shakey was one of the earliest robots built in the computing
community, which could roam around the corridors of the Stanford Research Institute,
maybe looking for power points or something like that, where it could charge itself. It had an
on-board camera and it was on two wheels. If you just look up Shakey, you will see some
photographs; I had shown the photographs during my introduction as well. It was fairly
autonomous; of course, it could not do much, it could just roam around the place, and the
planner that it used was called STRIPS.

It stands for Stanford Research Institute Problem Solver, and it used the
earliest and simplest kind of language description that you can
use for planning. We will sort of look at this language to start with. To do all
this, people started devising languages, called Planning
Domain Definition Language, or in short PDDL, and there are versions of PDDL.
You start with PDDL 1.0, and we will look at only PDDL 1.0 here in this course, which is
basically what STRIPS does.

What happens when you go to higher levels of the language? Basically, the richness of the
language increases: you are able to describe the domain, as well as actions, of a more complex
nature. So, let us see what kinds of simplifications we are making in STRIPS. We will be
describing STRIPS domains now; it is a term that has become common for the
simplest kind of planning domains. We call them STRIPS domains because that is what the
STRIPS planner used, and I will describe them shortly.

These are the simplifying assumptions that we use. One is that the state
space is finite. So, you can still work with this idea of state space search that we started this
whole course with, where you are in some given state and you want to be in some desired
state. At that time we said there is a move generation function which takes a state and
gives you successor states. Now, we are saying that we have access to actions and we will
reason with those actions.

So, the first and simplest assumption is that the state space is finite. The second assumption that
we make is that it is fully observable, which means, just as we discussed when we were
talking about games (we said we have complete information games and incomplete
information games), a fully observable planning domain is one where the agent can sense the
entire world; the perception part of the agent is perfect. The agent can see the
entire world, and there is no missing information.

Each of the assumptions that we are talking about can be relaxed, giving richer, in some
sense, planning problems or planning domains. The third is that it is static. By static we
mean that the agent is the only one making changes in the world. There is no other
influence in the world: there is no cloud, there is no sun
moving around, there is no rain falling, nothing. The agent is the only one making
changes in the world, and such a domain is called static.

Obviously, when we talk about multi-agent systems, adversarial systems like the game playing that
we saw, then this assumption is violated, and the problem becomes
more complex. Then, the actions are deterministic. All these are simplifying
assumptions that we are making, and what do we mean by this? That when an agent does an
action, let us say pick up this watch, or pick up this remote, whatever action
the agent does, it happens in the real world as planned.

So, in a deterministic world, if I decide to throw this ball into the basket on a
basketball court, then the ball goes into the basket. Of course, the real world is not
deterministic; when you play basketball, you realize that it needs a considerable
amount of practice to get better at it. But we will make the simplifying assumption that the
world is deterministic, which means that whatever actions the agent does, they happen in the
real world as planned.

Just imagine what would happen if the world was not deterministic. You create a plan: you
will get out of this building, you will board your bicycle and go off to the hostel
and have some tea. Then, when you go down, you find that somebody has stolen your bicycle,
or maybe the bicycle is punctured or something like that; then your plan no longer works.
So, the effect of deterministic actions is that once you have made your plan, the plan can be
executed without any faults, because the world is fully observable, static and
deterministic. If you make a plan, the plan will execute in the world.

So, you do not have to worry about monitoring the plan and things like that. Nowadays, of
course, there is a big community looking at stochastic actions or probabilistic actions.
So, you must have heard about probabilistic planning. They follow an entirely different
approach to planning: what we are doing is something like search; they do something which
is quite different. So, we have these Markov decision processes and so on.

Dr. Ravindran talks more about such things in his course, and I think he is also offering a
course on probabilistic reasoning next semester. Then, simple goals: by this I mean that the
goal test function is applied only on the final state. You say that this is what you want to
achieve: you want to be in the hostel, eating a dosa and having some coffee,
and if that is the situation, then the goal is satisfied. That is what we call simple
goals: goals tested only on the final state. As opposed to this, there is a community.

If you look at higher levels of PDDL, for example, they talk about what we call
trajectory constraints, which means you have conditions on the path that you are
finding to the goal as well. The trajectory is the solution, and you have
conditions on the solution. For example, if you are planning a long trip, you
might say that at all points I must be within 5 kilometres of a hospital, because, let us say, your
friend is not too well or is susceptible to something. You say that my route must be such that
at all points I am within 5 kilometres of a hospital.

So, that is a trajectory constraint: you may be saying that I want to go from here to
Nagpur, that is the final goal, but I have constraints on the path as well. We will not look at
those; we will use only simple goals. Or you might have constraints like: every time you go into a
room and out of the room, you must switch the lights on and off, or something like that.
These are constraints on the path. Then there are also soft constraints.

Our goal description is rigid: only if those conditions are met will we say that the goal has
been achieved. You might instead have soft constraints or soft goals, which might say, I
would like to go to the market and, if possible, I want to go and see a movie, buy
all this stuff, and so on. You may have a plan, and you may accept a plan which does not
achieve all the goals completely, but achieves as many as possible. Then you would have
a penalty associated with what is missed.

So if, let us say, I drop this goal of buying a toothpaste, my plan is not as good as a
plan in which I had also bought the toothpaste. Then it becomes an optimization problem, because you
now have to find a plan which satisfies certain optimality criteria, as opposed to the
satisfaction criterion which says, if these things are true, then I have achieved my goal.
Here you are willing to relax it, and that is why it is called a soft constraint: you are
willing to allow for some things not to be achieved, and you would still accept the plan.
Obviously, with each such condition being relaxed, the problem becomes harder
and harder.

Then, a plan is a sequence of actions. We will assume that our plan is a sequence of
actions; we do not want more complicated plans, which have networks of actions and
things like that. And there is one more constraint, which is very important and is being relaxed quite
often nowadays, which concerns the notion of time. We will assume that we do not have a notion of
time to start with. We will just say action a happens, then action b happens, then action c
happens. There is only a notion of sequence, that actions happen in a sequence, whereas in the
real world, some actions may take more and some actions may take less time.

You may say: I will take a bath, then have a cup of tea, and then cycle to the gate. Each
of these actions may take a different amount of time. If you want to take that into account,
then what we have here, which is instantaneous actions, must be contrasted with durative
actions. By durative actions we mean actions which have durations, and then of course you
can even start talking about doing things in parallel, once we are talking about time and
durations.

So, you can say that while my sambhar is being cooked on the left gas burner, I will make
something else on the right one. So things will happen in parallel; this will take so
much time, that will take so much time. Then you might worry about the total amount of time
that my plan will need to execute, which is called the makespan in planning terms. Durative
actions bring in these kinds of situations.

One interesting problem was raised by a Ph.D. student from one of these
universities, I think Washington University. This was a problem which could not be solved
by the techniques that were there till about 2007, and it is the task
of repairing a fuse. It is a very simple problem. Let us say this is a duration,
whatever time t you need to repair the fuse. The situation is that the lights have gone out in the
apartment, you are repairing the fuse, and it takes you that much time.

So, the action is durative; it takes that much amount of time. You have one matchstick,
and the matchstick can give you light for so much time. So, this is
along the x axis, though the y axis is meaningless. Basically, these are intervals over time:
this takes so much time, that takes so much time. The task is to repair the fuse, but you need light
when you are inserting the fuse into the socket. So essentially, the matchstick must be lit in
such a way that it overlaps with the end part of the fuse repair action.

Now, the trouble with this (we are not going into details here) is that
when you are dealing with durative actions, you cannot talk about time advancing in
seconds or something like that; that would become simply unmanageable. So,
you do not do that. What you do is ask: can I do anything at this time point, when I am starting this action?
Maybe I can start another action, or something like that; essentially, can I do something
while this action is running. So, time jumps from the beginning of an action to the end of an action,
and at those points you plan to do something.

Now, the whole problem with this is that you cannot reason about it at this point, and you cannot reason
about it at that point; you have to sort of reason about it in such a way that this comes in here. So,
this gave rise to many interesting problems in how to build systems
which can do this. Now, of course, we have planners which can do this, and they would place an
action somewhere in between so that it straddles this one. So, durative actions, of
course, increase the complexity manifold.

One way that some people have tried to model durative actions is to split each into
two actions, a start action and an end action; but durative actions can have
conditions that do not fit that: something must be true throughout, for example. So,
if you are taking a bath, then you might have the condition that the light must be on
throughout, because you are afraid of the dark or something like that.

Now, if you are travelling from place a to place b, then at the end of the action you should be at
place b, at the beginning of the action you must be at place a, and in between you are at
neither place. These kinds of constraints on actions make the complexity of planning more
and more difficult. So, in some sense, I should put a cross here: we are not using
durative actions in STRIPS domains. These are the simplifying assumptions that you make in
STRIPS domains: actions are instantaneous, the world is fully observable, actions are
deterministic, the world is static, and we have simple goals.

Even in these simple domains, planning was shown to be PSPACE-complete. So, planning is one
of the hard problems that you are trying to solve, and in the worst case it will need an exponential
amount of time. Every time you relax these constraints, the problem becomes more
complex. So, let us look at how STRIPS describes actions. We will use the blocks
world; we have already talked about the blocks world and moving blocks around. We
will see how the blocks world is described in STRIPS, and PDDL is basically a kind
of standardization of what we are seeing here. First, we have the domain
predicates.

(Refer Slide Time: 27:26)

For example, we might say on(x, y), where x and y are variables, to signify that block x is on
block y. Then we can have, using the names once used in STRIPS, ontable(x), and then holding(x). We assume
that there is a one-arm robot which is moving the blocks around; that one-arm robot can
hold one block, and that is described by this predicate, holding(x). These x, y
would be filled in with constants in a real problem.
clear(x) says that there is nothing on top of x, which is a kind of short form for
saying that there does not exist a y such that on(y, x).

We will look at first-order logic a little later in the course, in case you are not familiar with
the language. So, whenever you define a domain, you first define the
predicates which will describe the situations. Instead of saying that we have some
state representation, now we are saying that you must define the predicates which describe
the state, and the state will be a collection of such sentences.

(Refer Slide Time: 29:35)

For example, if I have a situation like this: A is on B, which is on C, and so on. Then I will
describe it by saying on(a, b), on(b, c), on(e, d), ontable(c), ontable(d), ontable(f), clear(a),
clear(e), clear(f); and for my one-arm robot I need a predicate, which we will call arm-empty.
Either you are holding something, or the arm is empty. So, we can describe
this state using these predicates with plain values for the variables. Then we have the
operators or the actions. In this simple world we will assume that there are four actions; one
is pickup.
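Before turning to the operators, here is how such a state might look as data: a minimal sketch in Python, where we choose (as our own convention, not something STRIPS or PDDL prescribes) to encode each ground predicate as a tuple in a set.

state = {
    ('on', 'a', 'b'), ('on', 'b', 'c'), ('on', 'e', 'd'),
    ('ontable', 'c'), ('ontable', 'd'), ('ontable', 'f'),
    ('clear', 'a'), ('clear', 'e'), ('clear', 'f'),
    ('arm-empty',),
}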

(Refer Slide Time: 30:33)

So, in PDDL you first describe the predicates that you will use to describe the
world, and then you describe the actions. Actions are described by two things: one, what is
necessary for the action to be applicable, that is, what should be true for the action to be applicable;
and two, what will become true after the action is applied.

In some sense we have a left hand side and a right hand side, like in the rules that we
talked about earlier; but in this STRIPS kind of thing, we say that there are three lists: a
precondition list, an add list, and a delete list. This is the original way
in which STRIPS described actions. The precondition list contains those predicates which must be true
for the action to be applicable. So, if you want pickup(x) to be applicable, it should be true
that x is on the table and that it is clear, which means nothing is on top of it, and the arm must be
empty; I will just use AE as a short form for arm-empty.

If these three predicates happen to be true in my state (which means, of course, I just have to
check that the precondition list is a subset of my state description; the state description is the set
of predicates describing this situation), the action is applicable. For example, this state will have
ontable(f), and it will have clear(f) and arm-empty. Because these things are in the state description,
this action becomes applicable: I can pick up f, for example. That is the way the original STRIPS
description is given.

If you look at any textbook, you will see that we distinguish between the two actions in
which we pick up something from the table and in which we pick up something from the top of another
block. That will have a different precondition; just to illustrate what is happening, a block has to
be on something, either on the table or on another block. So, this pickup is only for
picking up things from the table.

The add list, as they used to call it, is the list of effects which will become true if this action
were to be executed; in this case, the only thing we put there is that the robot will be holding
this block x. The delete list contains those things which were true but will no longer be
true; that includes ontable(x) and arm-empty. It could also include clear(x), but as I
will discuss in a moment, you can either include it or ignore it; it does not matter. So,
corresponding to this, I have an action called putdown, whose precondition list says holding(x),
and whose add list contains ontable(x) and arm-empty.

Now, if I delete clear(x) in pickup, then I should add it back in putdown; if I do not delete
it, then I should not add it. Just think about this: either I must have it in both these
places, or I can leave it out of both. What is important is that when you are picking up a block,
there must be nothing on top of it, so I need clear(x) as a precondition at that point. It is
obvious that once you pick a block up and put it down somewhere else, there will be nothing on top of it
at that instant, so clear(x) will remain true after that.
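In the same tuple encoding as above, the two operators just described could be written down as follows. This is a sketch (with 'x' marking the parameter to be bound), following the convention of leaving clear(x) out of both the delete list of pickup and the add list of putdown.

pickup = {
    'pre':    {('ontable', 'x'), ('clear', 'x'), ('arm-empty',)},
    'add':    {('holding', 'x')},
    'delete': {('ontable', 'x'), ('arm-empty',)},
}

putdown = {
    'pre':    {('holding', 'x')},
    'add':    {('ontable', 'x'), ('arm-empty',)},
    'delete': {('holding', 'x')},
}

def ground(op, subst):
    # apply a substitution like {'x': 'f'} to every predicate in an operator
    bind = lambda pred: tuple(subst.get(term, term) for term in pred)
    return {key: {bind(p) for p in preds} for key, preds in op.items()}

For example, ground(pickup, {'x': 'f'}) gives the concrete action of picking up block f.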

(Refer Slide Time: 35:51)

Corresponding to these, there are two more actions, which we call unstack and stack. I will
just name the operators and expect you to describe them in more detail with their precondition
list, add list and delete list (a sketch of one possible working-out follows below). The only
difference between unstack and pickup is that pickup is from the table and unstack is from
some other object. So, you can unstack A from B, or you can unstack E from D. The conditions
are similar: A must be clear, A must be on B, and the arm must be empty to unstack it; and
likewise you can stack. In the more modern PDDL language, I would call these the positive effects.
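For reference, here is one possible working-out of that exercise, in the same sketch notation (x is unstacked from, or stacked onto, y):

unstack = {
    'pre':    {('on', 'x', 'y'), ('clear', 'x'), ('arm-empty',)},
    'add':    {('holding', 'x'), ('clear', 'y')},
    'delete': {('on', 'x', 'y'), ('arm-empty',)},
}

stack = {
    'pre':    {('holding', 'x'), ('clear', 'y')},
    'add':    {('on', 'x', 'y'), ('arm-empty',)},
    'delete': {('holding', 'x'), ('clear', 'y')},
}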

(Refer Slide Time: 37:06)

So, if my action is a, then this list is the precondition list of a, this is effects-positive of a,
and this is effects-negative of a. In the modern PDDL language we do not use the names
precondition list, add list and delete list. We say preconditions and effects, and effects can
be positive or negative. Positive means they become true after the action is applied, negative
means that they become false after the action is applied, and the two of them together are called the
effects.

So, we describe an action using its precondition list and its effects list,
and we basically now have a language to define any domain in which you want to do
planning. What we need to do is define a set of predicates which describe the domain,
which means describe the states of the domain, and then define the actions which can be done.
So, what is a planning problem? A planning problem is defined by a set of
operators or actions (a domain description), a start state and a goal description. So,
this is the start and this is the goal. For example, this could be my start state; I have not
written it in this language, but you can write it quite easily. My goal description could simply
be something like on(e, c) and on(c, f); this could be my goal description.

(Refer Slide Time: 39:22)

Observe, of course, that I do not need to completely specify the goal as a state. The start
state I do need to completely specify; we said that the world is fully observable, so
everything that is true must be given. But a goal does not have to be completely specified.
Your goal may simply be that as long as those two things are true, then I am happy with my plan.
I do not care where the other blocks are; as long as e is on c and c is on f, then I am happy.
So, e is on c, c is on f; this f could be on something else or it could be on the table, and
I do not care.

Now, of course, this is like what we do in the real world. We do not necessarily describe the
goal completely. We say, this is what I want to be true; I do not care about anything else.
So, the goal description is a set of predicates, and the start description is also a set
of such predicates, instances of predicates. When do we say that an action a is applicable?

(Refer Slide Time: 40:54)

If the preconditions of a are a subset of the state. It is just like saying that my preconditions
are true (and s does not have to be the start state; it can be any state). So, an action a is applicable
in a state s if its preconditions are part of the state, which means that they are all true.
We say that when the action is applied, we get a new state: we progress to a new state s',
where s' is given by the state we are in, minus the negative effects of a, union the positive
effects of a; that is, s' = (s - eff-(a)) ∪ eff+(a).

So, once you define states and actions, we know that if an action a is
applicable in a state s, then I can apply action a to state s to go to a new state s':
I remove everything which is in the negative effects of the action from the state and
add all the positive effects of the action, and I get the new state s'.
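As code, the applicability test and the progress function are one line each; a sketch over the same set encoding (op is a ground operator, e.g. ground(pickup, {'x': 'f'})):

def applicable(state, op):
    return op['pre'] <= state            # preconditions are a subset of the state

def progress(state, op):
    return (state - op['delete']) | op['add']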

A plan π is a sequence of actions. We have assumed here that our plan is a simple thing,
a sequence of actions; so a plan (we will use the term π) is a sequence of actions a1, a2, ..., an.
The first thing you want to do is write a small routine which will validate a
plan, that this plan does my task. What is the task? I have been given a
start state, I have been given a goal description, and somebody, meaning some program that I have
written, tells me that this is a plan. The program should be able to validate that this is
a plan. How do I do that? Before we do that, how do I do the goal test? How do I say that a given
state is a goal state?

The goal test is simple: g should be a subset of the state s, where g is the set of predicates describing the
goal. If g is a subset of s, then that s is a goal state. s describes the whole
state; we have just set some conditions on the state. It is like saying that I should
be sitting in the canteen and having a dosa; I do not care what everybody else is doing. If that
is true, then I am in a goal state, and that we can test by simply doing a subset
check.

Now, having a goal test function, suppose somebody gives me a plan π. How do I
know that the plan solves the problem? I must be able to write a small program to
validate it. What will that program do? It will progress over the states: it will apply the first
action a1 to the given state, then in the resulting state it will apply a2, which means it checks
whether a2 is applicable and applies it, and it keeps doing that till all the actions are applied.
Remember, we have this progress operator for moving from state to state. The last state that we
have progressed to must satisfy the condition that
the goal is a subset of the state.
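The goal test and the validator just described, again as a sketch building on the functions above:

def satisfies(state, goal):
    return goal <= state                 # every goal predicate holds in the state

def validate(start, goal, plan):
    # plan is a sequence of ground operators a1, ..., an
    state = start
    for op in plan:
        if not applicable(state, op):    # some action's preconditions fail
            return False
        state = progress(state, op)
    return satisfies(state, goal)        # the final state must satisfy the goal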

So, all we have done today is describe planning domains, so to speak. A
planning domain is described using a language called a Planning Domain Definition
Language. I have urged you to look at some website which will give
you the actual syntax of PDDL. Essentially, it consists of a set of predicates, which are used
to describe or define the domain, and a set of operators or actions, which are the
actions that you can do in the domain.

The real actions are the instances of these operators, if you want to distinguish between them.
Then you can describe a planning problem by saying: this is my start state. So, I can say on(a, b)
is true, on(b, c) is true, on(e, d) is true, ontable(f) is true, ontable(d) is true, ontable(c) is true, clear(a),
clear(e), clear(f) and arm-empty. If I state all these things, I have described my
state. I say my goal is that c must be on f and e must be on c; that is my goal description. So, I
now have a goal.

Now, all I need is an algorithm to find a plan, and then, of course, I also need to be able to
validate a plan. We will see that some algorithms will need this validation check, because they
can produce sequences of actions which are not necessarily plans. So, we need this
check. In the next couple of classes we will look at a few algorithms to find this
plan. What is the simplest approach you can think of? We are talking of a specific problem,
the blocks world domain, but this is just being used as an example to illustrate our
planning algorithms. Our planning algorithms have to be domain-independent: we do not
know what the predicates are, or what the actions are, but we should still be able to write an
algorithm that uses them.

There is, of course, a very nice solution here, which sort of defies the fact that I kept
saying planning is PSPACE-complete; in the blocks world domain, there is a very
simple algorithm. It says: first put everything on the table, which for n blocks will take you
at most some order-n number of moves. Just put everything on
the table, and then assemble the thing that you want to assemble.

You want c to be on f: pick up c and put it on f. You want e to be on c: pick up e and put it on c.
That can be done in linear time, but that is a very specific solution to a very specific
problem. What we are interested in is domain-independent planning, which means we do not know
the domain, we do not know the predicates, we do not know the actions. We have a language
to describe the domain, the predicates and the actions, and the start state and the goal state,
and then the algorithm should be able to operate.

The simplest approach you can take is the forward state space search that we have been
doing all along. Go to the start state and apply the move-gen function. How can you write a
move-gen function? You simply test for all the actions which are applicable, and they
basically constitute the move-gen function: they give you the actions you
can do. Keep doing that, and we also have the goal test function. So, the simplest is forward
state space search, but we will look at other approaches, and we will do that from the next class
onwards. A small sketch putting these pieces together follows.
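Here is the promised sketch: a minimal breadth-first forward state space search, assuming ground_ops is the list of all ground operator instances for the problem (how they are enumerated is left out here).

from collections import deque

def forward_search(start, goal, ground_ops):
    frontier = deque([(frozenset(start), [])])
    seen = {frozenset(start)}
    while frontier:
        state, plan = frontier.popleft()
        if satisfies(state, goal):
            return plan                              # a sequence of actions
        for op in ground_ops:                        # the move-gen function:
            if applicable(state, op):                # every applicable action
                nxt = frozenset(progress(state, op))
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [op]))
    return None                                      # no plan found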

We will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 34
Planning FSSP BSSP

(Refer Slide Time: 00:14)

So, we are looking at planning, and in the last class we had looked very briefly at this language called
PDDL; we talked about this Planning Domain Definition Language. We
are looking only at the simplest version, which we can call version 1.0, and which corresponds to
the STRIPS domain. In this simplest version of the language, actions are instantaneous, the
world is deterministic, the world is completely observable, the goals are simple in the
sense that only the goal state is looked at and nothing else, and the agent is the only one
changing the world. We took the first steps towards representation when we started
talking about a language like this Planning Domain Definition Language: we are
talking about how to represent domains and how to represent actions. So,
typically, if you look at a PDDL document, you would see that they start off by
defining the domain.

So, the domain that we are looking at is the blocks world domain; you would say the
domain name is blocks world, the predicates are on, ontable, holding and all those predicates,
and some of the actions we saw. Let me write a couple of the actions. Take stack(x, y): the
preconditions are the set of predicates which must hold for the action to be applicable.
Suppose I first write the preconditions of stack as on(x, y), clear(y) and arm-empty (AE stands for
arm-empty; we will use the short form). x and y are variables here in
the operators, but when we actually talk about a problem, they will get bound to constants
which stand for specific blocks, like a and b and c and so on. But, sorry,
somebody should point this out: it is wrong. It should be holding(x). If you are holding x and y is clear, then
you can stack x on y. The positive effects are on(x, y) and arm-empty, and the negative
effects are holding(x) and clear(y). There is one more positive effect which you can
add: clear(x). This is not precisely the syntax that PDDL uses; PDDL uses a
Lisp-like syntax, so everything is in brackets.

You say the preconditions, then the positive effects; and the negative effects are actually a continuation of
the positive effects with a not prefix. But we will use this simpler notation here. So,
basically, operators or actions are represented by these kinds of things: one is a set of
preconditions, holding(x) and clear(y) for example, and then the effects. What are the effects? Effects
are the changes that this operator makes in the world if the operator is
executed. What we are saying here is that stack(x, y) is possible when you are
holding x and y is clear. And if you execute stack(x, y), then x will be on y, the arm will be
empty (because you would have released x, in some sense), x would be clear, and it will no
longer be true that you are holding x and that y is clear. Similarly, we have defined
unstack(x, y), which is the inverse of this; the preconditions are on(x, y), arm-empty and
clear(x).

So, if x is on y, the arm is empty (which means the robot arm is not holding anything;
remember we are talking about a one-arm robot which is moving blocks around), and clear(x)
holds, which means nothing is on top of x, then the positive effects are holding(x) and clear(y).
You can see here there is a kind of symmetry with stack's clear(y); and the negative effects are
all these things: on(x, y), arm-empty and clear(x). So, in this version I have added clear(x) to the
effects here and there; if you add it in one, you must add it in the inverse as well, as we said earlier.
As I say, this is the simplest version of PDDL, PDDL 1.0, in which actions are instantaneous
and have the effects that we talked about. Just before we move on to the
planning bit, let me illustrate a couple of richer language constructs.

(Refer Slide Time: 07:00)

One thing which a richer language allows you to express is called conditional effects.
Conditional effects are effects which come into force only if certain conditions
are true. As an example of conditional effects, let us say that you have a
domain in which you are talking about moving objects around, and so on and so forth, and
you have trucks and that kind of thing. And you want to move, let us say, a table
from location a to location b. You define a move-truck action which says the truck
goes from, let us say, location x to location y. The conditional effect of that is that if
there is any object in that truck, then that object also moves from location x to y.
Or look at something like the briefcase domain, where a briefcase
goes from location x to location y.

Let us say you have a bag that you carry from the department to the hostel; then, if there is a
book in it, a conditional effect will say: if z is in the briefcase, then z also goes to location y. I
was writing it in sort of semi-English, but the basic idea is that the act of moving a
briefcase from location x to y has a conditional effect, in the sense that whatever is inside
the briefcase will also get moved to the target location. Of course, there is still no
nondeterminism in this: the effects are deterministic, and every action has very clearly well
defined effects; but that said, it is a richer language. A small sketch of such a conditional effect
follows.
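Here is a sketch of how a conditional effect could be realized when applying an action (our own illustrative encoding, not PDDL syntax): moving the briefcase moves whatever is in it.

def move_briefcase(state, x, y):
    # unconditional effect: the briefcase itself moves from x to y
    new = (state - {('at', 'briefcase', x)}) | {('at', 'briefcase', y)}
    for pred in state:
        if pred[0] == 'in-briefcase':    # conditional effect: if z is in the
            z = pred[1]                  # briefcase, then z moves to y as well
            new = (new - {('at', z, x)}) | {('at', z, y)}
    return new

s = {('at', 'briefcase', 'dept'), ('at', 'book', 'dept'), ('in-briefcase', 'book')}
print(move_briefcase(s, 'dept', 'hostel'))   # the book arrives at the hostel too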

Another feature which richer languages have is durative actions, which are being explored
quite a lot nowadays. As we said in the last class, STRIPS actions, or PDDL 1.0 actions, have
no concept of time in them at all, because actions are instantaneous. So, whether you are picking
up an object, putting down an object, stacking an object or unstacking an object, we just assume
that it happens in an instant; and once an action has happened, its effects are seen immediately.
But in the real world, of course, actions are not instantaneous. If you go from here to the hostel,
it will take a certain amount of time, and that must be taken into
account. So, durative actions are represented by durations or intervals: this is the start of
the action, and this is the end of the action.

For example, consider this action of going from here to the hostel. The action
starts at a certain time, and it has a certain duration, a certain amount of time t
that the action takes. Now, this time t could be a constant, or it could be a variable; for
example, a move action in the real world: to go from here to the hostel may take you 10
minutes, to go from here to the department may take you 5 minutes, and to go from
here to the gate may take you 15 minutes. So the duration could be dynamic in that sense.
So this is the start of the action, and this is the end of the action. The action is
defined over a duration, and such actions are called durative actions.
What you have is start conditions (or let us use the term preconditions, as
everyone does) and then start effects.

And you have end conditions, and you have end effects. What do we mean by this?
Let us say you want to go from here to a lab in the department, and we have defined
an action for that. The start precondition is that you should be here, and maybe that you have a
bicycle or something, whatever other preconditions we have here. What are
the start effects? Start effects are the effects which come into play as soon as the action
begins. So, the moment you start going towards the lab, you are no longer here.
That is a negative start effect. What are the end conditions and end effects? The end may
have conditions too: an end condition could be, for example, that the lab is open; if
the lab is open at that point of time, then the action will execute correctly. And the end
effect could be that you are inside the lab, or something like that. And then we have
overall conditions, which are predicates that must be true throughout the action.

So, supposing you are cycling from here to some place, then an overall condition could
be that at all points your cycle's tyres have enough air; if the condition fails at any time,
then the action will no longer work. Now, as you can imagine, once you start talking
about durative actions, things become much more complicated. You have to talk
about start conditions, start effects, end conditions, end effects, overall conditions,
and so on and so forth. So reasoning becomes much more complicated, and you can
imagine that one action may be like this and another action may happen like this; and then
there are different kinds of relations between these intervals that you have to explore.
There is a well-defined study of this, Allen's interval algebra, which basically looks at
all the relations which two time intervals can have (or any two intervals; they do
not have to be over time). This is, for example, saying that the second interval begins before
the first interval ends; and you can imagine that there are many different possibilities.

They can happen simultaneously; the second one can start immediately after the first
one; the second one can be contained totally in the first one, or the second one can totally
contain the first one. There are 13 such relations, and there is an algebra that talks about
these interval relations; we will not get into it here. The third feature that is often
introduced is metric fluents. These are numerical values which can take
different values; to take an example to illustrate this: instead of a
bicycle, let us say you are using a motorcycle. Then one of the conditions that you will
need is that at all points you must have a certain amount of
petrol. A metric fluent would keep track of the amount of petrol that
you have in your motorcycle, and it could say something like: if you have x amount
of petrol at the start, then at the end of the action you will have x minus some k amount of
petrol. That is an end effect of the action, and that introduces a third
dimension, which is that of continuous numerical values. But in this
course we will not look at these richer domains, and we will focus on the basic planning which is
done using the STRIPS/PDDL 1.0 kind of domain.
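Though we will not get into the algebra here, a tiny sketch of classifying some of the 13 Allen relations between two intervals (s1, e1) and (s2, e2); the relations not listed are the inverses of these.

def allen_relation(s1, e1, s2, e2):
    if e1 < s2:                    return 'before'
    if e1 == s2:                   return 'meets'
    if s1 < s2 < e1 < e2:          return 'overlaps'
    if s2 < s1 and e1 < e2:        return 'during'
    if s1 == s2 and e1 < e2:       return 'starts'
    if s2 < s1 and e1 == e2:       return 'finishes'
    if s1 == s2 and e1 == e2:      return 'equal'
    return 'inverse of one of the above'

# the match-stick example: the fuse repair (0, 10) overlaps the lit
# match (8, 12), so there is light when the fuse goes into the socket
print(allen_relation(0, 10, 8, 12))    # 'overlaps'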

(Refer Slide Time: 16:08)

So, we had define something’s we at said that an action A is applicable if thus anyone
remember the pre conditions of A so will use this as a predicate. This stands of a
conditions of A so this basically stands this set you is a subset of S. So we should say
applicable in state s. So, with this planning as you will notice we have set of making a
gradual transaction into representation. So, for we did things like search, but you do not
bother about representation. Now, we a talking about how to represent a domain, how to
represent actions and also we have keeping the reasoning part in mind. And the
reasoning part is at least to start with will be the same kind of reasoning which is
searching for solutions essentially. So, because we have this explicitly representations we
can talk about this kind of rule conditions.

So, an action A is applicable in a state s if the preconditions of A are a subset of s.
Likewise, when we say a goal g, we essentially mean a set of
predicates that we want to be true. So, for example, today we will look at an example where my
goal is the following: on(A, B) and on(B, D), which stands for the fact that A is on B
and B is on D in the blocks world domain. This thing is the goal, and this
comma is to be read as an "and", which means that both the things must be true: A must be on B
and B must be on D. So, a goal is basically a partial description of a state that
you want to be in, of the desired state. And we say that a state S satisfies
a goal g; remember, both of these are sets of predicates. A state is, for
example, what is true in the world: on(A, B), on(B, C), whatever; likewise, the goal is what you
want to be true in the world.

So, we use the word satisfies: a state S satisfies a goal g if, in the representation that we have used, g is a subset of S. You can see that this can be used to define the goal test function: have we reached the goal or not? We are working with simple goals here, and we want to see whether we have reached a state which is a goal or not; this test can be used for that. Besides testing whether an action A is applicable, we can define a function called progress, let us say, on a state S using an action A. What progress does is take a state S, apply the action A, and give you a new state, which, as explained in the last class, is S minus the negative effects of A, union the positive effects of A. So we have a progress function which allows us to move from one state to the next state essentially. We now have, basically, the machinery for doing state space search here: we can test whether an action is applicable in a state or not by using the applicability test.

We can progress from one state to the next state, so in effect we have the moveGen function: if we just select all the actions which are applicable in a state, we have the moveGen function. And then we have the goal test function here, which says that you can test whether a state is a goal or not essentially. So forward state space search essentially does the kind of search that we have been talking about. A planning problem, then, is defined by a start state, as before, let us call it S, a goal g, and a set of actions essentially. Obviously, we have defined the domain using a language, and having defined the domain, we have defined the set of actions possible in the domain. A specific problem is when we have been given the start state S, a goal g, and the set of actions available to you, which can be used to go from start to goal essentially. A minimal sketch of this machinery follows.
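To make these definitions concrete, here is a minimal sketch in Python. States and goals are sets of predicate strings; the names Action, applicable, progress, satisfies and moveGen are illustrative choices, not from any standard library.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Action:
        name: str
        pre: frozenset       # preconditions
        eff_pos: frozenset   # positive effects (add list)
        eff_neg: frozenset   # negative effects (delete list)

    def applicable(a, s):
        # an action A is applicable in state S if pre(A) is a subset of S
        return a.pre <= set(s)

    def progress(s, a):
        # new state = (S minus negative effects of A) union positive effects of A
        return (set(s) - a.eff_neg) | a.eff_pos

    def satisfies(s, g):
        # a state S satisfies a goal g if g is a subset of S
        return set(g) <= set(s)

    def moveGen(s, actions):
        # all actions applicable in S; forward state space search picks
        # one of these and moves to progress(s, a)
        return [a for a in actions if applicable(a, s)]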

So, let us assume that we have some simple domain, and we want on(A, B) and on(B, D). Let us say the given start state is the following: A is on B, and let us say B is on the table; C is on D, and E is on the table, F is on the table; some arbitrary start state I have given. So let us say this is the start, and that is the goal that you want to achieve, which is that A should be on B and B should be on D essentially. Again, let us repeat that a goal is a description, and the description may be true in many states, as you can imagine. If you just have this A on B on D, then the rest of the blocks can be arranged in any way, and as long as those two predicates, on(A, B) and on(B, D), are there in the state, that state is a goal state; that is what we have said here, that a state S satisfies the goal g if g is a subset of S. Now we want to do this planning, and we start with forward state space search, because that is what we are ((Refer Time: 23:41)) familiar with.

So, how does forward state space search work? It is exactly like the state space search that we have been talking about so far: given a start state S, you can apply many actions. Instead of writing out the actions, let me depict each action by using these arrows. This arrow says that I can unstack A from B; that is one action I can apply. I can unstack C from D, I can unstack F from E, I can unstack I from H, or I can pick up a block. If you remember, there are 4 actions we talked about: stack, unstack, pick up and put down. The only difference between pick up and unstack is that unstack happens from another block and pick up happens from the table. So each of these arrows is a different action. The moveGen function in this particular state would generate these actions, 1, 2, 3, 4, 5, 6, 7 actions, and your search algorithm will have to pick one of them essentially. Let us say it picks this action for some reason. Now, what is the situation? The situation is that the robot is holding this block K; let us say it picked this action.

The robot is now holding this block K, and everything else is unchanged. So how do we define this new state? We define it by applying the progress function, which says that if I am doing this unstack(K, J) action, I must remove the negative effects of the action. The negative effects of unstack(x, y), where in this case x is K and y is J, are: on(K, J) is no longer true; armempty is no longer true, because you are holding K; and clear(x), which is clear(K), is no longer true. Then I must add the positive effects of this action, which are holding(x) and clear(y). So the new state will contain the fact holding(K), and because you have lifted K away from J, J will become clear, so clear(J) will be added to the state. That is the progress function that you have to implement to get this new state. Now, you might think that having done this action, the other actions are still applicable; sorry, actually they are not, but they would become applicable the moment the arm becomes empty. At this moment the arm is not empty; it is holding K.
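In terms of the sketch above, this unstack(K, J) step looks as follows; the state shown is a hypothetical fragment, not the full description on the board.

    unstack_K_J = Action(
        name="unstack(K,J)",
        pre=frozenset({"on(K,J)", "clear(K)", "armempty"}),
        eff_pos=frozenset({"holding(K)", "clear(J)"}),
        eff_neg=frozenset({"on(K,J)", "clear(K)", "armempty"}),
    )

    s = {"on(K,J)", "clear(K)", "armempty", "on(A,B)", "clear(A)"}
    s2 = progress(s, unstack_K_J)
    # s2 contains holding(K) and clear(J); on(K,J), clear(K) and armempty
    # are gone, so no other pick up or unstack is applicable until the arm
    # is emptied again.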

So the only actions you can do with this one arm robot in this state are, unfortunately, to either put K on the table or put it on one of the clear blocks. We can illustrate this by saying that from here the actions are: either put it on this, or put it on this, and so on. So I will have a whole state space here; it is too difficult to draw, but you get the basic idea that you move from state to state. Whichever state you are looking at, you apply the moveGen function, which looks for all the actions which are applicable, pick one of them, and move forward essentially. So this representation mechanism allows all of the search we have been doing; notice that we have not even given a name to the strategy. We have only said forward state space search, because we have not talked about what the strategy is: you could do depth first search, or you could even do DFID, or something like that. You could do anything that we were doing earlier essentially. So what is the difficulty with this? What is the drawback of forward state space search?

If you look at this given start state and that given goal, can you observe what this is doing wrong? When you make this observation, you must keep in mind that you are not solving only the blocks world problem. We are writing this as a generic algorithm for forward state space search, into which we can plug in the domain with its whole set of actions, or operators, and all the predicates; it is not goal directed essentially. In fact, I have purposely drawn it like this: the 3 blocks A, B and D that you want the goal to be true of are here on the left hand side, and this right hand side is stuff the goal actually does not care about. Of course, the real world is like that. If you are, let us say, planning for yourself what to do next, and you simply try to describe what is true at this moment in IIT,

you would have hundreds and thousands of statements: who is there in the office, who is there in the lab, who is there in the canteen, what is happening in this class; hundreds and thousands of statements. And then hundreds of thousands of actions would be possible essentially. When we look at this particular problem, we can see that there is no point in picking up A or I or K or L or M, because they are not going to help us achieve the goal, whereas something should be done on this part essentially. So the problem with forward state space search, and this arises from the fact that the state description can be large and the number of applicable actions can be large, is that there is a large branching factor. The number of actions which are applicable in any given state is just too many to consider by a brute force approach, and that is true in any real world situation essentially.

(Refer Slide Time: 30:22)

How do we address this problem? One approach is heuristic search, and that is something we will again not explore in this course, because we do not have enough time; you are most welcome to come for the planning course if it interests you. When we talk of heuristic search here, we are not talking in a domain dependent fashion. Earlier we said: for the 8 puzzle, this is a heuristic you can use; for the blocks world, this is a heuristic you can use, the number of blocks which are in the correct place, and that kind of stuff. But we do not want to define heuristics in this domain dependent fashion; we are interested in domain independent heuristics. The idea behind a domain independent heuristic is to solve simpler problems essentially, and the approach that we use in planning is to solve what we call the relaxed planning problem.

So one approach is to solve the relaxed planning problem. What do I mean by that? I mean that if I am in a start state S and I have all these possible moves to successor states, let us call them S1, S2, S3, S4, S5, then I solve a relaxed planning problem going from S1 to the goal, S2 to the goal, S3 to the goal, S4 to the goal, and S5 to the goal. Look at the solutions produced; such a solution is called a relaxed plan. Look at the relaxed plans and see which is the cheapest relaxed plan, maybe in terms of number of moves. If S2 gives the cheapest relaxed plan, then I will say I am going to make this move from S to S2, exactly like in the heuristic search that we described earlier, where we said that we will apply the heuristic function to all these states and choose the one which seems to be closest to the goal. Except that here we are literally applying a problem solving strategy to solve from each of these positions to the goal, and then seeing which one seems to be closest to the goal.

Now, the first question you would ask is that computing the heuristic function should not be as expensive as solving the original problem by brute force, because if that is the case, what advantage is the heuristic function giving you? For the domain dependent heuristic functions, we had said that the heuristic is a static function: you only look at a state and do a computation of a value. For example, you compute something like the Euclidean distance or, as we said, the number of blocks in place, and so on; so they were cheap. In this case, we frame the relaxed problem in such a way that it is an easy problem to solve, which means computationally easy to solve. So if each of these states can be posed as a relaxed planning problem which can be solved in polynomial time, then we will just need k times that polynomial time to decide on which one to use here.

Remember, we had said that even in this simple domain, planning is hard; in fact it is PSPACE-complete, which means it can be solved in space polynomial in the size of the problem, though not necessarily in polynomial time. And here we are trying to say that we will solve a simpler problem, but use it to estimate which of the successors is closer to the goal. Notice that this is being done in a domain independent fashion. How do we do this? The relaxed planning problem basically says: ignore the negative effects of actions; actions without their negative effects are called relaxed actions. So I just ignore this part essentially, and then you can see that, if you visualize it, this becomes a monotonic situation: you keep adding new predicates to the start set, and you are not deleting any predicates. The complexity of planning comes from the fact that you are deleting things: when you pick up K from J, K is no longer on J essentially. But if I leave on(K, J) in place in the relaxed plan, then I am monotonically increasing the set of predicates, and at some point I will get those two goal predicates into my set, and then I will say I have solved the relaxed planning problem.
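A minimal sketch of this delete relaxation, in the Python conventions used earlier: ignore the delete lists and count how many layers of action application it takes for all the goal predicates to appear. Since the predicate set only grows, the loop runs at most a polynomial number of steps; using the layer count as the heuristic value is just one simple choice, not the only one.

    def relaxed_plan_length(s, g, actions):
        # relaxed actions: negative effects are ignored, so the predicate
        # set grows monotonically and the loop must terminate
        layer, steps = set(s), 0
        while not set(g) <= layer:
            new = set(layer)
            for a in actions:
                if a.pre <= layer:
                    new |= a.eff_pos        # add effects only; no deletes
            if new == layer:
                return float("inf")         # unreachable even when relaxed
            layer, steps = new, steps + 1
        return steps

    # Among the successors S1 ... S5 of the current state, move to the one
    # with the smallest relaxed_plan_length(Si, g, actions).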

So it turns out, and you should take my word for it, that solving relaxed planning problems, where the negative effects are ignored, takes polynomial time essentially, and heuristic search in planning essentially uses that approach. There are one or two other ways to compute heuristics, but we will not go into that. What we will do instead is to focus on doing what we call backward search. We already have some exposure to backward reasoning; we said that ((Refer Time: 35:53)), for example ((Refer Time: 35:36)), with goal trees you work from goals backwards. But there the representation was tailored to backward reasoning: we said that this is how you decompose a given goal into sub goals, and then those goals into more sub goals, and eventually you have trivial sub goals to solve. Here we are not changing the representation, but we want to do backward reasoning. So we need a few new things, because that machinery will no longer be applicable.

(Refer Slide Time: 36:30)

We talk about something called relevant actions. We say an action A is relevant for a goal g; this is, in some sense, the counterpart of applicable actions. In forward state space search, actions are applicable in a given state; in backward search (we use the term state space search here again), actions are relevant to goal descriptions. We say that an action A is relevant for a goal g if the following is true; let me write it. First, effects+(A) ∩ g ≠ φ: the action has an effect which is required to be true in the goal. So what we are simply saying is that the intersection of the positive effects of the action and the goal must not be empty; it must produce something which is relevant for the goal, and then we say action A is relevant. But there is another thing you want, which is that effects-(A) ∩ g = φ: the action does not produce any negative effect which destroys something you want to be true in the goal. Remember, the goal is the set of predicates that we want to be true.

And you are looking for actions which will achieve the goal essentially. So we say that an action A is relevant to a goal if the positive effects of A have something for the goal, which means their intersection with the goal is not empty, and the intersection of the negative effects of A with the goal is empty essentially. Now, we have 2 predicates in our goal, and you can see that 2 actions are relevant essentially. Can you tell me what the 2 actions are? I am moving from goal to sub goal now: I am saying that if I want this to be achieved and this is my last action, then I will achieve it. But I am only using this definition of relevance; I am not doing any deep reasoning here, just applying this test to see which actions are relevant. Anyone want to hazard a guess?

Student: Stack A, B

Stack A on B, and one more: stack B on D. Now you can see the principal advantage that backward state space search gives us over forward state space search; it is in the branching factor, because backward state space search is focused on the goal. It is only looking at actions which will achieve the goal, and in this example you can see the branching factor is small: we have only 2 actions that we can consider, whereas there, many actions had to be considered. That is the main advantage of doing backward ((Refer Time: 39:52)) reasoning. Then we have to talk about regression. To regress a goal g over an action A means: how do you refine the goal into a sub goal? We get g' = (g - effects(A)) ∪ pre(A). We remove the effects of A, because action A is producing those effects, so they do not need to be true in the previous goal description; and we take the union with the preconditions of A, because that action must be applicable in the previous goal state essentially.
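In the same Python sketch, these two definitions come out as follows; this is just an illustrative rendering, with set operations standing in for the board notation.

    def relevant(a, g):
        # effects+(A) ∩ g ≠ φ  and  effects-(A) ∩ g = φ
        return bool(a.eff_pos & set(g)) and not (a.eff_neg & set(g))

    def regress(g, a):
        # sub goal g' = (g - effects of A) ∪ pre(A)
        return (set(g) - a.eff_pos) | a.pre

    # Backward search: from goal g, pick a relevant action a, move to
    # regress(g, a), and terminate when the current sub goal is a subset
    # of the start state.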

So this is called regression. If you now regress the goal over stack(A, B), you can see that I must remove its effect, which is on(A, B), and I would be left with on(B, D); now I add the preconditions of that stack action, which are holding(A) and clear(B). In a similar fashion, on this side, regressing over stack(B, D) keeps on(A, B) here and adds holding(B) and clear(D): to stack B on D, I must be holding B and D must be clear. That is why I added them. So I am moving backward from the goal to sub goals in some sense, and in this manner I will search backwards. The main advantage that I am looking for is a lower branching factor, because I am only looking at actions which are needed for a goal to be achieved. But there is a problem here. Can anyone see what the problem is? Look at this second choice: if you visualize what you are doing, you have this A on B and B on D, and you are saying that my last action would be to stack B on D. If you try to imagine that, you will see that it is practically impossible, because you cannot be holding B when A is on B.

So if you look at these 2 predicates together, they are inconsistent; they cannot be true at the same time. Moreover, if I want to regress from here, I could take any goal predicate and pick an action to achieve it. I could pick an action to achieve this one, which is to stack B on D; I could pick an action to achieve this one, which would be to unstack B from somewhere or pick up B from the table; and I could choose an action to achieve this one, which would be unstacking something from D, unless, of course, clear(D) is already true. Let me choose the first one, for example: I say stack B on D, and when I regress over this I get holding(B) and clear(D), and then the preconditions of the other action, which are holding(A) and clear(B). So I will have holding(A) and clear(B), and this holding(B) and clear(D) come because of the stack(B, D) action. Again, you can see that this is an impossible state; we should say this cannot possibly be a state, whereas this one is a state, you will agree.

Because B is on D, and you are holding A, and you are about to stack A on to B; that is perfectly feasible. In fact, it will be part of the plan that you will be looking at. This one is not feasible: it says you are holding B and holding A at the same time, and you have this one arm which is trying to do both essentially. So backward state space search has the advantage that we get a low branching factor, because it is goal directed; but it has the disadvantage that what we get may not be states essentially. Now, the progress function is sound, and by sound I mean it is closed over the set of states, in the sense that when you apply the progress function to a state, what you get is a state.

In that sense it is closed over the set of states. The regress function is not sound: when you apply the regress function to a goal, you get a goal description which may not necessarily be a state essentially, which may not possibly be part of a state. For example, this one cannot be part of a state: how can we have on(A, B) and also holding(B)? That is not allowed in the rules of the game you have defined; you have a one arm robot which would be holding B, and because it is holding B, nothing can be on top of B, so this is not possible. So how do we get around this problem, given that regression is not sound? There are 2 things you can do, basically. First, when will we stop this backward state space search? What do we mean by that; what is the test that we will use?

Student: Preconditions of the actions applicable.

You terminate this when the g', whatever sub goal we currently have, is part of the start state. Remember that you are regressing from one goal to another sub goal, to another sub goal, and so on. If at some point you have a set of predicates which are all true in the start state, then you do not have to do any more actions; you are already home, so to speak. So this is the condition that you will use. But you should verify that it is possible to construct a plan like this using backward state space search which is nonetheless not a valid plan. So I can do a check for a valid plan. How do I do that? A check for a valid plan takes as arguments a plan π; basically, it takes a problem description and a plan, and the problem description is a start state, a goal description and a set of actions.

With this input, you can write a small routine which will begin with the start state; I will not write out the details, as we are running out of time, but a sketch follows this paragraph. It will begin with the start state and keep progressing with the actions. What are the actions? π is A1, A2, ..., Ak; the plan is of k actions. You begin with the start state and apply the first action, and we have seen that progress is sound, so you will get a new state; apply the second action to that, then apply the third action, and so on till the end. You keep applying the actions, and then you see whether the state that you reach satisfies the goal set or not. You will see that if you construct a plan like the one above, this test will fail: it will say this is not a plan. Why? Because, for example, this action will not be applicable at all; you will never reach a state where you are holding both these things together.
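Here is a sketch of that small routine, reusing progress and satisfies from the earlier fragment: it simulates the plan forward from the start state, and fails if any action is inapplicable or the goal is not reached at the end.

    def is_valid_plan(s, g, plan):
        # plan is a sequence of actions A1 ... Ak
        state = set(s)
        for a in plan:
            if not applicable(a, state):
                # e.g. an action that would need the arm to hold two
                # blocks at once can never be applied
                return False
            state = progress(state, a)
        return satisfies(state, g)   # did we end in a goal state?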

So you can do that check essentially. Another thing we can do is a check for consistency, and pruning, because the start state may have hundreds of facts. For example, here we have facts about F and E and H and G and K; those will never be generated in the backward phase. We will never talk about K and J, so they will never figure in a goal description. What will happen is that you will say: I want to stack A on B and I want to stack B on D. So first I want to stack B on D, and then stack A on B. To stack B on D, I have to first pick up B; to pick up B, I have to remove A, so it will remove A and put it on the table; then pick up B; sorry, first pick up C and put it on the table so that D becomes clear; then it will pick up B and stack it on to D, and then pick up A and stack it on to B. Facts about the other blocks will never come in.

Those facts are in the start state; that is why this subset test is necessary. This check for consistency is basically pruning: it says remove this state, do not go backward from here; or remove that one, and instead go along this direction. Now, as you can see, in the actual plan this will be the last action: stacking A on B will necessarily be the last action. The previous action must be to make holding(A) true, or maybe it is to make clear(B) true. Once B is made clear, then we need holding(A), and for that the action will be pick up A; if you look at this carefully, you have to first put A on the table, and then you will have to pick up A from the table. So the third last action would be picking up A. So there are parts which will yield the plan, and there are parts which can be pruned.

And then, even if you do not have that, you have this final check: after you think you have solved the problem, just see whether what you have produced is a valid plan or not. If it is a valid plan, then accept the solution; otherwise, continue searching essentially. The good thing about this consistency check is that it does not have to be perfect. If it is perfect, it will prune this state; if it is not perfect, it will not prune it; you may produce a series of actions, find that it is not a plan, and then come back and try again essentially. So even if you have an imperfect consistency check, it will still work essentially. So, in the next class, we will try, as we keep doing all the time, to look at an approach which will combine the good features of backward search and forward search.

The good feature of forward search is that it is sound: whatever sequence of actions it produces is a valid plan, because the applicability test is sound. The good feature of backward state space search is that the branching factor is low, but the bad point is that it produces pseudo states; this is not a state at all, it is a pseudo state, and we then have to do unnecessary extra work to remove such states essentially. So we will try and see an algorithm which will search in a backward fashion like this, but construct the plan in a forward fashion like that. What this one is doing is searching in the backward fashion and constructing the plan also from the last action backwards: this is the last action, this is the second last action, and so on, which, as you have seen, is not a sound process. This one constructs sound plans, but it is doing too much search. We want to combine the two, and we will do that in the next class.

So, we will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 35
Goal Stack Planning Sussman's Anomaly

(Refer Slide Time: 00:14)

We are looking at planning. In the last class we saw two approaches: one was forward state space planning, and the other was backward state space planning. Forward state space planning, that is, forward state space search, starts from the start state and keeps applying actions till it finds a goal state; it considers actions in the forward direction. The other one considers actions in a backward direction. This one constructs the plan also in the forward direction, and that one constructs the plan in the backward direction. So, in that sense, the two processes of looking for actions and constructing a plan happen in a closely coupled fashion. In forward state space search, we start looking for the first action, and as soon as we pick a first action, we say this is the first action of our plan.

In this manner, we construct the plan also in a forward direction. In backward state space search, likewise, we start by looking for the last action (what could be my last action?), and then also construct the plan in a backward fashion, by saying that will be the last action of my plan, essentially. Now, if you remember, we had this notion of a relevant action here: an action was said to be relevant if the intersection of its positive effects with the goal was not empty, and if it had no negative effects which, in a manner of speaking, contradicted the goal. We had the notion of a relevant action, and we had a notion of regression. We could regress a goal over an action, so we would get a sub goal g', which is obtained as g minus the effects of A (because we expect that the action will produce those effects), the whole thing union the preconditions of A.
In a similar manner, we had, for forward state space search, the notion of applicability of an action and the notion of progress: a state could progress over an action to the next state.

This regression and progress, and the notions of applicability and relevance, were used basically to do both these tasks, looking for actions and building the plan, at the same time. So the process of building the plan is that you move from one state to the next: you look for an applicable action, then move to the next state and look for an applicable action, then move to the next state. Here, you are looking at a relevant action: you look at a goal, find a relevant action, regress to a goal g', and try to find a new action at that point. What we had observed then was that this was a sound process: when you progress from one state to another, what you get is a legal state, essentially. So this was sound, but this was not sound.

We had seen that you could regress to a set of predicates which could not have been part of a state. For example, you might have something like holding a and holding b at the same time; that, of course, is not possible in a state, because we are considering a one arm robot. So this process of regression was not sound, in the sense that it was not closed under the set of states: you could start with a possible state and you could end up with something which is not a state, whereas the other one was sound and you would always end up in states. That is why, when we did backward state space planning, we said one of the things to do is, after you have found a sequence of actions, check whether it is a valid plan or not, before accepting it. So this was a plus point of forward state space planning, and this was the corresponding negative point of backward state space planning. On the other hand, in forward state space planning, we had a large branching factor.

Because the state was a complete description, it may have hundreds of facts, and there may be hundreds of applicable actions. Forward state space planning would consider all those hundreds of actions and choose one of them, essentially. Backward direction search had low branching, and the reason for that was that we were focusing on the goal; we were trying to see what we need to do to get the goal predicates into our state, essentially. So this was a plus point for backward state space search. Today, we want to look at an algorithm which combines both these features. What do we mean by this? We want to look at an algorithm which will consider actions from the goal point of view, in a goal directed fashion, but which will construct plans from the starting state to the goal state, essentially; which means that we will be benefiting from the low branching factor of doing goal directed search, and also from the soundness of constructing a plan in a forward manner, essentially.

You should ponder a little bit over this, as to why the progress operation is sound and the regress operation is not, essentially. The two operations are not symmetric in that sense; you cannot go both ways. There is a sort of arrow of time here, which says this is a precondition and this is a post condition. So you can only construct plans by looking at preconditions and producing post conditions.

(Refer Slide Time: 07:38)

The algorithm that we want to look at today is called goal stack planning. It is actually one of the earliest planning algorithms devised, and was in fact used in the STRIPS program, which was used to control the robot at Stanford that we have spoken about, essentially. The general idea of goal stack planning is the following. What I will do is give a high level description of the planner, and then we will look at an example in a little bit more detail, essentially. As the name suggests, this uses a stack to do the reasoning, and we do the following; let us also consider an example along the way, at the same time. So let us say that this is an example. Again, we are resorting to the blocks world, because we are familiar with it, but you must keep in mind that these are general, domain independent algorithms that we are considering. This is the starting state, and I am drawing only what is relevant. You can imagine that there are 50 blocks, which I have not drawn here, which will not interfere with our plan. So we just want to focus on the planning actions today.

(Refer Slide Time: 09:19)

The goal state is, let us say, that you want a on b, and you want b on d, essentially. You actually do not care what else is true, essentially. So the goal is on(a, b) and on(b, d). As long as these two predicates are there in my state description, I would call that a goal state, essentially; which means, as long as a is on b and b is on d, whichever state it is, it is a goal state. You can think of this as a set of states in which this part is common. Everything else can be in any manner, essentially. So this is basically a set, and the algorithm that we are looking at will do the following.

(Refer Slide Time: 10:22)

You start off by pushing the goals that you want to achieve on to the stack. So the top of the stack will always contain the goals that you want to achieve, essentially. When I say goals, I basically mean the predicates of the goal, essentially. Now, let me do this here, and let me write the algorithm here. So, goal stack planning: push goals on to stack; then you pop the stack. There are various things that can come out of the stack. If it is a predicate (I will use very loose language here; when I say a predicate, I mean a statement like on(a, b), or on(b, c), or holding(a), or some such thing), then there are two possibilities. One: it is true, which means it already holds in the world; then do nothing. Else, it is not true: push an action on to the stack. So this is the basic process that this algorithm follows; you are pushing these goal predicates on to the stack.

In our example, we push these two things on to the stack; then you pop. So it alternates between push and pop: you pop the stack, and if it is a predicate that comes out, then you check whether the predicate is true. If it is true, then you do not have to do anything. If it is not true, you push an action on to the stack, essentially. What action? You should push the relevant action; here is where we use the notion of a relevant action.

If it is not a predicate, it must be an action there; there are only two kinds of objects in this domain, predicates and actions. If it is an action... I forgot one thing. You push the action on to the stack, and then, let us say this action is a, you push the preconditions of a on to the stack. You first push the action; then you push the preconditions of the action. You can get some intuition here: you are pushing an action, then you are pushing the preconditions, and then you will look at the top of the stack, and there will be the preconditions of the action. If they are true, then the first case will apply: you will not do anything, you just remove them. Eventually, when the action comes to the top, you will say: yes, I have found one action, essentially.

There is an extra step, which is: push each predicate individually on to the stack, essentially. So that part is the if-then part; then, for this if, we have the else: it is an action, and we have already done the pop, so add the action to the plan. By this we mean that plan becomes plan . a, where the dot operator is a concatenation operator, which says that you take the plan and add the action at the end of it. So you have found the next action, essentially. The part that we talked about, that it constructs the plan in a forward direction, is taken care of by this operator, essentially: the new plan is the old plan with the action at the end. Initially, of course, the old plan will be empty; the moment you find the first action, that will go into it, and then you find the next action, that will go into it, and so on and so forth. So this is a high level algorithm for goal stack planning. Let us see how it actually executes; we will try to simulate, for this small problem, what goal stack planning does. Before I do the simulation, let us make an observation as to what this is really doing. It is taking a set of goals (the preconditions: every action has a few preconditions, so it is a set of goals, or a set of sub goals, you might want to say) and pushing it on to the stack.

Then it pushes each predicate of the preconditions on to the stack. We do the same thing with the main goal, essentially: push each predicate. We will see the usefulness of this step in the example that we see, but the important thing to note here is that it is taking a set of goals, or a set of sub goals; we use the terms goals and sub goals interchangeably. The initial goal is the only final goal; everything else is a sub goal, but we tend to use the term goals for those as well, essentially. When we have a set of goals to solve, for example the preconditions of an action, we put them one by one on to the stack, which means we are serializing the goals, the sub goals, essentially. In effect, we are saying: we will first achieve one sub goal, then we will achieve the next sub goal, then the next, and so on, in that fashion, essentially. In some sense, if you look at what AO star did with goal trees, it also said: I am breaking up a goal into sub goals, and I will solve each of them independently. This is doing that, but it is also imposing an order in which you will solve them. That is why we use the term: it is serializing the sub goals, essentially. A sketch of the whole procedure follows.
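Here is one way the high level algorithm might be rendered in Python, continuing the earlier sketch. The choose parameter stands for the selection of a relevant action for an unsatisfied predicate; that is the nondeterministic choice point where, in practice, heuristics or backtracking would come in, and it is an assumption of this sketch rather than part of the algorithm as stated.

    def goal_stack_plan(start, goal, actions, choose):
        # Stack items are: a single predicate (a string), a conjunct of
        # predicates (a frozenset), or an Action whose preconditions have
        # just been verified.
        state, plan, stack = set(start), [], [frozenset(goal)]
        stack.extend(goal)                  # push each predicate as well
        while stack:
            top = stack.pop()
            if isinstance(top, Action):     # its preconditions are true now
                plan.append(top)            # plan = plan . a
                state = progress(state, top)
            elif isinstance(top, frozenset):
                if not top <= state:        # some sub goal was undone
                    stack.append(top)
                    stack.extend(top)       # achieve them all over again
            elif top not in state:          # a predicate that is not true
                a = choose(top, state, actions)
                stack.append(a)
                stack.append(a.pre)         # the conjunct of preconditions
                stack.extend(a.pre)         # and each one individually
        return plan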

(Refer Slide Time: 17:44)

So our initial sub goals are those two: on a b (I will not use brackets here, just to keep it short) and on b d. I will grow the stack downwards, and I hope that you get used to that idea. When we pop the stack, we will just put a line across to say that the element has been popped. So you must visualize this stack going down. Let us say that we put these in this order: you achieve on b d, and you achieve on a b. So this is the bottom of my stack, and my stack is growing like this. So what have we done? I have pushed the goal, which is these two elements together, on to my stack, and then I have pushed each predicate in some order. We are not saying in what order; just push them on to the stack in some order. This is a place where you can try to think of heuristics: what is a good order of pushing things? So, this is the goal given to us.

I want to emphasize again that the consideration of actions is done in a backward fashion. We are now only trying to see what actions will achieve these goals, which is exactly what backward state space search does; except that backward state space search says, the moment it looks at on a b, for example, that the last action must be stack a b, stack a on b, and it starts constructing the plan also in a backward fashion. We will not do that here; we will wait a little bit more patiently, till we are sure that whenever we add an action to the plan, its preconditions are true. Backward state space planning does not look at preconditions at all; it only looks at the relevance of an action. It says if an action is relevant, it could be the last action, and we saw that this leads to the trouble that the plan construction process is not sound. So, we have on a b now, and we go to the pop cycle.

So we pop this out; this is gone, and then we have this condition. It is a predicate, and it also happens to be true in my given state (you look up its value in the given state). It is true, so you do not do anything. Then you pop the next thing out: on b d. Remember, on a b was popped out first, and now we are talking about on b d; let me use an arrow to denote that we are considering this, just for our sake. That is not true: on b d is not true in this state, and therefore I push an action which will make it true. The action that makes on b d true is stack(B, D). Let us say we use this arrow to depict the fact that we are pushing an action, essentially.

So you push the action, and we push the preconditions of the action. What are the preconditions of stack? I will use short forms: h for holding, so holding(b), and clear(d). Anything else? You remember these preconditions for stack: you must be holding b, and d must be clear; I think that is about it. Then I push the individual predicates. While we are doing this example, we will use a simple heuristic: we will assume that the holding predicate is the last goal we want to achieve. Remember, these are two goals. Just to recall, let me put brackets here: we want holding(b) to be true, and we want clear(d) to be true, and we will push each individual predicate.

So the first predicate we push will be the last predicate we check; let us use this heuristic between ourselves. In practice, of course, an algorithm may have to backtrack and try the other option, or something like that. So we will check for holding(b) later; first let us worry about clear(d), essentially. This is a push phase, and in a push phase everything gets pushed: the action and its preconditions, and the individual goals in the preconditions. We will refer to them also as goals, because this is a goal stack now, essentially. So we now pop this clear(d) out.

Clear(d) is not true, so we must push an action which will make clear(d) true. The situation is that c is on d, so we can use the action unstack(c, d) (so far, we are doing backward search), and then the preconditions of unstack(c, d), which are: on(c, d) must be true, armempty must be true, and one more, clear(c) must be true. Then we push these individually, again in some order; let me choose armempty as the last predicate. Intuitively, I just want to reduce the amount of extra work we have to do here, but this is a matter of choosing heuristics. So everything is pushed here, like this. Basically, this is a cycle; I have not mentioned it here, but this whole thing is inside a cycle. Then you go and pop this clear(c); now, clear(c) happens to be true in our world, so we do not do anything. On(c, d) also happens to be true; armempty also happens to be true; and it is no surprise in this particular case that the conjunct of all three, on(c, d), armempty and clear(c), happens to be true.

So we remove this from the stack, and now, in the next pop, an action comes out; this is the last part of the algorithm, which says if it is not a predicate, it must be an action, and we add the action to the plan. So this becomes our first action: unstack c from d. The world has changed now; the world is, I am holding c. Whenever I look at a predicate, I must look at this world. Now, you will notice that when I am talking about actions, I am going in the forward direction. This was the given start state, and this is the first action that will be part of my plan; the first action will be unstack c from d, essentially. So everything that we do here will be sound, essentially.

That also does not lose sight of the fact that we are considering the actions in a goal directed fashion. We started off by asking what is necessary for making on a b true and on b d true; then we said to make on b d true, you must do stack b d; then we discovered that to do stack b d, we need clear d, and to get clear d we can do unstack c d; and we found that we were able to unstack c d, so we put that in as the first action. So this goes into the plan, and that is gone now from the stack. Then it has got holding b as the next goal. Holding b is not true in this world; you are holding c, essentially. Let me grow the stack from here: when I pop holding b out, I am forced to insert an action. So I have a choice here. Notice that to make holding b true, I can use the action unstack b from something, or I can use the action pick up b, essentially. We will assume that we have some nondeterminism going on here, or you could look at the state and try to decide which of those two actions is the relevant action. So we will assume that somehow we have used pick up b.

So the stack is now growing like that: along with pickup(b), its preconditions, which are armempty, ontable(b), and clear(b). Let us say I look at them in this order, or in the same order: ontable(b) and clear(b). So I have pushed this action and its preconditions. Then I pop the top of the stack; remember, the top of the stack is actually at the lower end of our picture. I have popped this: clear(b) is not true in the world that I have here, so I must insert an action, which is unstack something from b; but we will assume that we have figured out that it has got to be a from b, and the preconditions for that are on(a, b), armempty and clear(a). So let us say armempty, on(a, b) and clear(a). Clear(a) is true, so I can pop it from the stack. On(a, b) is also true, so I can pop that from the stack. But armempty is not true, because this is the world that I am looking at; I am moving forward from here, and I am holding c.

So I must make armempty true. To make armempty true, I insert an action, putdown(c), and the precondition for that is holding(c), and that is all. So you pop this and you pop this, and this becomes the second action: you have put down c. So now the world looks like this. I have done two actions: one action is unstack c from d, after which I was in this state; then I have put down c, and then I am in this state; that is the second action. Then comes this conjunct: on(a, b) is true here, armempty is true here, clear(a) is true here. So I can remove this, and then I can pop this, and this becomes the third action. Now the world looks like: you are holding a, and the rest are all on the table. So this is that.

Let me label these states: this is the state after action one; this is the state after action two; this is the state after action three, which is unstack a from b. Then the next thing on top of the stack is ontable(b). I pop that and I see that it is true in this state. Then armempty is not true, so I have to achieve armempty. Let me start here: I remove armempty, of course, and then push a put down action. I am holding a, so I need putdown(a). Again, there is a choice here, which I am sort of glossing over. The choice is really that either I put a down on the table, or I put it on b, or put it on c, or put it on d, essentially. Maybe you can do a little bit more sophisticated reasoning here, but just to illustrate this, I am saying that we have something like a nondeterministic choice happening, which means magically we are making the correct choice, which is to put it down.

For that, you must be holding a; so you can do this, and this becomes the fourth action. After the fourth action, everything is on the table and the arm is empty. So I go back to the stack I have, and I find this conjunct here: armempty, ontable(b), and clear(b); everything is true. So I pop that, and then this becomes my fifth action: pickup(b). At the end of the fifth action, I am holding b, and a, c, d are on the table. So this is gone from my stack, and this is where we had taken off. Now we need holding(b) and clear(d), and you can see that in that fifth state both are true: holding(b) is true and clear(d) is true. This goes off. Now we have the sixth action coming out. The moment an action comes out of the stack, we know that it is applicable. Why? Because we have just popped its preconditions; the preconditions must be true, essentially, so it must be applicable. So this is the sixth action: stack b on d. So this is how it looks, and the arm is empty.

Now, observe that, in a manner of speaking, we started off with two sub goals: on b d and on a b. We decided to do on a b first, and in this case it was already true in the start state, so we did not have to do anything; but, as you can see, that was not the right choice, essentially. If you look at the goal state, which is that a must be on b and b must be on d, you can see that the way to achieve the goal is to first achieve on b d, and then put a on top of the tower that you have constructed, essentially. We chose the opposite order, and we ended up finding a plan, which is this six step plan: you unstack c from d (so this is the state), then you put down c, then you unstack a from b, then put down a, then you pick up b and stack it on to d, which is what this did, essentially.

So, on the surface, it looks like we have achieved both the goals; but if you look at this state, when we achieved the second goal, on b d, which is what we were doing all this while, and as a result of which on b d is true here, we have undone the first goal that we had achieved, essentially. My first goal was that a should be on b. We started off with a on b, but by the time we finished on b d, a is now lying on the table. You can see that this is the reason why we have pushed both the conjunct of the goals as well as the individual goals. We are saying we want to achieve this conjunct, but we will do it individually; we will serialize the sub goals: we did this, then we did this. Then we found that in this sequence we somehow ended up undoing some of the goals. So when I pop this conjunct out, I will find that it is not true.

So I will insert both the goals again into the stack. Let us say we insert them in the same order here: I insert on b d and then on a b, which means I am first doing on a b and then on b d, as I did the last time, essentially; but now my starting state has changed. That is my starting state. I will not go through the stack, because we do not have space left on the board, but you can imagine that to achieve on a b we will do the same thing: stack a on b. To stack a on b, you must be holding a; to be holding a, you must pick up a. So you pick up a and stack a on b; these two actions you will end up doing, and you will achieve on a b.

Once you have achieved on a b, you will go back to on b d, but this time on b d is already true, because in that state, as you can see, it is already true. The only thing you are doing in the seventh and the eighth step is picking up a from here and stacking it on to b. So this is the final state that you are looking at: a is on b and b is on d, essentially. Both the sub goals are true, and then I am finally able to pop the goal, and that is the terminating criterion: if I can pop the goal and end up with an empty stack, that means I have found a plan for solving my problem, essentially. To emphasize what goal stack planning does: it considers actions in a backward fashion. It looks for actions in a backward fashion by putting the goals that you want to achieve on to the stack, starting with an empty plan, and it always looks at the goals sitting on the top of the stack, which means it is doing backward reasoning. But when it comes to constructing a plan, when it comes to saying these are my actions, it starts off by choosing the first action first.

So if you look at this plan, this is the first action. Even when you want to actually execute the plan, you want to first unstack a from b, sorry, unstack c from d, and put it down on the table. So it is constructing the plan like a forward state space planner, and it is looking for actions like a backward state space planner; it is taking advantage of both. It is focusing only on the goal while looking for actions, and it is making sure that when it constructs a sequence of actions, it is a valid plan, because it is doing that in the forward fashion. In the process, it ends up serializing the goals, and we have to be extra careful that we do not disturb a goal already achieved, which is why we push the whole conjunct as well, and may end up doing the extra work all over again, essentially. The sketch below can be run on this very example.
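As a usage illustration, the ground blocks world actions can be generated mechanically, and the eight step plan found in the trace above checks out with is_valid_plan from the previous lecture's sketch. The helper blocks_actions and the predicate spellings are my own assumptions, and the extra, irrelevant blocks are omitted.

    def blocks_actions(blocks):
        # ground pickup/putdown/stack/unstack for the given blocks
        acts = []
        for x in blocks:
            acts.append(Action("pickup(%s)" % x,
                frozenset({"ontable(%s)" % x, "clear(%s)" % x, "armempty"}),
                frozenset({"holding(%s)" % x}),
                frozenset({"ontable(%s)" % x, "clear(%s)" % x, "armempty"})))
            acts.append(Action("putdown(%s)" % x,
                frozenset({"holding(%s)" % x}),
                frozenset({"ontable(%s)" % x, "clear(%s)" % x, "armempty"}),
                frozenset({"holding(%s)" % x})))
            for y in blocks:
                if x != y:
                    acts.append(Action("stack(%s,%s)" % (x, y),
                        frozenset({"holding(%s)" % x, "clear(%s)" % y}),
                        frozenset({"on(%s,%s)" % (x, y), "clear(%s)" % x,
                                   "armempty"}),
                        frozenset({"holding(%s)" % x, "clear(%s)" % y})))
                    acts.append(Action("unstack(%s,%s)" % (x, y),
                        frozenset({"on(%s,%s)" % (x, y), "clear(%s)" % x,
                                   "armempty"}),
                        frozenset({"holding(%s)" % x, "clear(%s)" % y}),
                        frozenset({"on(%s,%s)" % (x, y), "clear(%s)" % x,
                                   "armempty"})))
        return acts

    acts = {a.name: a for a in blocks_actions("abcd")}
    start = {"on(a,b)", "ontable(b)", "on(c,d)", "ontable(d)",
             "clear(a)", "clear(c)", "armempty"}
    goal = {"on(a,b)", "on(b,d)"}
    eight_step = [acts[n] for n in
                  ["unstack(c,d)", "putdown(c)", "unstack(a,b)", "putdown(a)",
                   "pickup(b)", "stack(b,d)", "pickup(a)", "stack(a,b)"]]
    print(is_valid_plan(start, goal, eight_step))   # True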

Now, it turns out, and I will leave this as a small exercise for you, that if I had considered them in the opposite order, which says first do on b d, then on a b, which means I invert the order in which I push them into the stack, then first I would have done on b d, which amounts to everything that we have done here, and I would have ended up in that state. Then I would have done on a b, and I would have just picked up this a and put it on to b. So there is an order I can choose in which I am not undoing the work done for solving the previous goal; in this particular order, I am undoing the work. Of course, I did not have to do any work to achieve on a b, because it was already true; but imagine that a was on the table here, or something like that, and then I picked up a and put it on to b. Now I would have undone work that I had done, essentially. So there is a good order, at least in some cases, essentially.

(Refer Slide Time: 40:46)

Now, interestingly, it was shown by a guy called Sussman that it is not always possible that such an order may be found. What order am I talking about? I am talking about an order of serializing sub goals such that there is no disruption of previously achieved sub goals, essentially. This particular example is known as Sussman's anomaly; if you just search on the web, you will find this example.

The interesting thing about this is that he shows that this kind of planning will not always work; well, work in the sense of without doing this extra work, essentially. Because we are serializing the sub goals, we also call this linear planning: I will achieve one goal, then I will achieve the second goal, then I will achieve the third goal, and so on; I will solve goals in a linear fashion, essentially. So I serialize the goals, essentially. What Sussman showed was that there are examples where you just cannot serialize the sub goals. The example is quite a simple one. This is the start state: c is on a, and a and b are on the table. The goal state is a on b on c, which is very similar to the previous one, essentially. Let me just, for the sake of illustration, call this block d, to make it identical to the goal state that we just looked at, which means this whole exercise that we did will also hold for this; except, of course, that the start state is different. But the main point is that I can now think of it as two goals: achieve on a b, and achieve on b d. Goal stack planning is forcing me to serialize the sub goals in some order, and what Sussman showed was that you cannot serialize these sub goals.

Let us see what happens. So, let us say you first achieve on a b. I am not going through the whole process; we are just looking at the outcome. To achieve on a b, what will you have to do? You will have to unstack this d from a, put it down somewhere, then pick up a and stack it on to b; these four actions will achieve on a b, and goal stack planning will do that. You should try it as an exercise. So a will be on b, d will be on the table, and the arm is empty. That means you have first done on a b; then you have to do on b d. Now, when you go to achieve on b d, you can see something very similar happening: you will unstack a from b, unstack a from this stack, put it down on the table, pick up b, and stack it on to d.

So what you get is b on d, with a on the table. What have we shown in this example? As an exercise, you should fill in the details and show how goal stack planning will actually do this. You first achieve this goal, then you achieve that one. When you achieved this, this was true; when you achieved that, that was true; but this is the goal, and this is not a goal state, which means, to achieve this goal of on a b and on b d, this order, at least, is not the correct order of doing things. Of course, I can do extra work, pick up this a and put it on to b, but that means I am somehow missing the correct order, if there is one. What Sussman showed was that there is no correct order. So let us try the other order.

You can achieve on b d first, which is very simple: you just pick up b and stack it on to d. So you have achieved on b d. Then you achieve on a b. What happens? You have to unstack b, put it down; unstack d, put it down; pick up a, put it on to b. So you would get a on b, but b is no longer on d. Again, you can see that the other order also does not do the task; neither of these two paths leads to the goal, essentially. Of course, you can do extra work; that is a different matter. But we cannot take these two goals individually and say: I will solve the first one, then I will solve the second one, and my task is done. I could do that in the earlier example, if I chose the order of the goals correctly: if I had done on b d first, and then on a b, I would have solved the task in a serial order, essentially. What Sussman showed was that there are these non-serializable sub goals, essentially; that in many problems, goals are not serializable.

So that is the problem with this kind of planning, which we also call linear planning, because we are serializing the goals and saying: I will do this first, and then I will do this, and so on. Of course, this is something that we have observed earlier in other situations. For example, when we talk about solving the Rubik's cube: if you say I will do the top layer first, and then the middle layer, and then the lower layer, then while you are doing the middle layer, you upset the top layer.

Of course, those who know the solution know how to get it back, but that is like doing extra work, essentially. So, the Rubik's cube is a typical example of a goal which is fundamentally not serializable, like this problem, which means that there is no way that you can achieve the first goal and not have to achieve it again later, essentially. Such problems are said to have non-serializable sub goals. So, in the next class, we will look at an approach, which some people call non-linear planning, which allows us the possibility of solving this kind of a problem optimally. What do I mean by that? If you just think about this problem, Sussman's anomaly, the best way to solve it is as follows.

You unstack d, put it on the table; that is two actions. Then you pick up b and stack it on to d; that is two more actions. Then you pick up a and stack it on to b; that is six actions in all. But neither of the two serialized paths is going to give a plan with six actions. Of course, they will eventually achieve the goal, but the first will have to do two more actions, and the second will have to do at least four more actions, essentially. So, goal stack planning cannot find the optimal plan, essentially. In the next class, we will look at an approach where the possibility of finding an optimal plan is kept open. You can see that to find an optimal plan, you have to switch between goals in some sense. When you start working on on a b, what do you do? You unstack d and put it on the table. At that point, you should realize that if you stack a on to b now, you will not be able to stack b on to d. So, you abandon the goal of achieving on a b and switch to the other goal of achieving on b d, in which case, of course, you will find the optimal plan. But goal stack planning, because it serializes the sub goals and says, I will completely solve my first goal and then go to the second goal, is not able to do that, essentially.

So, we will stop here, and in the next class, we will take up this non-linear planning.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 36
Non-linear planning

So, in the last class we looked at goal stack planning, and one of the observations we made was that it essentially does backward reasoning: it takes a set of goals to solve, serializes them, and tries to solve them one by one, essentially. And then, as Sussman showed, there are certain problems where you cannot serialize the sub goals, solve them independently, and arrive at the final solution; you may have to do extra work to finally reach the goal.

(Refer Slide Time: 00:47)

So, today we want to look at an approach which is called nonlinear planning, and it is known by many other names. A lot of people have looked at it; it is also known as partial order planning, which stands for the fact that the plan may be a partial order of actions, not necessarily a sequence of actions. So far we have said, in the simplified world, that a plan is a sequence of actions: we do action a 1, then we do action a 2, then we do action a 3, and so on. But this approach allows us to organize plans as a structure which is a partial order, which means that you are not committing to the order of actions, essentially.

So, to take an example, supposing you have to wear shoes for going out or something, then you have four actions to be done, which are: wear left sock, wear right sock, wear left shoe, and wear right shoe. Now, it is not necessary that you choose one particular sequence of these four actions. The only constraints are that you wear the left sock before the left shoe, and the right sock before the right shoe. As long as you satisfy this ordering constraint, the rest can be done in any order. So, in that sense, this kind of planning is also known as least commitment planning.

And by this we mean that in the process of planning, you do not commit too much; you commit only as little as necessary to solve the current sub-problem that you are solving, and we will see what kinds of commitments you have to make. This is also known as plan space planning, and again, we are familiar with this notion. We have talked about solution space search; we saw, for example, that when you want to solve SAT-like problems or TSP-like problems, you construct candidate solutions and do some perturbation on them, and we also saw, when we were doing branch and bound with TSP, that you work with partial solutions.

There, you initially start with the set of all possible tours and then gradually partition it into subsets, eventually homing in on the solution you are looking for. So, this is in that flavor; the search is in the space of possible solutions, essentially. Plan space planning basically works with partially specified plans, and there is a refinement step, which refines plans in the sense that you specify them more and more. Now, a partially specified plan, or partial plan, is denoted as a four-tuple, the first element of which is a set of actions or operators A which are part of the plan.

So, we are not saying anything about where they lie in the plan; all you are saying is that these actions are part of the plan. And these actions may be partially instantiated, which means that instead of saying, for example, unstack a from b, you might say something like unstack x from b, which means that you basically want to achieve clear b, and if there is something on b, then you want to unstack it; or you might also say unstack a from x, for example. So, partially specified actions are allowed, essentially. Then there is an ordering relation, which basically says that action a happens before action b.

So, we will specify order explicitly, as opposed to implicitly as in the planning that we have been doing so far. The planning that we have been doing so far can be seen as state space planning, and in state space planning you combine the selection and placement of actions; of course, this is more true of forward state space planning and backward state space planning, and a little less true for goal stack planning. In forward state space planning, you say: this is s; by a 1, I go to some s prime; then I choose a 2 and go to s double prime; and so on.

In forward state space planning, the twin tasks of choosing actions and placing actions in the plan are done simultaneously. They are so closely coupled that you cannot distinguish between choosing them and scheduling them, because the process of looking for actions says: I am looking for the first action, then I am looking for the second action, then I am looking for the third action. So, both things go together. Likewise in backward state space planning: you are choosing the last action, then the second last action, then the third last action, and so on, essentially.

In nonlinear planning or partial order planning, we do these two things separately. We distinguish between the fact that an action is a necessary part of the plan and the question of where the action should be, okay. So, for example, if you are planning a trip from here to Mandi, then you might say, okay, a necessary part of the plan is to somehow get from, let us say, Chennai station to Delhi station; let us say you are going by train. So, you have decided what the action is, but you have not stated when that action will take place in your plan, in the sequence of actions.

You will say that somewhere in my plan, there must be this action of catching a train from Chennai to Delhi. So, this is the flavor of partial order planning, or plan space planning, or least commitment planning: you have said that, okay, I need to take a train from Chennai to Delhi, but you have not said at which point, in the eventual sequence of actions, you will actually be taking this train, essentially. Likewise, another action might be booking a train ticket, essentially. Now, I could do it at any time before the journey; assuming, of course, that we get tickets, which is not always the case, but still.

So, this thing about combining the selection and the placement of actions is forced in state space planning, because you are working with states. In plan space planning, we are working with the notion of these partial plans, and we can say that these two things are independent of each other. Of course, eventually we will have to impose a certain order on the actions, but while we are talking about partial plans, we say: these are the actions in the plan, and some actions may have a certain ordering, which you will state in the ordering relation. And this basically says that a happens before b, and so on and so forth.

Then we have a causal relation (let me just use the term C here), which I will come to in a moment, and then we have a set of binding constraints. So, what do I mean by this? I might say at some point that an action, stack a on to x (I will use this lower case x for a variable just for a moment), is part of my plan. I might even say that this action comes before something and happens after something, and so on, which I will state here; but at some point I might say that x is equal to b. If I add this extra bit of information, then I have added something more to the plan, which says that this action must put a on to b, essentially; or I could even say x is not equal to b.

I could add any kind of binding constraint on a variable, essentially; or I could say that x belongs to some domain, let us say the set of blue blocks or something like that. I could do all these sorts of things; I could add extra constraints which tell you how the variables are bound, essentially. Here we are talking about causal links, and the general idea is that a causal link is a triple, where you have an action a, a predicate p, and an action b. And what you are saying here is that a produces the predicate p and b consumes the predicate p, essentially.

So, for example, p could be holding a; and what is the action which might produce that? It could be something like unstack a from y, and there might be another action, like put down a. This forms a triple, and we see it as a causal link between two actions a and b; in this case, unstack a from something and put down a. And we say that unstack a from something is providing, or producing, the predicate holding a, because that is its effect. So, p belongs to the effects of a, and p belongs to the preconditions of b.

So, all of these are sets: a set of actions, a set of ordering links, a set of causal links, and a set of binding constraints, and a partial plan is represented by a four-tuple like this. And if you have asserted a triple (a, p, b) here, we are saying that in our plan a produces the predicate p and that p is consumed by b, essentially. So, obviously, the moment I add such a causal link, I must also add an ordering link saying that a happens before b; you can say that this either happens implicitly or you must do it explicitly. So, this is the structure of a partial plan; as you can see, in this partial plan structure, I may specify some things and not specify other things.

So, I may have, let us say, six actions, and I may have ordering relations only between a few of them, and causal links between a few of them, and binding constraints for some of them, essentially. This is in keeping with this notion of least commitment planning, and what we are saying here is that we will add only as much to the plan as is necessary for solving the sub problem we are trying to solve. So, if you go back to Sussman's anomaly: in goal stack planning, you are committing to solving one sub goal first and then the second sub goal, and then you are committing to ordering them in the particular way that you are doing. We do not want to do that; we will try to add constraints only when necessary. So, in that sense, it is least commitment.
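As a concrete picture of this four-tuple, here is a minimal sketch in Python; the encoding and names are illustrative assumptions of mine, not notation from the lecture:

from dataclasses import dataclass, field

@dataclass
class PartialPlan:
    actions: set = field(default_factory=set)       # A: possibly partially instantiated actions
    orderings: set = field(default_factory=set)     # O: pairs (a, b) meaning a happens before b
    causal_links: set = field(default_factory=set)  # C: triples (a, p, b): a produces p, b consumes p
    bindings: set = field(default_factory=set)      # B: constraints like ('x', '=', 'b') or ('x', '!=', 'b')

    def add_causal_link(self, producer, pred, consumer):
        # adding a causal link (a, p, b) implies the ordering link a before b
        self.causal_links.add((producer, pred, consumer))
        self.orderings.add((producer, consumer))

# example: the causal link from the text, plus a binding constraint on x
plan = PartialPlan(actions={'unstack(a,y)', 'putdown(a)', 'stack(a,x)'})
plan.add_causal_link('unstack(a,y)', 'holding(a)', 'putdown(a)')
plan.bindings.add(('x', '=', 'b'))
print(plan.orderings)   # {('unstack(a,y)', 'putdown(a)')}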

A word about plan space planning: we are not working in the space of states anymore. We will start with some initial plan, and we will apply a refinement step to get new plans out of it. The search space contains only plans, or partial plans, and partial plans are described by this four-tuple, essentially. So, what we want to do is look at a partial plan and choose a refinement step (there may be more than one applicable, so you have to select one), come up with a new partial plan, and do this repeatedly: choose a refinement step, come up with a new plan, and so on. So, one question that one might ask is: when will one terminate this?

Now, one thing that I would like you to ponder over and think about is the following: even if I have a small state space, let us say these three blocks a, b and c as in Sussman's anomaly, there is only a finite number of ways in which you can arrange these three blocks. So, the set of states is finite, essentially; but even for such a domain, the set of possible plans is infinite. Maybe we will come to that, but I want you to think about it a little bit, okay. So, what is the initial plan? The initial plan we will call pi 0; we will always have an initial plan, and the initial plan has two actions, a 0 and a infinity.

It has one ordering constraint, which says that a 0 happens before a infinity, and it has nothing else. This will always be our initial plan. Keep in mind that states do not exist as far as this planning algorithm is concerned; we are only looking at plans, or partial plans, essentially. So, now the objects that we are manipulating are only plans. Of course, the question will arise as to when we terminate, because we can no longer apply a goal test function to a state; we do not have states anymore in our representation. We have only plans, essentially.

Now, what are these actions? A 0 has no preconditions, and its effects are the start state; I just use this notation to stand for it. Of course, every planning problem will have a different a 0 action, and the a 0 action simply produces the start state as its effect. When I say the start state, it is basically a set of predicates, okay.

(Refer Slide Time: 18:50)

For example, in Sussman's anomaly, we said this is C A B; this is the start state. So, we have an action, let me draw it here, whose preconditions are the empty set and whose effects are these things: on C A, on table A, on table B, clear B, clear C, and arm empty. This is the a 0 action. As you can see, you define it every time you define the planning problem. So, what is a planning problem? If you are looking at the state space perspective, then it is a set of states, with a start state and a goal state, and a set of actions that you can use to generate the moves, essentially.

So, now, we have said that we are doing away with the notion of states altogether. So, instead of the start state, we have a start action, and what the start action says is: here is an action whose effects are the start state predicates; something that somehow produces that state.

(Refer Slide Time: 20:33)

And if our goal is to have A on B on C, then my a infinity is an action which has no effects, but whose preconditions are what I want to achieve: on A B and on B C. This is a infinity. So, this is my starting node in the search space of partial plans, and it is a partial plan which has two actions: one action which produces the start state, and another action which consumes the goal state. I have not specified anything else, except that the start action happens before the goal action, or a 0 happens before a infinity.
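For instance, for the Sussman anomaly, pi 0 might be encoded as plain data like this; again an illustrative sketch, and the predicate spellings are mine:

# the start action: no preconditions, the whole start state as effects
a0 = {
    'name': 'a0',
    'preconditions': set(),
    'effects': {'on(C,A)', 'ontable(A)', 'ontable(B)',
                'clear(B)', 'clear(C)', 'armempty'},
}
# the end action: the goal predicates as preconditions, no effects
a_inf = {
    'name': 'a_inf',
    'preconditions': {'on(A,B)', 'on(B,C)'},
    'effects': set(),
}

pi0 = {
    'A': {a0['name'], a_inf['name']},
    'O': {('a0', 'a_inf')},   # the single ordering link: a0 before a_inf
    'C': set(),               # no causal links yet
    'B': set(),               # no binding constraints yet
}
print(pi0)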

So, try to remember when we were talking of TSP and we partially specified a solution: we said that we have only one edge specified, which goes from Chennai to Bangalore, and everything else could be anything. So, it stood for a set of solutions. Likewise, this partial plan pi 0 can be seen to stand for the set of all plans in which this is the first action and that is the last action. Now, intuitively, we can see that we will have to somehow make connections between these two actions. What kind of connections will we need to make? I must have some action.

So, a little bit like backward state space search, I must have some action, stack a on b, which has the effect on a b, which I will link to this, essentially. This is like saying that I have now specified one more action in my plan, that I must have the stack a b action, and along with specifying this, I am going to specify a few more things. For example, I would say that there is a causal link between stack a b and a infinity: stack a b is producing this predicate on a b, which is being consumed by a infinity. So, I must establish a causal link.

And one of the things that we want to do in partial order planning is to add causal links and see that they are not disturbed later, essentially; various people have tried various approaches to that. So, this is the causal link which is doing that, essentially. Let me illustrate this.

(Refer Slide Time: 23:44)

So, let us say, in a different planning problem, you unstack a from b, and you want to generate this holding a and link it to put down a; or instead of put down a, let me choose a different action, which is to stack a on, let us say, c. I want to illustrate what I mean by this disruption of causal links, essentially. So, what am I doing here? I am saying that I have an action unstack a; it produces holding a, which is consumed by stack a on to c (or instead of c, you can use a variable x or something like that; it does not matter).

And let us say this arrow stands for a causal link, and implicitly we assume that wherever a causal link goes, an ordering link follows: this must happen before that. So, it implicitly captures the ordering, essentially. Now, I have said that my plan is a partial plan. Supposing I have some other action floating around in my plan which says stack, let us say, d on to y. Actually, let me change my example a little bit. Let us say this unstack action is producing holding a, but it is also producing clear b, and then I have an action stack x on to b.

So, this also illustrates the fact that when I am considering two actions, they do not necessarily have to be contiguous, essentially. I am not saying that this action happens immediately after that one; I am saying this happens sometime after that. This action, unstack a b, produces, among other things, clear b, and the stack action, stack x on b, needs to consume this clear b predicate, because I can only put something on to b if b is clear. So, I want to emphasize the fact that they are not contiguous. Suppose eventually you produce a linear plan, because it is a one arm robot and it can only do things in a linear fashion.

So, let us say this is the fifth action, and this is the twelfth action. I am not saying anything about that; all I am saying is that there is a causal link from this action to this action, which means one is producing something which the other is consuming, and there is an ordering link which says that this must happen first and that must happen sometime later. And now, somewhere in my partial plan, there is an action floating around called stack d on to y, which says that you stack this object d on to some variable y. Now, as I said, let us assume this is the fifth action and this is the twelfth action.

What if that stack d on to y was the seventh action, sometime in between, and at some point this y became equal to b? Remember, we can add these binding constraints; at some point I might say, for some reason, stack d on to b. So, there is a danger that this causal link that I am interested in is going to get disrupted by this action, and my planning process must somehow take this into cognizance and try to do something about it; we will come to that in a moment. Among the older planning algorithms, there was an algorithm called TWEAK, one of the first nonlinear planning algorithms. This was written by a researcher called Austin Tate.

So, if you look up Tate and TWEAK, you will find some information on the web. They used the term clobbering, which modern planning people do not use, and they also used the term declobbering. And I was told that Austin Tate runs one of these online courses on planning, which one of my acquaintances was telling me about; I am sure if you search for this online course, you will hear a lot about TWEAK and this process, essentially. So, what do I mean by clobbering? I mean clobbering the causal link: this action threatens to clobber this causal link.

And if it clobbers it, another action might declobber it, essentially; which means that even if I, let us say, stack d on to b, some later action might unstack d from b, because eventually I want to stack x on to b. So, I am just trying to give you a flavor of the nature of partial order planning: you keep throwing actions into your partial plan because you somehow discover that those actions are necessary (like, for example, deciding that you have to take a train from Chennai to Delhi), and then you keep adding more actions. When you add more actions, you have to be careful about things like this, essentially.

That is, causal links should not be disrupted, essentially. Then, when we talk of links like ordering links, one has to talk about consistency. The ordering links should be consistent, and what do I mean by this? You cannot say that a happens before b, and b happens before c, and c happens before a; then you have a cycle, and that is not consistent, essentially. So, another thing that the planning algorithm will have to do is maintain this consistency. Now, what is the planning algorithm we are talking about?

(Refer Slide Time: 30:55)

We are talking about a set of refinement steps, and the refinement steps are of four kinds: add an action to the set A; add an ordering link to the set of ordering links; add a causal link to C (and whenever you add a causal link, you have to also add an ordering link); or add a binding constraint. So, what we are saying is that we will start with a partial plan, always the partial plan with an action a 0, which produces the start state as shown here, and an action a infinity, which consumes the goal state as we have shown there, and then we want to fill in more and more stuff. How do we fill it in? We have a series of refinement steps.

What do refinement steps do? They may add an action; for example, I said here that you must add this action stack a b, and then I must add a causal link between this and a infinity, and an ordering link between this and a infinity. So, I could choose any one of these refinement operators, and that is where search comes into play. Eventually, I need to refine the plan more and more, which means specify the plan more and more. Initially, when I have only a 0 and a infinity, there is an infinite number of plans which would fit into this and which would still be solutions.
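Continuing the same illustrative encoding as before (my own, not the lecture's), one refinement step that adds an action together with its causal and ordering links might look like this sketch:

pi0 = {'A': {'a0', 'a_inf'}, 'O': {('a0', 'a_inf')}, 'C': set(), 'B': set()}

def refine_add_action(plan, new_action, pred, consumer):
    # return a new plan in which `new_action` supplies `pred` to `consumer`
    new = {k: set(v) for k, v in plan.items()}
    new['A'].add(new_action)                     # 1. add the action to A
    new['C'].add((new_action, pred, consumer))   # 2. add the causal link to C
    new['O'].add((new_action, consumer))         # 3. add the ordering link to O
    return new

pi1 = refine_add_action(pi0, 'stack(A,B)', 'on(A,B)', 'a_inf')
pi2 = refine_add_action(pi1, 'stack(B,C)', 'on(B,C)', 'a_inf')
print(pi2['C'])   # two causal links into a_inf; their mutual order is still uncommitted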

(Refer Slide Time: 32:50)

So, what is a solution? When do we terminate? One way to say it is: if you want to decide whether a partial plan is a solution, then you could look at it from the state space perspective, which means you completely specify it. Complete means that there are no variables left in the plan: instantiate all variables to something, and produce a linear order; in other words, do a topological sort, because given a partial order, you can always convert it to a consistent linear order. That process, as I am sure you know, is called topological sorting, and a topological sort of pi will give me some pi prime.

And then I will just use the old mechanism of progressing from the start state, applying the actions one by one, because now I have a sequence of actions: I will apply the first action, then the second action, then the third action, and so on, and check whether the last state that I get is the goal state or not. Of course, I could do that; the trouble is that the topological sorts could be many. So, do I check for one sort, or do I check for all? Go back to the shoe wearing problem, which no doubt you have encountered at some point.

You could first wear both the socks and then the two shoes, where the two socks could be worn in any order and then the two shoes in any order; or you could first wear the left sock, then the left shoe, then the right sock, then the right shoe. All these linearizations of the partial plan that we talked about, but did not draw, are valid plans; they are valid ways of wearing a pair of shoes. Do I need to generate all those sorts and test each one? That would be too painful. Instead, the partial order planning community has come up with a different test for a solution plan, and we say that a solution plan has no flaws. Of course, we need to clarify what we mean by this, essentially.

So, what are we interested in? We are interested in a way of looking at a partial plan and saying whether it is a solution or not. And implicitly, what we mean is this: if this is a solution, then I could take any consistent ordering of the actions; by consistent we mean that if there is an ordering relation in the partial plan, it must be respected in the linear plan, and that is what a topological sort gives. I can take any topological sort, and that will be a valid plan.
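Linearizing a partial order is exactly topological sorting. Here is a minimal sketch using Kahn's algorithm on the sock and shoe example; the encoding is mine, assuming only that the ordering links form a directed graph:

from collections import defaultdict, deque

orderings = {('left sock', 'left shoe'), ('right sock', 'right shoe')}
actions = {a for pair in orderings for a in pair}

def topological_sort(actions, orderings):
    succ = defaultdict(set)
    indeg = {a: 0 for a in actions}
    for before, after in orderings:
        succ[before].add(after)
        indeg[after] += 1
    queue = deque(a for a in actions if indeg[a] == 0)
    order = []
    while queue:
        a = queue.popleft()
        order.append(a)
        for b in succ[a]:
            indeg[b] -= 1
            if indeg[b] == 0:
                queue.append(b)
    if len(order) != len(actions):   # leftover nodes mean a cycle
        raise ValueError("ordering links are inconsistent (cycle)")
    return order

print(topological_sort(actions, orderings))  # one of several valid linearizations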

But I do not want to actually do this process of producing linear orders and testing each one by applying the actions and checking for validity. I want to look at the partial plan itself and make this observation, essentially, okay. So, it should have no flaws. What do we mean by flaws? There are two kinds of flaws; one kind is called open goals. By open goals, we mean preconditions with no causal links supporting them. So, look at my partial plan of three objects, which you will have to somehow figure out from this stuff on the board. I am trying to solve the Sussman anomaly problem; I have one action a 0.

So, this is my starting position, and a 0 essentially produces this starting position. I have one action a infinity, which has these two preconditions; one of them has a causal link, which comes from stack a on b, but this action in turn has more preconditions without causal links: you must be holding a, and b must be clear; I think that is all, right? So, if I look at this partial plan of three objects, a 0, stack a b, and a infinity, I have three open goals, or three unsatisfied goals: one is clear b, one is holding a, and one is on b c. And we say that if I have an open goal in my partial plan, which means I have an unsatisfied goal, or a goal which does not have a causal link, then that is a flaw in my plan, essentially.
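Detecting this first kind of flaw is a simple set computation. A sketch (with my own encoding) that reproduces the three open goals just mentioned:

preconds = {
    'a_inf': {'on(A,B)', 'on(B,C)'},
    'stack(A,B)': {'holding(A)', 'clear(B)'},
}
causal_links = {('stack(A,B)', 'on(A,B)', 'a_inf')}   # only on(A,B) is supported

def open_goals(preconds, causal_links):
    # an open goal is a (consumer, precondition) pair with no supporting link
    supported = {(consumer, pred) for _, pred, consumer in causal_links}
    return {(action, p)
            for action, ps in preconds.items()
            for p in ps
            if (action, p) not in supported}

print(open_goals(preconds, causal_links))
# {('a_inf', 'on(B,C)'), ('stack(A,B)', 'holding(A)'), ('stack(A,B)', 'clear(B)')}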

So, a solution plan must not have open goals. If it has no open goals, then it could be a solution plan. The other kind of flaw is called a threat, and what we saw here was a threat, essentially. What is a threat? An action threatens a causal link; causal links can have threats from actions, essentially. When is a situation a threatening situation? It must satisfy three conditions. I will just mention them today, and in the next class we will take it up from there.

The first thing is that the action must somehow be undoing the predicate that the causal link is supporting. Remember that every causal link has a predicate, produced by one action and consumed by the other. If the action can somehow produce not clear b (so, if this action has an effect not clear b), then it could be a threat. Why? Because this action, stack something on to b, requires b to be clear, and that action is clobbering the predicate. How do we express this? We say that this not clear y can unify with not clear b; by unify, for the moment, we will just mean that we can assign this value to this variable, which means that if y becomes equal to b, then this action could be a threat. But that is only one condition.

There are two more conditions. The other two conditions are that this action can happen after this action and before this action. If all three conditions were to come true, which means that I can put y equal to b in my plan, that I can consistently add an ordering link between unstack a b and this action, and that I can consistently add an ordering link which makes this action happen before that action, then we say that the threat has materialized. In effect, the plan will no longer be a valid plan, because once you have put d on to b, you cannot put this x on to b, essentially.

So, something has gone wrong with the plan, essentially; but this is only a potential threat. It is a potential threat because my plan is a partial plan; I do not know what the value of y is. I can force it to be not equal to b; for example, as I said, I can say: do not stack it on to b, and then, of course, I have removed the threat. Or I can force this action to happen before that action; then also I have evaded the threat. Or I can force this action to happen after that action, essentially. We will see the algorithms for resolving flaws.

But we have this idea of a threat now: an action, let us say an action c, threatens a causal link (a, p, b) if it is consistent that it can produce not p, and it is consistent that it can happen after a and before b; then it is a potential threat, essentially. And of course, to resolve that threat, we will have to see that one of those three conditions does not happen, essentially. So, we say that a partial plan is a solution plan if it has no flaws, and by this we mean it has no open goals (like, for example, here we have three open goals: this one, this one, and this one) and it must not have any threats, essentially.
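The three conditions can be checked mechanically. Here is a minimal sketch, propositional for simplicity and ignoring variable bindings (in the lecture's example, the binding y = b is what makes not clear y unify with the negation of clear b); the encoding is mine:

def reachable(orderings, x, y):
    # is x forced to come before y by the ordering links, transitively?
    frontier, seen = [x], set()
    while frontier:
        n = frontier.pop()
        for u, v in orderings:
            if u == n and v not in seen:
                if v == y:
                    return True
                seen.add(v)
                frontier.append(v)
    return False

def threatens(c, c_effects, link, orderings):
    # action c threatens causal link (a, p, b) if (1) it can produce not-p,
    # (2) it can be ordered after a, and (3) before b
    a, p, b = link
    if ('not ' + p) not in c_effects:
        return False                  # condition 1 fails
    if reachable(orderings, c, a):
        return False                  # c is forced before a: cannot intrude
    if reachable(orderings, b, c):
        return False                  # c is forced after b: cannot intrude
    return True                       # the threat is still possible

link = ('unstack(a,b)', 'clear(b)', 'stack(x,b)')
orderings = {('unstack(a,b)', 'stack(x,b)')}
print(threatens('stack(d,b)', {'on(d,b)', 'not clear(b)'}, link, orderings))  # True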

In this example, I have only three open goals. So, my next step may be to put in the action which will produce this, or put in the action which will produce that, or put in the action which will produce the third one. And this is the general flavor: once I have flaws in my plan, and a flaw is either an open goal or a threat, I must produce a resolver for the flaw, which is to say, somehow take care of that flaw, essentially. So, the high level algorithm for partial order planning, or plan space planning, or least commitment planning, or nonlinear planning, is to start with the empty plan, a 0 and a infinity, which basically tells you what the start state is and what the goal state is, and keep refining it till there are no flaws left, essentially.

Once you have no flaws, you may still have a partial order; you may not have specified a complete order, like in this shoe wearing example, but that would still be a plan. And by this we mean: take any consistent linearization of those actions, and that sequence will be a plan, in the sense of being a valid plan from the state space perspective that we have seen earlier. So, today we have just specified what we mean by a partial plan. It is a four-tuple made up of: a set of actions, which may be partially instantiated; a set of ordering links, which may not be complete; a set of causal links, which may link just some actions with other actions; and a set of binding constraints, which says that some variables can take some values, or cannot take some values, and so on.

And we have defined what it means for it to be a solution plan: it should have no flaws, that is, no open goals and no threats, and the task is to keep refining a partial plan till it has no flaws, essentially. So, we will look at this algorithm in the next class, when we meet, which is next Friday.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 37
Plan Space Planning

So, we are looking at plan space planning, which, as we discussed in the last class, is known by various names: partial order planning, least commitment planning. And the difference between plan space planning and state space planning is that plan space planning searches in the space of all possible plans.

(Refer Slide Time: 00:44)

And a plan is represented as a four-tuple, where the first element is a set of actions (these could be operators with variables inside them), then the ordering links, the causal links, and the binding constraints. So, a plan is basically made of this four-tuple, and we have observed that pi 0 always has two actions, A 0 and A infinity. It has one ordering link, which says that A 0 happens before A infinity; it has no binding constraints and no causal links. That is always the starting position for search in plan space planning, where A 0, if you remember, is an action whose preconditions are nil and whose post conditions are the start state: whatever the predicates in the start state are, every one of them is an effect of the A 0 action. And A infinity is the action which consumes the goal predicates and has no effects. So, the starting node in the search space is this pair of two actions and the ordering constraint between them. Let us first write down the algorithm and then discuss its components. The high level algorithm for plan space planning begins with pi 0, which is the initial plan; we will use pi to stand for plans in general. We initialize pi to pi 0, and then loop while the plan has flaws; we will define flaws in a moment.

So, the basic strategy in plan space planning is to inspect the plan for flaws and try to resolve those flaws one by one, essentially. We are using here what we will call a nondeterministic choose operator, with which we choose some flaw from the set of flaws in the plan. Then the algorithm looks for resolvers; we will assume that we have a mechanism for finding out what the resolvers for a given flaw are. There may be more than one, and then again we say choose: if the set of resolvers is empty, then you cannot refine the plan and you return failure; else choose a resolver R belonging to the set. Again, we assume that the nondeterministic choice will make the right choice. Of course, I am sure you are conversant with this idea by now: when we talk about nondeterministic algorithms like this, it basically implies that in the background there is a deterministic search happening, essentially. So, you would try one flaw from the set, try to resolve it, and if you reach a dead end, then you backtrack and try the next option.

Likewise, you would choose one resolver for the flaw and carry forward, and if you reach a dead end, then you backtrack and try the next resolver. So, essentially, there is a search embedded behind these nondeterministic choose operators, which we are familiar with; here we will simply assume that the choices made in this algorithm are nondeterministic in nature. Then pi gets the result of applying the chosen resolver to pi, essentially. So, here we are looking for the set of resolvers for the flaw, then we are picking one resolver and applying it to pi, which means we do something to the plan that removes the flaw in some way, and then we update the plan. We keep doing that as long as the plan has a flaw, essentially. After we exit the loop, of course, we just return pi. The other exit is with failure, when we cannot find a resolver for a flaw even after exploring all possible resolvers. And then, as we were discussing, there are two kinds of flaws: one is an open goal, and the other is a threat.
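The loop just described can be written down as a short backtracking procedure. A sketch, where find_flaws, find_resolvers and apply_resolver stand for the domain machinery discussed in the text (they are parameters here, my own placeholders, not a fixed API):

def psp(plan, find_flaws, find_resolvers, apply_resolver):
    # deterministic (backtracking) realization of the nondeterministic chooses
    flaws = find_flaws(plan)
    if not flaws:
        return plan                      # flawless: a solution plan
    flaw = next(iter(flaws))             # choose some flaw to resolve
    for resolver in find_resolvers(plan, flaw):
        result = psp(apply_resolver(plan, resolver, flaw),
                     find_flaws, find_resolvers, apply_resolver)
        if result is not None:
            return result                # this choice worked
    return None                          # no resolver worked: fail and backtrack

# toy run: a "plan" is a frozenset of open goals; a resolver closes one goal
print(psp(frozenset({'on(A,B)', 'on(B,C)'}),
          find_flaws=lambda p: p,
          find_resolvers=lambda p, f: [f],
          apply_resolver=lambda p, r, f: p - {f}))   # -> frozenset()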

(Refer Slide Time: 08:17)

So, let us revisit the Sussman anomaly that we have seen earlier. Remember, we have shown that an algorithm like goal stack planning, which does linear planning in the sense that it takes up the goals one by one, tackling the first goal completely and then the second goal, fails on a problem like this, with goals on A B and on B C. This is the goal set, which in our system is represented by A infinity having two preconditions, on A B and on B C, written in short form. If you remember, in the Sussman anomaly the goal state is that A is on B and B is on C, and the start state is that C is on A.

(Refer Slide Time: 08:56)

And B is on the table and A is on the table, which, again, you will recall, you can represent by A 0, with no preconditions, and as effects everything that is true here: arm empty, on table B, clear B, on table A, on C A, and clear C (we are dealing with a one arm robot). So, that is the start action and that is the end action for the Sussman anomaly problem, where we have a very simple problem of three blocks. And what we saw earlier was that if you take the two goals that I have, on A B and on B C, and try to solve them one by one (first I completely solve on A B, then I completely solve on B C, or in the other order), you cannot do it, essentially: when you do the second goal, it inevitably disturbs the first goal that you had achieved earlier. So, these goals are what we said were non-serializable. And what we want to see is that plan space planning offers the possibility of solving such problems and finding the optimal solution.

So, if you work out the solution in your mind for this problem, where this is the start state and that is the goal state, the optimal solution is the six step solution: you first unstack C from A and put it on the table, that is two actions; then you put B on to C, that is two more actions; then you put A on to B, that is two more actions. Goal stack planning would never find this six step plan, and plan space planning can find it; hence our interest in it. Now, there are two kinds of flaws in a plan. One is open sub goals, or open goals; we will use the two terms interchangeably. The other is a threat. As of now, this partial plan, which has only two actions, A 0 and A infinity, has two flaws. They are two open goals, which say that there is nothing supplying this predicate; what we need is some action which will supply this predicate. In other words, every open goal must have a supporting causal link, which means we must have some action here which produces this predicate, which is consumed by that action. So much for open goals; let us now look at threats.

(Refer Slide Time: 11:56)

So, let us say at some point we want to pick up M, where M is a block. I am drawing this top to down, as is the practice in many books; the action pickup M sits somewhere here, and there is an ordering from top to down which I will not draw, because we have only a few actions to discuss here. And then somewhere we have the action stack n on to ?y. I will use this question mark to distinguish constants from variables, as is also common practice: n is a constant, which means the block name is n, and this ?y means it is a variable, so we do not know on to which block we stack n. For some reason you have got this action in the plan, and this is the ordering. And let us say one of the preconditions for pickup M is clear M; of course, there are other preconditions, as you will remember from the STRIPS action: the arm must be empty, and M must be on the table, and one of them is clear M. So, let us say this open goal is the flaw we are addressing. How do you resolve this open goal flaw? There are two possibilities. One is that an existing action in the plan can supply the predicate, which means there is already an action somewhere, and you can just establish a causal link from that action to this action, provided it is consistent.

By consistent we mean that the ordering links remain consistent, and by that we mean that there are no cycles in them, essentially. The other way of resolving an open goal is to add a new action. So, for example, in this situation there is no action which can supply on A B or on B C, so we are compelled to add a new action. For example, we might add stack A B as an action, and this provides the causal link to on A B. That is the second way of resolving an open goal: add a new action to the plan. How do we add an action to the plan? We add the action to the set of actions, then we add a causal link from this new action to the predicate that we are trying to support, and the corresponding ordering link; and if there are any binding constraints (for example, you may want to add stack x y with x equal to A and y equal to B), we add those too, but we will skip that part, essentially. Likewise, you could say that on B C is achieved by stack B C. So, one of the things we keep doing in plan space planning is resolving open goals. It still has a flavor of backward reasoning, because open goals start off only in the goal state, and from there we keep building.

But it is quite possible that you end up first satisfying some open goals here and then jumping there; there is no fixed order of search, essentially. The other interesting thing about plan space planning, which we observed, is that in state space planning both the selection of an action and the positioning of the action are done simultaneously, in one move. For example, forward state space search would say: this is the next action; which means it selects the action and also fixes its position as the next one in the plan, because there a plan is a sequence of actions. Likewise, in backward state space search, you keep constructing the plan from the end. In plan space planning, we have selected the actions stack A on B and stack B on C, but we are not saying anything about where they appear relative to each other, essentially. So, we separate the task of selecting an action from that of imposing an ordering on the actions; they are done independently. In principle, of course, this allows us to first select some action in the middle. For example, if there were something which was very critical to the plan, and we knew it was critical, then we could select it first. We had chosen this example of going from place A to place B.

So, if I had to go from here to there, for example, I might say: I need a flight to Delhi first, and I put that action in the middle. Plan space planning allows you to do that; but that kind of planning, which is more abstract, requires still more structure in the plan, and that structure is typically hierarchical in nature: you have high level actions which are refined into lower level actions. We are not going to consider that here; we are working at the ground action level, at one flat level, essentially, and this has the flavor of backward search to some extent. So, we have this clear M here, and let us say that to make clear M true, we add an action: unstack ?x from M, and that will produce this. I will use the dotted line to represent a causal link, which means this action is producing this predicate and that action is consuming it; this must, of course, happen before that. So, the moment I insert this action into my plan, what do we have? Already in my plan there were these four actions: the two start and end actions and the two actions we had put in.

And then I have added this action, unstack something from M, because I want to resolve this goal of clear M. But the moment I have this plan structure, it has a flaw of the type threat, essentially. What is the threat? The threat is to the causal link; a threat is always something trying to disturb a causal link. What is the causal link doing? The causal link says: this action is producing clear M, which is consumed by the pickup M action. So, I have established a causal link in my plan, essentially. Now, because there are variables involved, and because we do not know the ordering of these three actions (of course, we know this one comes before this, but we do not know anything about how stack n on to ?y is placed with respect to these two), there is a potential threat. What is the potential threat? That this action will come in between these two actions, and this ?y will be bound to M; which would mean that for pickup M I needed clear M to be true, but this stack n on to M happens before that; it happens in between these two actions.

So, this is the threat. Considering that this happens first, this happens next, and this happens third (that is, this action happens after this action and before this action), and given the possibility that this ?y can be made equal to M, or bound to M, you can see that it is destroying this causal link: pickup M needed clear M, and by putting n on to M, that clear M has been lost. So, this action is a threat to this causal link, as we discussed in the last class. An action is a threat (and when we say threat, the algorithm treats it as a potential threat) to a causal link if it can produce, as an effect, something, let us call it not p, which can be unified with the negation of the condition p needed by the consuming action; which means it potentially destroys this condition. In other words: ?y can be unified with M (by can be unified we mean it is possible, given the binding constraints), this action can happen after this action, and this action can happen before this action. If these three conditions were to come true, that is, ?y becomes equal to M, and there is an ordering link between this and this, and an ordering link between this and this, then the threat would materialize.

And my plan would no longer be a valid plan, because this would become an open goal which can no longer be satisfied: a causal link was satisfying it, but inserting this action in between destroyed that causal link. So, a threat is the other kind of flaw in a plan; one kind is the simple kind, an open goal without support, and the other kind is a threat. How do we resolve a threat? There are three ways of resolving a threat. One is separation, which, in this example, basically says: add a binding constraint and make sure that this ?y cannot unify with M, which means add ?y not equal to M to my set of binding constraints, essentially. Once I do that, I know that I am not going to stack this n on to M; I will stack n on to something else, which means M stays clear for the pickup that needs it. That is one way of resolving a threat. Another way is called demotion. So, let us say, in general terms, that action A is a threat to a causal link from action A 1, supporting some predicate p, to action A 2. So, this is A 1, this is A 2, and this is action A. One way is to say that p stays different, which is separation; the other is demotion, which says that action A happens after this pickup M, which means: add the link A 2 happening before A.

So, when I say add, I mean add to the relevant place: when I add an ordering link, I must add it to the set of ordering links; when I add a binding constraint, I must add it to the set of binding constraints; and so on. So, I can push the action down, and we say it has been demoted: it happens later. That also resolves the threat, because then nothing is disturbing this clear M; all of that happens somewhere later. The third, as you can imagine, is promotion, which is: you add A happening before A 1, again to the set of ordering links. So, we have basically five resolvers. For an open goal we have two resolvers; I will just say old action, by which I mean an existing action, and new action. To resolve an open goal, either find some existing action which can supply the predicate, or, if you cannot, add a new action; note that picking an old, existing action may actually violate the constraint that there should be no cycle in the ordering. So, those two are for the open goal and these three are for the threat, essentially, and that is how the algorithm goes: this cycle of choosing a flaw (our initial plan has two flaws, the two open goals) and choosing a resolver (for both of those flaws the resolver we chose was to add a new action), and we keep doing that till there are no more flaws left in the plan.
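Here is a sketch of the three threat resolvers as operations on the four-tuple, for a threat by action c on the causal link (a, p, b). The encoding is mine; note also that some textbooks attach the names promotion and demotion to the opposite orderings, and I follow the lecture's usage here, where demotion pushes the threatening action later:

def refined(plan):
    # copy, so that each resolver returns a new plan (useful for backtracking)
    return {k: set(v) for k, v in plan.items()}

def separation(plan, var, value):
    new = refined(plan)
    new['B'].add((var, '!=', value))    # rule out the offending binding, e.g. ?y != M
    return new

def demotion(plan, threat, consumer):
    new = refined(plan)
    new['O'].add((consumer, threat))    # threat is pushed after the consumer A2
    return new

def promotion(plan, threat, producer):
    new = refined(plan)
    new['O'].add((threat, producer))    # threat is pulled before the producer A1
    return new

plan = {'A': {'A0', 'unstack(?x,M)', 'pickup(M)', 'stack(n,?y)'},
        'O': {('unstack(?x,M)', 'pickup(M)')}, 'C': set(), 'B': set()}
print(separation(plan, '?y', 'M')['B'])   # {('?y', '!=', 'M')}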

But, of course, these newly added actions have their own requirements. So, stack B C requires holding B, and it requires clear C; that is all. And it produces on B C, it produces arm empty, and it produces clear B. So, let us note that clear B is an effect of stacking B on to C: if you were to unstack B from something, then B would no longer be clear; when you are holding B, B is not clear, and only when you put it down is it clear, essentially. Likewise, stack A B needs clear B and holding A, and it produces arm empty, on A B, and clear A. So, every precondition here is an open goal; every time we add a new action, all its preconditions are open goals, essentially. Then we look for a flaw and resolve it. So, let us say we pick the flaw clear B, and we see that the action stack B C is producing clear B. So, I have an existing action which is supplying clear B; I add a causal link which says that this clear B is consumed by stack A B, and along with it an ordering link between the actions: first I must stack B on C, which will produce this clear B, and then I will consume clear B when I stack A on to B to get on A B.

So, you can see that this is logical in some sense: you must first stack B on to C and then A on to B, and at least this part is being done correctly, essentially. Now, let us look at holding A: an action that we can find for it is pickup A. And for holding B, the action that we can find is pickup B; let us say we are doing it in this order. Now, pickup B will also produce other things, like not arm empty; in that sense, it deletes arm empty: it needs arm empty as a precondition, and it produces not arm empty. Pickup A needs on table A, and pickup B needs on table B. So, let us write them here: on table B, on table A; pickup A needs clear A, and pickup B needs clear B. Observe that we have an open goal clear B here, which could in principle be supplied by that action stack B on to C. But we cannot choose that, because then we would have to say that stack B C happens before pickup B, and we have already set up the opposite ordering. The point is that the moment you add a causal link (and this is a causal link we have added: pickup B produces holding B, which is consumed by stack B on to C), you also add an ordering link, essentially.

So, once you have added an ordering link in this direction, we cannot add an ordering link in the other direction; so we cannot choose that action for this open goal, essentially. So, let us say that we choose clear A as an open goal, and we add unstack ?x from A, and that will make A clear. So, clear A becomes an effect of this action, and it also produces holding x, among other things; and unstack x from A needs x to be on A. We can satisfy it like this: the start action has an effect called on C A, and if we say that x is equal to C, then we can establish this causal link here. So, this is taken care of: the precondition for this action, which I have not drawn here, is on C A, and that is taken care of because it is produced by A 0; another precondition is arm empty, which is also produced by A 0. And the third precondition is clear C, which must be supplied by somebody; A 0 produces clear C, and that resolves it. So, the preconditions needed for unstacking C from A (clear C is true in the initial state, on C A is true in the initial state, and arm empty is true in the initial state) are all produced by A 0, so we establish these three causal links, which will do that, essentially.

Now, if we look at this causal link here, where A 0 is producing arm empty, which is being consumed by unstack C from A, we can see that it is being threatened by this action here, pickup B, because pickup B produces not arm empty, and we have not said where pickup B sits. All we have said is that A 0 happens, then at some point unstack C from A happens, then pickup A happens. Notice also that pickup A cannot happen immediately afterwards: once you unstack C from A, you are holding C, and unstack produces something called not arm empty, besides holding C. Which means we obviously cannot do unstack C from A followed immediately by pickup A, because pickup A also needs the arm to be empty. But our plan has only said that this must happen before that, and nothing else; we have not said that no action can be inserted in between, and as we will see, that can be done, essentially. So, the only thing said so far is that, along one line, you unstack C from A and then at some point you pick up A and stack it on to B; and before you stack A on to B, sometime, you stack B on to C, and before you stack B on to C, you must pick up B, essentially. That is all we have so far. But now we have a threat: this pickup B is threatening to disturb this causal link here.

So, we have these three options available. Separation is not possible, because there are no variables here. Promotion is not possible, because A 0 is the first action; it is always the first action, and nothing can happen before A 0. The only thing that remains is demotion, which means we say that this must happen after that; we introduce the corresponding ordering link between this and that. Then, let us say we address one more of these issues: let us say we are looking at this arm empty. We have pushed pickup B down, so we cannot connect its arm empty to the start action. So, we must produce some action: a putdown of something. This putdown ?y will need holding y, and this open goal can be met by the goal holding C, by setting y equal to C, essentially. So, what are the actions we have so far? We have unstack C from A, pickup A, pickup B, stack A B, stack B C, and putdown, which becomes putdown C; so 1, 2, 3, 4, 5, 6: we have got all the six actions that we need. I will leave it as a small exercise for you to see that you can now resolve all the flaws in this, by either introducing an ordering constraint to remove a threat, or connecting actions; so, for example, we can connect this to this, which means this must happen after that.

So, the first action that we can see is unstacking C from A, and sometime after that you must have pickup A, and sometime after that you must have putdown C; we do not know exactly where, we just know that this happens after that. Then we know that after putdown C we must have pickup B. After pickup B we have stack B C, and sometime after stack B C we have stack A B. Of course, it is not yet a proper plan, because, as you can see, you cannot do these two actions, stack B C and stack A B, immediately one after the other: stack A B needs a precondition, holding A, and that is provided by pickup A. So, this is a precondition, and somewhere we have to figure out that the sequence of actions is that you unstack C from A, then you putdown C, then you pick up B, then you stack B on to C; and then this link needs to be added: that pickup A must happen after stack B C. Why does that happen? I will leave it as a small exercise; it happens because some flaw is to be resolved, and the flaw could be, for example, that arm empty, or something like that, is needed for pickup A, and stack B on to C is producing arm empty; with that, we will get the right sequence.

So, in the end we will have one order: go from here to here, and from here to here, and so on. And we get a linear plan in this particular example because of the fact that we are dealing with a one arm robot, and you can do only one thing at a time, as opposed to other kinds of domains. For example, in the shoe domain that we discussed earlier (wear left sock, wear right sock, wear left shoe, wear right shoe), the final plan may not have a linear order, essentially. It will have a partial order: you first wear the two socks, and then you wear the two shoes, and then you are done; and within each pair you could act in any order, essentially. So, any linearization of the partial order, or any topological sort of the partial order, should be a valid plan. That is the original condition we started with; but instead of having to check all linearizations, we have changed this to the condition that the plan should have no flaws. And the flaws, as defined, are these: there must be no open goals, which means every open goal, every sub goal, must have a supporting causal link.

And there must be no threats: no action must be threatening to disrupt a causal link essentially. And as soon as that condition is satisfied, the plan that we produce is going to be a valid plan. In this case it turns out to be a linear plan, which is also the optimal plan essentially. Of course, you would possibly observe that there is a little bit of sleight of hand here, in that I have somehow chosen the right actions and the right things — in a sense chosen the right things in the choose-a-flaw and choose-an-action steps. So, notice that this is a symmetric situation: we want to pick up B and we want to pick up A, and I chose pickup A, which led us to the plan. If I had chosen pickup B, then you can see that there is no way you could have produced the optimal plan; we would have produced a longer plan. Because if you are going to do pickup B as the action that comes after this, then you will have to put it down again, and then pick up C, put it down, and then pick up A; so extra actions would keep creeping in essentially. So, this is the basic idea of plan-space planning. This is the algorithm; there are variations of this algorithm which people have developed essentially.

Some people only maintain the open goals to be solved, and resolve the ordering threats along the way as they address the open goals. This is the more general algorithm, and you can of course write variations of this; you can put in some heuristics and so on essentially. So I would just like to end with ((Refer Time: 42:14)) something I mentioned along the way. If you have observed what was happening here, we started by resolving open goals from the end essentially, because those were the open goals first, and these produced more open goals, and we had this flavour of backward reasoning happening essentially. And that is because we are working at the ground level; we are only considering actions which are applicable in the domain. But if you had high-level actions, like "go from here to Delhi" — now that is not a ground-level action; the ground-level plan would have many other steps: you go from here to the station, you get on the train, and all this. If you have this notion of hypothetical actions, which means actions which can be decomposed into smaller actions, then we can talk about a high-level plan and then refining it into a lower-level plan.

And then there is this strategy that we had spoken about earlier: means-ends analysis. If you remember Simon and Newell, the means-ends analysis strategy for problem solving was another early strategy, proposed in the 60s in the last century. It was based on the way they thought human beings solve problems. And the basic idea behind means-ends analysis is that, just like we have flaws and we have resolvers of flaws, a more general idea is that you have differences and you have resolvers for differences. So what do you mean by differences? In this travel domain I would say there are many differences in my plan to go from here to Dehradun. The highest difference is that I am here, I am not in Dehradun; so in that sense it is like a flaw. But then I decompose it into lower-level actions, and then I decide that, you know, I need these steps: I need to go from here to the station and all. Let us say at an even lower level, I have to go from here to the gate, and then from the gate to the station, and then get on to a train and then go to Delhi. And so if I realize that, and if I can recognize these, these are the differences that I need to resolve.

So one difference is that I am in Chennai, not in Dehradun, and that is the largest difference. I may have a difference-operator table which would say: if the distance is more than 500 kilometres you must take a train or a flight; if the distance is between 100 and 500 you can take a bus; or something like that — we have an operator-difference table like that. And then the key thing about means-ends analysis is that if you have differences and if you have operators, you must choose the largest difference first. The largest difference would be that I am in Chennai, I am not in Delhi, and once I say that I have to resolve that difference first, I would first select an action or a plan for doing that; let us say that action is taking a train from Chennai Central to Delhi. And then I have smaller differences left: I am here, not at the station — that is one difference; then I will be at Delhi station but not at, let us say, the bus stand from where I am taking a bus, or something like that essentially. So, you address the higher differences first and the lower differences later. So differences are the ends and operators are the means to achieve those ends essentially. And means-ends analysis says that you must analyze the domain, and look at the larger, more important things first, and then look at the less important things essentially.
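
As a toy illustration of the difference-operator table for the travel example, here is a short Python sketch; the distance thresholds and operator names are illustrative assumptions, not from the lecture.

def choose_operator(distance_km):
    """Pick an operator (the means) for a difference measured as distance (the end)."""
    if distance_km > 500:
        return "take a train or a flight"
    if distance_km > 100:
        return "take a bus"
    if distance_km > 1:
        return "take a local bus or an auto"
    return "walk"

# Means-ends analysis: address the largest difference first.
differences = [2200, 5, 0.5]     # e.g. Chennai->Delhi, home->station, gate->road
for d in sorted(differences, reverse=True):
    print(f"{d} km -> {choose_operator(d)}")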

So, one interesting example of this is the Towers of Hanoi, which you must be familiar with. If you want to, let us say, move 5 discs from location A to location C, then there are 5 differences: the smallest disc is not at C, the second smallest disc is at A and not at C, and so on for all 5. And if we now order these differences, saying the larger the disc the bigger the difference, then the means-ends analysis strategy says: first worry about moving the biggest disc to the destination, then worry about moving the other discs essentially. And then of course the familiar recursive algorithm will come automatically out of this — you move the n minus 1 discs to B, then move the largest disc to C, and so on essentially (a small sketch follows below). So, all of those can be seen as specific cases of means-ends analysis, and that was proposed quite a long time ago in their book on human problem solving. So, all these planning algorithms that we have seen so far are in some sense ancient; they were all developed in the last century sometime. It turns out that whoever tries to write these algorithms, the problem is hard — remember we said that even the simplest planning formalism, STRIPS planning, is PSPACE-complete essentially.
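
Here is the Towers of Hanoi recursion seen as a means-ends sketch in Python: the biggest difference is that disc n is not at the destination, and reducing it first yields exactly the familiar recursive algorithm mentioned above.

def hanoi(n, src, dst, aux):
    if n == 0:
        return []
    # To move disc n (the largest difference), the n-1 smaller discs
    # must first be moved out of the way onto the auxiliary peg.
    return (hanoi(n - 1, src, aux, dst)
            + [f"move disc {n}: {src} -> {dst}"]
            + hanoi(n - 1, aux, dst, src))

print(hanoi(3, "A", "C", "B"))   # 7 moves; the largest disc moves in the middle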

So, it turns out that even in simple domains the only kinds of plans we could find were of lengths something like 5 to 10 essentially. So, if you had a problem in which the solution was 5 steps or 10 steps long, then we could use these algorithms for solving it. Then in 1995 a pair of researchers came up with a new idea and with an algorithm which looked at this whole thing in a very different way — an algorithm called GraphPlan — and there were variations which came along around that time which increased the length of the plans that could be found by an order of magnitude essentially. It means that where we could find plans of length 10 with these algorithms — goal stack planning or any of these algorithms — we could find plans of length 100 using these new algorithms essentially. Of course, we will not have time to study those algorithms in detail in this course, but in the next class I will just give a glimpse of this algorithm, GraphPlan, which is a kind of landmark algorithm in the world of planning essentially. And we will end planning in this course with that essentially, so I will stop here.

Student: This shows A2 should happen before this; this shows that A1 happens before A.

Promotion here — promoting, so A should happen before A1 — that is right. Holding is not part of the link. Sorry.

Student: Here holding is not part of the link; here we are going to pick up A — after stacking B on C we are picking up A.

Then you have this holding A; it is not an action, it is a predicate — that is why I have drawn a circle on it, not a box as for an action. After that, after picking up A, you stack A on to B, and that is the last step in the plan essentially. So, we will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 38
Graph Plan

So, let us for the last time in this course look at planning. As I said, in the mid-nineties there were a series of approaches to planning which produced much longer plans, which means they were much faster algorithms and could solve larger problems.

(Refer Slide Time: 00:37)

All these approaches were two-stage approaches, which means you have a planning problem, you convert it into something else, and solve that problem essentially. So, one thing that was very successful was to convert it into a SAT problem — we are already familiar with SAT — and then you could use some solver. In fact, there was an algorithm called SATPlan which was very successful in the mid-nineties and the early part of this century. So, how do we know it was a good algorithm?

(Refer Slide Time: 01:29)

There is, every two years, a competition called IPC, the International Planning Competition, in which people submit their programs. So, just like you had the travelling salesman problem competition, in this you submit your planner, they try it out on a set of problems, and then the planner which does best is declared the winner.

So, SATPlan was one of those planners which did very well for a period of time, and what it did was to convert a planning problem into a satisfiability problem and then solve it using a SAT solver ((Refer Slide Time: 02:13)). Another approach was to convert a planning problem into a CSP — by CSP we mean constraint satisfaction problem; we will look at them briefly in the next two lectures — and then solve the CSP.

So, this idea of posing something as a different problem and then using some solver to solve it is a recurring theme in many problem-solving situations. What GraphPlan does is to convert it into a structure called a planning graph. This is the algorithm we want to look at today; it is called GraphPlan, and it was given in 1995 by two researchers, Blum, who is now at CMU I think, and Furst. So it is fairly recent, in contrast to the other algorithms that you have seen — all those algorithms were quite old.

So, as I said, what GraphPlan does is it constructs a structure called a planning graph and then searches in the structure for a plan. All these three approaches have one thing in common: the conversion that they do, in the case of either SAT or CSP or the planning graph, is incremental in nature. So, they construct a structure of a certain size and then see if there is a plan there — a little bit like DFID, you might say. They keep exploring longer and longer plans to see when a plan can be found, and because of that all these algorithms end up finding the optimal solutions as well, because they incrementally search for longer plans.

So, you can imagine that whatever the mechanism for converting it into a SAT problem, it is a relation between actions and states and new states and new actions and so on, and you express this as a satisfiability problem — things like the preconditions of this action must be satisfied, that kind of thing — and they always find the shortest answers. So, let us start by looking at the planning graph structure. The planning graph is a layered graph with alternate layers of propositions and actions essentially.

So, there is a layer P0 which contains all the start propositions; then there is a layer A1, which is a layer of all possible actions which can be done at the first time instance; followed by layer P1, which consists of all the propositions which could possibly be true at the end of the first step essentially. Now, the difference from the other algorithms — heuristic search, which we just briefly mentioned in passing — is that there we could have forward state space search or backward state space search, with a heuristic function computed by solving a relaxed planning problem, to decide which successor is best.

That heuristic search, or state space search, searches in the set of states; plan space search, which we saw in the last class, searches in the space of plans. Here we search in a structure called a planning graph, which is in some sense a union of all possible states that can happen essentially. So if I have, for example, a start state, and I can do action A1 to go to state S1, and I can do action A2 to go to state S2 — that is what heuristic search, forward state space search, would do: should I go to S1 or should I go to S2? In GraphPlan, what you do is:

(Refer Slide Time: 06:57)

You have this start layer, then you have the first action layer which includes both the actions A1 and A2, and then you have a layer which is basically the union, S1 union S2. We put together all the possible predicates into one set which is the next layer; so we have these proposition layers and action layers essentially. One thing all these algorithms do is to first convert the problem into a propositional problem, which means that if I have an operator like pickup(x) or putdown(x), you look at what the starting state is and what the objects in the domain are. So let us say I have 5 blocks A, B, C, D, E; then it will produce all possible ground actions — with these 5 blocks: pick up A, stack A on to B, stack A on to C, all possible actions are produced as ground actions. So, I want to give you a flavour of this; let us work with the same example that we have been looking at, the Sussman anomaly.
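
A small sketch in Python of this propositionalisation step, producing every ground action of the four blocks-world operators over five blocks (the string encoding of actions is just for illustration):

from itertools import permutations

blocks = ["A", "B", "C", "D", "E"]
ground_actions = (
    [f"pickup({x})" for x in blocks]
    + [f"putdown({x})" for x in blocks]
    + [f"stack({x},{y})" for x, y in permutations(blocks, 2)]
    + [f"unstack({x},{y})" for x, y in permutations(blocks, 2)]
)
print(len(ground_actions))   # 5 + 5 + 20 + 20 = 50 ground actions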

(Refer Slide Time: 08:20)

This means we are starting with the state in which C is on A, and A and B are on the table. So in the initial layer, layer P0, we would have all these propositions: let us say arm-empty, then on-table B, clear B, on-table A, on C A, clear C. So this is layer P0. Now, there are 4 kinds of edges in the planning graph. The first is precondition edges, which go from an action to some precondition in the previous layer; then positive-effect edges; then negative-effect edges; and one more set of edges, which are called mutex relations.

We are already familiar with precondition edges, positive effects and negative effects; we will look at what mutexes are in a moment. So, in the initial layer we have this proposition layer 0, P0; then we have action layer 1, in which we want to insert all actions which are possible given these propositions.

We want to put in a slight additional constraint: that the preconditions of those actions must be possibly true at the same time; if they are possibly true at the same time, we will consider such an action. This is where the notion of mutex relations comes in. Mutex, as you may know, is an often-used term for mutual exclusion essentially. These mutex edges are going to be inside each layer essentially; so this layer will have its own mutex relations.

That layer will have its own mutex relations too; they go from one action to another in an action layer, or from one proposition to another in a proposition layer. If a mutex exists between two entities, either two actions or two propositions, it means that the two cannot hold at the same time. So, if there is a mutex between A1 and A2, it means they both cannot be done at the same time. As you can see in this example, we have two actions, unstack C from A and pick up B, and we can see that they cannot be done at the same time, because they both require the arm to be empty and only one can use the arm essentially.

So, you would have a mutex relation between them. Likewise, there are mutex relations between propositions, which say that those things cannot be true at the same time. We construct the planning graph from left to right: we keep inserting actions, then propositions, then actions, then propositions, and so on. Till when do we do that? Till the time when the goal propositions appear in a proposition layer and they are not mutually exclusive.

So, what are the goal propositions that we are interested in? In our case we are interested in on A B and on B C. If these two occur in a proposition layer — in fact, the first time they occur in a proposition layer and they are not mutex, which means they are not mutually exclusive as per the planning graph — then there is a possibility that we have solved the problem. At that point the first stage of the algorithm, which is the forward phase, ends and the second stage begins, which checks whether there actually is a plan which will achieve both those goal conditions for me or not essentially. So, it is a two-stage process: you construct the planning graph and search for a solution; if you cannot find one, you extend the planning graph by one more layer, search for a solution again, and you keep doing that.

Now, one action which we always insert, a uniform action, is called a no-op, and a no-op has any predicate P as an input and the same predicate P as an output. As the name suggests, it says that we are doing no operation on this predicate. We will depict no-ops by these lines like this initially, but we will not draw them after that because they really take up too much space. As an effect of this, in P1 we will have the same propositions: clear B, on-table B, arm-empty, on C A, clear C. This we will do all the time; in every layer we would always insert no-op actions.

You can see that the effect of these no-op actions is going to be that the planning graph grows monotonically. The set of propositions in the proposition layer is going to grow monotonically, because whatever is present in the previous layer will always be carried forward to the next layer, plus of course new things may be added essentially. So, what are the new things which can be added in our example? We have the action — let me write it here — pick up B, that I can add to this. Why can I add it? Because its preconditions, that B is clear, that B is on the table, and that arm-empty is true, all hold. Maybe I should have written this a little bit to that side; we will manage.

So, the first kind of edges we have drawn are the precondition edges; then the second kind of edges are positive effects. The effect of pickup B is holding B, so I have a positive edge from this action to holding B, and I have negative edges to its negative effects, one of which is on-table B. So I have a negative edge going from here — let us say we depict a negative edge like this — and likewise clear B and arm-empty also become false.

So, that is the third kind of edge. The first kind is precondition edges, which go from a proposition layer to an action layer and basically capture the preconditions that are necessary for the action to be applicable. The second kind is the positive effects, which are what would be true in the next proposition layer if that action were to be executed. And in that sense you can see that the no-op basically says that if the no-op action were to be executed, then on-table A would continue to be true: on-table A was true here, and if I do a no-op then on-table A will be true in the next layer essentially.

One more action we can add is unstack C from A, and you can see that the preconditions for that are that the arm must be empty, that C must be on A, and that C must be clear. The positive effects of that are holding C and clear A, and the negative effects — which we have agreed to mark like this — are that C is no longer clear, C is no longer on A, and the arm is no longer empty. So, when do we have mutex relations? How do we define that two entries in a layer are mutex essentially? Let us talk of actions first.

(Refer Slide Time: 18:47)

There are 4 conditions under which actions are mutex; we say a1 and a2 are mutex if one of these conditions holds. The first is that p1 belongs to pre(a1), p2 belongs to pre(a2), and mutex(p1, p2) — meaning there is an edge between p1 and p2 which says that they are mutex. So, if a1 requires p1 as a precondition and a2 requires p2 as a precondition, and in the previous layer those two are marked as mutex, then those actions cannot be done together. The second condition for two actions to be mutex is that there is some predicate p with p belonging to eff+(a1) and p belonging to eff−(a2): some predicate which is a positive effect of one and a negative effect of the other. Remember that the effect links go from the action layer to the next layer, and we are trying to decide whether any two actions in the action layer are mutex or not. So, if action a1 produces p and action a2 deletes p — that means p is a negative effect of a2 — then a1, a2 are mutex. They can never happen at the same time, because the semantics of these two actions happening in parallel is not defined essentially. GraphPlan in general will allow us to develop parallel plans, which means if we had a two-arm robot, for example, then the two-arm robot could have simultaneously picked up C from A and picked up B from the table and then done something with them essentially.

So, parallel actions are in general allowed; of course we have put both these actions in our action layer, but we know — being now familiar with this one-arm robot and how it can operate — that only one of those actions can happen, which means that these actions must be marked as mutex somehow essentially. We will see the condition which covers that in a moment, but one condition is that one action is producing something P and the other action is deleting P. So we do not know, if both were to happen together, whether P would be true or P would be false; and if you were to linearise them, then the final effect of those sequential actions would change depending on the order essentially.

So, we cannot allow that. The third condition is that P belongs to eff−(a1) and P belongs to pre(a2). So, if a predicate P is required by action a2 and predicate P is being deleted by action a1, then we say that a1 and a2 cannot be parallel. Again, the reason for this is that if you were to linearise them, the two orders would not produce the same effects essentially: if you were to do a1 first, then it would delete P and a2 could not come after that; or you could do a2 first and a1 afterwards.

But then you can see that the effects of the two orders are different; we want our parallel plans to be such that they can always be linearised to give us a solution plan. So, that is one more condition. And the fourth condition — let us draw it like this — is that a1 consumes P and deletes P, and a2 also does that: it consumes P and deletes P. Remember, this dot stands for deleting P; it is a negative edge. It is saying that P is a precondition of a1 and not-P is an effect of a1, and also that P is a precondition of a2 and not-P is an effect of a2.

So, both actions are consuming the same thing — I mean, there is only one cake and there are two people, so only one of them can eat it essentially. Both cannot be put in at the same time, saying A is eating the cake and B is eating the cake. In our example you can see that there is a negative edge to arm-empty from pickup B — this is going to delete arm-empty — and unstack C from A is also going to delete arm-empty; both are consuming arm-empty. So this fourth case applies here, and we mark a mutex edge between this and this. We can also see that this no-op is mutex with pickup B, because the no-op is producing clear B and pickup B is deleting clear B.

That is the second clause, which says that one of them is producing it and the other one is deleting it, and so these two actions cannot happen in parallel. So, as you can imagine, there would be many mutex edges between actions that we will have to mark. One mutex edge that we have is between this and this, another mutex edge is between these two, and there are others of course that you can find; so that takes care of the action layer.

(Refer Slide Time: 25:29)

Then there are mutex relations between propositions. P1 and P2 are propositions in the same layer; obviously, mutex relations are always within a layer. What we want to say is that P1 and P2 are mutex if all ways in which P1 can be produced are mutex with all ways in which P2 can be produced — and by this we mean that every action which produces P1 is mutex with every action which produces P2 essentially.

That is, there are no actions a1 and a2 in the preceding layer such that P1 belongs to eff+(a1), P2 belongs to eff+(a2), and a1, a2 are not mutex. In other words, if we can find two actions a1 and a2 which are not mutex, with a1 producing P1 and a2 producing P2, then we can say P1, P2 are not mutex essentially; but if we cannot find them — if there are no actions a1 and a2 satisfying this condition — then we say that they are mutex essentially.

So, let us try to construct this example from here. The first thing you will observe is that layer A2 is a superset of A1: everything which is there in A1 will always be there in A2. Why? Because we have these no-op operations which carry forward everything which was here — everything which made all these actions possible — and therefore those actions will be possible again, apart from pickup B, unstack C from A, and so on. For example, it is always possible for me to say that my first action is to do nothing, that for some strange reason I will do a no-op.

If I do a no-op, then I can always do a pickup B after that essentially, or I can always do an unstack C from A after that, which means these two actions must come into my layer. So, everything which is there in A1 plus something new. What are the new things we can have? I am not going to write all these again here — ideally I should copy them, but that is a bit of work — so I will just write this as A1 plus the following: since you have holding B, you can do something with B.

(Refer Slide Time: 29:18)

You can stack B on to C. Why? Because all the conditions are there: if you have picked up B then you are holding B, and C is clear because a no-op will keep it clear; and that is about all you need.

So, we can stack B on to C, or we can put down B; these are two extra actions that I will add to my layer. Likewise, for holding C I will add stack C on to A, stack C on to B, and putdown C. So, I have the same A1 set of actions; and remember, all these no-ops are always there — for every predicate in my proposition layer there is a no-op action which carries it forward to the next layer.

So, that is always part of the graph, plus anything new that I can add essentially. The only condition I need for adding an action is that its preconditions must be available in a non-mutex fashion essentially. Again, to repeat: if you want stack B on to C, you want to see what you need — I need holding B, and of course I need clear C, and that is what it is. Of course, it will have its effects: it will produce arm-empty, not clear C, not holding B, and so on essentially. So, in this process I construct the planning graph layer by layer, and at every layer I have to find what the mutexes are and put them in.

As you can observe, the planning graph is growing monotonically essentially: it starts with these propositions and then it expands, and because whenever a proposition gets into the planning graph a no-op will take it to the next layer, only new ones will be added. So, for example, when we stack B on C, one thing that gets added is on B C, plus of course all these existing propositions; so it is growing monotonically. Can we say how fast it grows? It grows as a polynomial of the original problem size; in general, the size of the planning graph increases polynomially essentially.
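
A sketch of growing the graph by one layer in Python, under the same assumed action representation as the mutex sketch above; make_noop(p) is assumed to build the no-op action with pre = eff_plus = {p}.

def expand_layer(props, prop_mutex, ground_actions, make_noop):
    actions = [make_noop(p) for p in props]       # carry every proposition forward
    for a in ground_actions:
        applicable = a.pre <= props and not any(
            (p1, p2) in prop_mutex
            for p1 in a.pre for p2 in a.pre if p1 != p2)
        if applicable:                            # preconditions present, non-mutex
            actions.append(a)
    next_props = set().union(*(a.eff_plus for a in actions))
    return actions, next_props   # mutexes for the two new layers are computed next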

So, that is the first phase of GraphPlan: you construct the planning graph. Till what stage? There are two possibilities. One is that you have reached a stage in which a plan may exist. Let us say, for argument's sake, we had a two-arm robot, which can unstack C from A and pick up B in parallel.

So let us say in the first layer you unstack C from A, and in the second layer you put down C and pick up A at the same time. In the third layer you pick up B, in the fourth layer you stack B on to C, and in the fifth layer you stack A on to B. So, you can get a 5-step solution if you had a two-arm robot.

We know that for a one-arm robot, as you saw in the last class, there is a 6-step solution. So stage one ends when all goal propositions exist in a layer, non-mutex. What were the goal propositions we are interested in? on A B and on B C. If they happen to be in a proposition layer and they are non-mutex, then we stop this forward planning-graph construction exercise and we go to a backward search.

Typically you could do something like depth-first search here — remember that the problem is still hard — but you could use other methods; for example, people have tried constraint satisfaction methods and so on. What are we looking for? At some stage — let me write those things here: on A B and on B C — I am looking for these propositions. If they occur in some layer, let us say Pn, and they are not mutex, then I start a backward search phase and ask: can I find all the sub-goals of this in the previous layer, in a non-mutex fashion?

Now, both of these have positive links coming from actions, and those positive links could be part of the final solution that we are going to construct. What are these positive links going to be? Which two actions will produce these? But are they non-mutex? I am looking for non-mutex actions, so I need to do this backward search looking for non-mutex actions with non-mutex preconditions. As you can guess, one of them is a no-op action essentially; so in the previous layer I would have stack A on B — and I am saying that because I know the solution: the last action must be stack A on B.

But essentially what the algorithm will need to do is to search: from this goal set, find an action set which achieves the goal set and which is non-mutex. Now stack A on B and stack B on C will be mutex, because, for example, stack A on B makes B not clear and stack B on C needs B clear, or something like that; something will come up and they will be mutex essentially. So I need a goal set here and then an action set here; then a goal set of sub-goals, and then actions, and then sub-goals, and then actions, till I come to the initial layer, and nowhere do I encounter a mutex pair.

If that is the case, then we have found a plan. As you can imagine, this will be found only in the sixth layer for this particular problem, because we know by now that it needs 6 actions. In the sixth layer we will find this stack A B mutex with nothing, and its preconditions will not be mutex — a no-op will supply them; they will be holding A and clear B, and on B C is already there — so essentially we will be able to find a path back. So, this backward phase is a search which will try to solve every goal set with a sub-goal set, and that with another sub-goal set, right up to the start set, in a non-mutex fashion.

Of course, while doing this it may try various sub-goal sets; another candidate could be just two no-op actions, for example, as you can see it may try that. But of course it will not find a solution that way, because for the goals to be carried by two no-ops in parallel the plan would need 7 steps, and we know we need only 6 steps anyway. But it could try this combination, or it could try that combination, or it could try a third combination; it will do some kind of search, and in the original GraphPlan algorithm by Blum and Furst the backward phase was depth-first search.

So, in the forward phase, once you get all goal propositions in a layer non-mutex, you do the backward search; if it fails, then you extend the graph by one more layer and do the same thing again. Now, notice that when you extend the graph by one more layer, you will still get those goal propositions non-mutex — you need to convince yourself of this: once a goal set is non-mutex, it will remain non-mutex in the next layer also. But it is possible that some sub-goal set may be mutex, which is why we need to extend the planning graph further and further essentially.

So, there are two cases. One is that a plan exists, in which case stage one ends when all goal propositions are found, and then the backward phase begins. Else — meaning we have not found the plan — extend the planning graph, and you keep doing this: extend the planning graph, search for a plan, extend the planning graph, search for a plan. You can imagine that because we are doing that, if you find a plan it will be the shortest possible plan essentially. The other situation is that no plan exists, and this is a little bit trickier to see. I do not think we will have the time to go into the details to spell out the conditions under which the algorithm should say that no plan exists, but the general idea is as follows.

What is happening in this planning-graph extension stage is that more propositions are being added, so more and more things will come into play as non-mutex. Initially, for example, you might say pickup B is possible and unstack C from A is possible; and then, because we already have clear C somewhere, you can say after pickup B you can stack B on to C, and because we have unstack C from A we can pick up A in the next layer, and so on.

So, it might be that you see the goal propositions appear in the proposition layer and they are not mutex; then you search in the backward phase, and you find that at some preceding layer the sub-goals become mutex, and then you cannot complete the backward search all the way to the start set. So you find that they are mutex; if you have tried all possible combinations and at some layer they are mutex, then we extend the planning graph — let us say it ended at layer T — to layer T plus 1, and then we come back and search.

What Blum and Furst showed was this condition: if the sub-goal sets that you are trying to solve remain constant when you extend the planning graph, you can stop. What is a sub-goal set? A sub-goal set is a set of sub-goals, where each sub-goal is something that you want. For example, for on A B and on B C, via the action stack A on B we would have the sub-goals holding A, clear B, and on B C; these three propositions would be one sub-goal set. A different action choice will give a different sub-goal set, and other actions will give yet other sub-goal sets.

So, we have a set of sub-goal sets. If the set of sub-goal sets does not change as you extend the planning graph from layer T to layer T plus 1, then we can terminate and say there is no plan. To get some intuitive feeling behind this: what happens is that as we construct the planning graph, first the set of propositions stabilizes — no more propositions are added to the set, we have added everything possible essentially — and then the set of mutex relations stabilizes, which means no more mutex relations are added.

But even then you have to be a bit careful, because you can extend it by one more layer and you might still find something. What Blum and Furst showed is a slightly stronger condition, which says: if, when you extend the planning graph from layer T to T plus 1, the set of sub-goal sets at some intermediate layer n remains constant, then you can be sure that you will never find a plan — because if you were not able to add new sub-goal sets when you moved from T to T plus 1, you can never add any more, since the set of propositions has stabilized and the set of mutexes has stabilized.
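
Putting the two stages together, here is the overall loop as a Python sketch; initial_graph, expand_graph, goals_nonmutex, extract_plan, failed_subgoal_sets and levelled_off are assumed helpers standing for the pieces described in the text, and the termination test is the simplified version of Blum and Furst's condition discussed above.

def graphplan(problem):
    graph = initial_graph(problem)        # layer P0 = start propositions
    prev_failures = None
    while True:
        if goals_nonmutex(graph, problem.goals):
            plan = extract_plan(graph, problem.goals)   # backward DFS phase
            if plan is not None:
                return plan               # shortest parallel plan
        failures = failed_subgoal_sets(graph)  # memoized unsolvable goal sets
        if levelled_off(graph) and failures == prev_failures:
            return None                   # provably no plan exists
        prev_failures = failures
        graph = expand_graph(graph)       # one more action + proposition layer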

So, there is nothing more you can do after that — it takes a bit of time to figure this out. To summarize: the GraphPlan algorithm works in two stages. In the forward stage it constructs a planning graph, which contains 4 kinds of edges: the precondition edges, the positive effects, the negative effects, and the mutex relations. Once it finds all the goal propositions existing in a non-mutex form in a layer, it goes to the backward phase and tries to search for a substructure in the planning graph where there are no mutex relations, which means that those actions are possible even in parallel.

If it cannot find one, it goes back and extends the planning graph by one more layer, again goes into the backward phase, and keeps doing that. So at some point either it finds a solution, or the criterion which we described somewhat briefly is met, where it says that no plan can be found essentially. So, the planning graph gives us an optimal plan in terms of the shortest time span, or the shortest number of steps in which parallel solutions can be found, if there is a parallel solution.

Of course, keep in mind that the domain we are talking about has no parallel solutions, because it has a single-arm robot. But many domains may have parallel solutions, and what GraphPlan gives us is the shortest time in which a parallel solution can be found; and we can of course always linearise that — the actions in any layer can be linearised in any order. What it gives us is the set of actions at layer 1, the set of actions at layer 2, the set of actions at layer 3, which is the shortest possible parallel plan. So, it is an optimal algorithm, it gives us the shortest plan, and it can solve much larger problems than the earlier algorithms essentially.

So, I think I will stop here with planning; we do not have time to go into the other algorithms. What we will do in the next couple of lectures is get a flavour of constraint satisfaction problems, which is actually a very big area in computer science, in that there are people who try to pose everything as a CSP, and there is a huge amount of activity in writing solvers for constraint satisfaction. So, what we will do is just get a flavour of how to pose a problem as a constraint satisfaction problem. What is a constraint satisfaction problem? What are the basic ideas? What are the algorithms that we use for solving it?

The interesting thing about studying CSPs, constraint satisfaction problems, is that it gives us an opportunity to combine search with reasoning essentially. You can do some amount of search, but you can contain the search by doing some amount of reasoning, and constraint satisfaction problems give us a natural way of doing that essentially. We will try to get a flavour of that in the next few lectures essentially, and then we will move on to representation proper with logic essentially. We will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 39
Constraint Satisfaction Problems

(Refer Slide Time: 00:21)

So, today we want to look at a different approach to problem solving, and this approach is called constraint satisfaction, or some people call it constraint processing. We talk of constraint satisfaction problems and we call them CSPs essentially. This constraint satisfaction approach to solving problems is a unified way of representing problems as constraint satisfaction problems, as we will see today, and then solving the CSP essentially. For example, in state space search we talked about states, moves, going to new states and so on, and then we looked at solution space search and those methods. In constraint satisfaction problems everything is expressed in the following way: there is a set of variables X — x1, x2, and so on, a finite set; a set of domains D, which we will denote by d1, d2 and so on. The meaning of the domains is that variable x1 can take values from domain d1, variable x2 can take values from domain d2, and so on. So, each variable has its own domain; it does not have to be the same set of values.

One domain can be numbers, another can be colours, another can be days of the week, another can be names of students, another can be the grade that a student has got — anything can be in a domain, and each variable has a domain. The third thing is a set of constraints C, which we will write as c1, c2, let us say up to ck. I will come to constraints in a moment; first let us talk about variables. We will assume that there is a finite set of variables, which is always the case, so we have x1 up to xn with domains d1 up to dn. And we will also assume, as far as we are concerned, that these domains are discrete domains essentially, and not only discrete, but we will assume that they are finite essentially — which means that the set of values a variable can take is finite essentially — because of the particular kind of problems that we want to look at and the solutions to those problems. Now, obviously many problems can be posed as variations of finite discrete domains. So, let us first discuss the constraints.

Each constraint ci is a pair (Si, Ri), where Si is called the scope of the constraint. It is basically a subset of X, which means that a constraint is defined over a subset of the variables, and that particular subset for the i-th constraint is called the scope of the constraint ci. Let us say this subset is made up of variables xi1, xi2, ..., xip — say there are p variables — and each of them has a corresponding domain; the relation Ri is then defined over those variables. It is a subset of di1 × di2 × ... × dip. That is the most general way of defining a constraint satisfaction problem: a set of variables, a set of domains, one for each variable, and a set of constraints defined over subsets of the variables. And we are just using the generic definition of a relation here, simply saying that it is a subset of the cross product of all the domains; in practice this could be explicit or implicit.

We will assume that the relations are explicit, as far as we are concerned, for this discussion. For example, if you say that there is a number between 1 and 5, or that the days of the week can be numbered from, let us say, 1 to 7, then you will simply list the set as 1, 2, 3, up to 7. We could have said it in some implicit way, like greater than 0 and less than 8, but we will not go into those things here, and in many cases it does not really matter. So, we will assume that the relations are available to us as explicit sets of tuples, each relation being a subset of the cross product of the domains. Now, you must have, for example, solved sets of linear equations; they can also be seen as constraint satisfaction problems, except that the domains may be continuous in linear programming, whereas domains are discrete in integer programming; and such specialized kinds of constraint satisfaction problems have their own methods for solving them essentially.

For example, you know how to solve a set of linear inequalities or a set of linear equations. We will not go into specialized methods — those specialized methods are obviously more efficient than the general methods we want to look at; what we want to explore is methods for the general kind of constraint satisfaction problem. And we will be interested in those problems where the relations are explicit and the domains are finite and discrete. We will call these finite constraint satisfaction problems, and we want to look at ways of solving them. What is a solution to a CSP? We often say that a CSP, often denoted R, is the triple (X, D, C). And very often we use the term constraint network; it is the standard term which the community uses, so we will also stick to that. Whether you say CSP or constraint network, we basically mean the same thing: this triple of a set of variables, a set of domains for the variables, and constraints over these variables. A solution to a CSP is an assignment to each variable, obviously from their domains, such that each constraint is satisfied.
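
One way to write down the (X, D, C) triple in code, as a minimal Python sketch: a constraint is a scope (a tuple of variables) together with an explicit relation (the set of allowed tuples). The class names are illustrative, not standard.

from typing import NamedTuple

class Constraint(NamedTuple):
    scope: tuple         # S_i, e.g. ("x1", "x2")
    relation: frozenset  # R_i, the allowed tuples, e.g. {(1, 3), (1, 4), ...}

class CSP(NamedTuple):
    variables: tuple     # X
    domains: dict        # D: maps each variable to its finite domain
    constraints: tuple   # C: one Constraint per scope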

We will define what we mean by satisfying a constraint in a moment essentially. But that is the general idea of a constraint satisfaction problem. Again, to repeat: we have a set of variables; we have a set of domains, one domain for each variable, from which it can take values; and we have a set of constraints defined over subsets of those variables. Without loss of generality, we will assume that for each possible scope there is only one constraint defined. So, for example, if I take variables x1 and x2, I define only one constraint on that pair; I will not define two separate constraints, because you can always combine the two constraints into one. So, we assume that there is one, and the solution to a CSP is an assignment to each variable such that each constraint is satisfied essentially. Now, this is a very general way of looking at things, but it is very useful, because it turns out that a lot of problems can be posed as constraint satisfaction problems.

And we can now subscribe to this strategy of saying that if you are solving a new problem, just pose it as a CSP, then take an off-the-shelf CSP solver and use it to solve the CSP essentially. So, we can capitalize upon the expertise of people who have worked on constraint satisfaction problems and use their solutions directly. The only thing you need to do is to pose it as a CSP, and it turns out that many, many problems can be posed as finite discrete CSPs, for which we can use the methods that have been developed in the community. We will not have time to discuss those methods; as I said, this is just giving you an exposure to this particular field, and for those of you who are interested, you should come to the planning and constraint satisfaction course next semester, in which we will look at all the refined methods which we do not have time to discuss here.

As an aside, you should also observe that the SAT problem is a special case of a CSP. What kind of CSP is it? It is the CSP in which each domain has 2 values — 0 or 1, or true or false, or whatever — and the constraints are defined in terms of the logical operators that we talked about: and, or, not, and so on. If you have such a problem, then you have a SAT problem essentially. So, the SAT problem is a special kind of CSP, and again of course SAT has its own specialized approaches to solving it; there are approaches to SAT of all kinds. Again, we will not go into the special ways of solving things essentially, but we just observe that that is also a CSP.

(Refer Slide Time: 11:51)

Now, one of our favourite small problems is the n-queens problem. So, if you look at this problem — let us say we are looking at 4 queens — how can we represent it? We can say that there are 4 variables: x1 for this column, x2 for this column, x3 for this column, x4 for this column. And let us say the values they can take are 1, 2, 3, 4 — four numbers. Then we can express the 4-queens problem as a CSP by specifying the constraints. I will use the short form R12 to stand for the constraint which has the scope of variables x1 and x2; we will use the relation directly here, so it is sort of clear between us. Then we are talking about a constraint which is over the first two variables. What is that constraint? We can simply specify it by saying that these pairs are allowed: for example, if you have placed the first queen on the first row, the second queen can only be placed on the third or fourth row.

So, (1, 3) is allowed, (1, 4) is allowed, (2, 4) is allowed, (3, 1) is allowed, and (4, 1) and (4, 2) are allowed. This is what I meant by saying that we have the constraint available to us explicitly; it does not have to be implemented like this — it is only for our discussion that we are assuming that this constraint between the first queen and the second queen is available to us as the pairs of values that the two variables can take. So, each variable can take one of these 4 values — the domains are all the same — and this constraint is expressed like this. Likewise, you will have to express R23, R13, R14, R24 and R34. So, there are 6 constraints, because you can choose 2 variables out of 4 in 6 ways, and writing out all those 6 relations I leave as a small exercise for you; they can all be expressed in the same way essentially. And then of course the problem is to find a set of values for these variables such that the values satisfy all these constraints, which means that if I take a value for variable 1 and some value for variable 2 — let us call them i and j — then that pair (i, j) must occur in the set of tuples here.

(Refer Slide Time: 14:42)

So, we have some definitions. We will use a-bar for an assignment over a scope S. Basically, the scope S says on which variables this assignment is done, and the assignment selects one value for each variable from its respective domain essentially. Explicitly, I would say something like x1 is equal to some value a, then x2 is equal to some value b, and so on; but implicitly we will just write it as a vector (a, b, ...), and we assume that somehow we are able to specify what the variables are essentially — it does not really matter; the only important thing is that the assignment is over a scope S, and the scope S basically says these are the variables being given values essentially. An assignment a-bar over a scope S satisfies a constraint ci if the following holds: the scope of the constraint, which we call Si, is a subset of the scope S — which means every variable in the constraint has a value — and the projection condition below holds; we will use the symbol π to talk of a projection.

I am sure you are familiar with the notion of a projection: π of a-bar over the set Si. By this we mean that from the assignment a-bar, which is over some set of variables called S, we select only those values which correspond to the variables which belong to Si. The condition is that this projection of a-bar onto that subset of variables belongs to Ri; then the assignment satisfies the constraint. So, if I am given an assignment here — for example, I put a queen here, and I put a queen, let us say, here, and I put a queen here — this assignment says a-bar is 1 for the first queen, 4 for the second queen, and 2 for the third queen; that is an a-bar for this particular assignment. And I can say that this assignment a-bar satisfies the constraint R12. Why? Because if I take the projection of these variables on the first and the second variable, which is (1, 4), I can find that (1, 4) is of course in my relation R12.
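
A sketch of "satisfies" via projection, and of "consistent" over all applicable constraints, in Python; an assignment is represented as a dict from variables to values, and the representation of a constraint as a (scope, relation) pair is the assumption made earlier.

def satisfies(assignment, scope, relation):
    """Check one constraint (scope, relation) against a (partial) assignment."""
    if not set(scope) <= set(assignment):    # constraint falls outside the scope of a-bar
        return True
    projection = tuple(assignment[v] for v in scope)   # pi over S_i
    return projection in relation

def consistent(assignment, constraints):
    """Consistent = satisfies every constraint falling within its scope."""
    return all(satisfies(assignment, s, r) for s, r in constraints.items())

a_bar = {1: 1, 2: 4, 3: 2}   # queens 1..3 on rows 1, 4, 2, as in the example
r12 = {(1, 3), (1, 4), (2, 4), (3, 1), (4, 1), (4, 2)}
print(satisfies(a_bar, (1, 2), r12))   # True: the projection (1, 4) is in R12
# With the R dictionary built in the earlier queens sketch,
# consistent(a_bar, R) also comes out True, matching the discussion below.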

So, the assignment satisfies the relation. Then we say that an assignment a-bar is consistent if it satisfies all constraints within its scope. I will not expand upon this; basically we are saying that a-bar has a scope S, and for whichever constraint the scope of that constraint is a subset of S, it must satisfy that constraint — which means the projection over those variables must belong to that particular set of tuples essentially. So, you can see that this particular assignment which I have here — where the first queen is here, the second queen is here, and the third queen is here — is consistent. I have not written all those constraints, but you can see that between queens 1 and 2 it is satisfying the expected constraint. What is the constraint? That a queen must not attack another queen, which we have expressed explicitly here. You could instead have expressed it implicitly, as a relation which says something like xi ≠ yi, or that if (xi, yi) is the location of a queen, then two queens are not on the same diagonal, not on the same row, not on the same column — you could have expressed it something like that essentially, but we have expressed it explicitly here.

So, we can see that the first two queens are not attacking each other, and that is shown by this; the second and the third are also not attacking — if you had written R23 you could have seen that — and the first and the third are also not attacking, and that pair would have been present in R13 essentially. So, we say that this assignment a-bar is consistent essentially. Sometimes we use the term partial assignment, which means an assignment to only a subset of the variables; we will use the terms interchangeably, or sometimes we say a partial solution. An assignment is consistent if it satisfies all the constraints which fall within its scope; in this case there are 3 constraints — between queens 1 and 2, queens 1 and 3, and queens 2 and 3 — and all 3 are satisfied. So, this assignment is consistent. Now observe that just because it is consistent, it does not mean it can be part of a solution, because you can surely see that this cannot be extended to a complete solution essentially: you cannot put a value for the fourth queen — if you put it here it will attack these two, if you put it here it will attack these two, if you put it here it will attack this one, and if you put it here it will attack these two. So, you cannot place the fourth queen.

So, this is a consistent partial solution, but it is not a consistent full solution. A full solution is a consistent assignment to all the variables, which is another way of saying what we said here: a solution to a CSP is an assignment to each variable which satisfies all the constraints essentially — which is the same as saying that if you assign all the variables and it is consistent, it is a solution essentially. So, how do we solve CSPs? Again, we want to look at general-purpose methods of solving CSPs, just like we did for state space search, where we did not care what the state was and where the move-gen function came from: as long as we had a move-gen function and a goal-test function, we said we would use the standard search algorithms. Likewise, we have just made the observation that many problems can be posed as CSPs. I mentioned sometime during planning that planning can be posed as a CSP, which we will not have time to go into here, and planning can also be posed as SAT, which is a special case of a CSP.

Both are slightly different formulations, and we will look at, in due course, a couple more problems which can be posed as CSPs; but we want to look at general-purpose ways of solving CSPs essentially. So, let me discuss one more problem before we go, just to highlight the issues involved. Let us say we are doing map colouring; map colouring can also be posed as a CSP. Before we come to map colouring: we have not said anything about the scopes of these constraints essentially — do we have any restrictions on the scopes of the constraints? I have posed this problem where the scope was 2 variables: between queen 1 and queen 2, or queen 1 and queen 3, or queen 1 and queen 4, and so on and so forth. Why not between 3 variables, or 4 variables? You can express constraints on more variables; for example, if I had expressed the constraint as a constraint over all 4 variables, I would actually be expressing the solution itself, because I would basically have the set of solutions in it essentially. In fact, this kind of illustrates the idea that we pursue in constraint satisfaction problems: you do not have to specify the problem completely; as long as you give some specification of the set of constraints, it is the task of the solver to elicit a solution out of that essentially.

So, what is the solution to this? You know that this CSP has in fact only two solutions: one is 3 1 4 2, and the other is 2 4 1 3. Now this particular relation on the set of four variables is called the solution relation of R, where R refers to this constraint satisfaction problem. So, corresponding to a constraint satisfaction problem there is a solution relation, and it is the task of the solver to elicit this solution from the CSP. I have only specified binary constraints here, between two variables, and the solver will eventually tell me that these are the two possible solutions. It may not express them in this form, but it will give me an assignment to all the variables, and we will do that. So, without loss of generality, we will assume that we are working with binary relations.

By binary CSPs we mean CSPs whose constraints have scopes of size either 1 or 2. A scope of size 1 basically defines a subset of a domain; a scope of size 2 defines a subset of the cross product of two domains. So a binary CSP has scopes of size 1 or 2, and it has been shown that any higher order CSP can be converted to a binary CSP by adding more variables. I will leave this as a small thought exercise for you to work on: how can I convert, for example, the four-variable constraint which is the solution itself into a binary CSP? We will stick to binary CSPs, because there is a whole lot of methods which work on those problems, and other problems can be expressed as binary CSPs. This map colouring problem is naturally posed as a binary CSP.
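Before we come to the map colouring example, here is a small sketch, in Python, of the four queens problem posed as a binary CSP. This code is my own illustration, not from the lecture, and the names are arbitrary; the point is only the shape of the data: variables, domains, and one predicate per pair.

    # The 4-queens problem as a binary CSP: one variable per row,
    # whose value is the column of the queen in that row.
    from itertools import combinations, product

    N = 4
    variables = list(range(N))                    # queen i sits in row i
    domains = {i: list(range(1, N + 1)) for i in variables}

    def ok(i, ci, j, cj):
        # Binary constraint R_ij: queens in rows i and j do not attack.
        return ci != cj and abs(ci - cj) != abs(i - j)

    # Brute-force enumeration of the solution relation.
    solutions = [cols for cols in product(*(domains[i] for i in variables))
                 if all(ok(i, cols[i], j, cols[j])
                        for i, j in combinations(variables, 2))]
    print(solutions)                              # [(2, 4, 1, 3), (3, 1, 4, 2)]

Enumerating all assignments like this is of course exactly what a solver tries to avoid; the algorithms below do better.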

(Refer Slide Time: 26:16)

Let us say I have three countries; we will not name them. And let us say my formulation of the CSP has a relation between this and this, and between this and this. So let us just call these x1, x2 and x3, and let us say they have the same domains: just two colours are allowed, red and blue. I write the domains inside these circles, and the relation is not-equal-to. Now we can see that this problem has two solutions: you can colour these two red and this one blue, or you can colour these two blue and this one red, which means implicitly there is a constraint between these two variables. But that constraint is not explicit; it has not been mentioned in the CSP. The CSP only says that I have a constraint between x1 and x2 and between x1 and x3; it says nothing about x2 and x3. Eventually, of course, this constraint can be elicited through the process of solving, and we will try to see whether we get some insight into how we acquire new relations in this manner. So let us look at these two problems: we have four variables in one case, three variables in this case, and we have some domains and so on. How can we solve a CSP?

The simplest approach is to go back to state space search. You assign a value to the first variable, then a value to the second variable, then the third, then the fourth, and so on, and finish the assignment. In state space search we would have finished the assignment and then checked whether it is a goal state or not. In constraint satisfaction problems we can backtrack earlier. Why? Because we have defined the notion of a consistent assignment, or consistent partial solution, and the moment we know that some constraint is not being satisfied, that the assignment is not consistent, we can backtrack from there. So you do not have to assign values to all the variables. In this example, of course, you will backtrack only when you try to place the fourth queen; but if you try the six queens problem, you will see that there is processing you can do which may let you backtrack much earlier.

(Refer Slide Time: 29:24)

So, the simplest algorithm for solving a CSP is called backtracking; that is the official name of this algorithm, and everybody uses this term. What do we have? We have a constraint satisfaction problem: given X, given D, and given C. Let us outline the algorithm. One issue to keep in mind is the following: let us say you have assigned some value to x2 and then you are trying values for x3; say you assigned the third value in the domain to x2. Then you go forward to x3, you find that there is no consistent value for x3, and you need to backtrack to the fourth value of x2 and then try all values of x3 again. Because you keep going back and forth, you have to keep track of which values are left for you to try. This algorithm takes a very simple approach: it makes a copy of the domain every time.

So, it starts off by setting i to 1 and making D'1, a copy of D1; I should also initialize a-bar to empty, since initially the partial solution has no values. While i is between 1 and n it does the following: it makes a call to a function SelectValue, so x gets a value from SelectValue, which is given the partial solution a-bar, the variable xi, the copied domain D'i, and the constraints C. What this SelectValue function does is give me the next value for xi from its domain, D'i actually, which is consistent; by consistent we mean consistent with all the previous variables that have been given values. So if i is 4, say, and there are constraints between variable 4 and variables 1 and 3, then this value for the fourth variable must be consistent with the first and third values, in the sense we have just defined. SelectValue returns one of two things. Either it returns the special value null, which means it cannot find a value, and then we set i to i minus 1; or it returns a value, in which case we add it to the partial solution, set i to i plus 1, and copy the next domain into D'i. So you get some value for x1, some value for x2, some value for x3. Let me try and draw a more elaborate diagram: let us say x1 has these values, its domain D1, x2 has its domain, and so on.

So, let us say you are looking at x4 now, and let us say your SelectValue function always picks values from left to right, because they are in some list, and it has selected this value for x1, this value for x2, and this value for x3. Now we are trying to find whether any of the values of x4 is consistent with the assignment we have here. Two things can happen. Either it says that nothing is possible here, which means I must go and look for a new value for x3, because we have already tried this one. So, we are doing simple depth first search over this tree of possibilities: you start with the first value here, then the first value here, then the first one here, and then keep backtracking. If you cannot find a value here, you must go back to this one and try its next value, which means you must set i equal to i minus 1; and notice that in that case I do not copy the domain again, so the values already tried remain excluded.

So, let us see what SelectValue does. It is given xi, D'i and C. First it checks: if D'i is empty, return null. Otherwise it goes into a loop, looking for a consistent value. It sets a to the head of D'i, taking the first value if the domain is a list, and removes it from D'i. Then, if the partial solution a-bar together with this candidate value for xi, that is, the three values in this example plus the fourth value we have just picked from its domain, is consistent, it returns the value. Well, I called it x there and a here; it does not matter. So it sits in a loop, taking values out of the domain one by one and testing them. Notice that while it was working on x3, D'3 had already thrown away these two values from the domain.

So, they were no longer in D'3, and when we backtrack it must try the next value, and then the next; and if it cannot find one it must backtrack further. That is the standard depth first search, the chronological backtracking that we have been talking about. The other form of backtracking that we had mentioned, dependency directed backtracking, was actually invented for improving this backtracking algorithm. Suppose you cannot find a value for x4 and, just to give an example, the only constraint x4 participates in is, let us call it, R 1 2 4, involving variables 1, 2 and 4; or, if you want to work with binary constraints, let us say R 1 4 and R 2 4. So the only constraints x4 participates in are with variable 1 and variable 2. Then, if you cannot find a value for x4, what is the point of looking for a new value for x3? x3 is anyway not influencing the consistency of the choice for x4; the only things influencing x4 are the relations R 1 4 and R 2 4.

And the fact that you cannot find a value for x4 means that one of these relations cannot be satisfied, which means you should really jump back to one of those variables. But of course we will not get into that here, because it needs a little bit more formalization. That is the general idea of dependency directed backtracking: if you keep track of which constraint is being violated, you can jump back to one of the variables in that constraint; in this case they are binary. In our algorithm we just do i equal to i minus 1, which is chronological backtracking: if you cannot find a value for x4, try a new value for x3, and so on. What I have written may not be quite correct, something may be missing, but you can work out the details. The basic idea is to go down this set of choices looking for a new value for the i-th variable; if you can find one, go to the next variable, which is what we are doing here with i equal to i plus 1: copy the domain, extend the solution we have built, and keep going forward. If you cannot find one, backtrack. So it is simple depth first search over this structure, and this algorithm is the simplest one, the starting point for constraint satisfaction algorithms.

The other algorithms which improve upon this do the following. Some algorithms, when they are doing SelectValue, look ahead to see whether a given choice will conflict in the future with some variable; that is one kind of improvement, called look-ahead algorithms. Then we have the intelligent backtracking algorithms, such as the dependency directed backtracking we were just talking about; and there are some other things that we will discuss, so that you get a flavour of them. But this, by and large, is the basic algorithm. Now observe when backtracking happens. The partial solution we are talking about here is a value for x1, for x2, and so on up to x(i-1), and we cannot extend it to the i-th variable, xi. The moment we cannot find a value for the i-th variable we backtrack at that point itself; we do not go further, because, by definition, a solution is a consistent assignment to all the variables; or, in other words, a solution is an assignment to all the variables

such that every constraint is satisfied. And a partial solution, or partial assignment, is consistent if it satisfies all the constraints whose scope falls within the variables for which values have been selected. So we say that an assignment a-bar satisfies a constraint ci if every variable in the scope Si of ci already has a value in the assignment, and the projection of the assignment onto the scope Si belongs to the relation; I should really say belongs to, rather than is a subset of. That is with respect to one constraint ci; an assignment is consistent if it satisfies all the constraints whose scopes are contained in it, with the corresponding projections contained in the corresponding relations. So it can never be the case that a partial assignment which is inconsistent later becomes consistent. Now, when I was beginning on CSPs, one of the things I had said was that CSPs are very interesting because they allow you to combine search with reasoning. That reasoning part we have not yet seen; let me just give you a motivation for it, and then in the next class we will see how it is done. Here is another kind of problem which is posed as a CSP: if you are solving a crossword puzzle, or if you are solving Sudoku or something, you do a lot of reasoning; in fact, you are supposed to do more reasoning than search.

(Refer Slide Time: 45:08)

So, these problems are all similar; some of you must have looked at such problems, and if you just search on the web you will find examples. For example, you might see SEND plus MORE. These are arithmetic problems in which the digits have not been revealed but have been replaced by letters, and your task is to find out what digit each letter stands for. We assume that the assignment is distinct: one letter stands for exactly one digit, and no two letters stand for the same digit. So, how would you solve such a problem? Let me take another example: this plus this should give you APPLE. If you wanted to solve something like this, you would create a variable for each letter; the number of variables is the number of distinct letters in the problem, and you want to find values for them. So these boxes that I am creating are boxes for values, and obviously you must create some more boxes, which are for the carry-overs.

So, four boxes for the carry-overs, and you can define the domains of these variables: the domain of a carry is {0, 1}, because when we add two digits the carry can never exceed 1, and the domains of the letters are 0 to 9. So the kind of reasoning that Sangeetha was talking about, we can do here. For example, you can observe that this leading letter must be 1, because the final carry can never be more than 1; of course we are not otherwise interested in that carry itself. Nevertheless, the moment we have said that A equals 1, we know that this is 1 and this is 1, and we can fill in the value there: T is 9. Why is T 9? And the letters must be distinct: A, P and T cannot stand for the same digit, so P cannot be 1, and in any case P cannot be 1 here. So if T is 9 then P is 0; this carry must be 1; we have a 0 here as well; and we have 9 for T here, 9 here, 9 here; so this must be 8, this must be 1, and so we know this must be 3.

So this is 0; now we need to fill in this, which is 8, and that leaves only 2 for this. So we have of course solved the problem now, and we have solved it through a process of reasoning: "this can only be 1", and that kind of thing. Now, if you were not, shall I say, smart enough to do this kind of reasoning, you could still have tried the backtracking algorithm here: try a value for one letter, try a value for the next, and so on and so forth. Of course, we would have to express the constraints. How do we express the constraint for a column? This digit must equal the sum of the two digits above it plus the incoming carry, modulo 10, with the appropriate outgoing carry. We have to express that carefully, but the point we are trying to make here is that solving constraint satisfaction problems does not necessarily have to be done through a search method like backtracking; it can also be done through other processes, like reasoning of some kind. Now, can this reasoning be done in a general purpose manner?
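Before answering that, and just to illustrate the pure search alternative mentioned above, here is a brute-force solver for the classic SEND + MORE = MONEY puzzle. This sketch is mine, not the lecture's; a real CSP solver would interleave search with the kind of propagation described next.

    from itertools import permutations

    LETTERS = "SENDMORY"                      # the 8 distinct letters

    def value(word, digit_of):
        n = 0
        for ch in word:
            n = 10 * n + digit_of[ch]
        return n

    for digits in permutations(range(10), len(LETTERS)):
        digit_of = dict(zip(LETTERS, digits))
        if digit_of["S"] == 0 or digit_of["M"] == 0:
            continue                          # no leading zeros
        if value("SEND", digit_of) + value("MORE", digit_of) == value("MONEY", digit_of):
            print(digit_of)                   # S=9 E=5 N=6 D=7 M=1 O=0 R=8 Y=2
            break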

Of course, here we are exploiting the rules of arithmetic, but it turns out that yes, indeed, you can do a certain amount of this in a general purpose way. This process is called constraint propagation. We are saying that if this is 1, this must be 0; if this is 0, this must be 9; we are propagating the constraints. What are we doing here? Basically reducing the domains of the variables: we have reduced this one from 0 to 9 down to the single value 1, and this one down to just 0, and so on. And the moment we reduce the domain of some variable, we can propagate a reduction to another domain. This idea of propagation is of course very common in reasoning, and we will see in the next class a little about how it can be generalized. We will also look at another interesting problem where propagation works quite effectively. So, we will stop here with this.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 40
CSP Continued

So, in the last class we saw the definition of constraint satisfaction problems, and we made the observation that solving a CSP allows co-operation between two kinds of algorithms: one which searches over possible assignments, and another which does some kind of reasoning, which we will call propagation. So, we have this notion of consistency.

(Refer Slide Time: 00:36)

We have this notion of i-consistency: we say that a CSP, or a network, is i-consistent if every consistent assignment to i minus 1 variables can be extended to an i-th variable. That is, if we have found values for any i minus 1 variables, then you can always take any next variable and find a consistent value for it. Of course, not all networks will be i-consistent, but the general effort in reasoning is to enforce consistency in some manner.

So, we can start with the very simplest notion of consistency, which is called node consistency, or 1-consistency; when we say 1-consistency we use the term node. It basically means that you take any variable and every value in its domain is consistent. What do we mean by consistency for one variable? That if there happens to be a constraint over only that variable, a unary constraint, then all the values in the domain satisfy it. So we can enforce node consistency by simply saying that if there is some value which does not satisfy the constraint, remove it from the domain.

So node consistency, 1-consistency, is enforced by pruning the domains. Two-consistency, which we will look at, is called arc consistency, and what 2-consistency says is this: take any value for any variable which is node consistent, which means the value satisfies any unary constraint there might be; then for any other variable you will be able to find a value such that the two values, for the first variable and the second variable, are together consistent.

(Refer Slide Time: 03:19)

To talk about arc consistency it is often useful to draw what we call a matching diagram. The matching diagram does something like this: it draws the domain of each variable. So this is X1, this is X2, and you have the values inside the domains; these are the values each variable can take, and an edge represents the fact that this value, let us call it a1, and this value, let us call it b1, belong to the relation. So basically it is a depiction of the relation; whatever the relation is, it looks something like this. Such a diagram is called a matching diagram.

What we want, for arc consistency over these two variables, is to keep only those values in each domain for which there is a corresponding value in the other domain. For example, we do not want to keep this value, because there is no matching value in the other domain; we do not want to keep this value; we do not want to keep this value. To implement this we have a simple procedure called Revise, which is more or less the standard name for it; it takes Dx, Dy and Rxy. We say that we are going to prune Dx to only those values which have a corresponding value under the relation Rxy.

So, if this is X and that is Y, we can prune like this: for each a belonging to Dx, if there is no b belonging to Dy such that (a, b) belongs to Rxy, remove a from Dx. A very simple procedure which looks at each value. Note that Revise only prunes the domain Dx: it looks at each value a of X, and if there is no value b in Dy with (a, b) in Rxy (I should have added that condition first), it removes a. In other words, if we have a binary constraint over X and Y, then every value of X should have a matching value in Y, and this procedure Revise prunes the domain of X accordingly.

So it looks at all these values, and for every value where there is no matching value on the other side: this one has two values, this one has no values, so it will remove this from the domain; it will keep this, it will keep this, and it will remove this. So it removes two values and keeps one. The process of enforcing arc consistency is basically doing this repeatedly, so that all pairs of variables become arc consistent.
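In code, the Revise procedure is tiny. Here is a sketch in Python (mine, not the lecture's) which we can reuse in the arc consistency algorithms below:

    def revise(dx, dy, rxy):
        # Remove from dx every value with no matching value in dy.
        # rxy is a predicate: rxy(a, b) is True iff (a, b) is in the relation.
        # Returns True if dx changed, which the AC algorithms below use.
        removed = [a for a in dx if not any(rxy(a, b) for b in dy)]
        for a in removed:
            dx.remove(a)
        return bool(removed)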

(Refer Slide Time: 07:29)

So, the simplest algorithm for doing that is called AC-1, arc consistency 1, and it basically does the following: for each pair X, Y such that Rxy is a constraint, call Revise(Dx, Dy, Rxy), and call Revise in the other direction too, Revise(Dy, Dx, Rxy). We will assume that Rxy contains the relation between them, so instead of writing Ryx I am just writing Rxy. For each pair X, Y of variables we want to do this; which means, suppose I have three variables.

Let us say I have something like this: X, Y and Z, three variables, and two relations, between X and Y and between Y and Z, and I want to make this network arc consistent, which means that every assignment to one of these variables can be extended to an assignment of two variables. Notice that in the constraint network diagram that we had drawn earlier, we would have drawn this like this: this is Y, this is Z, and this is X, with an edge between these two. This is called a constraint graph; it says which variable is constrained by which variable, and we are talking about binary constraints.

So we have only edges; otherwise you would have hyper-edges. The constraint graph basically identifies which variables are related to each other; the matching diagram tells you which values of the variables participate in those constraints. When we say that something like this is a constraint graph: if you remember the map colouring example that we saw, we said that the domains of all three are {red, blue}, this is not equal to this, and this is not equal to this. Those are the constraints given to us at first; implicitly we have a constraint between this and this which says that anything is allowed.

So RB is allowed, RR is allowed, BB is allowed, and BR is allowed. If we have not specified a constraint, that means implicitly it is the universal relation: anything is allowed with anything. Which means, in our backtracking algorithm, if you first give a value to X and then want to give a value to Z, there is no constraint in force, so you can choose freely: if you choose R for X, you can choose B for Z; the algorithm does not care, because it does not know of any constraint there.

So there is the universal relation, but of course it is not expressed; it simply sits there implicitly, so it does not participate in the matching diagram here, and it does not participate in this. When we say Rxy is a constraint, we mean it is an expressed constraint, mentioned in the CSP.

(Refer Slide Time: 12:02)

Before I come back to this, we have the notion of 3-consistency, which is known as path consistency; remember the notion of i-consistency, which is the generic notion. So 3-consistency says that any consistent pair of values can be extended to a third value: if I choose values for two of these variables consistently, I should be able to extend them to the third. If I choose red for this and blue for this, I can extend it with a value for this.

If I choose blue for this and red for this, I can again extend it; and if I choose these two values first, red for X and red for Z, then I can extend with blue for the variable Y. But if I choose red for X and blue for Z, then I cannot extend to Y. So this network, as given to us, is not 3-consistent, not path consistent. What would path consistency do? In this example it would prune this relation.

What will it do? It will say: you choose red for this and blue for this, and you cannot extend it to a value, so the pair red-blue must be removed from this relation; you choose blue for this and red for this, and you cannot extend it to a value, so you must prune that pair from the relation too. So enforcing path consistency prunes relations, and the moment you are pruning a universal relation you are making it explicit; in effect it adds a new relation. So when you do path consistency you get a new network; it looks like this, and it adds a relation in some sense, in the sense that earlier it was a universal relation but now it is not.

So you have to express it, represent it, which means X is now explicitly related to Z: these two relations are not-equal-to, but this one is equal-to, and we have a new network. The general idea of enforcing consistency is to limit the choices available to a search algorithm to only those which are likely to participate in solutions. But since an n-variable problem would need n-consistency to really achieve full consistency, and that, you can take it from me, is a hard task.

So that is why we do not often do it. Very often algorithms will achieve some degree of consistency, either node consistency, arc consistency, or path consistency, or some higher order consistency, depending on how complex the problem is; and then they leave the rest to search, and to those other kinds of things that we briefly mentioned in passing, like dependency directed backtracking or look-ahead search.

So there are various tools in the kit of a CSP solver, and consistency enforcement is one of them, which is what we are trying to look at today. Coming back to arc consistency: I want to make this network arc consistent, which means that if I choose any value for any variable, I should be able, as allowed by the constraints given to me, to choose a value for the next variable. Obviously, as you saw, what will we do? We will call Revise with X and Y, with Y and X, with X and Z, with Z and X, with Y and Z, and with Z and Y: at least six calls to Revise. Remember, Revise is a directional call; it prunes only the domain of its first argument.

That is: for each a in Dx, if there is no corresponding value in Dy, remove a from Dx; that is what Revise does. So we have to make at least six calls to Revise; but are those six calls enough? That is the question I want to ask you. If I call Revise(X, Y), Revise(Y, X), Revise(X, Z), Revise(Z, X), Revise(Y, Z), Revise(Z, Y), am I done? Do I get a network which is arc consistent? Let us try it on this. Let us call Revise(X, Y) first, which means we are going to throw this value away, and we are going to throw this one away as well. Then let us call Revise(Y, X), which means we are going to throw this away from the domain of Y; let me actually circle it.

I will cross them out, so we know those values are gone. The next call is Revise(X, Z); but between X and Z we do not have a constraint at all. The constraint graph given to us says nothing about that pair, which means it is the universal relation, which means there is no constraint on choosing any value there. So this Revise call will not do anything. But we have a constraint between Y and Z, so let us try that: for this we have this value; for this we do not have this value, so we have to delete this; for this we have this value; for this we have this value; anyway this one is not there; for this we do not have this value, so we have to delete this. So this is gone. Now let us see what happens the moment I have deleted this value.

Well, this one has one value left here, but this value of X no longer has a matching value. I had already run Revise(X, Y), but when I then did Revise(Y, Z) I deleted a value of Y, and now this value of X has nothing left; so I will have to make at least one more call to Revise. And if you do this: this one has this; this one does not have this, so it goes away; this one does not have this, so it goes away; this one has this.

So it is almost arc consistent, except for this value: this value here does not have a corresponding value, while this value has this value here, that value has that value there, and this value has this value here; but this one has no value here. So I must make another call to Revise(X, Y), which means I must put the whole thing into a loop. What is the safest thing to do? Repeat until no more changes, until no domain changes: I will keep calling Revise on all these pairs. At the very outset we can see that calling Revise(X, Z) does not really make sense.

But of course we are not calling Revise(X, Z), because the condition says: for each pair X, Y such that Rxy is a constraint; and Rxz is not a constraint. So we will make the calls Revise(X, Y), (Y, X), (Y, Z) and (Z, Y) repeatedly, until we reach a state where no domain has changed, because then we are sure that the network is arc consistent. Is this a good algorithm? The answer is no, because it can be argued that the number of calls to Revise that you make is actually very large: in the worst case, in every cycle you will remove only one value from one domain.

In the worst case you can construct a network such that in one cycle, meaning a complete round of Revise calls, you remove only one value from one domain, and in the next cycle you remove one more value from one domain. So if there are n variables, and let us say all of them are connected and each domain has k values, then you may make on the order of n times k cycles, which is obviously not an efficient thing to do. The reason is that we are making this brute force call to all combinations of Revise.
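Here is AC-1 written out, reusing the revise function sketched earlier; again this is my own rendering, with one constraint entry per undirected edge:

    def ac1(domains, constraints):
        # domains: dict var -> list of values.
        # constraints: dict (x, y) -> predicate rxy(a, b), one per edge.
        # Sweep every constrained pair in both directions, and repeat
        # whole sweeps until a full cycle changes no domain.
        changed = True
        while changed:
            changed = False
            for (x, y), rxy in constraints.items():
                if revise(domains[x], domains[y], rxy):
                    changed = True
                # the reverse direction, with the arguments swapped
                if revise(domains[y], domains[x], lambda b, a, r=rxy: r(a, b)):
                    changed = True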

Now, if we look at what is happening here, what we should have done is the following: when we made that Revise call and deleted this value from the domain of Y, we should have said "Y has changed; therefore look again at every relation in which Y participates, because Y has changed." So when we delete a value from a domain, we should schedule for future Revise calls only those relations in which that variable participates.

That observation leads to an algorithm, which I will not describe in any more detail here, called AC-3, described by someone who is considered by many to be, in some sense, the big boss of constraint satisfaction. If you want to meet him, you can go to Ireland, where there is a big constraint research centre, probably the place with the largest population of constraint researchers. So there are algorithms from AC-1 onwards; there is no AC-2, it is missing, lost. But this AC-3 algorithm, which I will just briefly describe, works as follows: initially it maintains a queue of all possible Revise calls, but then it removes elements from the queue one at a time, makes the call, and adds a call back to the queue only if it deems it necessary to check that relation again.

Essentially, if I deleted something from the domain of Y, in a call from Y to X say, then I must put back into the queue the calls which check against Y, such as Revise(Z, Y). Only if there is a danger of having lost some matching value will I make a call to Revise again; I put that particular Revise call into the queue, which means the constraint is propagated only where changes are actually happening. That is why we call this constraint propagation.
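A sketch of that queue discipline, again mine and not the lecture's; here I keep one entry per ordered pair (arc), so each direction of an edge appears separately:

    from collections import deque

    def ac3(domains, arcs):
        # arcs: dict (x, y) -> predicate rxy(a, b); both directions present.
        queue = deque(arcs)                   # all possible Revise calls
        while queue:
            x, y = queue.popleft()
            if revise(domains[x], domains[y], arcs[(x, y)]):
                if not domains[x]:
                    return False              # a domain emptied: no solution
                # Dx changed, so every arc pointing at x must be rechecked
                for (z, w) in arcs:
                    if w == x and z != y:
                        queue.append((z, w))
        return True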

There is another algorithm, which of course we will not discuss, called AC-4, which does not even make generic calls to Revise. It says: this particular value has changed. So if this value was the matching value for some value in X, let me go and check whether that value of X still has a matching value or not; if not, I will remove it from X's domain. It does not make a full Revise(X, Y) call again; it only looks at this particular value to see whether a corresponding matching value is left, because the value in X could have had another support.

In that case, of course, we do not have to delete it; only if it has no supporting value remaining after this one has been deleted should it be deleted. So that is the still finer level of detail in algorithm AC-4, and obviously, as the complexity decreases as you go from AC-1 to AC-4, the bookkeeping increases; we have to represent more things. Now, I want to talk about one particular constraint satisfaction problem where an algorithm which is claimed to sit somewhere between AC-1 and AC-3, close to what AC-2 would have been if it had been described, is used.

(Refer Slide Time: 25:20)

This is a famous problem known as Huffman-Clowes labelling, and there is a well-known algorithm called the Waltz algorithm which is somewhere around AC-2 in spirit, though it is not officially called that. Let me first describe the problem. The problem is scene labelling: we have a line drawing available to us, and we must label it. For example, you have a figure like this. We will look at a variation which deals with simple figures, though the Waltz algorithm actually applies to more complex figures; the simple figures we will talk about are trihedral objects, where every vertex is formed by three faces.

So each vertex has three faces, and the faces are planar: any object which is made up of planar surfaces, and where every vertex is formed by exactly three faces, qualifies. All these vertices qualify: here three edges and three faces meet, at vertices 1, 2 and 3. When every vertex is formed by three faces, such objects are called trihedral objects. Certain objects would not be: for example, if I draw something like this, if you can imagine this over this, it would not be a trihedral object even though it is a perfectly good object. Anyway, that is not the kind of object we are dealing with; we are interested in trihedral objects.

Now, it is a labelling problem, which means you have to label every edge, and the labels are as follows. A plus label stands for a convex edge; by convex we mean that the material is on the inside, in some sense. I would label this edge plus because it is a convex edge as seen from outside. A minus edge is concave; I would label this and this edge, for example, as minus, because they are concave, like the corner of a room between the walls and the roof.

Those, for example, are concave edges. Various books use different notation; some people use the terms fold and blade, but we will use the notion of an arrow, pointing either this way or that way, and we will distinguish between the two. The general idea is: matter on the right side. What do we mean by the right side? That if we are following the direction of the arrow, the material is on the right side and the other side is background. So, if I were to look at this figure, matter is on this side, so I must label it like this, and like this, and so on. The Huffman-Clowes labelling task is to label a line drawing with these four kinds of labels; in the more general case there are other kinds of labels in use.

For example, if we have shadows, or cracks in objects, or more than three faces meeting at a vertex, all kinds of edges can occur and there could be other kinds of labels. But for this very simple class of objects, trihedral objects made up of planar faces where each vertex is formed by three faces, these are the four labels we can use, and the task is to find these labels. Now, the space for this problem, you can see, comes from the fact that each edge can be labelled in four ways; and then we have the vertices.

When we look at a drawing like this, we can distinguish between four kinds of vertices. One, which we call the Y vertex, looks like this, made up of three edges arranged like this. One we sometimes call the W vertex, which looks like this. One we call the T vertex, which looks like this: there are three edges at the actual vertex, but from this viewpoint one edge runs straight through; if you were to see it from here it should look exactly like a T. And the fourth is the L vertex, where you cannot see the third edge which comes into it.

So these are the only four kinds of vertices that we can see, and we are assuming here that these are generic views. It is not as if we have two objects and somehow, by the placement of the two objects, something looks like a straight line; we avoid such views, where a little change of camera alignment, if you want to call it that, would change the view. So we assume these are generic views, and the task is to label a drawing like this. If you go back to the original drawing, you want to find the set of labels for it.

Now, any vertex with three edges coming into it can in principle be labelled in 4 x 4 x 4 = 64 ways; so 64 plus 64 plus 64 plus 16 combinations across the four vertex types. But because these objects are trihedral, we know that they can be labelled in far fewer ways. Let me take this example, the W vertex. I will label it from left to right: it can be minus, plus, plus, as we can see here; or it can be labelled like this, with arrows, as you can see here with this plus; or it can be plus, minus, minus, and we have an example of that here. Now it turns out that for this W kind of vertex, these are the only three physically possible sets of labels. So from 64 we have brought it down to 3. Can we use this information? That is the question we ask in this labelling problem, and I will leave the rest as an exercise: this one is brought down to 6, this one is brought down to 6, and this one also is brought down to 6; there are only 6 ways of labelling T joints, and so on. So this T can be, for example, part of a table: you have a table top and this is a leg, or something like that.

You could see a T joint here, in which case you can imagine this will be like this, this will be like this, and this will be like this. That is one of the 6 ways; there are 6 different ways you can label a T joint, and it should be an interesting exercise for you to try and find them. So, how can we exploit this information? The one simple fact we should exploit is that an edge can be labelled in only one way at both its ends. Every edge has two ends; every edge participates in two vertices, so it must carry the same label at both.

For example, once I have labelled this edge plus, it is plus at this end and also plus at the other end; obviously, it is the same edge, and these are simple trihedral objects we are talking about. So once I know that this end of the edge is plus, I have constraints on the other end: I can look up the table for that junction and, out of its possible labellings, see which of them has a plus on this edge, and what the two other edges must then be. So I can prune the domains of those two edges, exactly as we pruned the domains in the arithmetic puzzle when we fixed some value.

So what the Waltz algorithm essentially does is that it first does a scan of the entire line drawing. It makes the assumption that this is a solid object, which means the outermost edges are the boundary, which means matter lies inside them. So it can start off by labelling this, then this, then this, and so on, with boundary arrows, and then it does propagation: once we have the outer labels, they constrain the inner labels. For example, if we look at this place here, it allows only a plus label to be given here, because for the W junction only these three labellings are possible. So once we have fixed this variable we can propagate: this becomes plus, and then this is plus; you will see that plus, plus, plus is one of the few combinations that is allowed. So this is the idea of propagation: once you know a value at some place, you propagate it to the next one and prune the domain of the next one, exactly like what we talked about here, revising the domain of a particular variable. That is what the Waltz algorithm does.
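The propagation step can be sketched generically. The code below is my own illustration; the junction catalogue is a placeholder, since the real allowed labellings come from the Huffman-Clowes tables (for the W junction the lecture lists three).

    def waltz(junctions, edge_domains, catalog):
        # junctions: list of (jtype, tuple of edge ids in a fixed order).
        # edge_domains: dict edge id -> set of labels from {'+', '-', '>', '<'}.
        # catalog: dict jtype -> list of allowed label tuples (placeholder
        # here; the real entries come from the Huffman-Clowes tables).
        changed = True
        while changed:
            changed = False
            for jtype, edges in junctions:
                # keep only junction labellings compatible with current domains
                allowed = [lab for lab in catalog[jtype]
                           if all(lab[k] in edge_domains[e]
                                  for k, e in enumerate(edges))]
                for k, e in enumerate(edges):
                    supported = {lab[k] for lab in allowed}
                    if not edge_domains[e] <= supported:
                        edge_domains[e] &= supported
                        changed = True        # neighbours re-checked next sweep
        return edge_domains

Because an edge is a single variable shared by its two end junctions, pruning its domain at one junction automatically constrains the other, which is exactly the propagation described above.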

(Refer Slide Time: 38:34)

Let me give another example where constraint satisfaction has been used as a mechanism; this one comes from what we call consistency based diagnosis. Let me use one standard example. Say we have a small device made up of three multipliers: this is M1, this is M2, this is M3. Each takes two inputs; let us say the inputs are I1 and I2, I3 and I4, I5 and I6, and each produces one output. This output goes to an adder, this output is fed to another adder, and we get two output values. So here is a small device which does something with multiplication and addition in combination.

Consistency based diagnosis is also called model based diagnosis, and what it says is that you construct a model of the system, and the model we build is a constraint model. How do we do that? We define a multiplier as follows: if it is OK (I will use this arrow for implication), then the output of M equals input 1 times input 2. So I am defining a multiplier that works by saying that if the multiplier is OK then this relation must hold, which is a constraint among three variables: out(M) = in1(M) x in2(M), and so on.

For the other components we can define similar constraints. Then we can talk about the connectivity: we can say that the output of M1 is equal to input one of A1; in some notation we can write this. We are saying essentially that this output is connected to this input, and that is expressed by this connection, which is also a constraint between two variables: this value must be the same as that value; that is all we are saying. So we describe the three multipliers plus the two adders with constraints like this. This one is the logical statement, this implies this, and we can convert it into a constraint, it being a statement of that sort.

So we describe the whole device by saying this: this part tells you what the connectivity is, and this part tells you what the behaviour of each individual component is; between the two, we have described the whole device. Now, suppose we feed in the inputs 2 and 3. What do we expect? We can predict what this device will produce: 2 times 3 is 6 here, 2 times 3 is 6 here, 2 times 3 is 6 here, and then 6 plus 6 should give us 12 here and 12 there; that is the expected output.

But what happens if OA2 is equal to 12, as expected, while OA1 is equal to 10? These are the real observed values, observations some people would call them; but in the CSP notation they are constraints. Now, supposing I give you this problem, what is it? It is a constraint satisfaction problem: these are the variables, the health variable of each component is either true or false, and each of these statements is a constraint. Let us not get into all the details here, but there is basically a set of constraints which, as a description of the device, predicts what the output should be.

But the system given to us says: the inputs are 2, 3, 2, 3, 2, 3; this is a multiplier, this is a multiplier, this is a multiplier, this is an adder, this is an adder; and the outputs you are seeing are 10 here and 12 here. So what is happening? In this approach, the task of diagnosis is to identify the faulty component, and we assume that the connections are never faulty. This is a simplification: we are saying that one of the components, either a multiplier or an adder, has become faulty. So how do we do this?

At the description level we say that this is a CSP. Each of the inputs is a variable, and there is a constraint among those variables and the multiplier: M1, I1, I2 and O1, four variables, and this is a constraint among the four, namely that if M1, the multiplier, is OK, then the output of the multiplier should equal the product of the two inputs; similarly for the adders. So I have these constraints, I have the constraints that I1 equals 2, I2 equals 3, and so on, and I simply say: give me a solution to this CSP.

Which variables are missing values? I know the input variables, I know the output variables; I do not know the intermediate variables, and I do not know whether each of the devices is working or not. So, without going into the details of how this CSP is searched, which is quite a domain in itself, and there is a whole community which does model based diagnosis, the idea is to find a solution to this CSP.

What will the solution look like? The solution will basically say that some of these components are faulty: for the CSP to be satisfiable, some component must be faulty, some statement which says this component is not OK, and that will be discovered in the solution, which, remember, must contain the fact that this output is 10. Otherwise I would be making a statement like 10 equals 6 plus 6, which is what the adder would be claiming, and obviously that is a false statement.

Now, is the fault with the adder, or with its two inputs? Nobody has told me that the input is 6; the input is hidden from me, and it could be something else. So is this adder faulty? Is this multiplier faulty? Maybe these two multipliers are faulty: it is possible that they are faulty in such a way that one is doing something and the other is undoing it and producing the 10, so that whatever this one is doing wrong is being reflected here; that is possible. So in diagnosis there is of course no clear cut answer to the question of why we are seeing this 10 here.

But the algorithms are geared towards finding minimal diagnoses, a kind of Occam's razor that you must have heard of, that the simplest explanations are the best; it says that diagnoses in which only one component is faulty are the preferred diagnoses. Then we can see it must be A1 or M1, and the algorithms actually find exactly that: either A1 is not OK or M1 is not OK, and that is why we are seeing 10 there. So what I wanted to show here was another example where a problem can be posed as a CSP and solved as a CSP: this problem of diagnosis.
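As a sketch of how such a diagnosis could be computed (my own illustration; the component names and the crude enumeration of hidden signal values are assumptions, not the lecture's method):

    from itertools import combinations

    COMPONENTS = ["M1", "M2", "M3", "A1", "A2"]
    IN = dict(I1=2, I2=3, I3=2, I4=3, I5=2, I6=3)
    OBSERVED = dict(OA1=10, OA2=12)

    def consistent(faulty):
        # Can the observations be explained if exactly the components in
        # `faulty` are broken?  A broken component's output is left
        # unconstrained, so we enumerate values for the hidden wires.
        vals = range(25)                      # crude range for hidden signals
        m1s = vals if "M1" in faulty else [IN["I1"] * IN["I2"]]
        m2s = vals if "M2" in faulty else [IN["I3"] * IN["I4"]]
        m3s = vals if "M3" in faulty else [IN["I5"] * IN["I6"]]
        for m1 in m1s:
            for m2 in m2s:
                for m3 in m3s:
                    ok1 = "A1" in faulty or m1 + m2 == OBSERVED["OA1"]
                    ok2 = "A2" in faulty or m2 + m3 == OBSERVED["OA2"]
                    if ok1 and ok2:
                        return True
        return False

    # Minimal diagnoses: the smallest fault sets restoring consistency.
    for size in range(len(COMPONENTS) + 1):
        found = [set(c) for c in combinations(COMPONENTS, size)
                 if consistent(set(c))]
        if found:
            print(found)                      # [{'M1'}, {'A1'}]
            break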

So, I will stop with constraint satisfaction here, because we do not have too much time left in the course, and in the remaining part of the course I want to look at knowledge representation, which is really at the core of AI; we should not finish the course without covering knowledge representation. The next lecture will focus on logic as a language for representation and reasoning.

Thank you.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science & Engineering
Indian Institute of Technology, Madras

Lecture - 41
Knowledge Based Systems

So far in this course we have looked mostly at search based methods, in one form or the other. But if you look at human beings, human beings are essentially knowledge based creatures. By this I mean that we very rarely do search, we very rarely do trial and error, unless of course you are doing research or something, which is a different matter. By and large, in everyday activity, we look at a problem or we look at a situation and we instinctively do something.

And when I say instinctively, what I really mean is that we already have lots of experiences stored in our heads, and we also have a lot of rules that we have acquired over a period of time; we know exactly what to do. For example, if it is raining and you happen to be carrying an umbrella, you instinctively open it. You do not plan and say that if you open the umbrella and keep it over your head then you will not get wet; in fact, you do not even reason about a goal, saying that you have a goal of not being wet. A lot of the time we simply use knowledge. So, we want to spend the rest of the semester looking at knowledge based approaches. In particular, we will look only at a specific kind of knowledge representation, which is logic based, but let us start off with representation in general.

(Refer Slide Time: 01:36)

So, essentially you can associate the word knowledge with "to know": you know facts, you know how to solve problems, you know how to play tic-tac-toe, you know how to ride a bicycle; all kinds of things are known to you. And if an agent is to interact with its world, with its surroundings, in an intelligent way, then the agent needs to represent its surroundings, create a model of its surroundings if you will, and reason with that model: not necessarily a model which reflects exactly how the world is around us, but some form of representation which helps us solve problems. Remember we had said that when the philosophers, two or three hundred years ago, were wondering about what is mind, what is thinking, they, Hume and Descartes and Hobbes, said that thinking is meaningful manipulation of symbols. So if you recall this, we are sort of coming back to that view of what we mean by thinking.

Of course, when we did search we also did meaningful manipulation of symbols, in the sense that we created a representation of the state space, we created a representation of the moves, and we did manipulations which would take us from the start state to the goal state. But now we are talking in a more general sense. This manipulation of symbols also recalls the physical symbol system hypothesis, which we had talked about at the very beginning, due to Simon and Newell. They said the ability to manipulate physical symbol systems is sufficient to create intelligent behaviour. And what did they mean by physical symbol systems? Symbol systems, first of all, are structures of symbols; by physical they meant that they obey some kind of laws of reasoning. So that also ties in with this idea of manipulation of symbols; the only question that remains is about "meaningful".

We will try to get some insight into this as we progress, not necessarily today, but maybe in the next week. For now, let us focus on the fact that what we mean by thinking is representation of our situation, or the world, or the surroundings, together with our goals and desires and so on; and thinking basically means manipulation of those representations. If you bring this down to simple computer science terms, it means we want to devise some kind of representation, data structures, and wherever we have data structures we need to have algorithms. So manipulation of symbols is done by algorithms, and in that sense the word physical fits in: we have well defined ways of manipulating symbols which obey certain laws. Now, what is the kind of knowledge that we carry in our heads? The simplest is natural language: natural language is a medium for representing knowledge, and even for exchanging knowledge.

Of course, it started off as a medium of exchange, but as people devised means of writing and printing books, it also became a means of representing knowledge, so that somebody else could read it at a different point in time. Why do we not use natural language as a representation mechanism for knowledge? The main reason, as we have probably discussed earlier, is that it is ambiguous: it is not straightforward to understand what someone is saying in an unambiguous fashion. First of all, we can say the same thing in many different ways; and secondly, if you utter something, for example if I say "she walked over towards the terminal", it is not entirely clear what I mean by this sentence; or if I say "he went towards the bank", again it is not clear what I mean.

Because the words bank and terminal have more than one sense, more than one meaning: a terminal could be a terminal in a lab, a computer terminal, or it could mean a terminal in an airport, so we do not know what we are talking about. Likewise, bank could mean a river bank or a financial institution. And it is not only nouns which have different senses; other things, like verbs, also have different senses. We saw the example "time flies like an arrow", which I think we discussed at the beginning of the course: a sentence which to us seems unambiguous, but if you were to look at it dispassionately, without the background context that we always carry in our heads, it has more than one parse. We discussed maybe one or two of them; I would encourage you to look at them again. So for various reasons natural language is not, at least yet, a medium of knowledge representation, though eventually we hope we will equip our machines with enough language processing capability.

Capability to talk to us in our languages, in English, Hindi, Tamil, whatever, and to represent knowledge also in those languages. Then we would not need to program in C++ or Python or any such thing; we could simply talk to them in English. But that is away in the future. We have seen other mechanisms for representation. I am talking about representation here: it could be in natural language; and we have seen rules already. When we studied rule based systems we said that you could capture heuristic knowledge in the form of rules and then use it; in fact, it turned out that rule based systems became like a programming language in themselves. We also saw something similar to rules when we were talking about planning: the planning operators had a similar flavour to rules, in that there was a well defined language in which you could describe the preconditions, and an action basically linked preconditions to postconditions; there was a well defined language.

So we were moving towards more systematic ways of representing things. There can be
other mechanisms too. We could have tables — log tables, for example. I do not know
how many of you have used log tables, but when I was a student they were the primary
thing you needed to carry to your exams: if you wanted to do some sophisticated
computation you needed a log table and something called a slide rule. I do not know
whether you have heard of it, but these are representations of knowledge in some sense:
a log table tells you that for this number the logarithm is this, for that number the
logarithm is that, and so on. Then we have knowledge embedded in procedures:
typically, an imperative program contains a lot of knowledge which is not explicit. You
may have written a program to compute the eigenvalues or eigenvectors of a matrix, or
to find the roots of an equation.

All these programs contain knowledge — your knowledge — which has been put into
the program, and hence when the program runs it is executing your instructions; the
knowledge is not explicit. I do not know whether we have discussed it before, but there
is a view that was put forward by Robert Kowalski, who used to teach at Imperial
College London in the 1970s and was one of the inventors of logic programming. He
said that a program is equal to logic plus control: a program has two components. One is
what you might call the business logic, or the domain logic, or the problem solving
strategy; the other is control — which statements to execute in the program. The view in
the logic programming community is that control should be left to the system, in our
case an inference engine.

Prolog, as you will see, basically does depth first search over a representation, but we
have also seen a variation, the forward chaining system, where you write rules and
express everything you want to say in the form of rules. So we can associate the logic
with the rules and the control with an inference engine. There we studied the Rete
algorithm for the representation of rules, and then we saw the match-resolve-execute
cycle which does the forward chaining. The control is left to the inference engine; you
only provide the logic. A minimal sketch of such a cycle appears below. But very often
in our imperative programs, C programs especially, whatever you do is embedded inside
procedures. That is another form of knowledge representation — like knowing how to
swim or how to ride a bicycle. It is almost impossible to articulate that knowledge; you
cannot explain how you ride a bicycle so easily. It is embedded in some procedure which
you have learned over a period of time.
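
To make the logic-versus-control separation concrete, here is a minimal sketch in
Python of a match-resolve-execute loop. The rule format and the conflict-resolution
strategy (simply "first applicable rule wins") are my own illustrative assumptions; this is
not the Rete algorithm itself:

    # A tiny forward-chaining engine: the rules are the "logic",
    # the loop below is the "control" (a crude inference engine).
    # Rule format (an assumption): (set_of_condition_facts, fact_to_add).
    rules = [
        ({"has_feathers"}, "is_bird"),
        ({"is_bird"}, "can_fly"),
    ]

    facts = {"has_feathers"}

    changed = True
    while changed:                                   # match-resolve-execute cycle
        changed = False
        for conditions, conclusion in rules:         # MATCH
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)                # EXECUTE
                changed = True
                break                                # RESOLVE: first match wins

    print(facts)   # {'has_feathers', 'is_bird', 'can_fly'}

Notice that changing the behaviour means editing the rules, not the loop: that is
Kowalski's point about separating logic from control.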

So we also carry that kind of knowledge. Then, of course, there are things like neural
networks, which basically encode knowledge in the form of weights. We have already
observed that these are sub-symbolic representations: they are explicit representations,
but a symbol does not stand for something in a neural network. Everything in a neural
network is encoded in terms of weights, and a weight basically tells you the strength of
the connection between one neuron and the next. We will spend the remaining part of
this course looking at logic, as a mechanism for representation as well as a vehicle for
reasoning. Because whenever we do representation, what is the use of the representation
if we cannot do something with it? Along with representation you must have reasoning,
and logic provides a very nice vehicle for doing both.

So logic is associated on the one hand with a language, and on the other hand with
inference — an inference procedure, the capability to make deductions. Before I get into
logic I want to try to create an outline of the different kinds of logics that people talk
about. One thing common amongst all of them is that the language of logic talks about
sentences: the basic entity, when you are using logic as a representation language, is the
sentence. Logics are devised to express sentences in one form or the other. And what is a
sentence? A sentence is something which in principle is true or false: anything which in
principle can be true or false is a sentence in logic. Various logics are devised to
represent various categories of sentences, and then, of course, to reason with them.

So what do we mean by something which can be true or false? If I say "can you please
lend me a book," that is not a sentence in the sense of logic, because it is not something
which is true or false; it is a request, an imperative statement, asking you to do
something for me. Questions, likewise, are not sentences: if I say "what is your name," it
is not something which is true or false, so logically it is not a sentence — it cannot be
assigned a truth value. Or if I say "please give me a glass of water," it is not a sentence;
it is a request, an imperative statement. Anything which in principle can be assigned a
true or false value is a sentence. If I say "white wins in chess," or "white always wins in
chess" — and by this I mean when both sides play perfectly — that is a sentence. I do
not know whether this sentence is true or false, because, as we discussed when we were
talking about game playing, the chess game tree is so huge that we have no hope of
solving it, at least not in the very near future.

So we can make sentences whose truth value we cannot determine, but nevertheless
they qualify as sentences. If I say "the moon is made of green cheese," that is also a
sentence, though in this case you can fairly confidently say that it is a sentence which is
false. So sentences are those things which can be true or false. Now, there are different
kinds of languages. The simplest kind of language treats a sentence as one unit. If I say
"the earth is flat," then that is one unit, and the simplest kind of logic, which we call
propositional logic, treats a sentence as an unbreakable, indivisible unit. You cannot
look inside a sentence: you cannot say that this one is talking about something called the
earth, and about a property called being flat — nothing of the sort. It simply says this is
some sentence, and typically we use a symbol to stand for it.

You must be familiar with this: we use a symbol, say P, and say P stands for "the earth is
flat." Or I can say "Socrates is a man" — this is a sentence, and in propositional logic I
cannot look inside it; it is just something to which I would want to assign a truth value. I
can say "all men are mortal"; in propositional logic this too is just some sentence. I can
call it Q, and call the other one R, or P1, P2, and so on — the point is that we cannot
look inside sentences. So a logic in which you cannot look inside a sentence, where the
sentence is the atomic unit, is called propositional logic. And then, of course, we have
logical connectives, so we can say "the earth is flat OR Socrates is a man," or "the earth
is flat IMPLIES Socrates is a man"; you can combine sentences to form compound
sentences. We will see the syntax, and I am sure you are familiar with it; such logics are
called propositional logics. A small sketch of sentences treated as opaque symbols
follows.
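
As a small illustration of sentences being opaque units, here is a sketch in Python where
whole sentences are just symbols mapped to truth values; the particular sentences and
the truth assignment are, of course, made up for the example:

    # Whole sentences are atomic symbols; the logic never looks inside them.
    valuation = {
        "P": True,    # P: "the earth is flat" (a made-up assignment)
        "Q": True,    # Q: "Socrates is a man"
        "R": True,    # R: "all men are mortal"
    }

    # Connectives combine truth values of whole sentences.
    p_or_q = valuation["P"] or valuation["Q"]
    p_implies_q = (not valuation["P"]) or valuation["Q"]   # material implication
    print(p_or_q, p_implies_q)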

Now, these two statements — "Socrates is a man" and "all men are mortal" — are well
known to anybody who has studied logic, because together they represent one of the
oldest examples that people have been using in logic, in fact from Greek times; that is
why we have the names Socrates and Aristotle attached to it. The reasoning part says
that if you know this, or if this is true — we will use the terms interchangeably — and if
you know the other one is true as well, then you can infer that Socrates is mortal. This is
what we mean by reasoning in logic: if you know a certain set of statements to be true,
or you assume them to be true, or they are given to be true, then you can infer that
certain other statements are true. Somebody tells you that Socrates is a man, and
somebody tells you that all men are mortal; what Aristotle showed was that you should
be able to infer that Socrates is mortal.

This particular form of reasoning, this particular rule, is called a syllogism. We will not
go too much into the history of logic here, because there is plenty to look at, but we
should keep in mind that this was there in Greek times as well. In fact, reasoning is the
motivation for which logics were invented in the first place: people wanted to be able to
settle the truth of statements without arguing about them. If you know that something is
true, and something else is true, and something else again is true, you should be able to
infer that some further statement is true. In that sense logic is associated with a
particular form of reasoning called deduction. We will come to this again, but just to get
started: by deduction we mean making inferences which are necessarily true. If the
premises — the givens, the axioms — are true, then the conclusion must necessarily be
true. Logic is concerned with forms of reasoning which take you from true statements to
more true statements.

Deduction is a reliable, robust mechanism: if a logic machine — and we will talk about
what that is — says that a sentence is true, then you should be able to believe it without
any hesitation. But there are forms of reasoning that we often do which are not
necessarily truth preserving. Experience, for example, is captured in patterns and
conclusions: you see clouds and you say it is likely to rain. Now, of course you can draw
that conclusion, but it is not necessarily true, because you do not know what is actually
going to happen; it is not something you can deduce. Or, to take a more topical example,
if I say "this student's signature is on my attendance sheet, therefore he is present in the
class" — that is not a reliable conclusion to make. These are forms of reasoning which
are not deductive. Besides deduction we do many other kinds of reasoning all the time —
for example generalization; the whole machine learning community is occupied with
generalization: you look at examples and you generalize from them.

You look at one leaf, you look at another leaf, you look at a third leaf, and then you come
to the conclusion that all leaves are green. That is a form of inference which is not
deduction. Or you look at a crow and see that the crow can fly; you see a sparrow and
see that the sparrow can fly; you see a parrot, and the parrot can fly; and you come to the
conclusion that all birds can fly. This is a form of generalization — a form of learning,
you might say — but it is not infallible: there are birds which cannot fly; penguins
cannot fly. So there are other forms of reasoning which we do all the time. Or look at the
process of diagnosis: we recently saw an example of doing diagnosis by a process of
consistency checking.

But we cannot necessarily say that what we have concluded is the real fault. In the real
world a fault causes a symptom: something has gone wrong, and therefore we can see
the symptoms of it having gone wrong. In diagnosis we move in the other direction: we
look at a symptom and we jump to a conclusion — we make the inference that this is the
fault — and such inferences are not necessarily correct. If you go back to the example of
the multipliers and adders that we looked at, we could have concluded that one adder
was wrong, or that one multiplier had gone wrong; any of those could have been true. In
fact, even a combination of things could have gone wrong, and without further
investigation we are not able to make a more definitive statement.

Medicine is full of this kind of reasoning: you look at symptoms and you make an
inference as to what has happened to you. You have fever, you have shivering in the
evenings, and you jump to the conclusion that you have malaria. Now this is not
deduction; it is not necessarily true. We are interested in logic for deduction: the
statements that we derive or produce using the logic machinery should necessarily be
true, and logic is built around that motivation. Now, in propositional logic I cannot take
these two statements — "Socrates is a man" and "all men are mortal" — and devise a
mechanism for arriving at the conclusion, because the sentences are opaque and it is not
possible to make the connection between them. For that we will have to go to a higher,
more expressive logic, called first order logic, and we will in fact spend most of our time
on first order logic. In first order logic we break up a sentence into parts.

We keep the word "all" separately; we keep the word "men" separately; we keep the word
"mortal" separately; and the connections between them separately. We break down the
sentence and represent it with constituents which we can then reason about. Essentially
we will say that men is a class of entities, mortal is another class of entities, and if
anything happens to be a man then it must also be mortal. From that, because Socrates
happens to be a man, we can reason that Socrates is mortal. But to do this we need the
machinery of first order logic, which is a little more expressive in nature — see the
sketch below. Now, it is a fact of logic that the more expressive a language you devise,
the more complex is the computation required to reason with it: computational
complexity increases with expressiveness.
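
As a rough illustration of what breaking sentences into parts buys us, here is a toy
sketch in Python of the syllogism, with the class of men and the "all men are mortal"
rule represented separately; the encoding is my own illustrative choice, not a real first
order logic system:

    # Toy first-order-style encoding: a predicate is a set of individuals,
    # and "all men are mortal" is a rule applicable to any individual.
    man = {"Socrates", "Plato"}            # Man(x) holds for these x

    def mortal(x):
        # Rule: for all x, Man(x) -> Mortal(x)
        return x in man

    print(mortal("Socrates"))   # True -- the syllogism's conclusion

Propositional logic cannot express this, because the shared term "man" is locked inside
two opaque sentences.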

Some of you would have heard of Gödel's incompleteness theorem; maybe we will come
back to what it says a bit later, if we get time. It says that if a language is expressive
enough, then there are conclusions which the logical machinery cannot derive; in that
sense the machinery is incomplete. We talked about completeness when we were
discussing search, and there is a similar notion of completeness in logic: given that there
is a true statement out there, is my machinery able to reach that true statement or not?
What Gödel showed in 1931 is that rich enough languages will never allow
completeness. So there is a trade-off: the richer the language, the more difficult it is to
do computation. Most of the computation that we do, including the C programs that we
write, falls into the category of first order logic.

First order logic is characterized by having variables, and by being able to write laws
which use variables, and things like that; we will formalize this a little later. But there
are other kinds of sentences we can talk about. So far we have seen two logics —
propositional logic and first order logic — and there is also something called second
order logic, which we will see if we come to it. But what about sentences like this: "Nisha
is tall"? That is a statement, and in principle, of course, I can give it a truth value: I could
say it is a true sentence or a false sentence. But the question arises as to when I can say
that this sentence is true — logic is also concerned with the semantics: when is this
sentence true? Now, the trouble with this kind of sentence is that if I want to devise a
rule which says "if you are more than six feet you are tall, and if you are less than six feet
you are not tall," then I have the problem: what about somebody who is 5 feet 11 inches,
or 5 feet 11 and a half inches — is that person not tall?

And somebody who is exactly six feet is tall? It leads to problems when you are trying to
model real world domains and reason with them, when you want to talk about properties
like tall, or warm, or cool. What do you really mean? Somebody says this room is too
warm; somebody else feels it is not too warm. Apart from the fact that there are
subjective differences, there is also the problem that we do not really know at what
temperature "warm" begins and at what temperature it ends. Do we say that if it is more
than 30 degrees centigrade it is warm? Or more than 35 degrees? Can we have a crisp
boundary? That is the real question: can we divide things into warm and not warm, tall
and not tall, by drawing a crisp boundary? Many people felt that that is not the case, that
that is not what one should do, and one of them was Lotfi Zadeh, whom you must have
heard about, who devised this thing called fuzzy logic. That is where fuzzy logic differs
from what we use here.

Propositional logic and first order logic are what we call classical logics, and by classical
logics we more or less mean two valued logics: a sentence can take one of two values,
which in our case we call true and false — though those names are only for our own
convenience. People have also thought about multi-valued logics: somebody says maybe
you should have three values — true, false, and maybe; or true, false, and don't-know as
the third value — and people have tried to devise logics like that. In a two valued logic a
sentence is either true or it is false. As we will see, the semantics of a sentence like
"Socrates is mortal" says that you either belong to the set of things which are mortal or
you do not; if you belong to the set you are mortal, otherwise you are not. That is a crisp
set, as we say. As opposed to that, what Zadeh said was: why do you want to work with
crisp sets? You have a notion of set membership — the characteristic function, which
you must have defined at some point — so why is it mapping only to 0 or 1? Why can it
not map to 0.7 or 0.8 or something like that? Zadeh said that if I want to define tallness
as a function of height, then I should be able to define something like a curve.

At some point the curve becomes 1, but there are regions where you have different
degrees of commitment to belonging to the set. So fuzzy logic comes from fuzzy sets,
just as classical logic comes from crisp sets: in a crisp set membership is either 0 or 1,
while in a fuzzy set membership is a continuum between 0 and 1. If you have one height,
you have one degree of membership in the set of tall people; if you have another height,
another degree; and it varies. A small sketch of such a membership function follows.
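
Here is a minimal sketch in Python of a fuzzy membership function for "tall"; the
particular breakpoints (160 cm and 185 cm) are made-up assumptions for illustration,
not canonical values:

    def tall_membership(height_cm: float) -> float:
        """Degree of membership in the fuzzy set 'tall', in [0, 1]."""
        low, high = 160.0, 185.0          # assumed breakpoints
        if height_cm <= low:
            return 0.0                    # definitely not tall
        if height_cm >= high:
            return 1.0                    # definitely tall
        # linear ramp between the two breakpoints
        return (height_cm - low) / (high - low)

    print(tall_membership(150))    # 0.0
    print(tall_membership(172.5))  # 0.5
    print(tall_membership(190))    # 1.0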
There are other kinds of logic which deal with uncertainty. For example, if I say Tweety
is a bird, can I infer that Tweety can fly? Inferences of this kind are called defeasible
inferences, meaning that the inference can be defeated: you may say yes, Tweety can fly,
but somebody may give you evidence to show that Tweety cannot fly. Somebody tells
you Tweety is a bird and you say Tweety can fly; then somebody tells you that Tweety is
a penguin, and you realize that penguins cannot fly.

Then your inference is defeated. But most of the time we do make inferences which can
be defeated: we go to the canteen assuming that we will get tea there, or that the canteen
will be open, or that my bicycle will still be where I left it — all kinds of beliefs which
can be defeated. The ability to make defeasible inferences is very important to us,
because otherwise we would hardly ever be able to make inferences. If I use the
statement that all students are bright, and then somebody shows me one student in some
corner of the country who is not bright, then my statement becomes false — because
when I say all students are bright, it means each and every person who is a student must
be bright. Now if somebody shows me one person, not in Chennai of course, who is not
bright, then my statement is no longer true, which means any inference that I would
have made using that statement as a premise can no longer be made, because the
premise is no longer true.

So if I say "students in IIT are bright and go and take jobs in Goldman Sachs," or
something like that — the ability to reason with statements in which you do not have
complete belief, about which there is uncertainty, is very important for us. We use it all
the time, and we must develop mechanisms for it. Among the things we will not study in
this course are default logics and default reasoning. Then we make other kinds of
statements. Let us say we are sitting in this room, and we hear some sound, and we make
the statement that it is raining. Of course, we do not know whether it is raining or not.

But we can make the statement that it is possibly raining. What do we mean by the truth
value of such sentences? How would you evaluate the truth value of such a sentence? If
I say "it is raining," you can go and look outside the window and say that the statement
is true or that it is false. But what about a statement like "it is possibly raining"? Or
"maybe he is sleeping in his room"? Of course, in this particular case you can go and
check whether the person is sleeping in his room, and then you can make the new
statement "he is sleeping in his room" — but what can you say about the statement
"maybe he is sleeping in his room"? Is that a true statement or a false statement? People
have tried to devise logics for this, but one has to be a bit careful about what one means
by a true statement here.

What one should say is that there is a possible world. Assuming that you have only
partial knowledge about the world, if there is a possible world — a feasible, consistent
world — in which that statement is true, then we can accept the statement "possibly P"
as true. It is a little difficult to discuss precisely right now, but we have these two
notions: that something is possibly true, and that something is necessarily true. In fact,
there is a whole family of logics devised to talk about statements of necessity and
possibility. For example, in that diagnosis example you might say it is possible that he
has malaria. These logics, which allow one to talk about statements like this, are called
modal logics.

So we have modal operators, and there are in particular two operators. One is called the
diamond operator — this is standard usage in the modal logic community — and the
other is called the box operator. If P is some statement, for example "the earth is flat,"
and I write diamond P, it means "it is possible that the earth is flat." So I have a language
for expressing this, and then there are mechanisms for reasoning with such statements.
One particular kind of modal logic is temporal logic, where temporal, of course, deals
with time. When you deal with time and you are talking about the future, you know that
you cannot always make definitive statements: if the sky is cloudy, you can say it will
possibly rain in the future. But temporal logics are even more closely tied to time; you
can make statements like the following about a block of ice left on your table.

You can make the statement that it will eventually melt — that there will be some time
in the future when the block of ice will no longer be there; it will have melted. So we can
reason about time, and about time and change, and things like that: you can say that if
you keep skipping classes then eventually you will get a W grade — of course, not today.
Another kind of logic deals with statements like the following: John knows something,
say, John knows that the earth is round. Now look at this statement. The first thing you
should observe is that this sentence contains a sentence inside it — "the earth is round."
Earlier I had P; so let us call this one P1. What I am saying in this sentence is that John
knows P1. Such logics are called epistemic logics, and I would represent this statement
in a simple epistemic logic as K_J P1, where K_J is the modal operator which says that
John knows something.

So one can now talk about someone knowing something, but one can also talk about
statements like K_M K_J P1 — another sentence in this epistemic logic. Let us say M
stands for Mary: what this sentence says is that Mary knows that John knows that the
earth is round. And of course you can have all kinds of combinations: you can say Mary
knows that John does not know that Peter has gone home, and so on. You can combine
the modal operators with the logical operators, which I have not shown so far, but we
will see them in due course. What modal logics are doing is taking some standard logic
— in our case it is propositional logic that we are talking about — and applying modal
operators on top of it. The diamond and box operators talk about possibility and
necessity: if I want to say it is possible that Mary loves John, I could use the diamond
operator. A small sketch of nested epistemic formulas follows.
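
Here is a minimal sketch in Python of how nested epistemic formulas like K_M K_J P1
can be represented as data; the tuple encoding is my own illustrative choice, not a
standard library:

    # Formulas as nested tuples: ("K", agent, subformula) or an atomic sentence.
    P1 = "the earth is round"                # an atomic sentence
    john_knows = ("K", "John", P1)           # K_J P1
    mary_knows = ("K", "Mary", john_knows)   # K_M K_J P1

    def pretty(f):
        # Render a formula as readable text, recursing into subformulas.
        if isinstance(f, str):
            return f
        op, agent, sub = f
        return f"{agent} knows that {pretty(sub)}"

    print(pretty(mary_knows))
    # Mary knows that John knows that the earth is round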

Temporal logics use modal operators which talk about time; epistemic logics use modal
operators which augment the language by talking about what agents know. Almost
similar to knowing is the notion of belief, and people have built logics which talk about
belief. What is the difference between knowing and believing? We would like to say that
if you know something, it is necessarily true; if you believe something, it may not
necessarily be true. So I can say "Peter believes that the earth is flat," and that is a
perfectly valid true statement, because that is what Peter believes. The earth may not be
flat, but the statement about Peter's belief is a true statement. So we have both
knowledge and belief.

Of course, this becomes quite interesting to work with, especially when you are talking
about multi-agent scenarios. In fact, epistemic logics were created to handle multi-agent
situations: if you have many agents, networks of agents, doing certain things, how can
we reason about them? How can we show, for example, that given a certain protocol a
group of people will always elect a leader, or will be unable to elect a leader? How can
you prove such things? That is what epistemic logics have been associated with.
Related to possibility and necessity there is a mathematical foundation called rough
sets. Just as fuzzy sets form the basis for fuzzy logic — fuzzy sets being sets in which
membership can vary by degrees — rough sets are sets which basically have two
boundaries.

(Refer Slide Time: 47:22)

For example, if I am talking about the set of tall people, I could draw one boundary here
and another boundary there. The meaning is that anything inside the inner circle is
definitely tall — necessarily tall; anything inside the outer circle but outside the inner
circle is possibly tall but not necessarily tall; and anything outside both circles is not
tall. So a rough set is a mechanism which has been devised to deal with this kind of
uncertainty; a small sketch follows.
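
Here is a minimal sketch in Python of the rough set idea, with a lower approximation
(necessarily tall) and an upper approximation (possibly tall); the heights and thresholds
are made-up assumptions:

    # Rough set for "tall": two boundaries instead of one.
    heights = {"A": 150, "B": 170, "C": 178, "D": 192}   # toy data (cm)

    lower = {x for x, h in heights.items() if h >= 185}  # necessarily tall
    upper = {x for x, h in heights.items() if h >= 165}  # possibly tall

    boundary = upper - lower         # possibly but not necessarily tall
    not_tall = set(heights) - upper  # outside both boundaries

    print(lower, boundary, not_tall)   # {'D'} {'B', 'C'} {'A'}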
The last form of reasoning we will mention is not so much a logic: probabilistic
reasoning. Probabilistic reasoning is one way of handling uncertainty. For example,
when you are doing diagnosis, you might devise a system which says: if somebody has
certain symptoms, and somebody has yellow eyes, then it is likely that the person has
jaundice. How do you express "it is likely"? One option is possibility, as above; but you
may also want to talk about gathering evidence and increasing your belief in
something.

So you believe that this is really the case; maybe you do one more test and your belief
increases. Probabilistic reasoning is one mechanism for handling uncertain knowledge.
Now, you must distinguish between fuzzy logic, which also deals with numbers less than
1, and probabilistic reasoning, which says "I am 70 percent sure of my conclusion" or
something like that. There is a difference between the two. In fuzzy reasoning it is really
a linguistic problem: what do you mean by the word "tall"? The word is simply not
defined clearly. Whereas in probabilistic reasoning, if you say that the chance that he
has jaundice is 0.7 — the probability is 0.7 — then you are using probability as a way of
talking about your lack of knowledge about the real situation. When you throw a die and
say the chance of it falling 1 or 2 is 1/3, that simply expresses the following.

Because I cannot say which way it is going to fall, I just use probability as a mechanism
for expressing my lack of knowledge about what really is the case. If you had perfect
knowledge of the physics, and all the information needed to do the calculations of what
is really going to happen when you throw a die, then between the moment when you
throw it and the moment when it lands you should be able to compute, in principle,
whether it will fall 1 or 2 or 3 or 4 or 5. But of course we do not have that knowledge,
and we do not have the computational ability, so we talk about probabilities: it is so
likely that it will fall like this, so likely that it will fall like that. If we get time we will
look at a variation of probabilistic reasoning in the context of diagnosis, and see how,
when you get multiple pieces of evidence, you combine that evidence to maintain
differing degrees of belief in various conclusions — that will be towards the end, if we
get time. A tiny sketch of evidence combination appears below.
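
As a taste of combining evidence, here is a minimal sketch in Python of a Bayesian
update of belief in a disease given two positive test results, assuming the tests are
conditionally independent; all the numbers (the prior and the likelihoods) are made-up
assumptions:

    # Belief update by Bayes' rule.
    prior = 0.1                      # P(disease) -- assumed prior

    def update(p_disease, p_pos_given_d, p_pos_given_not_d):
        """Posterior P(disease | positive test)."""
        num = p_pos_given_d * p_disease
        den = num + p_pos_given_not_d * (1 - p_disease)
        return num / den

    belief = update(prior, 0.9, 0.2)     # after the first positive test
    belief = update(belief, 0.8, 0.1)    # after the second positive test
    print(round(belief, 3))              # 0.8 -- belief grows with evidence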

In the next class we will look at propositional logic more formally. In fact, when we talk
of logic we always use the term formal logic, because logic is concerned with form and
not with content. Logic is concerned with valid forms of reasoning; it is not concerned
with what you are talking about. I can say that Socrates is a man and all men are mortal,
and the syllogism allows me to conclude that Socrates is mortal. If I say Ramesh is a
student and all students are bright, then the same form of reasoning allows me to infer
that Ramesh is bright. And why is that a valid conclusion? Because the form of
reasoning I am using, which is called a syllogism, has been shown to be a valid form of
reasoning. So we will look at the valid forms of reasoning that we use in propositional
logic, and try to show why they are valid. We will start with the language, then look at
the mechanism for reasoning, and this whole thing we will call the machinery of logic.
In the next class we will try to build a small machinery for propositional logic, and then
the rest of the course we will spend on first order logic, because that is really more
interesting. So I will stop here, I think.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 42
Knowledge Based Systems PL

(Refer Slide Time: 00:16)

So today we want to look at propositional logic. To keep things in perspective: our
motivation for studying logic is to be able to make inferences. In general, we are given a
set of facts, and we are asked about some statement which is not in this set — we are
asked the question, is this statement true or not? We should be able to answer that
question, and we want to build a mechanism, a machinery, to do this mechanically, if
not necessarily automatically. By mechanically we mean by applying a set of rules —
rules which could then be applied by a human as well. Whenever we talk about a logic,
we first talk about the language part, which, like any formal language, is the set of
sentences that belong to the language. So what is the language of propositional logic? It
has two components. One is called the logical part, and it essentially contains the
alphabet of operators, the operator symbols. You must keep in mind that we want to
distinguish between the syntax of the language and the semantics of the language.

I will not do so very pedantically in these lectures, but we do need to distinguish
between the syntax part, which says that these are the symbols we use, and the
semantics, from which the meaning of those symbols comes — given, as you are
familiar, by the truth tables that we can construct for each of the operators. So at one
level, the syntactic level, we just treat them as symbols; when we talk about the
semantics of sentences, we will revert to the meaning that comes from the truth tables.
For the moment we are focusing only on the syntax part, so think of them only as
symbols. Then there are other things, like brackets, that we can use. Then there is the
non-logical part, by which we mean the part that varies from problem to problem. So we
have a set of propositional symbols: this is the alphabet with which we construct the
language. We have a constant part, which is common to all propositional logics, and we
have a varying part, which may in principle be infinite, or at least countably infinite,
though it could be finite as well.

If we are just talking about five statements, say P, Q, R, S, T, then this set will be a
simple finite set of symbols. So we have a countable set of propositional symbols, and
we have two more symbols. One I will write as ⊥, though you could also use F or 0; the
other is ⊤, though you could also use T or 1, or even "true" and "false" in full letters. It
does not matter: basically there are these two symbols, and they are there in all logics.
The first we often read as "bottom" and the second as "top"; bottom, false, and 0 are
equivalent, as are top, true, and 1. It is a set of two symbols — it is a two valued logic
that we are talking about — and when we come to semantics, every sentence in our
language will be mapped to one of these two symbols. That is how the semantics will be
defined. Implicitly, between us, we will say that one stands for true sentences and the
other stands for false sentences; but that is only for our own sake. As far as the logic
itself is concerned — the machine itself — they are just two symbols onto which
sentences are mapped. But it always helps to keep in mind the semantics, which is that
we are talking about false and true statements. So what is this machinery that we are
talking about?

This machinery that we want to build is essentially a machinery which first defines the
language, and then tries to reach true sentences in this language — along with a
mapping; we will come to that in a moment. So the set of sentences, let us call it S_P,
where P stands for propositional logic, is defined as follows. ⊥ belongs to this set, and ⊤
belongs to this set; of course, you know they stand for something which is always false
and something which is always true. Let the set of propositional symbols be called P.
Then we say: if alpha belongs to P, then alpha belongs to S_P — every propositional
symbol is a sentence. And then we say: if alpha and beta belong to S_P — alpha and
beta being variables we use to stand for any sentence — then the compound sentences
built from them with the operator symbols also belong to S_P. Among these operator
symbols we will distinguish between binary and unary ones, and we will not need to talk
about operators of higher arity. Negation is the unary one, and for it we also use the ¬
sign; we sometimes use an arrow for implication, and a double arrow for the
biconditional. There are other symbols which I have not mentioned here, but we can
build the language using other symbols too.

You may be familiar with the fact that there are sixteen possible binary operators that
one can devise, and these — and, or, implies, if-and-only-if — are only four of them;
there are twelve more. So if alpha and beta belong to the set of sentences, then —
instead of using one particular symbol, let me use some abstract symbol ∘ for a binary
operator — (alpha ∘ beta) also belongs to the set; I am just defining the language here.
For example, if p and q are two propositional symbols, then (p ∧ q) belongs to the set,
and once it belongs to the set we can also say that ((p ∧ q) ∨ q) belongs to the set.

Essentially we can combine sentences any number of times, and we can create sentences
which are arbitrarily long; in fact the set of sentences is infinite even if the set of
propositional symbols is finite. Likewise for the unary symbol: if alpha belongs to S_P,
then ¬alpha belongs to S_P. So we started with the set of propositional symbols, and we
have augmented it with more sentences built using the operators — the connectives —
and the bracket symbols, and we get an infinite set of sentences. So this is the syntax.
What is the meaning of these sentences? Propositional logic is not concerned with
meaning; the meaning is given by us. A small sketch of this inductive definition follows.
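
Here is a minimal sketch in Python of this inductive definition of sentences, with
formulas built as nested tuples; the encoding is an illustrative assumption, not a
standard one:

    # Sentences of propositional logic as nested tuples:
    #   atoms: "p", "q", ...   constants: "TOP", "BOT"
    #   ("not", a)                                   if a is a sentence
    #   ("and", a, b), ("or", a, b), ("implies", a, b)  if a, b are sentences

    def is_sentence(f, atoms=frozenset({"p", "q", "r"})):
        """Check membership in S_P by the inductive definition."""
        if f in atoms or f in ("TOP", "BOT"):
            return True
        if isinstance(f, tuple) and f[0] == "not":
            return is_sentence(f[1], atoms)
        if isinstance(f, tuple) and f[0] in ("and", "or", "implies"):
            return is_sentence(f[1], atoms) and is_sentence(f[2], atoms)
        return False

    print(is_sentence(("or", ("and", "p", "q"), "q")))   # True: (p and q) or q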

(Refer Slide Time: 11:11)

So, for instance, let us say I want to reason about Alice, and I create a set of sentences
about Alice: "Alice likes maths and physics." "Alice likes music." "If Alice likes maths
then Alice likes algebra." "If Alice likes algebra and music then Alice goes to college."
Say I have this set of sentences — I always like to give you a flavour of the kind of
things we can do with propositional logic — and from this we want to ask the question:
can the sentence "Alice goes to college" be inferred or not? So I can now encode this
problem in propositional logic, and I can say that "Alice likes maths" is P and "Alice
likes physics" is Q.

Then the first sentence, "Alice likes maths and physics," I can write as P ∧ Q. "Alice likes
music" I call M. Next I have the statement "if Alice likes maths then Alice likes algebra";
calling "Alice likes algebra" A, this whole sentence I represent as P → A. And "if Alice
likes algebra and music then Alice goes to college": calling "Alice goes to college" C,
this sentence I can write as (A ∧ M) → C. Why did I put brackets around A ∧ M here and
not elsewhere? Because when it is clear, we often do not put brackets. And then there is
the sentence C, and we are asking whether this sentence is true or not. Notice that I am
saying here that M means that Alice likes music, but that is only between us; as far as
the logical machinery is concerned, it is not concerned with this at all. As we said, logic
is formal.

(Refer Slide Time: 14:55)

Logic depends only upon form. Essentially what we are going to ask is: if we assume
this is true, and this is true, and this is true, and this is true, then is C necessarily true?
Now, we said logic is about deduction, but suddenly we have started talking about true
statements — so let us talk about truth first. When we say semantics, we essentially have
two kinds of, let us say, values. One is what the sentence means: what does P mean,
what does Q mean, what does M mean? We have said that logic is not concerned with
that. The other is truth values — some people use the term truth functional semantics.
What we are saying here is that there is a mapping which goes from the set of sentences
S_P to the set {⊥, ⊤}. We want to define this mapping onto those two symbols which,
between us, we agreed stand for false and true. So what is the mapping we are looking
for? We are looking for a mapping which takes as input any sentence in my language
and maps it to one of those two symbols — which means it tells me (again, between us)
whether the sentence is true or false.

So we have to define this mapping. We start with a function v which maps the set P of
propositional symbols to the set of truth values. What do we mean by this? Somebody is
telling me which atomic statements are true and which are not — say, that "Alice likes
maths" is true — by giving a mapping: P maps to true, Q maps to true, and so on. In
general, given the mapping for the propositional symbols, we can derive a mapping for
all sentences; that is the general version, and we will come to specifics later. Sometimes
the valuation is only partially given — say I know that p is true or r is true, but I am not
given the mapping for the individual symbols; we will come to that a bit later. First let us
define: if you are given the mapping v, which takes you from the set of propositional
symbols to the truth values, then we can define a valuation function val on all sentences,
as follows. If alpha belongs to P, then val(alpha) = v(alpha), because somebody has
given us v to start with — we are really talking about truth tables from this point
onwards. And we always fix val(⊥) and val(⊤). There is a little bit of overloading
happening here; ideally we should use different symbols for the syntactic constants and
the truth values.

What we are really saying is that the bottom symbol ⊥ always maps to the truth value
false. Let us call {F, T} the set of truth values — we are working with a binary, two
valued logic — of which one we call false and one we call true; ⊥ always maps to F,
and ⊤ always maps to T, which stands for true. And then: if val(alpha) = T, then
val(¬alpha) = F. I write rules like this: if alpha maps to true, and alpha is a sentence in
my language, then the sentence in which I prefix alpha with the unary operator — which
between us we read as "not alpha" — maps to F. And in this manner you can write the
other rules. For example, we can say: if val(alpha) = T and val(beta) = F, then
val(alpha → beta) = F. We are familiar with all this, so we will not spend our time
writing it out; you can convert these kinds of statements into a table — for example, a
table with columns alpha, beta, and alpha → beta.

Reading the rows out — F and F give T; F and T give T; T and F give F; T and T give T
— you know the truth table for implication:

    alpha   beta   alpha → beta
      F      F          T
      F      T          T
      T      F          F
      T      T          T

This is the truth table for implication, and these are its rows. For each row I can write a
conditional statement of the kind above; the rule we wrote corresponds to the third row
here. So I can either construct the truth table, or I can write statements like that. We are
familiar with this — with how to lift the semantics of simple sentences to compound
sentences when they are built using operators — and we have these logical rules that we
are all familiar with. So what have we done? We said that we have defined the language
of propositional logic — we have defined the set of sentences which make up
propositional logic — and we have defined the mapping which tells us that, if we knew
the truth values of the constituent sentences,

then we can lift those truth values to compound sentences; we have rules for doing that.
Which means: somebody gives us the valuation, saying that these are the five atomic
sentences, the propositional symbols, and they map to true, false, true, false, and so on,
respectively; then, for the entire infinite set of formulas, I have a mechanism for
determining whether they map to true or to false, and I can do it using truth tables or
something like that. So where does the logic machinery come in here? The logic
machinery, as we said, deals only with form: it does not deal with content, it does not
deal with semantics, it does not deal with meaning. What we are interested in is: can we
build a syntactic machinery which will do that mapping for us? Instead of having to
apply these rules repeatedly — if I have some formula alpha ∧ beta, what is its truth
value? It depends: if both of them are true then the compound formula is true, and you
can keep breaking it down recursively. Instead of doing that, instead of having to
construct the truth table, can we have an alternate mechanism for doing it? (The
recursive valuation itself is sketched below.)
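
Before moving to the syntactic machinery, here is a minimal sketch in Python of the
semantic valuation val, lifting an atomic valuation v to compound sentences via the
truth tables; it reuses the tuple encoding assumed earlier:

    def val(f, v):
        """Truth value of formula f under atomic valuation v (a dict)."""
        if f == "TOP":
            return True
        if f == "BOT":
            return False
        if isinstance(f, str):            # a propositional symbol
            return v[f]
        op = f[0]
        if op == "not":
            return not val(f[1], v)
        if op == "and":
            return val(f[1], v) and val(f[2], v)
        if op == "or":
            return val(f[1], v) or val(f[2], v)
        if op == "implies":               # row 3 of the table: T, F -> F
            return (not val(f[1], v)) or val(f[2], v)

    v = {"p": True, "q": False}
    print(val(("implies", "p", "q"), v))   # False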

To come to that, let us say that somebody has given us this function v, which basically
says which propositions are true and which are false; from that we can compute the
function val, which settles the matter for compound sentences; and from that we can
define the set T of true sentences. Of all the sentences I can construct in my language,
the semantics tells me which are true and which are false, but I do not have an easy way
of reaching those sentences. I do not want to apply these rules over and over, and I do
not want to construct truth tables, because the size of the truth table can grow very
large. With two constituents, alpha and beta, I have four rows in my truth table; if I have
a sentence like ((alpha ∧ beta) → gamma) ∨ (delta → theta), with five variables, I have
2^5 rows in my truth table. The size of the truth table grows exponentially with the
number of propositional variables used in the sentence. So I do not want to construct a
truth table. What do we do? We define instead the notion of a rule of inference. So what
are we interested in? We are interested in reaching that set of true sentences.

(Refer Slide Time: 27:00)

What do I mean by that? Somebody has given us the premises — the set we talked about
— and the semantics defines the set T of sentences which are true if the premises are
assumed to be true. If I assume these premises to be true, then there is a whole set of
sentences which are true along with them. For example, I can ask: is p ∨ ¬q true or not?
The set of sentences I can construct using these symbols is infinite, and of each of them
I can ask whether it is true or not; collecting the true ones gives me this set T. Now, the
semantic mechanism of reaching this set is via truth values and truth tables; the
alternative — now we are talking about the logic machine, the inference engine — uses
something called rules of inference. So what is a rule of inference? A rule of inference is
a pattern, and the most popular one, which no doubt you are familiar with, is called
modus ponens. Modus ponens says: if you can see a sentence alpha — a sentence
belonging to the set of sentences that we have — and if you can see a sentence
alpha → beta,

then you can add the sentence beta. What do I mean by this? If I have a set of sentences
given to me — a set of premises, a database of sentences if you want to call it that —
then I can keep adding new sentences to the database. And how do I do that? By this
pattern matching mechanism, which says that if I have alpha, and if I have
alpha → beta, then I can add beta. So this is a rule of inference; there are many others
you must be familiar with. One rule is called simplification — we will need it shortly —
and it says: if you can see a sentence of the form alpha ∧ beta, then you can write the
sentence alpha (or the sentence beta). Notice that once you have rules like
simplification, we do not need somebody to give us the valuations of all the atomic
propositions separately: if somebody has given us p ∧ q, then by applying
simplification we can always add p, or q, to our set. When we add the new sentence
beta, we also use the phrase "we can derive beta," or "we can prove beta": if alpha is
given to us, and alpha → beta is given to us, then we say we can derive beta, or we can
prove that beta is true. (A sketch of these rules as pattern matching follows.)
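
Here is a minimal sketch in Python of modus ponens and simplification as pure pattern
matching over the tuple-encoded formulas assumed earlier; note that the engine never
consults a truth table:

    def apply_rules_once(db):
        """Return new formulas derivable in one step from the set db."""
        new = set()
        for f in db:
            # Simplification: from (alpha and beta), add alpha and beta.
            if isinstance(f, tuple) and f[0] == "and":
                new.add(f[1]); new.add(f[2])
            # Modus ponens: from alpha and (alpha implies beta), add beta.
            if isinstance(f, tuple) and f[0] == "implies" and f[1] in db:
                new.add(f[2])
        return new - db

    db = {("and", "p", "q"), ("implies", "p", "a")}
    print(apply_rules_once(db))    # {'p', 'q'} -- then 'a' on the next round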

So the notion of proof is distinct from the notion of truth that we have been talking
about. The notion of truth is based on the semantics of the operators that we have
defined: what does "and" mean, what does "or" mean, what does implication mean,
what does negation mean? And when we say "mean," we are basically talking about the
truth tables — how "and" combines the truth values of its components to give a truth
value. That is the meaning of those operators, and the notion of truth is based on that
semantics. The notion of proof, or derivation, is instead a purely syntactic notion; it is
purely based on pattern matching. It says: if you can see such-and-such a pattern, and if
you can see such-and-such another pattern, then you can derive this pattern. It has
nothing to do, at least on the surface, with the notion of truth. Of course, we would not
be interested in a syntactic machine which produces things we are not interested in; we
are interested in true statements — we want to know, given some premises, what else is
true. "Theorem proving" is the name that we use for this activity. So let us see what is
happening here, before we look at the machinery further. What is given to us: sentence 1
is P ∧ Q; let us number them now — sentence 2 is M.
You are familiar with this process: sentence 3 is P → A, and sentence 4 is (A ∧ M) → C.
"Facts" is a term we sometimes use: it is given to us that these four sentences are true.
Then, is the conclusion C that we seek true or not — will Alice go to college or not? But
we are no longer looking at the meaning of these sentences; we are not even looking at
their truth values. We are asking: will this machinery that we have — the rules of
inference we have talked about (there are more rules, and if necessary we will introduce
them) — produce this formula C, or derive this formula C, or prove this formula C? So
what is this machinery? This logic machine essentially says: pick a rule and
corresponding data — by data we mean the sentences which are already there; for
example, these four sentences are already given to us — and if some rule is applicable to
that data, take that rule and that data and add the consequent of the rule, with the
matching components substituted. The consequent is what is written below the line in
the rule. There is also another way of writing such a rule.

We can write: given alpha, and given alpha → beta, you can derive beta — that is,
{alpha, alpha → beta} ⊢ beta. This symbol ⊢ is used for proving or deriving things, and
what it states here is the rule of inference modus ponens: if I give these to you, you can
add that. Corresponding to this there is also the notion of truth: we are asking, if this is
true and this is true and this is true, then is it necessarily the case that C is true? We
would write it as follows: P ∧ Q, M, P → A, and (A ∧ M) → C — corresponding to the
sentences we have the notion of a truth value — and we say that these four sentences
entail C. The premises — the four things given to us — entail the sentence C, and more
formally we write: first premise, second premise, third premise, fourth premise, then a
similar symbol but with two lines, ⊨, which stands for entailment. Entailment says that
if the premises are true, then the conclusion is necessarily true; it cannot be false. That
connects to the set of true sentences we talked about: entailment basically asks the
question, does C belong to the set of true sentences or not — do the premises entail C?
The question we are asking here, instead, is: can you derive C from the

premises? To derive C we use the rules of inference; for entailment we would have to
use the truth tables for the operators. They are quite different things.

So the fact is that they are different. The logic machinery basically says: pick a rule and
corresponding data, and add the consequent of the rule; and we put this in a loop. What
this will produce is another set of formulas, which we will call — let me not use the
symbol P, since I used it earlier — say Pr: the set of provable formulas. I have
inadvertently switched to the terminology of formulas, but essentially a formula is a
sentence in my language, a well formed formula ("wff" is short for well formed
formula): anything I can construct using the syntax I have defined is a formula. This set
Pr is the set of all those formulas which are provable, given the set of premises and the
set of rules of inference given to me. The question we are asking is: does C belong to
this set or not? And instead of writing a loop which will keep going indefinitely —
because the set is actually infinite — we can put in a termination criterion: if the
consequent just added is equal to C, then stop.

That means we are writing the program to generate a proof only for C, and not worrying
about the other provable formulas — a small variation. So how do we go about it? The
four premises are given to us. We can add P (step 5): why? Because there is a rule called
simplification, which says that if you have alpha ∧ beta, and alpha matches P and beta
matches Q, then I can produce P. Now I can use modus ponens: because I have P, and I
have P → A, I can add A (step 6) — remember, modus ponens says if you have alpha,
where alpha is P, and alpha → beta, which is P → A, then beta becomes A, and I can
add A. There is another rule which says that if I have two things, then the conjunction of
those two things can be added; it is called addition. So I can write A ∧ M (step 7),
because I have A and I have M — this is another rule of inference which I have not
written here explicitly. And then again I can use modus ponens. Ideally I should
annotate each step: how did I get step 5? From 1 and simplification. How did I get step
6? From 5 and 3 and modus ponens; and so on. You are familiar with this process,
having studied geometry and algebra in school.

So now I have got A ∧ M, by addition from A and M, and the last thing is C (step 8):
because I have A ∧ M, and A ∧ M matches alpha, and (A ∧ M) → C matches
alpha → beta, C matches beta, and I can add it using modus ponens. This is a syntactic
machinery for arriving at a proof of C which does not look at the meaning of the
operators. It does not understand what M stands for, or what implication stands for; all
it has is a set of rules of inference which say: if you see this pattern, and you see that
pattern, you can add a new formula to your database. We can put this in a cycle and ask:
can we add C? At this point we have added C, so we can say yes, C is provable. That is
the notion of proof, the notion of provability — the syntactic machinery we are after —
because it is easy to write pattern matching algorithms, and in this case the patterns are
exact, so matching is not even hard: you can keep looking for rules, matching data, and
producing new data. The whole derivation is sketched below.

So you can imagine a forward search algorithm which says: this is what is given to me, these are the rules available to me; from this and this I can infer this, from this and something else I can infer that, and so on, moving forward, and c is somewhere out there. If I can reach this formula c then I stop. So the whole state space search that we talked about is essentially being done here. Notice that we have said pick a rule; we have not said which rule. That of course is a very tricky question; you can go off in some direction and infer meaningless things, and that is obviously something we need to tackle next. So what have we done? We have defined the language of propositional logic, then the semantics, the truth-functional semantics, which says that given a sentence we can arrive at its truth value using the truth table approach. Then we talked about this machinery of making inferences which does not look at semantics; it only looks at the syntax. It is a syntactic machinery which adds new formulas, and we can build a machinery to produce the formula we want.
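To make that cycle concrete, here is a minimal sketch in Python (illustrative, not from the lecture; the tuple representation and the restriction of the addition rule to atoms are assumptions made to keep the sketch finite) of the forward procedure with three rules: simplification, addition, and modus ponens.

```python
# A minimal forward-chaining prover for propositional logic.
# Formulas are nested tuples: ('and', a, b), ('implies', a, b), or atoms like 'p'.

def step(known):
    """Apply every encoded rule to every piece of matching data once."""
    new = set()
    for f in known:
        if isinstance(f, tuple) and f[0] == 'and':        # simplification:
            new.add(f[1])                                  # from (a and b) add a
            new.add(f[2])                                  # ... and add b
        if isinstance(f, tuple) and f[0] == 'implies' and f[1] in known:
            new.add(f[2])                                  # modus ponens
    atoms = [a for a in known if isinstance(a, str)]
    for a in atoms:                                        # addition, restricted
        for b in atoms:                                    # to atoms so the set
            new.add(('and', a, b))                         # stays small
    return new - known

def prove(premises, goal, max_rounds=10):
    """Keep adding consequents until the goal appears (or we give up)."""
    known = set(premises)
    for _ in range(max_rounds):
        if goal in known:
            return True
        fresh = step(known)
        if not fresh:
            return False
        known |= fresh
    return goal in known

# The example from the lecture: p and q, m, p -> a, (a and m) -> c entail c.
premises = [('and', 'p', 'q'), 'm', ('implies', 'p', 'a'),
            ('implies', ('and', 'a', 'm'), 'c')]
print(prove(premises, 'c'))   # True
```

The unguided pick-any-rule loop is exactly the source of the branching problem the lecture raises next; a real prover needs a strategy for choosing which rule to apply to which data.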

So, is this machinery good machinery or not? This is the question asked by all logicians. What do we have? We have a logic machinery; is it doing what we want it to do? There are two notions that we use here. One is the notion of soundness. Soundness says that my machinery will only produce true formulas. We can express this as follows: let L be the language, or the set of premises, given to us; if L derives alpha then L entails alpha. So soundness is that my logic machinery will produce only true formulas; if it produces a formula then that formula must be true. Of course, trivially, a logic machinery that does not produce any formulas at all is sound, but that is not a very good machinery. The other property, completeness, which we have already spoken about, says that if L entails alpha then L derives alpha: if a formula is true, then there is a derivation for that formula in my logic machinery. If I had a nondeterministic machine which knew which rule to pick and which data to apply it to, it would produce a derivation.

Of course, in the absence of nondeterminism we have to use search, and we will focus on that as we go along. Tied to soundness is a notion which is basically derived from it, but which is useful to talk about: consistency. Consistency says that a logic is consistent if you can derive either alpha or the negation of alpha, but never both. So it says L derives alpha exclusive-or L derives not alpha. This stands for exclusive-or, which is not a symbol I have defined here, but we can define it in terms of the connectives we have. Consistency is basically tied up with soundness: if you think about it a little, only one of those two things can be true, and soundness says that I can prove only true things; if I prove something, it must be true. So you cannot have both alpha and not alpha being derived, which is basically the other side of soundness. But very often we talk about all three: soundness, completeness, and consistency. When we begin the next class, we will talk about these properties. We will ask the question: when is my system going to be sound? Why is a given rule sound, and how do I show that it is sound?

We will also talk about completeness a little bit, but we will not prove the completeness properties; the proofs are simply a little bit longer. And then we will move on towards higher-order logics. I will give you an example of logical systems which were known as axiomatic systems. For the last 200 years mathematicians have been building axiomatic systems; they have been saying, if you use this set of operators and this set of rules of inference, then your logic machinery is complete. I will just give you an example; we will not prove its completeness, but we can talk about it. So in the next class we will begin with soundness and completeness and then move on.

Eventually we will also focus on this machinery and on what the algorithm should really be; this one seems too vague, and it needs all the power of nondeterminism to work. We need something more efficient to be able to derive the proofs. Observe that, of course, we are not yet talking about first order logic, but in first order logic you would have this problem: you could say that if a number is odd then its successor is even, and you would of course also have the other rule, that if a number is even then its successor is odd. If somebody tells you 5 is odd, then you will go off in a direction saying 6 is even, 7 is odd, 8 is even, 9 is odd, and you will never come out of that loop. Obviously that is like depth first search going into an infinite branch; that danger lurks even in this algorithm here, and we have to look at some of those algorithmic issues. So, we will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture No - 43
Propositional Logic

(Refer Slide Time: 00:15)

So, we are looking at logic and reasoning, and in particular at propositional logic; so, just to do a quick recap.

(Refer Slide Time: 00:37)

We first looked at the language. The language is made up of a set of propositional symbols; I do not remember what symbol we used, anyway, something like this: a countable set of atomic symbols P, where each symbol stands for a sentence, and we do not particularly care what it stands for. Associated with this set of symbols is a valuation function v which maps these symbols to a set, let us say {true, false}, basically a two-valued set, which as far as we are concerned stands for true sentences and false sentences. Then we have connectives: and, or, not, implies, and so on, which give rise to a set of sentences, which we will call the set of formulas.

Let us call this set F. We can construct compound sentences from the atomic ones using the logical connectives, and associated with this set of formulas is a valuation function which maps F to the same set {true, false}. So this side is the syntax and that side is the semantics, and what the semantics captures here is how the logical connectives influence the meaning of compound sentences. If you remember, for example, a sentence like p and q would map to true if both p maps to true and q maps to true, and so on. Now, given a formula f belonging to F, for example some arbitrary formula constructed with the connectives, say ((p and q) implies r) or (not s implies p), where p, q, r, and s are the atomic symbols and implies, or, and not are the connectives, we can find a valuation for it if we know the valuations of p, q, r, and s. If you know, for example, that p is true, let us say all of p, q, r, s are true, then the parts evaluate step by step and the whole thing evaluates to true. So this is the notion of truth: given a sentence f belonging to this set of sentences, and given a valuation v,

for all the atomic sentences, we can compute whether the given sentence maps to true or false. I do not remember whether we discussed this, but in general a formula can fall into three categories. One is a valid formula: we say f is valid if for every valuation v that we can think of, Val(f) is equal to true. You are familiar with this notion; I am just repeating it. When we say for every valuation v, we are essentially talking about the truth table for the formula. For this formula, for example, we can construct the truth table, which will have 16 rows, because there are 4 propositional atomic sentences: p can be true or false, q can be true or false, r can be true or false, and s can be true or false. For each of these 16 valuations, that is, each of the 16 rows in the truth table, if the last column is labelled true, then we say the sentence is valid. So a sentence is valid if it is true for all possible valuations. A trivial sentence which is valid is, for example, p or not p; of course there is only one variable here, but this is a valid sentence, also known as a tautology, always true.

(Refer Slide Time: 06:07)

We say that the sentence, or the formula, is satisfiable if for some valuation v it is true; the same definition, except that instead of 'every' valuation we use the term 'some'. If you can find at least one valuation which makes it true, then the sentence is satisfiable. For example, I might say p implies q, a simple sentence which is not always true: if p is true and q is false, it is false. But I can find valuations of p and q which make it true; in fact there are 3 valuations which make it true. Such sentences are also sometimes called contingent sentences. So one category is valid, the second is satisfiable or contingent, and the third is unsatisfiable. Here we replace 'some' with 'no': if there is no valuation which makes the sentence true, then it is unsatisfiable; an example, as you can see, is something like p and not p. So in general there are three kinds of sentences: valid, satisfiable, and unsatisfiable. All three notions are used somewhere or the other, as we will see later. When we talk about truth, we also use the entailment symbol with an empty left hand side to say that a formula is valid.
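These three categories can be checked mechanically by enumerating valuations. The following sketch (illustrative Python, not lecture material; the nested-tuple formula representation is an assumption carried over from the earlier sketch) classifies a formula by building its truth table:

```python
from itertools import product

# Formulas as nested tuples over atoms (strings), e.g. ('or', 'p', ('not', 'p')).
def val(f, v):
    """Evaluate formula f under valuation v (a dict from atom to bool)."""
    if isinstance(f, str):
        return v[f]
    op = f[0]
    if op == 'not':     return not val(f[1], v)
    if op == 'and':     return val(f[1], v) and val(f[2], v)
    if op == 'or':      return val(f[1], v) or val(f[2], v)
    if op == 'implies': return (not val(f[1], v)) or val(f[2], v)

def atoms(f):
    """Collect the atomic symbols occurring in f."""
    return {f} if isinstance(f, str) else set().union(*(atoms(x) for x in f[1:]))

def classify(f):
    """Return 'valid', 'satisfiable', or 'unsatisfiable' via the truth table."""
    names = sorted(atoms(f))
    rows = [val(f, dict(zip(names, bits)))
            for bits in product([True, False], repeat=len(names))]
    if all(rows): return 'valid'
    if any(rows): return 'satisfiable'
    return 'unsatisfiable'

print(classify(('or', 'p', ('not', 'p'))))    # valid (a tautology)
print(classify(('implies', 'p', 'q')))        # satisfiable (contingent)
print(classify(('and', 'p', ('not', 'p'))))   # unsatisfiable
```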

(Refer Slide Time: 08:07)

Then there is the notion of entailment. Now this is an interesting notion, because the reason we study logic is to be able to capture valid forms of argument. We are not so much interested in asking whether we can find a valuation which makes a sentence true; we are more interested in saying that if somebody has given you a set of premises, or a set of axioms, does some other sentence follow from them? So given a set S of sentences, we say that a formula f is entailed by S, which we write as S entails f, if the following holds: every valuation that makes S true also makes f true. When we say a set of sentences is true, what we mean is that every sentence in that set is true; you can think of it as one bigger sentence with the 'and' sign in between them. So we are more interested in this notion of entailment: somebody gives us a set of premises, let us call it S, and we want to ask whether a given formula f follows or not. That is the notion of entailment. These concepts are semantic concepts.

(Refer Slide Time: 10:19)

On the other hand we have the notion of provability. If you recall, we did a small proof in the last class; the notion of provability is tied up with the rules of inference. For example, the most commonly used rule of inference is called modus ponens, and we express it as follows: if you have alpha, and you have alpha implies beta, then you can derive beta. Last time we wrote it as three separate lines, alpha, then alpha implies beta below it, and then beta, but this is just another way of writing it. Essentially you are saying that if you are given alpha and you are given alpha implies beta, then you can derive beta, or in some sense add beta to your set of sentences. So the notion of provability, or the notion of proof, is an entirely syntactic concept. It simply says that given a set of formulas, you can keep adding new formulas to the system, and this particular rule says: if you have a formula which matches alpha, and another formula which matches alpha implies beta, then you can add the formula which matches beta to the set. This alpha and beta can be arbitrarily compound formulas; they do not have to be atomic. It is just a pattern, and the same thing must match in both places: alpha can be any formula, and if that formula implies beta is present in the database, the set of sentences, then you can add beta.

So the proof finding algorithm is basically a simple algorithm which does the following: select a rule of inference, select some data to apply it to, and add some new formula to your database. You keep doing this repeatedly till you have proved the formula that you are interested in. Another rule of inference, for example, is modus tollens, which says that if you have not beta, and you have alpha implies beta, then you can add not alpha to the set.

You are familiar with these kinds of formulas. The next question I want to ask is: what makes a rule of inference a suitable rule of inference? If you recall, we had this notion of soundness. We say that a logical system is sound if, whenever you can derive a formula alpha from a given set of sentences S, then alpha is also entailed by S. This can be re-expressed as follows: if your premises are true then the conclusion will necessarily be true, which is the notion of entailment on the right hand side. So we say that a logic system is sound if anything that can be derived is entailed as well. On the other hand, a logical system is complete if anything that is entailed can be derived. Observe that these also look like sentences in a logic; it is a slightly different logic, in which we can talk about sentences like 'S derives alpha' or 'S entails alpha', which of course has to be some different logic, not propositional logic itself. So the question we want to ask is: when is a rule of inference sound? Soundness is tied up with valid rules: a rule is valid if its left hand side entails its right hand side.

So a rule like modus ponens is valid because alpha together with alpha implies beta actually entails beta. Let us look at an example of a rule which is not valid. Suppose I say: from alpha implies beta and beta, infer alpha; I am just writing it in the old style, which is a similar thing. This is not really a valid rule, so we do not have a name for it in logic. The rule says that if alpha implies beta is given to you and beta is given to you, infer that alpha is true. I do not know whether we discussed this in the last class, but it is actually the process of abduction. It is like this: for example, you might say that if somebody is drunk, then that person staggers while walking. So alpha could stand for 'somebody is drunk' and beta for 'the person staggers while walking'. Then you see somebody staggering while walking.

And you come to the conclusion that the person is drunk. Now this is not a valid rule of inference, because it is possible that the person may be drowsy or sleepy or tired or hurt; it could be anything. Even if you assume it to be true that people who are drunk stagger, it does not follow that anybody who staggers is drunk. Whereas if you were to use the formula in the forward direction, if you are told that somebody is drunk and that people who are drunk stagger, then you can infer that the person staggers, because that is what the rule actually says.

Another example that we might have discussed is the process of diagnosis. In medical diagnosis a disease causes symptoms, so disease implies symptoms; if you see the symptom, then you infer the disease. Now that is not necessarily a valid inference. It is possible that the symptom could have been due to some other disease as well, because many diseases, for example, cause fever; if you simply say that just because there is fever it must be this particular disease, that is not a valid rule of inference. So how do we distinguish between valid rules and rules which are not valid? Valid rules are based on tautological implications, which means that corresponding to every rule of inference there must be an implication statement which is a tautology. Corresponding to modus ponens, for example, the tautology is (alpha and (alpha implies beta)) implies beta. Notice the similarity between that pattern and this pattern here. This is one sentence in my language of logic, whereas there I am using an additional symbol which stands for 'derives' or something like that, an extra-logical symbol in the sense that it is not part of the language I am using. But this is a sentence in my logic, and what we are saying now is that a rule of inference is valid if it is based on a tautological implication; by this we mean that there should be a corresponding implication sentence, and that sentence must be a tautology.

So is this sentence a tautology? You can construct a truth table to find out, or you can try to show that it is a tautology by trying to show that it is not one, the kind of proof you often do: proof by contradiction. Let us ask: can we make that sentence false? If you remember the truth table for implication, you can make an implication false only by making its antecedent true and its consequent false. So to make the whole sentence false, we must make beta false and make the antecedent, alpha and (alpha implies beta), true. To make that conjunction true we have to make alpha true and alpha implies beta true. But now alpha has become true and beta has become false, so alpha implies beta is true implies false, which is actually false. Once that is false, the conjunction is false, and once the antecedent is false, the whole implication becomes true. So we cannot make this sentence false: the only row in the truth table for implication where the implication is false requires beta to be false, and then the antecedent cannot be made true. Since we are not able to make the whole sentence false, the sentence is a tautology. As a small exercise, maybe you should just construct the truth table for this and see that it is a tautology.
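For the record, the truth table for that exercise works out as follows (reconstructed here; it is not written out in the transcript):

```latex
\begin{array}{cc|c|c|c}
\alpha & \beta & \alpha \to \beta & \alpha \land (\alpha \to \beta) & (\alpha \land (\alpha \to \beta)) \to \beta \\
\hline
T & T & T & T & T \\
T & F & F & F & T \\
F & T & T & F & T \\
F & F & T & F & T
\end{array}
```

The last column is true in every row, so the implication is a tautology.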
Now, there are other kinds of rules of inference, which are based on tautological equivalences, and they give rise to rules of substitution.

Rules of inference have a sense of direction, in the sense that you have to be given the formulas on the left hand side and then you can produce the formula on the right hand side; you go from left to right. In rules of substitution, you say that two formulas are logically equivalent, and therefore you can substitute one for the other at any point of time. An example of a rule of substitution, and you must be familiar with many such rules, like De Morgan's laws and so on, is that alpha implies beta is equivalent to not alpha or beta. You must have looked at this at some point; if you have not, try constructing a truth table for it. You will see that the equivalence is a tautology, and if it is a tautology we can base a rule of substitution on it, which means that whenever we see a pattern of the kind alpha implies beta we can replace it with the pattern not alpha or beta, and vice versa. We do not even have to replace it; we can simply add the new formula, but it amounts to the same thing. De Morgan's laws, for example, you must have studied. There are many tautological equivalences, and you can convert each of them into a rule of substitution. You can also see a rule of substitution as a bidirectional rule of inference.

It says that this implies this and that implies that, because, if you remember, alpha is-equivalent-to beta is the same as alpha implies beta together with beta implies alpha. So you can think of a rule of substitution as two rules of implication, going in both directions. What you have, then, to construct a logic machine, is a set of rules of inference and a set of rules of substitution, and your task is to pick one of these rules at a time and keep producing new formulas till the formula you are looking for, this formula f we are interested in, is generated. If you remember the example we did last time, we had something like this. It was given to us that p and q, where p was something like 'likes maps' and q 'likes music', or some such sentences; then p implies r; then r and s implies t; then not q or s. And from this you had to show t. That was the problem we looked at last time, or something very close to it. These are the four premises given to us, the set of sentences S, and from this we have to show that t is derived. We write it as: s1, s2, s3, s4 entail t, or derive t. We can use the terms entailment and derivation interchangeably only if the logic is both sound and complete.

If it is sound, then whatever is derived is entailed; if it is complete, then whatever is entailed can be derived. We will come to completeness in a moment, but so far we have addressed the issue of soundness: my logic machine is sound provided I use valid rules of inference. Valid rules of inference are characterized by tautological implications or tautological equivalences, and corresponding to each tautological implication we can construct a rule of inference. For example, p implies p is a tautological implication which is trivially true; you can have a trivial rule for it which says that if you have p you can add p. That of course is trivial, but the same construction works for any tautological implication.

And for any tautological equivalence you can have a rule of substitution. Before we move on I want to mention one theorem which is quite well known, called the deduction theorem. It concerns the situation where I have a set of sentences S and a sentence alpha, and I want to derive the sentence beta; remember that the turnstile notation stands for this idea of generating a proof, that you keep applying rules of inference till you generate beta. The deduction theorem says that if S together with alpha derives beta, then it is always the case that you can take alpha to the right hand side: S alone derives alpha implies beta, and vice versa. Now this S could be the empty set, which means that if you want to show that beta can be derived from alpha, you can equivalently show that the formula alpha implies beta is derivable. In other words, showing that t follows from the four premises amounts to showing that the large formula, premise and premise and premise and premise implies t, is true, a tautology. So you can see that all valid derivations amount to proving such a large tautology.

Let me ask a question: what is the relation between these three kinds of sentences? Can you think of a relation between a formula being valid, being satisfiable, and being unsatisfiable? We will come to this; just think about it a little bit. Now I want to spend a little time talking about completeness: when is a logic system complete? By definition, a logic machine or logic system is complete if it can derive every true formula, in other words every formula that is entailed by a given set of axioms. Over the last couple of hundred years there have been a lot of people trying to build logic systems and show that they are complete.

(Refer Slide Time: 30:13)

These are known as axiomatic systems. An axiomatic system says: here is a set of statements you must accept to be true without asking about them, just accept them on faith, together with some chosen rules of inference. One of the earliest such systems, in fact probably the first, was given by Frege, who is also credited with inventing first order logic, the modern form of first order logic that we use. He had six axioms, which were as follows. The first one, call it THEN-1, says alpha implies (beta implies alpha). If you just look up Frege's axiomatic system, you will find a very nice page on Wikipedia which describes these six axioms and the things which follow from them.

So Frege gave this axiomatic system which has six axioms. The first axiom says alpha implies (beta implies alpha). The second axiom says (alpha implies (beta implies gamma)) implies ((alpha implies beta) implies (alpha implies gamma)). The third says (alpha implies (beta implies gamma)) implies (beta implies (alpha implies gamma)). Then there are three axioms which use the negation sign. One of them, FRG-1, is (alpha implies beta) implies (not beta implies not alpha). You can see, for example, that this one has a connection with modus tollens: it says that from alpha implies beta you obtain not beta implies not alpha. Using FRG-1 to replace alpha implies beta with not beta implies not alpha, modus tollens becomes like modus ponens: not beta is given, not beta implies not alpha is now available, and so not alpha can be derived. The double negation axioms, which to many of us look trivial, such as not not alpha implies alpha, are also axioms in Frege's system. And there is one rule of inference, which is modus ponens, the rule we are familiar with. Frege's axiomatic system says that all tautologies of propositional logic can be derived from this set of axioms and this rule of inference.
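For reference, the six axioms, with the THEN/FRG names used on the Wikipedia page the lecture points to (this list is reconstructed from that page, not from the transcript), are:

```latex
\begin{aligned}
\text{THEN-1}:&\; \alpha \to (\beta \to \alpha)\\
\text{THEN-2}:&\; (\alpha \to (\beta \to \gamma)) \to ((\alpha \to \beta) \to (\alpha \to \gamma))\\
\text{THEN-3}:&\; (\alpha \to (\beta \to \gamma)) \to (\beta \to (\alpha \to \gamma))\\
\text{FRG-1}:&\; (\alpha \to \beta) \to (\lnot\beta \to \lnot\alpha)\\
\text{FRG-2}:&\; \lnot\lnot\alpha \to \alpha\\
\text{FRG-3}:&\; \alpha \to \lnot\lnot\alpha
\end{aligned}
```

Modus ponens is the single rule of inference.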

It is a complete system; we are not going to go into the proof of this here, we just take it for granted that here is an example of a complete logical system. After Frege, many people devised different axiomatic systems using different sets of axioms and rules of inference. Now, as a small exercise, I will ask you to prove the formula p implies p using Frege's axiomatic system. Observe that p implies p is not taken for granted in Frege's system. When you say an axiomatic system, these six sentences are the only ones you take for granted; they are patterns, which means alpha can be substituted with anything, and only one rule of inference is allowed. Using this, can you come up with a derivation of the sentence p implies p? Try it for a little while; if you cannot succeed, you can go to the Wikipedia page and you will find the proof there. One such derivation is sketched below.
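Here is the standard five-step derivation (a reconstruction of the well-known proof, which the lecture leaves as an exercise; the axiom names are the ones listed above):

```latex
\begin{aligned}
&1.\; p \to ((p \to p) \to p) && \text{THEN-1, with } \beta = p \to p\\
&2.\; (p \to ((p \to p) \to p)) \to ((p \to (p \to p)) \to (p \to p)) && \text{THEN-2, with } \beta = p \to p,\ \gamma = p\\
&3.\; (p \to (p \to p)) \to (p \to p) && \text{modus ponens on 1 and 2}\\
&4.\; p \to (p \to p) && \text{THEN-1, with } \beta = p\\
&5.\; p \to p && \text{modus ponens on 3 and 4}
\end{aligned}
```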
of axioms in the given language. This is now a case professional calculus or professional
logic a rule of infer some rule of inference and say that this set which is complete, which
means all tautologies can be derived in this including this tautology which stands for this.

This implies if you look at the deduction theorem that if I can show that to be a tautology
then I can this t follows from those four sentences is equivalent. So, if I can show this
then I can show that beta follows from alpha essentially and Frege system can derive all
possibilities of course that does not mean that it is a trivial task. So, as you know people
have been struggling to find proof of things for example, Fermat’s last theorem to a few
hundred years before it was accepted as being solved essentially. So, finding the proof is
not the trivial task it is because there is a lot of choice available to you what to use and so
on.

The kind of proof that we did in the last class is called natural deduction: you take a rule of inference, apply it to some premises, add a new formula to the set, then again take another rule and apply it, and so on. If you remember, we said that from p and q we can first derive p; then from p and p implies r we can derive r; then from p and q we can derive q; then from q and not q or s we can derive s, by a rule I have maybe not stated. Then from r and s we can derive r and s, and from this we can derive t. So you can start like this, derive all these things, then derive t; this process is called natural deduction.

(Refer Slide Time: 38:26)

Very often we can establish formulas by a proof that looks like this. Suppose I want to show that ((p and q) implies r) implies (p implies (q implies r)) is a tautology. One way of doing this is to assume the left hand side. So I assume (p and q) implies r and in some sense put it in a box, so that everything which follows within the box is based on this assumption. Then I make another assumption, p, and open another box, and a third assumption, q, for which I open yet another box. Now I have assumed these three formulas. Inside the innermost box I can say p and q; then I can use p and q together with (p and q) implies r, apply modus ponens, and infer r. Then I can close the innermost box and say: the assumption I made was q, and r is what I deduced from it, so, a little like an application of the deduction theorem, closing this box gives me q implies r. Then I close the next box and get p implies (q implies r). Finally, when I close the outermost box, I have the whole implication. You must be familiar with this kind of proof, which you have done quite often: assume that something is the case, then show that something else is true. So to show that this is a tautology, we assumed the left hand side, then p, then q; three assumptions, depicted by three nested boxes.

Anything that is entailed by these assumptions stays inside the boxes. So p and q is true inside the innermost box, because p has been assumed and q has been assumed; and because (p and q) implies r has been assumed, r can be inferred by modus ponens. Then we close this box and get q implies r, because the only assumption made inside it was q; then we close the next box and get p implies (q implies r), because the assumption made there was p. Then we close the outer box and get the whole formula. So that is another way of generating a rule of inference.

So the process of finding proofs is not straightforward, and a lot of people have spent a lot of time trying to devise strategies for finding proofs. In 1965 along came a logician by the name of Robinson, who devised a scheme, which we will look at in the next class, in which only one rule of inference was enough for deriving all kinds of things. Now observe that when we talk of axiomatic systems we are really talking of proving all tautologies; in the real world we are often more interested in showing that, given a set of premises, something else is true. Before we go on to Robinson's method, there is another observation we must make, which is about the choice of connectives.

Now if you observe Frege's system, it uses only two connectives, implication and negation, and yet he could make the claim that the system is complete, which means any true statement that can be expressed in propositional logic can be derived. So where do the other connectives come from? Well, you cannot derive statements in those exact forms, but we have the rules of substitution. For example, we have the rule of substitution which says that not alpha or beta is equivalent to alpha implies beta. So Frege is saying not that he can derive that exact form of the sentence, but an equivalent form of it; logically they are the same, they are saying the same thing. So there is also a notion of a set of connectives being complete, and I am sure you are familiar with this, so I will not spend too much time here. Frege's set, just the implication sign and the negation sign, is complete: you can express everything that can be expressed in propositional logic using these connectives, we have all the rules of substitution, and you can derive every tautology of propositional logic using this system of derivation, those six axioms plus the modus ponens rule.

There are other sets which are complete. For example, a commonly used set has the three signs and, or, and negation; we will see that Robinson's method uses this set of connectives. You can even just work with and plus negation, or just with or plus negation. There are other combinations which are possible, and some that you must be familiar with: NAND and NOR, that is, NOT-AND and NOT-OR. Either one of these single connectives is enough to express everything you can express in propositional logic.
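For instance, these are the standard equivalences (added here for reference) showing that NAND alone suffices, writing the Sheffer stroke for NAND, so that p NAND q means not (p and q):

```latex
\begin{aligned}
\lnot p &\equiv p \uparrow p\\
p \land q &\equiv (p \uparrow q) \uparrow (p \uparrow q)\\
p \lor q &\equiv (p \uparrow p) \uparrow (q \uparrow q)\\
p \to q &\equiv p \uparrow (q \uparrow q)
\end{aligned}
```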

So you can take any set of connectives that we have, and, or, implies, equivalence, and so on, and devise a language in which there is only one connective, NAND, or only one connective, NOR, and express the same thing in that logic with the single connective. What do we mean by 'express the same thing'? We mean that they are logically equivalent, which means that any valuation which makes a sentence true in the first formulation will also make the corresponding sentence true in the second formulation. In other words, if alpha implies beta is true here, then not alpha or beta will be true there, and vice versa; that is the meaning of equivalence.

So anything that can be said in propositional logic can be said with a complete set of connectives, and then you can choose a set of axioms and some rules of inference and have a logically complete system. The beauty of Robinson's system was that he did not need any axioms; he just needed one rule of inference, and that was a complete system: you could derive everything, all tautologies, in it. Now, while it is nice to define a complete system which is minimal in some sense, a small set of axioms and one rule of inference, as you will see if you try to show p implies p, it involves a lot of steps, because you have to start with some axiom, substitute something into it, and then keep playing with the axioms; it is a nice exercise you must try. In practice you would not want a minimal system; you would want a system in which inferences can be made very quickly. This means you derive more complex rules of inference. For example, this is a rule of inference: if alpha implies beta and beta implies gamma, then alpha implies gamma.

You can see it is like transitivity: from alpha implies beta and beta implies gamma, you have a rule which says alpha implies gamma. If you have such a rule in your system and somebody gives you alpha and asks you to show gamma, you can just take this rule, its conclusion, and use modus ponens to derive gamma. So it makes sense to introduce more rules of inference which make your inferences shorter. There is a vague analogy with defining a language here: when people define instruction sets for machines, they work with more instructions or fewer instructions, and each has its advantages. Theoretically, of course, it is nice to show that a small set is good and complete, but in practice you may need something bigger, unless you come up with something like Robinson's method, which was small and yet very efficient. So we will look at Robinson's resolution method in the next class.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 44
Resolution Refutation for PL

(Refer Slide Time: 00:23)

So we have been looking at reasoning in propositional logic, and today we want to look at a method known as the resolution method. This was given by Alan Robinson in 1965, so it is a fairly recent method if you compare it with, for example, Frege's axiomatic system, which is nearly a hundred years older. It is motivated by the fact that you want to write a computer program to generate proofs, which means you have to find proofs automatically. Now if you look at the high level algorithm for finding a proof, it basically says: pick some rule of inference, pick some applicable data, produce some new data, and add it to the system. So it is basically a search algorithm, searching in this space of sentences which can be produced, trying to produce the sentence you are interested in.

(Refer Slide Time: 01:42)

To motivate the resolution method, let us go back to the problem we started with, the example which said: P and R, R implies Q, P and S implies T, and not R or S; I may not be using the same symbols, but anyway it is the same problem, and from it we have to show T. Now go back over the proof we did for this. The premises are numbered 1, 2, 3, 4. We derived, for example, P from the first one; there is a rule called simplification, so that is one rule of inference, simplification. Then we derived R by the same rule. Then from R and R implies Q we derived Q; that is from 6 and 2, and modus ponens is the second rule of inference used. Then from R and not R or S we derived S; this is from 6 and 4 and a rule called disjunctive syllogism, so I just write DS there. Then from 5 and 8 we get P and S, and finally we get T from 9 and 3. So this proof required us to use four rules of inference: simplification, modus ponens, disjunctive syllogism, and addition. And then of course we had to choose which rule to pick and what to apply it to.

It looks nice and simple when you see just these 4 sentences and the conclusion you want to draw. But in practice the set of facts available to a reasoning system would be hundreds, maybe thousands, of facts, and you may want to ask whether some formula is true or not. It is not that only relevant things are given to you; there may be many irrelevant formulas, and there may be hundreds of rules which may be applicable. So if you look at it as the state space search we were looking at in planning, it generates a huge branching factor, and the number of possible things you can do is very large. We have not used any axioms here; by and large we try systems where axioms are not needed, where we just put in enough rules of inference that will work. But as a motivator for this method, consider another example: you are given a single formula S and you are asked to derive P or not P.

You must distinguish between the syntactic process of deriving formulas and the semantic process of looking at the truth value and saying the formula is true. If you look at P or not P you can obviously say it is true; it is quite trivial. But the question is, can you derive it in the system? It is not straightforward, just as deriving P implies P was not. In fact you cannot derive it if you are given only S and all kinds of rules of inference like simplification and disjunctive syllogism; however many of them you have, it is not going to help, you can never derive this from that. In contrast, Robinson's resolution method is a complete method: it uses only one rule of inference and needs no extra axioms, and that is why it is so attractive. It makes the task of programming simpler: you take that one rule, and all you need to decide is which data to apply it to, which of course is still not a straightforward problem.

(Refer Slide Time: 06:54)

So what is the resolution rule, as it is called? We will look at the simple form; there is a more general form which we may or may not look at later. If you are given P or Q, and you are given not P or S, you can derive Q or S. This is the resolution rule in its simplest form. It could be P or Q or R also; it does not matter, it does not have to be just two things, but in the simplest form there are just two propositions here and two propositions there. In the language of the resolution method we call each of these elements a literal, and the resolution method works on formulas in what is known as clause form. As far as propositional logic is concerned, clause form is the same as conjunctive normal form. So we have to first convert our formulas into clause form, or conjunctive normal form, which means you cannot use things like P implies Q; if you have a situation like P implies Q, you should convert it into not P or Q. I will take it for granted that you can convert any given formula into clause form. Clause form, or CNF, is basically a formula of the kind C1 and C2 and ... and Ck, where each Ci is of the form d1 or d2 or ... or dr,

and each di is either a proposition or the negation of a proposition. So it is a very simply structured form: at the outermost level it is a conjunction of clauses, C1 and C2 and ... and Ck; each clause is a disjunction of literals, d1 or d2 or ... or dr; and each literal is either a proposition or its negation. The negation sign, in some sense, occurs at the innermost level, if you imagine brackets all over the place. For example, you could start by converting our problem into clause form: P is a clause, R is a clause, then not R or Q is a clause, then not P or not S or T is a clause, which corresponds to P and S implies T, and the last one is anyway not R or S. So I have converted the 4 sentences into 5 clauses; each of these is a clause, and I can think of the whole thing as P and R and this and this and this, one larger formula. Now, first of all we must ask ourselves: is the resolution rule a sound, or valid, rule of inference? Its validity rests on an equivalence, but in practice the only thing we need is that this and this imply this, which means (P or Q) and (not P or S) implies (Q or S).
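As an aside, the conversion to clause form can be mechanized. The sketch below (illustrative Python, not lecture code; it handles only the connectives used in this lecture) eliminates implications, pushes negations inwards with De Morgan's laws, and distributes or over and:

```python
# Convert a nested-tuple propositional formula to clause form.
# A clause is a frozenset of literals; a literal is 'P' or '-P'.

def cnf(f):
    """Return a set of clauses equivalent to formula f."""
    if isinstance(f, str):
        return {frozenset([f])}
    op = f[0]
    if op == 'implies':                       # a -> b  becomes  not a or b
        return cnf(('or', ('not', f[1]), f[2]))
    if op == 'and':
        return cnf(f[1]) | cnf(f[2])
    if op == 'or':                            # distribute or over and
        return {c1 | c2 for c1 in cnf(f[1]) for c2 in cnf(f[2])}
    if op == 'not':
        g = f[1]
        if isinstance(g, str):
            return {frozenset(['-' + g])}
        if g[0] == 'not':                     # double negation
            return cnf(g[1])
        if g[0] == 'and':                     # De Morgan
            return cnf(('or', ('not', g[1]), ('not', g[2])))
        if g[0] == 'or':                      # De Morgan
            return cnf(('and', ('not', g[1]), ('not', g[2])))
        if g[0] == 'implies':                 # not(a -> b) = a and not b
            return cnf(('and', g[1], ('not', g[2])))

# The premises of the running example, converted one by one:
for p in [('and', 'P', 'R'), ('implies', 'R', 'Q'),
          ('implies', ('and', 'P', 'S'), 'T'), ('or', ('not', 'R'), 'S')]:
    print(cnf(p))
# yields the clauses {P}, {R}; {-R, Q}; {-P, -S, T}; {-R, S}
```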

For the rule to be sound, this implication must be a tautology; it is then a rule of inference which says that if you see this and you see this, then you can produce this. Remember that, unlike the rules we talked about earlier, these are not patterns in the sense that you can plug in any arbitrary formula; this is a clause, this is a clause, and this is a clause, which means they consist of propositions or negations of propositions, not arbitrary formulas, but the rule takes any two such clauses. To describe the rule in general: if you have one clause C1 and another clause C2, and one of them has a literal and the other has the negation of that literal, we can produce a clause which collects all the remaining literals, removing the pair that cancel each other, in some sense. While we are talking about clause form, note also that instead of writing C1 and C2 and ... and Ck we often write C1, C2, ..., Ck; we think of it as a set. Likewise a clause is written d1, d2, ..., dr, where the comma is interpreted appropriately: as 'and' at the outer level and as 'or' at the inner level.
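In that set representation a single resolution step is only a few lines. This sketch (illustrative, with '-' marking negation as in the conversion sketch above) produces all resolvents of two clauses:

```python
# One resolution step on clauses represented as frozensets of literals.
def negate(lit):
    """Flip a literal: 'P' <-> '-P'."""
    return lit[1:] if lit.startswith('-') else '-' + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 with c2 on one literal."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

print(resolvents(frozenset({'P', 'Q'}), frozenset({'-P', 'S'})))
# {frozenset({'Q', 'S'})}, exactly the Q or S of the rule above
```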

The rule of inference only requires this implication to be true, but we are actually going to be interested in an equivalence, and you will see in a moment why that is the case. The derived clause is called the resolvent. Whenever you add a resolvent to the set of clauses, the two sets of clauses are logically equivalent; in other words, we can keep adding resolvents to the set. So the resolution method works as follows. Start with a set of initial clauses, call them C1, C2, C3, ..., Ck. Then you add one more clause, the first resolvent. I am using a comma here, and you should not be confused: the comma is equivalent to 'and', and you can write 'and' to be safe. I have added one more clause; then I can add one more. What I am saying is that I keep adding resolvents to my set of clauses, and what I get each time is an equivalent set of clauses: this set is logically equivalent to this, and that set is logically equivalent to that.

You must prove that this is indeed the case. From the right hand side to the left hand side it is trivial; from the left hand side to the right hand side it is based on the fact that the extra clause you are adding, Q or S, is implied, and you can show that the corresponding implication is a tautology. I keep doing this till I get a formula which looks like nothing at all; this is called the empty clause. I go on till I generate the empty clause. So what am I doing? I am starting with the set of clauses C1, C2, up to Ck; I add the first resolvent R1, then R2, then R3, R4, and so on up to Rl, and at some point I am able to generate the empty clause; I add the empty clause to the set, and that is where the algorithm terminates. So what is the algorithm? It says: take any 2 clauses which can be resolved, which means one of them has a literal and the other has the negation of that literal; resolve them, produce a new clause, and add it to the set. Then pick any 2 clauses again, apply the resolution rule, and keep applying it, adding clauses, till at some point you generate the empty clause. How do you get the empty clause? If you see a clause like just P and a clause like just not P, you can think of them as P or nothing, and not P or nothing.

When you apply the same rule to them, there is no Q and there is no S, so all you are left with is the empty clause, which we sometimes write as a box and sometimes with another symbol. Basically it is an empty clause, and the empty clause stands for false; it is always false. I have jumped a little ahead of the natural deduction style of proof. In natural deduction we said that we keep adding new formulas and terminate when we find the formula we are interested in. Here I have jumped ahead and said that the resolution method keeps adding new formulas and terminates when you generate the empty clause, when you find 2 clauses like this. The box is used to denote the empty clause, and the 'bottom' symbol can also be used to denote it; basically the empty clause stands for something which is false, and the resolution method terminates with the empty clause. I will tell you in a moment why, but what is the implication of this, the implication of the fact that you have added the empty clause to your set of clauses?

We have to go back and see what we were saying here. We are saying that this set of clauses is logically equivalent to this set of clauses, which is logically equivalent to this set of clauses, and so on, down to the final set. Now what is the truth value of that final set? It is false, because it is something and something and something and false; the whole formula evaluates to false, which means each of the earlier sets is false, all the way back to the original set. I could have used the resolution method with a different termination condition for our example, terminating when we generate T, but we are not doing that; we are saying terminate when you generate the empty clause, which means we are trying to show that the formulas are unsatisfiable. That is why this method is called the refutation method: it is used to show that a formula, the large formula with many clauses inside it, is unsatisfiable.
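Putting the pieces together, a minimal refutation loop looks like the sketch below (illustrative code; a practical prover would add a strategy for choosing which pair of clauses to resolve, which is exactly the search problem the lecture keeps returning to):

```python
# Resolution refutation: saturate the clause set; report unsatisfiable
# if the empty clause ever appears.

def negate(lit):
    return lit[1:] if lit.startswith('-') else '-' + lit

def resolvents(c1, c2):
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def unsatisfiable(clauses):
    """True iff the empty clause is derivable from the given clauses."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolvents(c1, c2):
                    if not r:            # empty clause: contradiction found
                        return True
                    new.add(r)
        if new <= clauses:               # saturated with nothing new: satisfiable
            return False
        clauses |= new
```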

The reason the resolution method is implemented in this manner is that the method has been shown to be complete for deriving the null clause; it is not necessarily complete if you want to derive an arbitrary clause which may be entailed by the system. Remember, completeness means whatever is entailed can be derived; the resolution method is refutation complete, complete for showing that a formula is unsatisfiable. If you give it an unsatisfiable formula, there will be a derivation which derives the null clause from that set of clauses. Is this going to help us? We are not interested in unsatisfiable formulas, or are we? Recall the question I asked you in the last class: we have 3 kinds of formulas. There are valid formulas, which are true under all valuations; there are satisfiable formulas, which are true under some valuations, for example P implies Q is true under certain valuations; and there are unsatisfiable formulas, which are false for all valuations.

Essentially, what we are saying is that if we have an unsatisfiable formula, then we can test it using the syntactic procedure of the resolution method: keep applying the resolution rule and you will be able to derive the null clause. First of all you have to convince yourself that deriving the null clause is indeed showing unsatisfiability: because of the equivalence, the final set is equivalent to the one before it, which is equivalent to the one before that, and so on, which means the original set we started with is shown to be unsatisfiable. And remember that everything is to be expressed in clause form. So, given the problem we have to solve, if you look at those top lines there, I have 4 premises given and 1 conclusion to be drawn; can I somehow produce an unsatisfiable formula from there? The process is straightforward. Let us look at the clause form: I have C1, C2, C3, C4, C5. What am I trying to do? In my logical notation, I am trying to show that the premises entail T.

(Refer Slide Time: 23:33)

I am saying: given these 5 clauses, all those 4 statements are equivalent to 5 clauses, because the 'and' statement is broken down into 2 clauses, since clause form says a clause has only disjunctions inside. So P becomes a separate clause, R becomes a separate clause, R implies Q becomes not R or Q, P and S implies T becomes not P or not S or T, and the last one is anyway not R or S. What I am really trying to do is ask: can I derive T from these clauses? If you go back to the deduction theorem, it said that if you can do that, then you should be able to derive the corresponding implication instead. Remember, that is what the deduction theorem said; you could actually move the premises across one by one, but eventually you come to this formula, and I will leave that as a small exercise, because we also showed somewhere in between that if P and Q together imply R, then P implies (Q implies R); the two formulas are equivalent. Eventually all the 'and's come together, so 'the set of premises implies the conclusion' is what we are trying to show to be true.

So what are we trying to show? We are trying to show that this larger formula is a tautology. What I have with me is a method which can show that a given formula is unsatisfiable. Can I bridge this gap? That is why I asked the question: there are 3 kinds of statements, valid, satisfiable, and unsatisfiable. If I take the negation of a formula, how do these categories get related to each other: if I take the negation of a satisfiable formula, of a valid formula, of an unsatisfiable formula? There is a clear relation, and you need to think a little about this, between the valid formulas and the unsatisfiable formulas. A valid formula is true under all valuations, which means every row in the truth table ends up true; if you take the negation of that formula, then every row ends up false. So here is what I do: I am trying to show that this is true, which means I show that the negation of ((C1 and C2 and C3 and C4 and C5) implies T) is unsatisfiable.

Instead of trying to show that this is a tautology, which means it can be derived, assuming a sound and complete system, I can take its negation and show that that is an unsatisfiable formula. You must convince yourself that this is a sound step: if you take a tautology and take its negation, it becomes unsatisfiable. So I now produce what I think is an unsatisfiable formula, and of course, to check whether it is unsatisfiable, instead of constructing a truth table I will use the resolution method. Now, a quick series of steps: I have to take the 'not' inside, to convert this into clause form; remember that the resolution method applies only to clause form. Taking the not inside, I get not of (not of (C1 and ... and C5) or T). Remember the equivalence which said that alpha implies beta is equivalent to not alpha or beta; the whole conjunction I am taking as alpha, so the implication becomes not alpha or T. When I have pushed the not all the way inside, what I get is C1 and C2 and C3 and C4 and C5 and not T.

That is the logically equivalent formula, and now I have the formula I am interested in, in clause form. Each of these is a clause, as we already said, and not T, which is just a literal, is also a clause. Now we have an approach to using the resolution method: you take the set of clauses that are given to you, the premises, you take the conclusion, negate it to get not T, and add it as a clause, and the claim is that this set is unsatisfiable. Is it unsatisfiable? We can try to show that using the resolution method. Resolution proofs are usually shown as directed acyclic graphs, so let us try the resolution method here. Just write the clauses again: P is one clause, R is one, then not R or Q, not P or not S or T, and not R or S. These are the 5 given clauses, and with the negation of the goal, not T, I have 6 clauses: C1 to C5 plus the negation of the conclusion.

So, before we proceed, you should be able to see that this is very much a proof by contradiction. What we are saying is: given these clauses, assume that the conclusion is not true, and then show that the whole set becomes unsatisfiable, which of course means this conjunction is unsatisfiable. The only reason why this could be unsatisfiable is if the last clause was not correct: C1, C2, C3, C4, C5 are anyway given to us, you cannot assume them to be false. The only reason why this formula becomes false is because not T is there; so not T must be false itself, which means T must be true. It is proof by contradiction. So we are left with the small task of finding a proof, and the proof is a derivation which ends in an empty clause, or a null clause. Let us see; we can take any two clauses and resolve them. Well, in this example there are not too many choices; in general, a larger problem will offer you many choices, but in this case we do not have too many. So let us take R and not R or Q, and we get Q. You must see what I am doing in applying the resolution step: from this clause I am removing R, and from that clause I am removing not R.

So, from this clause I am removing R and from that clause I am removing not R, and what I am left with is Q. Is that correct? This Q is not very useful for me, so instead I can take R and not R or S and get S. Remember this is a directed graph and I can take any two clauses; the clauses are just a set. So I can now take S and not P or not S or T and get not P or T. I could resolve this with this, or that with that, but just for the sake of the argument let us say we resolve not P or T with P. So you can see that we have actually derived T; but of course, as we know from the natural deduction method, it is quite possible to do that. Our method does not stop here; our method says we must derive the null clause. And you can see that from T and not T we can derive the null clause. So this graph represents the proof that the given set of clauses is unsatisfiable, and the proof holds because of the equivalence that we have seen: we keep adding new things, and in the end we added the empty clause, and the empty clause basically stands for false. You can see why: the empty clause comes from saying T is true and at the same time saying not T is true, and obviously that stands for false, a contradiction.
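
As a quick illustration, here is a minimal sketch in Python of this refutation procedure (my own encoding, not the lecture's notation): clauses are sets of literals, negation is marked with a leading '~', and we keep adding resolvents until the empty clause appears.

# A minimal sketch of propositional resolution refutation. Clauses are
# frozensets of literals; a literal is a string, negation marked by '~'.

def negate(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolvents(c1, c2):
    # All clauses obtainable by resolving c1 with c2 on one literal.
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def refutable(clauses):
    # Saturate the clause set; True if the empty clause is derivable.
    clauses = set(clauses)
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                for r in resolvents(a, b):
                    if not r:            # empty clause: contradiction
                        return True
                    new.add(r)
        if new <= clauses:               # nothing new can be added
            return False
        clauses |= new

# The six clauses above: C1..C5 plus the negated conclusion not T.
kb = [frozenset(c) for c in
      [{'P'}, {'R'}, {'~R', 'Q'}, {'~P', '~S', 'T'}, {'~R', 'S'}, {'~T'}]]
print(refutable(kb))                     # True: the set is unsatisfiable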

If we can derive an empty clause, then we have shown that this set of clauses is unsatisfiable, which means the large formula is unsatisfiable. And it can only be unsatisfiable if not T is false; nothing else can be false, because the rest are premises given to us. If not T is false, then T must be true. So let us quickly look at this rule: what does it say? It says, in terms of literals, P, and P implies Q, and you can derive Q. Now you can see that if I convert it into clause form, I have P and not P or Q, and I am deriving Q, because you know these two forms are equivalent. So that is just one example of the resolution rule being applied. Or look at this one: P implies Q, and not Q, deriving not P. You can see this is again not P or Q, and not Q, deriving not P. So again, in this form, when you replace P implies Q with not P or Q, you can see it is the same resolution rule being applied: take a literal from here, take its negation from the other one, cancel the two, and whatever remains is the resolvent. So all these rules, as we can see, are special cases of the resolution rule that we have stated here. In fact the resolution rule is more general than this: as you saw in this example, there do not have to be only two literals, there can be more than two literals.

So, we used not P or not S or T and resolved it with S: we removed S from there, and what was left was not P or T, is it not? So the resolution method is a sound and complete proof procedure for the case of propositional logic. It was introduced in 1965, and since then a lot of theorem-proving work has been based on the resolution method; it is the most popular way of proving theorems. But so far we have seen propositional logic. What is the language of propositional logic? We have a set of propositions, and they could stand for anything; we do not know what. The only thing that this system is working out for us is the meaning of the connectives: when you have these connectives, what do compound sentences mean, when are they true? And from that we can work out truth values. Now go back to the example we started with, the Socrates argument.

All men are mortal; Socrates is a man; therefore Socrates is mortal. To talk about this argument is beyond the reach of propositional logic, because we cannot really do anything with these sentences. What you need here is to somehow understand what you mean by 'all', and then the relation between being a man and being mortal, and then to say that this same relation between being a man and being mortal applies to Socrates, because Socrates happens to be a man. Now, this is not possible in propositional logic, because if you say this sentence is P and that one is Q, then there is nothing we can do; there is no logical connective involved here. We could say, for example, that M stands for man and R stands for mortal, and we could say M implies R, but you can see that the notion of 'all' is missing. To handle this we need to go to first order logic, which is a more expressive language that allows us to look inside sentences.

Remember, in propositional logic a sentence is atomic, and the only thing you can say about a sentence is that in our mind it stands for something, and whether it is true or false. But in first order logic a sentence can be broken down into its constituents, and you can look at the relation between the different constituents: what is the relation between being a man and being a mortal? And then you can say that this relation carries over to Socrates: because Socrates is a man, the same relation applies to him, so he is also mortal. That is possible in first order logic. First order logic introduces the notion of individuals, men, Socrates and so on. In propositional logic there are no individuals; there are only sentences, and the sentences can only be true or false, whereas individuals are things which can participate in relations. So, in the next class, when we meet on Wednesday, we will look at first order logic, and we will look at all these proof methods in the context of first order logic, plus some extra rules. So, we will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 45
First Order Logic (FOL)

(Refer Slide Time: 00:20)

So, we are in some sense in the last lap of the course, and we are looking at logic. When we say logic, there are many aspects to what we are talking about. The first one is syntax, or the language part: we first define a formal language, which is the language of the logic that we are talking about, and we have rules to say what is a sentence in this language and what is not. Then we have the notion of semantics, which tells us, when we make sentences out of smaller sentences, how they combine and what the meaning of those sentences is. In general we are concerned with meaning and truth: what the sentences mean is one aspect of semantics, and whether a sentence is true or not true is the other aspect. Because we are really interested in using logic as a mechanism to compute, in quotes, 'true statements': to arrive at true statements through a syntactic process essentially.

But to do that, we first need to define whether a sentence is true or not; that is done in the notion of semantics. So one of the things we are interested in is this: given a set of sentences, which we could call premises, or axioms, something that we assume to be true without questioning, we want to find out what other sentences must be true. In particular, we may be interested in a particular sentence, call it alpha, and we may be interested in asking whether alpha is true or not. To this end we have the notion of entailment, and we say that the set of sentences entails alpha if, whenever those sentences are true in some circumstances, alpha must be true in those circumstances also. In propositional logic, which is what we have looked at so far, what do we mean by the circumstances?

Basically we mean the valuation function, which assigns a valuation to each atomic proposition; then we can lift the valuation up to the compound sentences. And if we have a set of sentences which are true in some given valuation, then alpha must be true in that valuation: that is the notion of entailment. So this is the semantics we define, because we are interested in using logic for doing something. But we are at the same time interested in doing it in a mechanical way. The promise of logic is that you can sit down with paper and pencil and decide whether a given sentence is true or not. Of course, in the modern context, the promise is that you can write a computer program to tell you whether the given sentence is true or not, because whatever you can do with paper and pencil, you can do with a computer program as well.

So, corresponding to the notion of entailment, which is a notion concerned with truth values, we have a notion of proof, or derivability, on the syntactic side of the language. We say that we can derive the sentence alpha given the set S. We are given a set of sentences, which we call a knowledge base or something like that, and we want to find out what else can be added to the knowledge base, or the database, or the set of sentences S; that is the notion of proof. The notion of proof is based on the notion of rules of inference, and rules of inference are basically rules which tell you, given some pattern already present in the set of sentences, what new sentence you can add to the set. A rule basically has a set of antecedents: for example, modus ponens says that if you can see alpha, and if you can see alpha implies beta, where alpha and beta could match any sentences, then you can add the sentence beta to the knowledge base. So rules of inference are basically syntactic, pattern-based mechanisms for adding new sentences. And of course you have different kinds of proof procedures.

So, we have seen some of them: something which we call natural deduction, which kind of flows with the rules of inference; then there is something called indirect proof. We saw an example of this when we said: make an assumption P, or alpha, and if somehow we can then show Q, we can conclude P implies Q. This form of proof is called indirect proof, where you start with an assumption and then arrive at a formula in which the assumption becomes the left-hand side of an implication statement. Then we saw the resolution method for propositional logic. And there are other methods; for example, there is the tableaux method, which is very popular in many logic circles. So proof procedures are basically mechanisms for achieving this mechanically: for mechanically producing new formulas, or mechanically testing whether a given formula can be produced from a given set of formulas, and so arriving at alpha.

So, as you can see, there are two routes to arriving at alpha. One is the semantic route, which says that you look at the meaning of alpha: look at the valuation function, or look at the semantics as to which propositions are true, and then decide whether alpha is true or not; that amounts to constructing a truth table. And we said that truth tables are very often too large, if the number of propositions is high. The other route is to use this mechanism of proof, of generating new sentences, till we have generated alpha.
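
To make the semantic route concrete, here is a small Python sketch (my own illustration, not from the lecture) that decides entailment by brute-force enumeration of all valuations; this is exactly the truth-table method whose exponential size motivates proof procedures like resolution.

# Brute-force entailment check for propositional logic: the premises
# entail the goal iff every valuation making all premises true also
# makes the goal true.
from itertools import product

def entails(premises, goal, atoms):
    # premises and goal are functions from a valuation dict to bool
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))     # one row of the truth table
        if all(p(v) for p in premises) and not goal(v):
            return False                 # a counterexample row
    return True                          # all 2^n rows checked

# Example: P and (P implies Q) entail Q (modus ponens as entailment).
premises = [lambda v: v['P'], lambda v: (not v['P']) or v['Q']]
print(entails(premises, lambda v: v['Q'], ['P', 'Q']))   # True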
Now, today, we want to look at a different language. When we say logic, it is primarily a language; we can define different kinds of logics, and we discussed in one of the early lectures that propositional logic is one of them. But there are more expressive languages which can express things in more detail; for example, there is epistemic logic, which talks about people knowing what other people know, and that kind of stuff.

But we are not going to that extent; we want to look at the next level of language, which is in fact at such a popular level that almost everything we do in computing falls within its scope. This is known as first order logic, FOL, and we want to look at that today. In fact, everything that we do in a computer program, for example when you write a C program or a Java program, can be seen as working in first order logic. The characteristic feature of first order logic is that there is a notion of variable, which is not there in propositional logic, and we can talk about relations between different elements. So let us first define the language of first order logic, and I will do the syntax and the semantics simultaneously, instead of first giving you the full syntax and then the semantics separately.

So, we will do it in a slightly informal way: I will keep writing the syntax, and we will keep looking at what the semantics corresponds to, and we will see what kind of expressions it allows us. So we are talking of FOL syntax now. As in propositional logic, there is the logical part, which includes symbols like the connectives, which are the same as they were in propositional logic, and then symbols like brackets and so on. All this is borrowed from propositional logic; everything is the same. At this point I would again like to emphasize the fact that you must discriminate between the symbol that we are using and the meaning of the symbol. I could have used this symbol or that symbol; it does not matter which you use, it is a symbol, and the syntactic machinery basically looks at a symbol as something which can be matched with the same symbol.

So it is basically pattern matching, whereas the semantics of a symbol, as we have all agreed, is that it determines the truth value of two sentences combined using that connective in a specific manner. We know that alpha and beta is true whenever alpha is true and beta is true, and only then. That is the semantic side. The semantics is just for our benefit, to understand what the language is doing for us, and to validate what the language is doing. The machinery itself does not use the semantics at all; the machinery just does pattern matching and so forth. So we have written the semantics of 'and' by saying that alpha and beta is true if alpha is true and beta is true. But when we talk about proofs, we have different things operating upon this symbol. We say, for example, if P and Q is given to us, or alpha and beta is given to us, we had a rule called simplification which says we can add alpha itself; or, if alpha is given to you and beta is given to you, then you can write alpha and beta.

There we are not looking at the meaning; we are only looking at a well-defined procedure, which is what the proof procedures we are talking about are. So these are the symbols which we borrow from propositional logic. Then we have a set of variables, called the set V, and typically we use symbols like x, y, z, or sometimes x1, x2 and so on. It does not matter; basically a set of variables is given to us. Variable symbols, you might want to say, but we will not keep distinguishing between the syntax and the semantics: when we say a variable, we will say that it has its own well-defined semantics. How is the semantics of FOL defined? The semantics is defined in terms of a domain D, in some logic books also called the universe of discourse. Essentially the domain is a set of objects or elements, and the language of first order logic allows you to talk about relations between elements.

So, for example, the domain might be a set of people, and you might say that Ram is a brother of Laxman; it is a relation between Ram and Laxman, and it is the brother relation. FOL allows you to talk about things like that. Or, if the domain is the set of natural numbers, you might say 7 is less than 11; it is a relation between 7 and 11, the less-than relation, and we are saying the pair (7, 11) belongs to this relation. So everything is in terms of the domain D, and the semantics that we will associate with variables is an assignment function, call it A, which maps every variable to an element of the domain. We are doing syntax and semantics at the same time, so as we write the syntax we will try to understand the meaning of those things. Every variable basically stands for an element of the domain. For example, I might say there is a variable x, and the assignment will say x equals 3, or x equals 7, or x equals 20; that is what the assignment function is doing.

We will also write x maps to xA. We will use this notation to say that xA belongs to the domain, x belongs to the set of variables, and x maps to xA: that is what the assignment function is doing for us. So xA stands for the image of x under this mapping. Then we have a set of quantifiers; to be very pedantic we should say a set of quantifier symbols, but we will use the terms slightly loosely. The two most common quantifiers that we use are a symbol which we read as 'for all' and a symbol which we read as 'there exists'. The tradition, when you see the first symbol, is to read it as 'for all'; the tradition for the other symbol is to read it as 'there exists'. We will see the semantics of these as we go, but essentially these are quantifiers for variables, and they dictate what assignments we are talking about. As you probably know, when we use 'for all', if I say for all x, P(x), where P is a predicate, which we will define in a moment,

then I am saying: take any assignment, and this P(x) must be true. We will come to the more formal definition later, but at this moment let us understand that quantifiers are used to specify which assignments can be used for the variables they govern. And while we are describing this, we can also make an observation: it is the characteristic of first order logic that you have variables and you quantify over variables. If we talk about so-called second order logic, then we have relations, or as we will call them, predicates, and we would have predicate variables and quantify over predicate variables. That is higher order logic, which we are not talking about; in our logic the predicate symbols will be fixed. So this is the logical part, which is common to every language that we define. The non-logical part in propositional logic was the set of propositions that we start off with; in this case there are three sets.

So, we will define the language L by three sets, which we will call P, F and C. Some books use the symbol R instead of P; we use P because P stands for 'predicate' here, and if you use R it stands for 'relation'. Both are talking about the same thing: the predicate belongs to the language of the logic, and the relation is defined over the domain. We do not expressly talk about relations here, but predicates, as you will see, are intended to stand for relations. So what is this set P? It is a set of predicate symbols. Very often we use things like P, Q, R and so on, but we can also use things like 'brother' or 'friend'; all of these would stand for predicate symbols. Basically we define a set, and the language we are talking about will use those symbols. Then F is the set of function symbols; typically we use small f, small g, small h, or f1, f2, but we could also use things like 'plus' or 'father', and as you can guess from 'plus' and 'father', function symbols basically denote functions on the domain.

Then there is a set of constant symbols; it could be c1, c2 or something. It could be something like, for example, '0', which stands for zero, or a symbol which stands for the empty string, anything of that kind. So, we are talking about the syntax of the language: any first order language is defined by these three sets, a set of predicate symbols, a set of function symbols and a set of constant symbols. What is the mapping of these sets? The mapping is defined by an interpretation function, which we will call I, and what this function does is that it maps every p belonging to P to, let us use for the moment, a symbol pI on the domain; we will come to that in a moment. Likewise, every function symbol f is mapped to an fI, which we will also come to in a moment, and every constant symbol c is mapped to a cI; in this last case we can already specify at this stage that cI is something which belongs to D. So the interpretation is an interpretation of the language.

The language has, as part of its constituents, these constant symbols, and what the interpretation tells you is what each constant symbol stands for in my domain. This symbol, for example, might stand for the number 0, which is an element of the domain, or it might stand for the empty string, or a symbol like d might stand for something else which we can treat as a constant, and so on. There are two symbols which I have forgotten here, the bottom and the top, which are there in every language; let me add them. So this is the vocabulary of the language. Then we define what we call the family of terms, or sometimes the set of terms. We are working towards defining what the sentences of our language are; so far we have only defined the alphabet of the language, and now we are gradually working towards the set of sentences. But before we do that, we need to define the set of terms. The set of terms, let us call it T, is defined as follows: for every x which belongs to V, that is, for every variable x, x belongs to T; in other words, every variable is a term. Then, for every c belonging to C, c belongs to T.

So every constant is a term. What are these terms that we are talking about? A term essentially stands for an object in the domain. When we say x belongs to T, the semantics of that is that xA belongs to D, where x belongs to V; that is what we are saying with the set of terms. Here we are saying a variable is a term, and here we are saying a constant is a term, and by a term we basically mean something that denotes an element of my domain: cI belongs to D, and xA also belongs to D. One thing which I have not yet mentioned is that each of these predicate symbols and each of these function symbols has an arity associated with it, which basically tells you how many arguments it can take. In the case of functions, we use them to define terms; in the case of predicates, we will define atomic formulas.

But basically, in both cases they take a set of arguments, and the arity tells you how many the arguments are; very often we write the arity below the symbol. For example, if we write arity 2 below 'plus', it basically means it takes two arguments; if we write arity 3 below it, we say that it is a function of three arguments. So the arity says how many arguments it takes. Now we have variable symbols and constant symbols, and therefore: for every fn belonging to my set of function symbols, where writing fn like this means it has arity n, and for t1, ..., tn belonging to the set of terms, the expression fn(t1, t2, ..., tn) is also a term. This is a recursive definition, as you will see; it is the structural recursion which we often use to define a language. We can take a function symbol of a given arity, take that many arguments, which must be terms, put them together, and we get a new term. For example, I could say something like plus(7, 6): if plus is a function symbol of arity 2, then this expression stands for the term which takes two arguments.

Now, this plus, for example, is mapped under the interpretation I to what I would call plusI; let us say plus maps to an arithmetic operator on the domain of natural numbers. And if 7 and 6 are terms which map respectively to the numbers 7 and 6, which is of course very obvious for all of us, then this expression stands for the number 13, given the fact that I am interpreting plus as a symbol which stands for the addition of two numbers, and given that I am giving it these two arguments which respectively stand for 7 and 6. In the domain, if I add 7 and 6, I get the number 13, and this term essentially stands for the number 13. So we can define this here: given a term, its mapping under an interpretation and an assignment is the interpreted function name applied to the interpretations of each of its argument terms, that is, fn(t1, ..., tn) under I,A is fI applied to (t1 under I,A, ..., tn under I,A).

So the semantics of terms is this: either they are variables, or they are constants, or they are function symbols applied to an appropriate number of terms, and what they give us is a term, which denotes an element of the domain. A function symbol thus also defines a mapping in the domain, from D to the power n into D: it takes n arguments and gives one value, one result, and all the arguments and the value come from the domain. Plus, for example, is a mapping from D cross D to D: it takes two arguments and gives one result.
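
As a small illustration of this semantics, here is a Python sketch (my own encoding, with invented names) that evaluates a term, represented as a nested tuple, under an interpretation for the function and constant symbols and an assignment for the variables.

# Evaluating a term under an interpretation I and an assignment A.
# A term is either a variable or constant name (a string), or a tuple
# (function_symbol, arg_term, ..., arg_term).

interp = {
    'e': 0,                          # constant symbol e stands for 0
    's': lambda n: n + 1,            # s stands for successor (arity 1)
    'plus': lambda m, n: m + n,      # plus stands for addition (arity 2)
}
assign = {'x': 7}                    # the assignment maps variable x to 7

def eval_term(term, interp, assign):
    if isinstance(term, str):        # base cases: variable or constant
        return assign[term] if term in assign else interp[term]
    f, *args = term                  # recursive case: f(t1, ..., tn)
    return interp[f](*(eval_term(t, interp, assign) for t in args))

# plus(x, s(e)) denotes 7 + (0 + 1) = 8 under this interpretation
print(eval_term(('plus', 'x', ('s', 'e')), interp, assign))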

(Refer Slide Time: 30:02)

We want to look at formulas, and eventually we want to move towards sentences. But before we come to formulas, before we define the set F of formulas, we define a set A of atomic formulas. These atomic formulas will in some sense correspond to the propositional symbols in propositional logic. First, top and bottom belong to the atomic formulas. Then, if P belongs to the set of predicate symbols and has arity n, and t1, t2, ..., tn belong to the set of terms, then P(t1, t2, ..., tn) belongs to the set A of atomic formulas. So an atomic formula in first order logic, which is also called predicate logic or predicate calculus, is made by taking a predicate symbol of arity n and n arguments which must be terms. Now, you must keep this in mind: this is strictly what defines the boundary of first order logic, what is a sentence of first order logic and what is not.

Suppose, for example, I want a predicate named 'believes', and I say 'John believes that the earth is flat'. I can try to define a predicate which takes two arguments, who the believer is and what the thing believed in is; the believer is John, and the thing that is believed in is the flat earth. But 'the earth is flat' is itself a relation: it is basically saying that the earth is flat, something which is true or false. So the sentence 'John believes that the earth is flat' is not a sentence in first order logic, because you can only give terms as inputs to a predicate and cannot give formulas as inputs. The predicate symbol that we are talking about is basically mapped to a subset: pI is a subset of D to the power n; in other words, it is an n-ary relation on the domain D. Sometimes logicians talk about first order logic as distinguished from first order logic with equality; some people do not include equality inside the language, but some people do.

So, let us talk about first order logic with equality here, in which case we have one more kind of atomic formula. We have top and bottom as atomic formulas, and taking a predicate symbol and giving it an appropriate number of arguments gives an atomic formula; and now, if ti and tj belong to the set of terms, then ti = tj also belongs to the set of atomic formulas. So in first order logic with equality, we have atomic formulas which talk about equality as well; this means that when we talk about first order logic with equality, we must have the equality symbol also thrown into the alphabet. So this is the set of atomic formulas. Before moving on to compound formulas, or formulas in general, let us talk about the truth values of these atomic formulas, because we want to talk about the semantics in terms of which formulas are true and which formulas are not. Remember, this truth valuation is a mapping to the set {true, false}.

So, one thing we always do is map bottom to false and top to true. Then we map atomic formulas of the first kind: P(t1, ..., tn) maps to true if and only if the corresponding tuple (t1 under I,A, ..., tn under I,A) belongs to PI; when I write I,A on the top, it means applying the interpretation and applying the assignment, if there are any variables inside. This is of course something we are familiar with: a predicate like this is true, say brother(Ram, Laxman) maps to true, if in the real world the pair (Ram, Laxman) belongs to the set of pairs which defines the brother relation. Or I can say that 7 < 13 is true if in my domain I have defined the less-than relation and (7, 13) belongs to the set of pairs which defines that relation; this we are familiar with. The second kind, ti = tj, maps to true if ti and tj stand for the same element.

So, if I say 'the prime minister of India' and if I say 'Manmohan Singh', then these two terms are the same, in other words Manmohan Singh is the prime minister of India, if in the domain they map to the same person. So that gives us the set of atomic formulas.
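
To make this concrete, here is a small Python sketch (my own illustration, with invented names) of the truth condition for atomic formulas: a predicate symbol is interpreted as a set of tuples over the domain, and P(t1, ..., tn) is true exactly when the tuple of term values belongs to that set.

# Truth of atomic formulas: each predicate symbol is interpreted as a
# set of tuples, i.e. a relation on the domain.
pred_interp = {
    'less': {(m, n) for m in range(20) for n in range(20) if m < n},
    'brother': {('Ram', 'Laxman')},
}

def eval_atomic(formula, pred_interp, term_value):
    # formula = (pred, t1, ..., tn) or ('=', t1, t2); term_value maps
    # each term to its denotation under interpretation and assignment.
    p, *terms = formula
    values = tuple(term_value[t] for t in terms)
    if p == '=':
        return values[0] == values[1]       # the same domain element?
    return values in pred_interp[p]         # membership in the relation

term_value = {'seven': 7, 'thirteen': 13,
              'PM_of_India': 'Manmohan Singh', 'manmohan': 'Manmohan Singh'}
print(eval_atomic(('less', 'seven', 'thirteen'), pred_interp, term_value))
print(eval_atomic(('=', 'PM_of_India', 'manmohan'), pred_interp, term_value))
# Both print True.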
Now, let us talk about the set of formulas. Many people use the term 'formulae' and some people use the term 'formulas'; I think the more modern style seems to be 'formulas'. Here, first of all, we borrow everything we do in propositional logic, which means we define things like alpha and beta, alpha implies beta, and so on; this is exactly like in PL. So if alpha is a formula and beta is a formula, then alpha and beta is a formula. You must of course start off by saying that every atomic formula is a formula, and then you can use the logical connectives to construct more formulas. But we also have formulas of this kind: 'for all', followed by a variable name, followed by a formula, is a formula.

And 'there exists', followed by a variable name, followed by a formula; they both belong to the set of formulas F that we are talking about. So this is something new in first order logic. Recall that we just made the observation that atomic formulas correspond to propositional symbols in propositional logic; in fact, you can think of a proposition in propositional logic as a predicate with zero arguments. There is nothing to follow after it, it is just a propositional symbol, and you cannot break it down further, whereas a predicate does break things down further. If I say 'Socrates is a man', we treated that as a proposition earlier; but now we would talk about it as man(Socrates): we are breaking it down and saying that 'man' is a relation of arity one which is defined over the domain, and Socrates belongs to it. So manI is a subset of the elements of the domain, and Socrates is one of those elements. So the atomic formulas are defined here, and the compound formulas are defined exactly as we do in propositional logic. In fact, this is what gives a definition to what the symbol 'and' means, what the symbol 'implies' means, and so on.

And then we have these new formulas, which are 'for all x, alpha' and 'there exists x, beta'. So when are these formulas true? Let me write it here: 'for all x, alpha', under an interpretation and an assignment A, maps to true if, for every assignment B that is an x-variant of A, alpha maps to true under B. Let me first tease this out. We are saying that a formula 'for all x, alpha' is true under an interpretation and an assignment. What does the interpretation do? The interpretation defines what all the predicate symbols and function symbols stand for, and the assignment tells you what every variable is being mapped to. We say that such a formula is true under a given interpretation and assignment if, for every assignment B that is an x-variant of A, the body is true. So we are talking of other assignments now; what are assignments? Assignments map variables to elements of the domain. And when we say an x-variant of A, we mean that the assignment B differs from the assignment A only in the mapping of the variable x.

All other variables it maps identically as in A: B maps everything like A does, except for x, which it may map differently. That is called an x-variant of A. So, for every B that is an x-variant of A, in other words for x taking any value, the formula without the quantifier, alpha, must map to true under that assignment B. Likewise, 'there exists x, alpha' under I,A maps to true under the same condition, except that instead of 'every' we have 'some'. So, for example, suppose there is only one variable in the system so far, and let us say we are talking about natural numbers. If I say 'for all x, x >= 0', such a sentence is true under an assignment A if, for any value that I plug in for x, the part 'x >= 0' is true. You can write things like 'for all x, x > y',

and then say that in this assignment A, y is equal to something; that is a little bit complicated and not the sort of thing we normally do, so let us not even try to do that. But the crux of the matter is that a sentence quantified by 'for all x' is true if you can plug in any value for x, remove the quantifier, and the sentence remains true. Likewise, something like 'there exists x such that even(x)' is a sentence which is true if we can find some assignment, which means some value of x, which makes the second part true. So I can point to even(4), for example; because even(4) is true, I can say 'there exists x such that even(x)' is true. This is a sentence, though we have not yet defined what a sentence is.
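
Over a finite domain, this x-variant semantics can be written down directly; here is a brief Python sketch (my own illustration) that checks a universally or existentially quantified formula by trying every x-variant of the current assignment.

# Quantifier semantics over a finite domain: 'for all x' holds if the
# body holds under every x-variant of the assignment, 'there exists x'
# if it holds under some x-variant.
DOMAIN = range(10)                     # a small finite domain

def holds_forall(x, body, assign):
    # body is a function from an assignment dict to True/False
    return all(body({**assign, x: d}) for d in DOMAIN)

def holds_exists(x, body, assign):
    return any(body({**assign, x: d}) for d in DOMAIN)

even = lambda v: v['x'] % 2 == 0       # the predicate even(x)
print(holds_exists('x', even, {}))     # True: the x-variant x = 4 works
print(holds_forall('x', even, {}))     # False: the x-variant x = 3 fails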

(Refer Slide Time: 45:50)

But to define a sentence, first we have to characterize variables into two kinds: a variable x can be either bound or free. A variable is said to be bound if it is quantified. So if I say, for example, 'for all x, P(x, y)', where P is some predicate and x and y are variables, I can say that x is bound but y is free. If a variable has a quantifier then it is bound, and if a variable does not have a quantifier then it is free. So if I write something like 'for all x (P(x, y) or there exists y, Q(x, y))', then you can see that this x is bound, this y is free, and the x and y inside are both bound. Why? The second occurrence of y is bound because it comes within the scope of a quantifier, which is defined by the brackets around it, and the first occurrence of y does not have a quantifier, so that occurrence of y is free; but both occurrences of x are quantified by the 'for all x', because they come within the larger brackets. So, a sentence is a formula with no free variables; this is the definition of a sentence.
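
This bound-versus-free distinction is easy to mechanize; below is a short Python sketch (my own representation choices, with lowercase names standing for variables) that computes the free variables of a formula, so that a sentence is exactly a formula whose set of free variables is empty.

# Free variables of a formula. A formula is a nested tuple:
# ('P', 'x', 'y') is atomic, ('and', f1, f2) uses a connective,
# ('forall', 'x', f) or ('exists', 'x', f) is quantified.
QUANTIFIERS = {'forall', 'exists'}
CONNECTIVES = {'and', 'or', 'not', 'implies'}

def free_vars(formula):
    op, *rest = formula
    if op in QUANTIFIERS:              # a quantifier binds its variable
        var, body = rest
        return free_vars(body) - {var}
    if op in CONNECTIVES:              # union over the sub-formulas
        return set().union(*(free_vars(f) for f in rest))
    return {t for t in rest if t.islower()}  # convention: lowercase = variable

# forall x (P(x, y) or exists y Q(x, y)): only the first y is free.
f = ('forall', 'x',
     ('or', ('P', 'x', 'y'), ('exists', 'y', ('Q', 'x', 'y'))))
print(free_vars(f))                    # {'y'}: not a sentence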

So, it must first be a well-formed formula, according to all those definitions by which we defined the set of formulas, the set of atomic formulas and so on. Notice that the formula above is also a well-formed formula, but it is not a sentence, because it has got a free variable inside. So a sentence is a well-formed formula, which we have simply been calling a formula, without any free variables, and the intuition behind this is as follows: because every variable is bound, which means every variable's assignment is controlled by some quantifier, we can talk about the truth value of the formula. Sentences, for example, we can map to true or false, and all of you will agree that the earlier examples are true. But if I say something like 'for all x, even(x)', it is a sentence, it has got only one variable, and it is something which is not true.

So I can say that this is false, not true, because the definition of the truth value says that I can plug in any value for x and then even(x) must be true; if I plug in 3, for example, even(3) is not true, and therefore the sentence is not true. If I say something like 'for all x, x > y', now this is not a sentence, because it has got a free variable, which is y, and you can see that we cannot say whether this formula is true or false, because y could be anything. But if I say something like 'there exists y such that for all x, x > y', then I can say whether the sentence is true or not; that of course depends on the domain and the interpretation function that we are talking about. So a sentence is something to which we can assign a truth value; sentences are the things which can be mapped to truth values. A valid sentence is true under all interpretations: you can choose any domain and any mapping for the predicate symbols and function symbols, and the sentence will be true.

So we can see that sentences like 'for all x, x = x', or 'for all x, P(x) implies P(x)', will always be true irrespective of the domain. Satisfiable: instead of 'all' we say 'some'; and unsatisfiable: instead of 'some' we say 'none'. With this we have the basic machinery for working with first order logic. We have defined the language, starting off with an alphabet, then the set of terms, then the set of atomic formulas, then the set of formulas, and finally the set of sentences, and we defined how to assign truth values to each of them. And we can now talk about whether a sentence is valid, satisfiable or unsatisfiable. We will build upon this part a little bit in the next class; we will stop here.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 46
Reasoning in FOL

So, let us continue with the study of FOL. In the last class, we defined the language, essentially the syntax and the semantics. Now let us look at how you can do reasoning, or inference, with predicate logic. But before we do that, let us just recap that a language is defined by the sets R, F and C. So let us say that R has one symbol, called g; it is a very small language. F has two symbols, let us say one is called s and one is called c; and C contains one symbol, let us say it is called e. And we said that we define a mapping I to give the meaning of these three sets.

(Refer Slide Time: 1:43)

So, this mapping is called the interpretation mapping, but we also talk of an interpretation: an interpretation, for which we also use the symbol I, is defined as a domain together with an interpretation mapping. So an interpretation of a set of sentences, or of a language, is in terms of the domain about which the language is making statements, and a mapping which tells you what each predicate symbol means, what each function symbol means and what each constant symbol means.

So, whenever a sentence alpha is true under an interpretation, and we defined the semantics in the last class, we say that the interpretation entails alpha. So that is the notation that we use, and

we talked about the notion of valid sentences. A valid sentence is something which is true in all interpretations, which means you can choose any domain and any mapping function and the sentence will be true; these are basically the tautologies that we can talk about, like the examples we discussed earlier.

Satisfiable sentences are those which are true under some interpretations, and unsatisfiable sentences are those which are never true. An example of an unsatisfiable sentence would be, for example, 'not (x = x)'; such an atomic formula can never be true. We can also write it as x ≠ x, which is the same thing.

(Refer Slide Time: 03:37)

That is just a short form that we have used. So, given a set of sentences S, we say that an interpretation, which means a domain and an interpretation mapping, is a model for S if, for every s belonging to S, the interpretation entails s. So if you write a set of formulas or sentences in a given language, and you then find a domain and an interpretation which make all those sentences true, then we say that we have found a model for that set of sentences. Logicians have always tried to ask: here is a logical system, does it have a model? That means: is it true in some domain under some interpretation? This is just for the sake of completeness; we will not really go into too much detail there. So let us look at this small example of a small language, which has got one relation symbol, which we have called g, two function symbols, which we have called s and c, and one constant symbol, which we have called e. Now we can have one interpretation, I1, which does the following.

(Refer Slide Time: 05:24)

Let us say the domain is the set of natural numbers, and I can say, for my constant, that under this mapping I1, e maps to 0, the number zero. The symbol s, of arity 1, maps to successor, plus one; the symbol c, of arity 2, maps to plus, or multiplication, it does not matter, some binary operation on the domain; and g maps to greater-than, let us say. Then I can write statements; for example, I can say g(s(e), s(s(e))). I use the predicate as a prefix and then state the arguments, though there are variations of this notation.

(Refer Slide Time: 07:13)

So, this is the predicate symbol, and these are its two arguments, which are terms, and the terms are constructed using the function symbols and the constant symbols. There are no variables in my sentence, but I can add variables; one such sentence could be 'for all x, g(x, e)'. I am not talking about the truth values of these sentences; I am not claiming that this sentence is true or that sentence is true. All I am saying is that we can interpret what the sentence means, and the meaning is given by choosing a domain and choosing an interpretation mapping.

Then any sentence in my language, whose syntax I have defined, like the sentence above, is essentially saying, as we can all read off, that the successor of 0 is greater than the successor of the successor of 0. Of course it is not true, but that is what it is saying; we can understand the meaning of the sentence. And the second one is saying that every x is greater than 0, which may or may not be true, or in fact is not true, but that is not the point. We are not talking about the truth values here; we are talking about the meaning, the semantics of the syntax.

(Refer Slide Time: 08:27)

At the same time, I can choose a different interpretation, I2, where the domain is, let us say, strings over the alphabet {a, b}. Then I can say that e under I2 maps to the empty string, let us say lambda; c maps to concatenation; s maps to the successor in lexicographic order; and g, let us say, maps to 'longer than'.

So the point I really want to emphasize here is that a language is one thing, and what it means, what the semantics of the sentences in that language is, is totally another thing. It is dependent upon the interpretation. The same expression, which is defined in my language, would mean one thing if I am talking about numbers; it would mean a different thing if I am talking about strings; it could mean a totally different thing if I am talking about people, if, for example, I mapped these symbols to some relations between people. So that is one point which I really want to emphasize, and this is something which is very central, in fact, to writing programs as well as to the study of logic.
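
The same point can be made executable. Below is a small Python sketch (my own encoding; the string interpretation is simplified, with the successor just appending a character) that evaluates the one sentence g(s(e), s(s(e))) under two different interpretations, showing that a single syntactic expression means different things in different domains.

# One expression, two interpretations: the syntax stays fixed; only
# the interpretation mapping changes.

def eval_term(term, interp):
    if isinstance(term, str):
        return interp[term]                   # a constant symbol
    f, *args = term
    return interp[f](*(eval_term(t, interp) for t in args))

def eval_atom(atom, interp):
    p, *terms = atom
    return interp[p](*(eval_term(t, interp) for t in terms))

I1 = {'e': 0, 's': lambda n: n + 1,           # numbers: e -> 0, s -> +1
      'g': lambda m, n: m > n}                # g -> greater than
I2 = {'e': '', 's': lambda w: w + 'a',        # strings: e -> empty string
      'g': lambda v, w: len(v) > len(w)}      # g -> longer than

sent = ('g', ('s', 'e'), ('s', ('s', 'e')))   # g(s(e), s(s(e)))
print(eval_atom(sent, I1))   # False: 1 > 2 does not hold
print(eval_atom(sent, I2))   # False: 'a' is not longer than 'aa'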

(Refer Slide Time: 10:03)

So, we cannot overemphasize the fact that everything we do in logical reasoning is based on form; it is not based on meaning. Any proof procedure that I have, if you remember, says: I have a set of rules of inference, I am given a set of premises, and I keep adding new formulas. It is purely based on form; it does not matter whether we are talking about numbers, or about strings, or about something else.

(Refer Slide Time: 10:47)

So the logical system is formal, and logic basically captures valid forms of reasoning. This we have talked about earlier as well: it says when an inference is a sound inference, which we call a deduction. How do we define deduction? Logic is basically concerned with making sound inferences, and we have seen when a rule of inference can be sound: for example, when it is based on tautological implications or tautological equivalences and so on. If you build a logic around sound rules, you will get a sound logic; but everything is formal.

(Refer Slide Time: 11:38)

So, what is the reasoning mechanism that we can use in first order logic? We already have a set of rules that we have spoken about. For example, modus ponens still applies; it still says the same thing, that if we have alpha, and we have alpha implies beta, then you can derive beta. This same rule we can carry forward to first order logic, because it is only talking about how to interpret the implication sign, which we still have; it is not really worried about things like variables and so on.

So, it does not matter what alpha and beta are; as long as they are sentences, this rule applies. But we also need some new rules of inference to take care of the fact that we are talking about quantifiers. The most important rule is called universal instantiation; I will write it as top and bottom: suppose you have a formula of the kind 'for all x', with anything else inside, but which contains an x.

(Refer Slide Time: 13:11)

So, let us say alpha, with some x somewhere in it; it could have other terms also. If you have a formula of this kind, then you can have the formula alpha with the variable replaced by a, where a belongs to the set of constants. What we are saying is that you can always instantiate a universally quantified sentence to a sentence which uses a constant. So let us see an example of this; let us go back to the argument that we were talking about.

The argument was: all men are mortal, Socrates is a man, and you want to show that Socrates is mortal. We had made the observation that we cannot do this in propositional logic, and that we need something more expressive, or stronger, to talk about this. Now we have the machinery for doing that. So let us first translate these sentences into first order logic; this is an exercise which you must have done, I am sure, at some point of time, but if you have not done it, you must practice a little bit. You can translate this into our language as follows.

(Refer Slide Time: 15:15)

For all x, man(x) implies mortal(x), where 'man' and 'mortal' are predicates. Since they are unary predicates, they basically define subsets of the universe of discourse, the domain. What this statement is saying is that if any x satisfies the predicate 'man', it must satisfy the predicate 'mortal'; in other words, if x belongs to manI, then x belongs to mortalI. What is this manI? Basically, it is a subset of my domain, the unary relation which interprets 'man'. So when you say man(x), we say that x belongs to the set of things which are called men; recall that a unary relation is basically a subset of the universe of discourse. You can see the same statement as saying that manI is a subset of mortalI. So this statement is essentially saying that the set of men, and here of course we mean the set of human beings, is a subset of the set of things which are mortal.

That is just it: in the English language we say that all men are mortal; in FOL we write it like this, but the semantics of that is that the set which corresponds to 'man', whose mapping is a unary relation, the subset we call manI, is contained in mortalI. The second sentence we can map to man(Socrates), and we want to show mortal(Socrates). Now, with this rule of universal instantiation, which we often abbreviate to UI, we can produce a formula which is very useful for us: I can instantiate the quantified formula by substituting x equal to Socrates. Of course, I should have clarified here that Socrates belongs to the set of constants. So that is an inference step of UI, and then of course the last step is an inference step of modus ponens.
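
As a rough sketch of these two steps in Python (my own encoding, purely illustrative), universal instantiation is just substitution of a constant for the quantified variable, after which modus ponens is a simple pattern check.

# Universal instantiation followed by modus ponens, on nested tuples.

def substitute(formula, var, const):
    # Replace every occurrence of var in the formula by const.
    if formula == var:
        return const
    if isinstance(formula, tuple):
        return tuple(substitute(f, var, const) for f in formula)
    return formula

def universal_instantiation(formula, const):
    _forall, var, body = formula           # ('forall', var, body)
    return substitute(body, var, const)

def modus_ponens(implication, fact):
    op, antecedent, consequent = implication
    if op == 'implies' and antecedent == fact:
        return consequent
    return None

rule = ('forall', 'x', ('implies', ('man', 'x'), ('mortal', 'x')))
fact = ('man', 'Socrates')
instantiated = universal_instantiation(rule, 'Socrates')   # UI step
print(modus_ponens(instantiated, fact))    # ('mortal', 'Socrates')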

(Refer Slide Time: 18:37)

Now, having used universal instantiation, I have a formula which is very nice, because it is saying alpha implies beta, and I have alpha, so I can use modus ponens. Again, to emphasize the fact that it has nothing to do with whether it is 'man' or 'mortal': it could be 'all students are bright', or 'all leaves are green', or 'all birds are small', anything.

If you have any such statements, and you accept the first statement and you accept the second statement, then you must accept the third statement; that is all logic is telling us. This form of reasoning is valid: if this is true and this is true, then this must necessarily be true; that is the notion of entailment ((Refer Time: 19:19)). And the notion of proof gives us a way of producing the sentence syntactically, without looking at the meanings of things.

As you can see, if you want to talk about mechanizing this process, then there is a little bit of guesswork involved: what is the value that I should substitute? Of course, if we look at this whole picture, we can see that a must be Socrates, in which case x must be Socrates, and I will have this useful formula. But nevertheless, if you are doing a forward-search kind of process, and you are saying 'I have all these universally quantified formulas', how do I prove something? It means you have to keep applying rules.

So remember that the basic mechanism is: pick a rule, pick a piece of data for which the rule is applicable, and apply the rule. This universally quantified statement could be applied to hundreds of people: for every element in my database I could say Socrates is mortal, Ram is mortal, whoever; anybody, you give me a name and I will say he is mortal. That is one way I could prove the thing. This of course should trigger thoughts about forward reasoning versus backward reasoning for you, and we will look at that a little bit, if not today then tomorrow.

The same kinds of strategies come into play. When you do forward reasoning, you have a choice of applying so many rules of inference to so much data; whereas if you are doing backward reasoning, then you know what you want to show true, but the thing is that we need a rule of inference which allows us to do that. We need to be able to use backward reasoning here.

If you remember, as you saw during planning, backward reasoning, backward state space search for planning, ran into some kinds of problems. Are there similar problems here? We will have to see. Now, for the sake of completeness, there is another rule, which is called existential generalization. What this rule says is that if you have a formula of the kind alpha(a), you can replace it by a formula of the kind 'there exists x, alpha(x)'.

(Refer Slide Time: 22:03)

So, if I have a statement in my database which says Ramesh is bright, then I can generalize to say that there exists someone who is bright. Why do we need this? Because sometimes the goal may be to show that there exists something with a certain property. An example which is given in one of the textbooks is as follows. You are given the following facts: on(a, b), on(b, c), green(a), not green(c). So this is what is given to you, the set of premises, the set S of sentences: a is on b, b is on c, a is green and c is not green.

(Refer Slide Time: 23:22)

Your goal is to show that there exists an x and there exists a y such that on(x, y), with x green and y not green. So the goal is: we are asking whether this formula is true, given that a is on b, b is on c, and so on. You can of course choose any interpretation of the set of formulas; the most natural interpretation which comes to mind is the blocks world interpretation, where green stands for a colour. So let us accept that; but the thing is, to show that this formula is true, we are not going to rely on the meaning of those sentences.

You want to show that this formula, 'there exists x and there exists y such that on(x, y) and green(x) and not green(y)', is true. So is this formula true or not true? First of all, what is the intuition about whether it is true or not; and secondly, of course, the question that we will ask is: can we prove it? Remember the notions of soundness and completeness: does a logical system prove every true formula?

So first let me ask: is this formula true or not true? Those of you who have seen this before should not answer; what does intuition say? Let us look at the meaning: it is saying that there is a block on another block, and the block above is green and the block below is not green. Given to us are three blocks; the topmost block is green, the bottommost block is not green, and nothing is said about the block in between.
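
Before attempting the proof, one can sanity-check the claim semantically with a brute-force sketch in Python (my own illustration): since nothing is said about b, we try both colours for b, and in each case look for a witnessing pair of blocks.

# Semantic check of: exists x, y with on(x, y), green(x), not green(y),
# under both possible interpretations of the middle block b.
on = {('a', 'b'), ('b', 'c')}

for b_is_green in (True, False):
    green = {'a'} | ({'b'} if b_is_green else set())
    witnesses = [(x, y) for (x, y) in on if x in green and y not in green]
    print(f"green(b) = {b_is_green}: witnesses {witnesses}")
# green(b) = True:  witnesses [('b', 'c')]  (b green, c not green)
# green(b) = False: witnesses [('a', 'b')]  (a green, b not green)
# Either way a witness exists, so the formula holds in every model.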

So I will leave the proof as a small exercise for you to work out; but this is the kind of thing you may want to prove, which is why you may sometimes need a rule which says something like that. Now, this is the process of forward chaining we are looking at. Forward chaining basically means stringing together a set of facts by moving from the left-hand side of a rule to the right-hand side, using modus ponens: given alpha, you chain into beta; then, given beta, and beta implies gamma, you chain into gamma.

So you are moving in the forward direction, from what is given to you to what can be derived, and that is what we have been doing here: given this formula we can derive that formula, and given that, you can derive the next. You are moving in the forward direction. The first thing you want to do is avoid the guesswork: you do not want to use the rule of universal instantiation directly, because if you were to use it directly, then you would end up doing a lot of guesswork as to what you should instantiate the values of the variables to.

It is clear, if you look at both formulas, that here we are talking about x and there we are
talking about Socrates. So, it makes sense to somehow say that I will instantiate x to Socrates,
but only by looking at both these formulas together. The difference is that the universal
instantiation rule only looks at this one formula and says I must produce this, whereas if you also
look at the other one, then you would know what you have to produce essentially.

So, let us define a new rule which combines these 2 steps into 1 step; but before that, let us first
modify our representation a little bit to make it simpler for us. To do that we define something
which is called the implicit quantifier form. So, we want to express our formulas without
actually writing down the quantifiers, because remember that if you are going to write programs
to process these sentences, then you have to worry about how to interpret the quantifiers
and so on. So, you would have to parse them, construct the quantifiers, and carry
them along as part of the data structure.

(Refer Slide Time: 28:30)

So, instead what is normally done is to express the quantifiers implicitly, and for today's class
we will only look at the universal quantifier, because that is all we have here. We will not worry
about the existential quantifier until the next class. So, for a universal quantifier: if x is universally
quantified —

(Refer Slide Time: 29:09)

So, for the moment let us assume that we know the nature of a variable. In the logic that
we have defined there are only 2 kinds of quantifiers, and in the sentences there are no free
variables. So, to start with, there are only bound variables, and variables are bound either by a
universal quantifier or by an existential quantifier. So, let us assume that we know the nature
of the quantifier and we can somehow say that a given variable is a universally quantified variable,
which is easy in sentences like this, but, as we will see later, not in more complex sentences.

So, let us assume that we can do that. Then, in the implicit quantifier form, we simply replace x
with ?x — we use the same symbol x with a question mark. So, this is just a convention which says
that I am going to use the question mark to stand for a variable which is universally quantified,
which means I will now rewrite this whole thing as follows: man(?x) implies mortal(?x).

(Refer Slide Time: 30:29)

So, instead of writing it like that I am writing it like this, but it does not mean that I have
changed the sentence. The sentence is still the same, except that the universal quantifier is
implicit. I have indicated it in my sentence by putting a question mark before the variable,
which tells us that all occurrences of this variable are bound by one universal quantifier standing
before the sentence. The quantifier is there, as before, but it is implicit
essentially.

You can see that this is basically motivated by the wish to simplify the processing that you do
when you write a program to do forward chaining essentially. So, given this and given this, you
can see that it is becoming a little bit simpler to do this processing. Somehow, if I could have a
version of modus ponens which did not require an exact match, but which allowed us to
substitute anything for a universally quantified variable, then life would become simpler. I
would say I am trying to match this with this — how can I do this?

(Refer Slide Time: 31:53)

If you can simply say x equal to Socrates, you can do it, and then this will be like the step of
universal instantiation. Again, I would like to emphasize: I am not doing instantiation
independently first; I am now looking at a modified version which will have this built into it
essentially. So, in some sense I know the target to which I want to instantiate, because I have this
fact here essentially.

So, this rule is called modified modus ponens, and the rule is as follows. It says — let us call it
MMP — that from alpha-prime and alpha implies beta, you can infer beta-prime, where a substitution,
let us say theta, applied to alpha gives alpha-prime. So, we are saying a substitution is a
function which, when you apply it to any formula, replaces some occurrences of
variables in that formula with terms essentially. So in fact, when applied to alpha
implies beta, it gives you alpha-prime implies beta-prime — it gives you (alpha implies beta)
prime.

What is a substitution? A substitution is basically a set of variable-value pairs, and in our
case the variable is x and the value is Socrates; that is the substitution we are interested
in. What does it do? It says that when I apply theta to alpha implies beta — I am looking at this
specific example; in general you can apply theta to any formula, this is just the one kind of
formula we are interested in — it gives you alpha-prime implies beta-prime. And how
does it give you that? It substitutes.

So, what is the substitution telling us? It is saying that you substitute every occurrence of x with
Socrates essentially. So, if I apply the substitution to this — if this is my alpha implies beta — then
my alpha-prime implies beta-prime is this. So, this is alpha implies beta, and this becomes
alpha-prime implies beta-prime under the substitution theta = {x/Socrates}, read as "Socrates for x".
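
To make this concrete, here is a minimal sketch in Python — not from the lecture itself — of a substitution being applied, assuming formulas are represented as nested tuples and variables as strings starting with '?' (all names are illustrative):

# A minimal sketch of a substitution and modified modus ponens, assuming a
# tuple representation of formulas; the representation is an assumption.

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def subst(theta, f):
    """Apply substitution theta (a dict variable -> term) to a formula f."""
    if is_var(f):
        return theta.get(f, f)
    if isinstance(f, tuple):
        return tuple(subst(theta, part) for part in f)
    return f  # a constant

# The rule alpha implies beta, and the fact alpha-prime:
alpha, beta = ('man', '?x'), ('mortal', '?x')
fact = ('man', 'Socrates')

theta = {'?x': 'Socrates'}            # the substitution (the unifier)
assert subst(theta, alpha) == fact    # theta applied to alpha gives alpha-prime
print(subst(theta, beta))             # beta-prime: ('mortal', 'Socrates')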

(Refer Slide Time: 35:39)

So, what we are saying is that the substitution tells you what to substitute for the set of
variables named in the substitution. In this substitution only one variable is named, which is x,
and a very simple substitution is given, which is Socrates, but it could be anything else. I could
substitute an arbitrary term for it, for example: I could say the grandfather of Socrates, which
means, let us say, the father of the mother of Socrates. That is a term — remember, that is also a
term — and I could have substituted this in my formula; then I would have a formula which says
if the grandfather of Socrates is a man, then the grandfather of Socrates is mortal.

I would have got that statement. Instead, the substitution that I am doing here is Socrates for x.
So, I am getting the statement that if Socrates is a man then Socrates is mortal; that is alpha-prime
implies beta-prime. I already have alpha-prime here, which is man(Socrates). So, now I can apply
modus ponens directly essentially. So, this whole 2-step process is collapsed into a one-step
process in this modified modus ponens rule, where the instantiation is taking place somewhere
inside by means of a substitution.

It is saying that if you are given a formula alpha-prime and a formula alpha implies beta, and you
can somehow find the substitution which will make these two equal — the technical
term used is unifier, which we will study in the next class, but for the moment let us just
think of it as a substitution which will make this formula equal to this formula, which is
what the universal instantiation step is doing.

So, the technical way of saying that is: if you can unify alpha-prime with alpha — and you can
always unify by means of a substitution, which means you make them look the same — then
you can infer beta-prime directly, which means apply the same substitution to beta. What is
beta here? mortal(?x), from man(?x) implies mortal(?x). So, this is alpha and this is beta, and this is
alpha-prime. I can make alpha-prime and alpha the same by applying the substitution x equal
to Socrates. This rule says: apply the same substitution to the right-hand side of the expression and
directly infer mortal(Socrates).

(Refer Slide Time: 38:50)

So, this is the modified rule which says that I can substitute directly here and jump to
this conclusion, in some sense, without having to go through that step of instantiation
explicitly. But this is a simple example: it has a unary predicate with only one argument, and so
it is easy to understand. In general, modified modus ponens says: given any alpha-prime and
anything of the form alpha implies beta —

In fact, I do not have to call it alpha-prime; I could call it gamma and unify gamma with
alpha, but traditionally we say alpha-prime and alpha. If you can somehow make them the
same, then go ahead and make the inference, and apply the same way of making them the
same to the conclusion as well essentially. So, apply the same substitution to beta to get beta-prime.

So, suppose I have a simple database, and by simple I mean I have only statements of the kind
something(?x) implies something(?x) — say P(?x) implies Q(?x) — where, let us say, there is only
one variable, though you could have more than one variable essentially. So, you could have a
statement which says friend(?x,?y) implies friend(?y,?x); then, given that Suresh is a friend of
Ramesh, I could infer that Ramesh is a friend of Suresh essentially by applying this rule. So, if I
have a simple database of the kind where I can apply the modified modus ponens rule, then I have
a simple mechanism: keep applying the rule repeatedly till I generate the formula that I want
essentially.
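
As a rough illustration, a naive forward-chaining loop might look as follows in Python, under the same tuple-and-question-mark conventions as the sketch above, with the helpers repeated so the snippet stands alone (again, names and representation are illustrative assumptions):

# A minimal forward-chaining sketch: facts are ground tuples, rules are
# (antecedent, consequent) pairs with '?'-prefixed variables.

def is_var(t): return isinstance(t, str) and t.startswith('?')

def subst(theta, f):
    if is_var(f): return theta.get(f, f)
    if isinstance(f, tuple): return tuple(subst(theta, x) for x in f)
    return f

def match(pat, fact, theta):
    """Match a pattern (with variables) against a ground fact, or None."""
    if is_var(pat):
        if pat in theta:
            return theta if theta[pat] == fact else None
        return {**theta, pat: fact}
    if isinstance(pat, tuple) and isinstance(fact, tuple) and len(pat) == len(fact):
        for p, f in zip(pat, fact):
            theta = match(p, f, theta)
            if theta is None: return None
        return theta
    return theta if pat == fact else None

def forward_chain(facts, rules):
    """Keep applying modified modus ponens until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for ante, cons in rules:
            for fact in list(facts):
                theta = match(ante, fact, {})
                if theta is not None and subst(theta, cons) not in facts:
                    facts.add(subst(theta, cons))
                    changed = True
    return facts

rules = [(('friend', '?x', '?y'), ('friend', '?y', '?x'))]
print(forward_chain({('friend', 'Suresh', 'Ramesh')}, rules))
# also contains ('friend', 'Ramesh', 'Suresh')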

Let us assume we are using only modified modus ponens because our data is of that
nature. But the question of which facts to apply the rules to is still open. There is still
no sense of direction saying that this is the inference I should make. So, the same
problem of a huge branching factor in the forward direction exists in forward chaining
essentially. So, can we do backward chaining? What would backward chaining be like? Let
me just give you a hint here, and we will take it up in the next class.

So, backward chaining needs to distinguish between statements which are there in the
database — let us call them facts — and statements which you want to show to be true essentially.
Forward chaining is simple: you have a set of facts given to you, which includes rules —
statements like all those we have seen. You can keep adding new statements, and you
can terminate when the required statement is produced essentially.

If I want to do backward chaining, what is the mechanism that can be used to do
backward chaining? So, let me take the same rules, but express them something like this.
What I am trying to highlight here is that you need to distinguish between what you
have and what you do not have.

(Refer Slide Time: 42:46)

What you have is the set of facts; what you do not have is something you want to show to be
true. So, we could use a marker — this is only for programming purposes, it is not part of the
language. It says that if you have a goal beta-prime, and if you have a rule of the kind alpha
implies beta — we call this a rule, with a left-hand side and a right-hand side — then you can
replace the goal of showing beta-prime with the goal of showing alpha-prime essentially.

If you are implementing this, we will need to keep the facts that we have in one set of formulas,
separated from the goals that we are trying to show. So, we can do that simply here
on the board by putting a marker called show. When you have the marker called show, it
means it is a goal; it is not something which is true, it is something you want to show to be
true. And this is saying that to show that beta-prime is true, you look for a rule of the kind alpha
implies beta with which you can do this unification process.

So, if you can find the substitution, then you can reduce it to a subgoal of showing that
alpha-prime is true. How does that translate to our problem here? It says that if you want to show
that Socrates is mortal, and if you have a rule which says all men are mortal, then substitute, or
replace, the goal of showing that Socrates is mortal with the goal of showing that Socrates is a
man. This is the process of backward chaining.
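
Backward chaining can be sketched in the same Python conventions — a minimal, illustrative version for rules with a single antecedent and ground goals (the helpers are repeated so the snippet is self-contained; none of the names come from the lecture itself):

# A minimal backward-chaining sketch: to show a goal, look it up among the
# facts, or find a rule whose consequent matches it and show the antecedent.

def is_var(t): return isinstance(t, str) and t.startswith('?')

def subst(theta, f):
    if is_var(f): return theta.get(f, f)
    if isinstance(f, tuple): return tuple(subst(theta, x) for x in f)
    return f

def match(pat, fact, theta):
    if is_var(pat):
        if pat in theta:
            return theta if theta[pat] == fact else None
        return {**theta, pat: fact}
    if isinstance(pat, tuple) and isinstance(fact, tuple) and len(pat) == len(fact):
        for p, f in zip(pat, fact):
            theta = match(p, f, theta)
            if theta is None: return None
        return theta
    return theta if pat == fact else None

def show(goal, facts, rules):
    """Establish a ground goal: look it up, or reduce it via a rule."""
    if goal in facts:
        return True
    for ante, cons in rules:
        theta = match(cons, goal, {})   # match the goal with a consequent
        if theta is not None and show(subst(theta, ante), facts, rules):
            return True                 # then show the matching antecedent
    return False

facts = {('man', 'Socrates')}
rules = [(('man', '?x'), ('mortal', '?x'))]
print(show(('mortal', 'Socrates'), facts, rules))  # True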

So, from goals you are moving to subgoals essentially. Now, you can see that this process is
going to become a little bit complicated if my left-hand side had more than one statement. So,
for example, you can define a grandfather by saying that x is a grandfather of y if x is the
father of z and z is a parent of y; you could define it like this. Now, to show that x is a
grandfather of y, you would have to show both those things: that there exists a z whose father is
x, and that the same z is a parent of y essentially.

So, grandfather(x,y) if father(x,z) and p(z,y), where p stands for parent. In the forward direction I
would have the facts somewhere which say Peter is the father of Mary and Mary is the mother of
John; then I can show that Peter is a grandfather of John by going in the forward direction. In the
backward direction I have to put an AND here: you can reduce the goal to these 2 subgoals, but
there is an AND here essentially.

Maybe there is another rule which says grandfather(x,y) — so, let me say, instead of parent I have
mother here: father(x,z) and mother(z,y); and separately father(x,z) and father(z,y) — I have 2
separate rules. This rule says that x is the father of the father of y; that rule says that x is the
father of the mother of y essentially. For some reason I have these 2 separate rules; then you can
see that to show that Peter is a grandfather of John, I could either use this rule or I could use that
rule. So, you can see that it is becoming an AND-OR tree essentially.

You could either use one rule or you could use another rule essentially. So, backward
chaining has this complication: you have to work out how to handle choices like this. So,
essentially it maps to an AND-OR tree, and we will look at that in the next class.
Essentially, what backward chaining is saying is that from a goal you can move to a subgoal.

So, I will stop here, and in the next class we will look at this process of finding the
substitution; there is a very nice algorithm called the unification algorithm, which some of
you must have studied, for doing so. We will see how this kind of AND-OR tree is tackled.
We will also briefly mention that this kind of backward chaining process is what the
language Prolog really does. Just to complete the course, we will look at the resolution
method in first-order logic, and the motivation for that method can be this problem that we
have here.

(Refer Slide Time: 48:21)

So, let me reveal to you that this sentence is indeed true, but it cannot be proven by either
forward chaining or backward chaining. Those methods are not complete. But the resolution
method — we showed this, or at least we talked about it, in the propositional logic case — is a
complete method essentially, which is why it is so attractive. So, we will end the
course with the resolution method for first-order logic, which is in the next two classes, I think.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 47
Backward Chaining

(Refer Slide Time: 00:15)

So, we are looking at FOL, and in particular we are looking at the reasoning aspect. We
shall not have the time to look into the knowledge representation aspect too much here.
So, we will assume some simple representation scheme, which means the choice of the
predicates, functions and constants that we are talking about: a set of predicates, a set of
functions and a set of constants.

Now, we saw 2 rules of inference. One was universal instantiation: from for all x, p(x), you can
infer p(a), since p(a) is an instance of it. Then we saw generalization, which says that from p(a)
you can deduce there exists x, p(x) — in case you have a query of that kind; we saw an example of
that earlier. Now, in addition to these, there are also some rules of substitution. So, for example,
you can replace not-for-all-x p(x) with there-exists-x not-p(x), and likewise not-there-exists-x p(x)
is equivalent to for-all-x not-p(x). So, these are kind of commonsense rules. If you see what we
are saying here, we are saying that if it is not the case that for every x some property, or some
predicate, p is true, it means there must be some x for which p(x) is not true essentially.

So, if you move the not across a quantifier, it changes the nature of the quantifier. If
you move a not across a universal quantifier, it becomes an existential quantifier;
the rest of the expression does not change — it is like you are moving the not inside,
but changing the quantifier. Likewise, if you move a not across the existential
quantifier, it becomes a universal quantifier. So, again, if you look at this: suppose p(x)
stood for something which is both even and odd. Then the left side is saying that there does
not exist an x which is both even and odd, which is equivalent to saying that for all x it is the
case that they are not both even and odd. So, you can move the not across in both directions.
These are rules of substitution which are quite useful in some situations, and we will see some
of them. You might be familiar with them: they are the quantifier analogues of de Morgan's laws.
There is also something which is used sometimes: for example, if you have for all x, for all y,
p(x,y), where p is some predicate, this is equivalent to for all y, for all x, p(x,y).

So, quantifiers of the same kind you can interchange without changing the meaning of the
sentence. Likewise for there-exists: there exists x, there exists y and there exists y, there exists x
would be similar. But if the quantifiers are of different kinds, then you cannot do that. If you say
for all x there exists a y such that p(x,y), this is not equivalent to saying that there exists a y such
that for all x, p(x,y). You can try to think of a counterexample to show that this is not the
case. For example: for every number x there exists a number y which is bigger than x.
So, if p stands for bigger-than, or greater-than, then the first statement is true, whereas the second
statement says that there exists a y which is greater than every number, which is
obviously not true essentially. So, you cannot switch 2 quantifiers of different kinds; if you
do switch them, the meaning changes, and the truth value may also change — you are
talking about something totally different here and something totally different there.
Whereas if they are of the same kind, then there is no difference: you can switch the
quantifiers and it does not change the meaning. (These rules are summarised symbolically below.)
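
For reference, the substitution rules just described can be written out as follows — the quantifier forms of de Morgan's laws, plus the rules for reordering quantifiers:

\lnot \forall x\, p(x) \equiv \exists x\, \lnot p(x)
\lnot \exists x\, p(x) \equiv \forall x\, \lnot p(x)
\forall x\, \forall y\, p(x,y) \equiv \forall y\, \forall x\, p(x,y), \qquad \exists x\, \exists y\, p(x,y) \equiv \exists y\, \exists x\, p(x,y)
\forall x\, \exists y\, p(x,y) \not\equiv \exists y\, \forall x\, p(x,y)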

Here we are saying that if you have a negation sign, you can move it inwards, towards the
expression — and what does it do? It changes the nature of the quantifier: a universal quantifier
becomes existential, and an existential quantifier becomes universal. This is why I had mentioned
in the last class that it is not always easy to identify what the nature of a quantifier really is. So,
for example, suppose I make a statement — let us say we are talking about people — and this
statement is saying that there does not exist an x who is divine. What is the nature of this
variable x? Is it an existential variable or is it a universal variable? And this becomes important,
because if you remember the implicit quantifier form that we discussed in the last class, you can
replace a universally quantified variable with a question mark and then the quantifier becomes
implicit.

So, is this a universally quantified variable or an existentially quantified variable? The
way to arrive at your answer — the way to understand it — is to push the negation sign in as
much as possible, and then look at what the outermost quantifier is essentially. If you push the
negation sign inside, the statement becomes equivalent to saying for all x, not divine(x). So, we
are saying that everyone is not divine, which is like saying that no one is divine essentially. So,
this is a universally quantified variable, and if we want to put the statement in implicit quantifier
form, we can write it as not divine(?x), putting a question mark. That is a convention we are
adopting between ourselves, so that we do not have to write the quantifier, and we do not have to
process the quantifier when we are writing a program to do this; we can just keep those kinds of
variables as universal variables. So, now just a word about what you do with existentially
quantified variables when you are talking about the implicit quantifier form.

(Refer Slide Time: 08:18)

Let us consider a sentence like this: there exists x, even(x). So, it is saying that there exists some
number which is an even number essentially. The way to convert it into implicit
quantifier form — and the process is called skolemization, after the logician Thoralf Skolem,
who first introduced these ideas — is to replace the variable with a constant: even(sk12).
So, conventionally we use a name like sk; this is again a convention between us, and it does
not impact the meaning of what we are writing — it is a constant. So, what have we
done? We have removed the quantifier and replaced x by something called sk12. The sk, again,
is in honour of Skolem; you could have used any constant, as long as you remember that it is a
constant which has been introduced in this process, and it must not be a constant which is being
used anywhere else. So, you cannot use, for example, 0, if that is a constant already in use, or
something like that; you must use some new name and treat it as a constant after that.

Because if you look at the meaning of it: there is some number which is even, and all
you are saying here is that there is some number, which I am calling sk12, and that
it is even essentially — because you may not really know what that number is, but you
can do that. Now, if you have a statement like this: for all x there exists y such that p(x,y) — if
you look at what the sentence is saying (you should always read the quantifiers from left to
right), it says that for every x there exists a y such that p(x,y) is true essentially. So, for example,
for every x there exists some y which is greater than x; that is an example that we mentioned in
the last class. We convert this into an implicit quantifier form by writing it as p(?x, sk11(?x)).
So, what are we saying here? When we have an existentially quantified variable inside the scope
of a universally quantified variable, then we are replacing that variable with a function of the
other variable, and that function is the Skolem function.

So, this is the Skolem function of x. What is this function? We do not know
what that function is — it is some function — and what we intend by the usage of the word
function here is that the value that the variable y can take is dependent upon the value that
x takes. So, again, if p stands for greater-than — the implication and the meaning of
this being that y is greater than x, because the meaning is always determined by us; it depends
upon the relation that you are talking about — then what we are saying is that for
every x you can choose a y such that y is greater than x. The y that you choose depends
upon x, and therefore we can think of it as a function of x. We do not know what that
function is, but it is some function; as far as reasoning is concerned, we will treat it as a
function and translate the sentence into this statement.

So, in general, of course, if you have for all x1, for all x2, ..., for all xn, there exists y, p(x1, x2,
..., xn, y), then you will translate it to p(?x1, ?x2, ..., ?xn, sk(?x1, ..., ?xn)) — I should put
question marks here because each xi is a universally quantified variable. This y, which is in the
scope of all these universal quantifiers for x1 to xn, becomes a Skolem function of x1 to xn. So,
in general, you just look at what universal quantifiers are on the left-hand side and make that
variable a function of those. If I had something like this — there exists x, for all y, p(x,y) — then
I would replace it with p(sk5, ?y), because here x is not in the scope of any universal quantifier.
So, this entire process of converting a sentence into implicit quantifier form we call
skolemization, after the logician Skolem essentially.
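
Summarising the skolemization examples in one place (the names sk12, sk11, sk5 are just the arbitrary labels used on the slides):

\exists x\, Even(x) \ \leadsto\ Even(sk_{12})
\forall x\, \exists y\, p(x,y) \ \leadsto\ p(\,?x,\ sk_{11}(?x)\,)
\exists x\, \forall y\, p(x,y) \ \leadsto\ p(\, sk_{5},\ ?y \,)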

(Refer Slide Time: 14:55)

So, in the last class we had started looking at forward chaining. We had looked at the rule
of modified modus ponens, which said that if you have alpha-prime and alpha implies beta,
then you can produce beta-prime from there. If you move from alpha-prime to beta-prime, the
process is called forward chaining: in forward chaining you have alpha-prime and then you can
add beta-prime. In backward chaining you want to show beta-prime; so we use this convention —
show beta-prime — and we get the subgoal: show alpha-prime. So, backward chaining, which we
just started looking at in the last class, works with goals. By goals we mean something that we
want to show to be true, a formula that we want to show to be true. And it looks for implications
of the kind alpha implies beta; so basically it is using modus ponens in a slightly different form —
using the rule in the reverse direction. Maybe that will become clearer as we move forward.

So, we have our original Socrates argument, which said that man(?x) implies mortal(?x), and this
is there in the database, and this is there in the database, which is man(Socrates). And of course
you may have a date of birth and so on; let us ignore that for the moment. And you want to show
that mortal(Socrates) is true. Then forward chaining would apply this modified modus ponens
rule, with this as alpha-prime and this as alpha implies beta. Whereas for backward chaining we
have written it in the other form, which says that from the goal show beta-prime, with alpha
implies beta, you can reduce it to show alpha-prime essentially. So: show mortal(Socrates). How
does it work? If you recall, we match this with the consequent, with the unifier x = Socrates —
we will look at unification a little later. So, we apply this unifier, and according to this backward
chaining process we reduce this to show man(Socrates). So, let us put goals inside boxes so that
we can distinguish between goals and facts. So, this is a fact, this is a fact; this is a goal, this is a
goal, and so on. Of course, a simpler way to solve a goal is to see whether it is present in the
database. So, in this example, when you have the goal of showing that man(Socrates) is true, you
simply look up the database.

And you will find that it is true essentially. In a moment we will look at how the
language Prolog is basically doing this — it is doing backward chaining, and if a fact is present
in the database, or in your Prolog program, it is treated as true essentially. But the nice thing
about reasoning with logic is that we can ask a question like: is this formula true, that there
exists a y such that mortal(y) is true? This is what is given to us, this is the database. Given this
database, is that statement true, that there exists a y such that mortal(y) is true? So, I
intentionally used a different variable here, y, and an x there; it is not really necessary for me to
do that, because when you look at this statement, it is saying that there exists some y such that
y is mortal. I could just as well have said there exists some x such that x is mortal. But because
we want to keep the two formulas separate and apart — and that is a practice that is necessary,
as we will see —

we use a different variable name. Now, if this is my goal — show that this is the case — then we
invert the skolemization convention. When you are doing forward chaining, a formula which has
a universally quantified variable is written with a question mark. Whereas in backward chaining,
an existential variable is denoted by a question mark — that is, specifically in backward chaining,
when you are talking about show, when you are talking about goals, the convention is reversed.
So, I will write this as show mortal(?y). So, again, let it be clear that when I write show mortal(?y)
in this form, this notation denotes an existentially quantified variable; that is true only for
variables which occur inside goals. So, inside goals the convention is reversed essentially, and in
the next class we will see why that makes sense — but you can try to guess.

So, I can ask an existential query. This is now becoming a little bit like a
database activity: you have some set of facts available to you, and you are asking a query —
is there some entity which satisfies certain properties? So, if you are asking something like:
is there an employee who earns more than 10000 rupees, and so on, you get some results out of
that essentially. The difference between something like an RDBMS and using logic is that logic
can make inferences on the way to retrieving answers for you. So, when you ask the question
whether there is somebody who is mortal in the knowledge base — or database, whatever you
want to call it — the knowledge base has only 2 statements in our small example: that Socrates
is a man, and that all men are mortal. The knowledge base does not say that anyone — any
specific individual — is mortal.

But backward chaining in particular, and reasoning in logic in general, allows you to ask
existential queries like this, and to answer those queries after a process of doing some
inference, making some deductions. So, what do we do? We match this show mortal(?y) with
the right-hand side of our implication, mortal(?x). So, we say x = y is the substitution we want,
so this becomes mortal(y). So, the query gets translated to show man(?y): our query — is there
someone who is mortal — gets translated into a subquery, or a subgoal, which is: is there
someone who is a man essentially? Now, that can be answered by the database: yes, Socrates is
a man. So, we can complete the query and return the answer by saying yes, y = Socrates is the
answer to your question. So, in the last class we had briefly mentioned that you could define —
let us say you want to define grandfather, or let us say you want to define grandparent.

(Refer Slide Time: 25:08)

So, you might want a statement like this: for all x, for all y — let us say gm stands for
grandma — gm(x,y), and let us say this means that x is the grandmother of y, implies gp(x,y),
where gp stands for grandparent. Let us say for some reason we want to put this rule into our
knowledge base, or database, whatever you want to call it. So, what you are saying here is that
grandmothers are grandparents. Then you could have a rule like for all x, for all y, gf(x,y)
implies gp(x,y) — the same thing for grandfathers. Then you could have a rule which says: for
all x, for all y, for all z, mother(x,y) and father(z,x) implies grandfather(z,y). I could have a rule
of this kind which says that, for all x, y and z, if x is the mother of y and z is the father of x,
then z is the grandfather of y. So, you could imagine that you are building a database of
relationships, where you are defining what each relationship means — what a grandfather is,
what a grandmother is, what a grandparent is, and so on and so forth. And then you should be
able to ask a query, given a database of facts. So, let us say the basic database only contains
parent-child relations and the gender of each person: I could say Jane is a parent of Tom, and
Jane is female, and Tom is male.

So, I could have this kind of database. There is only one relation — the parent-child
relationship — and the gender information; then you can define mother by saying that x is the
mother of y if x is the parent of y and x is female. So, you could do all that kind of stuff —
we will not get into the details, but you could have a knowledge base of this kind.

And then you could ask a question: how is Peter related to Jane, for example? Then the system
should find what the relation between them is. Of course, you cannot ask this very generic
question of how they are related, but you can ask something like: who is Jane's paternal uncle?
You could ask such questions essentially. Let us say we are talking about grandparents — who is
whose grandparent — and that is the basic query. Then you can see that grandparent(x,y) can be
solved in 2 ways: either x is the grandfather of y or x is the grandmother of y. Then you could
say x is the grandfather of y — you could use this as one rule, and you have to be a bit careful
here — if father(x,z) (I am just using different names so as to be consistent here) and
mother(z,y). And you can imagine that there is another rule which uses father in both places, so
there are 2 possibilities here.
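
To see this AND-OR search concretely, here is a rough Python sketch extending the earlier chainers, in which a rule's antecedent is a list of subgoals (the AND) and several rules for the same consequent give the OR branches. The family facts are the lecture's illustrative ones; the matcher is not full unification, and rule variables are assumed not to clash with query variables:

# Depth-first AND-OR backward chaining over conjunctive rules (a sketch).

def is_var(t): return isinstance(t, str) and t.startswith('?')

def subst(theta, f):
    if is_var(f): return theta.get(f, f)
    if isinstance(f, tuple): return tuple(subst(theta, x) for x in f)
    return f

def match(pat, target, theta):
    """Match pat (may contain variables) against a ground target."""
    if is_var(pat):
        if pat in theta:
            return theta if theta[pat] == target else None
        return {**theta, pat: target}
    if isinstance(pat, tuple) and isinstance(target, tuple) and len(pat) == len(target):
        for p, t in zip(pat, target):
            theta = match(p, t, theta)
            if theta is None: return None
        return theta
    return theta if pat == target else None

def prove(goals, facts, rules, theta=None):
    """Prove all goals (AND); try facts, then each rule (OR), depth first."""
    theta = {} if theta is None else theta
    if not goals:
        return theta
    goal = subst(theta, goals[0])
    for fact in facts:                      # a fact may close this subgoal
        t2 = match(goal, fact, dict(theta))
        if t2 is not None:
            result = prove(goals[1:], facts, rules, t2)
            if result is not None:
                return result
    for ante, cons in rules:                # or a rule may reduce it further
        t2 = match(cons, goal, {})          # assumes the goal is ground here
        if t2 is not None:
            result = prove([subst(t2, a) for a in ante] + goals[1:],
                           facts, rules, theta)
            if result is not None:
                return result
    return None

facts = {('father', 'Peter', 'Mary'), ('mother', 'Mary', 'John')}
rules = [
    ([('father', '?x', '?z'), ('father', '?z', '?y')], ('grandfather', '?x', '?y')),
    ([('father', '?x', '?z'), ('mother', '?z', '?y')], ('grandfather', '?x', '?y')),
]
print(prove([('grandfather', 'Peter', 'John')], facts, rules) is not None)  # True

The first rule fails and the search backtracks into the second, exactly the depth-first traversal of the AND-OR tree described below.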

So, you can see what is happening here: this is the space in which backward chaining operates. If
you ask a query about something — is Jane the grandparent of Peter — then the system will apply
these rules. In backward chaining you match with the right-hand side of an implication, and then
the left-hand side can become a subgoal. So, you can match either with this rule or with that rule;
in both cases grandparent will match. So, you could use r1 or r2: one rule will take you to
grandfather, another rule will take you to grandmother. So, either x is a grandfather of y or x is a
grandmother of y. Then, for grandfather — if x is a grandfather of y — it could be that x is the
father of z who is the mother of y, or it could be that x is the father of z who is the father of y;
both are possible. So, you have all these possibilities, and the space that backward chaining has
to search is an AND-OR tree essentially, and what Prolog does is depth-first search on that tree.
So, let me rewrite this — I am skipping the step of skolemization, which in this case is simple
because we have only universally quantified variables, so you can just replace every variable
with a question mark before it. If you were to write this in Prolog, you would write it as
something like this: gp.

(Refer Slide Time: 32:10)

So, Prolog uses a different convention. We are using the convention that a question mark marks
a variable, and something without a question mark is a constant — it could be a constant, or an
existentially quantified variable replaced by a Skolem constant, or something like that; it does
not matter. But that is the convention we are using. Prolog uses a different convention: when
you have x and y in uppercase letters, then it is a variable. So, those of you who have used
Prolog would know this; let us stick to our convention, which is to use a question mark. So, we
write the consequent first and then we write the antecedent. So, the same rule — let us write it
like this. What has happened? I have taken this rule and rewritten it like this: I have done away
with the quantifiers — I have converted it to an implicit quantifier form — and I have reversed
the order in which we are writing. In a normal rule you write the antecedent on the left-hand
side and the consequent on the right-hand side; in this notation I am writing the consequent on
the left-hand side and the antecedent on the right-hand side, and I have changed the direction of
the arrow.

So, I have used an arrow instead of the implication sign here. Again, those of you who use
Prolog would know that Prolog writes something like ':-' instead of the arrow sign, but it means
the same thing; it is only a matter of convention. This way it is easier for us to understand that
the direction of implication is from right to left. Then I would write the second statement as
gp(?x,?y) ← gf(?x,?y), alongside gp(?x,?y) ← gm(?x,?y); then somewhere I would write a
statement for gm and then for gf, which is what I have written here: gm(?x,?y) if mother of,
and so on. Prolog uses a comma instead of 'and', so we will also use a comma here; keep in mind
that this comma basically stands for 'and'. So, it is the same set of statements; we are merely
writing them differently. We are not changing anything: we have gone into implicit quantifier
form, and we are writing the consequent on the left-hand side and the antecedent on the
right-hand side.

And we are replacing 'and's by commas and that kind of stuff, but the statements are still the
same; they are still the same universally quantified statements in logic. Somewhere down
there I would have the grandfather rule: x is the grandfather of y if x is the father of z and z is
the mother of y. Somewhere I would have a statement saying mother(Jane, Peter), for example,
and father(Peter, Jane), and maybe other facts also. You recognize this, I hope, as a Prolog
program — a set of statements of these kinds. Now, this is a restricted form of logic, but we will
not go into that; the restriction here is that the consequent can only be one predicate. You
cannot have 'or's inside the consequent, and so on and so forth. But you can recognize this as a
Prolog program. So, what does Prolog do?

(Refer Slide Time: 36:55)

If I ask a query like show grandparent(Peter, ?z), or something like that — the reason for writing
things in this inverted form is that it makes the task of matching simpler: you always match with
what is on the left-hand side when you have a program, and moving to the right-hand side is the
step of making the inference. So, if I ask a query like that — is there someone, to be more
precise, whose grandparent is Peter — then Prolog starts looking from top to bottom, trying to
match the query with the things on the left-hand side. In this example, very conveniently, it
matches the first rule, and when it matches, it poses the body as the goal.

So, it asks whether Peter is a grandmother of some z essentially. So, it will translate that
goal into the subgoal. Backward chaining, we said, was moving from right to left; in this
notation it is moving from left to right. So, it basically goes from goals to subgoals and asks
that as a new query. And, as you can imagine, I should have something like parent here and
parent there, and then female(Jane) and so on and so forth — the data that I originally said we
have: Jane is female, Peter is male, and that kind of stuff. This will get translated to gm(Peter,
?z) — grandmother — which in turn will go further down. So, because I have written this
grandmother rule first, it is like having that branch on the left side; so it goes down that path.
You can visualize what Prolog is doing as backward chaining with a particular strategy, which
is depth-first, and the way it is implemented here is that the first rule that matches is the one
that is tried.

So, it tries this path first — just imagine that the tree is laid out from left to right, so it would
be going down the left side first. Then it would eventually fail to show that Peter is a
grandmother of z, and it would backtrack and then try the second rule essentially — which is to
say, go all the way down one branch and then try the other branch. Exactly as we had when we
looked at goal trees with depth-first search essentially. So, let us look at another example that
we considered earlier, which is that of planning an outing. If you recall, when we were looking
at goal trees, we had this task of planning an outing with a friend, and the outing consisted of
some 3 things: an evening out, an entertainment, and then dinner. So, eventually you have to
find values for these which your friend would be happy with.

(Refer Slide Time: 40:55)

So, if you remember, it was something like this. Let us call it a birthday plan — I do not
remember what we had called it at the time. So, you have a birthday plan made up of x, y and z
if you have an outing plan called x, an entertainment plan called y, and a dinner plan z. And
then you could say an outing plan x is a valid plan if it is an outing and likes(f,x), where f
stands for friend; likewise for the other 2 plans, the entertainment plan and the dinner plan. So,
you could write it as a Prolog program. You could add f as a parameter — so x comma f here,
comma f here, comma f here. So, let us say f stands for your friend, and let us say your query is
essentially: what is a good birthday plan (?x, ?y, ?z), where your friend's name is Peter. This is
my query.

So, with the question marks in place, what backward chaining will do is exactly the same thing
that we saw earlier. So, this is the birthday plan: it tries to find an outing, an entertainment and
a dinner — we said movie in the last class; it does not matter — and below this would be an
AND-OR tree. The outing could be, let us say, the beach or the mall and things like that; the
entertainment could be some movie a or movie b or movie c; and the restaurant could be some
restaurant d or e or f. And below that, of course, you have the facts, like whether your friend
likes going to the beach — some statement like likes(beach, Peter) must be present in your
knowledge base, because instead of asking your friend, you are asking whether there is a
statement like that in my database already.

So, the attention that I want to draw here is to the same problem that we encountered before.
Now, you must recall what was happening there: you are searching this AND-OR tree — well,
initially we had posed it as a simple OR tree. But what we were saying is: let us say you decide
that you want to go to the beach — you ask your friend, shall we go to the beach — and then
you are doing a depth-first search. So, you try all these options one by one: you say, shall we go
to this movie a, or shall we go to this movie b, or shall we go to this movie c, and then try all
combinations of these. So, for example, beach and a and d, then beach and a and e, then beach
and a and f — and let us say all fail. So, what is happening here in backward chaining is that
you are doing this subgoal propagation. And the way Prolog does it is that it goes from top to
bottom and left to right: given the tree of goals — outing and entertainment and dinner — it will
first try outing, and try to find a value for that, let us say beach; then entertainment.

And then it tries to find a value for that, let us say a, and then dinner, and tries to find a value
for that, d. And at that point, let us say, it fails, and says that no, this is not a good birthday
plan. So, it will backtrack and try e here — it is still on the same goal, but it will try a different
value. So, whenever backtracking happens, it happens in this same fashion: it backs up one
step, and then comes down one step to the next option. So, it tries the first option for dinner,
and the second option for dinner, and the third option for dinner. And if everything fails, you
do not want to repeat the same search again for a different option of the entertainment if you
can figure out somehow that the beach is the culprit. So, we had mentioned the term
dependency-directed backtracking there. In some systems, like constraint satisfaction systems,
that is kind of done automatically — the system keeps track of the dependencies. But in a
system like logic, or the implementation of logic called Prolog, the language gives the user the
ability to control backtracking.

So, instead of writing the statement as before, what you can do in Prolog is write the statement
as follows: a birthday plan — let me just use bp(?x,?y,?z,?f) — if the outing plan, then a special
symbol called cut, then the entertainment plan, then the special symbol again, and then the
dinner plan. So, instead of 3 subgoals you have added 2 extra subgoals; these are special
subgoals which Prolog allows you. And those of you who have used it would know this as the
cut operator, as it is called. What it does is basically provide a device to the user, or the
programmer, to control backtracking. And what it is basically saying is: if you are going to
backtrack across this point, then do not try a new value for what came before — really jump
back to the original goal. And, without going into the details, I just want to point out that the
cut feature of Prolog is basically used to control this huge amount of backtracking that one
does in an unconstrained search.

So, what Prolog — what backward chaining — allows you to do is to ask existential queries.
You can ask a query about whether Peter is the grandfather of somebody, or who is Jane's
grandfather; or you could define — and maybe that is a good exercise for you — the ancestor
relation: when is x an ancestor of y? Then you could ask a query whether Jane is an ancestor of
someone, or things like that, and the system will search in your database. It will make
inferences by jumping across such implication signs and eventually dig out an answer for you.
This process is often called deductive retrieval, which is more than what a database system
gives you: a database system gives you retrieval in an efficient fashion, but it does not do
deduction on the way. Prolog allows you deductive retrieval, and in that sense it is more
powerful than an RDBMS.

And in fact Prolog is a complete programming language — anything you can do in Java, for
example, you can do in Prolog essentially — and logic programming in general is also a
complete programming paradigm. But it has its limitations as a reasoner. I had hinted at that
when I said that there is a limitation in the format in which you can write a Prolog program: it
works with only a subset of logical statements, known as Horn clauses, which basically say that
there can be only one consequent in any implication statement. So, it is not complete. I refer
you back to this problem of 3 blocks that we had mentioned in the last class: we had said that a
is on b and b is on c, and a is green and c is not green, and we had asked whether it is true that
there exist 2 blocks x and y such that x is on y and x is green and y is not green. We cannot
solve that problem using either forward chaining or backward chaining.

So, in that sense, forward and backward chaining are not complete. And remember that when
we talk about logical systems, we are interested in sound and complete systems. So, everything
that we are doing here is sound, because it is based on sound rules of inference; it is only the
completeness which is in question. So, in the next class, which is the last class of our course,
we will look at the resolution method, which we saw for propositional logic, and apply it to
first-order logic. And we will see that that particular problem, which I will discuss again in the
next class, can be solved using the resolution method essentially. On the way, we will need to
have a quick look at the unification algorithm, which is the algorithm used for matching two
terms like these, and which is necessary in the implicit quantifier form that we are using here.
So, I will stop here, and we will meet in the next class for the last time in this course.

Artificial Intelligence
Prof. Deepak Khemani
Department of Computer Science and Engineering
Indian Institute of Technology, Madras

Lecture - 48
Resolution for FOL

(Refer Slide Time: 00:15)

So, we meet for the last time this semester, and we want to look at the resolution method for
FOL essentially. As I might have mentioned earlier, it is the method which was introduced by
Alan Robinson in 1965, and since then it has been used extensively in theorem proving — the
proofs of some well-known theorems, for example, had extensive support from programs based
on the resolution method. And it is a complete method for proving theorems which uses only
one rule of inference — the resolution rule — and it does not need any other essentially. Now,
talking generally about logic: as we said, logic is basically a system in which you define a
language, then define some rules of inference in that language, define the meaning of sentences
in that language, and then talk about soundness and completeness, and you can define more and
more expressive languages. We have seen only one step: we have moved from propositional
logic to first-order logic, in which we said that we can talk about variables and quantifiers over
variables. It was shown by Gödel, in what is called Gödel's completeness theorem, that
first-order logic is sound and complete essentially, which means you can devise a proof system
for first-order logic which is sound and which is complete.

And by complete we mean that anything which is entailed by a set of statements can be derived
using the machinery that we have built essentially. It turns out that FOL is sound and complete,
but it is only semi-decidable. And what this means — the way it applies to FOL — is as follows:
if you give a true statement, a statement which is entailed, to the system, and ask the system to
prove it, then there exists a proof for that statement in that system essentially. And you can
always devise a strategy — you can imagine a strategy something like breadth-first search,
which always finds the nearest inference first and goes further and further away — such that
eventually you will find the proof. So, you can always devise a strategy for finding a proof.
But if you give it a statement which is not true, then the system may never halt, because there
is no proof, and the system may never come out saying that there is no proof. Now, in the case
of propositional logic, because we are dealing only with a countable set of propositions —
usually it is finite —

we can always say, at least for a finite set of propositions, that there is no proof, because we
can try all combinations and eventually say that there is no way that this statement can be true.
So, even if it is false, the system can come out and say false. In the case of first-order logic we
cannot come out and say that a statement is false; we can only keep trying essentially — which
boils down to saying that your program can get into an infinite loop essentially. And that is of
course not surprising: all of us have written programs which get into infinite loops. Of course,
in imperative languages our programs get into infinite loops because we have written a wrong
loop or a wrong exit criterion. In logic programming, or Prolog, we do not control the flow of
execution; we simply state what is to be done. Of course, we do control it in the order in which
we write statements, because, as we observed, Prolog does depth-first search — top to bottom
and left to right — which means that if you write statements in a wrong order, it is possible
that it could get into an infinite loop. But even if you write statements in the correct order, if
the statement that you are trying to prove is not true, it can still get into an infinite loop
essentially.

(Refer Slide Time: 05:11)

So, today we will go back to this example that we saw: on(a,b), on(b,c), and let us say
ontable(c), and green(a), not green(c) — suppose this set of facts is given to you. And you want
to show that there exists an x and there exists a y such that on(x,y) and green(x) and not
green(y). This is the goal that you want to show to be true, and those are the facts given to
you. Now, you can see that forward chaining and backward chaining do not make sense here,
because in your database or knowledge base there are no implication statements, and both
forward chaining and backward chaining only allow you to move across an implication
statement. So, if you have alpha implies beta, then given alpha you can say yes, beta holds; or
if you have the goal beta, you can make alpha a goal. But there are no implication statements
here in the given set of facts, as we have also shown here. So, you can imagine that forward
chaining and backward chaining do not work at all essentially.

So, you can of course convince yourself that this statement is true, and as human beings we
might use one technique, which is called proof by cases. Look at b: there are 2 cases — either
b is green or b is not green. If b is green, then take x = b and y = c: b is on c, b is green and c
is not green. If b is not green, then you can take x = a and y = b: a is on b, a is green and b is
not green. So, whether b is green or b is not green, in either of the 2 cases you can show that
this statement is true. And if you further say that one of the two cases must hold, then you can
argue that yes, this statement must be true essentially. Now, this step which has slipped in
between — arguing that one of them must be true — is essentially a part of classical logic: in
classical logic every statement is either true or false, and there is nothing in between.
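
Written out compactly, the case analysis above is:

Case 1: Green(b). With x = b, y = c: On(b,c) \land Green(b) \land \lnot Green(c).
Case 2: \lnot Green(b). With x = a, y = b: On(a,b) \land Green(a) \land \lnot Green(b).
Since Green(b) \lor \lnot Green(b) always holds, \exists x\, \exists y\, (On(x,y) \land Green(x) \land \lnot Green(y)) follows in both cases.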

This principle — that every statement is either true or false — is called the law of the excluded
middle, which some people object to because of statements like the following. I can say that p
or not p is true, and q or not q is true. These are true statements because of the law of excluded
middle: either p must be true or not p must be true. So, in that case the conjunction of the two
is true. I can shuffle this and write it as (not p or q) or (not q or p), which I can then write as
(p implies q) or (q implies p). So, this is true for any 2 statements p and q, in any language —
propositional, first-order or higher-order; it does not matter. And if you read implication in the
causal sense, then people have difficulty with such a statement, because just imagine that p
stands for 'the earth is flat' and q stands for 'the moon is green'. Then you are saying that there
is a causal connection between these 2 things — that either the earth being flat implies that the
moon is green, or the fact that the moon is green implies that the earth is flat essentially. (The
little derivation is written out in symbols below.)

Now, obviously there is no causal connection between these 2 statements: you can take any 2
statements p and q, and this statement is always true essentially. This is why logicians
sometimes distinguish logics which capture cause-and-effect relationships — we will not go
into that. Rather, they would say that instead of reading an implication here, you must read it
like this: when you say p implies q, it is easier to read it as saying that either p is false or q is
true, without getting worked up about it. That is all you are saying essentially. But when we
read it as an implication, we have the sense of there being a causal relationship, which is not
really the case essentially. Now, let us get back to the resolution method.

(Refer Slide Time: 10:52)

And before we do that, we want to look at this algorithm called the unification algorithm,
which is a very famous algorithm. What the unification algorithm does is that it takes 2
patterns — let us use a more generic term than formulas — and tries to unify them, which
means it tries to find the substitution which would make them the same essentially. These
patterns are made up of 2 kinds of things: one is constants and the other is variables. A
variable is something for which you can substitute something else, and constants — or atoms,
as some people call them — cannot be substituted, cannot be changed. So, for
the sake of simplicity we will adopt a slightly different notation.

(Refer Slide Time: 11:46)

Suppose I have this statement: not man(x) or mortal(x). Now, this statement, which is in the
mathematical language of logic, has this particular notation: you have the predicate name — or
it could be the function name in some situations — followed by brackets, followed by the
arguments, and then predicates connected using logical connectives essentially. So, you have to
distinguish between different kinds of things. For the sake of this unification algorithm, let us
assume that we have a uniform notation, and in that notation what this will look like is that the
'or' comes first, and the notation uses the question mark as the marker for a variable. This is
just for the sake of understanding this algorithm easily; you can always adapt the algorithm to
the usual notation, but this is simply easier to use, because it is a very uniform, list-like
notation, which those of you who have used Lisp would be happier to look at essentially. So, I
have moved the 'or' sign outside: the outermost connective is first, and then the inner
connective, which is 'not' here — so not man(?x) is written like this, with the 'not' first.
So, this whole thing from this bracket to this bracket is one expression, or term — let us just
call it a term, or a list. So, from here to here is a list, and the first element in the list is always
a connective or a predicate name. In this case the predicate name is man; here it is a
connective; there it is a connective. So, this one has got 2 arguments: this whole thing is one
element and this whole thing is another, so it is a list of 3 elements. So, everything is a list of
some number of elements, and the only thing we need to worry about is whether an element is
a constant — an atom, as the Lisp people say — or a variable essentially. And the only point to
note is that an atom can only match an atom. So, if I want to match this with man — what am
I trying to do? I am trying to do something like resolution. If you recall, I have this clause and
I have that clause, and I have something here and the negation of that thing there. The
resolution rule, if you remember, first of all works in clause form — you must express things
in a CNF-like form — and then you have to look for a positive literal in one clause and its
negation in the other, and in some sense cancel them out essentially; and from there you can
derive mortal(x) essentially.

Which means, of course, that I will have to match this (man ?x) with, say, (man Socrates), and that is the work the unification algorithm does: it will tell me what value for x will make the two expressions the same; in our case we have just called them lists here essentially. To match this with that, the predicate name must of course be the same. We are treating the predicate name as an atom here, which means it can only match another atom which is the same: man can only match man, not can only match not, or can only match or. It is only the variables which can match something else. So, let us assume that we have an arbitrary nested pattern which is expressed as a list of this kind, and we want to write a general algorithm which will find a unifier for any two patterns, that is, a substitution for those two patterns which would make them the same; this substitution is called a unifier. That is the algorithm we are looking for, the unification algorithm. The algorithm is called, let us say, unify; I will just sketch it here and you can fill in the details.

So, let us say unify(x, y), and what it does is this (we will write it in a Prolog-like fashion): it calls a helper, sub-unify. As we very often do when we write a program, we add a third parameter to make life simpler for us: we add an empty list and call sub-unify. So when we call it, x and y are arbitrary lists, and we start off by supplying an empty substitution. So we have sub-unify(x, y, theta), where theta is the substitution we are trying to build. What is a substitution? A substitution is a collection of variable-value pairs essentially; it says x should be substituted by this, y should be substituted by that, and so on, so as to make the two patterns the same essentially. The algorithm goes through a series of cases, and I will just list out the salient ones here. Obviously, if x and y are the same, whether they are atoms or whether they are lists, then you do not need to do anything; they are already unified essentially. So some of the main cases are as follows.

These are the cases, and I will just write each one as an if. If x is a variable, then call another function, var-unify(x, y, theta). If one of the two is a variable, then it is a candidate for being substituted by something else, and that is what this function var-unify will do; likewise, if y is a variable, you call var-unify(y, x, theta) somewhere. (The order I am writing here will not be the perfect order.) If x is an atom, then if x equals y, return theta, else fail. Remember this is going to be a recursive program: x and y will initially be big patterns, but eventually, as we descend into the lists, at some point they will be either a variable or an atom. If it is a variable, we call var-unify with the other argument; if it is an atom, we check whether it matches or not essentially. If it is neither a variable nor an atom, it must be a list, and then we compare the lengths.

We can compute their lengths: if the two lists are of unequal length, return fail, because then you can never make them match; else, make the appropriate recursive calls (I will just write it like that). What does that mean? For example, if this x of mine is a list of three elements, where the first element is a constant, the second element is this inner list, and the third element is this other list, and I have another list of three elements, then I can try to match them: I will make recursive calls, first element with first element, then second with second, then third with third. So I incrementally build up the substitution. That leaves us with the task of writing var-unify, which is what really builds the substitution (I might have skipped one or two small details; it does not matter). When we make a call var-unify(var, y, theta), we know that the first argument is a variable; the other thing could be a list or something else. We have to do a few checks, but essentially what we really want to do is to add var = y to my theta.

That is, add one more substitution. So what do I want to do? I want to say: return theta union {var = y}; I basically want to add one more pair to the substitution, which is the call I am making. But before I am allowed to do that, I have to do a couple of checks. First, if var is already equal to y, then you can just return theta; you do not have to do anything. It could be the case, for example, that the two are the same variable, and then you just return theta. Second, if var occurs in y, return fail. So if the variable happens to occur inside y, then you return fail. Let me use an example to illustrate why this check is needed; the example is from the book by Charniak and McDermott, which describes things in this notation.
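Putting these cases together, here is a minimal runnable sketch of the whole algorithm in Python, under the tuple representation assumed earlier (the function names, the dict-based substitution, and the representation are my choices; the lecture only sketches the cases on the board):

```python
def is_variable(x):
    # A variable is a string beginning with '?'.
    return isinstance(x, str) and x.startswith("?")

def occurs_in(var, pattern):
    # The occurs check: does var appear anywhere inside pattern?
    if var == pattern:
        return True
    if isinstance(pattern, tuple):
        return any(occurs_in(var, part) for part in pattern)
    return False

def substitute(pattern, theta):
    # Apply the substitution theta throughout a pattern.
    if is_variable(pattern):
        return substitute(theta[pattern], theta) if pattern in theta else pattern
    if isinstance(pattern, tuple):
        return tuple(substitute(part, theta) for part in pattern)
    return pattern

def var_unify(var, y, theta):
    if var == y:
        return theta                 # same variable: nothing to add
    if var in theta:
        # var already has a value in theta: unify that value with y
        return sub_unify(theta[var], y, theta)
    y = substitute(y, theta)
    if occurs_in(var, y):
        return None                  # occurs check fails, e.g. ?z vs (feet ?z)
    return {**theta, var: y}         # theta union {var = y}

def sub_unify(x, y, theta):
    if theta is None:
        return None                  # an earlier failure propagates
    if x == y:
        return theta                 # already unified, atoms or lists
    if is_variable(x):
        return var_unify(x, y, theta)
    if is_variable(y):
        return var_unify(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple):
        if len(x) != len(y):
            return None              # lists of unequal length never match
        for xi, yi in zip(x, y):     # the appropriate recursive calls
            theta = sub_unify(xi, yi, theta)
            if theta is None:
                return None
        return theta
    return None                      # an atom can only match the same atom

def unify(x, y):
    # The top-level call supplies the empty substitution.
    return sub_unify(x, y, {})
```

For instance, unify(("man", "?x"), ("man", "Socrates")) returns {'?x': 'Socrates'}.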

(Refer Slide Time: 24:31)

So, the example is as follows: (not (sees ?x ?x)). Let us say sees stands for the predicate, and the meaning of sees is that the first argument can see the second argument; for example, I see you, or things like that. This is a universally quantified statement. It is saying that for all x (let us assume that x ranges over people), x cannot see x, which means one cannot see oneself essentially. So let us assume that that is a true statement, and that I have a rule; let us say I am doing forward chaining. The rule is as follows, again written in this new notation, where instead of writing the implication sign I write an if, then the antecedent, and then the consequent. So instead of saying the antecedent implies the consequent, I am writing it in this list-like notation, which Charniak and McDermott use, and which is quite a nice notation, easy to process. So this rule says, let us say: (if (not (sees ?z (feet ?z))) (diet ?z)).

(One always has to worry about the number of brackets in this sort of thing.) So what does this rule say? It is a universally quantified statement; how would we read it in English? feet is a function; remember that in first-order logic the arguments to a predicate can only be terms, and terms are either variables or constants or functions applied to terms, and feet(z) is a function term. Let us say feet(z) stands for z's feet. So this is saying that anyone who cannot see their feet should diet essentially; this may well be a true statement. Now, of course, the question is: given this rule and this fact, can we apply forward chaining here to conclude that everyone should diet? Not given these facts, right? So let us see how that particular statement we were talking about, if the variable occurs in y then return failure, comes to our rescue. We are trying to unify this fact with this antecedent; remember, the antecedent should match.

Now we will make the recursive calls. First you see that each side is a list of two elements, not and its argument, so we will make two recursive calls, one with the first elements and one with the second elements. In the first call, not is an atom and it matches not; that is fine, so the first recursive call works and does not change theta at all. In the second call, this side is a list of three elements and that side is also a list of three elements, so that is fine too. First we make a call with sees and sees; here the case "if this is an atom and that is the same atom, do nothing" applies. In the next call, our algorithm will see that ?x is a variable, so it calls var-unify with x and z, and in the last statement, which it will reach, it says add x = z to theta (I am not writing the question marks here). So this goes into theta: if you want to make these two patterns the same, substitute for x the value z, and you will have your theta. Now x has already been bound to z there. Whether you apply the substitution in the pattern right away or keep it in theta, it works both ways; in fact, the other relevant situation is: if var already has a value in theta,

then call sub-unify with that value essentially. In this example we already have a value for x in theta, namely x = z. So if we have not changed this x in the pattern, that clause will fire, and the recursive call, the sub-unify call in this line here, will be made with the value of x, that is, with z and feet(z) essentially. So let us assume we have already made this substitution; this x has become z now, so let me put an arrow here and show that it has become z. Because we have substituted x with z, we are now making a call with a variable, z, and a list, feet(z), which is the y.

So this z is the var in that statement, sorry, in this statement, and feet(z) is the y, and the statement says that if var occurs in y then return fail essentially. In this example z occurs in this pattern or list, so the algorithm returns failure: we cannot unify these, which is good for us, because otherwise all of us would have had to diet. So the rule does not apply, and this particular clause, the occurs check, is meant to catch exactly this kind of thing essentially. You can never unify z with feet(z): if you substitute feet(z) for z, then you will have to substitute feet(z) for the z inside as well, and then for the z inside that, and then the z inside that, and so on; it just does not make sense. So much for the unification algorithm.
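With the sketch given earlier, this whole trace can be reproduced in a couple of lines (same representational assumptions as before):

```python
fact = ("not", ("sees", "?x", "?x"))
antecedent = ("not", ("sees", "?z", ("feet", "?z")))

# The atoms match, ?x gets bound to ?z, and then the occurs check
# rejects binding ?z to (feet ?z), so unification fails overall.
print(unify(fact, antecedent))   # None
```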

(Refer Slide Time: 32:20)

And what it returns is the theta which is the most general unifier that can make the two patterns the same. For example, if I have a statement p(?x, ?z) and another statement p(?x, a), where a is a constant, then you can see that one unifier is {x = b, z = a}; I am not saying that this algorithm will find that one, but it is a unifier. And there is another unifier which is simply {z = a}. This unifier is more general than the first, because it does a smaller amount of substitution: anything the first one achieves, this one also achieves, but the first does something extra which this one does not need to do. So this one is called more general than that one. Without going into the details, we will accept that there is a partial order over unifiers, and that there is something called the most general unifier, the MGU, and this algorithm essentially returns the most general unifier. Again without going into details, we just accept the fact that it is desirable to find the most general unifier, and the reason is that you can make the most general inferences, from which you can always derive the more specific inferences using the universal instantiation rule essentially. We do not have to go into details over that; it is a very neat algorithm which is really popular in theorem proving, and we use it all the time. So let us address this problem: let us see how the resolution method solves it.
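In terms of the earlier Python sketch (same assumptions), this is exactly what comes back:

```python
print(unify(("p", "?x", "?z"), ("p", "?x", "a")))
# {'?z': 'a'}: the most general unifier. The unifier
# {?x = b, ?z = a} would also make the two patterns identical,
# but it substitutes more than is needed.
```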

(Refer Slide Time: 34:55)

So, to solve a problem with the resolution method you have to convert it into clause form. A clause form is a form which looks as follows: there is some number of universal quantifiers, for all x1, for all x2, ..., for all xn, and then there is a set of clauses, C1 and C2 and ... and Ck. Such a formula is in clause form when each Ci is D1 or D2 or ... or Dr, and each Dj is either L or not L for some atomic statement L. The inside part you will of course recognize as conjunctive normal form: a set of clauses joined by and, where each clause is a disjunction of things, and each of those things is a literal, which means an atomic statement or the negation of an atomic statement. So you have pushed the negation signs all the way inside, and you have removed, thrown away, the existential quantifiers. In our example we do not really have existential quantifiers, or at least we will see in a moment that we do not, but we discussed earlier how to handle existential quantifiers by using Skolem functions and Skolem constants. That can be done, and then rearranging any formula into CNF is something that I am sure you have studied how to do.

And then you move the universal quantifiers outside; once they are all together outside, you can just throw them away and use the implicit-quantifier form, which is what we had been doing here essentially. So what do we want to show? If you recall the deduction theorem that we talked about earlier, it said that to show that the conclusion follows from the premises, you equivalently show that the premises together with the negation of the conclusion are unsatisfiable. And you will recall that to use the resolution method you must take the conclusion, take its negation, and add it as a clause to your system essentially. The premises are already in clause form; all we need to do is take the negation of the conclusion. So what is its negation? The conclusion is that there exist x and y such that on(x, y) and green(x) and not green(y). I put a negation sign outside, and then I have to push the negation sign inside, because I have to convert it into clause form. So this will become: for all x, for all y, not on(x, y) or not green(x) or green(y). Once the negation goes inside, it goes inside the and sign and converts it to an or; remember that we have to push it to the innermost place. So this becomes not on(x, y), this becomes not green(x), and here the two negations cancel, so this becomes green(y).
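Spelled out as one display (just a restatement of the step above, with the quantifiers still shown):

$$\neg \exists x \, \exists y \, \bigl( on(x,y) \wedge green(x) \wedge \neg green(y) \bigr) \;\equiv\; \forall x \, \forall y \, \bigl( \neg on(x,y) \vee \neg green(x) \vee green(y) \bigr)$$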

So, let me write this here: not on(x, y) or not green(x) or green(y), expressed in the implicit-quantifier form. And those are the other things given to us: on(a, b) and on(b, c). Let us forget the fact that c is on the table; that is not useful for us, I mean we can write it, but it does not help us. Then green(a) and not green(c); we do not really need the table fact anyway. So we have these clauses, and what is it we want to show? That we can derive the null clause from them. We are going to use the unification algorithm along the way, although our example is so simple that we do not have to use anything very complicated; you can match things by inspection. So let us just try: from the negated goal and on(a, b), substituting x = a and y = b, I get not green(a) or green(b). I have not stated the resolution step, the rule for first-order logic, but you can see that it is very similar to the propositional one: you find a literal in one clause whose negation unifies with a literal in the other, you apply the substitution, and in the resolvent you have the substitution already applied essentially.

Because I am saying x = a and y = b, this x becomes a and this y becomes b, and I get that clause essentially. Similarly, from the negated goal and on(b, c), with x = b and y = c, I get not green(b) or green(c). (Check the signs: the three literals of the negated goal are not on(x, y), not green(x), and green(y), so resolving away the on literal leaves not green(x) or green(y) with the substitution applied.) So what we have derived is not green(a) or green(b), and not green(b) or green(c). Then from this clause and the fact not green(c) I can get not green(b), and from the other one and green(a), if you can keep track of the arrows, I can get green(b). Let me just repeat: from on(a, b) and the negated goal (remember, this is the negated goal that we have added) we get not green(a) or green(b); then from on(b, c) and the negated goal we get not green(b) or green(c). But on the side we also have not green(c).
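The substitution used in each of these steps is exactly what the unify sketch from earlier computes; for instance (same representational assumptions):

```python
theta = unify(("on", "?x", "?y"), ("on", "a", "b"))
print(theta)   # {'?x': 'a', '?y': 'b'}
# Applying theta to the remaining literals of the negated goal,
# (not (green ?x)) and (green ?y), yields the resolvent
# (or (not (green a)) (green b)).
```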

So when we resolve not green(b) or green(c) with not green(c), only not green(b) remains; and when we resolve not green(a) or green(b) with green(a), the green(a) and not green(a) cancel, so you get green(b). Then we have not green(b) and green(b), and from those we get the null clause essentially. So you can see that there is a simple proof using the resolution method, and if you look at the proof carefully, you can see that it is in some sense saying: if this whole set of formulas is to be satisfiable, where the premises are accepted to be true since they are given to us, then the negated goal would entail at the same time that b must be green and b must not be green, which of course is a contradiction. So, as we discussed earlier, the resolution method is like a proof by contradiction. If you remember, when we talked about forward chaining and backward chaining, there was no way you could move from this set of facts to this conclusion. The conclusion says that there exists a block on another block,

such that the block on top is green and the block below is not green. It is not even initially clear which pair of blocks that is, but the conclusion is true. We cannot derive it using forward chaining or backward chaining, but in the resolution method there is a very simple, small proof for it. In fact, this procedure by Alan Robinson was a big step in logical reasoning and automatic theorem proving, and nowadays automatic theorem proving is used in many different places for many different applications essentially. At the heart of it is the resolution method, which is a sound and complete method for first-order logic essentially. So, I think we should stop here; with this we will end this course. I must say I enjoyed teaching the class, and I hope at least some of you enjoyed the course. I think so.
