You are on page 1of 105

Syllabus Artificial Intelligence Introduction to AI, AI Tasks, AI Techniques, Knowledge Representation, Approaches to Knowledge Representation, Procedural Vs.

Declarative Knowledge, Inferential knowledge, Issues in knowledge representation, Frame Problem, Forward Vs. Backward Reasoning. First order predicate logic, Prepositional Logic, Predicate Logic, Inference Rules, Resolution, Unification, Structured Knowledge Representation, Semantic Network, Conceptual graph, Frame Structure, Conceptual Dependency, Script. AI Problem, Space and Search, Means-end Analysis, Breadth first search, Depth first Search, Hill Climbing Search, Best first Search, A* Algorithm Learning, Rote Learning, Learning by taking advice, learning by problem solving, inductive learning, Explanation based learning. Expert System, Application area of Expert System, Structure of Expert System, Characteristics of Expert System, MYSIN Case study. Fuzzy Logic, Memory Organisation, Neural Network, Genetic Algorithm, Matching.

UNIT 1 : Introduction to AI 1.1 What is Artificial Intelligence? 1.2 Is AI Possible 1.3 Some AI Tasks 1.4 What we can do with AI? 1.5 AI Techniques 1.5.1 Knowledge Representation 1.5.2 Search 1.6 The Underlying Assumption UNIT 2 : Knowledge Representation 2.1 What to Represent? 2.2 Application of Knowledge in AI 2.3 Properties for Knowledge Representation Systems 2.4 Approaches to Knowledge Representation 2.4.1 Simple Relational Knowledge 2.4.1 Inheritable Knowledge 2.5 Inferential Knowledge 2.6 Procedural Vs. Declarative Knowledge 2.7 Issues in Knowledge Representation 2.8 The Frame Problem 2.9 Forward Vs Backward Reasoning UNIT 3 : First Order Predicate Logic 3.1 Logic 3.2 Introduction to Propositional Logic 3.3 Predicate Logic 3.3.1 Introduction to Predicate Logic 3.3.2 Predicate Logic: Semantics 3.4 Quantification 3.5 Well-Formed Formula For First Order Predicate Logic 3.6 From Wff to Proposition 3.7 Transcribing English to Predicate Logic Wffs 3.8 Properties of Statements 3.9 Inference Rules 3.10 Resolution 3.11 Conversion to Clausal Form 3.12 Unification Unit 4 : Structured Knowledge Representation 4.1 Semantic Network 4.2 Conceptual Graph

4.3 Frame Structures 4.4 Conceptual Dependency 4.5 Scripts Unit 5 : Problem, Problem Space and Search 5.1 Search and Control Strategies 5.2 Preliminary Concepts 5.3 Water Container Problem 5.4 Production System 5.5 Problem Characteristics 5.6 Means-end analysis 5.7 Problem Reduction 5.8 Uninformed or Blind Search 5.8.1 Breadth-First Search 5.8.2 Depth-First Search 5.9 Informed Search 5.9.1 Hill Climbing Methods 5.9.2 Best First Search 5.9.3 A* Algorithm Unit 6 : Learning 6.1 What is Learning? 6.2 Types of Learning 6.2.1 Rote Learning 6.2.2 Learning by Taking Advice 6.2.3 Learning by Problem Solving 6.2.4 Inductive Learning 6.2.5 Explanation Based Learning Unit 7 : Expert System 7.1 What is Expert System? 7.2 Expert System Application Area 7.3 Expert System Structure 7.4 Expert System Characteristics 7.5 Conventional Vs Expert Systems 7.6 Participants in Expert Systems Development 7.7 Tools For Development of Expert System 7.8 MYSIN Unit 8 : Matching and Reasoning 8.1 Fuzzy Logic 8.1.1 What is Fuzziness? 8.1.2 Current Application of Fuzzy Logic 8.1.3 Overview of Fuzzy Logic 8.1.4 Fuzzy Sets

8.1.5 Hedges 8.1.6 Fuzzy Set Operations 8.1.7 Fuzzy Inference 8.2 Memory Organisation 8.3 Neural Networks and Parallel Computation 8.3.1 Neural Network Architectures 8.4 Genetic Algorithm 8.5 Matching 8.5.1 Variable Matching

1.1 What is Artificial Intelligence? 1.2 Is AI Possible 1.3 Some AI Tasks 1.4 What we can do with AI? 1.5 AI Techniques 1.5.1 Knowledge Representation 1.5.2 Search 1.6 The Underlying Assumption


1.1 What is Artificial Intelligence?

Artificial intelligence (AI) is a broad field, and means different things to different people. It is concerned with getting computers to do tasks that require human intelligence. However, having said that, there are many tasks, which we might reasonably think require intelligence - such as complex arithmetic, which computers can do very easily. Conversely, there are many tasks that people do without even thinking - such as recognizing a face - which are extremely complex to automate. AI is concerned with these difficult tasks, which seem to require complex and sophisticated reasoning processes and knowledge. People might want to automate human intelligence for a number of different reasons. One reason is simply to understand human intelligence better. For example, we may be able to test and refine psychological and linguistic theories by writing programs, which attempt to simulate aspects of human behavior. Another reason is simply so that we have smarter programs. We may not care if the programs accurately simulate human reasoning, but by studying human reasoning we may develop useful techniques for solving difficult problems. AI is a field that overlaps with computer science rather than being a strict sub field. Different areas of AI are more closely related to psychology, philosophy, logic, linguistics, and even neurophysiology. However, as this is a CS course I'll emphasize the computational techniques used, and put less emphasis on psychological modeling or philosophical issues. I'll just briefly touch on some of the widely discussed philosophical issues below: What is AI Exactly? As a beginning I offer the following definition: AI is the branch of computer science concerned with the study and creation of computer systems that exhibit some form of intelligence: system that learn new concepts and tasks, systems that can reason and draw useful conclusions about the world around us, systems that can understand a natural language or perceive and comprehend a visual scene, and systems that perform other types of feats that require human types of intelligence. Like other definition of complex topics, an understanding of AI require an understanding of related terms such as intelligence, knowledge, reasoning, thought, cognition, learning, and a number of computer-related terms. While we lack precise scientific definitions for many of these terms, we can give general definitions of them. And, of course, one of the objectives of this text is to impart special meaning to all of the terms related to AI, including their operational meanings. Dictionaries define intelligence as the ability to acquire, understand and apply knowledge, or the ability to exercise thought and reason. Of course, intelligence is more than this. It embodies

all of the knowledge and feats, both conscious and unconscious, which we have acquired through study and experience: highly refined sight and sound perception; thought; imagination; the ability to converse, read, write, drive a car, memorize and recall facts, express and feel emotions; and much more. Intelligence is the integrated sum of those feats, which gives us the ability to remember a face not seen for thirty or more years, or to build and send rockets to the moon. It is those capabilities, which set Homo sapiens apart from other forms of living things. And as we shall see, the food for this intelligence is knowledge. Can we ever expect to build systems, which exhibit these characteristics? The answer to this question is yes! Today with the advent of the computer and 50 years of research into AI programming techniques, the dream of smart machines is becoming a reality. Researchers are creating systems which can mimic human thought, understand speech, beat the best human chess player, and countless other feats never before possible. It is not my aim to surprise or shock you--but the simplest way I can summarize is to say that there are now in the world machines that can think, that can learn and that can create. Moreover, their ability to do these things is going to increase rapidly until--in a visible future--the range of problems they can handle will be coextensive with the range to which the human mind has been applied. --Herbert Simon In spite of these impressive achievements, we still have not been able to produce coordinated, autonomous system which posses some of the basis abilities of a three-year-old child. These include the ability to recognize and remember numerous diverse objects in a scene, to learn new sounds and associate them with objects and concepts, and to adapt readily to many diverse new situations. These are the challenge now facing researchers in AI and they are not easy ones. They will require important breakthrough before we can expect to equal the performance of our three-year old.

1.2 Is AI Possible?
Before we embark on a course in Artificial Intelligence, we should consider for a moment whether automating intelligence is really possible! Artificial intelligence research makes the assumption that human intelligence can be reduced to the (complex) manipulation of symbols, and that it does not matter what medium is used to manipulate these symbols - it does not have to be a biological brain! This assumption does not go unchallenged among philosophers etc. Some argue that true intelligence can never be achieved by a computer, but requires some human property, which cannot be simulated. There are endless philosophical debates on this issue (some on, brought recently to public attention again in Penrose's book. The most well known contributions to the philosophical debate are Turing's ``Turing test'' paper, and Searle's ``Chinese room''. Very roughly, Turing considered how you would be able to conclude that a machine was really intelligent. He argued that the only reasonable way was to do a test. The test involves a human communicating with a human and with a computer in other rooms, using a computer for the communication. The first human can ask the other human/computer any questions they like, including very subjective questions like ``What do you think of this Poem''. If the computer answers so well that the first human can't tell which of the two others is human, then we say that the computer is intelligent.

Searle argued that just behaving intelligently wasn't enough. He tried to demonstrate this by suggesting a thought experiment (the ``Chinese room''). Imagine that you don't speak any Chinese, but that you have a huge rule book which allows you to look up Chinese sentences and tells you how to reply to them in Chinese. You don't understand Chinese, but can behave in an apparently intelligent way. He claimed that computers, even if they appeared intelligent, wouldn't really be, as they'd be just using something like the rule book of the Chinese room. Many people go further than Searle, and claim that computers will never even be able to appear to be really intelligent (so will never pass the Turing test). There are therefore a number of positions that you might adopt: y Computers will never even appear to be really intelligent, though they might do a few useful tasks that conventionally require intelligence. y Computers may eventually appear to be intelligent, but in fact they will just be simulating intelligent behavior, and not really be intelligent. y Computers will eventually be really intelligent. y Computers will not only be intelligent, they'll be conscious and have emotions. My view is that, though computers can clearly behave intelligently in performing certain limited tasks, full intelligence is a very long way off and hard to imagine (though I don't see any fundamental reason why a computer couldn't be genuinely intelligent.) However, these philosophical issues rarely impinge on AI practice and research. It is clear that AI techniques can be used to produce useful programs that conventionally require human intelligence, and that this work helps us understand the nature of our own intelligence. This is as much as we can expect from AI for now, and it still makes it a fascinating topic!

1.3 Some AI Tasks

Human intelligence involves both ``mundane'' and ``expert'' reasoning. By mundane reasoning I mean all those things which (nearly) all of us can routinely do (to various abilities) in order to act and interact in the world. This will include: y Vision: The ability to make sense of what we see. y Natural Language: The ability to communicate with others in English or another natural language. y Planning: The ability to decide on a good sequence of actions to achieve your goals. y Robotics: The ability to move and act in the world, possibly responding to new perceptions. By expert reasoning I mean things that only some people are good at, and which require extensive training. It can be especially useful to automate these tasks, as there may be a shortage of human experts. Expert reasoning includes: y Medical diagnosis. y Equipment repair. y Computer configuration. y Financial planning. Expert Systems are concerned with the automation of these sorts of tasks. AI research is concerned with automating both these kinds of reasoning. It turns out, however, that it is the mundane tasks that are by far the hardest to automate.

1.4 What we can do with AI ?

We have been studying this issue of AI application for quite some time now and know all the terms and facts. But what we all really need to know is what can we do to get our hands on some AI today. How can we as individuals use our own technology? We hope to discuss this in depth (but as briefly as possible) so that you the consumer can use AI as it is intended. First, we should be prepared for a change. Our conservative ways stand in the way of progress. AI is a new step that is very helpful to the society. Machines can do jobs that require detailed instructions followed and mental alertness. AI with its learning capabilities can accomplish those tasks but only if the worlds conservatives are ready to change and allow this to be a possibility. It makes us think about how early man finally accepted the wheel as a good invention, not something taking away from its heritage or tradition. Secondly, we must be prepared to learn about the capabilities of AI. The more use we get out of the machines the less work is required by us. In turn less injuries and stress to human beings. Human beings are a species that learn by trying, and we must be prepared to give AI a chance seeing AI as a blessing, not an inhibition. Finally, we need to be prepared for the worst of AI. Something as revolutionary as AI is sure to have many kinks to work out. There is always that fear that if AI is learning based, will machines learn that being rich and successful is a good thing, then wage war against economic powers and famous people? There are so many things that can go wrong with a new system so we must be as prepared as we can be for this new technology. However, even though the fear of the machines are there, their capabilities are infinite Whatever we teach AI, they will suggest in the future if a positive outcome arrives from it. AI are like children that need to be taught to be kind, well mannered, and intelligent. If they are to make important decisions, they should be wise. We as citizens need to make sure AI programmers are keeping things on the level. We should be sure they are doing the job correctly, so that no future accidents occur.

1.5 AI Techniques
There are various techniques that have evolved that can be applied to a variety of AI tasks these will be the focus of this course. These techniques are concerned with how we represent, manipulate and reason with knowledge in order to solve problems. y Knowledge Representation y Search

1.5.1 Knowledge Representation

Knowledge representation is crucial. One of the clearest results of artificial intelligence research so far is that solving even apparently simple problems requires lots of knowledge. Really understanding a single sentence requires extensive knowledge both of language and of the context. For example, today's (4th Nov) headline ``It's President Clinton'' can only be interpreted reasonably if you know it's the day after the American elections. [Yes, these notes are a bit out of date]. Really understanding a visual scene similarly requires knowledge of the kinds of objects in the scene. Solving problems in a particular domain generally requires

knowledge of the objects in the domain and knowledge of how to reason in that domain - both these types of knowledge must be represented. Knowledge must be represented efficiently, and in a meaningful way. Efficiency is important, as it would be impossible (or at least impractical) to explicitly represent every fact that you might ever need. There are just so many potentially useful facts, most of which you would never even think of. You have to be able to infer new facts from your existing knowledge, as and when needed, and capture general abstractions, which represent general features of sets of objects in the world. Knowledge must be meaningfully represented so that we know how it relates back to the real world. A knowledge representation scheme provides a mapping from features of the world to a formal language. (The formal language will just capture certain aspects of the world, which we believe is important to our problem - we may of course miss out crucial aspects and so fail to really solve our problem, like ignoring friction in a mechanics problem). Anyway, when we manipulate that formal language using a computer we want to make sure that we still have meaningful expressions, which can be mapped back to the real world. This is what we mean when we talk about the semantics of representation languages. In other way we can say that AI techniques is a method that exploits knowledge that should be represented in such a way that: y The knowledge captures generalizations. In other words, it is not necessary to represent separately each individual situation. Instead, situations that share important properties are grouped together if knowledge does not have this property, inordinate amounts of memory and updating will be required. So we usually call something without this property data rather than knowledge. y It can be understood by people who must provide it. Although for many programs, the bulk of the data can be acquired automatically (for example, by taking readings from a variety of instruments), in many AI domains, mist of the knowledge a program has must ultimately be provided by people in terms they understand. y It can easily be modified to correct errors and to reflect changes in the world and in out world view. y It can be used in a great many situations even if it is not totally accurate or complete. y It can be used to help overcome its own sheer bulk by helping to narrow the range of possibilities that must usually be considered.

1.5.2 Search
Another crucial general technique required when writing AI programs is search. Often there is no direct way to find a solution to some problem. However, you do know how to generate possibilities. For example, in solving a puzzle you might know all the possible moves, but not the sequence that would lead to a solution. When working out how to get somewhere you might know all the roads/buses/trains, just not the best route to get you to your destination quickly. Developing good ways to search through these possibilities for a good solution is therefore vital. Brute force techniques, where you generate and try out every possible solution may work, but are often very inefficient, as there are just too many possibilities to try. Heuristic techniques are

often better, where you only try the options, which you think (based on your current best guess) are most likely to lead to a good solution. Although AI techniques must be designed in keeping with these constraints imposed by AI problem, there is some degree of independence between problems and problem solving techniques. It is possible to solve AI problem without using AI techniques (although those solutions are not likely to be vary good). And it is possible to apply AI techniques to the solution of non-AI problems. This is likely to be a good thing to do for problems that possess many of the same characteristics, as do AI problems.

1.6 The Underlying Assumption

One of the assumptions underlying work in Artificial Intelligence is that intelligent behavior can be achieved through the manipulation of symbol structures (representing bits of knowledge). These symbols can be represented on any medium - in principle, we could develop a (very slow) intelligent machine made out of empty beer cans (plus something to move the beer cans around). However, computers provide the representational and reasoning powers whereby we might realistically expect to make progress towards automating intelligent behavior. So, the main question now is how we can represent knowledge as symbol structures and use that knowledge to intelligently solve problems. The next few lectures will concentrate on how we represent knowledge, using particular knowledge representation languages. These are highlevel representation formalisms, and can in principle be implemented using a whole range of programming languages. The remaining lectures will concentrate more on how we solve problems, using general knowledge of problem solving and domain knowledge. In AI, the crucial thing about knowledge representation languages is that they should support inference. We can't represent explicitly everything that the system might ever need to know some things should be left implicit, to be deduced by the system as and when needed in problem solving. For example if we were representing facts about a particular CS3 Honours student (say Fred Bloggs) we don't want to have to explicitly record the fact that Fred's studying AI. All CS3 Honours students are, so we should be able to deduce it. Similarly, you probably wouldn't explicitly represent the fact that I'm not the president of the United States, or that I have an office in Lilybank Gardens. You can deduce these things from your general knowledge about the world. Representing everything explicitly would be extremely wasteful of memory. For our CS3 example, we'd have 100 statements representing the fact that each student studies AI. Most of these facts would never be used. However, if we DO need to know if Fred Bloggs studies AI we want to be able to get at that information efficiently. We also would like to be able to make more complex inferences - maybe that Fred should be attending a lecture at 12am on Tuesday Feb 9th, so won't be able to have a supervision then. However, there is a tradeoff between inferential power (what we can infer) and inferential efficiency (how quickly we can infer it), so we may choose to have a language where simple inferences can be made quickly, though complex ones are not possible. In general, a good knowledge representation language should have at least the following features:

It should allow you to express the knowledge you wish to represent in the language. For example, suppose you want to represent the fact that ``Richard knows how old he is''. This turns out to be difficult to express in some languages. y It should allow new knowledge to be inferred from a basic set of facts, as discussed above. y It should be clear, and have a well-defined syntax and semantics. We want to know what the allowable expressions are in the language, and what they mean. Otherwise we won't be sure if our inferences are correct, or what the results mean. For example, if we have a fact grey (elephant) we want to know whether it means all elephants are grey, some particular one is grey, or what. Some of these features may be present in recent non-AI representation languages, such as deductive and object oriented databases. In fact, these systems have been influenced by early AI research on knowledge representation, and there is some promise of further crossfertilization of ideas, to allow robust, multi-user knowledge/data bases with well-defined semantics and flexible representation and inference capabilities. However, at present the fields are still largely separate, and we will only be discussing basic AI approaches here. Broadly speaking, there are three main approaches to knowledge representation in AI. The most important is arguably the use of logic. A logic, almost by definition, has a well-defined syntax and semantics, and is concerned with truth preserving inference. However, using logic to represent things has problems. On the one hand, it may not be very efficient - if we just want a very restricted class of inferences, we may not want the full power of a logic-based theorem prove, for example. On the other hand, representing some common-sense things in a logic can be very hard. For example in first order predicate logic we can't conclude that something is true one minute, and then later decide that it is not true after all. If we did this it would lead to a contradiction, from which we could prove anything at all! We could decide to use more complex logics, which allow this kind of reasoning - there are all sorts of logics out there, such as default logics, temporal logics and modal logics. However, another approach is to abandon the constraints that the use of a logic imposes and use a less clean, but more flexible knowledge representation language. Two such ``languages'' are structured objects and production systems. The idea of structured objects is to represent knowledge as a collection of objects and relations, the most important relations being the subclass and instance relations. The subclass relation (as you might expect) says that one class is a subclass of another, while the instance relation says that some individual belongs to some class. We'll use them so that ``X subclass Y'' means that X is a subclass of Y, not that X has a subclass Y. (Some books/approaches use the relation is a to refer to the subclass relation. So Fred Bloggs is an instance of the class representing AI3 students, while the class of AI3 students is a subclass of the class of third year students. We can then define property inheritance, so that, by default, Fred inherits all the typical attributes of AI3 students, and AI3 students inherit typical attributes of 3rd yr students. We'll go into this in much more detail below. Production systems consist of a set of if-then rules, and a working memory. The working memory represents the facts that are currently believed to hold, while the if-then rules typically state that if certain conditions hold (e.g. certain facts are in the working memory), then some action should be taken (e.g., other facts should be added or deleted). If the only action allowed is to add a fact to working memory then rules may be essentially logical implications, but

generally greater flexibility is allowed. Production rules capture (relatively) procedural knowledge in a simple, modular manner.


2.1 What to Represent? 2.2 Application of Knowledge in AI 2.3 Properties for Knowledge Representation Systems 2.4 Approaches to Knowledge Representation 2.4.1 Simple Relational Knowledge 2.4.1 Inheritable Knowledge 2.5 Inferential Knowledge 2.6 Procedural Vs. Declarative Knowledge 2.7 Issues in Knowledge Representation 2.8 The Frame Problem 2.9 Forward Vs Backward Reasoning

2.1 What to Represent?

Let us first consider what kinds of knowledge might need to be represented in AI systems: Objects -- Facts about objects in our world domain. e.g. Guitars have strings, trumpets are brass instruments. Events -- Actions that occur in our world. e.g. Steve Vai played the guitar in Frank Zappa's Band. Performance -- A behavior like playing the guitar involves knowledge about how to do things. Meta-knowledge -- knowledge about what we know. e.g. Bobrow's Robot who plan's a trip. It knows that it can read street signs along the way to find out where it is. Thus in solving problems in AI we must represent knowledge and there are two entities to deal with: Facts -- truths about the real world and what we represent. This can be regarded as the knowledge level Representation of the facts which we manipulate. This can be regarded as the symbol level since we usually define the representation in terms of symbols that can be manipulated by programs. We can structure these entities at two levels the knowledge level -- at which facts are described the symbol level -- at which representations of objects are defined in terms of symbols that can be manipulated in programs (see the following Fig. )

Fig Two Entities in Knowledge Representation English or natural language is an obvious way of representing and handling facts. Logic enables us to consider the following fact: spot is a dog as dog(spot) We could then infer that all dogs have tails with: x dog(x) p hasatail(x) We can then deduce: hasatail(Spot) Using an appropriate backward mapping function we could generate the English sentence Spot has a tail. The available functions are not always one to one but rather are many to many which is a characteristic of English representations. The sentences All dogs have tails and every dog has a tail both say that each dog has a tail but the first could say that each dog has more than one tail try substituting teeth for tails. When an AI program manipulates the internal representation of facts these new representations should also be interpretable as new representations of facts. Consider the classic problem of the mutilated chess board. Problem In a normal chess board the opposite corner squares have been eliminated. The given task is to cover all the squares on the remaining board by dominoes so that each domino covers two squares. No overlapping of dominoes is allowed, can it be done. Consider three data structures

Fig. Mutilated Checker the first two are illustrated in the diagrams above and the third data structure is the number of black squares and the number of white squares. The first diagram loses the colour of the

squares and a solution is not east to see; the second preserves the colours but produces no easier path whereas counting the number of squares of each colour giving black as 32 and the number of white as 30 yields an immediate solution of NO as a domino must be on one white square and one black square, thus the number of squares must be equal for a positive solution.

2.2 Application of Knowledge in AI

We have briefly mentioned where knowledge is used in AI systems. Let us consider a little further to what applications and how knowledge may be used. Learning -- acquiring knowledge. This is more than simply adding new facts to a knowledge base. New data may have to be classified prior to storage for easy retrieval, etc.. Interaction and inference with existing facts to avoid redundancy and replication in the knowledge and and also so that facts can be updated. Retrieval -- The representation scheme used can have a critical effect on the efficiency of the method. Humans are very good at it. Many AI methods have tried to model human (see lecture on distributed reasoning) Reasoning -- Infer facts from existing data. If a system on only knows: y Miles Davis is a Jazz Musician. y All Jazz Musicians can play their instruments well. If things like Is Miles Davis a Jazz Musician? or Can Jazz Musicians play their instruments well? are asked then the answer is readily obtained from the data structures and procedures. However a question like Can Miles Davis play his instrument well? requires reasoning. The above are all related. For example, it is fairly obvious that learning and reasoning involve retrieval etc.

2.3 Properties for Knowledge Representation Systems

The following properties should be possessed by a knowledge representation system. Representational Adequacy -- the ability to represent the required knowledge; Inferential Adequacy - the ability to manipulate the knowledge represented to produce new knowledge corresponding to that inferred from the original; Inferential Efficiency - the ability to direct the inferential mechanisms into the most productive directions by storing appropriate guides; Acquisitional Efficiency - the ability to acquire new knowledge using automatic methods wherever possible rather than reliance on human intervention. To date no single system optimises all of the above

2.4 Approaches to Knowledge Representation

We briefly survey some representation schemes. We will look at some in more detail in further chapters. y Simple relational knowledge y Inheritable knowledge

2.4.1 Simple relational knowledge

The simplest way of storing facts is to use a relational method where each fact about a set of objects is set out systematically in columns. This representation gives little opportunity for inference, but it can be used as the knowledge basis for inference engines. y Simple way to store facts. y Each fact about a set of objects is set out systematically in columns(shown below). y Little opportunity for inference. y Knowledge basis for inference engines.

Figure: Simple Relational Knowledge We can ask things like: y Who is dead? y Who plays Jazz/Trumpet etc.? This sort of representation is popular in database systems.

2.4.2 Inheritable knowledge

Relational knowledge is made up of objects consisting of y attributes y corresponding associated values. We extend the base more by allowing inference mechanisms: y Property inheritance o elements inherit values from being members of a class. o data must be organised into a hierarchy of classes (Fig. given below).

Fig. Property Inheritance Hierarchy y Boxed nodes -- objects and values of attributes of objects. y Values can be objects with attributes and so on. y Arrows -- point from object to its value. This structure is known as a slot and filler structure, semantic network or a collection of frames. The algorithm to retrieve a value for an attribute of an instance object: 1. Find the object in the knowledge base 2. If there is a value for the attribute report it 3. Otherwise look for a value of instance if none fail 4. Otherwise go to that node and find a value for the attribute and then report it 5. Otherwise search through using isa until a value is found for the attribute. 2.5 Inferential Knowledge Represent knowledge as formal logic: All dogs have tails
x dog(x) p hasatail(x)

Advantages: y A set of strict rules. o Can be used to derive more facts. o Truths of new statements can be verified. o Guaranteed correctness. y Many inference procedures available to in implement standard rules of logic.

Popular in AI systems. e.g Automated theorem proving.

2.6 Procedural Vs. Declarative Knowledge Declarative knowledge representation: y Static representation -- knowledge about objects, events etc. and their relationships and states given. y Requires a program to know what to do with knowledge and how to do it. Procedural representation: y control information necessary to use the knowledge is embedded in the knowledge itself. e.g. how to find relevant facts, make inferences etc. y Requires an interpreter to follow instructions specified in knowledge. y Knowledge encoded in some procedures o small programs that know how to do specific things, how to proceed. o e.g a parser in a natural language understander has the knowledge that a noun phrase may contain articles, adjectives and nouns. It is represented by calls to routines that know how to process articles, adjectives and nouns. Advantages: y Heuristic or domain specific knowledge can be represented. y Extended logical inferences, such as default reasoning facilitated. y Side effects of actions may be modelled. Some rules may become false in time. Keeping track of this in large systems may be tricky. Disadvantages: y Completeness -- not all cases may be represented. y Consistency -- not all deductions may be correct. e.g If we know that Fred is a bird we might deduce that Fred can fly. Later we might discover that Fred is an emu. y Modularity is sacrificed. Changes in knowledge base might have far-reaching effects. y Cumbersome control information.

2.7 Issue in Knowledge Representation

Below are listed issues that should be raised when using a knowledge representation technique: Important Attributes -- Are there any attributes that occur in many different types of problem? There are two instance and isa and each is important because each supports property inheritance. Relationships -- What about the relationship between the attributes of an object, such as, inverses, existence, techniques for reasoning about values and single valued attributes. We can consider an example of an inverse in band(John Zorn,Naked City)

This can be treated as John Zorn plays in the band Naked City or John Zorn's band is Naked City. Another representation is band = Naked City band-members = John Zorn, Bill Frissell, Fred Frith, Joey Barron, Granularity At what level should the knowledge be represented and what are the primitives. Choosing the Granularity of Representation Primitives are fundamental concepts such as holding, seeing, playing and as English is a very rich language with over half a million words it is clear we will find difficulty in deciding upon which words to choose as our primitives in a series of situations. If Tom feeds a dog then it could become: feeds(tom, dog) If Tom gives the dog a bone like: gives(tom, dog,bone) Are these the same? In any sense does giving an object food constitute feeding? If give(x, food) p feed(x) then we are making progress. But we need to add certain inferential rules. In the famous program on relationships Louise is Bill's cousin How do we represent this? louise = daughter (brother or sister (father or mother( bill))) Suppose it is Chris then we do not know if it is Chris as a male or female and then son applies as well. Clearly the separate levels of understanding require different levels of primitives and these need many rules to link together apparently similar primitives. Obviously there is a potential storage problem and the underlying question must be what level of comprehension is needed.

2.8 The Frame Problem

So far in this chapter we have seen several methods for representing knowledge that would allow us to form complex state description for search program. Another issue concerns how to represent efficient sequences of problem states that arise from a search process. For complex ill-structured problems, this can be a serious matter. Consider the world of a household robot. There are many objects and relationships in the world, and a state description must somehow include facts like on(Plant12, Table34), under(Table34,Window13), and in(Table34, Room15). One strategy is to each state description as a list of such facts. But what happens during the problem-solving process if each of those descriptions is very long? Most of the facts will not change from one state to another, yet each fact will be represented once at every node, and we will quickly run out of memory. Furthermore, we will spend the majority of out time creating

these nodes and copying these facts most of which do not change often from one node to another. For example, in the robot world, we could spend a lot of time recording. above(Ceiling, Floor) at every node. All of this is, of course, in addition to the real problem of figuring out which facts should be different at each node. This whole problem of representing the facts that change as well as those that do not is known as the frame problem. In some domains, the only hard part is representing all the facts. In others, though, figuring out which ones change is nontrivial. For example, in the robot world, there might be a table with a plant on it under the window. Suppose we move the table to the center of the room. We must also infer that the plant is now in the center of the room but that the window is not. To support this kind of reasoning, some systems make use of an explicit set of axioms called frame axioms, which describe all the things that do not change when a particular operator is applied in state n to produce state n+1. (The things that do change must be mentioned as part of the operator itself.) Thus, in the robot domain, we might write axioms such as Color(x, y, s1) move(x,s1,s2) color(x,y,s2) Which can be read as, if x has color y in state s1 and the operation of moving x is applied in state s1 to produce state s2, then the color of x in s2 is still y. Unfortunately, in any complex domain, a huge number of these axioms becomes necessary. An alternative approach is to make the assumption that the only things that change are the things that must. By must here we mean that the change is either required explicitly by the axioms that describe the operator or that it follows logically from some change that is asserted explicitly. This idea of circumscribing the set of unusual things is a very powerful one; it can be used as a partial solution to the frame problem and as a way of reasoning with incomplete knowledge. But now lets return briefly to the problem of representing a changing problem state. We could do it by simply starting with a description of the initial state and then making changes to that description as indicated by the rules we apply. This solves the problem of the wasted space and time involved in copying the information for each node. And it works fine until the first time the search has to backtrack. Then, unless all the changes that were made can simply be ignored (as they could be if, for example, they were simply additions of new theorems), we are faced with the problem of backing up to some earlier node. But how do we know what changes in the problem state description need to be undone? For example, what do we have to change to undo the effect of moving the table to the center of the room? There are two ways this problem can be solved. y Do not modify the initial state description at all. At each node, store an indication of the specific changes that should be made at this node. Whenever it is necessary to refer to the description of the current problem state. Look at the initial state description and also look back through all the nodes on the path from the start state to the current state. y Modify the initial state description as appropriate, but also record at each node an indication of what to do to undo the move should it ever be necessary to backtrack through the node. Then, whenever it is necessary to backtrack, check each node along the way and perform the indicated operations on the state description.

Sometimes, even these solutions are not enough. We might want to remember, for example, in the robot world, that before the table was moved, it was under the window and after being moved, it was in the center of the room. This can be handled by adding to the representation of each fact a specific indication of the time at which that fact was true. This indication is called a state variable. But to apply the same technique to a real-world problem. We need, for example, separate facts to indicate all the times at which the Statue of Liberty in the New York.


Forward Vs Backward Reasoning

The object of a search procedure is to discover a path through a problem space from a goal state, there are actually two directions in which such a search could proceed. y Forward, from the start states. y Backward, from the goal states. The production system model of the search process provides an easy way of viewing forward and backward reasoning as symmetric processes. Consider the problem of solving a particular instance of the 8-puzzle. The rules to be used for solving the puzzle can be written as shown in following figure. Assume the areas of the tray are numbered:

1 4 7

2 5 8

3 6 9

Square 1 empty and Square 2 contains tile n Square 2 empty and Square 1 contains tile n Square 1 empty and Square 4 contains tile n Square 4 empty and Square 1 contains tile n Square 2 empty and Square 1 contains tile n Square 1 empty and Square 2 contains tile n Using those rules we could attempt to solve the puzzle shown below.

2 1 7

8 6

3 4 5

1 8 7

2 6

3 4 5

Reason forward from the initial states. Begin by building a tree of move sequences that might be solutions by starting with the initial configuration(s) at the root of the tree. Generate the next level of the tree by finding all the rules whose left sides match the root node and using their right sides to create the new configurations. Generate the next level by taking each node generated at the previous level and applying to it all of the rules whose left sides match it. Continue until a configuration that matches the goal state is generated." Reason backward from the goal states. Begin building a tree of move sequences that might be solutions by starting with the goal configuration(s) at the root of the tree. Generate the next level of the tree by finding all the rules whose right sides match the root node. These are all the rules that, if only we could apply them, would generate the state we wanted. Use the left sides

of the rules to generate the nodes at this second level of the tree. Generate the next level of the tree by taking each node at the previous level and finding all the rules whose right sides match it. Then using the corresponding left sides to generate the new nodes. Continue until a node that matches the initial state is generated. This method of reasoning backward from the desired final state is often called goal-directed reasoning. Four factors influence the question of whether it is better to reason forward or backward y Are there more possible start states or goal states? We want to move from the smaller to the larger y In which direction is the branching factor? We want to proceed in the direction of the smaller branching factor y Will the program be asked to justify its reasoning process to the user? Proceed in the direction that corresponds to the way the user thinks y What kind of event is going to trigger a problem-solving episode? If a new fact triggers reasoning, forward chaining is better. If a query triggers reasoning, backward chain.


3.1 Logic 3.2 Introduction to Propositional Logic 3.3 Predicate Logic 3.3.1 Introduction to Predicate Logic 3.3.2 Predicate Logic: Semantics 3.4 Quantification 3.5 Well-Formed Formula For First Order Predicate Logic 3.6 From Wff to Proposition 3.7 Transcribing English to Predicate Logic Wffs 3.8 Properties of Statements 3.9 Inference Rules 3.10 Resolution 3.11 Conversion to Clausal Form 3.12 Unification

3.1 Logic
Logic is a language for reasoning. It is a collection of rules we use when doing logical reasoning. Human reasoning has been observed over centuries from at least the times of Greeks, and patterns appearing in reasoning have been extracted, abstracted, and streamlined. The foundation of the logic we are going to learn here was laid down by a British mathematician George Boole in the middle of the 19th century, and it was further developed and used in an attempt to derive all of mathematics by Gottlob Frege, a German mathematician, towards the end of the 19th century. A British philosopher/mathematician, Bertrand Russell, found a flaw in basic assumptions in Frege's attempt but he, together with Alfred Whitehead, developed Frege's work further and repaired the damage. The logic we study today is more or less along this line. In logic we are interested in true or false of statements, and how the truth/falsehood of a statement can be determined from other statements. However, instead of dealing with individual specific statements, we are going to use symbols to represent arbitrary statements so that the results can be used in many similar but different cases. The formalization also promotes the clarity of thought and eliminates mistakes. There are various types of logic such as logic of sentences (propositional logic), logic of objects (predicate logic), logic involving uncertainties, logic dealing with fuzziness, temporal logic etc. Here we are going to be concerned with propositional logic and predicate logic, which are fundamental to all types of logic.

3.2 Introduction to Propositional Logic

Propositional logic is a logic at the sentential level. The smallest unit we deal with in propositional logic is a sentence. We do not go inside individual sentences and analyze or discuss their meanings. We are going to be interested only in true or false of sentences, and major concern is whether or not the truth or falsehood of a certain sentence follows from those

of a set of sentences, and if so, how. Thus sentences considered in this logic are not arbitrary sentences but are the ones that are true or false. This kind of sentences are called propositions. Sentences considered in propositional logic are not arbitrary sentences but are the ones that are either true or false, but not both. These kinds of sentences are called propositions. If a proposition is true, then we say it has a truth value of "true"; if a proposition is false, its truth value is "false". For example, "Grass is green", and "2 + 5 = 5" are propositions. The first proposition has the truth value of "true" and the second "false". But "Close the door", and "Is it hot outside?" are not propositions. Also "x is greater than 2", where x is a variable representing a number, is not a proposition, because unless a specific value is given to x we can not say whether it is true or false, nor do we know what x represents. Similarly "x = x" is not a proposition because we don't know what "x" represents hence what "=" means. For example, while we understand what "3 = 3" means, what does "Air is equal to air" or "Water is equal to water" mean? Does it mean a mass of air is equal to another mass or the concept of air is equal to the concept of air? We don't quite know what "x = x" mean. Thus we cannot say whether it is true or not. Hence it is not a proposition. Simple sentences which are true or false are basic propositions. Larger and more complex sentences are constructed from basic propositions by combining them with connectives. Thus propositions and connectives are the basic elements of propositional logic. Though there are many connectives, we are going to use the following five basic connectives here: NOT, AND, OR, IF_THEN (or IMPLY), IF_AND_ONLY_IF. They are also denoted by the symbols: , , , , , respectively. Truth Table Often we want to discuss properties/relations common to all propositions. In such a case rather than stating them for each individual proposition we use variables representing an arbitrary proposition and state properties/relations in terms of those variables. Those variables are called a propositional variable. Propositional variables are also considered a proposition and called a proposition since they represent a proposition hence they behave the same way as propositions. A proposition in general contains a number of variables. For example (P Q) contains variables P and Q each of which represents an arbitrary proposition. Thus a proposition takes different values depending on the values of the constituent variables. This relationship of the value of a proposition and those of its constituent variables can be represented by a table. It tabulates the value of a proposition for all possible values of its variables and it is called a truth table. For example the following table shows the relationship between the values of P, Q and P Q: OR P F Q F (P F Q)




In the table, F represents truth-value false and T true. This table shows that P Q is false if P and Q are both false, and it is true in all the other cases. Meaning of the Connectives y Meaning of connectives: NOT, AND, OR, IMPLIES, IF AND ONLY IF Let us define the meaning of the five connectives by showing the relationship between the truth value (i.e. true or false) of composite propositions and those of their component propositions. They are going to be shown using truth table. In the tables P and Q represent arbitrary propositions, and true and false are represented by T and F, respectively. NOT P T F P F T

This table shows that if P is true, then ( P) is false, and that if P is false, then ( P) is true.


This table shows that (P Q) is true if both P and Q are true, and that it is false in any other case. Similarly for the rest of the tables. OR



(P F T T T



When P Q is always true, we express that by P Q. That is P Q is used when proposition P always implies proposition Q regardless of the value of the variables in them.


When P Q is always true, we express that by P Q. That is is used when two propositions always take the same value regardless of the value of the variables in them. IMPLIES P F F T Q F T F (P T T F Q)

When P Q is always true, we express that by P Q. That is P Q is used when proposition P always implies proposition Q regardless of the value of the variables in them. IF AND ONLY IF P F F T T Q F T F T (P T F F T Q)

When P Q is always true, we express that by P Q. That is is used when two propositions always take the same value regardless of the value of the variables in them. Construction of Complex Propositions Syntax of propositions First it is informally shown how complex propositions are constructed from simple ones. Then more general way of constructing propositions is given. In everyday life we often combine propositions to form more complex propositions without paying much attention to them. For example combining "Grass is green", and "The sun is red" we say something like "Grass is green and the sun is red", "If the sun is red, grass is green", "The sun is red and the grass is not green" etc. Here "Grass is green", and "The sun is red" are propositions, and form them using connectives "and", "if... then..." and "not" a little more complex propositions are formed. These new propositions can in turn be combined with other propositions to construct more complex propositions. They then can be combined to form even more complex propositions. This process of obtaining more and more complex propositions can be described more generally as follows: Let X and Y represent arbitrary propositions. Then [ X], [X Y], [X Y], [X Y], and [X Y] are propositions. Note that X and Y here represent an arbitrary proposition. This is actually a part of more rigorous definition of proposition. Example: [P -> [Q V R]] is a proposition and it is obtained by first constructing [Q V R] by applying [X V Y] to propositions Q and R considering them as X and Y, respectively, then by applying [X -> Y] to the two propositions P and [Q V R] considering them as X and Y, respectively.

Note: Rigorously speaking X and Y above are place holders for propositions, and so they are not exactly a proposition. They are called a propositional variable, and propositions formed from them using connectives are called a propositional form. However, we are not going to distinguish them here, and both specific propositions such as "2 is greater than 1" and propositional forms such as (P Q) are going to be called a proposition. For the proposition P Q, the proposition Q P is called its converse, and the proposition Q P is called its contrapositive. For example for the proposition "If it rains, then I get wet", Converse: If I get wet, then it rains. Contrapositive: If I don't get wet, then it does not rain. The converse of a proposition is not necessarily logically equivalent to it, that is they may or may not take the same truth value at the same time. On the other hand, the contrapositive of a proposition is always logically equivalent to the proposition. That is, they take the same truth value regardless of the values of their constituent variables. Therefore, "If it rains, then I get wet." and "If I don't get wet, then it does not rain." are logically equivalent. If one is true then the other is also true, and vice versa. As we are going to see in the next section, reasoning is done on propositions using inference rules. For example, if the two propositions "if it snows, then the school is closed", and "it snows" are true, then we can conclude that "the school is closed" is true. In everyday life, that is how we reason. To check the correctness of reasoning, we must check whether or not rules of inference have been followed to draw the conclusion from the premises. However, for reasoning in English or in general for reasoning in a natural language, that is not necessarily straight forward and it often encounters some difficulties. Firstly, connectives are not necessarily easily identified as we can get a flavor of that from the previous topic on variations of if then statements. Secondly, if the argument becomes complicated involving many statements in a number of different forms twisted and tangled up, it can easily get out of hand unless it is simplified in some way. One solution for that is to use symbols (and mechanize it). Each sentence is represented by symbols representing building block sentences, and connectives. For example, if P represents "it snows" and Q represents "the school is closed", then the previous argument can be expressed as [ [ P -> Q ] ^ P ] -> Q , or P -> Q P ----------------------------Q This representation is concise, much simpler and much easier to deal with. In addition today there are a number of automatic reasoning systems and we can verify our arguments in symbolic form using them. One such system called TPS is used for reasoning exercises in this course. For example, we can check the correctness of our argument using it. To convert English statements into a symbolic form, we restate the given statements using the building block sentences and the connectives of propositional logic (not, and, or, if_then, if_and_only_if), and then substitute the symbols for the building blocks and the connectives.

For example, let P be the proposition "It is snowing", Q be the proposition "I will go the beach", and R be the proposition "I have time". Then first "I will go to the beach if it is not snowing" is restated as "If it is not snowing, I will go to the beach". Then symbols P and Q are substituted for the respective sentences to obtain ~P -> Q. Similarly, "It is not snowing and I have time only if I will go to the beach" is restated as "If it is not snowing and I have time, then I will go to the beach", and it is translated as ( ~P ^ R ) -> Q.

3.3 Predicate Logic

The propositional logic is not powerful enough to represent all types of assertions that are used in computer science and mathematics, or to express certain types of relationship between propositions such as equivalence. For example, the assertion "x is greater than 1", where x is a variable, is not a proposition because you can not tell whether it is true or false unless you know the value of x. Thus the propositional logic can not deal with such sentences. However, such assertions appear quite often in mathematics and we want to do inferencing on those assertions. Also the pattern involved in the following logical equivalences cannot be captured by the propositional logic: "Not all birds fly" is equivalent to "Some birds don't fly". "Not all integers are even" is equivalent to "Some integers are not even". "Not all cars are expensive" is equivalent to "Some cars are not expensive", Each of those propositions is treated independently of the others in propositional logic. For example, if P represents "Not all birds fly" and Q represents "Some integers are not even", then there is no mechanism in propositional logic to find out the is equivalent to Q. Hence to be used in inferencing, each of these equivalences must be listed individually rather than dealing with a general formula that covers all these equivalences collectively and instantiating it as they become necessary, if only propositional logic is used. Predicate logic is a development of propositional logic, which should be familiar to you. In proposition logic a fact such as ``Alison likes waffles'' would be represented as a simple atomic proposition. Lets call it P. We can build up more complex expressions (sentences) by combining atomic propositions with the logical connectives ~ p and m . So if we had the proposition Q representing the fact ``Alison eats waffles'' we could have the facts: P Q : ``Alison likes waffles or Alison eats waffles'' P Q : ``Alison likes waffles and Alison eats waffles'' ~ Q : ``Alison doesn't eat waffles'' P p Q : ``If Alison likes waffles then Alison eats waffles''. In general, if X and Y are sentences in propositional logic, then so are X Y, X Y, ~X, X p Y, and X mY. So the following are valid sentences in the logic: P ~Q P (P pQ) (Q ~R) p P Propositions can be true or false in the world. An interpretation function assigns, to each proposition, a truth value (i.e., true or false). This interpretation function says what is true in the world. We can determine the truth value of arbitrary sentences using truth tables which

define the truth values of sentences with logical connectives in terms of the truth values of their component sentences. The truth tables provide a simple semantics for expressions in propositional logic. As sentences can only be true or false, truth tables are very simple. In order to infer new facts in a logic we need to apply inference rules. The semantics of the logic will define which inference rules are universally valid. One useful inference rule is the following (called modus ponens) but many others are possible: assertion : a, implication : a b --conclusion : b This rule just says that if a b is true, and a is true, then b is necessarily true. We could prove that this rule is valid using truth tables. Thus we need more powerful logic to deal with these and other problems. The predicate logic is one of such logic and it addresses these issues among others.

3.3.1 Introduction to Predicate Logic

The most important knowledge representation language is arguably predicate logic (or strictly, first order predicate logic - there are lots of other logics out there to distinguish between). Predicate logic allows us to represent fairly complex facts about the world, and to derive new facts in a way that guarantees that, if the initial facts were true then so are the conclusions. It is a well understood formal language, with well-defined syntax, semantics and rules of inference. The trouble with propositional logic is that it is not possible to write general statements in it, such as ``Alison eats everything that she likes''. We'd have to have lots of rules, for every different thing that Alison liked. Predicate logic makes such general statements possible. Sentences in predicate calculus are built up from atomic sentences (not to be confused with Prolog atoms). Atomic sentences consist of a predicate name followed by a number of arguments. These arguments may be any term. Terms may be: Constant symbols such as ``alison''. Variable symbols such as ``X''. For consistency with Prolog we'll use capital letters to denote variables. Function expressions such as ``father(alison)''. Function expressions consist of a function followed by a number of arguments, which can be arbitrary terms. This should all seem familiar from our description of Prolog syntax. However, although Prolog is based on predicate logic the way we represent things is slightly different, so the two should not be confused. So, atomic sentences in predicate logic include the following: y friends(alison, richard) y friends(father(fred), father(joe)) y likes(X, richard)

Sentences in predicate logic are constructed (much as in propositional logic) by combining atomic sentences with logical connectives, so the following are all sentences in predicate calculus: y friends(alison, richard) p likes(alison, richard) likes(alison, richard) V likes(alison, waffles) ((likes(alison, richard) V likes(alison, waffles)) ~likes(alison, waffles)) p likes(alison, richard) Sentences can also be formed using quantifiers to indicate how any variables in the sentence are to be treated. The two quantifiers in predicate logic are  and , so the following are valid sentences: y X bird(X) ~flies(X) i.e., there exists some bird that doesn't fly. y X (person(X) Y loves(X,Y)) i.e., every person has something that they love. A sentence should have all its variables quantified. So strictly, an expression like ``X loves(X, Y)'', though a well formed formula of predicate logic, is not a sentence. Formulae with all their variables quantified are also called closed formulae.
y y

3.3.2 Predicate Logic: Semantics

The semantics of predicate logic is defined (as in propositional logic) in terms of the truth values of sentences. Like in propositional logic, we can determine the truth value of any sentence in predicate calculus if we know the truth values of the basic components of that sentence. An interpretation function defines the basic meanings/truth values of the basic components, given some domain of objects that we are concerned with. In propositional logic we saw that this interpretation function was very simple, just assigning truth values to propositions. However, in predicate calculus we have to deal with predicates, variables and quantifiers, so things get much more complex. Predicates are dealt with in the following way. If we have, say, a predicate P with 2 arguments, then the meaning of that predicate is defined in terms of a mapping from all possible pairs of objects in the domain to a truth value. So, suppose we have a domain with just three objects in: fred, jim and joe. We can define the meaning of the predicate father in terms of all the pairs of objects for which the father relationship is true - say fred and jim. The meaning of  and  is defined again in terms of the set of objects in the domain. X S means that for every object X in the domain, S is true. X S means that for some object X in the domain, S is true. So, X father(fred, X), given our world (domain) of 3 objects (fred, jim, joe), would only be true if father(fred, X) was true for each object. In our interpretation of the father relation this only holds for X=jim, so the whole quantified expression will be false in this interpretation. This only gives a flavor of how we can give a semantics to expressions in predicate logic. The details are best left to logicians. The important thing is that everything is very precisely defined, so if use predicate logic we should know exactly where we are and what inferences are valid. We can explain the whole concept in very easy term that

To cope with deficiencies of propositional logic we introduce two new features: predicates and quantifiers. A predicate is a verb phrase template that describes a property of objects, or a relationship among objects represented by the variables. For example, the sentences "The car Tom is driving is blue", "The sky is blue", and "The cover of this book is blue" come from the template "is blue" by placing an appropriate noun/noun phrase in front of it. The phrase "is blue" is a predicate and it describes the property of being blue. Predicates are often given a name. For example any of "is_blue", "Blue" or "B" can be used to represent the predicate "is blue" among others. If we adopt B as the name for the predicate "is_blue", sentences that assert an object is blue can be represented as "B(x)", where x represents an arbitrary object. B(x) reads as "x is blue". Similarly the sentences "John gives the book to Mary", "Jim gives a loaf of bread to Tom", and "Jane gives a lecture to Mary" are obtained by substituting an appropriate object for variables x, y, and z in the sentence "x gives y to z". The template "... gives ... to ..." is a predicate and it describes a relationship among three objects. This predicate can be represented by Give( x, y, z ) or G( x, y, z ), for example. Note: The sentence "John gives the book to Mary" can also be represented by another predicate such as "gives a book to". Thus if we use B( x, y ) to denote this predicate, "John gives the book to Mary" becomes B( John, Mary ). In that case, the other sentences, "Jim gives a loaf of bread to Tom", and "Jane gives a lecture to Mary", must be expressed with other predicates.

3.4 Quantification --- Forming Propositions from Predicates

Subjects to be Learned y universe y universal quantifier y existential quantifier y free variable y bound variable y scope of quantifier y order of quantifiers Contents A predicate with variables is not a proposition. For example, the statement x > 1 with variable x over the universe of real numbers is neither true nor false since we don't know what x is. It can be true or false depending on the value of x. For x > 1 to be a proposition either we substitute a specific number for x or change it to something like "There is a number x for which x > 1 holds", or "For every number x, x > 1 holds". More generally, a predicate with variables (called an atomic formula) can be made a proposition by applying one of the following two operations to each of its variables: 1. assign a value to the variable 2. quantify the variable using a quantifier (see below). For example, x > 1 becomes 3 > 1 if 3 is assigned to x, and it becomes a true statement, hence a proposition.

In general, a quantification is performed on formulas of predicate logic (called wff ), such as x > 1 or P(x), by using quantifiers on variables. There are two types of quantifiers: universal quantifier and existential quantifier. The universal quantifier turns, for example, the statement x > 1 to "for every object x in the universe, x > 1", which is expressed as " x x > 1". This new statement is true or false in the universe of discourse. Hence it is a proposition once the universe is specified. Similarly the existential quantifier turns, for example, the statement x > 1 to "for some object x in the universe, x > 1", which is expressed as " x x > 1." Again, it is true or false in the universe of discourse, and hence it is a proposition once the universe is specified. Universe of Discourse The universe of discourse, also called universe, is the set of objects of interest. The propositions in the predicate logic are statements on objects of a universe. The universe is thus the domain of the (individual) variables. It can be the set of real numbers, the set of integers, the set of all cars on a parking lot, the set of all students in a classroom etc. The universe is often left implicit in practice. But it should be obvious from the context. The Universal Quantifier The expression: x P(x), denotes the universal quantification of the atomic formula P(x). Translated into the English language, the expression is understood as: "For all x, P(x) holds", "for each x, P(x) holds" or "for every x, P(x) holds". is called the universal quantifier, and x means all the objects x in the universe. If this is followed by P(x) then the meaning is that P(x) is true for every object x in the universe. For example, "All cars have wheels" could be transformed into the propositional form, x P(x), where:
y y

P(x) is the predicate denoting: x has wheels, and the universe of discourse is only populated by cars.

Universal Quantifier and Connective AND If all the elements in the universe of discourse can be listed then the universal quantification x P(x) is equivalent to the conjunction: P(x1)) P(x2) P(x3) ... P(xn) . For example, in the above example of x P(x), if we knew that there were only 4 cars in our universe of discourse (c1, c2, c3 and c4) then we could also translate the statement as: P(c1) P(c2) P(c3) P(c4) The Existential Quantifier The expression: xP(x), denotes the existential quantification of P(x). Translated into the English language, the expression could also be understood as: "There exists an x such that P(x)" or "There is at least one x such that P(x)" is called the existential quantifier, and x means at least one object x in the universe. If this is followed by P(x) then the meaning is that P(x) is true for at least one object x of the universe. For example, "Someone loves you" could be transformed into the propositional form, x P(x), where: y P(x) is the predicate meaning: x loves you, y The universe of discourse contains (but is not limited to) all living creatures. Existential Quantifier and Connective OR

If all the elements in the universe of discourse can be listed, then the existential quantification xP(x) is equivalent to the disjunction: P(x1) P(x2) P(x3) ... P(xn). For example, in the above example of x P(x), if we knew that there were only 5 living creatures in our universe of discourse (say: me, he, she, rex and fluff), then we could also write the statement as: P(me) P(he) P(she) P(rex) P(fluff) An appearance of a variable in a wff is said to be bound if either a specific value is assigned to it or it is quantified. If an appearance of a variable is not bound, it is called free. The extent of the application(effect) of a quantifier, called the scope of the quantifier, is indicated by square brackets [ ]. If there are no square brackets, then the scope is understood to be the smallest wff following the quantification. For example, in x P(x, y), the variable x is bound while y is free. In x [ y P(x, y) Q(x, y) ] , x and the y in P(x, y) are bound, while y in Q(x, y) is free, because the scope of y is P(x, y). The scope of x is [ y P(x, y) Q(x, y) ] . How to read quantified formulas When reading quantified formulas in English, read them from left to right. x can be read as "for every object x in the universe the following holds" and x can be read as "there exists an object x in the universe which satisfies the following" or "for some object x in the universe the following holds". Those do not necessarily give us good English expressions. But they are where we can start. Get the correct reading first then polish your English without changing the truth values. For example, let the universe be the set of airplanes and let F(x, y) denote "x flies faster than y". Then x y F(x, y) can be translated initially as "For every airplane x the following holds: x is faster than every (any) airplane y". In simpler English it means "Every airplane is faster than every airplane (including itself !)". x y F(x, y) can be read initially as "For every airplane x the following holds: for some airplane y, x is faster than y". In simpler English it means "Every airplane is faster than some airplane". x y F(x, y) represents "There exist an airplane x which satisfies the following: (or such that) for every airplane y, x is faster than y". In simpler English it says "There is an airplane which is faster than every airplane" or "Some airplane is faster than every airplane". x y F(x, y) reads "For some airplane x there exists an airplane y such that x is faster than y", which means "Some airplane is faster than some airplane". Order of Application of Quantifiers When more than one variables are quantified in a wff such as y x P( x, y ), they are applied from the inside, that is, the one closest to the atomic formula is applied first. Thus
y x P( x, y ) reads y [ x P( x, y ) ] , and we say "there exists an y such that for every x, P( x, y ) holds" or "for some y, P( x, y ) holds for every x". The positions of the same type of quantifiers can be switched without affecting the truth value as long as there are no quantifiers of the other type between the ones to be interchanged. For example x y z P(x, y , z) is equivalent to y x z P(x, y , z), z y x P(x, y , z), etc. It is the same for the universal quantifier. However, the positions of different types of quantifiers can not be switched.

For example

y P( x, y ) is not equivalent to

x P( x, y ). For let P( x, y ) represent x < x y P( x, y ) reads "for every

y for the set of numbers as the universe, for example. Then

number x, there is a number y that is greater than x", which is true, while y x P( x, y ) reads "there is a number that is greater than every (any) number", which is not true.

3.5 Well-Formed Formula for First Order Predicate Logic --- Syntax Rules
Subjects to be Learned y wff (well formed formula) y atomic formula y syntax of wff Contents Not all strings can represent propositions of the predicate logic. Those which produce a proposition when their symbols are interpreted must follow the rules given below, and they are called wffs(well-formed formulas) of the first order predicate logic. Rules for constructing Wffs A predicate name followed by a list of variables such as P(x, y), where P is a predicate name, and x and y are variables, is called an atomic formula. Wffs are constructed using the following rules: 1. True and False are wffs. 2. Each propositional constant (i.e. specific proposition), and each propositional variable (i.e. a variable representing propositions) are wffs. 3. Each atomic formula (i.e. a specific predicate with variables) is a wff. 4. If A, B, and C are wffs, then so are A, (A B), (A B), (A B), and (A B). 5. If x is a variable (representing objects of the universe of discourse), and A is a wff, then so are x A and x A . (Note : More generally, arguments of predicates are something called a term. Also variables representing predicate names (called predicate variables) with a list of variables can form atomic formulas. But we do not get into that here. Those who are interested click here.) For example, "The capital of Virginia is Richmond." is a specific proposition. Hence it is a wff by Rule 2. Let B be a predicate name representing "being blue" and let x be a variable. Then B(x) is an atomic formula meaning "x is blue". Thus it is a wff by Rule 3. above. By applying Rule 5. to B(x), xB(x) is a wff and so is xB(x). Then by applying Rule 4. to them x B(x) x B(x) is seen to be a wff. Similarly if R is a predicate name representing "being round". Then R(x) is an atomic formula. Hence it is a wff. By applying Rule 4 to B(x) and R(x), a wff B(x) R(x) is obtained. In this manner, larger and more complex wffs can be constructed following the rules given above. Note, however, that strings that can not be constructed by using those rules are not wffs. For example, xB(x)R(x), and B( x ) are NOT wffs, NOR are B( R(x) ), and B( x R(x) ) . One way to check whether or not an expression is a wff is to try to state it in English. If you can translate it into a correct English sentence, then it is a wff. More examples: To express the fact that Tom is taller than John, we can use the atomic formula taller(Tom, John), which is a wff. This wff can also be part of some compound statement such as taller(Tom, John) taller(John, Tom), which is also a wff.

If x is a variable representing people in the world, then taller(x,Tom),

x taller(x,Tom),

taller(x,Tom), x y taller(x,y) are all wffs among others. However, taller( x,John) and taller(Tom Mary, Jim), for example, are NOT wffs.

3.6 From Wff to Proposition

Subjects to be Learned y interpretation y satisfiable wff y invalid wff (unsatisfiable wff) y valid wff y equivalence of wffs Contents Interpretation A wff is, in general, not a proposition. For example, consider the wff x P(x). Assume that P(x) means that x is non-negative (greater than or equal to 0). This wff is true if the universe is the set {1, 3, 5}, the set {2, 4, 6} or the set of natural numbers, for example, but it is not true if the universe is the set {-1, 3, 5}, or the set of integers, for example. Further more the wff x Q(x, y), where Q(x, y) means x is greater than y, for the universe {1, 3, 5} may be true or false depending on the value of y. As one can see from these examples, the truth value of a wff is determined by the universe, specific predicates assigned to the predicate variables such as P and Q, and the values assigned to the free variables. The specification of the universe and predicates, and an assignment of a value to each free variable in a wff is called an interpretation for the wff. For example, specifying the set {1, 3, 5} as the universe and assigning 0 to the variable y, for example, is an interpretation for the wff x Q(x, y), where Q(x, y) means x is greater than y. x Q(x, y) with that interpretation reads, for example, "Every number in the set {1, 3, 5} is greater than 0". As can be seen from the above example, a wff becomes a proposition when it is given an interpretation. There are, however, wffs which are always true or always false under any interpretation. Those and related concepts are discussed below. Satisfiable, Unsatisfiable and Valid Wffs A wff is said to be satisfiable if there exists an interpretation that makes it true, that is if there are a universe, specific predicates assigned to the predicate variables, and an assignment of values to the free variables that make the wff true. For example, x N(x), where N(x) means that x is non-negative, is satisfiable. For if the universe is the set of natural numbers, the assertion x N(x) is true, because all natural numbers are non-negative. Similarly x N(x) is also satisfiable. However, x [N(x) N(x)] is not satisfiable because it can never be true. A wff is called invalid or unsatisfiable, if there is no interpretation that makes it true. A wff is valid if it is true for every interpretation*. For example, the wff valid for any predicate name P , because
x P(x) x P(x) is

x P(x) is the negation of

x P(x).

However, the wff x N(x) is satisfiable but not valid. Note that a wff is not valid iff it is unsatisfiable for a valid wff is equivalent to true. Hence its negation is false. Equivalence Two wffs W1 and W2 are equivalent if and only if W1 W2 is valid, that is if and only if W1 W2 is true for all interpretations. For example x P(x) and x P(x) are equivalent for any predicate name P . So are x [ P(x) Q(x) ] and [ x P(x) x Q(x) ] for any predicate names P and Q . y To be precise, it is not for every interpretation but for the ones that "make sense". For example you don't consider the universe of set of people for the predicate x > 1 as an interpretation. Also an interpretation assigns a specific predicate to each predicate variable. A rigorous definition of interpretation etc. are, however, beyond the scope of this course.

3.7 Transcribing English to Predicate Logic wffs

Subjects to be Learned y Translating English sentences to wff Contents English sentences appearing in logical reasoning can be expressed as a wff. This makes the expressions compact and precise. It thus eliminates possibilities of misinterpretation of sentences. The use of symbolic logic also makes reasoning formal and mechanical, contributing to the simplification of the reasoning and making it less prone to errors. Transcribing English sentences into wffs is sometimes a non-trivial task. In this course we are concerned with the transcription using given predicate symbols and the universe. To transcribe a proposition stated in English using a given set of predicate symbols, first restate in English the proposition using the predicates, connectives, and quantifiers. Then replace the English phrases with the corresponding symbols. Example: Given the sentence "Not every integer is even", the predicate "E(x)" meaning x is even, and that the universe is the set of integers, first restate it as "It is not the case that every integer is even" or "It is not the case that for every object x in the universe, x is even." Then "it is not the case" can be represented by the connective " ", "every object x in the universe" by " x", and "x is even" by E(x). Thus altogether wff becomes x E(x). This given sentence can also be interpreted as "Some integers are not even". Then it can be restated as "For some object x in the universe, x is not integer". Then it becomes x E(x). More examples: A few more sentences with corresponding wffs are given below. The universe is assumed to be the set of integers, E(x) represents x is even, and O(x), x is odd. "Some integers are even and some are odd" can be translated as x E(x) x O(x) "No integer is even" can go to x E(x) "If an integer is not even, then it is odd" becomes x[ E(x) O(x)]

"2 is even" is E(2) More difficult translation: In these translations, properties and relationships are mentioned for certain type of elements in the universe such as relationships between integers in the universe of numbers rather than the universe of integers. In such a case the element type is specified as a precondition using if_then construct. Examples: In the examples that follow the universe is the set of numbers including real numbers, and complex numbers. I(x), E(x) and O(x) representing "x is an integer", "x is even", and "x is odd", respectively. "All integers are even" is transcribed as x [ I(x) E(x)] It is first restated as "For every object in the universe (meaning for every number in this case) if it is integer, then it is even". Here we are interested in not any arbitrary object(number) but a specific type of objects, that is integers. But if we write x it means "for any object in the universe". So we must say "For any object, if it is integer .." to narrow it down to integers. "Some integers are odd" can be restated as "There are objects that are integers and odd", which is expressed as x [ I(x) E(x)] For another interpretation of this sentence see a note "A number is even only if it is integer" becomes x [ E(x) I(x)] "Only integers are even" is equivalent to "If it is even, then it is integer". Thus it is translated to x [ E(x) I(x)] Proving Things in Predicate Logic To prove things in predicate calculus we need two things. First we need to know what inference rules are valid - we can't keep going back to the formal semantics when trying to draw a simple inference! Second we need to know a good proof procedure that will allow us to prove things with the inference rules in an efficient manner. When discussing propositional logic we noted that a much used inference rule was modus ponens: A, ApB --B This rule is a sound rule of inference for predicate logic. Given the semantics of the logic, if the premises are true then the conclusions are guaranteed true. Other sound inference rules include modus tollens (if A p B is true and B is false then conclude ~ A), and-elimination (if A B is true then conclude both A is true and B is true), and lots more. In predicate logic we need to consider how to apply these rules if the expressions involved have variables. For example we would like to be able to use the facts X (man(X) p mortal(X)) and man(socrates) and conclude mortal(socrates). To do this we can use modus ponens, but allow universally quantified sentences to be matched with other sentences (like in Prolog). So, if we

have a sentence X A p B and a sentence C then if A and C can be matched or unified then we can apply modus ponens. Representing Things in Predicate Logic Your average AI programmer/researcher may not need to know the details of predicate logic semantics or proof theory, but they do need to know how to represent things in predicate logic, and what expressions in predicate logic mean. Formally we've already gone through what expressions mean, but it may make more sense to give a whole bunch of examples. This section will just give a list of logical expressions paired with English descriptions, then some unpaired logical or English expressions - you should try and work out for yourself how to represent the English expressions in Logic, and what the Logic expressions mean in English. There may be exam questions of this sort. y   p m x Table(x) ~numberoflegs (x,4) ``There is some table that doesn't have 4 legs'' x (macintosh(x) p ~realcomputer(x)) ``No macintosh is a real computer'' or ``If something is a macintosh then its not a real computer'' x glaswegian(x) p (supports(x,rangers) V supports(x,celtic)) ``All Glaswegians support either Celtic or Rangers'' existsXsmall(x) on(x, table) ``There is something small on the table'' y ``All elephants are grey'' y ``Every apple is either green or yellow'' y ``There is some student who is intelligent'' y x red(x) on(x,table) p small(x)
y ~x brusslesprout(x) lasiy(x) [Note: When asked to translate English statements into predicate logic you should NOT use set

expressions. So I don't want things like. x : Xecarrols : orange(x)]

3.8 Properties of Statements

Satisfiable A statement is satisfiable if there is some interpretation for which it is true. For exp P is satisfiable because we can assign either true to P or false to P. Contradiction A sentence is contradictory (unsatisfiable) if there is no interpretation for which it is true. For exp P & ~P is always contradiction since every interpretation results in a value of false. Valid A sentence is valid if it is true for every interpretation. Valid sentence are also called tautologies. For exp P V ~P is always valid since every interpretation results in a value of true. Equivalence. Two sentences are equivalent if they have the same truth-value under every interpretation. For exp P & ~(~P) are equivalent since each has the same truth values under every interpretation. Logical consequences A sentence is a logical consequence of another is it is satisfied by all interpretations which satisfy the first. For exp P is a logical consequence of (P & Q) since any interpretation for which (P & Q) is true, P is also true.

SOME EQUIVALENCE LAW Idempotent Laws Commutative Laws Distributive Laws Associative Laws Absorptive Laws De Morgans Laws Conditional elimination Bi-conditional elimination P P |P PP|P PQ|QP PQ|QP P (Q R) | (P Q) (P R) P (Q R) | (P Q) (P R) P (Q R) | (P Q) R P (Q R) | (P Q) R A (A B) | A A (A B) | A
(P Q) | P Q (P Q) | P Q

P p Q | P Q P m Q = (P Q) & (Q P)

TRUTH TABLE FOR EQUIVALANENT SENTENCES P true true false false Q true false true false ~P false false true true (~PvQ) true false true true (P Q) true false true true (Q P) true true false true (P Q)&(Q P) true false false true

3.9 Inference Rules

The inference rules of PL provide the means to perform logical proofs or deductions. There are two categories of inference rules A. Deductive inference Rules are those inference rules, which are certain. The followings are the deductive inference rules 1. Modus ponens : From P and P Q infer Q. This is sometimes written as Assertion: P Implication: P Q Conclusion: Q Here is an example of an argument that fits the form modus ponens: If democracy is the best system of government, then everyone should vote. Democracy is the best system of government. Therefore, everyone should vote. 2. Modus tollens : Assertion:


Implication: P Q Conclusion: ~Q If there is fire here, then there is oxygen here. There is no oxygen here. Therefore, there is no fire here. 3. Chain rules : From P Q, and Q R, infer P R. Or P Q Q R P R For example, Given : (programmer likes LISP) (programmer hate COBOL) And : (programmer hates COBOL) (Programmer likes recursion) Conclusion : (programmer likes LISP) (Programmer likes recursion)

4. Substitution if s is a valid sentence, s derived from s by consistent substitution of propositions in s, is also valid. For example, the sententence P V ~p is valid.

B. Non-Deductive inference Rules are those inference rules, which are not certain. 1. Abductive Inference : Abductive inference is based on the use of known casual knowledge to explain or justify a (possibly invalid) conclusion. Given the truth of proposition Q and the implication P Q, conclude P. For example, people who have had too much to drink tend to stagger when they walk. Therefore, it is not unreasonable to conclude that a person who is staggering is drunk even though this may be an incorrect conclusion. People may stagger when they walk for other reasons, including dizziness from twirling in circles or from some physical problem. We may represent abductive inference with the following, where the c over the implication arrow is meant to imply a possible causal relationship. assertion Q implication P Q conclusion P abductive inference is useful when known causal relations are likely and deductive inferencing is not possible for lack of facts. 2. Inductive Inference : Inductive inference is based on the assumption that a recurring pattern, observed for some event or entity, implies that the pattern is true for all entities in the class. For exp, after seeing a few white swans, we incorrectly infer that all swans are white We can represent inductive inference using the following description. P(a1),,P(ak) x P(x)

Inductive inference, of course, is not a valid form of inference, since it is not usually the case that all objects of a class can be verified as having a particular property. Even so, this is an important and commonly used form of inference. 3. Analogical Inference : Analogical inference is a form of experiential inference. Situation or entities which are alike in some respects tend to be similar in other respects. Thus, when we find that situation(object) A is related in certain ways to B, and A is similar in some context to A, we conclude that B has a similar relation to A in this context. We depict this form of inference with the following description, where the r above the implication symbol mean is related to. P Q P Q Analogical inference, like abductive and inductive is a useful but invalid form of commonsense inference.

3.10 Resolution
The most well known general proof procedure for predicate calculus is resolution. Resolution is a sound proof procedure for proving things by refutation - if you can derive a contradiction from ~P then P must be true. In resolution theorem proving, all statements in the logic are transformed into a normal form involving disjunctions of atomic expressions or negated atomic expressions (e.g., ~dog(X) V animal(X)). This allows new expressions to be deduced using a single inference rule. Basically, if we have an expression A1 v A2 ...v An v ~C and an expression B1 v B2 ...v Bm v C then we can deduce a new expression A1 v A2 ...v An v B1 v B2 ...v Bm. This single inference rule can be applied in a systematic proof procedure. Resolution is a sound proof procedure. If we prove something using it we can be sure it is a valid conclusion. However, there are many other things to worry about when looking at a proof procedure. It may not be complete (i.e., we may not be able to always prove something is true even if it is true) or decidable (the procedure may never halt when trying to prove something that is false). Variants of resolution may be complete, but no proof procedure based on predicate logic is decidable. And of course, it may just not be computationally efficient. It may eventually prove something, but take such a long time that it is just not usable. The efficiency of a proof will often depend as much on how you formulate your problem as on the general proof procedure used, but it is still an important issue to bear in mind. Resolution is very simple. Given two clauses C1 and C2 with no variables in common, if there is a literal l1 in C1 which is a complement of a literal l2 in C2, both l1 and l2 are deleted and a disjuncted C is formed from the remaining reduced clauses. The clause C is called the resolvent of C1 and C2. For example, to resolve the two clauses (~P V Q) and (~Q V R) we write ~P V Q, ~Q V R ~P V R several types of resolution are possible depending on the number and types of parents. We define a few of these types below.

Binary Resolution : Two clauses having complementary literals are combined as disjuncts to produce a single clause after deleting the complementary literals. For example, the binary resolvent of ~P(x,a) V Q(x) and ~Q(b) V R(x) is just ~P(b,a) V R(b) Unit resulting (UR) resolution : A number of clauses are resolved simultaneously to produce a unit clause. All except one of the clauses are unit clauses, and that one clause has exactly one more literal than the total number of unit clauses. For example, resolving the set {~MARRIED(x,y) V ~MOTHER(x,z) V FATHER(y,z), MARRIED(sue,joe), ~FATHER(joe,bill)} Where the substitution F = { sue/x, joe/y, bill/z } is used, resulting in the unit clause ~MOTHER(sue,bill). Linear resolution : When each resolved clause Ci is a parent to the clause Ci+1 (i = 1,2 .,n-1) the process is called linear resolution. For example, given a set S of clauses with C0 S, Cn is derived by a sequence of resolutions, C0 with some clause B0 to get C1, then C1 with some clause B1 to get C2, and so on until Cn has been derived. Linear input resolution : if one of the presents in linear resolution is always from the original set of clauses (the Bi), we have linear input resolution. For example, given the set of clauses S = { P V Q, ~P V Q, P V ~Q, ~P V ~Q} let C0 = (P V Q). Choosing B0 = ~P V Q from the set S and resolving this with C0 we obtain the resolvent Q = C1. B1 must now be chosen from S and the resolvent of C1 and B1 becomes C2 and so on.


As noted earlier, we are interested in mechanical inference by programs using symbolic FOPL expressions. One method we shall examine is called resolution. It requires that all statements be converted into a normalized clausal form. To transform a sentence into clausal form requires the following steps: Step 1. Eliminate all implication and equivalency connectives (use ~P V Q in place of P Q) and (~P V Q) & (~Q V P) in place of P m Q. Step 2. Move all negations in to immediately precede an atom (use P in place of ~(~P), and DeMorgans laws, x ~F[x] in place of ~(x) F[x] and x ~F[x] in place of ~(x) F[x]). Step 3. Rename variables, if necessary, so that all quantifiers have different variable assignments; that is, rename variables so that variables bound by one quantifier are not the same as variables bound by a different quantifier. For example, in the expression x (P(x) (y Q(y))). Step 4. Skolemize by replacing all existentially quantified variables with Skolem functions as described below, and deleting the corresponding existential quantifiers. We describe the process of eliminating the existential quantifiers through a substitution process. This process requires that all such variables be replaced by something called Skolem functions, arbitrary functions which can always assume a correct value required of an existentially quantified variable. For simplicity in what follows, assume that all quantifiers have been properly moved to the left side of the expression, and each quantifies a different variable. Skolemization, the replacement

of existentially quantified variables with Skolem function and deletion of the respective quantifiers, is then accomplished as follows: 1. If the first (leftmost) quantifier in an expression is an existential quantifier, replace all occurrences of the variable it quantifies with an arbitrary constant not appearing elsewhere in the expression and deleting the quantifier. This same procedure should be followed for all other existential quantifiers not preceded by a universal quantifier, in each case, using different constant symbols in the substitution. 2. For each existential quantifier that is preceded by one or more universal quantifiers ( is within the scope of one or more universal quantifiers), replace all occurrences of the existentially quantified variable by a function symbol not appearing elsewhere in the expression. The arguments assigned to the function should match all the variables appearing in each universal quantifier which precedes the existential quantifier. This existential quantifier should then be deleted. The same process should be repeated for each remaining existential quantifier using a different function symbol and choosing function arguments that correspond to all universally quantified variables that precede the existentially quantified variable being replaced. An example will help to clarify this process. Given the expression x v x y P(f(u),v,x,y) Q(u,v,y) the Skolem form is determined as v x P(f(a),v,x,g(v,x)) Q(a,v,g(v,x)). In making the substitutions, it should be noted that the variable you appearing after the first existential quantifier has been replaced in the second expression by the arbitrary constant a. This constant did not appear elsewhere in the first expression. The variable y has been replaced by the function symbol g having the variable v and x as arguments, since both of these variables are universally quantified to the left of the existential quantifier for y. Replacement of y by an arbitrary function with arguments v and x is justified on the basis that y, following v and x, may be functionally dependency. Step 5. Move all universal quantifiers to the left of the expressions and put the expression on the right into CNF. Step 6. Eliminate all universal quantifiers and conjunctions since they are retained implicitly. The resulting expressions (the expressions previously connected by the conjunctions) are clauses and the set of such expressions is said to be in clausal form. As an example of this process, let us convert the expression x y (z P(f(x),y,z) (u Q(x,u) & v R(y,v))). into clausal form. We have after application of step 1 x y (~(z) P(f(x),y,z) V (u Q(x,u) & (v) R(y,v))). After application of step 2 we obtain x y (z ~P(f(x),y,z) V (u Q(x,u) & (v) R(y,v))). After application of step 4 (step 3 is not required) y (~P(f(a),y,g(y)) V Q(a,h(y)) & R(y,l(y))). After application of step 5 the result is y((~P(f(a),y,g(y)) V Q(a,h(y)) & (~P(f(a),y,g(y)) V R(y,l(y))). Finally, after application of step 6 we obtain the clausal form ~P(f(a),y,g(y)) V Q(a,h(y))

~P(f(a),y,g(y) V R(y,l(y))

3.12 Unification
Resolution works on the principle of identifying complementary literals in two clauses and deleting them thereby forming a new literal (the resolvent). The process is simple and straightforward when one has identical literals. In other words, for clauses containing no variables, resolution is easy. There are three major types of substitutions, viz. 1. Substitution of a variable by a constant. 2. Substitution of a variable by another variable. 3. Substitution of a variable by a function that does not contain the same variable. A substitution that makes two clauses resolvable is called unifier and the process of identifying such unifier is carried out by the unification algorithm. We can also define the same in another way that any substitution that makes two or more expressions equal is called a unifier for the expressions. The unification algorithm tries to find out the Most General Unifier (MGU) between a given set of atomic formulae. For example, to unify P(f(a,x),y,y) and P(x,b,z) we first rename variables so that the two predicates have no variables in common. This can be done by replacing the x in the second predicate with u to given P(u,b,z). Next, we compare the two symbol-by-symbol from left to right until a disagreement is found. Disagreement can be between two different variables, a nonvariable term and a variable, or two nonvariable terms if no disagreement is found, the two are identical and we have succeeded. If a disagreement is found and both are nonvariable terms, unification is impossible, so we have failed. If both are variable, one is replaced throughout by other. Finally, if the disagreement is a variable and a nonvariable term, the variable is replaced by the entire term. Of course, in this last step, replacement is possible only if the term does not contain the variable that is being replaced. This matching process is repeated until the two are unified or until a failure occurs. For the two predicate P. above, a disagreement is first found between the term f(a,x) and variable u. Since f(a.x) does not contain the variable u, we replace u with f(a,x) everywhere it occurs in the literal. This gives a substitution set of { f(a,x)/u } and the partially matched predicates P (f(a,x),y,y) and P (f(a,x),b,z). Proceeding with the match, we find the next disagreement pair, y and b, a variable and term, respectively. Again, we replace the variable y with the term b and update the substitution list to get { f(a,x)/u, b/y }. The final disagreement pair is two variables. Replacing the variable in the second literal with the first we get the substitution set { f(a,x)/u, b/y, y/z } or, equivalently, { f(a,x/u, b/y, b/z}. Note that this procedure can always give the most general unifier.


4.1 Semantic Network 4.2 Conceptual Graph 4.3 Frame Structures 4.4 Conceptual Dependency 4.5 Scripts


Network representation gives a pictorial presentation of objects, their attributes and relationships that exist between them and other entities. A semantic network or a semantic net is a structure for representing knowledge as a pattern of interconnected nodes and arcs. It is also defined as a graphical representation of knowledge. The objects under consideration serve as nodes and the relationships with another nodes give the arcs. The following rules about nodes and arcs generally apply to most of the semantic networks. 1. Nodes in the semantic net represent either o Entities o Attributes o State or o Events 2. Arcs in the net give the relationship between the nodes and labels on the arc specify what type of relationship actually exists. Using a simple semantic net, it is possible to add more knowledge by linking other objects with different relationships. Following figure shows this















From this, it is possible for us to say that a scooter is a two-wheeler and it is a moving vehicle. The network also shows that a moving vehicle needs an engine (could be petrol diesel or any engine), a fuel system to sustain the engine running, an electrical system for its lights, horn and also for initial ignition (in case of petrol vehicles) and brakes (of course, very important).

Unlike FOPL, there is neither generally accepted syntax nor semantics for associative networks. Such rules tend to be designer dependent and vary greatly from one implementation to another. Classification of Nodes in a Semantic Net Generally, the nodes in the semantic net are classified as y Generic nodes. y Individual or instance nodes. A Generic node is a very general node. In the above fig, for the semantic network of Movingvehicle, the Two-wheeler is a generic node because many two-wheeler exist. On the contrary, individual or instance node explicitly state that they are specific node. In the above figure Scooter is an individual node. It is a very specific instance of the two-wheeler. A number of arc relations have become common among users. They include such predicate as is-a, member-of, subset-of, ako (a kind of), has-parts, instance-of, agent etc. Less common arcs have also been used to express modality relation (time, manner, mood), linguistics case relations (theme, source, goal), logical connectives (or, not, and, implies), quantifier (all, some), set relations (superset, subset, member), attributes, and quantification (ordinal, count). One particular arc or link, the is-a link, has taken on a special meaning. It signifies that scooter is a two-wheeler and motorbike is a two-wheeler. Is-a relationship occurs in many representations of worlds. Bill is a student, a car is a furry animal, a tree is a plant, and so on. The is-a link is most often used to represent the fact than an object is of a certain type (predication) or to express the fact that one type is a subtype of another (for example, conditional quantification). Semantic Network structure permits the implementation of property inheritance, a form of inference. Nodes, which are members or subsets of other nodes, may inherit properties from their higher-level ancestor nodes. For example, from the following fig. It is possible to infer that a mouse has hair and drinks milk. Hair mammal is-a rodent is-a mouse milk


Although there are no commonly accepted standards for a syntax and semantics for associative networks, we present an approach in this section, which we feel may at least become a de facto standard in the future. It is based on the use of the conceptual graph as a primitive building block for semantic networks. A conceptual graph is a graphical portrayal of a mental perception which consists of basic or primitive concepts and the relationships that exist between the concepts. A single conceptual graph is roughly equivalent to a graphical diagram of a natural language sentence where the words are depicted as concepts and relationships. Conceptual graphs may be regarded as

formal building blocks for semantic network which, when linked together in a coherent way, form a more complex knowledge structure. An example of such a graph which represents the sentence Ram is eating soup with a spoon is depicted as A plumber is carrying a pipe.







In this figure concepts are enclosed in boxes and relations between the concepts are enclosed in ovals. The direction of the arrow corresponds to the order of the arguments in the relation they connect. The last or nth arc (argument) points away from the circle relation and all other arcs point toward the relation. Concept symbols refer to entities, actions, properties, or events in the world. A concept may be individual or generic. Individual concepts have a type field followed by a referent field. The concept [PERSON:ram] has type PERSON and referent ram. Referent like ram and food in figure are called individual concepts since they refer to specific entities. EAT and SPOON have no referent fields since they are generic concepts which refer to unspecified entities. Concepts like AGENT, OBJECT, INSTRUMENT, and PART are obtained from a collection of standard concepts. New concepts and relations can also be defined from these basic ones. A linear conceptual graph, which is easier to present as text can also be given. The linear form equivalent to the above sentence is [PERSON:ram] (AGENT) [EAT](OBJECT) [FOOD:soup] (INSTRUMENT) (SPOON) where square bracket have replaced concept boxes and parentheses have replaced relation circles. Some Examples of Conceptual Graphs are Example 1. "John is going to Boston by bus."

In DF, concepts are represented by rectangles: [Go], [Person: John], [City: Boston], and [Bus]. Conceptual relations are represented by circles or ovals: (Agnt) relates [Go] to the agent John, (Dest) relates [Go] to the destination Boston, and (Inst) relates [Go] to the instrument bus.

Above figure could be read as three English sentences: y Go has an agent which is a person John. y Go has a destination which is a city Boston. y Go has an instrument which is a bus. The linear form for CGs is intended as a more compact notation than DF, but with good human readability. It is exactly equivalent in expressive power to the abstract syntax and the display form. Following is the LF for above Figure : [Go](Agnt)->[Person: John] (Dest)->[City: Boston] (Inst)->[Bus]. In this form, the concepts are represented by square brackets instead of boxes, and the conceptual relations are represented by parentheses instead of circles. A hyphen at the end of a line indicates that the relations attached to the concept are continued on subsequent lines. Example 2. A person is between a rock and a hard place.

. In LF, above Figure may be represented in the following form: [Person]<-(Betw)<-1-[Rock] <-2-[Place]->(Attr)->[Hard]. Example 3. Tom believes that Mary wants to marry a sailor A conceptual graph containing a nest of two contexts

In the above Figure, Tom is the experiencer (Expr) of the concept [Believe], which is linked by the theme relation (Theme) to a proposition that Tom believes. The proposition box contains another conceptual graph, which says that Mary is the experiencer of [Want], which has as theme a situation that Mary hopes will come to pass. That situation is described by another nested graph, which says that Mary (represented by the concept [&top;]) marries a sailor. The dotted line, called a coreference link, shows that the concept [&top;] in the situation box refers to the same individual as the concept [Person: Mary] in the proposition box. Following is the linear form of Figure 4: [Person: Tom]<-(Expr)<-[Believe]->(Thme)[Proposition: [Person: Mary *x]<-(Expr)<-[Want]->(Thme)[Situation: [?x]<-(Agnt)<-[Marry]->(Thme)->[Sailor] ]].


Frame were first introduced by Marvin Minsky (1975) as a data structure to represent a mental model of a stereotypical situation such as driving a car, attending a meeting, or eating in a restaurant. Knowledge about an object or event is stored together in memory as a unit. Then, when a new situation is encountered, an appropriate frame is selected from memory for use in reasoning about the situation. Frame are general record-like structure which consist of a collection of slots and slot values. The slots may be of any size and type. Slots typically have names and values or sub fields called facets. Facets may also have names and any number of values. A general frame template structure is illustrated in following figure (<frame name> (<slot1> (<facet1><value1><valuek1>) (<facet2><value1><valuek2>) . . . (<slot2> (<facet 1><value1><valuekm>)

. . .) An example of a simple frame for Ram is depicted in following figure (ram (PROFESSION (VALUE professor)) (AGE (VALUE 40)) (WIFE (VALUE sita)) (CHILDREN (VALUE love kush)) (ADDRESS (STREET (VALUE 7)) (CITY (VALUE audhya)) (STATE (VALUE uttarpradesh)) (ZIP (VALUE 124507)))) From the above figure it will be seen that a frame may have any number of slots, and a slot may have any number of facets, each with any number of values. This gives a very general framework from which to build a variety of knowledge structures. The slots in a frame specify general or specific characteristics of the entity for which the frame represents, and sometimes they include instructions on how to apply or use the slot values. Typically, a slot contains information such as attribute value pairs, default values, conditions for filling a slot, pointers to other related frames, and procedures that are activated when needed for different purposes. Facets (subslots) describe some knowledge or procedures about the attribute in the slot. May takes many form such as: y a constraint value; for example, the slot 'age', would be constraint age has to be an integer between 0 and 120 y a default value; for example, unless there is contrary evidence it is assumes that all people like sambal belacan y If-added Procedure: Executes when new information is placed in the slots. y If-removed Procedure: Executes when information is deleted from the slot. y If-needed Procedure: Executes when new information is needed from the slot, but the slot is empty. y If-changed Procedure: Executes when information changes. Procedural attachments are called demons. They are used to derive slot values. Important aspects of procedural attachments are that they can be used to direct reasoning process. Taking another example (ford (AKO (VALUE car)) (COLOR (VALUE silver)) (MODEL (VALUE 4-door)) (GAS-MILELAGE (DEFALUT fget)) (RANGE (VALUE if-needed))

(WEIGHT (VALUE 2500)) (FUEL-CAPACITY (VALUE 18))) The Ford frame illustrated in above figure has attribute-value slots (COLOR: silver, MODEL: 4door, and the like), a slot; which takes default values for GAS-MILAGE, and a slot with an attached if-needed procedure. The value fget in the GAS-MILAGE slot is a function call to fetch a default value from another frame such as the general car frame for which ford is a-kind-of (AKO). When the value of this slot is evaluated, the fget function is activated. When fget finds no value for gas mileage it recursively looks for a value from ancestor frame until a value is found. The if-needed value in the Range slot is a procedure name that, when called, computes the driving range of the Ford as a function of gas mileage and fuel capacity. Slots with attached procedures such as fget and if-needed are called procedural attachments or demons. They are done automatically when a value is needed but not provided for in a slot.


Conceptual Dependency originally developed to represent knowledge acquired from natural language input. The goals of this theory are: y To help in the drawing of inference from sentences. y To be independent of the words used in the original input. y That is to say: For any two or more sentences that are identical in meaning there should be only one representation of that meaning. In CD theory different types of basic building blocks are distinguished. Each of these types, in turn, has several subtypes. The types are made up of entities, actions, conceptual cases, conceptual tenses. ENTITIES Picture producer (PP) are actors or physical objects (including human memory) that perform different acts. Picture aiders (PA) are supporting properties or attributes of producers. ACTIONS Primitive action (ACTs) as list below. Action aiders (AA) are properties or attributes of primitive actions. Examples of Primitive Acts are: ATRANS -- Transfer of an abstract relationship. e.g. give. PTRANS -- Transfer of the physical location of an object. e.g. go. PROPEL -- Application of a physical force to an object. e.g. push. MTRANS -- Transfer of mental information. e.g. tell. MBUILD -- Construct new information from old. e.g. decide. SPEAK

-- Utter a sound. e.g. say. ATTEND -- Focus a sense on a stimulus. e.g. listen, watch. MOVE -- Movement of a body part by owner. e.g. punch, kick. GRASP -- Actor grasping an object. e.g. clutch. INGEST -- Actor ingesting an object. e.g. eat. EXPEL -- Actor getting rid of an object from body. e.g. ????. CONCEPTUAL CASES (ALL ACTIONS INVOLVE ONE OR MORE OF THESE) o -- Objective Case R -- Recipient Donor Case. I -- Instrumental Case e.g. eat with a spoon. D -- Destination or Directive Case e.g. going home. CONCEPTAL TENSES (TIME OR ACTION OR STATE OF BEING) Conditional (c) Continuing (k) Finished Transition (tf) Future (f) Interrogative (?) Negative (/) Past (p) Present (nil) Start Transition (ts) Timeless (delta) Transition (t) CONCEPTUAL DEPENDENCIES Sementic rules for the formation of dependency structures such as the relationship between an actor and an event or between a primitive action and an instrument. 1. Bird flew. p PP ACT Bird PTRANS


Ram is a student.

PP PP Ram student 3. o ACT 4. Shyam pushed the door. p o PP shyam PROPEL Joe gave sue a flower PP ACT PP o flower 5. CT Joe ate some soup. I Joe INGEST o soup p Joe ATRANS r Sue Joe



do o spoon

Advantages of CD: y Using these primitives involves fewer inference rules. y Many inference rules are already represented in CD structure. y The holes in the initial structure help to focus on the points still to be established. Disadvantages of CD: y Knowledge must be decomposed into fairly low level primitives. y Impossible or difficult to find correct set of primitives. y A lot of inference may still be required. y Representations can be complex even for relatively simple actions. Consider: Dave bet Frank five pounds that Wales would win the Rugby World Cup. Complex representations require a lot of storage Applications of CD: MARGIE (Meaning Analysis, Response Generation and Inference on English) -- model natural language understanding. SAM (Script Applier Mechanism) -- Scripts to understand stories. See next section. PAM (Plan Applier Mechanism) -- Scripts to understand stories. Schank et al. developed all of the above.


A script is a structure that prescribes a set of circumstances which could be expected to follow on from one another. It is similar to a thought sequence or a chain of situations which could be anticipated. It could be considered to consist of a number of slots or frames but with more specialised roles. Scripts are beneficial because: y Events tend to occur in known runs or patterns. y Causal relationships between events exist. y Entry conditions exist which allow an event to take place y Prerequisites exist upon events taking place. E.g. when a student progresses through a degree scheme or when a purchaser buys a house. The components of a script include: Entry Conditions -- these must be satisfied before events in the script can occur. Results -- Conditions that will be true after events in script occur. Props -- Slots representing objects involved in events. Roles -- Persons involved in the events. Track -- Variations on the script. Different tracks may share components of the same script. Scenes -- The sequence of events that occur. Events are represented in conceptual dependency form. Scripts are useful in describing certain situations such as robbing a bank. This might involve: y Getting a gun. y Hold up a bank. y Escape with the money. Here the Props might be y Gun, G. y Loot, L. y Bag, B y Get away car, C. The Roles might be: y Robber, S. y Cashier, M. y Bank Manager, O. y Policeman, P. The Entry Conditions might be: y S is poor. y S is destitute. The Results might be: y S has more money.

O is angry. y M is in a state of shock. y P is shot. There are 3 scenes: obtaining the gun, robbing the bank and the getaway.

Result: S has more money. O is angry. M is in a state of shock. P is shot.

Fig. Simplified Bank Robbing Script Some additional points to note on Scripts:

If a particular script is to be applied it must be activated and the activating depends on its significance. y If a topic is mentioned in passing then a pointer to that script could be held. y If the topic is important then the script should be opened. y The danger lies in having too many active scripts much as one might have too many windows open on the screen or too many recursive calls in a program. y Provided events follow a known trail we can use scripts to represent the actions involved and use them to answer detailed questions. y Different trails may be allowed for different outcomes of Scripts ( e.g. The bank robbery goes wrong). Advantages of Scripts: y Ability to predict events. y A single coherent interpretation may be build up from a collection of observations. Disadvantages: y Less general than frames. May not be suitable to represent all kinds of knowledge.


5.1 Search and Control Strategies 5.2 Preliminary Concepts 5.3 Water Container Problem 5.4 Production System 5.5 Problem Characteristics 5.6 Means-end analysis 5.7 Problem Reduction 5.8 Uninformed or Blind Search 5.8.1 Breadth-First Search 5.8.2 Depth-First Search 5.9 Informed Search 5.9.1 Hill Climbing Methods 5.9.2 Best First Search 5.9.3 A* Algorithm

5.1 Search and Control Strategies

Search is one of the operational tasks that characterize AI programs best. Almost every AI program depends on a search procedure to perform its prescribed functions. Problems are typically defined in terms of states, and solutions correspond to goal states. Solving a problem then amounts to searching through the different states until one or more of the goal states are found. In this chapter we investigate search techniques that will be referred to often in subsequent chapters.

5.2 Preliminary concepts

Problem can be characterized as a space consisting of a set of states and a set of operators that map from one state to another states. Three types of states may be distinguished one or more initial states, a number of intermediate states, and one or more goal states. A solution to a problem is a sequence of operators that map an initial state to a goal state. A best or good solution is one that requires the fewest operations or the least cost to map from an initial state to a goal state. The performance of a particular solution method is judged by the amount of time and memory space required to complete the mapping. Thus, a solution based on some algorithm A1 is consider better than one using algorithm A2 if the time and space complexity of A1 is less than that of A2. It is customary to represent a search space as a diagram of a directed graph or a tree. Each node or vertex in the graph corresponds to a problem state, and arcs between nodes correspond to transformations or mapping between the states. The immediate successor of a node are referred to as children, siblings, or offspring, and predecessor nodes are ancestors. An immediate ancestor to a node is a parent. Search can be characterized as finding a path through a graph or tree structure. This requires moving from node to node after successively expanding and generating connected nodes. Node generation is accomplished by computing the identification or representation code of children

nodes from a parent node. Once this is done, a child is said to be generated and the parent is said to be explored. The process of generating all of the children of a parent is also known as expanding the node. A search procedure is a strategy for selecting the order in which nodes are generated and a given path selected. Search problem may be classified by the information used to carry out a given strategy. In blind or uninformed search, no performance is given to the order of successor node generation and selection. The path selected is blindly or mechanically followed. No information is used to determine the preference of one child over another. In informed or directed search, some information about the problem space is used to compute a preference among the children for exploration and expansion. Before proceeding with a comparison of strategies, we consider next some typical search problems.


There is a 4l container and 3l container ; neither has any measuring markers on it. There is a pump that can be used to fill the containers with water.Problem to solve is to get exactly two liters of water in the 4l container. SOLUTION From initial state to goal state through appropriate sequence of moves or actions such as filling and emptying the containers. Content of the two containers at any given time is a problem state. Let : x - content of the 4l container y - content of the 3l container Then : (x,y) - problem state represented by an ordered pair. The set of all ordered pairs is the space of problem states or the state-space of the problem . State-space : { (x,y) | x = 0,1,2,3,4 y = 0,1,2,3 } Data structure to represent the state-space can be : o vectors o sets o arrays o lists etc... Problem statement : initial state (0,0) goal state (2,y) where y = any possible number. Moves transform from one state into another state. Operators determine the moves. Operators for the problem state-space : 1. Fill the 4l container 2. Fill the 3l container 3. Empty the 3l container 4. Empty the 3l container

Pour water from 3l container into 4l conatiner until 4l container is full Pour water from 4l container into the 3l container until the 3l container is full 7. Pour all the water from 3l container into the 4l container 8. Pour all the water from 4l container into the 3l container Preconditions need to be staisfied before an operator can be applied.

5. 6.

EXAMPLE : # 1 can be applied if there is less than 4l water in the container. IF there is less than 4l in the 4l container THEN fill the 4l container. Adding pre-conditions to operators => generation of production rules. Forwarded form of rule # 1 : IF (x,y| x?4) THEN (4,y) The forwarded set of production rules : R1 IF (x,y| x?4) THEN (4,y) R2 IF (x,y| y?3) THEN (x,3) R3 IF (x,y| x>0) THEN (0,y) R4 IF (x,y| y>0) THEN (x,0) R5 IF (x,y| x+y>=4 ? y>0 ? x?4) THEN (4,y-(4-x)) R6 IF (x,y| x+y>=3 ? x>0 ? y?3) THEN (x-(3-y),3) R7 IF (x,y| x+y?=4 ? y>0) THEN (x+y,0) R8 IF (x,y| x+y?=3 ? x>0) THEN (0,x+y) In certain states, more than one rule can be applied. EXAMPLE: (4,0) satisfies the preconditions of R2,R3 ? R6

5.4 Production System

Since search forms the core of many intelligent processes. It is useful to structure AI programs in a way that facilitates describing and performing the search process. Production systems provide such structures. A definition of a production system is given below. Do not be confused by other uses of the word production, such as to describe what is done in factories. A production system consists of : y A set of rules, each consisting of a left side (a pattern) that determines the applicability of the rule and a right side that describes the operation to be performed if the rule is applied. y One or more knowledge/databases that contain whatever information is appropriate for the particular task. Some parts of the database may be permanent, while other parts of it may pertain only to the solution of the current problem. The information in these databases may be structured in any appropriate way. y A control strategy that specifies the order in which the rules will be compared to the database and a way of resolving the conflicts that arise when several rules match at once. y A rule applier.

So far, our definition of a production system has been very general. It encompasses a great many systems, including water jug problem solver. It also encompasses a family of general production system interpreters, including: y Basic production system languages, such as OPS5 and ACT. y More complex, often hybrid systems called expert system shells, which provide complete (relatively speaking) environments for the construction of knowledge base expert systems. y General problem-solving architectures like SOAR, a system based on a specific set of cognitively motivated hypotheses about the nature of problem solving. All of these systems provide the overall architecture of a production system and allow the programmer to write rules that define particular problems to be solved.

5.5 Problem Characteristics

In order to choose the most appropriate method (or combination of methods) for a particular problem, it is necessary to analyze the problem along several key dimensions: y Is the problem decomposable into a set of (nearly) independent smaller or easier subproblem? y Can solution steps be ignored or at least undone if they prove unwise? y Is the problems universe predictable? y Is a good solution to the problem obvious without comparison to all other possible solutions? y Is the desired solution a state of the world or a path to a state? y Is a large amount of knowledge absolutely required to solve the problem, or is knowledge important only to constrain the search? y Can a computer that is simply given the problem return the solution, or will the solution of the problem require interaction between the computer and a person?

5.6 Means-end analysis.

The problem space of means-end analysis has as initial state and one or more goal states, a set of operators Ok with given preconditions for their application, and a difference function that computes the difference between two states Si and Sj. A problem is solved using means-end analysis by 1. Comparing the current state Si to a goal state Sg and computing the difference Dig. 2. An operator Ok is then selected to reduce the difference Dig. 3. The operator Ok is applied if possible. If not, the current state is saved, a subgoal is created and means-end analysis is applied recursively to reduce on the subgoal. 4. If the subgoal is solved, the saved state is restored and work is resumed on the original problem. Example

R & (~P



Q) & R

(~~P V Q) & R

(P V Q) & R

(Q V P) & R As a simple example, we suppose General Problem Solver is given the initial prepositional logic object Li = (R & (~P Q)) and goal object Lg = ((Q V P) & R). To determine Lg from Li requires a few simple transformations. The system first determines the difference between the two expressions and then systematically reduces these differences until Lg is obtained from Li or failure occurs. For example, a comparison of Li and Lg reveals the difference that R is on the left in Li but on the right in Lg. This causes a subgoal to be set up to reduce this difference. The subgoal, in turn, calls for an application of the reduction method, namely to rewrite Li in the equivalent form LI = ((~P Q) & R). The rest of the solution process follows the path indicated in the tree of above figure. The Key Idea in Means-Ends Analysis is to Reduce Differences The purpose of means-ends analysis is to identify a procedure that cause a transition from the current state to the goal state, or at least to an intermediate state. Here is the general Procedure To perform means-ends analysis,  Until the goal is reached or no more procedures are available  Describe the current state, the goal state, and the difference between the two.  Use the difference between the current state and goal state, possibly with the description of the current or goal state, to select a promising procedure.  Use the promising procedure and update the current state.  If an acceptable solution is found,announce it; otherwise,announce failure. Difference-Procedure Tables Often Determines the Means > The key idea in means-ends analysis is to reduce differences. > Means-ends analysis is often mediated via difference-procedure tables. The difference-procedure table determines what to do, leaving descriptions of the current state and destination state with no purpose other than to specify the origin and destination for the appropriate procedure.

5.7 Problem-Reduction

Sometimes, it is possible to convert difficult goals into one or more easier-to-achive subgoals. Each subgoal, in turn, maybe divided still more finely into one or more lower-level subgoals. The most typical example is in computer programming. Problem Reduction is ubiquitous in Programming.Most real world programs consists of a collection of specialized procedures. Each time one specified procedure calls another, it effects a problem-reduction step.The key idea in Problem Reduction is to Explore a Goal Tree A goal tree is a semantic tree in which nodes represent goals and branches indicate how you can achive goals by sloving one or more subgoals. A goal tree consists of,  And goals, all of which must be satisfied,  Or goals , one of which must be satisfied.

Goel: Acquire TV set

Goal:Steal TV set

Goel:Earn some money

Goal: Buy TV set

5.8 Uninformed or Blind Search

As noted earlier, search problems can be classified by the amount of information that is available to the search process. Such information might relate to the problem space as a whole or to only some states. It may be available a priori or only after a node has been expanded. In a worst case situation the only information available will be the ability to distinguish goal from nongoal nodes. When no further information is known a priori, a search program must perform a blind or uninformed search. A blind or uninformed search algorithm is one that uses no information other than the initial state, the search operators, and a test for a solution. A blind search should proceed in a systematic way by exploring nodes in some predetermined order or simply by selecting nodes at random.

5.8.1 Breadth-First Search

Breadth-first searches are performed by exploring all nodes at a given depth before proceeding to the next level. This means that all immediate children of nodes are explored before any of the childrens children are considered. Breadth-first tree search is illustrated in the figure given below. It has the obvious advantage of always finding a minimal path length solution when one exist. However, a great many nodes may need to be explored before a solution is found, especially if the tree is very full. An algorithm for the breadth-first search is quite simple. It uses a queue structure to hold all generated but sill unexplored nodes. Algorithm 1. Place the starting node s on the queue. 2. If the queue is empty, return failure and stop.

3. If the first element on the queue is a goal node g, return success and stop. Otherwise, 4. Remove and expand the first element from the queue and place all the children at the end of the queue in any order. 5. Return to step 2. The time complexity of the breadth-first search is O(bd). The space complexity is also O(bd).

5.8.2 Depth-First Search

Depth-first searches are performed by diving downward into a tree as quickly as possible. It does this by always generating a child node from the most recently expanded node, then generating that childs children, and so on until a goal is found or some cutoff depth point d is reached. If a goal is not found when a leaf node is reached or at the cutoff point, the program backtracks to the most recently expanded node and generates another of its children. This process continues until a goal is found or failure occurs. Depth-first tree search is illustrated in the figure given below. Algorithm: 1. Place the starting node s on the queue. 2. If the queue is empty, return failure and stop. 3. If the first element on the queue is a goal node g, return success and stop. Otherwise, 4. Remove and expand the first-element, and place the children at the front of the queue (in any order). 5. Return to step 2. The depth-first search is preferred over the breadth-first when the search tree is known to have a plentiful number of goals. Otherwise, depth-first may never find a solution. The time complexity is the same as that for breadth-first, O(bd). It is less demanding in space requirements, however, since only the path from the starting node to the current node needs to be stored. Therefore, if the depth cutoff is d, the space complexity is just O(d)

5.9 Informed Search

When more information than the initial state, the operators, and the goal test is available, the size of the search space can usually be constrained. When this is the case, the better the information available, the more efficient the search process will be. Such methods are known as informed search methods. They often depend on the use of heuristic information i.e. information about the problem (the nature of the states, the cost of transforming from one state to another, the promise of taking a certain path, and the characteristics of the goals) can sometimes be used to help guide the search more efficiently. This information can often be expressed in the form of a heuristic evaluation function f(n,g), a function of the nodes n and/or the goals g. Recall that a heuristic is a rule of thumb or judgmental technique that leads to a solution some of the time but provides no guarantee of success. Heuristic play an important role in search strategies because of the exponential nature of most problems. They help to reduce the number of alternatives from an exponential number to a polynomial number and, thereby, obtain a solution in a tolerable amount of time.

Generally two categories of problem use heuristics. 1. Problem for which no exact algorithm are known and one needs to find an approximate and satisfying solution. e.g., computer vision, speech recognition etc.; 2. Problem for which exact solutions are known, but computationally infeasible. e.g., chess etc. The following algorithm make use of heuristic evaluation functions.

5.9.1 Hill Climbing Methods

Hill climbing is like depth-first searching where the most promising child is selected for expansion. When the children have been generated, alternative choices are evaluated using some type of heuristic function. The path that appears most promising is then chosen and no further reference to the parent or other children is retained. This process continues from node-tonode with previously expanded nodes being discarded. In fact, there is practically no difference between hill-climbing and depth-first search except that the children of the node that has been expanded are sorted by the remaining distance. The algorithm for hill-climbing is given as below Step 1 : Put the initial node on a list START Step 2 : If (START is empty) or (START = GOAL) terminate search Step 3 : Remove the first node from START. Call this node a Step 4 : If (a = GOAL) terminate search with success. Step 5 : Else if node a has successor, generate all of them. Find out how far they are from the goal node. Sort them by the remaining distance from the goal and add them to the beginning of START. Step 6 : Goto Step 2. A typical path is illustrated in following figure. 0













11 22





Hill-climbing technique is being used in some activity or other in out day-to-day chore. Some of them are: 1. While listening to somebody playing flute on the transistor, tone and volume control are adjusted in a way that makes the music melodius.

2. While tuning the carburetor of a scooter, the accelerator is raised to its maximum once and the carburetor is tuned so that the engine keeps on running for a considerably long period of time. 3. An electronics experts, while making the transistor for the first time, tunes the radio set at mid-afternoon when the signal is weak for proper reception. Problem of Hill-Climbing Technique 1. Local maximum: A state that is better than al its neighbors but not so when compared to states to states that are farther away.

Local Maximum 2. Plateau: A flat area of the search space, in which all neighbors have the same value.

Plateau 3. Ridge: Described as a long and narrow stretch of elevated ground or a narrow elevation or raised path running along or across a surface by the Oxford English Dictionary, this is an area in the path which must be traversed very carefully because movement in any direction might maintain one at the same level or result in fast descent.

Ridge In order to overcome these problems, adopt one of the following or a combination of the following methods. y Backtracking for local maximum. Backtracking helps in undoing what has been done so far and permits to try a totally different path to attain the global peak. y A big jump is the solution to escape from the plateau. A huge jump is recommended because in a plateau all neighboring point have the same value. y Trying different paths at the same time is the solution for circumventing ridges. The problem encountered with hill climbing can be avoided using a best first search approach.

5.9.2 Best-First Search

Best-first search also depends on the use of a heuristic to select most promising paths to the goal node. Unlike hill climbing, however, this algorithm retains all estimates computed for previously generated nodes and makes its selection based on the best among them all. Thus, at any point in the search process, Best-first search moves forward from the most promising of all nodes generated so far. In so doing, it avoids the potential traps encountered in hill climbing. Then best-first process is illustrated in given figure 9 D 8 E 1 3 A 12 S 6 B 5 C F
J L Goal Node

2 M

6 7 H J First, the start node S is expanded. It has three children A,B and C with values 3,6 and 5 respectively. These values approximately indicate how far they are from the goal node. The child with minimum value namely A is chosen. The children of A are generated. They are D and E with values 9 and 8. The search process has now four nodes to search for. i.e., node D with value 9, node E with value 8, node B with value 6 and node C with value 5. Of them, node C has got the minimal value which is expanded to give node H with value 7. At this point, the nodes available for search are (D : 9), (E : 8), (B : 6) and (H : 7) where (E : F) indicates that (E) is the node and F is its evaluation value. Of these, B is minimal and hence B is expanded to give (F : 12), (G : 14).

At this juncture, the nodes available for search are (D : 9), (E : 8), (H : 7), (F : 12) and (G : 14) out of which (H : 7) is minimal and is expanded to give (I : 5), (J : 6). The entire steps of the search process are given in table below.

Step # Node being Expanded 1 2 3 4 5 6 S A C B H I


Available nodes

Nodes Chosen

(A : 3), (B : 6), (C : 5) (A : 3), (B : 6), (C : 5) (A : 3) (D : 9), (E : 8) (B : 6), (C : 5), (D : 9), (E : 8) (C : 5 ) (H : 7) (B : 6), (D : 9), (E : 8), (H : 7) (B : 6) (F : 12), (G : 14) (D : 9), (E : 8), (H : 7), (H : 7) (F : 12), (G : 14) (I : 5), (J : 6) (D : 9), (E : 8), (F : 12), (I : 5) (G : 14), (I : 5), [J : 6) (K : 1), (L : 0), (M : 2) (D : 9), (E : 8), (F : 12), Search (G : 14), [J : 6), (K : 1), stops as

As you can see, best-first search jumps all around in the search graph to identify the node with minimal evaluation function value. The algorithm is given below 1. Place the starting node s on the queue. 2. If the queue is empty, return failure and stop. 3. if the first element on the queue is a goal node g, return success and stop. Otherwise, 4. Remove the first element from the queue, expand it and compute the estimated goal distances for each child. Place the children on the queue (at either end) and arrange all queue elements in ascending order corresponding to goal distance from the front of the queue. 5. Return to step 2.

5.9.3 A* Algorithm
In best-first search, we brought in a heuristic value called evaluation function value. It is a value that estimates how far a particular node is from the goal. Apart from the evaluation

function values, one can also bring in cost functions. Cost functions indicate how much resources like time, energy, money etc. have been spent in reaching a particular node from the start. While evaluation function values deal with the future, cost function values deal with the past. Since cost function values are really expended, they are more concrete than evaluation function values. If it is possible for one to obtain the evaluation function values and the cost function values, then A* algorithm can be used. The basic principle is that sum the cost and evaluation function values for a state to get its goodness worth and use this as a yardstick instead of the evaluation function value in best-first search. The sum of the evaluation function value and the cost along the path leading to that state is called fitness number. Consider the following figure again with the same evaluation function values. Now associate with each node are three number, the evaluation function value, the cost function value and the fitness number. The fitness number, as stated earlier, is the total of the evaluation function value and the costfunction value. For example, consider node K, the fitness number is 20, which is obtained as follows: (Evaluation function of K) + (Cost function involved from start node S to node K) = I + (Cost function from S to C + Cost function from C to H + Cost function from H to I + Cost function from I to K) = 1 + 6 + 5 + 7 + 1 = 20. While best-first search uses the evaluation function value only for expanding the best node,

D 9 14 2 3 6 3 S 2 6 8 6B A 2 4 3 G 19 11 C 5 5 F 12 18 5 I 14 23 7 18 H 7 6 6 J 2 M 2 21 1 2

E 8 13 K 1 20

20 Goal Node

23 The Algorithm for A* as follows 1. Put the initial node on a list START 2. If (START is empty) or (START = GOAL) terminate search 3. Remove the first node from START. Call this node a 4. if (a = GOAL) terminate search with success. 5. Else if node a has successor, generate all of them. Estimate the fitness number of the successors by totaling the evaluation function value and the cost function value. Sort the list by fitness number. 6. Name the new list as START 1 7. Replace START with START 1 8. Goto Step 2.


6.1 What is Learning? 6.2 Types of Learning 6.2.1 Rote Learning 6.2.2 Learning by Taking Advice 6.2.3 Learning by Problem Solving 6.2.4 Inductive Learning 6.2.5 Explanation Based Learning 6.1 What is Learning? Learning is a an area of AI that focusses on processes of self-improvement. Information processes that improve their performance or enlarge their knowledge bases are said to learn. y Intelligence implies that an organism or machine must be able to adapt to new situations. y It must be able to learn to do new things. y This requires knowledge acquisition, inference, updating/refinement of knowledge base, acquisition of heuristics, applying faster searches, etc. A simple model of learning systems:

Simon [1983] has proposed that learning denotes ...changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time. As thus defined, learning covers a wide range of phenomena. At the end of the spectrum is skill refinement (i.e. people get better at many tasks simply by practicing). At the other end of the spectrum lies knowledge aquisition. Many AI programs draw heavily on knowledge as their source of power. How can we learn? Many approaches have been taken to attempt to provide a machine with learning capabilities. This is because learning tasks cover a wide range of phenomena. Listed below are a few examples of how one may learn. We will look at these in detail shortly Skill refinement -- one can learn by practicing, e.g playing the piano. Knowledge acquisition -- one can learn by experience and by storing the experience in a knowledge base. One basic example of this type is rote learning. Taking advice

-- Similar to rote learning although the knowledge that is input may need to be transformed (or operationalised) in order to be used effectively. Problem Solving -- if we solve a problem one may learn from this experience. The next time we see a similar problem we can solve it more efficiently. This does not usually involve gathering new knowledge but may involve reorganisation of data or remembering how to achieve to solution. Induction -- One can learn from examples. Humans often classify things in the world without knowing explicit rules. Usually involves a teacher or trainer to aid the classification. Discovery -- Here one learns knowledge without the aid of a teacher. Analogy -- If a system can recognise similarities in information already stored then it may be able to transfer some knowledge to improve to solution of the task in hand.

6.2 Types of Learning 6.2.1 Rote Learning

Rote Learning is basically memorisation. y Saving knowledge so it can be used again. y Retrieval is the only problem. y No repeated computation, inference or query is necessary. A simple example of rote learning is caching y Store computed values (or large piece of data) y Recall this information when required by computation. y Significant time savings can be achieved. y Many AI programs (as well as more general ones) have used caching very effectively. Memorisation is a key necessity for learning: y It is a basic necessity for any intelligent program -- is it a separate learning process? y Memorisation can be a complex subject -- how best to store knowledge?

6.2.2 Learning by Taking Advice

The idea of advice taking in AI based learning was proposed as early as 1958 (McCarthy). However very few attempts were made in creating such systems until the late 1970s. Expert systems providing a major impetus in this area. There are two basic approaches to advice taking: y Take high level, abstract advice and convert it into rules that can guide performance elements of the system. Automate all aspects of advice taking y Develop sophisticated tools such as knowledge base editors and debugging. These are used to aid an expert to translate his expertise into detailed rules. Here the expert is an integral part of the learning system. Such tools are important in expert systems area of AI.

Automated Advice Taking The following steps summarise this method: Request -- This can be simple question asking about general advice or more complicated by identifying shortcomings in the knowledge base and asking for a remedy. Interpret -- Translate the advice into an internal representation. Operationalise -- Translated advice may still not be usable so this stage seeks to provide a representation that can be used by the performance element. Integrate  When knowledge is added to the knowledge base care must be taken so that bad sideeffects are avoided. E.g. Introduction of redundancy and contradictions. Evaluate -- The system must assess the new knowledge for errors, contradictions etc. The steps can be iterated.

6.2.3 Learning by Problem Solving

There are three basic methods in which a system can learn from its own experiences. y Learning by Parameter Adjustment y Learning by Macro Operators y Learning by Chunking Learning by Parameter Adjustment Many programs rely on an evaluation procedure to summarise the state of search etc. Game playing programs provide many examples of this. However, many programs have a static evaluation function. In learning a slight modification of the formulation of the evaluation of the problem is required. Here the problem has an evaluation function that is represented as a polynomial of the form such as: c1t1 + c2t2 + c3t3 + The t terms a values of features and the c terms are weights. In designing programs it is often difficult to decide on the exact value to give each weight initially. So the basic idea of idea of parameter adjustment is to: y Start with some estimate of the correct weight settings. y Modify the weight in the program on the basis of accumulated experiences. y Features that appear to be good predictors will have their weights increased and bad ones will be decreased. Learning by Macro Operators The basic idea here is similar to Rote Learning: Avoid expensive recomputation Macro-operators can be used to group a whole series of actions into one.

For example: Making dinner can be described a lay the table, cook dinner, serve dinner. We could treat laying the table as on action even though it involves a sequence of actions. The STRIPS problem-solving employed macro-operators in it's learning phase. Consider a blocks world example in which ON(C,B) and ON(A,TABLE) are true. STRIPS can achieve ON(A,B) in four steps: UNSTACK(C,B), PUTDOWN(C), PICKUP(A), STACK(A,B) STRIPS now builds a macro-operator MACROP with preconditions ON(C,B), ON(A,TABLE), postconditions ON(A,B), ON(C,TABLE) and the four steps as its body. MACROP can now be used in future operation. But it is not very general. The above can be easily generalised with variables used in place of the blocks. Learning by Chunking Chunking involves similar ideas to Macro Operators and originates from psychological ideas on memory and problem solving. The computational basis is in production systems (studied earlier). SOAR is a system that use production rules to represent its knowledge. It also employs chunking to learn from experience. Basic Outline of SOAR's Method y SOAR solves problems it fires productions these are stored in long term memory. y Some firings turn out to be more useful than others. y When SOAR detects are useful sequence of firings, it creates chunks. y A chunk is essentially a large production that does the work of an entire sequence of smaller ones. y Chunks may be generalised before storing.

6.2.4 Inductive Learning

This involves the process of learning by example -- where a system tries to induce a general rule from a set of observed instances. This involves classification -- assigning, to a particular input, the name of a class to which it belongs. Classification is important to many problem solving tasks. A learning system has to be capable of evolving its own class descriptions: y Initial class definitions may not be adequate. y The world may not be well understood or rapidly changing. The task of constructing class definitions is called induction or concept learning

6.2.5 Explanation Based Learning (EBL)

Humans appear to learn quite a lot from one example. Basic idea: Use results from one examples problem solving effort next time around. An EBL accepts 4 kinds of input: A training example

-- what the learning sees in the world. A goal concept -- a high level description of what the program is supposed to learn. A operational criterion -- a description of which concepts are usable. A domain theory -- a set of rules that describe relationships between objects and actions in a domain. From this EBL computes a generalisation of the training example that is sufficient not only to describe the goal concept but also satisfies the operational criterion. This has two steps: Explanation -- the domain theory is used to prune away all unimportant aspects of the training example with respect to the goal concept. Generalisation -- the explanation is generalised as far possible while still describing the goal concept.


7.1 What is Expert System? 7.2 Expert System Application Area 7.3 Expert System Structure 7.4 Expert System Characteristics 7.5 Conventional Vs Expert Systems 7.6 Participants in Expert Systems Development 7.7 Tools For Development of Expert System 7.8 MYSIN


Who is an Expert? An expert is a person who has expertise and knowledge in a certain area. Through experience, the expert develops skills that enable him/her to effectively and efficiently solve problem (called heuristics). Definition of Expert Systems: Prof. Edward Feigenbaum of Stanford University, leading researchers in ES has produced the following definition: " . . . An intelligent computer program that uses knowledge and inference procedures to solve problems that are difficult enough to require significant human expertise for their solution." Simply, expert system is a computer program designed to model the problem-solving ability of human expert. IMPORTANT POINT! In the process of emulating the behavior of a human expert, an expert system must be able to supply users with the same services and facilities that the human expert does. WHY BUILD EXPERT SYSTEMS The answer to this question is by comparing an expert system with a human expert as illustrated in the following figure. Factor Human Expert Expert System Time Availability Workday Always Geographic Safety Perishable Performance Local Irreplaceable Yes Variable Anywhere Replaceable No Consistent

Speed Variable Consistent Cost High Affordable We build expert systems for 2 reasons: 1. to replace an expert 2. to assist an expert Reasons for replacing an expert: y Make available expertise after hours or in other locations

Automate a routine task requiring an expert. y Expert is retiring or leaving y Expert is expensive Reasons for assisting an expert: y Aiding expert in some routine task to improve productivity y Aiding expert in some difficult task to effectively manage the complexities y Making available to the expert information that is difficult to recall


Major Application Areas: Agriculture, business, chemistry, communications, computer systems, education, information management, law, military etc. Types of problem solved by expert systems: Problem-solving Paradigm Description Control Governing system behavior, to meet specifications Design Configuring objects under constraint Diagnosis Inferring system malfunctions from observable Instruction Diagnosing, debugging and repairing student behavior Interpretation Inferring situation description from data Monitoring Comparing observations to expectation Planning Design actions Prediction Inferring likely consequences of given situation Selection Identifying best choice from a list of possibilities Prescription Recommending solution to system malfunction Current Applications 1. Financial applications The type of system which are in common use include: a. System that aid bank managers when they are deciding whether not to grant a loan to a particular customer b. System that give advice as to whether to grant a mortgage or not c. System which advice insurance companies as to the risk factor involved in insuring a particular individual or item d. System which are used by credit card companies to help them decide whether or not to issue an individual with a credit card e. Systems, which have been devised to recognize and guard against computer fraud. 2. Industry, manufacturing and military The type of system which are in common use include: a. Systems that are capable of diagnosing various industrial faults, such as faults in aircraft, gas turbines and helicopters b. Systems that are used to design and make small mechanical parts System that identifies targets and potential threats to security.


Human Expert Problem solving

Expert System problem solving

1. The Knowledge Base (LTM) The key bottleneck in developing an expert system. Contain everything necessary for understanding, formulating and solving a problem. It contains facts and heuristics. The most popular approach to representing domain knowledge is using production rules. Rule 1 IF car won't start THEN problem in electrical system

Rule 2 IF problem in electrical system AND battery voltage is below 10 volts THEN bad battery 2. Working Memory (STM) Contains facts about a problem that are discovered during consultation with the expert system. System matches this information with knowledge contained in the knowledge base to infer new facts. The conclusion reach will enter the working memory. Thus, working memory contain information either supplied by user or infer by the system. 3. Inference Engine Knowledge processor which is modeled after the expert reasoning power. Processor in an expert system that matches the facts contained in the working memory with the domain knowledge contained in the knowledge base, to draw conclusion about the problems. It taps the knowledge base and working memory to derive new information and solve problems 4. The User Interface The user communicates with the expert system through the user interface. It allows the user to query the system, supply information and receive advice. The aims are to provide the same form of communication facilities provided by the expert. But normally has less capability of understanding natural language and general knowledge. Example of inference process: Consultation with a Mr. Mech (ES). Based on previous rules. MM: The car cannot start? User: YES User assert: The car cannot start. MM assert: Problem in electrical system. MM: Is the battery voltage 10 volts? User: YES User assert: The battery voltage 10 volts. MM assert: Bad battery. The Explanation Facility A trademark of expert systems: ability to explain their reasoning. An additional component of expert system. ES can provide explanation on: 1. WHY it is asking the question 2. HOW it reached some conclusion. Explaining WHY Example: MM: Will the car not start? Person: WHY MM: If I know that the car won't start then I usually assume the problem is the electrical system.

Expert responds with what they might conclude from the answer. ES respond to a WHY query by displaying the rule it is currently pursuing. Explaining HOW Besides giving final results, expert system can explain how it arrived at a result. Conventional programming works on well-defined problem but not for expert system. Therefore, needs justification on the system findings. Example: MM: The battery is bad Person : HOW MM: Since your car won't start, I assumed there was a problem with the electrical system. Once I found the battery voltage was below 10 volts, I knew the battery was bad. ES respond by tracing back through the rules that fire the conclusion. This tracing is a map of the system line of reasoning.


An expert system is usually designed to have the following general characteristics: 1. High-level Expertise The most useful characteristic of an expert system. This expertise can represent the best thinking of top experts in the field, leading to problem solutions that are imaginative, accurate, and efficient. 2. Adequate Response Time The system must also perform in a reasonable amount of time, comparable to or better than the time required by an expert to solve a problem. 3. Permits Inexact Reasoning These types of applications are characterized by information that is uncertain, ambiguous, or unavailable and by domain knowledge that is inherently inexact. 4. Good Reliability The system must be reliable and not prone to crashes because it will not be used 5. Comprehensibility The system should be able to explain the steps of its reasoning while executing so that it is understandable. The systems should have an explanation capability in the same way that human experts are suppose to be able to explain their reasoning. 6. Flexibility Because of the large amount of knowledge that an expert system may have, it is important to have an efficient mechanism for modifying the knowledge base. 7. Symbolic Reasoning Expert systems represent knowledge symbolically as sets of symbols that stand for problems concepts. These symbols can be combined to express relationship between them. When these relationship are represented in a program they are called symbol structures. For example :Assert: Ahmad has a fever Rule: IF person has fever THEN take panadol

Conclusion: Ahmad takes panadol 8. Reasons Heuristically Experts are adapt at drawing on their experiences to help them efficiently solved some current problem. Typical heuristics used by experts: y I always check the electrical first y People rarely get a cold during the summer y If I suspect cancer, then I always check the family history 9. Makes Mistakes Expert systems can make mistakes. Since the knowledge of expert have to be captured as close as possible in expert system, like its human counterpart, it can make mistakes. 10. Thrives on Reasonable Complexity The problem should be reasonably complex, not too easy or too difficult. 11. Focuses Expertise Most experts are skillful at solving problems within their narrow area of expertise, but have limited ability outside this area.


Conventional Systems Knowledge and processing are combined in one sequential program Expert Systems Knowledge base is clearly separated from the processing (inference) mechanism (knowledge rules are separated from the control) Program may make mistakes. Explanation is a part of most expert systems

Programs do not make mistakes (only programmers do) Do not usually explain why input data are needed or how conclusions were drawn The system operates only when it is completed Execution is done (algorithmic) basis on a step-by-step

The system can operate with only a few rules (as a first prototype) Execution is done by using heuristics and logic Can operate with incomplete or uncertain information Effective manipulation of large knowledge bases Representation and use of knowledge

Needs complete information to operate

Effective manipulation of large databases

Representation and use of data

Efficiency is a major goal Easily deals with quantitative data

Effectiveness is a major goal Easily deals with qualitative data


The main participants in the process of building an expert system are: 1. the domain expert 2. the knowledge engineer 3. the user The Domain Expert Is a person who has the special knowledge, judgment, experience, skills and methods, to give advice and solve problems in a manner superior to others. Although an expert system usually models one or more experts, it may also contain expertise from other sources such as books and journal articles. Qualification needed by the Domain Expert: y Has expert knowledge y Has efficient problem-solving skills Can communicate the knowledge Can devote time y Must be cooperative The Knowledge Engineer A person who designs, builds and tests an expert systems. Qualifications needed by Knowledge Engineer: y Has knowledge engineering (art of building expert system) y Has good communications skills y Can match problems to software y has expert system programming skills The User Is a person who uses the expert system once it is developed. Can aid in knowledge acquisition (giving broad understanding of the problems) Can aid in system development
y y


Expert system developer can choose between three different approaches in building an expert system :1. use a programming languages (usually an AI language) 2. use a shell 3. use an AI environment (or toolkit)

1. Languages Expert systems may be written in symbolic languages, such as LISP, or PROLOG or in conventional high level languages such as FORTRAN, C and PASCAL LISP All the large early expert systems were developed in LISP (List Processing) or a tool written in LISP. LISP deals with symbols. PROLOG Research on logic programming culminated in the seventies with the invention of the PROLOG language. Means Programming in Logic. A PROLOG program can be thought of as a database of facts and rules. 2. Expert System Shells A shell is a program that can be used to build expert systems. An expert system shell performs three major functions: 1. Assists in building the knowledge base by allowing the developer to insert knowledge into knowledge representation structures 2. Provides methods of inference or deduction that reason on the basis of information in the knowledge base and new facts input by the user 3. Provides an interface that allows the user to set up reasoning task and query the system about its reasoning strategy 3. AI Environments or Toolkits More expensive and powerful than either languages or shells. Advantage of using toolkits: They provide a variety of knowledge representation techniques such as rules and frames (inheritance) The following is the actual figures for the different development tools use by expert system builders in the UK: Conventional Languages 11% AI languages 23% Expert system Shells 56% Toolkits 11%

Developed at Stanford University in the mid 1970's The first large expert system that perform at the level of human expert. Use as benchmark by expert system developers. Provide consultative advise about bacteremia and meningitis. Bacteremia is an infection that involves bacteria in the blood. Meningitis is an infections that inflammations of the membranes that envelop the brain and spinal cord. Can be fatal, thus need quick response, but positive identification normally takes 24 - 48 hours. Normally doctors must begin treatment in the

absence of lab results. Very complex and doctor need advice of an expert of bacteremia and meningitis. How MYCIN works? y MYCIN begin by initiating a dialogue. y Physician response to the questions y MYCIN provides diagnosis and prescription How MYCIN reasons? y Laboratory results of body fluid analyses y Symptoms that patient is displaying y Characteristics of the patient such as age, sex etc. MYCIN consultative proceeds in 2 phases: 1. Diagnosis is made to identify the most likely infection organisms 2. Prescribe one or more drugs (antibiotics) A consultation With MYCIN 1. Please enter information about the patient. Name Age Sex Race Sally 42 years Female Melayu 2. Are there any positive cultures obtained from SALLY? No 3. Are there any pending cultures of SALLY? Yes 4. Site: date collected: stain examined: method CSF 3-Jun-75 Yes N/A 5. Stain of smear: (type NONE if no organism was seen on the smear) None Unable to reach conclusion from lab test, like physician continue asking questions. 6. Has Sally recently had symptoms of persistent headache or other abnormal neurologic symptoms(dizziness, lethargy etc.) Yes 7. Has SALLY recently had objective evidence of abnormal neurologic signs documented by physician observation of examination? Yes Questions 6 and 7 traditional approach to diagnose by symptoms. Conclude from the questions by giving first conclusion. The CSF cultures will be considered to be associated with MENINGITIS. The following questions will identify particular organism that cause the infections. Is the meningitis a hospital-acquired infection? Yes 9. Is SALLY clinical history consistent with EPIGLOTTITIS? No 10. Is Sally clinical history consistent with OTITIS-MEDIA? 8.

No 11. Has the patient ever been to an area that is endemic for coccidiomycoses? Unknown 12. Is Sally a burn patient? Yes (Why) The type of infection may be bacterial . How? List of rules that conclude the infection may be bacteria will be listed. The final phase, prescription of antibiotics. 13. Does SALLY have clinically significant allergic reaction to any antimicrobial agent? No 14. Is Sally pregnant of breast-feeding? No 15. Do you have reason to suspect that SALLY may have impaired renal functions? No 16. SALLY weight in kilograms? 51.4 My preferred therapy recommendation is as follows: Give the following combination: 1. ETHAMBUTAL Dose: 1.28g (13.0 100mg tablets) q24h PO for 60 days then 770 mg (7.5 100 mg tablets) q24h PO. Comments: periodic vision screening tests are recommended for optic neuritis. 2. INH Dose: 513 mg (5.0 100mg-tablets) q24h PO 3. RIFAMPIN Dose: 600 mg PO q24h Comments: Administer dose on empty stomach. HOW GOOD AN EXPERT IS MYCIN In one complex evaluation, 8 independent evaluators evaluate MYCIN with 9 others human diagnostician for 10 difficult cases of meningitis. The task used was the selection of drugs for cases of meningitis before causative agents had been identified. Two phases of the evaluation: 1. MCYIN and 9 human experts evaluate 10 cases 2. Each of them prescribe medications Two evaluative criteria was used to see whether the prescriptions: 1. Would be effective against the actual bacteria after it was finally identified. 2. Adequately covered for other possible bacteria while avoiding over prescribing. Result:

Criteria 1: MYCIN and 3 other humans expert consistently prescribe therapy that would have been effective for all 10 cases. Criteria 2: MYCIN received higher ratings. 65% correct in all the cases whereas human expert 42.5% to 62.5%. MYCIN strengths is based on 4 factors: 1. MYCIN's knowledge base is extremely detail because acquired from the best human practitioners. 2. MYCIN do not overlook anything or forget any details. It considers every possibility. 3. MYCIN never jumps to conclusions of fails to ask for key pieces of information. 4. MYCIN is maintained at a major medical center and consequently, completely current. MYCIN represents 50 man-years of effort.


8.1 Fuzzy Logic 8.1.1 What is Fuzziness? 8.1.2 Current Application of Fuzzy Logic 8.1.3 Overview of Fuzzy Logic 8.1.4 Fuzzy Sets 8.1.5 Hedges 8.1.6 Fuzzy Set Operations 8.1.7 Fuzzy Inference 8.2 Memory Organisation 8.3 Neural Networks and Parallel Computation 8.3.1 Neural Network Architectures 8.4 Genetic Algorithm 8.5 Matching 8.5.1 Variable Matching


In our daily life situations, we always experience a large number of attributes which are not precise. But human brain always processes such imprecise terms. If a doctor asks a patient how are you ; patient replies almost OK. Here the media of communication for information passing is not mathematics but something else which are not well modeled in any way. The hedge almost is a vague term. But, interestingly doctor processed this information in his brain and took some action which could be in the form like: i) Doctor smiles; ii) Doctor expressed his satisfaction by some movement of his eyes or face; iii) Doctor thinks of further treatment, if any etc., etc. Such type of data processing or information processing are always done by the words biggest computer human brain. To make this processing faster than human brain, we need to make use of machine (computer). Machine has got no intuition, no intelligence but circuits and devices. It can be fed with such data, which it can process. Thus it can be make intelligent artificially; it can be make expert in any area say: medical diagnosis, chess game, washing a cloth, robot movement, etc. artificially. For this reason, a mathematical modeling of vague concepts or vague knowledge or imprecise data or ill-defined information is necessary. An important tool, probably one of the most important tools, to do so for such information processing is fuzzy set theory. In this unit a good discussion is done on fuzzy sets and its basic operations. Applications of fuzzy sets in artificial intelligence, experts system etc. you will learn at a later stage.

8.1.1 What is FUZZINESS?

According to OXFORD DICTIONARY FUZZY means Blurred, Fluffy, Frayed or Indistinct Fuzziness is deterministic uncertainty Fuzziness is concerned with the degree to which events occur rather than the likelihood of their occurrence (probability)

For example:
The degree to which a person is young is a fuzzy event rather than a random event. Suppose you have been in a desert for a week without a drink and you came upon a bottle A and B, marked with the following information: P (A belongs to a set of drinkable liquid) = 0.9 B in fuzzy set of drinkable liquid = 0.9 Which one would you choose? Some unrealistic and realistic quotes: Q: How was the weather like yesterday in San Francisco?

A1: Oh! The temperature was -5.5 degrees centigrade A2: Oh! It was really cold.
Experts rely on common sense to solve problem. This type of knowledge exposed when expert describe problem with vague terms. Example of vague terms: When it is really/quite hot ... If a person is very tall he is suitable for ... Only very small person can enter into that hole I am quite young Mr. Azizi drive his car moderately fast How can we represent and reason with vague terms in a computer?

Use FUZZY LOGIC!! 8.1.2 Current Applications of Fuzzy Logic

Some examples of how Fuzzy Logic has been applied in reality: Camera aiming for telecast of sporting events Expert system for assessment of stock exchange activities Efficient and stable control of car-engines Cruise control for automobiles Medicine technology: cancer diagnosis Recognition of hand-written symbols with pocket computers Automatic motor-control for vacuum cleaners Back light control for camcorders Single button control for washing machines Flight aids for helicopters Controlling of subway systems in order to improve driving comfort, precision halting and power economy Improved fuel-consumption for automobiles Expert systems also utilized fuzzy logic since the domain is often inherently fuzzy. Some examples: decision support systems financial planners diagnosing systems for determining soybean pathology

a meteorological expert system in China for determining areas in which to establish rubber tree orchards


Study mathematical representation of fuzzy terms such as old, tall, heavy etc. This term dont have truth representation. i.e. truth or false [0,1] But, have extended truth-values to all real numbers in the range of values 0 to 1. This real numbers are used to represent the possibility that a given statement is true or false. (Possibility Theory) Eg. The possibility that a person 6ft tall is really tall is set to 0.9 i.e. (0.9) signify that it is very likely that the person is tall. Zadeh (1965) extended the work and brought a collection of valuables concepts for working with fuzzy terms called Fuzzy Logic.

Definition of Fuzzy Logic

A branch of logic that uses degrees of membership in sets rather that a strict true/false membership

Linguistic Variables
Fuzzy terms are called linguistic variables. (or fuzzy variables)

Definition of Linguistic Variable

Term used in our natural language to describe some concept that usually has vague or fuzzy values

Example of Linguistic Variables With Typical Values

Linguistic Variable
Temperature Height Weight Speed

Typical Values
hot, cold short, medium, tall light, heavy slow, creeping, fast

Fuzzy rules in expert systems:

R1: IF Speed is slow THEN make the acceleration high R2: IF Temperature is low AND pressure is medium THEN Make the speed very slow IF the water is very hot THEN add plenty of cold water Fact: The water is moderately hot

Conclusion: Add a little cold water Possible numerical values of linguistic variables is called UNIVERSE OF DISCOURSE. Example: The Universe of Discourse for the linguistic variable speed in R1 is in the range [0,100mph]. Thus, the phrase "speed is slow" occupies a section of the variables Universe of Discourse. - It is a fuzzy set. (slow)

8.1.4 Fuzzy Sets

Traditional set theory views world as black and white. Example like set of young people i.e. children. A person is either a member or non-member. Member is given value 1 and non-member 0; called Crisp set. Whereas, Fuzzy Logic interpret young people reasonably using fuzzy set. HOW? By assigning membership values between 0 and 1.

Consider young people (age <= 10). If person age is 5 assign membership value 0.9 if 13, a value of 0.1 Age = linguistic variable young = one of it fuzzy sets Other fuzzy sets, old and middle age.

Definition: Fuzzy Sets

Let X be the universe of discourse, with elements of X denoted as x. A fuzzy set A is characterized by a membership QA(x) that associates each element x with degree of membership value in A. Probability theory relies on assigning probabilities to given event, whereas Fuzzy logic relies on assigning values to given event x using membership function: QA(x): X p [0,1] This value represent the degree (possibility) to which element x belongs to fuzzy set A. Q A(x) = Degree (x A) Membership values is bounded by: 0 e Q A(x) e 1


How to represent fuzzy set in computer?? Need to define its membership function. One approach is: Make a poll to a group of people is ask them of the fuzzy term that we want to represent. For example: The term tall person. What height of a given person is consider tall? Need to average out the results and use this function to associate membership value to a given individual height.

Can use the same method for other height description such as short or medium. Multiple fuzzy sets on the same universe of discourse are refers to as Fuzzy Subsets. Thus, membership value of a given object will be assigned to each set. (refer to fig. 2) Individual with height 5.5 is a medium person with membership value 1. At the same time member of short and tall person with membership value 0.25. Single object is a partial member of multiple sets.


How do we represent fuzzy set formally? Assume we have universe of discourse X and a fuzzy set A defined on it. X = {x1,x2,x3,x4,x5...xn} Fuzzy set A defines the membership function QA(x) that maps elements xi of X to degree of membership in [0,1]. A = {a1,a2,} where ai = Q A(xi) For clearer representation, includes symbol "/" which associates membership value ai with xi: A = {a1/x1,a2/} Consider Fuzzy set of tall, medium and short people: TALL = {0/5, 0.25/5.5, 0.7/6, 1/6.5, 1/7} MEDIUM = {0/4.5, 0.5/5, 1/5.5, 0.5/6, 0/6.5} SHORT = { }

8.1.5 HEDGES
We have learn how to capture and representing vague linguistic term using fuzzy set. In normal conversation, we add additional vagueness by using adverbs such as:

very, slightly, or somewhat..

What is adverb?? A word that modifies a verb, an adjective, another adverb, or whole sentence. Example: Adverb modifying an adjective. The person is very tall How do we represent this new fuzzy set?? Use a technique called HEDGES. A hedge modifies mathematically an existing fuzzy set to account for some added adverb.

Concentration (Very)
Further reducing the membership values of those element that have smaller membership values. Q CON(A) (x) = (Q A(x))2 Given fuzzy set of tall persons, can create a new set of very tall person. Example: Tall = {0/5, 0.25/5.5, 0.76/6, 1/6.5, 1/7) Very tall = { /5, /5.5, /6, /6.5, /7}

Dilation (somewhat)
Dilates the fuzzy elements by increasing the membership values with small membership values more than elements with high membership values. Q DIL(A) (x) = (Q A(x))0.5 Example: Tall = {0/5, 0.25/5.5, 0.76/6, 1/6.5, 1/7} somewhat tall = { /5, /5.5, /6, /6.5, /7}

Intensification (Indeed)
Intensifying the meaning of phrase by increasing the membership values above 0.5 and decreasing those below 0.5. Q INT(A) (x) = 2(Q A(x))2 for 0 e Q A(x) e 0.5 = 1 - 2(1 - Q A(x))2 for 0.5 e Q A(x) e 1 Example: short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7} indeed short = { /5, /5.5, /6, /6.5, /7}

Power (Very Very)

Extension of the concentration operation. Q POW(A) (x) = (Q A(x))n Example: Create fuzzy set of very very tall person with n=3 Tall = {0/5, 0.25/5.5, 0.76/6, 1/6.5, 1/7} Very very tall = { /5, /5.5, /6, /6.5, /7}

8.1.6 FUZZY SET OPERATIONS Intersection

In classical set theory, intersection of 2 sets contains elements common to both. In fuzzy sets, an element may be partially in both sets. Q A B (X) = min (Q A(x), Q B(x))  x X Example: Tall = {0/5, 0.2/5.5, 0.5/6, 0.8/6.5, 1/7} Short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7} Q tall short = Tall and short can mean ________ Highest at the middle and lowest at both end.

Union of 2 sets is comprised of those elements that belong to one or both sets. Q A B (X) = max (Q A(x), Q B(x))  x X Example:

Tall = {0/5, 0.2/5.5, 0.5/6, 0.8/6.5, 1/7} Short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7} Q tall short = Attains its highest vales at the limits and lowest at the middle. Tall or short can mean ________

Complementation (Not)
Find complement ~A by using the following operation:

(x) = 1 - Q A(x)\

Short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7} Not short = { /5, /5.5, /6, /6.5, /7}


Fuzzy proposition: a statement that assert a value for some linguistic variable such as height is tall. Fuzzy Rule: Rule that refers to 1 or more fuzzy variable in its conditions and single fuzzy variable in its conclusion. General form: IF X is A THEN Y is B Specific form: IF height is tall THEN weight is heavy Association of 2 fuzzy sets are store in matrix M called Fuzzy Associative Memory (FAM) Rules are applied to fuzzy variables by a process called propagation. (inference process). When applied, it looks for degree of membership in the condition part and calculate degree of membership in the conclusion part. Calculation depends upon connectives: AND, OR or NOT. A fuzzy Logic program can be viewed as a 3 stage process:

1. Fuzzification
The crisp values input are assigned to the appropriate input fuzzy variables and converted to the degree of membership.

2. Propagation (Inference)
Fuzzy rules are applied to the fuzzy variables where degrees of membership computed in the condition part are propagated to the fuzzy variables in the conclusion part. (max-min and maxproduct inference)

3. De-fuzzification
The resultant degrees of membership for the fuzzy variables are converted back into crisp values. Example: Assume 2 cars traveling the same sped along a straight road. The distance between the cars becomes one of the factors for the second driver to brake his car to avoid collision. The following rule might be used by the second driver: IF the distance between cars is very small AND the speed of car is high

THEN brake very hard for speed reduction. IF distance between cars is slightly long AND the speed of car is not too low THEN brake moderately hard to reduce speed In the above examples, the rules are feature with: linguistic variables: fuzzy subsets: connectives: hedges: Diagram: Fuzzy Logic Controllers are build up from 4 main components: Fuzzifier Fuzzy inference engine Defuzzifier

8.2 Memory Organisation

Memory is central to common sense behaviour. Human memory contains an immense amount of knowledge about the world. So far, we have only discussed a tiny fraction of that knowledge. Memory is also the basis for learning. A system that cannot learn cannot, in practice, possess common sense. A complete theory of human memory has not yet been discovered, but we do have a number of facts at our disposal. Some of these facts come from neurobiology while others are psychological in nature. Computer models of neural memory are interesting, but they do not serve as theories about how memory is used in everyday, commonsense reasoning. Psychology an AI seek to address these issues. Psychological studies suggest several distinctions in human memory. One distinction is between short-term memory (STM) and long-term memory (LTM). We know that a person can only hold a few items at a time in STM, but the capacity of LTM is very large. LTM storage is also fairly permanent. The production system is one computer model of the STM-LTM structure. Perceptual information is stored directly in STM, also called working memory. Production rules, stored in LTM, match themselves against items in STM. Productions fire, modify STM, and repeat. LTM is often divided into episodic memory and sementic memory. Episodic memory contains information about past personal experiences, usually stored from an autobiograpical point of view. For example, a college graduation, a wedding, or a concert may all form episodic memories. Semantic memory, on the other hand, contains facts like Birds fly. These facts are no longer connected with personal experiences. Sementic memory is especially useful in programs that understand natural language. Models for episodic memory grew out of research on scripts. Recall that a script is a stereotyped sequence of events, such as those involved in going to the dentist. One obvious question to ask is: How are scripts acquired? Surely they are acquired through personal experience. But a particular experience often includes details that we do not want to include in a script. For example, just because we once saw The New Yorker magazine in a dentists waiting room, that doesnt mean that The New Yorker should be part of the dentist script. The

problem is that if a script contains too many details, it will not be matched and retrived correctely when new, similar situation arise. In general, it is difficult to know which script to retrive. One reason reason for this is that script are too monolithic. It is hard to do any kind of partial matching. It is also hard to modify a script. More recent work reduces scripts to individual scenes, which can be shared across multiple structures. Streotypical sequences of scenes are strung together into memory organization packets. Usually, three distinct MOPs encode knowledge about an event sequence. One MOP represents the physical sequence of events, such as entering a dentists office, sitting in the waiting room, reading a magazine, sitting inte dentists chair, etc. Another MOP represent the set of social event that take place. These are events that involve personal interactions. A third MOP revolves around the goals of the person in the particualr episode. Any of these MOPs may be important for understanding new situations. MOPs organize scenes, and they themselves are further organized into higer-level MOPs. For example, the MOP for visiting the office of a professional may contain a sequence of abstract general scenes, such as talking to an assistant, waiting, and meeting. High-level MOPs contain no actual memories, so where do they come from? New MOPs are created upon the failure of exceptions. When we use scripts for story understanding, we are able to locate interesting parts of the story by noticing places where events do not conform to the scripts expectations. In a MOP-based system, if an expectation is repeatedly violeted, then the MOP is generalized or split. Eventually, episodic memories can fade away, leaving only a set of generalized MOPs. These MOPs look something like scripts, except that they share scenes with one another. Lets look at an example. the first time you go to the dentist, you must determine how things work from scratch since you have no prior experience. In doing so, you store detailed accounts of each scene and string them together into a MOP. The next time you visit the dentist, that MOP provides certain expectations, which are mostly met. You are able to deal with the situation easily and make inferences that you could not make the first time. If any expectation fails, this provides grounds for modifying the MOP. Now, suppose you later visit a doctors office. As you begin to store episodic scenes, you notice similarties between these scenes and scenes from the dentist MOP. Such similarities provide a basis for using the dentist MOP to generate expectations. Miltiple trips to the doctor will result in a doctor MOP that is slightly different from the dentist MOP. Later experiences with visiting lawyers and gevernment officials will result in other MOPs. Ultimately, the structures shared by all of these MOPs will cause a generalized MOP to appear. Whenever you visit a professions office in the future, you can use the generalized MOP to provide expectations. With MOPs, memory is both a constructive and reconstructive process. It is constructive because new experience create new memory structures. It is reconstrctive because even if the details of a particular episode are lost, the MOP provides information about what was likely to have happened. The ability to do this kind of reconstruction is an important feature of human memory. There are several MOP-based computer programs. CYRUS is a program that contrains episodes taken from the life of a particular individual. CYRUS can answer questions that require significant amount of memory reconstruction. The IPP program accepts stories about terrorist attacks and stores them in an episodic memory. As it notices similarities in the stories, it

creates general memory structures. These structures improve its ability to understand. MOPTRANS uses a MOP-based memory to understand sentences in one language and translate them into another.

8.3 Neural Networks and Parallel Computation

The human brain is made up of a web of billions of cells called neurons, and understanding its complexities is seen as one of the last frontiers in scientific research. It is the aim of AI researchers who prefer this bottom-up approach to construct electronic circuits that act as neurons do in the human brain. Although much of the working of the brain remains unknown, the complex network of neurons is what gives humans intelligent characteristics. By itself, a neuron is not intelligent, but when grouped together, neurons are able to pass electrical signals through networks.

The neuron "firing", passing a signal to the next in the chain.

Research has shown that a signal received by a neuron travels through the dendrite region, and down the axon. Separating nerve cells is a gap called the synapse. In order for the signal to be transferred to the next neuron, the signal must be converted from electrical to chemical energy. The signal can then be received by the next neuron and processed.

Warren McCulloch after completing medical school at Yale, along with Walter Pitts a mathematician proposed a hypothesis to explain the fundamentals of how neural networks made the brain work. Based on experiments with neurons, McCulloch and Pitts showed that neurons might be considered devices for processing binary numbers. An important back of mathematic logic, binary numbers (represented as 1's and 0's or true and false) were also the

basis of the electronic computer. This link is the basis of computer-simulated neural
networks, also know as Parallel computing. A century earlier the true / false nature of binary numbers was theorized in 1854 by George Boole in his postulates concerning the Laws of Thought. Boole's principles make up what is known as Boolean algebra, the collection of logic concerning AND, OR, NOT operands. For example according to the Laws of thought the statement: (for this example consider all apples red) y Apples are red-- is True y Apples are red AND oranges are purple-- is False y Apples are red OR oranges are purple-- is True y Apples are red AND oranges are NOT purple-- is also True Boole also assumed that the human mind works according to these laws, it performs logical operations that could be reasoned. Ninety years later, Claude Shannon applied Boole's principles in circuits, the blueprint for electronic computers. Boole's contribution to the future of computing and Artificial Intelligence was immeasurable, and his logic is the basis of neural networks. McCulloch and Pitts, using Boole's principles, wrote a paper on neural network theory. The thesis dealt with how the networks of connected neurons could perform logical operations. It also stated that, one the level of a single neuron, the release or failure to release an impulse was the basis by which the brain makes true / false decisions. Using the idea of feedback theory, they described the loop which existed between the senses ---> brain ---> muscles, and likewise concluded that Memory could be defined as the signals in a closed loop of neurons. Although we now know that logic in the brain occurs at a level higher then McCulloch and Pitts theorized, their contributions were important to AI because they showed how the firing of signals between connected neurons could cause the brains to make decisions. McCulloch and Pitt's theory is the basis of the artificial neural network theory. Using this theory, McCulloch and Pitts then designed electronic replicas of neural networks, to show how electronic networks could generate logical processes. They also stated that neural networks may, in the future, be able to learn, and recognize patterns. The results of their research and two of Weiner's books served to increase enthusiasm, and laboratories of computer simulated neurons were set up across the country. Two major factors have inhibited the development of full scale neural networks. Because of the expense of constructing a machine to simulate neurons, it was expensive even to construct neural networks with the number of neurons in an ant. Although the cost of components have decreased, the computer would have to grow thousands of times larger to be on the scale of the human brain. The second factor is current computer architecture. The standard Von Neuman computer, the architecture of nearly all computers, lacks an adequate number of pathways between components. Researchers are now developing alternate architectures for use with neural networks. Even with these inhibiting factors, artificial neural networks have presented some impressive results. Frank Rosenblatt, experimenting with computer simulated networks, was able to create a machine that could mimic the human thinking process, and recognize letters. But, with new top-down methods becoming popular, parallel computing was put on hold. Now neural networks are making a return, and some researchers believe that with new computer

architectures, parallel computing and the bottom-up theory will be a driving factor in creating artificial intelligence.

8.3.1 Neural Network Architectures

Neural networks are large networks of simple processing elements or nodes which process information dynamically in response to external inputs. The nodes are simplified models of neurons. The knowledge in a neural network is distributed throughout the network in the form of inter-node connections and weighted links which form the inputs to the nodes. The link weights serve to enhance or inhibit the input stimuli values which are then added together at the nodes. If the sum of all the inputs to a node exceeds some threshold value T, the node executes and produces an output which is passed on to other nodes or is used to produce some output response. In the simplest case, no output is produced if the total input is less than T. In more complex models, the output will depend on a nonlinear activation function. Neural networks were originally inspired as being models of the human nervous system. They are greatly simplified models to be sure (neurons are known to be fairly complex processor). Even so, they have been shown to exhibit many intelligent abilities, such as learning, generalization, and abstraction. A single node is illustrated in following figure. The inputs to the node are the values x1,x2,.xn, which typically take on values of 1, 0, 1, or real values within the range (-1,1). The weights w1,w2,wn, correspond to the synaptic strengths of a neuron. They serve to increase or decrease the effects of the corresponding x1 input values. The sum of the products xi * wi , i = 1,2,..,n, serve as the total combined input to the node. If this sum is large enough to exceed the threshold amount T,

x1 x2 =x*w x3

w1 w2 y


the node fires, and produces an output y, an activation function value placed on the nodes output links. This output may then be the input to other nodes or the final output response from the networks. Following fig illustrates three layers of a number of interconnected nodes. The first layer serves as the input layer, receiving inputs from some set of stimuli. The second layer(called the hidden layer) receives input from the first layer and produces a pattern of inputs to the third layer, the output layer. The pattern of outputs from the final layer are the networks response to the input stimuli patterns. Input links to layer j (j = 1, 2, 3) have weight wij for i = 1, 2 . n. General multiplayer networks having n nodes (number of rows) in each of m layers (number of columns of nodes) will have weights represented as an n * m matrix W. Using this representation, nodes having no interconnecting links will have a weight value of zero. Networks consisting of more than three layers would, of course, be correspondingly more complex than the network depicted in following Figure.

A neural network can be thought of as a black box that transforms the input vector x to the output vector y where the transformation performed is the result of the pattern of connections and weights, that is, according to the values of the weight matrix W. Consider the vector product

x * w = x iwi


w11 w21

x2 w23 x3 w33 layer 1




layer 2

layer 3

There is a geometric interpretation for this product. It is equivalent to projecting one vector onto the other vector in n-dimensional space. This notation is depicted in following Figure for the two-dimensional case. The magnitude of the resultant vector is given by

x * w = |x||w| cos U
where |x| denotes the norm or length of the vector x. note that this product is maximum when both vectors point in the same direction, that is, when U = 0. The product is a minimum when both point in opposite directions or when U = 1800 degrees. This illustrates how the vectors in the weight matrix W influence the inputs to the nodes in a neural network.

8.4 Genetic Algorithm

Genetic algorithm learning methods are based on model of natural adaptation and evolution. These learning systems improve their performance through processes, which model population genetics and survival of fittest. In the field of genetic, a population is subjected to an environment, which places demands on the members. The members which adapt well are selected for mating and reproduction. The offspring of these better performers inherit genetic traits from both their parents. Member of this second generation of offspring which also adapt well are then selected for mating and reproduction and the evolutionary cycle continues. Poor performers die off without leaving offspring. Good performers produce good offspring and they, in turn, perform well. After some number of generations, the resultant population will have adapted optimally or at least very well to the environment. Genetic algorithm systems start with a fixed size population of data structures which are used to perform some given tasks. After requiring the structures to execute the specified tasks some number of times, the structures are rated on their performance, and a new generation of data structures is then created. The new generation is created by mating the higher performing structures to produce offspring. These offspring and their parents are then retained for the next

generation while the poorer performing structures are discarded. The basic cycle is illustrated in following figure.

Generate initial population So Structures perform given tasks repeatedly Performance utility values assigned to knowledge structures New population is generated from best performing structures

Process repeated until desired performance reached

Mutations are also performed on the best performing structures to insure that the full space of possible structures is reachable. This process is repeated for a number of generations until the resultant population consists of only the highest performing structures. Data structure which make up the population can represent rules or any other suitable types of knowledge structures are fixed-length binary strings such as the eight-bit string 11010001. An initial population of these eight-bit strings would be generated randomly or with the use of heuristics at time zero. These strings, which might be simple condition and action rules, would then be assigned some tasks to perform (like predicting the weather based on certain physical and geographic conditions or diagnosing a fault in a piece of equipment). After multiple attempts at executing the tasks, each of the participating structures would be rated and tagged with a utility value u commensurate with its performance. The next population would then be generated using the higher performing structures as parents and the process would be repeated with the newly produced generation. After many generations the remaining population structures should perform the desired tasks well. Mating between two string is accomplished with the crossover operation which randomly selects a bit position in the eight-bit string and concatenates the head of one parent to the tail of the second parent to produce the offspring. Suppose the two parents are designated as xxxxxxxx and yyyyyyyy respectively, and suppose the third bit position has been selected as the crossover point (at the position of the colon in the structure xxx:xxxxx). After the crossover operation is applied, two offspring are then generated, namely xxxyyyyy and yyyxxxxx. Such offspring and their parents are then used to make up the next generation of structures. A second genetic operation often used is called inversion. Inversion is a transformation applied to a single string. A bit position is selected at random, and when applied to a structure, the

inversion operation concatenates the tail of the string to the head of the same string. Thus, if the sixth position were selected (x1x2x3x4x5x6x7x8), the inverted string would be x7x8x1x2x3x4x5x6. A third operator, mutation, is used to insure that all locations of the rule space are reachable, that every potential rule in the rule space is available for evaluation. This insures that the selection process does not get caught in a local minimum. For example, it may happen that use of the crossover and inversion operators will only produce a set of structures that are better than all local neighbors but not only produce a set of structures that are better than all local neighbors but not optimal in a global sense. This can happen since crossover and inversion may not be able to produce some undiscovered structures. The mutation operator can overcome this by simply selecting any bit position in a string at random and changing it. This operator is typically used only infrequently to prevent random wandering in the search space.

8.5 Matching:
Matching is the process of comparing two or more structures to discover their likenesses or differences. The structures may represent a wide range of objects including physical entities, words of phrases in some language, complete classes of things, general concepts, relations between complex entities, and the like. The representations will be given in one or more of the formalisms like FOPL, networks, or some other scheme, and matching will involve comparing the component parts of such structures. Matching is used in a variety of programs for different reasons. It may serve to control the sequence of operations, to identify or classify objects, to determine the best of a number of different alternatives, or to retrieve items from a database. It is an essential operation in such diverse programs as speech recognition, natural language understanding, vision, learning, automated reasoning, planning, automatic programming, and expert systems, as well as many others. In its simplest form, matching is just the process of comparing two structures or pattern for equality. The match fails if the patterns differ in any aspect. For example a match between the two characters strings acdebfba and acdebeba fails on an exact match since the string differ in the sixth character positions. In more complex cases the matching process may permit transformations in the patterns in order to achieve an equality match. The transformation may be a simple change of some variables to constants, or it may amount to ignoring some components during the match operation. For example, a pattern matching variable such as ?x may be used to permit successful matching between the two patterns (a b (c d) e) and (a b ?x e) by binding ?x to (c d). Such matching are usually restricted in some way, however, as is the case with the unification of two clauses where only consistent bindings are permitted. Thus, two patterns such as (a b (c d) e f) and (a b ?x e ?x ) would not match since ?x could not be bound to two different constants. In some extreme cases, a complete change of representational from may be required in either one or both structures before a match can be attempted. This will be the case, for example, when one visual object is represented as a vector of pixel gray levels and objects to be matched

are represented as descriptions in predicate logic or some other high level statements. A direct comparison is impossible unless one form has been transformed into the other. In subsequent chapters we will see examples of many problems where exact matches are inappropriate, and some form of partial matching is more meaningful. Typically in such cases, one is interested in finding a best match between pairs of structures. This will be the case in object classification problems, for example, when object descriptions are subject to corruption by noise or distortion. In such cases, a measure of the degree of match may also be required. Other types of partial matching may require finding a match between certain key elements while ignoring all other elements in the pattern. For example, a human language input unit should be flexible enough to recognize any of the following three statements as expressing a choice of preference for the low-calorie food item. I prefer the low-calorie choice. I want the low-calorie item. The low-calorie one please. Recognition of the intended request can be achieved by matching against key words in a template containing low-calorie and ignoring other words except, perhaps, negative modifiers. Finally, some problems may obviate the need for a form of fuzzy matching where an entitys degree of membership in one or more classes is appropriate. Some classification problems will apply here if the boundaries between the classes are not distinct, and an object may belong to more than one class. Following fig illustrate the general match process where an input description is being compared with other descriptions. As stressed earlier, the term object is used here in a general sense. It does not necessarily imply physical objects. All objects will be represented in some formalism such as a vector of attribute values, prepositional logic or FOPL statements, rules, frame-like structures, or other scheme. Transformations, if required, may involve simple instantiations or unifications among clauses or more complex operation such as transforming a two-dimensional scene to a description in some formal language. Once the descriptions have been transformed into the same schema, the matching process is performed element-by-element using a relational or other test (like equality or ranking). The test results may then be combined in some way to provide an overall measure of similarity. The choice of measure will depend on the match criteria and representation scheme employed.

Object to Match match



Comparator Object or classesMetric Representation Transformations


The output of the matcher is a description of the match. It may be a simple yes or no response or a list of variable bindings, or as complicated as a detailed annotation of the similarities and differences between the matched objects. To summarize then, matching may be exact, used with or without pattern variables, partial, or fuzzy, and any matching algorithm will be based on such factors as Choice of representation scheme for the objects being matched, Criteria for matching (exact, partial, fuzzy, and so on), Choice of measure required to perform the match in accordance with the chosen criteria, and Type of match description required for output.

8.5.1 Variables Matching

All of the structures we shall consider here are constructed from basic atomic elements, number, and characters. Character string elements may represent either constraints or variables. If variables, they may be classified by either the type of match permitted or by their value domains. We can classify match variables by the number of items that can replace them (one or more than one). An open variable can be replaced by a single item, while a segment variable can be replaced by zero or more items. Open variables are labeled with a preceding question mark (?x, ?y, ?class). They may match or assume the value of any single string element or word, but they are sometimes subject to consistency constraints. For example, to be consistent, the variable ?x can be bound only to the same top level element in any single structure. Thus (a ?x d ?x e) may match (a b d b e), but not (a b d a e). segment variable types will be preceded with an asterisk (*x d (e f) * y) will match the patterns (a (b c) d (e f) g h), (d (e f) (g)) or other similar patterns. Segment variables may also be subject to consistency constraints similar to open variables. Variables may also be classified by their value domains. This distinction will be useful when we consider similar measure below. The variables may be either quantitative, having a meaningful origin or zero point and a meaningful interval difference between two values, or they may be qualitative in which there is no origin nor meaningful interval value difference. These two types may be further subdivided as follows. Nominal variables. Qualitative variables whose value or states have no order nor rank. It is only possible to distinguish equality or inequality between two such objects. Of course each state can be given a numerical code. For example, marital status has states of married, single, divorced, or windowed. These states have no numerical significance, and no particular order nor rank. The states could be assigned numerical codes however, such as married = 1, single = 2, divorced = 3, and widowed = 4. Ordinal variables. Qualitative variables whose states can be arranged in a rank order, but the difference between two distinct values has no significance. Ordinal variables may also be assigned numerical values. For example, the states very tall, tall, medium, short, and very short can be arranged in order from tallest to shortest and be assigned an arbitrary scale of 5

to 1. However, the difference between successive values does not necessarily have any quantitative meaning. Binary variable. Qualitative discrete variables which may assume only one of two values, such as 0 or 1, good or bad, yes or no, high or low. Interval variables. Quantitative variables, which take on numeric values and for which equal differences between values, have the same significance. For example, real numbers corresponding to temperature or integers corresponding to an amount of money are considered as interval variables.