Artificial Intelligence Notes

5. Predicate Logic

Introduction
Predicate logic is used to represent knowledge. It will be met in knowledge representation schemes and reasoning methods. There are other ways of representing knowledge, but this form is popular.

Propositional Logic
Propositional logic is simple to deal with, and a decision procedure for it exists. We can represent real-world facts as logical propositions written as well-formed formulas (wffs):

It is raining.  →  RAINING
It is sunny.    →  SUNNY
It is windy.    →  WINDY
If it is raining, then it is not sunny.  →  RAINING → ¬SUNNY

Socrates is a man.  →  SOCRATESMAN
Plato is a man.     →  PLATOMAN

The above two statements become totally separate assertions; we would not be able to draw any conclusions about similarities between Socrates and Plato. Better:

MAN(SOCRATES)
MAN(PLATO)

These representations reflect the structure of the knowledge itself. They use predicates applied to arguments.

All men are mortal.  →  MORTALMAN

This fails to capture the relationship between any individual being a man and that individual being mortal. We need variables and quantification unless we are willing to write separate statements for every individual.

Predicate: a predicate is a truth assignment given for a particular statement, which is either true or false. To solve common-sense problems by computer system, we use predicate logic.

Logic symbols used in predicate logic:
∀ – for all
∃ – there exists
→ – implies
¬ – not
∨ – or
∧ – and

Predicate Logic
• Terms represent specific objects in the world and can be constants, variables or functions.
• Predicate symbols refer to a particular relation among objects.
• Sentences represent facts, and are made of terms, quantifiers and predicate symbols.
• Functions allow us to refer to objects indirectly (via some relationship).
• Quantifiers and variables allow us to refer to a collection of objects without explicitly naming each object.
Some Examples
• Predicates: Brother, Sister, Mother, Father
• Objects: Bill, Hillary, Chelsea, Roger
• Facts expressed as atomic sentences, a.k.a. literals:
  – Father(Bill, Chelsea)
  – Mother(Hillary, Chelsea)
  – Brother(Bill, Roger)
  – ¬Father(Bill, Chelsea)  (a negated literal)

Variables and Universal Quantification
Universal quantification allows us to make a statement about a collection of objects:
• ∀x Cat(x) → Mammal(x) : all cats are mammals.
• ∀x Father(Bill, x) → Mother(Hillary, x) : all of Bill's kids are also Hillary's kids.

Variables and Existential Quantification
Existential quantification allows us to state that an object exists (without naming it):
• ∃x Cat(x) ∧ Mean(x) : there is a mean cat.
• ∃x Father(Bill, x) ∧ Mother(Hillary, x) : there is a kid whose father is Bill and whose mother is Hillary.

Nested Quantification
• ∀x ∀y Parent(x, y) → Child(y, x)
• ∀x ∃y Loves(x, y)
• ∀x [PassTest(x) ∨ (∃x ShootDave(x))]

Functions
• Functions are terms – they refer to a specific object.
• We can use functions to symbolically refer to objects without naming them.
• Examples: fatherof(x), age(x), times(x, y), succ(x)
• Using functions:
  – ∀x Equal(x, x)
  – Equal(factorial(0), 1)
  – ∀x Equal(factorial(s(x)), times(s(x), factorial(x)))

If we use logical statements as a way of representing knowledge, then we have available a good way of reasoning with that knowledge.

Representing Facts with Predicate Logic
1) Marcus was a man.
   man(Marcus)
2) Marcus was a Pompeian.
   pompeian(Marcus)
3) All Pompeians were Romans.
   ∀x : pompeian(x) → roman(x)
4) Caesar was a ruler.
   ruler(Caesar)
5) All Romans were either loyal to Caesar or hated him.
   ∀x : roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
6) Everyone is loyal to someone.
   ∀x, ∃y : loyalto(x, y)
7) People only try to assassinate rulers they are not loyal to.
∀x, ∀y : person(x) ∧ ruler(y) ∧ tryassassinate(x, y) → ¬loyalto(x, y)
8) Marcus tried to assassinate Caesar.
   tryassassinate(Marcus, Caesar)

Q: Prove that Marcus is not loyal to Caesar by backward substitution.

¬loyalto(Marcus, Caesar)
        ↑ (7, substitution)
person(Marcus) ∧ ruler(Caesar) ∧ tryassassinate(Marcus, Caesar)
        ↑ (8)
person(Marcus) ∧ ruler(Caesar)
        ↑ (4)
person(Marcus)

Representing Instance and Isa Relationships
Two attributes, isa and instance, play an important role in many aspects of knowledge representation. The reason for this is that they support property inheritance.
isa – used to show class inclusion, e.g. isa(mega_star, rich).
instance – used to show class membership, e.g. instance(prince, mega_star).

Three Ways of Representing Class Membership:

1. Pure predicate logic:
   man(Marcus)
   Pompeian(Marcus)
   ∀x : Pompeian(x) → Roman(x)
   ruler(Caesar)
   ∀x : Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)

2. Using the instance predicate:
   instance(Marcus, man)
   instance(Marcus, Pompeian)
   ∀x : instance(x, Pompeian) → instance(x, Roman)
   instance(Caesar, ruler)
   ∀x : instance(x, Roman) → loyalto(x, Caesar) ∨ hate(x, Caesar)

3. Using both instance and isa:
   instance(Marcus, man)
   instance(Marcus, Pompeian)
   isa(Pompeian, Roman)
   instance(Caesar, ruler)
   ∀x : instance(x, Roman) → loyalto(x, Caesar) ∨ hate(x, Caesar)
   ∀x, ∀y, ∀z : instance(x, y) ∧ isa(y, z) → instance(x, z)

In the figure above:
> The first five sentences represent pure predicate logic. In these representations, class membership is represented with unary predicates (such as Roman), each of which corresponds to a class. Asserting that P(x) is true is equivalent to asserting that x is an instance of P.
> The second part of the figure contains representations that use the instance predicate explicitly. The predicate instance is a binary one, whose first argument is an object and whose second argument is a class to which the object belongs. These representations do not use an explicit isa predicate.
> The third part contains representations that use both the instance and isa predicates explicitly.
The use of the isa predicate simplifies the representation of sentence 3, but it requires that one additional axiom be provided. This additional axiom describes how an instance relation and an isa relation can be combined to derive a new instance relation.

Computable Functions and Predicates
This is fine if the number of facts is not very large, or if the facts themselves are sufficiently unstructured that there is little alternative. But suppose we want to express simple facts, such as the following greater-than and less-than relationships:

gt(1, 0)    lt(0, 1)
gt(2, 1)    lt(1, 2)
gt(3, 2)    lt(2, 3)

Clearly we do not want to have to write out the representation of each of these facts individually. For one thing, there are infinitely many of them. But even if we only consider the finite number of them that can be represented, say, using a single machine word per number, it would be extremely inefficient to store explicitly a large set of statements when we could instead so easily compute each one as we need it. Thus it becomes useful to augment our representation with these computable predicates.

1. Marcus was a man.  →  man(Marcus)
2. Marcus was a Pompeian.  →  pompeian(Marcus)
3. Marcus was born in 40 AD.  →  born(Marcus, 40)
4. All men are mortal.  →  ∀x : man(x) → mortal(x)
5. All Pompeians died when the volcano erupted in 79 AD.
   erupted(volcano, 79) ∧ (∀x : pompeian(x) → died(x, 79))
6. No mortal lives longer than 150 years.
   ∀x, ∀t1, ∀t2 : mortal(x) ∧ born(x, t1) ∧ gt(t2 − t1, 150) → dead(x, t2)
7. It is now 1991.  →  now = 1991
8. Alive means not dead.
   ∀x, ∀t : (alive(x, t) → ¬dead(x, t)) ∧ (¬dead(x, t) → alive(x, t))
9. If someone dies, then he is dead at all later times.
   ∀x, ∀t1, ∀t2 : died(x, t1) ∧ gt(t2, t1) → dead(x, t2)

Prove that Marcus is dead now:

¬alive(Marcus, now)
        ↑ (8, substitution)
dead(Marcus, now)
        ↑ (9, substitution)
died(Marcus, t1) ∧ gt(now, t1)
        ↑ (5, substitution)
pompeian(Marcus) ∧ gt(now, 79)
        ↑ (2)
gt(now, 79)
        ↑ (7, substitute equals)
gt(1991, 79)
        ↑ (compute gt)
nil

Fig. 5.5  One Way of Proving That Marcus Is Dead

¬alive(Marcus, now)
        ↑ (8, substitution)
dead(Marcus, now)
        ↑ (6, substitution)
mortal(Marcus) ∧ born(Marcus, t1) ∧ gt(now − t1, 150)
        ↑ (4, substitution)
man(Marcus) ∧ born(Marcus, t1) ∧ gt(now − t1, 150)
        ↑ (1)
born(Marcus, t1) ∧ gt(now − t1, 150)
        ↑ (3)
gt(now − 40, 150)
        ↑ (7, substitute equals)
gt(1991 − 40, 150)
        ↑ (compute minus)
gt(1951, 150)
        ↑ (compute gt)
nil

Another Way of Proving That Marcus Is Dead

Resolution: a procedure to prove a statement. Resolution attempts to show that the negation of the statement produces a contradiction with the known statements. It simplifies the proof procedure by first converting the statements into canonical form. It is a simple iterative process: at each step, two clauses called the parent clauses are compared, yielding a new clause that has been inferred from them.

Resolution refutation:
• Convert all sentences to CNF (conjunctive normal form).
• Negate the desired conclusion (converted to CNF).
• Apply the resolution rule until either:
  – we derive false (a contradiction), or
  – we can't apply it any more.

Resolution inference rule: from (a ∨ b) and (¬b ∨ c), conclude (a ∨ c).

Resolution refutation is sound and complete:
• If we derive a contradiction, then the conclusion follows from the axioms.
• If we can't apply the rule any more, then the conclusion cannot be proved from the axioms.

Sometimes, from the collection of statements we have, we want to know the answer to this question: "Is it possible to prove some other statements from what we actually know?" In order to prove this we need to make some inferences, and those other statements can be shown true using the refutation proof method, i.e. proof by contradiction using resolution. So for the asked goal we will negate the goal and add it to the given statements to prove the contradiction. Resolution refutation for propositional logic is a complete proof procedure.
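The refutation loop just described can be sketched for the propositional case. This is a minimal illustrative implementation, not the text's own code: clauses are frozensets of literal strings, with `"~p"` standing for ¬p.

```python
# Minimal propositional resolution refutation (encoding is illustrative).
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents: drop one complementary pair, union the rest."""
    return [(c1 - {l}) | (c2 - {negate(l)}) for l in c1 if negate(l) in c2]

def refute(clauses, goal):
    """True if goal follows: add ~goal and hunt for the empty clause."""
    clauses = set(clauses) | {frozenset({negate(goal)})}
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                for r in resolve(a, b):
                    if not r:
                        return True        # empty clause: contradiction
                    new.add(r)
        if new <= clauses:
            return False                   # saturated: no proof exists
        clauses |= new

# P and P -> Q (in clause form: ~P | Q) entail Q:
print(refute([frozenset({"P"}), frozenset({"~P", "Q"})], "Q"))  # True
```

Because the set of possible clauses over finitely many letters is finite, the loop always terminates, matching the claim that propositional resolution refutation is a complete proof procedure.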
So if the thing that you're trying to prove is, in fact, entailed by the things that you've assumed, then you can prove it with resolution refutation.

• Resolution can be applied to a certain class of wffs called clauses.
• A clause is defined as a wff consisting of a disjunction of literals.

Conjunctive Normal Form (Clause Normal Form): clause form is an approach to Boolean logic that expresses formulas as conjunctions of clauses joined by AND. Each clause connected by a conjunction (AND) must be either a literal or contain a disjunction (OR) operator. In clause form, a statement is a series of ORs connected by ANDs.

A statement is in conjunctive normal form if it is a conjunction (sequence of ANDs) consisting of one or more conjuncts, each of which is a disjunction (OR) of one or more literals (i.e., statement letters and negations of statement letters). All of the following formulas in the variables A, B, C, D and E are in conjunctive normal form:

• ¬A ∧ (B ∨ C)
• (A ∨ B) ∧ (¬B ∨ C ∨ ¬D) ∧ (D ∨ ¬E)
• A ∨ B
• A ∧ B

Conversion to Clause Form:
∀x : [Roman(x) ∧ know(x, Marcus)] → [hate(x, Caesar) ∨ (∀y : ∃z : hate(y, z) → thinkcrazy(x, y))]
→ Clause form: ¬Roman(x) ∨ ¬know(x, Marcus) ∨ hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkcrazy(x, y)

Algorithm:
1. Eliminate the implies relation (→) using a → b ≡ ¬a ∨ b:
   ∀x : ¬[Roman(x) ∧ know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (∀y : ¬(∃z : hate(y, z)) ∨ thinkcrazy(x, y))]
2. Reduce the scope of each ¬ to a single term:
   ¬(¬p) = p
   ¬(a ∨ b) = ¬a ∧ ¬b
   ¬(a ∧ b) = ¬a ∨ ¬b
   ∀x : [¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (∀y : ∀z : ¬hate(y, z) ∨ thinkcrazy(x, y))]
3. Standardize variables so that each quantifier binds a unique variable:
   ∀x : P(x) ∨ ∀x : Q(x) can be converted to ∀x : P(x) ∨ ∀y : Q(y)
4. Move all quantifiers to the left of the formula without changing their relative order:
   ∀x : ∀y : ∀z : [¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkcrazy(x, y))]
5. Eliminate existential quantifiers.
We can eliminate an existential quantifier by substituting for the variable a reference to a function that produces the desired value.
   ∃y : President(y)  ⇒  President(S1)
   ∀x, ∃y : fatherof(y, x)  ⇒  ∀x : fatherof(S2(x), x)
Such a function (S1, S2) is called a Skolem function. In general the function must have the same number of arguments as the number of universal quantifiers in the current scope.

Skolemize to remove existential quantifiers. This step replaces existentially quantified variables by Skolem functions. For example, convert (∃x)P(x) to P(c), where c is a brand-new constant symbol that is not used in any other sentence (c is called a Skolem constant). More generally, if the existential quantifier is within the scope of a universally quantified variable, then introduce a Skolem function that depends on the universally quantified variable. For example, ∀x ∃y P(x, y) is converted to ∀x P(x, f(x)). f is called a Skolem function, and must be a brand-new function name that does not occur in any other part of the logic sentence.

6. Drop the prefix. At this point, all remaining variables are universally quantified.
   P(x) ∨ Q(x)
   [¬Roman(x) ∨ ¬know(x, Marcus)] ∨ [hate(x, Caesar) ∨ (¬hate(y, z) ∨ thinkcrazy(x, y))]
7. Convert the matrix into a conjunction of disjuncts:
   (a ∨ b) ∨ c = a ∨ b ∨ c            (associative law)
   (a ∨ b) ∧ c = (a ∧ c) ∨ (b ∧ c)    (distributive laws)
   (a ∧ b) ∨ c = (a ∨ c) ∧ (b ∨ c)
   a ∨ b = b ∨ a                       (commutative law)
   ¬Roman(x) ∨ ¬know(x, Marcus) ∨ hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkcrazy(x, y)
8. Create a separate clause corresponding to each conjunct. In order for a well-formed formula to be true, all the clauses that are generated from it must be true.
9. Standardize apart the variables in the set of clauses generated in step 8. Rename the variables so that no two clauses make reference to the same variable.
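The purely propositional steps of this conversion (1, 2 and 7: eliminate →, push negations inward, distribute ∨ over ∧) can be sketched mechanically. The nested-tuple formula encoding below is an assumption for illustration, not the text's notation.

```python
# Formulas as nested tuples: ("->", a, b), ("&", a, b), ("|", a, b), ("~", a).
def elim_imp(f):
    if isinstance(f, str):
        return f
    op, *args = f
    args = [elim_imp(a) for a in args]
    if op == "->":                       # step 1: a -> b  ==>  ~a | b
        return ("|", ("~", args[0]), args[1])
    return (op, *args)

def push_neg(f):                         # step 2 (assumes '->' is gone)
    if isinstance(f, str):
        return f
    op, *args = f
    if op == "~" and not isinstance(args[0], str):
        iop, *iargs = args[0]
        if iop == "~":                   # ~~p ==> p
            return push_neg(iargs[0])
        if iop == "&":                   # De Morgan
            return ("|", push_neg(("~", iargs[0])), push_neg(("~", iargs[1])))
        if iop == "|":
            return ("&", push_neg(("~", iargs[0])), push_neg(("~", iargs[1])))
    return (op, *[push_neg(a) for a in args])

def distribute(f):                       # step 7: (x&y)|z ==> (x|z)&(y|z)
    if isinstance(f, str):
        return f
    op, a, *rest = f
    if op == "~":
        return f
    a, b = distribute(a), distribute(rest[0])
    if op == "|":
        if isinstance(a, tuple) and a[0] == "&":
            return ("&", distribute(("|", a[1], b)), distribute(("|", a[2], b)))
        if isinstance(b, tuple) and b[0] == "&":
            return ("&", distribute(("|", a, b[1])), distribute(("|", a, b[2])))
    return (op, a, b)

def to_cnf(f):
    return distribute(push_neg(elim_imp(f)))

print(to_cnf(("->", "p", ("&", "q", "r"))))
# ('&', ('|', ('~', 'p'), 'q'), ('|', ('~', 'p'), 'r'))
```

The first-order steps (standardizing variables, moving quantifiers, Skolemization) would sit between `push_neg` and `distribute` in a full converter.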
Convert the statements to clause form:
1. man(Marcus)
2. pompeian(Marcus)
3. ∀x : pompeian(x) → roman(x)
4. ruler(Caesar)
5. ∀x : roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
6. ∀x, ∃y : loyalto(x, y)
7. ∀x, ∀y : person(x) ∧ ruler(y) ∧ tryassassinate(x, y) → ¬loyalto(x, y)
8. tryassassinate(Marcus, Caesar)

The resultant axioms in clause form:
1. man(Marcus)
2. pompeian(Marcus)
3. ¬pompeian(x1) ∨ roman(x1)
4. ruler(Caesar)
5. ¬roman(x2) ∨ loyalto(x2, Caesar) ∨ hate(x2, Caesar)
6. loyalto(x3, f1(x3))
7. ¬man(x4) ∨ ¬ruler(y1) ∨ ¬tryassassinate(x4, y1) ∨ ¬loyalto(x4, y1)
8. tryassassinate(Marcus, Caesar)

Basis of Resolution: the resolution process is applied to a pair of parent clauses to produce a derived clause. The resolution procedure operates by taking two clauses that each contain the same literal. The literal must occur in positive form in one clause and in negative form in the other. The resolvent is obtained by combining all of the literals of the two parent clauses except the ones that cancel. If the clause that is produced is the empty clause, then a contradiction has been found.

E.g.: winter and ¬winter will produce the empty clause. If a contradiction exists, then eventually it will be found. Of course, if no contradiction exists, it is possible that the procedure will never terminate, although as we will see, there are often ways of detecting that no contradiction exists.

Algorithm: Resolution in Propositional Logic
1. Convert all the propositions of F to clause form.
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained in step 1.
3. Repeat until either a contradiction is found or no progress can be made:
   (a) Select two clauses. Call these the parent clauses.
   (b) Resolve them together.
The resulting clause, called the resolvent, will be the disjunction of all of the literals of both of the parent clauses, with the following exception: if there are any pairs of literals L and ¬L such that one of the parent clauses contains L and the other contains ¬L, then select one such pair and eliminate both L and ¬L from the resolvent.
   (c) If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.

Example: consider the following axioms:
P
(P ∧ Q) → R
(S ∨ T) → Q
T

Convert them into clause form and prove that R is true.
1. P
2. (P ∧ Q) → R  ⇒  ¬(P ∧ Q) ∨ R  ⇒  ¬P ∨ ¬Q ∨ R
3. (S ∨ T) → Q  ⇒  ¬(S ∨ T) ∨ Q  ⇒  (¬S ∧ ¬T) ∨ Q  ⇒  (¬S ∨ Q) ∧ (¬T ∨ Q)
   giving the two clauses ¬S ∨ Q and ¬T ∨ Q
4. T

Resolution proof, starting from the negated goal ¬R:
¬R with ¬P ∨ ¬Q ∨ R gives ¬P ∨ ¬Q
¬P ∨ ¬Q with P gives ¬Q
¬Q with ¬T ∨ Q gives ¬T
¬T with T gives the empty clause.
¬R leads to a contradiction; hence R is true.

Unification Algorithm
• In propositional logic it is easy to determine that two literals cannot both be true at the same time: simply look for L and ¬L.
• In predicate logic, this matching process is more complicated, since bindings of variables must be considered.
• In order to determine contradictions we need a matching procedure that compares two literals and discovers whether there exists a set of substitutions that makes them identical.
• There is a recursive procedure that does this matching. It is called the unification algorithm.
• The process of finding a substitution for predicate parameters is called unification.

• We need to know:
  – whether two literals can be matched;
  – the substitution that makes the literals identical.
• There is a simple algorithm called the unification algorithm that does this.

The Unification Algorithm:
1. The initial predicate symbols must match.
2. For each pair of predicate arguments:
   – Different constants cannot match.
   – A variable may be replaced by a constant.
   – A variable may be replaced by another variable.
   – A variable may be replaced by a function, as long as the function does not contain an instance of the variable.
• When attempting to match two literals, all substitutions must be made to the entire literal.
• There may be many substitutions that unify two literals; the most general unifier is always desired.

Unification examples:
P(x) and P(y) : substitution = {y/x}  (substitute y for x)
P(x, x) and P(y, z) : {y/x}{z/y}  (y for x, then z for y)
P(f(x)) and P(x) : can't do it!
P(x) ∨ Q(Jane) and P(Bill) ∨ Q(y) : {Bill/x, Jane/y}

Father(Bill, Chelsea) and ¬Father(Bill, x) ∨ Mother(Hillary, x)
Man(Marcus) and ¬Man(x) ∨ Mortal(x)
Loves(father(a), a) and ¬Loves(x, y) ∨ Loves(y, x)

The object of the unification procedure is to discover at least one substitution that causes two literals to match. Usually, if there is one such substitution there are many.

hate(x, y) and hate(Marcus, z) could be unified with any of the following substitutions:
{Marcus/x, z/y}
{Marcus/x, y/z}
{Marcus/x, Caesar/y, Caesar/z}
{Marcus/x, Polonius/y, Polonius/z}

In the unification algorithm, each literal is represented as a list, where the first element is the name of the predicate and the remaining elements are its arguments. An argument may be a single element (atom) or may be another list.

The unification algorithm recursively matches pairs of elements, one pair at a time. The matching rules are:

• Different constants, functions or predicates cannot match, whereas identical ones can.
• A variable can match another variable, any constant, or a function or predicate expression, subject to the condition that the function or predicate expression must not contain any instance of the variable being matched (otherwise it would lead to infinite recursion).
• The substitution must be consistent. Substituting y for x now and then z for x later is inconsistent. (A substitution of y for x is written as y/x.)

Algorithm: Unify(L1, L2)
1. If L1 or L2 are both variables or constants, then:
   (a) If L1 and L2 are identical, then return NIL.
   (b) Else if L1 is a variable, then if L1 occurs in L2 return {FAIL}, else return (L2/L1).
   (c) Else if L2 is a variable, then if L2 occurs in L1 return {FAIL}, else return (L1/L2).
   (d) Else return {FAIL}.
2. If the initial predicate symbols in L1 and L2 are not identical, then return {FAIL}.
3. If L1 and L2 have a different number of arguments, then return {FAIL}.
4. Set SUBST to NIL. (At the end of this procedure, SUBST will contain all the substitutions used to unify L1 and L2.)
5. For i ← 1 to the number of arguments in L1:
   (a) Call Unify with the ith argument of L1 and the ith argument of L2, putting the result in S.
   (b) If S contains FAIL, then return {FAIL}.
   (c) If S is not equal to NIL, then:
       (i) Apply S to the remainder of both L1 and L2.
       (ii) SUBST := APPEND(S, SUBST).
6. Return SUBST.

Example: suppose we want to unify p(X, Y, Y) with p(a, Z, b). Initially E is {p(X, Y, Y) = p(a, Z, b)}. The first time through the loop, E becomes {X = a, Y = Z, Y = b}. Suppose X = a is selected next; then S becomes {X/a} and E becomes {Y = Z, Y = b}. Suppose Y = Z is selected; then Y is replaced by Z in S and E: S becomes {X/a, Y/Z} and E becomes {Z = b}. Finally Z = b is selected; Z is replaced by b, S becomes {X/a, Y/b, Z/b}, and E becomes empty. The substitution {X/a, Y/b, Z/b} is returned as an MGU.

Unification:
∀x : knows(John, x) → hates(John, x)
knows(John, Jane)
∀y : knows(y, Leonid)
∀y : knows(y, mother(y))
∀x : knows(x, Elizabeth)
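The Unify procedure above can be sketched in a few lines; applied to the knows sentences just listed, it produces the unifiers shown next. Conventions assumed here (not from the text): lowercase strings are variables, capitalized strings are constants, and compound terms are tuples like `("knows", "John", "x")`.

```python
# A sketch of unification with an occurs check (encoding is illustrative).
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def occurs(v, t):
    # Does variable v appear anywhere inside term t?
    return v == t or (isinstance(t, tuple) and any(occurs(v, a) for a in t[1:]))

def substitute(t, s):
    if is_var(t):
        return substitute(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, s) for a in t[1:])
    return t

def unify(t1, t2, s=None):
    """Return a substitution dict unifying t1 and t2, or None for FAIL."""
    s = {} if s is None else s
    t1, t2 = substitute(t1, s), substitute(t2, s)   # keep s consistent
    if t1 == t2:
        return s
    if is_var(t1):
        return None if occurs(t1, t2) else {**s, t1: t2}
    if is_var(t2):
        return None if occurs(t2, t1) else {**s, t2: t1}
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None   # different constants/predicates: FAIL

print(unify(("knows", "John", "x"), ("knows", "John", "Jane")))  # {'x': 'Jane'}
```

The occurs check in `occurs` is what blocks matching P(x) against P(f(x)).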
UNIFY(knows(John, x), knows(John, Jane)) = {Jane/x}
UNIFY(knows(John, x), knows(y, Leonid)) = {Leonid/x, John/y}
UNIFY(knows(John, x), knows(y, mother(y))) = {John/y, mother(John)/x}
UNIFY(knows(John, x), knows(x, Elizabeth)) = FAIL

Resolution in Predicate Logic
• Two literals are contradictory if one can be unified with the negation of the other. For example, man(x) and ¬man(Himalayas) are contradictory, since man(x) and man(Himalayas) can be unified.
• In predicate logic, the unification algorithm is used to locate pairs of literals that cancel out.
• It is important that if two instances of the same variable occur, they must be given identical substitutions.

Algorithm: Resolution
1. Convert all the statements of F to clause form.
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained in step 1.
3. Repeat until either a contradiction is found, no progress can be made, or a predetermined amount of effort has been expended:
   (a) Select two clauses. Call these the parent clauses.
   (b) Resolve them together. The resolvent will be the disjunction of all the literals of both parent clauses, with appropriate substitutions performed and with the following exception: if there is one pair of literals T1 and ¬T2 such that one of the parent clauses contains T1 and the other contains ¬T2, and if T1 and T2 are unifiable, then neither T1 nor ¬T2 should appear in the resolvent. We call T1 and ¬T2 complementary literals. Use the substitution produced by the unification to create the resolvent. If there is more than one pair of complementary literals, only one pair should be omitted from the resolvent.
   (c) If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it to the set of clauses available to the procedure.

Prove that Marcus hates Caesar using resolution.

Prove: hate(Marcus, Caesar). Negate it: ¬hate(Marcus, Caesar).

¬hate(Marcus, Caesar) with clause 5 {Marcus/x2} gives
   ¬roman(Marcus) ∨ loyalto(Marcus, Caesar)
with clause 3 {Marcus/x1} gives
   ¬pompeian(Marcus) ∨ loyalto(Marcus, Caesar)
with clause 2 gives
   loyalto(Marcus, Caesar)
with clause 7 {Marcus/x4, Caesar/y1} gives
   ¬man(Marcus) ∨ ¬ruler(Caesar) ∨ ¬tryassassinate(Marcus, Caesar)
with clauses 1, 4 and 8 this reduces to the empty clause, so hate(Marcus, Caesar) is proved.

Example:
John likes all kinds of food.
Apples are food.
Chicken is food.
Anything anyone eats and isn't killed by is food.
Bill eats peanuts and is still alive.
Sue eats everything Bill eats.

(a) Convert all the above statements into predicate logic.
(b) Show that John likes peanuts using backward chaining.
(c) Convert the statements into clause form.
(d) Using resolution, show that "John likes peanuts."

(a) Predicate logic:
1. ∀x : food(x) → likes(John, x)
2. food(Apples)
3. food(Chicken)
4. ∀x, ∀y : eats(x, y) ∧ ¬killed(x) → food(y)
5. eats(Bill, Peanuts) ∧ alive(Bill)
6. ∀x : eats(Bill, x) → eats(Sue, x)

(b) Backward chaining proof:
likes(John, Peanuts)
        ↑ (1)
food(Peanuts)
        ↑ (4)
eats(Bill, Peanuts) ∧ ¬killed(Bill)
        ↑ (5)
nil

(c) Clause form:
1. ¬food(x) ∨ likes(John, x)
2. food(Apples)
3. food(Chicken)
4. ¬eats(x, y) ∨ killed(x) ∨ food(y)
5. eats(Bill, Peanuts)
6. alive(Bill)  (equivalently, ¬killed(Bill))
7. ¬eats(Bill, x) ∨ eats(Sue, x)

(d) Resolution proof, starting from the negated goal ¬likes(John, Peanuts):
¬likes(John, Peanuts) with 1 {Peanuts/x} gives ¬food(Peanuts)
¬food(Peanuts) with 4 {Peanuts/y} gives ¬eats(x, Peanuts) ∨ killed(x)
with 5 {Bill/x} gives killed(Bill)
with ¬killed(Bill) (from 6, since alive means not killed) gives the empty clause.
Hence "John likes peanuts" is proved.

Answering Questions
We can also use the proof procedure to answer questions such as "Who tried to assassinate Caesar?" by proving tryassassinate(y, Caesar). Once the proof is complete, we need to find out what substitution was made for y. Resolution can thus be used to answer fill-in-the-blank questions, such as "When did Marcus die?" or "Who tried to assassinate a ruler?" Answering these questions involves finding a known statement that matches the terms given in the question and then responding with another piece of the same statement that fills the slot demanded by the question.

From Clause Form to Horn Clauses
This operation converts clause form to Horn clauses; it is not always possible. Horn clauses are clauses in normal form that have one or zero positive literals. The conversion from a clause in normal form with one or zero positive literals to a Horn clause is done by using the implication property.
¬P ∨ Q rewrites to P → Q.

Example:
Predicate:            ∀x (¬literate(x) → (¬writes(x) ∧ ¬∃y (reads(x, y) ∧ book(y))))
Simplify:             ∀x (literate(x) ∨ (¬writes(x) ∧ ¬∃y (reads(x, y) ∧ book(y))))
Move negations in:    ∀x (literate(x) ∨ (¬writes(x) ∧ ∀y (¬reads(x, y) ∨ ¬book(y))))
Skolemize:            (there are no existential quantifiers left)
Remove universal quantifiers:
                      literate(x) ∨ (¬writes(x) ∧ (¬reads(x, y) ∨ ¬book(y)))
Distribute disjunctions:
                      (literate(x) ∨ ¬writes(x)) ∧ (literate(x) ∨ ¬reads(x, y) ∨ ¬book(y))
Convert to clause normal form:
                      ¬writes(x) ∨ literate(x)
                      ¬reads(x, y) ∨ ¬book(y) ∨ literate(x)
Convert to Horn clauses:
                      writes(x) → literate(x)
                      reads(x, y) ∧ book(y) → literate(x)

A clause with more than one positive literal cannot be converted:
Predicate:            ∀x (literate(x) → reads(x) ∨ writes(x))
Simplify:             ∀x (¬literate(x) ∨ reads(x) ∨ writes(x))
The negations are already in; the resulting clause ¬literate(x) ∨ reads(x) ∨ writes(x) has two positive literals, so it is not a Horn clause.

4. Knowledge Representation Issues

Introduction:
Knowledge plays an important role in AI systems. The kinds of knowledge that might need to be represented in AI systems:
> Objects: facts about objects in our world domain, e.g. guitars have strings, trumpets are brass instruments.
> Events: actions that occur in our world, e.g. Steve Vai played the guitar in Frank Zappa's band.
> Performance: a behavior like playing the guitar involves knowledge about how to do things.
> Meta-knowledge: knowledge about what we know, e.g. Bobrow's robot who plans a trip. It knows that it can read street signs along the way to find out where it is.

Representations & Mappings:
In order to solve complex problems in AI we need:
• a large amount of knowledge;
• some mechanisms for manipulating that knowledge to create solutions to new problems.
A variety of ways of representing knowledge have been exploited in AI problems.
In this regard we deal with two different kinds of entities:
• Facts: truths about the real world; these are the things we want to represent.
• Representations of the facts in some chosen formalism; these are the things we will actually be able to manipulate.

One way to think of structuring these entities is as two levels:
• the Knowledge Level, at which facts are described;
• the Symbol Level, at which representations of objects at the knowledge level are defined in terms of symbols that can be manipulated by programs.

Mappings between Facts and Representations:
[Figure: reasoning programs operate on internal representations; facts map to representations (forward mapping) and representations map back to facts (backward mapping), with English understanding and English generation linking facts to natural-language sentences.]

The model in the figure above focuses on facts, on representations, and on the two-way mappings that must exist between them. These links are called representation mappings.
– Forward representation mappings map from facts to representations.
– Backward representation mappings map from representations to facts.

English or natural language is an obvious way of representing and handling facts. Regardless of the representation for facts we use in a program, we need to be concerned with the English representation of those facts in order to facilitate getting information into or out of the system.

Mapping functions from English sentences to representations: consider mathematical logic as the representational formalism.

Example: "Spot is a dog." The fact represented by that English sentence can also be represented in logic as:
dog(Spot)
Suppose that we also have a logical representation of the fact that "all dogs have tails":
∀x : dog(x) → hastail(x)
Then, using the deductive mechanisms of logic, we may generate the new representation object:
hastail(Spot)
Using an appropriate backward mapping function, the English sentence "Spot has a tail" can be generated.

Fact-representation mappings may not be one-to-one but rather many-to-many, which is a characteristic of English representations.
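The dog(Spot) deduction above can be sketched as one forward-chaining step over an internal representation. The tuple encoding is an assumption for illustration only.

```python
# Internal representation of the initial fact "Spot is a dog".
facts = {("dog", "Spot")}

# One application of the rule: for all x, dog(x) -> hastail(x).
facts |= {("hastail", x) for (pred, x) in facts if pred == "dog"}

# The new representation object; a backward mapping would render it
# as the English sentence "Spot has a tail".
print(("hastail", "Spot") in facts)  # True
```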
Good representation can make a reasoning program simple.

Example:
1. "All dogs have tails."
2. "Every dog has a tail."
From the two statements we can conclude that "each dog has a tail." From statement 1 alone, we might conclude that "each dog has more than one tail."

When we try to convert an English sentence into some other representation, such as logical propositions, we first decode what facts the sentence represents and then convert those facts into the new representation. When an AI program manipulates the internal representation of facts, these new representations should also be interpretable as new representations of facts.

Mutilated Checkerboard Problem:
Problem: in a normal chessboard the opposite corner squares have been eliminated. The given task is to cover all the squares on the remaining board with dominoes so that each domino covers two squares. No overlapping of dominoes is allowed. Can it be done?

Consider three data structures (three representations of a mutilated checkerboard):
(a) the board as a plain grid of squares;
(b) the board with the squares distinguished by color;
(c) the counts alone: number of black squares = 30, number of white squares = 32.

The first representation does not directly suggest the answer to the problem. The second may suggest it. The third does, when combined with the single additional fact that each domino must cover exactly one white square and one black square.

The puzzle is impossible to complete. A domino placed on the chessboard will always cover one white square and one black square. Therefore a collection of dominoes placed on the board will cover an equal number of squares of each color. If the two white corners are removed from the board, then 30 white squares and 32 black squares remain to be covered by dominoes, so this is impossible. If the two black corners are removed instead, then 32 white squares and 30 black squares remain, so it is again impossible.
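The counting argument is easy to verify mechanically: color each square by the parity of its coordinates, remove two opposite corners (which always share a color), and compare the remaining counts.

```python
# Color squares by (row + col) % 2; opposite corners always get color 0.
def color_counts(removed):
    counts = {0: 0, 1: 0}        # 0 = one color, 1 = the other
    for r in range(8):
        for c in range(8):
            if (r, c) not in removed:
                counts[(r + c) % 2] += 1
    return counts

print(color_counts({(0, 0), (7, 7)}))  # {0: 30, 1: 32}
```

Since each domino covers one square of each color, 30 ≠ 32 confirms that no covering exists.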
For a covering to be possible, the number of squares of each color must be equal.

[Figure: the initial facts map, via the forward representation mapping, to an internal representation of the initial facts; the program operates on that representation; the backward representation mapping takes the program's output back to the final facts.]

In the figure above, the dotted line across the top represents the abstract reasoning process that a program is intended to model. The solid line across the bottom represents the concrete reasoning process that a particular program performs. The program successfully models the abstract process to the extent that, when the backward representation mapping is applied to the program's output, the appropriate final facts are actually generated.

If no good mapping can be defined for a problem, then no matter how good the program to solve the problem is, it will not be able to produce answers that correspond to real answers to the problem.

Using Knowledge
Let us consider to what applications knowledge may be put, and how.

Learning: acquiring knowledge. This is more than simply adding new facts to a knowledge base. New data may have to be classified prior to storage for easy retrieval, etc. Interaction and inference with existing facts avoids redundancy and replication in the knowledge, and also allows facts to be updated.

Retrieval: the representation scheme used can have a critical effect on the efficiency of the method. Humans are very good at it. Many AI methods have tried to model human retrieval.

Reasoning: inferring facts from existing data. If a system only knows:
• Miles Davis is a jazz musician.
• All jazz musicians can play their instruments well.
If questions like "Is Miles Davis a jazz musician?" or "Can jazz musicians play their instruments well?" are asked, then the answer is readily obtained from the data structures and procedures. However, a question like "Can Miles Davis play his instrument well?" requires reasoning.

The above are all related. For example, it is fairly obvious that learning and reasoning involve retrieval, etc.
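The Miles Davis question above illustrates the smallest possible reasoning step: the answer is not stored anywhere, but follows from one fact plus one rule. A sketch (the predicate names are illustrative):

```python
# Stored fact: Miles Davis is a jazz musician.
facts = {("jazz_musician", "Miles Davis")}

def plays_well(person):
    # Rule: all jazz musicians can play their instruments well.
    return ("jazz_musician", person) in facts

print(plays_well("Miles Davis"))  # True, by inference, not by lookup
```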
Approaches to Knowledge Representation
A good knowledge representation enables fast and accurate access to knowledge and understanding of the content. The goal of knowledge representation (KR) is to facilitate drawing conclusions from knowledge.

The following properties should be possessed by a knowledge representation system:
• Representational Adequacy: the ability to represent all kinds of knowledge that are needed in that domain.
• Inferential Adequacy: the ability to manipulate the knowledge represented to produce new knowledge corresponding to that inferred from the original.
• Inferential Efficiency: the ability to incorporate into the knowledge structure additional information that can be used to focus the attention of the inference mechanisms in the most promising directions.
• Acquisitional Efficiency: the ability to acquire new information easily. The simplest case involves direct insertion, by a person, of new knowledge into the database. Ideally, the program itself would be able to control knowledge acquisition.

No single system that optimizes all of these capabilities for all kinds of knowledge has yet been found. As a result, multiple techniques for knowledge representation exist.

Knowledge Representation Schemes
There are four types of knowledge representation:
> Relational Knowledge:
  – provides a framework to compare two objects based on equivalent attributes;
  – any instance in which two different objects are compared is a relational type of knowledge.
Inferential Knowledge:
- is inferred from objects through relations among objects
- Example: a word alone is simple syntax, but with the help of the other words in a phrase the reader may infer more from the word; this inference within linguistics is called semantics.

Declarative Knowledge:
- a statement in which knowledge is specified, but the use to which that knowledge is to be put is not given.
- Example: laws, people's names; these are facts which can stand alone, not dependent on other knowledge.

Procedural Knowledge:
- a representation in which the control information needed to use the knowledge is embedded in the knowledge itself.
- Example: computer programs, directions and recipes; these indicate specific use or implementation.

Simple Relational Knowledge

The simplest way of storing facts is to use a relational method where each fact about a set of objects is set out systematically in columns. This representation gives little opportunity for inference, but it can be used as the knowledge basis for inference engines.

- Simple way to store facts.
- Each fact about a set of objects is set out systematically in columns.
- Little opportunity for inference.
- Knowledge basis for inference engines.

Table - Simple Relational Knowledge

Player     Height   Weight   Bats-Throws
Aaron      6-0      180      Right-Right
Mays       5-10     170      Right-Right
Ruth       6-2      215      Left-Left
Williams   6-3      205      Left-Right

Given these facts alone it is not possible to answer a simple question such as "Who is the heaviest player?", but if a procedure for finding the heaviest player is provided, then these facts will enable that procedure to compute an answer. We can ask things like who "bats left" and "throws right".

Inheritable Knowledge

Here the knowledge elements inherit attributes from their parents. The knowledge is embodied in the design hierarchies found in the functional, physical and process domains.
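The point about relational knowledge can be made concrete. Below is a minimal sketch (not from the notes) of how an attached procedure can answer questions that the bare table cannot, using the baseball facts above stored as rows:

```python
# Simple relational knowledge: each fact about a set of objects in columns.
players = [
    {"player": "Aaron",    "height": "6-0",  "weight": 180, "bats": "Right", "throws": "Right"},
    {"player": "Mays",     "height": "5-10", "weight": 170, "bats": "Right", "throws": "Right"},
    {"player": "Ruth",     "height": "6-2",  "weight": 215, "bats": "Left",  "throws": "Left"},
    {"player": "Williams", "height": "6-3",  "weight": 205, "bats": "Left",  "throws": "Right"},
]

def heaviest(rows):
    # The table alone cannot answer "who is the heaviest player?"; this procedure can.
    return max(rows, key=lambda r: r["weight"])["player"]

def select(rows, **conditions):
    # e.g. select(players, bats="Left", throws="Right")
    return [r["player"] for r in rows
            if all(r[k] == v for k, v in conditions.items())]

print(heaviest(players))                             # Ruth
print(select(players, bats="Left", throws="Right"))  # ['Williams']
```

The function names `heaviest` and `select` are illustrative only; any procedure supplied alongside the facts plays the same role.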
Within the hierarchy, elements inherit attributes from their parents, but in many cases not all attributes of the parent elements need be prescribed to the child elements.

Inheritance is a powerful form of inference, but it is not adequate on its own; the basic KR needs to be augmented with an inference mechanism.

KR in a hierarchical structure, shown below, is called a "semantic network" or a collection of "frames" or a "slot-and-filler" structure. The structure shows property inheritance and a way for insertion of additional knowledge.

Property inheritance: objects or elements of specific classes inherit attributes and values from more general classes. The classes are organized in a generalized hierarchy.

[Figure: Inheritable Knowledge - a baseball semantic network. "isa" links show class inclusion (e.g. Baseball-Player isa Adult-Male); "instance" links show class membership; "team" links connect Three-Finger-Brown to the Chicago-Cubs and Pee-Wee-Reese to the Brooklyn-Dodgers.]

- The directed arrows represent attributes (isa, instance, team); each originates at the object being described and terminates at the object or its value.
- The box nodes represent objects and values of the attributes.

Viewing a node as a frame

Example: Baseball-Player
    isa:             Adult-Male
    bats:            EQUAL handed
    height:          6-1
    batting-average: 0.252

Algorithm: Property Inheritance

To retrieve a value V for attribute A of an instance object O:
1. Find O in the knowledge base.
2. If there is a value there for the attribute A, report that value.
3. Otherwise, see if there is a value for the attribute instance. If not, then fail.
4. Otherwise, move to the node corresponding to that value and look for a value for the attribute A. If one is found, report it.
5. Otherwise, do until there is no value for the isa attribute or until an answer is found:
   (a) Get the value of the isa attribute and move to that node.
   (b) See if there is a value for the attribute A. If there is, report it.

This algorithm is simple. It describes the basic mechanism of inheritance.
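The five steps above can be sketched directly in code. This is a minimal illustration (the knowledge base below is an assumed fragment, including a hypothetical Fielder class, not the full network from the figure):

```python
# Property inheritance: look on the object itself, then on its "instance"
# class, then up the "isa" chain until a value is found or the chain ends.
kb = {
    "Adult-Male":      {"height": "6-1"},
    "Baseball-Player": {"isa": "Adult-Male", "batting-average": 0.252},
    "Fielder":         {"isa": "Baseball-Player", "batting-average": 0.262},
    "Pee-Wee-Reese":   {"instance": "Fielder", "team": "Brooklyn-Dodgers"},
}

def get_value(obj, attr):
    node = kb[obj]
    if attr in node:                 # step 2: value stored directly on O
        return node[attr]
    if "instance" not in node:       # step 3: no class to inherit from
        return None
    node = kb[node["instance"]]      # step 4: move to the class node
    while True:
        if attr in node:
            return node[attr]        # found an inherited value
        if "isa" not in node:        # step 5: end of the isa chain
            return None
        node = kb[node["isa"]]       # climb one level

print(get_value("Pee-Wee-Reese", "team"))             # Brooklyn-Dodgers
print(get_value("Pee-Wee-Reese", "batting-average"))  # 0.262 (from Fielder)
print(get_value("Pee-Wee-Reese", "height"))           # 6-1 (from Adult-Male)
```

Note how the most specific value wins: Fielder's batting average shadows the one stored on Baseball-Player.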
It does not say what to do if there is more than one value for the instance or "isa" attribute.

The algorithm can be applied to the example knowledge base to derive answers to the following queries:
- team(Pee-Wee-Reese) = Brooklyn-Dodgers
- batting-average(Three-Finger-Brown) = 0.106
- height(Pee-Wee-Reese) = 6-1
- bats(Three-Finger-Brown) = Right

Inferential Knowledge

This knowledge generates new information from the given information. The new information does not require further data gathering from the source, but does require analysis of the given information to generate new knowledge. Here we represent knowledge as formal logic.

Example:
- given a set of relations and values, one may infer other values or relations;
- predicate logic (a mathematical deduction) is used to infer from a set of attributes;
- inference through predicate logic uses a set of logical operations to relate individual data;
- the symbols used for the logic operations are: "→" (implication), "¬" (not), "∨" (or), "∧" (and), "∀" (for all), "∃" (there exists).

Examples of predicate logic statements:
1. "Wonder" is the name of a dog: dog(Wonder)
2. All dogs belong to the class of animals: ∀x: dog(x) → animal(x)
3. All animals either live on land or in water: ∀x: animal(x) → live(x, land) ∨ live(x, water)

From these three statements we can infer that "Wonder lives either on land or in water."

Note: If more information is made available about these objects and their relations, then more knowledge can be inferred.

Procedural Knowledge

Procedural knowledge can be represented in programs in many ways. The most common way is simply as code for doing something. The machine uses the knowledge when it executes the code to perform a task. Procedural knowledge is knowledge encoded in some procedures.
Unfortunately, this way of representing procedural knowledge gets low scores with respect to the properties of inferential adequacy (because it is very difficult to write a program that can reason about another program's behavior) and acquisitional efficiency (because the process of updating and debugging large pieces of code becomes unwieldy).

The most commonly used technique for representing procedural knowledge in AI programs is the use of production rules, for example:

If:   ninth inning, and
      score is close, and
      less than 2 outs, and
      first base is vacant, and
      batter is better hitter than next batter,
Then: walk the batter.

Procedural Knowledge as Rules

Production rules, particularly ones that are augmented with information on how they are to be used, are more procedural than the other representation methods. But making a clean distinction between declarative and procedural knowledge is difficult. The important difference is in how the knowledge is used by the procedures that manipulate it. Heuristic or domain-specific knowledge can be represented using procedural knowledge.

Issues in Knowledge Representation

Below are issues that should be raised when using knowledge representation techniques:

- Important Attributes: Are any attributes of objects so basic that they occur in almost every problem domain?
- Relationships among attributes: Are there any important relationships that exist among object attributes?
- Choosing Granularity: At what level of detail should the knowledge be represented?
- Set of objects: How should sets of objects be represented?
- Finding the Right Structure: Given a large amount of knowledge stored, how can relevant parts be accessed?

Important Attributes (Ref. Example - Fig. Inheritable KR)

There are two attributes, "instance" and "isa", that are of general significance. These attributes are important because they support property inheritance.
The attributes are called a variety of things in AI systems, but the names do not matter. What does matter is that they represent class membership and class inclusion, and that class inclusion is transitive. These predicates are used in logic-based systems.

Relationships among Attributes

- The attributes used to describe objects are themselves entities that we represent.
- The relationships between the attributes of an object, independent of the specific knowledge they encode, may hold properties like: inverses, existence in an isa hierarchy, techniques for reasoning about values, and single-valued attributes.

Inverses: This is about consistency checking while a value is added to one attribute. Entities are related to each other in many different ways. The figure shows attributes (isa, instance, and team), each with a directed arrow, originating at the object being described and terminating either at the object or at its value. There are two ways of realizing this:

- first, represent both directions of a relationship in a single representation; e.g., a logical representation, team(Pee-Wee-Reese, Brooklyn-Dodgers), that can be interpreted as a statement about either Pee-Wee-Reese or the Brooklyn-Dodgers;
- second, use attributes that focus on a single entity but use them in pairs, one the inverse of the other; e.g., attached to Pee-Wee-Reese, team = Brooklyn-Dodgers, and attached to Brooklyn-Dodgers, the inverse attribute with value Pee-Wee-Reese.

The second way can be realized using semantic nets and frame-based systems. Inverses are used in knowledge acquisition tools.

Existence in an "isa" hierarchy: This is about generalization-specialization: just as there are classes of objects and specialized subsets of those classes, there are attributes and specializations of attributes. Example: the attribute "height" is a specialization of the general attribute "physical-size", which is, in turn, a specialization of "physical-attribute". These generalization-specialization relationships for attributes are important because they support inheritance.
This also provides information about constraints on the values that the attribute can have and mechanisms for computing those values.

Techniques for reasoning about values: This is about reasoning about values of attributes not given explicitly. Several kinds of information are used in such reasoning, e.g.:
- height: must be in a unit of length;
- age: of a person cannot be greater than the age of the person's parents.

Values are often specified when a knowledge base is created. Several kinds of information can play a role in this reasoning, including:

- Information about the type of the value.
- Constraints on the value, often stated in terms of related entities.
- Rules for computing the value when it is needed (example: the rule for the bats attribute). These rules are called backward rules; such rules have also been called if-needed rules.
- Rules that describe actions that should be taken if a value ever becomes known. These rules are called forward rules, or sometimes if-added rules.

Single-valued attributes: This is about a specific attribute that is guaranteed to take a unique value. Example: a baseball player can at any time have only a single height and be a member of only one team. KR systems take different approaches to providing support for single-valued attributes:

- Introduce an explicit notation for a temporal interval. If two different values are ever asserted for the same temporal interval, signal a contradiction automatically.
- Assume that the only temporal interval of interest is "now". So if a new value is asserted, replace the old value.
- Provide no explicit support. Logic-based systems are in this category, but in these systems knowledge base builders can add axioms that state that if an attribute has one value then it is known not to have all other values.

Choosing Granularity

At what level should the knowledge be represented and what are the primitives?
- Should there be a small number of low-level primitives or a large number of high-level facts? High-level facts may not be adequate for inference, while low-level primitives may require a lot of storage.

Example of granularity:
- Suppose we are interested in the following fact: John spotted Sue.
- This could be represented as: spotted(agent(John), object(Sue))
- Such a representation would make it easy to answer questions such as: Who spotted Sue?
- Suppose we want to know: Did John see Sue?
- Given only the one fact, we cannot discover the answer.
- We can add other facts, such as: spotted(x, y) → saw(x, y)
- We can now infer the answer to the question.

Choosing the Granularity of Representation: Primitives are fundamental concepts such as holding, seeing, playing; and since English is a very rich language with over half a million words, we will find difficulty in deciding which words to choose as our primitives in a series of situations. Separate levels of understanding require different levels of primitives, and these need many rules to link them together.

Set of Objects

Certain properties of objects are true of them as members of a set but not as individuals. Example: consider the assertions made in the sentences "there are more sheep than people in Australia" and "English speakers can be found all over the world." The only way to describe these facts is to attach the assertions to the sets representing people, sheep, and English speakers.

The reason to represent sets of objects is: if a property is true for all or most elements of a set, then it is more efficient to associate it once with the set rather than explicitly with every element of the set. This is done in different ways: in logical representation, through the use of the universal quantifier; and in hierarchical structures, where nodes represent sets, the inheritance mechanism propagates set-level assertions down to individuals.
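The granularity example above can be sketched in a few lines. This is a minimal illustration (the `holds` helper and the tuple encoding are assumptions, not from the notes): the stored fact answers "Who spotted Sue?" directly, but "Did John see Sue?" is only answerable after the rule spotted(x, y) → saw(x, y) is added.

```python
# One stored fact plus one inference rule linking two levels of primitives.
facts = {("spotted", "John", "Sue")}
rules = [("spotted", "saw")]  # spotted(x, y) implies saw(x, y)

def holds(pred, x, y):
    if (pred, x, y) in facts:
        return True
    # try every rule whose conclusion is the predicate we want
    return any((body, x, y) in facts for body, head in rules if head == pred)

print(holds("spotted", "John", "Sue"))  # True: stated directly
print(holds("saw", "John", "Sue"))      # True: inferred via the rule
print(holds("saw", "Sue", "John"))      # False: nothing supports it
```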
Example: assert large(elephant). Remember to make a clear distinction between:
- asserting some property of the set itself, i.e., the set of elephants is large, and
- asserting some property that holds for individual elements of the set, i.e., anything that is an elephant is large.

There are three ways in which sets may be represented:
(a) By name, as in the example (Ref. Fig. Inheritable KR): the node Baseball-Player, or predicates such as Ball and Batter in a logical representation.
(b) Extensional definition: list the members.
(c) Intensional definition: provide a rule that returns true or false depending on whether the object is in the set or not.

Example:
- Intensional definition: {x : sun-planet(x) ∧ human-inhabited(x)}
- Extensional definition: the set of our sun's planets on which people live is {Earth}.

Finding the Right Structure

This concerns access to the right structure for describing a particular situation. It requires selecting an initial structure and then revising the choice. While doing so, it is necessary to solve the following problems:
- how to perform an initial selection of the most appropriate structure;
- how to fill in appropriate details from the current situation;
- how to find a better structure if the one chosen initially turns out not to be appropriate;
- what to do if none of the available structures is appropriate;
- when to create and remember a new structure.

There is no good, general-purpose method for solving all these problems; some knowledge representation techniques solve some of them.

6. Representing Knowledge using Rules

Procedural versus Declarative Knowledge

Declarative knowledge is factual information stored in memory and known to be static in nature.
Declarative Knowledge | Procedural Knowledge
Knowledge of facts or concepts | Knowledge of how to perform, or how to operate, a skill or action that you are capable of performing
Knowledge that something is true or false | Knowledge of how to do something to reach a particular objective or goal
Knowledge is specified, but the use to which that knowledge is to be put is not given | The control information necessary to use the knowledge is embedded in the knowledge itself
E.g.: concepts, facts, propositions, assertions, semantic nets | E.g.: procedures, rules, strategies, agendas, models
It is explicit knowledge (describing) | It is tacit knowledge (doing)

The declarative representation is one in which the knowledge is specified, but how that knowledge is to be used is not given.
- Declarative knowledge answers the question "What do you know?"
- It is your understanding of things, ideas, or concepts.
- In other words, declarative knowledge can be thought of as the who, what, when, and where of information.
- Declarative knowledge is normally discussed using nouns, like the names of people, places, or things, or dates that events occurred.

The procedural representation is one in which the control information necessary to use the knowledge is embedded in the knowledge itself.
- Procedural knowledge answers the question "What can you do?"
- While declarative knowledge is demonstrated using nouns, procedural knowledge relies on action words, or verbs.
- It is a person's ability to carry out actions to complete a task.

The real difference between declarative and procedural views of knowledge lies in where the control information resides.

Example:
1. man(Marcus)
2. man(Caesar)
3. ∀x: man(x) → person(x)
4. person(Cleopatra)

Statements 1, 2 and 3 are procedural knowledge and 4 is declarative knowledge.
Forward & Backward Reasoning

The object of a search procedure is to discover a path through a problem space from an initial configuration to a goal state. There are actually two directions in which such a search could proceed:

- Forward Reasoning:
  - from the start states
  - the LHS of a rule must match the current state
  - E.g.: A→B, B→C  ⇒  A→C
- Backward Reasoning:
  - from the goal states
  - the RHS of a rule must match the current goal
  - E.g.: the 8-Puzzle problem

In both cases, the control strategy must cause motion and be systematic. The production system model of the search process provides an easy way of viewing forward and backward reasoning as symmetric processes.

Consider the problem of solving a particular instance of the 8-puzzle. Assume the areas of the tray are numbered:

    1 2 3
    4 5 6
    7 8 9

The rules to be used for solving the puzzle can be written as:

    Square 1 empty and Square 2 contains tile n →
        Square 2 empty and Square 1 contains tile n
    Square 1 empty and Square 4 contains tile n →
        Square 4 empty and Square 1 contains tile n
    Square 2 empty and Square 1 contains tile n →
        Square 1 empty and Square 2 contains tile n

A Sample of the Rules for Solving the 8-Puzzle

Reasoning Forward from the Initial State:
- Begin building a tree of move sequences by putting the initial configuration at the root of the tree.
- Generate the next level of the tree by finding all the rules whose left sides match the root node and using their right sides to create the new configurations.
- Generate the next level by taking each node generated at the previous level and applying to it all of the rules whose left sides match it. Continue until a configuration that matches the goal state is generated.

Reasoning Backward from the Goal State:
- Begin building a tree of move sequences by putting the goal configuration at the root of the tree.
- Generate the next level of the tree by finding all the rules whose right sides match the root node.
  These are all the rules that, if only we could apply them, would generate the state we want. Use the left sides of the rules to generate the nodes at this second level of the tree.
- Generate the next level of the tree by taking each node at the previous level and finding all the rules whose right sides match it. Then use the corresponding left sides to generate the new nodes.
- Continue until a node that matches the initial state is generated.
- This method of reasoning backward from the desired final state is often called goal-directed reasoning.

To reason forward, the left sides (preconditions) are matched against the current state and the right sides (results) are used to generate new nodes until the goal is reached. To reason backward, the right sides are matched against the current node and the left sides are used to generate new nodes representing new goal states to be achieved.

The following four factors influence whether it is better to reason forward or backward:
1. Are there more possible start states or goal states? We would like to move from the smaller set of states to the larger (and thus easier to find) set of states.
2. In which direction is the branching factor (i.e., the average number of nodes that can be reached directly from a single node) greater? We would like to proceed in the direction with the lower branching factor.
3. Will the program be asked to justify its reasoning process to a user? If so, it is important to proceed in the direction that corresponds more closely with the way the user will think.
4. What kind of event is going to trigger a problem-solving episode? If it is the arrival of a new fact, forward reasoning makes sense. If it is a query to which a response is desired, backward reasoning is more natural.

Backward-Chaining Rule Systems

Backward-chaining rule systems are good for goal-directed problem solving. For example, a query system would probably use backward chaining to reason about and answer user questions.
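Goal-directed reasoning can be sketched in a few lines. Below is a minimal backward chainer (an illustrative sketch, not from the notes) over the alarm/fire/sprinkler rules used later in this section: to prove a goal, find a rule whose right-hand side is the goal and recursively prove its left-hand side, bottoming out in known facts.

```python
# Each rule is (left-hand-side preconditions, right-hand-side conclusion).
rules = [
    (["hot", "smoky"], "fire"),          # R1
    (["alarm_beeps"], "smoky"),          # R2
    (["fire"], "switch_on_sprinklers"),  # R3
]
facts = {"alarm_beeps", "hot"}           # given

def prove(goal):
    if goal in facts:
        return True
    # try every rule that could conclude the goal (RHS matches the goal)
    return any(all(prove(p) for p in lhs)
               for lhs, rhs in rules if rhs == goal)

# Goal: should I switch on the sprinklers?
print(prove("switch_on_sprinklers"))  # True
```

Note how the search starts at the goal and works back toward the given facts, the reverse of the forward-chaining cycle shown next.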
Unification tries to find a set of bindings for variables to equate a (sub)goal with the head of some rule. Examples: medical expert systems, diagnostic problems.

Forward-Chaining Rule Systems

- Instead of being directed by goals, we sometimes want to be directed by incoming data.
- For example, suppose you sense searing heat near your hand. You are likely to jerk your hand away.
- Rules that match dump their right-hand-side assertions into the state, and the process repeats.
- Matching is typically more complex for forward-chaining systems than for backward ones.
- Examples: synthesis systems - design/configuration.

Example of Typical Forward Chaining

Rules:
R1) If hot and smoky then ADD fire
R2) If alarm_beeps then ADD smoky
R3) If fire then ADD switch_on_sprinklers

Facts:
F1) alarm_beeps (given)
F2) hot (given)
F3) smoky (from F1 by R2)
F4) fire (from F2, F3 by R1)
F5) switch_on_sprinklers (from F4 by R3)

Example of Typical Backward Chaining

Goal: Should I switch on the sprinklers?

Combining Forward and Backward Reasoning

Sometimes certain aspects of a problem are best handled via forward chaining and other aspects by backward chaining. Consider a forward-chaining medical diagnosis program. It might accept twenty or so facts about a patient's condition, then forward chain on those facts to try to deduce the nature and/or cause of the disease. Now suppose that at some point the left side of a rule was nearly satisfied, say nine out of ten of its preconditions were met. It might be efficient to apply backward reasoning to satisfy the tenth precondition in a directed manner, rather than wait for forward chaining to supply the fact by accident.

Whether it is possible to use the same rules for both forward and backward reasoning also depends on the form of the rules themselves.
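The forward-chaining derivation F1-F5 above can be reproduced by a minimal sketch (not from the notes) of the match-fire cycle: repeatedly fire every rule whose left-hand side is satisfied by the current facts, adding its right-hand side, until nothing changes.

```python
# Each rule is (set of LHS preconditions, RHS assertion to ADD).
rules = [
    ({"hot", "smoky"}, "fire"),          # R1
    ({"alarm_beeps"}, "smoky"),          # R2
    ({"fire"}, "switch_on_sprinklers"),  # R3
]
facts = {"alarm_beeps", "hot"}           # F1, F2 (given)

changed = True
while changed:
    changed = False
    for lhs, rhs in rules:
        if lhs <= facts and rhs not in facts:  # LHS satisfied, RHS is new
            facts.add(rhs)                     # fire the rule
            changed = True

print(sorted(facts))
# ['alarm_beeps', 'fire', 'hot', 'smoky', 'switch_on_sprinklers']
```

The derivation order matches the notes: smoky from R2, then fire from R1, then switch_on_sprinklers from R3.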
If both left sides and right sides contain pure assertions, then forward chaining can match assertions on the left side of a rule and add to the state description the assertions on its right side. But if arbitrary procedures are allowed as the right sides of rules, then the rules will not be reversible.

Logic Programming

- Logic programming is a programming language paradigm in which logical assertions are viewed as programs.
- There are several logic programming systems in use today, the most popular of which is PROLOG. A PROLOG program is described as a series of logical assertions, each of which is a Horn clause.
- A Horn clause is a clause that has at most one positive literal. Thus p, ¬p ∨ q, and p → q are all Horn clauses. Programs written in pure PROLOG are composed only of Horn clauses.

Syntactic differences between the logic and the PROLOG representations include:
- In logic, variables are explicitly quantified. In PROLOG, quantification is provided implicitly by the way the variables are interpreted.
- The distinction between variables and constants is made in PROLOG by having all variables begin with uppercase letters and all constants begin with lowercase letters.
- In logic, there are explicit symbols for and (∧) and or (∨). In PROLOG, there is an explicit symbol for and (,), but there is none for or.
- In logic, implications of the form "p implies q" are written as p → q. In PROLOG, the same implication is written "backward", as q :- p.

Example:

A Representation in Logic:
∀x: pet(x) ∧ small(x) → apartmentpet(x)
∀x: cat(x) ∨ dog(x) → pet(x)
∀x: poodle(x) → dog(x) ∧ small(x)
poodle(fluffy)

A Representation in PROLOG:
apartmentpet(X) :- pet(X), small(X).
pet(X) :- cat(X).
pet(X) :- dog(X).
dog(X) :- poodle(X).
small(X) :- poodle(X).
poodle(fluffy).

A Declarative and a Procedural Representation
The first two of these differences arise naturally from the fact that PROLOG programs are actually sets of Horn clauses that have been transformed as follows:
1. If the Horn clause contains no negative literals (i.e., it contains a single literal, which is positive), then leave it as it is.
2. Otherwise, rewrite the Horn clause as an implication, combining all of the negative literals into the antecedent of the implication and leaving the single positive literal (if there is one) as the consequent.

This procedure causes a clause, which originally consisted of a disjunction of literals (all but one of which were negative), to be transformed into a single implication whose antecedent is a conjunction of (what are now positive) literals.

Example. Given the wff's:

∀x: ∀y: cat(x) ∧ fish(y) → likes-to-eat(x, y)
∀x: calico(x) → cat(x)
∀x: tuna(x) → fish(x)
tuna(Charlie)
tuna(Herb)
calico(Puss)

(a) Convert these wff's into Horn clauses.
(b) Convert the Horn clauses into a PROLOG program.
(c) Write a PROLOG query corresponding to the question, "What does Puss like to eat?" and show how it will be answered by your program.

(a) Horn clauses:
1. ¬cat(x) ∨ ¬fish(y) ∨ likes-to-eat(x, y)
2. ¬calico(x) ∨ cat(x)
3. ¬tuna(x) ∨ fish(x)
4. tuna(Charlie)
5. tuna(Herb)
6. calico(Puss)

(b) PROLOG program:
likestoeat(X, Y) :- cat(X), fish(Y).
cat(X) :- calico(X).
fish(X) :- tuna(X).
tuna(charlie).
tuna(herb).
calico(puss).

(c) Query:
?- likestoeat(puss, X).
Answer: X = charlie

Matching

We described the process of using search to solve problems as the application of appropriate rules to individual problem states to generate new states, to which the rules can then be applied, and so forth, until a solution is found. How do we extract, from the entire collection of rules, those that can be applied at a given point? Doing so requires some kind of matching between the current state and the preconditions of the rules. How should this be done? The answer to this question can be critical to the success of a rule-based system.
A more complex matching is required when the preconditions of a rule specify required properties that are not stated explicitly in the description of the current state. In this case, a separate set of rules must be used to describe how some properties can be inferred from others.

An even more complex matching process is required if rules should be applied when their preconditions only approximately match the current situation. This is often the case in situations involving physical descriptions of the world.

Indexing

One way to select applicable rules is to do a simple search through all the rules, comparing each one's preconditions to the current state and extracting all the ones that match. There are two problems with this simple solution:
1. A large number of rules will be necessary, and scanning through all of them at every step would be inefficient.
2. It is not always obvious whether a rule's preconditions are satisfied by a particular state.

Solution: instead of searching through the rules, use the current state as an index into the rules and select the matching ones immediately.

[Figure: One legal chess move and another way to describe chess moves - a full board position is used directly as an index into a rule such as "move pawn from one square to another".]

The matching process is easy, but at the price of a complete lack of generality in the statement of the rules. Despite some limitations of this approach, indexing in some form is very important in the efficient operation of rule-based systems.

Matching with Variables

The problem of selecting applicable rules is made more difficult when preconditions are not stated as exact descriptions of particular situations but rather describe properties that the situations must have. It often turns out that discovering whether there is a match between a particular situation and the preconditions of a given rule must itself involve a significant search process.
Backward-chaining systems usually use depth-first backtracking to select individual rules, but forward-chaining systems generally employ sophisticated conflict resolution strategies to choose among the applicable rules.

While it is possible to apply unification repeatedly over the cross product of preconditions and state description elements, it is more efficient to consider the many-many match problem, in which many rules are matched against many elements in the state description simultaneously. One efficient many-many match algorithm is RETE.

RETE Matching Algorithm

The matching system consists of three parts:
1. Rules & Productions
2. Working Memory
3. Inference Engine

The inference engine runs the production system cycle: match, select, execute.

[Figure: Inference engine - the user interface feeds the inference engine, which cycles between the knowledge base (rules) and working memory.]

This cycle is repeated until no rules are put in the conflict set or until a stopping condition is reached. Verifying many conditions on every cycle is time-consuming; to eliminate the need to perform thousands of matches per cycle, an effective matching algorithm called RETE is used.

The algorithm rests on two ideas:
1. Only working memory changes need to be examined.
2. Rules which share the same conditions are grouped, linking them to their common terms.

RETE is a many-many match algorithm (in which many rules are matched against many elements). RETE is used in forward-chaining systems, which generally employ sophisticated conflict resolution strategies to choose among applicable rules. RETE gains efficiency from three major sources:

1. Temporal nature of data: RETE maintains a network of rule conditions and uses changes in the state description to determine which new rules might apply. Full matching is only pursued for candidates that could be affected by incoming/outgoing data.
2. Structural similarity in rules: RETE stores the rules so that they share structures in memory;
   sets of conditions that appear in several rules are matched only once per cycle.
3. Persistence of variable binding consistency: while all the individual preconditions of a rule might be met, there may be variable binding conflicts that prevent the rule from firing. Example: given son(Mary, John) and son(Bill, Bob), the rule son(x, y) ∧ son(y, z) → grandparent(x, z) cannot fire, since y cannot be bound consistently. Such recomputation can be minimized: RETE remembers its previous calculations and is able to merge binding information efficiently.

Approximate Matching: rules should be applied if their preconditions approximately match the current situation. E.g., a speech understanding program: the rules map a description of a physical waveform to phones, and the physical signal varies because of differences in the way individuals speak and because of background noise.

Conflict Resolution

When several rules match at once, deciding which to fire is called conflict resolution. There are three approaches to the problem of conflict resolution in a production system:

1. Preference based on the rule matched:
   a. the physical order of the rules in which they are presented to the system;
   b. priority is given to rules in the order in which they appear.
2. Preference based on the objects matched:
   a. considers the importance of the objects that are matched;
   b. considers the position of the matchable objects in terms of Long Term Memory (LTM) and Short Term Memory (STM).
   LTM: stores the set of rules.
   STM (working memory): serves as a storage area for the facts deduced by the rules in LTM.
3. Preference based on the action:
   a. one way is to fire all the matched rules temporarily and examine the results of each; using a heuristic function that can evaluate each of the resulting states, compare the merits of the results and then select the preferred one.

Search Control Knowledge

- It is knowledge about which paths are most likely to lead quickly to a goal state.
- Search control knowledge requires meta-knowledge.
- It can take many forms: knowledge about
  - which states are more preferable to others;
• which rule to apply in a given situation
• the order in which to pursue subgoals
• useful sequences of rules to apply

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING - 18CS71
MODULE - 2, PART - 2: CONCEPT LEARNING
Department of ISE, CiTech, Bangalore

• Learning involves acquiring general concepts from specific training examples. Example: people continually learn general concepts or categories such as "bird," "car," "situations in which I should study more in order to pass the exam," etc.
• Each such concept can be viewed as describing some subset of objects or events defined over a larger set.
• Alternatively, each concept can be thought of as a Boolean-valued function defined over this larger set. (Example: a function defined over all animals, whose value is true for birds and false for other animals.)

1.2. Definition: A CONCEPT LEARNING TASK
"Inferring a Boolean-valued function from training examples of its input and output."

Consider the example task of learning the target concept "days on which John enjoys his favorite water sport."

Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

Table: Positive and negative training examples for the target concept EnjoySport.

TASK: to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.

What hypothesis representation is provided to the learner?
"A conjunction of constraints on the instance attributes."

Approach:
Let each hypothesis be a vector of six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
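The training set in the table above can be written down directly as Python data (a hypothetical encoding; the tuple order and variable name are my own choices):

```python
# EnjoySport training examples: each entry is an instance tuple in the
# attribute order (Sky, AirTemp, Humidity, Wind, Water, Forecast),
# paired with the EnjoySport label as a boolean.

training_examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),  True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),  True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
```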
For each attribute, the hypothesis will either
- indicate by a "?" that any value is acceptable for this attribute,
- specify a single required value (e.g., Warm) for the attribute, or
- indicate by a "Ø" that no value is acceptable.

> If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).
> The hypothesis that a person enjoys his favorite sport only on cold days with high humidity is represented by the expression
⟨?, Cold, High, ?, ?, ?⟩
> The most general hypothesis, that every day is a positive example, is represented by
⟨?, ?, ?, ?, ?, ?⟩
> The most specific possible hypothesis, that no day is a positive example, is represented by
⟨Ø, Ø, Ø, Ø, Ø, Ø⟩

X: The set of items over which the concept is defined is called the set of instances. For example:
• X is the set of all possible days, each represented by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
• The concept or function to be learned is called the target concept, denoted by c. c can be any Boolean-valued function defined over the instances X:
c : X → {0, 1}
• The target concept corresponds to the value of the attribute EnjoySport:
c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No
- Instances for which c(x) = 1 are called positive examples, or members of the target concept.
- Instances for which c(x) = 0 are called negative examples, or non-members of the target concept.
- The ordered pair ⟨x, c(x)⟩ describes the training example consisting of the instance x and its target concept value c(x).
- D: the set of available training examples.
- H: the set of all possible hypotheses that the learner may consider regarding the identity of the target concept.
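These constraints can be evaluated mechanically. A minimal sketch (the function and variable names are my own, not from the notes):

```python
# A hypothesis is a 6-tuple of constraints: '?' (any value), 'Ø' (no
# value), or a required attribute value. h(x) = 1 exactly when every
# constraint accepts the corresponding attribute of instance x.

def h_of_x(h, x):
    """Return 1 if instance x satisfies hypothesis h, else 0."""
    return int(all(c == "?" or c == v for c, v in zip(h, x)))

most_general  = ("?",) * 6          # classifies every day as positive
most_specific = ("Ø",) * 6          # classifies no day as positive
cold_high     = ("?", "Cold", "High", "?", "?", "?")

day = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
```

On the warm sunny `day` above, `most_general` yields 1 while `most_specific` and `cold_high` yield 0, since "Ø" accepts nothing and the day is neither cold nor highly humid.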
Each hypothesis in H represents a Boolean-valued function defined over X:
h : X → {0, 1}

GOAL of the learner: to find a hypothesis h such that h(x) = c(x) for all x in X.

Given:
Instances X: possible days, each described by the attributes
• Sky (with possible values Sunny, Cloudy, and Rainy),
• AirTemp (with values Warm and Cold),
• Humidity (with values Normal and High),
• Wind (with values Strong and Weak),
• Water (with values Warm and Cool),
• Forecast (with values Same and Change).
Hypotheses H:
> Each hypothesis is described by a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
> The constraints may be "?" (any value is acceptable), "Ø" (no value is acceptable), or a specific value.
Target concept c: EnjoySport : X → {0, 1}
Training examples D: positive and negative examples of the target function.
To DETERMINE: a hypothesis h in H such that h(x) = c(x) for all x in X.

Table: The EnjoySport concept learning task.

The inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

1.3. CONCEPT LEARNING AS SEARCH
- Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
- The goal of this search is to find the hypothesis that best fits the training examples.
For example, consider the instances X and hypotheses H in the EnjoySport learning task:
• The attribute Sky has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two possible values, so the instance space X contains exactly 3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances.
• Counting "?" and "Ø" as two additional options per attribute, there are 5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses within H.
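The two counts can be checked with a throwaway sketch (the per-attribute value counts come from the EnjoySport task; the variable names are my own):

```python
# Instance-space and syntactic-hypothesis-space sizes for EnjoySport.

value_counts = [3, 2, 2, 2, 2, 2]   # Sky has 3 values; the rest have 2

instances = 1
for n in value_counts:
    instances *= n                   # 3 * 2 * 2 * 2 * 2 * 2 = 96

syntactic = 1
for n in value_counts:
    syntactic *= n + 2               # each attribute also allows '?' and 'Ø'
```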
• Every hypothesis containing one or more "Ø" symbols represents the empty set of instances; that is, it classifies every instance as negative, so there are
1 + (4 · 3 · 3 · 3 · 3 · 3) = 973 semantically distinct hypotheses.

1.3.1. General-to-Specific Ordering of Hypotheses
Consider the two hypotheses
h1 = ⟨Sunny, ?, ?, Strong, ?, ?⟩
h2 = ⟨Sunny, ?, ?, ?, ?, ?⟩
Consider the sets of instances that are classified positive by h1 and by h2:
• h2 imposes fewer constraints on the instance, so it classifies more instances as positive.
• Any instance classified positive by h1 will also be classified positive by h2. Therefore, h2 is more general than h1.

Given hypotheses hj and hk, hj is more-general-than-or-equal-to hk if and only if any instance that satisfies hk also satisfies hj.

Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is more-general-than-or-equal-to hk (written hj ≥g hk) if and only if
(∀x ∈ X)[(hk(x) = 1) → (hj(x) = 1)]

[Figure: instances X and hypotheses H, arranged from specific to general. It shows instances x1 = ⟨Sunny, Warm, High, Strong, Cool, Same⟩ and x2 = ⟨Sunny, Warm, High, Light, Warm, Same⟩, and hypotheses including h1 = ⟨Sunny, ?, ?, Strong, ?, ?⟩ and h2 = ⟨Sunny, ?, ?, ?, ?, ?⟩.]

FIND-S begins with the most specific hypothesis h = ⟨Ø, Ø, Ø, Ø, Ø, Ø⟩ and generalizes it just enough to cover each positive example; negative examples are ignored.

Consider the first training example x1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, +
• From x1, it is clear that the hypothesis h is too specific. None of the "Ø" constraints in h are satisfied by this example, so each is replaced by the next more general constraint that fits the example:
h1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩

Consider the second training example x2 = ⟨Sunny, Warm, High, Strong, Warm, Same⟩, +
• The second training example forces the algorithm to further generalize h, this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example:
h2 = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩

Consider the third training example x3 = ⟨Rainy, Cold, High, Strong, Warm, Change⟩, −
• Upon encountering the third training example, the algorithm makes no change to h. The FIND-S algorithm simply ignores every negative example.
h3 = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩

Consider the fourth training example x4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩, +
• The fourth example leads to a further generalization of h:
h4 = ⟨Sunny, Warm, ?, Strong, ?, ?⟩

[Figure: the hypothesis space search performed by FIND-S, from h0 = ⟨Ø, Ø, Ø, Ø, Ø, Ø⟩ through h4 = ⟨Sunny, Warm, ?, Strong, ?, ?⟩.]

The key property of the FIND-S algorithm:
FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples. Its final hypothesis will also be consistent with the negative examples, provided the correct target concept is contained in H, and provided the training examples are correct.

Unanswered by FIND-S:
• Has the learner converged to the correct target concept?
• Why prefer the most specific hypothesis?
• Are the training examples consistent?
• What if there are several maximally specific consistent hypotheses?

1.5. VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM
The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the set of all hypotheses consistent with the training examples.

Representation
Definition: consistent. A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example ⟨x, c(x)⟩ in D.
Consistent(h, D) ≡ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x)

Definition: version space. The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:
VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

The LIST-THEN-ELIMINATE algorithm
The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example ⟨x, c(x)⟩: remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3.
Output the list of hypotheses in VersionSpace.

• LIST-THEN-ELIMINATE works in principle, so long as the version space is finite.
• Since it requires exhaustive enumeration of all hypotheses, in practice it is not feasible.

The version space is represented by its most general and least general members. These members form the general and specific boundary sets that delimit the version space within the partially ordered hypothesis space.

Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃g' ∈ H)[(g' >g g) ∧ Consistent(g', D)]}

Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.
S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃s' ∈ H)[(s >g s') ∧ Consistent(s', D)]}

Theorem: Version space representation theorem
Let X be an arbitrary set of instances and let H be a set of Boolean-valued hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {⟨x, c(x)⟩}. For all X, H, c, and D such that S and G are well defined,
VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s)}

To prove:
i. Every h satisfying the right-hand side of the above expression is in VS_{H,D}.
ii. Every member of VS_{H,D} satisfies the right-hand side of the expression.

Sketch of proof:
i. Let g, h, s be arbitrary members of G, H, S respectively, with g ≥g h ≥g s.
• By the definition of S, s must be satisfied by all positive examples in D. Because h ≥g s, h must also be satisfied by all positive examples in D.
• By the definition of G, g cannot be satisfied by any negative example in D, and because g ≥g h, h cannot be satisfied by any negative example in D.
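As an aside, for conjunctive hypotheses the ≥g relation defined above can be decided attribute by attribute. A minimal sketch (assuming the ⟨?, value, Ø⟩ constraint representation used in these notes; function names are my own):

```python
# hj >=_g hk holds when every constraint of hj accepts everything the
# corresponding constraint of hk accepts: hj's constraint is '?', or
# hk's constraint is 'Ø' (accepts nothing), or the two values agree.

def more_general_or_equal(hj, hk):
    return all(a == "?" or b == "Ø" or a == b for a, b in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
```

Here `more_general_or_equal(h2, h1)` holds but the converse does not, matching the h1/h2 ordering discussed earlier.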
Because h is satisfied by all positive examples in D and by no negative examples in D, h is consistent with D, and therefore h is a member of VS_{H,D}.
ii. It can be proven by assuming some h in VS_{H,D} that does not satisfy the right-hand side of the expression, then showing that this leads to an inconsistency.

CANDIDATE-ELIMINATION Learning Algorithm
The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.

Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
• If d is a positive example
  • Remove from G any hypothesis inconsistent with d
  • For each hypothesis s in S that is not consistent with d
    • Remove s from S
    • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
    • Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
  • Remove from S any hypothesis inconsistent with d
  • For each hypothesis g in G that is not consistent with d
    • Remove g from G
    • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
    • Remove from G any hypothesis that is less general than another hypothesis in G

The CANDIDATE-ELIMINATION algorithm using version spaces.

An Illustrative Example

Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

The CANDIDATE-ELIMINATION algorithm begins by initializing the version space to
the set of all hypotheses in H:
Initializing the G boundary set to contain the most general hypothesis in H,
G0 ← {⟨?, ?, ?, ?, ?, ?⟩}
Initializing the S boundary set to contain the most specific (least general) hypothesis,
S0 ← {⟨Ø, Ø, Ø, Ø, Ø, Ø⟩}

• When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.
• The boundary is therefore revised by moving to the least more general hypothesis that covers this new example.
• No update of the G boundary is needed in response to this training example, because G0 correctly covers this example.

For training example d1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, +
S0 = {⟨Ø, Ø, Ø, Ø, Ø, Ø⟩}
S1 = {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}
G0 = G1 = {⟨?, ?, ?, ?, ?, ?⟩}

When the second training example is observed, it has a similar effect of generalizing S further to S2, leaving G again unchanged, i.e., G2 = G1.

For training example d2 = ⟨Sunny, Warm, High, Strong, Warm, Same⟩, +
S1 = {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}
S2 = {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
G2 = {⟨?, ?, ?, ?, ?, ?⟩}

Consider the third training example. This negative example reveals that the G boundary of the version space is overly general; that is, the hypothesis in G incorrectly predicts that this new example is a positive example. The hypothesis in the G boundary must therefore be specialized until it correctly classifies this new negative example.

For training example d3 = ⟨Rainy, Cold, High, Strong, Warm, Change⟩, −
S2 = S3 = {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
G3 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
G2 = {⟨?, ?, ?, ?, ?, ?⟩}

Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3? For example, the hypothesis h = ⟨?, ?, Normal, ?, ?, ?⟩ is a minimal specialization of G2 that correctly labels the new example as a negative example, but it is not included in G3.
The reason this hypothesis is excluded is that it is inconsistent with the previously encountered positive examples.

Consider the fourth training example.

For training example d4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩, +
S3 = {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
S4 = {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
G3 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
G4 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}

This positive example further generalizes the S boundary of the version space. It also results in removing one member of the G boundary, because this member fails to cover the new positive example.

After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples:

S4 = {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
⟨Sunny, ?, ?, Strong, ?, ?⟩   ⟨Sunny, Warm, ?, ?, ?, ?⟩   ⟨?, Warm, ?, Strong, ?, ?⟩
G4 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}

1.6. INDUCTIVE BIAS
The fundamental questions for inductive inference:
i. What if the target concept is not contained in the hypothesis space?
ii. Can we avoid this difficulty by using a hypothesis space that includes every possible hypothesis?
iii. How does the size of this hypothesis space influence the ability of the algorithm to generalize to unobserved instances?
iv. How does the size of the hypothesis space influence the number of training examples that must be observed?
These fundamental questions are examined in the context of the CANDIDATE-ELIMINATION algorithm.

A Biased Hypothesis Space
Suppose the target concept is not contained in the hypothesis space H; then an obvious solution is to enrich the hypothesis space to include every possible hypothesis.
• Consider the EnjoySport example, in which the hypothesis space is restricted to include only conjunctions of attribute values.
Because of this restriction, the hypothesis space is unable to represent even simple disjunctive target concepts such as "Sky = Sunny or Sky = Cloudy."
• Given the following three training examples of this disjunctive concept, the algorithm finds that there are zero hypotheses in the version space:

Example | Sky    | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny  | Warm    | Normal   | Strong | Cool  | Change   | Yes
2       | Cloudy | Warm    | Normal   | Strong | Cool  | Change   | Yes
3       | Rainy  | Warm    | Normal   | Strong | Cool  | Change   | No

• If the CANDIDATE-ELIMINATION algorithm is applied, it ends up with an empty version space. After the first two training examples,
S2 = {⟨?, Warm, Normal, Strong, Cool, Change⟩}
• S2 is overly general, and it incorrectly covers the third (negative) training example. So H does not include the appropriate c: a more expressive hypothesis space is required.

An Unbiased Learner
• The solution to the problem of assuring that the target concept is in the hypothesis space H is to provide a hypothesis space capable of representing every teachable concept, that is, representing every possible subset of the instances X.
• The set of all subsets of a set X is called the power set of X.
• In the EnjoySport learning task, the size of the instance space X of days described by the six attributes is 96 instances.
• Thus, there are 2^96 distinct target concepts that could be defined over this instance space, and the learner might be called upon to learn any one of them.
• The conjunctive hypothesis space is able to represent only 973 of these: a biased hypothesis space indeed.
• Let us reformulate the EnjoySport learning task in an unbiased way by defining a new hypothesis space H' that can represent every subset of instances.
• The target concept "Sky = Sunny or Sky = Cloudy" could then be described as
⟨Sunny, ?, ?, ?, ?, ?⟩ ∨ ⟨Cloudy, ?, ?, ?, ?, ?⟩

The Futility of Bias-Free Learning
Inductive learning requires some form of prior assumptions, or inductive bias.

Definition: Consider a concept learning algorithm L for the set of instances X.
• Let c be an arbitrary concept defined over X.
• Let D_c = {⟨x, c(x)⟩} be an arbitrary set of training examples of c.
• Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on the data D_c.
• The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c,
(∀x_i ∈ X)[(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]

The figure below explains:
• Modeling inductive systems by equivalent deductive systems.
• The input-output behavior of the CANDIDATE-ELIMINATION algorithm using a hypothesis space H is identical to that of a deductive theorem prover utilizing the assertion "H contains the target concept." This assertion is therefore called the inductive bias of the CANDIDATE-ELIMINATION algorithm.
• Characterizing inductive systems by their inductive bias allows modelling them by their equivalent deductive systems. This provides a way to compare inductive systems according to their policies for generalizing beyond the observed training data.
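The bias argument can be checked by brute force. The sketch below applies the LIST-THEN-ELIMINATE idea to every Ø-free conjunctive hypothesis, using three examples of the disjunctive concept "Sky = Sunny or Sky = Cloudy" (labels chosen to match that concept), and finds an empty version space; all names here are my own:

```python
from itertools import product

# Brute-force LIST-THEN-ELIMINATE over the conjunctive hypothesis
# space, showing that no conjunctive hypothesis is consistent with
# three examples of the disjunctive concept "Sky = Sunny or Cloudy".

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"),
           ("Normal", "High"), ("Strong", "Weak"),
           ("Warm", "Cool"), ("Same", "Change")]

examples = [
    (("Sunny",  "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Rainy",  "Warm", "Normal", "Strong", "Cool", "Change"), False),
]

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, D):
    return all(satisfies(h, x) == label for x, label in D)

# Enumerate every hypothesis with no 'Ø' constraint; a hypothesis
# containing 'Ø' rejects all instances, so it cannot cover a positive.
space = list(product(*[d + ("?",) for d in domains]))

version_space = [h for h in space if consistent(h, examples)]

num_target_concepts = 2 ** 96   # all subsets of the 96-instance space
```

The enumeration covers all 972 non-empty conjunctive hypotheses, and `version_space` comes back empty: any hypothesis covering both positives must leave Sky as "?", and then it also covers the negative Rainy example.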
[Figure: Modeling inductive systems by equivalent deductive systems.
Inductive system: training examples and a new instance are fed to the CANDIDATE-ELIMINATION algorithm (using hypothesis space H), which outputs a classification of the new instance, or "don't know."
Equivalent deductive system: training examples, the new instance, and the assertion "H contains the target concept" are fed to a theorem prover, which outputs a classification of the new instance, or "don't know." The assertion is the inductive bias, made explicit.]
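The FIND-S and CANDIDATE-ELIMINATION traces worked through above can be reproduced with a short sketch. This is a minimal illustration for the conjunctive EnjoySport representation: the helper names are my own, and pruning boundary members against each other (the "more/less general than another" steps) is omitted, since this trace never needs it.

```python
def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(a == "?" or b == "Ø" or a == b for a, b in zip(hj, hk))

def min_generalization(s, x):
    """Least generalization of s that covers instance x."""
    return tuple(v if c == "Ø" else (c if c == v else "?")
                 for c, v in zip(s, x))

def min_specializations(g, domains, x):
    """Minimal specializations of g that exclude instance x."""
    out = []
    for i, c in enumerate(g):
        if c == "?":
            out += [g[:i] + (v,) + g[i + 1:]
                    for v in domains[i] if v != x[i]]
    return out

def find_s(examples):
    h = ("Ø",) * 6                       # most specific hypothesis
    for x, positive in examples:
        if positive and not satisfies(h, x):
            h = min_generalization(h, x)  # negatives are ignored
    return h

def candidate_elimination(examples, domains):
    S = {("Ø",) * len(domains)}
    G = {("?",) * len(domains)}
    for x, positive in examples:
        if positive:
            G = {g for g in G if satisfies(g, x)}
            S = {min_generalization(s, x) for s in S}
            S = {s for s in S
                 if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not satisfies(s, x)}
            G = {h for g in G
                 for h in (min_specializations(g, domains, x)
                           if satisfies(g, x) else [g])
                 if any(more_general_or_equal(h, s) for s in S)}
    return S, G

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"),
           ("Normal", "High"), ("Strong", "Weak"),
           ("Warm", "Cool"), ("Same", "Change")]

enjoy_sport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),  True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),  True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]

S_final, G_final = candidate_elimination(enjoy_sport, domains)
```

Running this on the four EnjoySport examples yields `find_s` = ⟨Sunny, Warm, ?, Strong, ?, ?⟩, and the boundaries S4 = {⟨Sunny, Warm, ?, Strong, ?, ?⟩} and G4 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}, matching the worked example.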
