UNIT-3: Knowledge Representation & Reasoning

www.uptunotes.com (Artificial Intelligence) UNIT-3
UNIT-3
Knowledge Representation & Reasoning:
• Propositional logic
• Theory of first order logic
• Inference in first order logic
• Forward & backward chaining
• Resolution
Probabilistic reasoning:
• Utility theory
• Hidden Markov Models (HMM)
• Bayesian Networks
Mr. Anuj Khanna

Assistant Professor (KIOT, Kanpur)
Short Question & Answers

Ques 1. Differentiate between declarative knowledge and procedural knowledge.
Ans : Declarative knowledge is the representation of facts or assertions. A declarative representation
states every piece of knowledge explicitly and lets the reasoning system apply the rules of inference to
derive new facts and conclusions. Declarative knowledge typically consists of a database containing relevant
information about some objects. E.g. : a relational database of company employees, or the student records of a
particular class.
Procedural knowledge represents actions or consequences and tells the HOW of a situation. This knowledge
uses inference rules to manipulate procedures to arrive at the result. Example : an algorithm that solves the
Travelling Salesman Problem step by step in a systematic order.
Ques 2. Define the terms belief, hypothesis, knowledge and epistemology.
• Belief : Any meaningful and coherent expression that can be manipulated.
• Hypothesis : A justified belief that is not known to be true. Thus a hypothesis is a belief
that is backed up with some supporting evidence.
• Knowledge : Justified true belief is called knowledge.
• Epistemology : The study of the nature of knowledge.
Ques 3. What is formal logic? Give an example.
Ans : Formal logic is a technique for interpreting a reasoning process. It is a symbolic manipulation
mechanism: given a set of sentences taken to be true, the technique determines what other sentences can be
inferred to be true. The validity of an argument depends on the form of the argument, not on its content.
Example : Consider the following two sentences: 1. All men are mortal. 2. Socrates is a man. From these we can
infer that Socrates is mortal.
Ques 4. What is CNF and DNF ?
Ans : CNF (Conjunctive Normal Form) : A formula P is said to be in CNF if it is of the form
P = P1 ˄ P2 ˄ P3 ˄ … ˄ Pn-1 ˄ Pn ; n ≥ 1, where each Pi (i = 1 to n) is a disjunction of literals
(atoms or negated atoms). Example : (Q ˅ P) ˄ (T ˅ ~Q) ˄ (P ˅ ~T).
DNF (Disjunctive Normal Form) : A formula P is said to be in DNF if it has the form
P = P1 ˅ P2 ˅ P3 ˅ … ˅ Pn-1 ˅ Pn ; n ≥ 1, where each Pi (i = 1 to n) is a conjunction of
literals. Example : (Q ˄ P) ˅ (T ˄ ~Q) ˅ (P ˄ ~T).
Ques 5.What are Horn Clauses ? What is its usefulness in logic programming?
Ans : A Horn clause is a clause (a disjunction of literals) with at most one positive literal. A Horn clause with
exactly one positive literal is called a definite clause. A Horn clause with no positive literals is sometimes
called a goal clause. A dual Horn clause is a clause with at most one negative literal.
Example : ~P ˅ ~Q ˅ … ˅ ~T ˅ U is a definite Horn clause (U is its only positive literal).
The relevance of Horn clauses to theorem proving by predicate logic resolution is that the resolvent of two
Horn clauses is again a Horn clause, and the resolvent of a goal clause and a definite clause is again a goal
clause. In automated reasoning this improves the efficiency of algorithms. Prolog is based on Horn clauses.
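Ques 6. Determine whether the following PL formula is (a) satisfiable (b) contradictory
(c) valid : (p ˄ q) → r ˅ ~q
Ans : The formula has three variables, so the truth table needs 2^3 = 8 rows :
p  q  r   p˄q  ~q   r˅~q  (p˄q) → (r˅~q)
T  T  T    T    F     T       T
T  T  F    T    F     F       F
T  F  T    F    T     T       T
T  F  F    F    T     T       T
F  T  T    F    F     T       T
F  T  F    F    F     F       T
F  F  T    F    T     T       T
F  F  F    F    T     T       T
The formula is true in seven rows and false in one row (p = T, q = T, r = F). Therefore the given formula is
satisfiable, but it is neither valid (a tautology) nor contradictory.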
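The classification into satisfiable, contradictory and valid can be checked mechanically by enumerating every truth assignment. The sketch below is a minimal illustration, not part of the original notes: it reads the formula of Ques 6 as (p ˄ q) → (r ˅ ~q), a reading reconstructed from the truth-table column headings, and the helper names `implies`, `classify` and `q6` are hypothetical.

```python
from itertools import product

def implies(a, b):
    # Material implication: a -> b is false only when a is true and b is false.
    return (not a) or b

def classify(formula, num_vars):
    """Classify a propositional formula by evaluating it on all 2^n assignments."""
    results = [formula(*values)
               for values in product([True, False], repeat=num_vars)]
    if all(results):
        return "valid"
    if not any(results):
        return "contradictory"
    return "satisfiable"

def q6(p, q, r):
    # Assumed reading of the formula in Ques 6: (p AND q) -> (r OR NOT q)
    return implies(p and q, r or (not q))

print(classify(q6, 3))  # satisfiable

# List the assignments (p, q, r) that make the formula false.
counterexamples = [v for v in product([True, False], repeat=3) if not q6(*v)]
print(counterexamples)  # [(True, True, False)]
```

Enumeration needs all 2^n rows: with three variables, a four-row table cannot decide validity.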
Ques 7. Convert the following sentences into wff of Predicate Logic ( First order logic).
(i) Ruma dislikes children who drink tea.
(ii) Any person who is respected by every person is a king.
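Ans : (i) ∀x [ child(x) ˄ DrinkTea(x) → Dislikes(Ruma, x) ]
(ii) ∀x [ Person(x) ˄ ∀y ( Person(y) → Respects(y, x) ) → King(x) ]
Note that in (ii) the quantifier over y stays inside the antecedent: x must be respected by every person y.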
Long Question & Answers

Ques 8 : Define the term knowledge. What is the role of knowledge in Artificial Intelligence?
Explain various techniques of knowledge representation.
Ans : Knowledge : Knowledge is a refined form of data. Data are raw facts. When these raw facts are
organized systematically and are ready to be processed by the human brain or by a machine, they become
knowledge. From this knowledge we can draw the desired conclusions, which can be used to solve
simple and complex real-world problems.
Example : A doctor treating a patient requires both knowledge and data. The data is the patient's
record (i.e. patient's history, measurements of vital signs, diagnostic reports, response to medicines, etc.).
The knowledge is the information the doctor gained in medical college during his studies.
The cycle from data to knowledge is as follows :
(a) Raw data, when refined, processed or analyzed, yields information, which is useful in
answering users' queries.
(b) With further refinement, analysis and the addition of heuristics, information may be converted into
knowledge, which is useful in problem solving and from which additional knowledge may be
inferred.
Role of Knowledge in AI : Knowledge is central to AI. The more knowledge a person has, the better the
chances of that person being more intelligent than others. Knowledge also improves the search efficiency of
the human brain. Knowledge is needed to support intelligence because :
(a) We can understand natural language with its help and use it when required.
(b) We can make decisions if we possess sufficient knowledge about a certain domain.
(c) We can recognize different objects with varying features quite easily.
(d) We can interpret various changing situations easily and logically.
(e) We can plan strategies to solve difficult problems.
(f) Knowledge is dynamic, whereas data is static.
An AI system must be capable of doing the following three things :
(a) Store the knowledge in a knowledge base (both static and dynamic KB).
(b) Apply the stored knowledge to solve problems.
(c) Acquire new knowledge through experience.
Three key components of an AI system : 1. Representation 2. Learning 3. Reasoning.

Various techniques of knowledge representation :
(A) Simple relational knowledge (B) Inheritable knowledge (C) Inferential knowledge (D) Procedural knowledge
(A) Relational Knowledge : This is the simplest way to represent knowledge in static form. It is stored
in a database as a set of records. Facts about a set of objects and the relationships between objects are set out
systematically in columns. This technique offers very little opportunity for inference, but it provides a
knowledge base for other, more powerful inference mechanisms. Examples : the set of employee records in an
organization; the set of records and related information of voters for elections.
(B) Inheritable Knowledge : One of the most useful forms of inference is property inheritance. In this
method, elements of specific classes inherit attributes and values from the more general classes to which they
belong. Features of inheritable knowledge are :
• Property inheritance (objects inherit values from being members of a class; data must be organized
into a hierarchy of classes).
• Boxed nodes (contain objects and values of attributes of objects).
• Values can themselves be objects with attributes, and so on.
• Arrows (point from an object to its value).
• This structure is known as a slot-and-filler architecture, a semantic network or a collection of
frames.
• In semantic networks, nodes for classes or objects with some inherent meaning are connected in a
network structure.
(C) Inferential Knowledge : Knowledge is useless unless there is some inference process that can exploit
it. The required inference process implements the standard logical rules of inference. The knowledge is
represented in formal logic. Example : All dogs have tails, ∀x : dog(x) → hastail(x)
This knowledge supports automated reasoning. Advantages of this approach are :
• It has a set of strict rules.
• It can be used to derive more facts.
• The truth of new statements can be verified.
• Correctness is guaranteed.
(D) Procedural Knowledge : This is knowledge encoded in the form of procedures. Example : small programs that
know how to do specific things and how to proceed. E.g. a parser in a natural language system has the
knowledge that a noun phrase may contain articles, adjectives and nouns. This is represented by calls to
routines that know how to process articles, adjectives and nouns.
Advantages :
• Heuristic or domain-specific knowledge can be represented.
• Extended logical inferences, such as default reasoning, can be incorporated.
• Side effects of actions may be modeled.
Disadvantages :
• Not all cases may be represented.
• Not all deductions may be correct.
• Modularity is not guaranteed, and specifying control information is tedious.
Ques 9 : Define the term logic. What is the role of logic in Artificial Intelligence? Compare
Propositional logic with First order logic (Predicate Calculus).
Ans : Logic is defined as the scientific study of the process of reasoning and of the system of rules and
procedures that help in the reasoning process. In AI, the required knowledge is represented as
expressions in a formal logic. Inference rules and proof procedures can then
apply this knowledge to solve specific problems.
We can derive a new piece of knowledge by proving that it is a consequence of knowledge that is already
known. We generate logical statements to prove certain assertions.
Algorithm = logic + control
Role of Logic in AI
• Computer scientists are familiar with the idea that logic provides techniques for analyzing the
inferential properties of languages. Logic can provide a specification for a programming language by
characterizing a mapping from programs to the computations that they implement.
• A compiler that implements the language can be incomplete as long as it approximates the logical
specification. This allows the involvement of logic in AI applications to vary from relatively weak
uses, in which logic informs the implementation process with analytic insight, to stronger uses with
in-depth analysis of the implementation.
• Logical theories in AI are independent of implementations. They provide insights into the
reasoning problem without directly informing the implementation.
• Ideas from logic, such as theorem proving and model construction techniques, are used in AI.
• Logic works as an analysis tool and as a knowledge representation technique for automated reasoning and
for developing expert systems. It also gives the basis for programming languages like Prolog, used to
develop AI software.
George Boole (1815-1864) wrote a book named “An Investigation of the Laws of Thought”. He stated its design as :
“To investigate the fundamental laws of those operations of the mind by which reasoning is
performed; to give expression to them in the symbolical language of a Calculus, and upon this
foundation to establish the science of Logic and construct its method; to make that method
itself the basis of a general method for the application of the mathematical doctrine of
Probabilities; and, finally, to collect from the various elements of truth brought to view in the
course of these inquiries some probable intimations concerning the nature and constitution of
the human mind.”
Comparison between Propositional Logic (PL) & First Order Predicate Logic (FOPL)
S.No. | PL | FOPL
1. | Less declarative | More declarative
2. | Context dependent semantics | Context independent semantics
3. | Ambiguous and less expressive | Unambiguous and more expressive
4. | Propositions are used as components with logical connectives. | Uses predicates/relations between objects, functions, variables, logical connectives and quantifiers (existential and universal).
5. | Rules of inference such as Modus Ponens, Modus Tollens, disjunctive syllogism etc. are used for deduction. | Rules of inference are used along with the rules of quantifiers.
6. | Inference algorithms such as inference rules, DPLL and GSAT are used. | Inference algorithms such as unification, resolution, backward and forward chaining are used.
7. | NP-complete | Semi-decidable
Ques 10 (A) Convert the following sentences to wff in first order predicate logic.
(i) No coat is waterproof unless it has been specially treated.
(ii) A drunkard is an enemy of himself.
(iii) Any teacher is better than a lawyer.
(iv) If x and y are both greater than zero, so is the product of x and y.
(v) Everyone in the purchasing department over 30 years is married.
(B) Determine whether each of the following sentences is satisfiable, contradictory or valid :
S1 : (p ˅ q) ˄ (p ˅ ~q) ˅ p        S2 : p → q → ~p
Ans : (A) (i) No coat is waterproof unless it has been specially treated.
∀x : [ C(x) → ( ~W(x) ˅ S(x) ) ] , where :
C(x) : x is a coat , ~W(x) : x is not waterproof , S(x) : x is specially treated.
(ii) A drunkard is an enemy of himself.
∀x : [ D(x) → E(x, x) ] , where : D(x) : x is a drunkard , E(x, x) : x is an enemy of x.
(iii) Any teacher is better than a lawyer.
∀x : [ T(x) → ∃y : ( L(y) ˄ B(x, y) ) ] , where :
T(x) : x is a teacher , L(y) : y is a lawyer , B(x, y) : x is better than y.
(iv) If x and y are both greater than zero, so is the product of x and y.
∀x ∀y [ GT(x, 0) ˄ GT(y, 0) → GT( times(x, y), 0 ) ]
where GT is the predicate “greater than” and times(x, y) is a function denoting x times y; equivalently we can
write product_of(x, y), where product_of is a function.
(v) Everyone in the purchasing department over 30 years is married.
∀x ∀y [ works_in(x, purch_deptt) ˄ has_age(x, y) ˄ GT(y, 30) → Married(x) ]
(B) (i) Truth table for S1 : (p ˅ q) ˄ (p ˅ ~q) ˅ p
p  q   p˅q  ~q   p˅~q  (p˅q) ˄ (p˅~q) ˅ p
T  T    T    F     T       T
T  F    T    T     T       T
F  T    T    F     F       F
F  F    F    T     T       F
The last column contains both T and F, so S1 is satisfiable (it is neither valid nor contradictory).
(ii) Truth table for S2 : p → q → ~p , read as (p → q) → ~p
p  q   p→q  ~p   (p→q) → ~p
T  T    T    F       F
T  F    F    F       T
F  T    T    T       T
F  F    T    T       T
The last column contains both T and F, so S2 is also satisfiable (it is neither valid nor contradictory).
Ques 11: Using the inference rules of propositional logic, prove the validity of the following
argument :
(i) If either algebra is required or geometry is required, then all students will study
mathematics.
(ii) Algebra is required and trigonometry is required. Therefore, all students will study
mathematics.
Ans : Converting the above sentences to propositional logic and applying inference rules :
(i) (A ˅ G) → S (premise)
(ii) A ˄ T (premise). To prove : S is true.
where A : algebra is required , G : geometry is required , T : trigonometry is required ,
S : all students will study mathematics.
(iii) A is true (applying Simplification to (ii))
(iv) A ˅ G is true (applying Addition to (iii))
(v) Therefore, S is true (applying Modus Ponens to (i) and (iv))
Hence the argument is valid, because the conclusion S follows from the premises.
Ques 12 : Determine whether the following argument is valid or not. “If I work the whole night on this
problem, then I can solve it. If I solve the problem, then I will understand the topic.
Therefore, if I work the whole night on this problem, then I will understand the topic.”
Ans : (i) WN → S , where WN : I work the whole night on this problem , S : I can solve it.
(ii) S → U , where U : I will understand the topic.
To prove the validity of : WN → U.
(iii) Apply Hypothetical Syllogism (the chain rule of inference) to premises (i) and (ii).
We get WN → U. Hence the argument is valid.
Ques 13. Given the following sentences, prove that Smith is on his way out of the company :
(i) Either Smith attended the meeting or Smith was not invited to the meeting.
(ii) If the directors wanted Smith in the meeting, then Smith was invited to the meeting.
(iii) Smith did not attend the meeting.
(iv) If the directors did not want Smith in the meeting and Smith was not invited to the meeting, then
Smith is on his way out of the company.
Ans : Converting the sentences to propositional logic :
(i) A ˅ ~I , where A : Smith attended the meeting , I : Smith was invited to the meeting.
(ii) D → I , where D : the directors wanted Smith in the meeting.
(iii) ~A , Smith did not attend the meeting.
(iv) (~D ˄ ~I) → W , where W : Smith is on his way out of the company. To prove : W is true.
(v) ~I (by applying Disjunctive Syllogism to (i) and (iii))
(vi) ~D (by applying Modus Tollens to (ii) and (v))
(vii) ~D ˄ ~I (by applying Conjunction to (vi) and (v))
(viii) W (by applying Modus Ponens to (iv) and (vii)). Hence proved.
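A proof by inference rules can be cross-checked semantically: the premises entail the conclusion exactly when no truth assignment makes all premises true and the conclusion false. The following is a small sketch of that check for the Smith argument, using the propositions A, I, D and W defined above; the function name `entails` and the lambda encoding are assumptions made for illustration.

```python
from itertools import product

def implies(a, b):
    # Material implication.
    return (not a) or b

def entails(premises, conclusion, num_vars):
    """Return True if every model of all premises is also a model of the conclusion."""
    for values in product([True, False], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # found a counterexample assignment
    return True

# Variable order: A (attended), I (invited), D (directors wanted him), W (on his way out)
premises = [
    lambda A, I, D, W: A or (not I),                    # (i)   A v ~I
    lambda A, I, D, W: implies(D, I),                   # (ii)  D -> I
    lambda A, I, D, W: not A,                           # (iii) ~A
    lambda A, I, D, W: implies((not D) and (not I), W)  # (iv)  (~D ^ ~I) -> W
]
conclusion = lambda A, I, D, W: W

print(entails(premises, conclusion, 4))  # True
```

Dropping premise (iv) breaks the entailment, which matches the proof: W is obtained only in the final Modus Ponens step.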
Ques 14 : What is clause form of a wff (well-formed formula)? Convert the following formula into
clause form : ∃x ∀y [ ∀z P(f(x), y, z) → { ∃u Q(x, u) ˄ ∃v R(y, v) } ].
Ans : Clause Form : In both propositional logic and predicate logic, proving the
validity of statements using the resolution principle requires converting each well-formed formula into
clause form. A clause is a formula in which literals are connected only through the
OR (˅) connector; a formula in clause form is a set (conjunction) of such clauses.
Step 1. Elimination of implication : applying P → Q ≡ ~P ˅ Q
∃x ∀y ( ~∀z P(f(x), y, z) ˅ ( ∃u Q(x, u) ˄ ∃v R(y, v) ) )
Step 2. Reducing the scope of negation : applying ~∀x F(x) ≡ ∃x ~F(x)
∃x ∀y ( ∃z ~P(f(x), y, z) ˅ ( ∃u Q(x, u) ˄ ∃v R(y, v) ) )
Step 3. Applying Qx F(x) ˅ G ≡ Qx [ F(x) ˅ G ]
∃x ∀y ∃z ( ~P(f(x), y, z) ˅ ( ∃u Q(x, u) ˄ ∃v R(y, v) ) )
Step 4. Conversion to prenex normal form
∃x ∀y ∃z ∃u ∃v ( ~P(f(x), y, z) ˅ ( Q(x, u) ˄ R(y, v) ) )
Step 5. Skolemization : conversion to Skolem standard form. x is replaced by the constant a, and z, u, v
(which lie in the scope of ∀y) by the Skolem functions g(y), h(y), I(y).
∀y ( ~P(f(a), y, g(y)) ˅ ( Q(a, h(y)) ˄ R(y, I(y)) ) )
Step 6. Removal of universal quantifiers
~P(f(a), y, g(y)) ˅ ( Q(a, h(y)) ˄ R(y, I(y)) )
Step 7. Apply the distributive law for CNF : P ˅ (Q ˄ R) ≡ (P ˅ Q) ˄ (P ˅ R)
( ~P(f(a), y, g(y)) ˅ Q(a, h(y)) ) ˄ ( ~P(f(a), y, g(y)) ˅ R(y, I(y)) )
Step 8. On removing ˄ we get two clauses :
Clause 1 : ~P(f(a), y, g(y)) ˅ Q(a, h(y))
Clause 2 : ~P(f(a), y, g(y)) ˅ R(y, I(y))
Ques 15 : (A) What is the resolution principle in propositional logic? Explain.
(B) Let the following set of axioms be given : P , (P ˄ Q) → R ,
(S ˅ T) → Q , T. Assuming that all of them are true, prove that R is true.
Ans : Resolution Principle : This is also called proof by refutation. To prove that a statement is valid,
resolution attempts to show that the negation of the statement produces a contradiction with the known
statements. At each step two clauses, called PARENT CLAUSES, are compared (resolved),
yielding a new clause that has been inferred from them.
Example : Let two clauses C1 and C2 in PL be given as :
C1 : winter ˅ summer , C2 : ~winter ˅ cold. Assume that both C1 and C2
are true. From C1 and C2 we can infer summer ˅ cold. This is the RESOLVENT CLAUSE.
The resolvent clause is obtained by combining all of the literals of the two parent clauses except the
ones that cancel. If the clause that is produced is the empty clause, then a contradiction has been found.
E.g. : winter and ~winter will produce the empty clause.
Algorithm of resolution in propositional logic (to prove a proposition P from a set of axioms F) :
Step 1. Convert all the propositions of F to clause form.
Step 2. Negate the proposition P and convert the result to clause form. Add it to the set of clauses
obtained in Step 1.
Step 3. Repeat until either a contradiction is found or no progress can be made :
(a) Select two clauses as parent clauses.
(b) Resolve them together. The resolvent clause is the disjunction of all literals of both
parent clauses, with the following exception :
(i) If there are any pairs of literals L and ~L such that one of the parent clauses
contains L and the other contains ~L, then select one such pair and eliminate both L
and ~L from the resolvent clause.
(c) If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it
to the set of clauses available to the procedure.
Ans : (B) Assume ~R is true and add it to the set of clauses formed from the given axioms (as the set of support).
C1 : P
C2 : ~P ˅ ~Q ˅ R (by eliminating the implication in (P ˄ Q) → R)
C3 : ~S ˅ Q , C4 : ~T ˅ Q (from (S ˅ T) → Q)
C5 : T
C6 : ~R
C3 and C4 are obtained by eliminating the implication from (S ˅ T) → Q :
~(S ˅ T) ˅ Q ≡ (~S ˄ ~T) ˅ Q (by De Morgan's law). Applying the distributive law,
we obtain (~S ˅ Q) ˄ (~T ˅ Q), which gives the two clauses C3 and C4 after
removing the AND connector.
Clauses C1 to C5 are the base set and C6 is the set of support.
Resolution steps :
1. C2 ( ~P ˅ ~Q ˅ R ) and C6 ( ~R ) give the resolvent clause ~P ˅ ~Q.
2. ~P ˅ ~Q and C1 ( P ) give ~Q.
3. ~Q and C4 ( ~T ˅ Q ) give ~T.
4. ~T and C5 ( T ) give the empty clause (contradiction found).
So the assumption that ~R is true is false. Hence R is true.
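The refutation procedure can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a tuned theorem prover: clauses are represented as frozensets of string literals with `~` marking negation, and the names `negate`, `resolve` and `refutes` are hypothetical. It saturates the clause set by brute force instead of using a set-of-support strategy.

```python
from itertools import combinations

def negate(lit):
    # "~P" <-> "P"
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return every resolvent of two clauses (frozensets of literals)."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            # Combine all literals of both parents except the cancelling pair.
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents

def refutes(clauses):
    """Return True if the empty clause is derivable, i.e. the set is contradictory."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolve(c1, c2):
                if not resolvent:      # empty clause: contradiction found
                    return True
                new.add(resolvent)
        if new <= known:               # no progress can be made
            return False
        known |= new

# Clauses C1..C5 of Ques 15 (B) plus the negated goal C6 = ~R.
clauses = [
    frozenset({"P"}),
    frozenset({"~P", "~Q", "R"}),
    frozenset({"~S", "Q"}),
    frozenset({"~T", "Q"}),
    frozenset({"T"}),
    frozenset({"~R"}),
]
print(refutes(clauses))  # True, so R follows from the axioms
```

Without the negated goal the base set is consistent, so `refutes(clauses[:-1])` returns False.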
Ques 16: How is resolution in first order predicate logic different from resolution in propositional
logic? What is the Unification Algorithm & why is it required?
Ans : In FOPL, solving through resolution is more complicated, since we must consider all
the possible ways of substituting values for variables. The presence of existential and universal
quantifiers in wffs, and of arguments in predicates, makes matching literals harder.
The theoretical basis of the resolution procedure in predicate logic is “Herbrand's Theorem”, which is as
follows :
(i) To show that a set of clauses S is unsatisfiable, it is necessary to consider only
interpretations over a particular set, called the Herbrand universe of S.
(ii) A set of clauses S is unsatisfiable iff a finite subset of ground instances of S (in which all
variables have a value substituted for them) is unsatisfiable.
One way to find a contradiction is to try the possible substitutions systematically and see if each produces a
contradiction. To apply resolution in predicate logic, we first need to apply the unification technique,
because in FOPL literals with arguments are to be resolved, so matching of arguments is also
required.
Unification Algorithm : The unification algorithm is a recursive procedure. Let two literals in
FOPL be P(x, x) and P(y, z). The predicate name P matches in both literals, but the arguments do
not match, so a substitution is required. The first arguments, x and y, do not match, so
substitute y for x; then they match.
So the substitution 𝝈 = y / x is required (𝝈 is called a UNIFIER). If we now also apply 𝝈 = z / x,
the substitution is not consistent, because we cannot substitute both y and z for x.
Instead, after applying 𝝈 = y / x, the literals become P(y, y) and P(y, z). Now unify the arguments
y and z with 𝝈 = z / y. So the resulting composition is (z / y)(y / x), which turns both literals into P(z, z).
Some rules for the unification algorithm :
i. A variable can be unified with a constant.
ii. A variable can be unified with another variable.
iii. A variable can be unified with a function.
iv. A variable cannot be unified with a function that has the same variable as an argument.
v. A constant cannot be unified with a different constant.
vi. Predicates/literals with different numbers of arguments cannot be unified.
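The rules above can be sketched as a short recursive procedure. This is an illustrative sketch, not the notes' own algorithm, and its term encoding is an assumption: a lowercase string is a variable, a capitalized string is a constant, and a tuple `(name, arg1, ...)` is a predicate or function applied to arguments. The substitution is returned as a dictionary mapping each variable to the term substituted for it.

```python
def is_var(t):
    # Assumed convention: lowercase strings are variables, capitalized ones constants.
    return isinstance(t, str) and t[0].islower()

def walk(t, subst):
    # Follow variable bindings until an unbound variable or a non-variable term.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    # Rule iv: a variable may not be bound to a term that contains it.
    t = walk(t, subst)
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs(var, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst=None):
    """Return a substitution (dict) that unifies a and b, or None if none exists."""
    if subst is None:
        subst = {}
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):                      # rules i, ii, iii (with occurs check)
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return None if occurs(b, a, subst) else {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple):
        if a[0] != b[0] or len(a) != len(b):   # rule vi: name and arity must match
            return None
        for x, y in zip(a[1:], b[1:]):         # unify arguments left to right
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None                        # rule v: two different constants

# P(x, x) and P(y, z): x is bound to y, then y to z, as in the worked example.
print(unify(("P", "x", "x"), ("P", "y", "z")))  # {'x': 'y', 'y': 'z'}
```

The dictionary `{'x': 'y', 'y': 'z'}` corresponds to the composition (z / y)(y / x) in the notation used above.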
Ques 17: Given the following set of facts, prove that “Some who are intelligent can't read”.
(i) Whoever can read is literate.
(ii) Dolphins are not literate.
(iii) Some dolphins are intelligent.
Ans : Solution : Form the wffs of the given sentences.
S1 : ∀x [ R(x) → L(x) ] , R(x) : x can read , L(x) : x is literate.
S2 : ∀x [ D(x) → ~L(x) ] , D(x) : x is a dolphin , ~L means not literate.
S3 : ∃x [ D(x) ˄ I(x) ] , I(x) : x is intelligent.
S1 to S3 form the base set. Assume that the negation of the statement to be proved is true.
So to prove that ∃x [ I(x) ˄ ~R(x) ] is true, we assume ~∃x [ I(x) ˄ ~R(x) ] is true
and add it as the set of support to the base set.
~∃x [ I(x) ˄ ~R(x) ] ≡ ∀x [ ~I(x) ˅ R(x) ] ≡ ~I(x) ˅ R(x)
Convert all wffs into clause form :
C1 : ~R(x) ˅ L(x) , C2 : ~D(y) ˅ ~L(y)
In S3, apply existential instantiation to remove the ∃ quantifier, introducing a new constant c.
Therefore C3 : D(c) ˄ I(c) { This is in CNF now. }
Two clauses can be formed after eliminating the connector ˄. So we get :
C3(a) : D(c) , C3(b) : I(c).
C4 : ~I(z) ˅ R(z) , this is the set of support.
Resolution steps :
1. C4 and C3(b) with 𝝈 = c / z give R(c).
2. R(c) and C1 with 𝝈 = c / x give L(c).
3. L(c) and C2 with 𝝈 = c / y give ~D(c).
4. ~D(c) and C3(a) give the empty clause (contradiction found).
Hence the assumption is false, and “Some who are intelligent can't read” is proved.
Ques 18 : Given the following set of facts :-
(i) John likes all kinds of food.
(ii) Apples are food.
(iii) Chicken is food.
(iv) Anything anyone eats and is not killed by is food.
(v) Bill eats peanuts and is still alive.
(vi) Sue eats everything Bill eats.
Translate the above into predicate logic. Convert each wff so formed into clause form.
Prove that “John likes peanuts” using resolution.
Ans : Converting the given statements into wffs of FOPL :
(i) ∀x : Food(x) → Likes(John, x)
(ii) Food(Apples)
(iii) Food(Chicken)
(iv) ∀x ∀y : Eats(x, y) ˄ ~Killed(x) → Food(y)
(v) Eats(Bill, Peanuts) ˄ Alive(Bill)
(vi) ∀x : Eats(Bill, x) → Eats(Sue, x)
To prove that : Likes(John, Peanuts)
Conversion of the above wffs into clause form :
C1 : ~Food(x) ˅ Likes(John, x)
C2 : Food(Apples)
C3 : Food(Chicken)
C4 : ∀x ∀y : Eats(x, y) ˄ ~Killed(x) → Food(y)
≡ ~[ Eats(x, y) ˄ ~Killed(x) ] ˅ Food(y)
≡ ~Eats(x, y) ˅ Killed(x) ˅ Food(y)
C5(a) : Eats(Bill, Peanuts)
C5(b) : Alive(Bill) , i.e. ~Killed(Bill)
C6 : ~Eats(Bill, x) ˅ Eats(Sue, x)
Let us assume that “John does not like peanuts” is true.
C7 : ~Likes(John, Peanuts) (this is the set of support)
Resolution steps :
1. C1 and C7 with 𝝈 = Peanuts / x give ~Food(Peanuts).
2. ~Food(Peanuts) and C4 with 𝝈 = Peanuts / y give ~Eats(x, Peanuts) ˅ Killed(x).
3. ~Eats(x, Peanuts) ˅ Killed(x) and C5(a) with 𝝈 = Bill / x give Killed(Bill).
4. But from sub-clause C5(b) we have Alive(Bill), i.e. Bill is alive, so ~Killed(Bill). Resolving it with
Killed(Bill) gives the empty clause, so a contradiction has occurred.
Therefore, our assumption that John does not like peanuts is false. Hence we can say that
Likes(John, Peanuts) is true.
Ques 19 : Explain backward and forward chaining, with an example in logic representation. Also mention the
advantages and disadvantages of both algorithms.
Ans : The process in which the output of one rule activates another rule is called chaining. The chaining
technique breaks the reasoning task into a sequence of small steps, each carried out by a single rule. Two
types of chaining techniques are known : forward chaining and backward chaining.
(A) Forward chaining :
• This is data-driven reasoning : it starts with the known facts and tries to match the rules
against these facts.
• It is possible that several rules match the available information (conditions). In forward chaining,
the rules are first tested for matching facts, and then the action of a matching rule is executed.
• In the next stage the working memory is updated with the new facts and the matching
process starts all over again. This process runs until no more rules can fire, or the goal is
reached.
• Forward chaining is useful when a lot of information is available. It is also useful
when there is a very large (possibly infinite) number of potential solutions, as in configuration
problems and planning.
Example : A rule-based KB is given as follows, and the conclusion H is to be proved.
Rule 1 : IF A OR B THEN C
Rule 2 : IF D AND E AND F THEN G
Rule 3 : IF C AND G THEN H
The following facts are presented : B, D, E, F. Goal : prove H. The structure of a forward chaining
example is given in the following figure :
[Figure : forward chaining on the rule base. B fires Rule 1 and adds C; D, E and F fire Rule 2 and add G; C and G fire Rule 3 and add H.]
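The forward chaining run on this rule base can be sketched in code. The encoding below is an assumption made for illustration: each rule is a pair of (list of alternative premise sets, conclusion), so that the OR in Rule 1 becomes two alternatives; the names `RULES` and `forward_chain` are hypothetical.

```python
# Each rule: (alternative premise sets, conclusion). Any one alternative suffices.
RULES = [
    ([{"A"}, {"B"}], "C"),       # Rule 1: IF A OR B THEN C
    ([{"D", "E", "F"}], "G"),    # Rule 2: IF D AND E AND F THEN G
    ([{"C", "G"}], "H"),         # Rule 3: IF C AND G THEN H
]

def forward_chain(facts, rules, goal):
    """Fire rules from the known facts until the goal is derived or nothing changes."""
    known = set(facts)           # working memory
    trace = []                   # conclusions in the order they were derived
    changed = True
    while changed and goal not in known:
        changed = False
        for alternatives, conclusion in rules:
            if conclusion in known:
                continue
            # A rule fires when all facts of at least one alternative are known.
            if any(alt <= known for alt in alternatives):
                known.add(conclusion)
                trace.append(conclusion)
                changed = True
    return goal in known, trace

print(forward_chain({"B", "D", "E", "F"}, RULES, "H"))  # (True, ['C', 'G', 'H'])
```

The loop stops either when the goal enters working memory or when a full pass over the rules adds nothing new, mirroring the termination condition described above.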
(B) Backward Chaining :
• Backward chaining is the opposite of forward chaining.
• In contrast to forward chaining, backward chaining is a goal-driven reasoning method. It
starts from the goal (from the end), which is a hypothetical solution, and the
inference engine tries to find the matching evidence.
• When a rule whose conclusion (RHS) matches the goal is found, its conditions become the sub-goals,
and rules are then searched to prove these sub-goals. This process continues until all the
sub-goals are proved.
• If a sub-goal cannot be established by any rule, the engine backtracks to the previous step where a
rule was chosen, and another rule is chosen.
• Backward chaining is good for cases where there are not many facts and
the information (facts) has to be supplied by the user. Backward chaining is
also effective for diagnostic tasks.
Logic programming languages such as Prolog are commonly implemented using the
backward chaining technique. The combination of backward chaining with forward
chaining provides better results in many applications.
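Goal-driven reasoning can be sketched as a recursive search from the goal back to the facts. The example reuses the rule base of the forward chaining example (IF A OR B THEN C, IF D AND E AND F THEN G, IF C AND G THEN H); the rule encoding and the name `backward_chain` are assumptions for illustration, and the `visiting` set is an added safeguard against circular rules.

```python
# Each rule: (alternative premise sets, conclusion). Any one alternative suffices.
RULES = [
    ([{"A"}, {"B"}], "C"),       # IF A OR B THEN C
    ([{"D", "E", "F"}], "G"),    # IF D AND E AND F THEN G
    ([{"C", "G"}], "H"),         # IF C AND G THEN H
]

def backward_chain(goal, facts, rules, visiting=frozenset()):
    """Return True if the goal is a known fact or can be proved via some rule."""
    if goal in facts:
        return True
    if goal in visiting:         # already trying to prove this goal: avoid looping
        return False
    for alternatives, conclusion in rules:
        if conclusion != goal:   # only rules that conclude the goal are relevant
            continue
        for alt in alternatives:
            # The premises of the rule become sub-goals; all must be proved.
            if all(backward_chain(sub, facts, rules, visiting | {goal})
                   for sub in alt):
                return True
        # This rule failed: backtrack and try the next rule for the same goal.
    return False

print(backward_chain("H", {"B", "D", "E", "F"}, RULES))  # True
```

Only rules whose conclusion matches the current goal are examined, which is why the search is directed and never fires unrelated rules.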
Decision Criteria for Forward or Backward Reasoning
1. Are there more possible goal states or start states?
(a) Move from the smaller set of states to the larger.
2. Is justification of the reasoning required?
(a) Prefer the direction that corresponds more closely to the way users think.
3. What kind of event triggers problem-solving?
(a) If it is the arrival of a new fact, forward chaining makes sense.
(b) If it is a query to which a response is required, backward chaining is more natural.
4. In which direction is the branching factor greater?
(a) Go in the direction with the lower branching factor.
Advantages of forward chaining :
1. It works well when a problem naturally begins by collecting data and searching for information that
can be inferred from it for use in future steps.
2. Forward chaining can derive a large amount of data from a few initial data or
facts.
3. Forward chaining is a very popular technique for implementing expert systems and systems
that use production rules in the knowledge base. For expert systems that need interruption, control,
monitoring and planning, forward chaining is the best choice.
4. When there are few facts and initial states, forward chaining is very useful.
Disadvantages of forward chaining :
1. New information is generated by the inference engine without any knowledge of which
information will be used for reaching the goal.
2. The user might be asked to enter a lot of inputs without knowing which input is relevant to the
conclusion.
3. Several rules may fire that contribute nothing toward reaching the goal.
4. It might produce many different conclusions, which makes the chaining process costly.
Advantages of backward chaining :
1. The system stops processing once the variable has its value. It's a “floor system”.
2. A system that uses backward chaining tries to set goals in the order in which they arrive in the
knowledge base.
3. The search in backward chaining is directed.
4. While searching, backward chaining considers only those parts of the knowledge base which are
directly related to the problem under consideration, so it never performs unnecessary
inferences.
5. Backward chaining is an excellent tool for specific types of problems such as diagnosing and
debugging.
6. Compared to forward chaining, few data are asked for, but many rules are searched.
Some disadvantages of backward chaining :
1. The goal must be known in order to perform the backward chaining process.
2. The implementation of backward chaining is difficult.
Ques 20: What is Utility theory and its importance in AI ? Explain with the help of suitable examples.
Ans : Utility theory is concerned with people's choices and decisions. It is concerned also with people's
preferences and with judgments of preferability, worth, value, goodness or any of a number of similar
concepts. Utility means quality of being useful. So as per this each state in environment has a degree of
usefulness to an agent, that agent will prefer states with higher utility.
Decision Theory = Probability theory + Utility Theory.
Interpretations of utility theory are often classified under two headings, prediction and prescription:
(i) The predictive approach is interested in the ability of a theory to predict actual choice behavior.
(ii) The prescriptive approach is interested in saying how a person ought to make a decision.
E.g. : Psychologists are primarily interested in prediction. Economists are interested in both prediction
and prescription. In statistics the emphasis is on prescription in decision making under uncertainty.
The emphasis in management science is also prescriptive.
Sometimes it is useful to ignore uncertainty and focus on ultimate choices. At other times, uncertainty must
be modelled explicitly. Examples : insurance markets, financial markets, game theory. Rather than
choosing an outcome directly, the decision-maker chooses an uncertain prospect (or lottery). A lottery is a
probability distribution over outcomes.
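Expected Utility : The expected utility of action A, given evidence E, written EU(A | E), is calculated as follows :

$$EU(A \mid E) = \sum_{i} P(\mathrm{Result}_i(A) \mid \mathrm{Do}(A), E)\; U(\mathrm{Result}_i(A))$$

where,
P(Result_i(A) | Do(A), E) is the probability the agent assigns to the i-th possible outcome of action A, given that A is executed and given the evidence E.
Do(A) : Proposition that A is executed in the current state.
U(Result_i(A)) : Utility of the i-th possible outcome of A.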
The theory of choice under uncertainty has two basic components: consequences (or outcomes) and lotteries.
(a) Consequences: These are what the decision-maker ultimately cares about.
Example: “I get pneumonia, my health insurance company covers most of the costs, but I have to pay
a $500 deductible.” The consumer does not choose consequences directly; the consumer chooses a
lottery p.
(b) Lotteries: These are probability distributions over consequences: p : C → [0, 1],
with ∑c ∈ C p (c) = 1. The set of all lotteries is denoted by P. Example: “A gold-level health insurance
plan, which covers all kinds of diseases, but has a $500 deductible.” This makes sense because the consumer
is assumed to rank health insurance plans only insofar as they lead to different probability distributions
over consequences.
Utility Function : U : P → R has an expected utility form if there exists a function
u : C → R such that U (p) = ∑c ∈ C p (c) u (c) for all p ∈ P. In this case, the function U is called an
expected utility function, and the function u is called a von Neumann-Morgenstern utility function. These
functions capture the agent’s preferences between world states: a utility function assigns a single
number to each state to express its desirability. Utilities are combined with the outcome probabilities of
actions to give an expected utility for each action. U (S) denotes the utility of state S for the agent’s decision.
Maximum Expected Utility (MEU) : A rational agent should select the action that maximizes its
expected utility. The MEU principle says: “If an agent maximizes a utility function that
correctly reflects the performance measure by which its behavior is being judged, then it will achieve the
highest possible performance score if we average over the environments in which the agent could be placed.”
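The expected-utility sum and the MEU principle can be sketched in a few lines of Python. This is only an illustration: the actions, outcomes, probabilities and utility values below are hypothetical and are not taken from these notes.

```python
# Minimal sketch of expected utility and the MEU principle.
# All actions, outcomes, probabilities and utilities are hypothetical.

def expected_utility(lottery, utility):
    """Return sum over outcomes of P(outcome) * U(outcome).

    lottery: dict mapping outcome -> probability (must sum to 1).
    utility: dict mapping outcome -> real-valued utility.
    """
    assert abs(sum(lottery.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * utility[outcome] for outcome, p in lottery.items())

def best_action(actions, utility):
    """MEU: choose the action whose outcome lottery has the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))

# Utility of each outcome (hypothetical numbers).
utility = {"dry_and_unburdened": 10, "dry_but_carrying": 6, "soaked": 0}

# Each action leads to a lottery over outcomes (hypothetical numbers).
actions = {
    "take_umbrella": {"dry_but_carrying": 1.0},
    "leave_umbrella": {"dry_and_unburdened": 0.7, "soaked": 0.3},
}

eu = {name: expected_utility(lottery, utility) for name, lottery in actions.items()}
choice = best_action(actions, utility)
print(eu, choice)
```

Changing the rain probability in the second lottery changes which action the agent prefers, which is the sense in which utilities and probabilities are combined to reach a decision.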
Ques 21: What are constraint notations in utility theory ? Define the term Lottery. Also mention the
following axioms of Utility Theory :
(i) Orderability (ii) Substitutability (iii) Monotonicity (iv)Decomposability.
Ans : The constraint notations of utility theory for two outcomes / consequences A and B are as mentioned
below :
 A ≻ B : A is preferred over B.
 A ~ B : Agent is indifferent between A and B.
 A ≿ B : Agent prefers A to B or is indifferent between them.
A lottery L with possible outcomes C1 , C2 , … , Cn that occur with probabilities p1 , p2 , … , pn is
written L = [ p1 , C1 ; p2 , C2 ; … ; pn , Cn ]. Each outcome of a lottery can be an atomic state or another lottery.
Axioms of Utility Theory:
(i) Orderability : Given any two states, a rational agent must prefer one to the other or else rate the
two as equally preferable. So the agent can’t avoid the decision.
( A ≻ B ) ˅ ( B ≻ A ) ˅ ( A ~ B )
(ii) Substitutability: If an agent is indifferent between two lotteries A and B, then the agent is
indifferent between two more complex lotteries that are the same except that B is substituted for A in
one of them.
( A ~ B ) ⇒ [ p , A ; 1 – p , C ] ~ [ p , B ; 1 – p , C ]
(iii) Monotonicity: Let two lotteries have the same outcomes A and B. If ( A ≻ B ), then the agent
prefers the lottery with the higher probability for A.
( A ≻ B ) ⇒ ( p ≥ q ⇔ [ p , A ; 1 – p , B ] ≿ [ q , A ; 1 – q , B ] )

(iv) Decomposability: Compound lotteries can be reduced or decomposed to simpler ones .
[ p , A ; 1 – p, [ q , B ; 1 – q , C ] ] ~ [ p , A ; (1 - p) q, B ; (1 - p) (1 -q) , C]
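The decomposability axiom can be checked numerically. The sketch below represents a lottery as a list of (probability, outcome) pairs, where an outcome may itself be a lottery, and reduces a compound lottery to a simple one. The values of p and q and the helper name `flatten` are assumptions made for illustration.

```python
# Minimal sketch: reduce a compound lottery to a simple one (decomposability).
# A lottery is a list of (probability, outcome) pairs; an outcome that is
# itself a list is treated as a nested lottery. p and q are hypothetical.

def flatten(lottery):
    """Return {atomic_outcome: probability} for a possibly nested lottery."""
    flat = {}
    for prob, outcome in lottery:
        if isinstance(outcome, list):
            # Nested lottery: multiply the outer probability through.
            for sub_outcome, sub_prob in flatten(outcome).items():
                flat[sub_outcome] = flat.get(sub_outcome, 0.0) + prob * sub_prob
        else:
            flat[outcome] = flat.get(outcome, 0.0) + prob
    return flat

p, q = 0.5, 0.4
compound = [(p, "A"), (1 - p, [(q, "B"), (1 - q, "C")])]
simple = flatten(compound)
print(simple)
```

The flattened probabilities are p for A, (1 - p) q for B and (1 - p)(1 - q) for C, matching the right-hand side of the axiom.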
Ques 22 : What is probabilistic reasoning ? Why is it required in AI applications?
Ans : Probabilistic reasoning covers the theoretical foundations and computational methods that underlie
plausible reasoning under uncertainty.
Intelligent agents almost never have access to the whole truth about their environment, so agents must act
under uncertainty. The agent’s knowledge can only provide a degree of belief. The main tool for dealing
with degrees of belief is PROBABILITY THEORY.
 If probability is 0 , then belief is that statement is false.
 If probability is 1 , then belief is that statement is true.
Percepts received from the environment form the evidence on which probability assertions are based.
As the agent receives new percepts, its probability assessments are updated to reflect the new evidence.
 Before the evidence is obtained, we talk about prior (unconditional) probability.
 After the evidence is given, we deal with posterior (conditional) probability.
The prior probability associated with a proposition (sentence) P is the degree of belief accorded to it in the
absence of any other information.
• In AI applications, sample points are defined by a set of random variables
– Random variables: boolean, discrete, continuous
Probability Distribution: The probability distribution of a random variable gives the probabilities of all
its possible values. E.g : Let Weather be a random variable, given that :
 P( weather = sunny) = 0.7 , P( weather = rainy) = 0.2 , P( weather = cloudy) = 0.08 ,
P( weather = snowy ) = 0.02
Joint Probability Distribution: The joint probability distribution for a set of random variables gives the
probability of every atomic event on those random variables (i.e., every sample point). For example,
P(Weather, Cavity) can be given by a 4 × 2 table of values (one row per value of Cavity here) :

Weather =         Sunny    Rainy    Cloudy   Snowy
Cavity = True     0.144    0.02     0.016    0.02
Cavity = False    0.576    0.08     0.064    0.08

This is known as the joint probability distribution of Weather and Cavity.

If the joint distribution covers the complete set of random variables of the domain, it is called the “Full Joint Probability Distribution”.
Conditional Probability:
Definition of conditional probability: P(a∣b) = P(a ∧ b) / P(b) if P(b) ≠ 0 .
Product rule gives an alternative formulation: P(a ∧ b) = P(a∣b) P(b) = P(b∣a) P(a) .
A general version holds for whole distributions, e.g., P(Weather, Cavity) = P(Weather ∣ Cavity) P(Cavity) .
Chain rule is derived by successive application of the product rule:

$$P(X_1, \ldots, X_n) = P(X_1, \ldots, X_{n-1})\, P(X_n \mid X_1, \ldots, X_{n-1})$$
$$= P(X_1, \ldots, X_{n-2})\, P(X_{n-1} \mid X_1, \ldots, X_{n-2})\, P(X_n \mid X_1, \ldots, X_{n-1})$$
$$= \ldots = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$$
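The definition of conditional probability and the product rule can be tried out on the Weather / Cavity joint table given above. The following is a minimal sketch; the dictionary layout and the function names are illustrative choices, not part of the notes.

```python
# Minimal sketch: marginals, conditional probability and the product rule,
# using the joint table P(Weather, Cavity) from the text.

joint = {
    ("sunny", True): 0.144, ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snowy", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snowy", False): 0.08,
}

def p_weather(w):
    """Marginal P(Weather = w): sum the joint over both values of Cavity."""
    return sum(p for (weather, _), p in joint.items() if weather == w)

def p_cavity(c):
    """Marginal P(Cavity = c): sum the joint over all values of Weather."""
    return sum(p for (_, cavity), p in joint.items() if cavity == c)

def p_cavity_given_weather(c, w):
    """Conditional P(Cavity = c | Weather = w) = P(w, c) / P(w)."""
    return joint[(w, c)] / p_weather(w)

# Product rule check: P(w, c) = P(c | w) * P(w) for every entry of the table.
product_rule_holds = all(
    abs(joint[(w, c)] - p_cavity_given_weather(c, w) * p_weather(w)) < 1e-12
    for (w, c) in joint
)
print(p_cavity(True), p_cavity_given_weather(True, "sunny"), product_rule_holds)
```

In this particular table P(Cavity = True | Weather = sunny) equals the marginal P(Cavity = True), i.e. the table treats Cavity and Weather as independent.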
Applications of Probability theory in AI

 Uncertainty in medical diagnosis
(i) Diseases produce symptoms.
(ii) In diagnosis, observed symptoms => disease identification.
(iii) Uncertainties:
• Symptoms may not occur
• Symptoms may not be reported
• Diagnostic tests are not perfect (false positives, false negatives)
 Uncertainty in medical decision-making
(i) Physicians and patients must decide on treatments.
(ii) Treatments may not be successful.
(iii) Treatments may have unpleasant side effects.
Ques 23: Explain in detail Markov Model and its applications in Artificial Intelligence.
Ans. Markov Model:
 A Markov model is a stochastic model used for randomly changing systems, i.e. systems that do not
follow any fixed pattern of occurrence.
 It is based on a random probability distribution or pattern that can be analysed statistically but
cannot be predicted precisely.

 In Markov model, it is assumed that the future states only depend upon the current states and not the
previously occurred states. In a first-order Markov model, the current state depends only on the
immediately preceding state, i.e. the conditional probability is : P ( Xt | X0 : t-1) = P ( Xt | X t-1)
Set of states: { S1 , S2 , S3 …. Sn }. The process moves from one state to another, generating a sequence of
states.
An observable state sequence leads to a Markov Chain Model. Non-observable states lead to Hidden
Markov Models.
Transition Probability Matrix: Each time a new state is reached, the system is said to have
advanced by one step. Each step represents a time period that results in another possible
state. Let Si be state i of the environment, for i = 1 , 2 , … , n.
Conditional probability of moving from state Si to Sj = P ( Sj | Si ) = Pij , where Si : current state ,
Sj : next state. Pij = 0 if no transition takes place.
Transition Matrix :

$$P = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1n} \\ P_{21} & P_{22} & \cdots & P_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ P_{n1} & P_{n2} & \cdots & P_{nn} \end{bmatrix}$$
 Markov chain property: probability of each subsequent state depends only on what was the
previous state: P ( Sik | Si1 , Si2 ,……., Sik-1) = P ( Sik | Sik - 1) .
To define a Markov model, the following probabilities have to be specified:
Transition probabilities: a ij = P ( Sj | Si ) i.e. probability of transition from state i to state j.
Initial probabilities: πi = P ( Si ).
The probability of a state sequence is calculated as follows :
P ( Si1 , Si2 , … , Sik-1 , Sik ) = P ( Sik | Si1 , Si2 , … , Sik-1 ) . P ( Si1 , Si2 , … , Sik-1 )
= P ( Sik | Sik-1 ) . P ( Si1 , Si2 , … , Sik-1 )
= P ( Sik | Sik-1 ) . P ( Sik-1 | Sik-2 ) … P ( Si2 | Si1 ) . P ( Si1 )
There are four common Markov-Models:
(i)Markov Decision Models (ii) Markov Chains (iii) Hidden Markov Model (iv)Partially
observable Markov Decision Process
Example : Consider a problem of weather conditions. The transition diagram is described by the following data :

• Two states: { ‘Rain’ , ‘Dry’ }
• Transition probabilities: P(‘Rain’|‘Rain’) = 0.3 , P(‘Dry’|‘Rain’) = 0.7 , P(‘Rain’|‘Dry’) = 0.2 ,
P(‘Dry’|‘Dry’) = 0.8
• Initial probabilities: say P(‘Rain’) = 0.4 , P(‘Dry’) = 0.6 .
Suppose we want to calculate the probability of a sequence of states in our example,
{ ‘Dry’ , ‘Dry’ , ‘Rain’ , ‘Rain’ }.
P( { ‘Dry’ , ‘Dry’ , ‘Rain’ , ‘Rain’ } ) = P(‘Rain’|‘Rain’) P(‘Rain’|‘Dry’) P(‘Dry’|‘Dry’) P(‘Dry’)
= 0.3 * 0.2 * 0.8 * 0.6 = 0.0288
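The same calculation can be written as a short Python sketch. The transition and initial probabilities are the ones given in the example; the dictionary layout and the function name `sequence_probability` are illustrative assumptions.

```python
# Minimal sketch: probability of a state sequence in a first-order Markov chain.
# Numbers are those of the Rain / Dry example in the text.

initial = {"Rain": 0.4, "Dry": 0.6}

# transition[current][next] = P(next | current)
transition = {
    "Rain": {"Rain": 0.3, "Dry": 0.7},
    "Dry": {"Rain": 0.2, "Dry": 0.8},
}

def sequence_probability(states):
    """P(s1, ..., sk) = P(s1) * product over t of P(s_t | s_{t-1})."""
    prob = initial[states[0]]
    for previous, current in zip(states, states[1:]):
        prob *= transition[previous][current]
    return prob

p_sequence = sequence_probability(["Dry", "Dry", "Rain", "Rain"])
print(p_sequence)
```

Each factor depends only on the immediately preceding state, which is exactly the Markov chain property stated above.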
Ques 24 : Explain Hidden Markov Model and its applications in AI .

Ans : Hidden Markov Model (HMM)
A Hidden Markov Model is a temporal probabilistic model in which the state of the process is described
by a single discrete random variable. It is a stochastic model in which the states of the model are
hidden. Each state can emit an output, which is observed. This model is used
because a simple Markov chain is too restrictive for complex applications.
 The possible values of the variable are the possible states of the system.

 For example: Sunlight can be the variable and Sun can be the only possible state.
 To make the Markov model more flexible, an HMM assumes that the observations of the
model are a probabilistic function of each state.
Concept of Hidden Markov Model
Imagine you were locked in a room for several days and were asked about the weather outside.
The only piece of evidence you have is whether the person who comes into the room bringing your daily
meal is carrying an umbrella or not.

What is hidden? Sunny, Rainy, Cloudy

What can you observe? Umbrella or Not
 In Hidden Markov-Model, every individual state has limited number of transitions and emissions.
State sequences are not directly observable; rather, they are inferred from the sequence of
observations produced by the system.
 A probability is assigned to each transition between states.
 Given the present state, the future states are independent of the past states (the process is
memoryless).
 The model is called hidden because its states are not directly observable; only the outputs they
emit are observed.
 The probability of an observation sequence is computed using two algorithms:
(i) Forward Algorithm. (ii) Backward Algorithm.
Components of HMM :
 Set of states: { S1 S2 , S3 …. Sn }.
 Sequence of states generated by the system : { Si1 , Si2 , …….Sik-1 , Sik }
 Markov property of the state sequence :
P ( Sik | Si1 , Si2 ,……., Sik-1) = P ( Sik | Sik - 1)
Observations / Visible states : { V1 , V2 , …Vm-1 , V m}
For HMM following probabilities are to be specified:

(a) Transition Probabilities: a ij = P ( Sj |Si) i.e. probability of transition from state i to j.
(b) Observation probability matrix: B = ( bi ( Vm ) ) , where bi ( Vm ) = P ( Vm | Si ).
(c) Vector of initial probabilities : πi = P ( Si )
The model is defined as : M = ( A , B , π ), where A = ( a ij ) is the transition probability matrix.
Transient state: A state to which the process may never return once it has left it.
Recurrent state: A state to which the process is certain to return (with probability = 1) after leaving it.
Absorbing state: If a process enters a state and is destined to remain there forever, then it is
called an absorbing state.
Applications Of Hidden Markov Model

 Speech Recognition.
 Gesture Recognition.
 Language Recognition.

 Motion Sensing and Analysis.

 Protein Folding.
Ques 25 : Consider the following data provided for Weather Forecasting Scenario.
Two states (Hidden) : ‘Low’ and ‘High’ atmospheric pressure.
Two observations (Visible States) : ‘Rain’ and ‘Dry’.
Suppose we want to calculate the probability of a sequence of observations in our
example, { ‘Dry’ , ‘Rain’ }.
Ans : Solution :
Transition probabilities:
P(‘Low’|‘Low’) = 0.3
P(‘High’|‘Low’) = 0.7,
P(‘Low ’|‘High’) = 0.2 ,
P(‘High ’|‘High’) = 0.8
Observation probabilities:
P(‘Rain ’|‘Low’) = 0.6
P(‘Dry ’|‘Low’) = 0.4
P(‘Rain ’|‘High’) = 0.4
P(‘Dry ’|‘High’) = 0.6 .
Initial probabilities: say P(‘Low’) = 0.4 , P(‘High’) = 0.6 .

Calculation of observation sequence probability
Consider all possible hidden state sequences:

P({‘Dry’, ’Rain’} ) = P({‘Dry’,’ Rain’} , {‘Low’, ‘Low’}) + P({‘Dry’,’ Rain’} ,
{‘Low’, ‘High’}) + P ({‘Dry’,’ Rain’} , {‘High’, ‘Low’}) +
P({‘Dry’,’ Rain’} , {‘High’, ‘High’})
Where first term is :

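P ( {‘Dry’, ‘Rain’} , {‘Low’, ‘Low’} ) = P ( {‘Dry’, ‘Rain’} | {‘Low’, ‘Low’} ) P ( {‘Low’, ‘Low’} )
= P (‘Dry’|‘Low’) . P (‘Rain’|‘Low’) . P (‘Low’) . P (‘Low’|‘Low’)
= 0.4 * 0.6 * 0.4 * 0.3 = 0.0288
The remaining three terms are computed in the same way, and the four terms are added.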
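The full sum over hidden state sequences can be sketched in Python, both by brute-force enumeration and with the forward algorithm. The sketch uses the transition and initial probabilities of this example and assumes P(‘Dry’|‘High’) = 0.6, so that the observation probabilities for ‘High’ sum to 1. The function names are illustrative.

```python
from itertools import product

# Minimal sketch: probability of an observation sequence in an HMM.
# Assumption: P(Dry | High) = 0.6 so that each emission row sums to 1.

states = ["Low", "High"]
initial = {"Low": 0.4, "High": 0.6}
# transition[current][next] = P(next | current)
transition = {"Low": {"Low": 0.3, "High": 0.7}, "High": {"Low": 0.2, "High": 0.8}}
# emission[state][observation] = P(observation | state)
emission = {"Low": {"Rain": 0.6, "Dry": 0.4}, "High": {"Rain": 0.4, "Dry": 0.6}}

def joint_probability(observations, hidden):
    """P(observations, hidden) for one particular hidden state sequence."""
    prob = initial[hidden[0]] * emission[hidden[0]][observations[0]]
    for t in range(1, len(observations)):
        prob *= transition[hidden[t - 1]][hidden[t]] * emission[hidden[t]][observations[t]]
    return prob

def brute_force(observations):
    """Sum the joint probability over every possible hidden state sequence."""
    return sum(joint_probability(observations, hidden)
               for hidden in product(states, repeat=len(observations)))

def forward(observations):
    """Forward algorithm: same result without enumerating all sequences."""
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emission[s][obs] * sum(alpha[r] * transition[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

obs = ["Dry", "Rain"]
first_term = joint_probability(obs, ("Low", "Low"))
total_brute = brute_force(obs)
total_forward = forward(obs)
print(first_term, total_brute, total_forward)
```

Brute-force enumeration grows exponentially with the length of the observation sequence, whereas the forward algorithm needs only one pass over the observations, which is why it is preferred in practice.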
Ques 26 : Explain in detail Bayesian Theory and its use in AI. Define Likelihood ratio.
Ans : In probabilistic reasoning our conclusions are generally based on the available evidence and on past
experience. This information is mostly incomplete. When outcomes are unpredictable we use probabilistic
reasoning, e.g. in a weather forecasting system, disease diagnosis, or a traffic congestion control system.
 A doctor examining a patient treats the patient’s history, symptoms and test results as evidence of a
possible disease.
 Weather forecasting predicts tomorrow’s cloud coverage, wind speed and direction, and the
intensity of the sun’s heat.
 A business manager must decide, based on uncertain predictions, when to launch a new
product. Factors can be : the target consumers’ life style, population growth in a specific city / state,
the average income of consumers, and the economic scenario of the country. All of these estimates
depend on past experience of the market.
From the product rule of probability theory we have the following equations:
P ( a ∧ b ) = P( a ∣ b ) P( b ) ……….Eq 1.
P ( a ∧ b ) = P( b ∣ a ) P( a ) …………Eq 2.
On equating the right-hand sides of both equations and dividing by P(a):

$$P(b \mid a) = \frac{P(a \mid b)\, P(b)}{P(a)}$$

Bayes’ rule is used in modern AI systems for probabilistic inference. It uses the notion of conditional
probability: P ( H | E ). This expression is read as “the probability of hypothesis H given that we have
observed evidence E”. For this we require the prior probability of H (when we have no evidence) and the
extent to which E provides evidence of H.

Bayes’ theorem states :

$$P(H_i \mid E) = \frac{P(E \mid H_i)\, P(H_i)}{\sum_{n=1}^{K} P(E \mid H_n)\, P(H_n)}$$

Where , P ( Hi | E ) = Probability that hypothesis Hi is true given evidence E.
P ( E | Hi ) = Probability that we will observe evidence E given that hypothesis Hi is true.
P ( Hi ) = Prior probability that Hi is true in the absence of E.
K = Number of possible hypotheses.
Example : (i) Suppose we know the prior probabilities of finding each of the various minerals, and we know
the probabilities that certain physical characteristics will be observed if a mineral is present. Then
Bayes’ rule can be used to find the likelihood that each mineral is present.
(ii) Consider a medical diagnosis problem :
S : Patient has spots , F : Patient has high fever , M : Patient has measles.
Without any additional evidence, the presence of spots serves as evidence in favour of measles. It also
serves as evidence of fever, since measles would cause fever. But if it is already known that the patient
has measles, the spots add no further evidence about fever.
Alternatively, either spots or fever alone would constitute evidence in favour of measles.
Likelihood Ratio: This is also a conditional probability expression obtained from Bayes’ rule.
If the probability P( E ) is difficult to obtain, then we can write :

$$P({\sim}H \mid E) = \frac{P(E \mid {\sim}H)\, P({\sim}H)}{P(E)} \tag{i}$$

$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)} \tag{ii}$$

On dividing Eq (ii) by Eq (i) we get :

$$\frac{P(H \mid E)}{P({\sim}H \mid E)} = \frac{P(E \mid H)\, P(H)}{P(E \mid {\sim}H)\, P({\sim}H)} \tag{iii}$$

The ratio of the probability of an event to the probability of its negation is known as the
“odds of the event”, O ( E ) = P ( E ) / P ( ~E ). The ratio P ( E | H ) / P ( E | ~H ) is known as the
likelihood ratio of E with respect to H, written L ( E | H ).
The odds-likelihood form of Bayes’ rule from Eq (iii) is : O ( H | E ) = L ( E | H ) . O ( H )
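The two forms of Bayes’ rule can be compared numerically. The sketch below computes the posterior once with the normalising sum and once with the odds-likelihood form, for the two-hypothesis case H and ~H. The prior and the two conditional probabilities are hypothetical numbers chosen only for illustration.

```python
# Minimal sketch: Bayes' theorem and its odds-likelihood form for H vs ~H.
# The three input probabilities below are hypothetical.

def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H | E) with P(E) expanded as a sum over the hypotheses H and ~H."""
    evidence = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / evidence

def posterior_via_odds(prior_h, p_e_given_h, p_e_given_not_h):
    """Same posterior using O(H | E) = L(E | H) * O(H); P(E) is never needed."""
    prior_odds = prior_h / (1.0 - prior_h)              # O(H)
    likelihood_ratio = p_e_given_h / p_e_given_not_h    # L(E | H)
    posterior_odds = likelihood_ratio * prior_odds      # O(H | E)
    return posterior_odds / (1.0 + posterior_odds)      # convert odds back to probability

direct = posterior(0.01, 0.9, 0.1)
via_odds = posterior_via_odds(0.01, 0.9, 0.1)
print(direct, via_odds)
```

Both routes give the same posterior; the odds form is convenient precisely because P(E) cancels out.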
Disadvantages of Bayes’ Theorem: For a complex problem, the number of joint probabilities that we
require to compute this function grows as 2^n if there are n different propositions.
 Knowledge acquisition is difficult: too many probabilities are needed.
 The space needed to store all the probabilities is too large.
 The time needed to compute all the probabilities is too large.

Ques 27 : What is Bayesian Network or Belief Network ? Explain its importance with the help of
an example.
Ans : To describe the real world, it is not necessary to use a huge joint probability table in which the
probabilities of all possible outcomes are listed. To represent the dependence and conditional
independence relationships between variables, a systematic approach in the form of a data structure called
a Bayesian network is used. It is also known as a causal network, belief network, probabilistic network or
knowledge map. An extension of this is the decision network or influence diagram.
“A Bayesian network is a directed graph in which each node is annotated with quantitative probability
information.” The graph has no directed cycles. Each node is supported by a CPT (conditional probability
table). These networks are used for representing knowledge in an uncertain domain.
 A belief network is used to encode the meaningful dependence between variables.
1. Nodes represent random variables.
2. Arcs represent direct influence.
3. Each node has a conditional probability table that gives the probability of that variable given the
different states of its parents.
 The Semantics of Belief Networks
1. To construct the network, think of it as representing the joint probability distribution.
2. To infer from the network, think of it as representing conditional independence statements.
3. Calculate a member of the joint probability distribution by multiplying individual conditional probabilities:
P(X1=x1, . . . , Xn=xn) = P(X1=x1 | parents(X1)) * . . . * P(Xn=xn | parents(Xn))

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i))$$
 To incrementally construct a network:

1. Decide on the variables.
2. Decide on an ordering of them : The direct influences must be added to the network first if they are to
become parents of the node they influence. So the correct order in which to add nodes is to add the root
causes first, then the variables they influence, and so on until we reach the leaves (which have no direct
causal influence on other variables). A node is conditionally independent of its non-descendants
given its parents. A node is conditionally independent of all other nodes in the network given its parents,
children and children’s parents.

3. Do until no variables are left:
(a) Pick a variable and make a node for it.
(b) Set its parents to the minimal set of pre-existing nodes that directly influence it.
(c) Define its conditional probability table.
 Often, the resulting conditional probability tables are much smaller than the exponential size of the full
joint. Different tables may encode the same probabilities.
 Some canonical distributions that appear in conditional probability tables:
(a) deterministic logical relationship (e.g. AND, OR)
(b) deterministic numeric relationship (e.g. MIN)
(c) parametric relationship (e.g. weighted sum in neural net)
(d) noisy logical relationship (e.g. noisy-OR, noisy-MAX)
 Inference in Belief Networks: After constructing such a network, an inference engine
can use it to maintain and propagate beliefs. When new information is received, its effects are
propagated throughout the network until equilibrium probabilities are reached.
(a) Diagnostic inference: symptoms to causes
(b) Causal inference: causes to symptoms
(c) Intercausal inference
(d) Mixed inference: mixes those above
 Inference in Multiply Connected Belief Networks
(a) A multiply connected graph has at least two nodes that are connected by more than one path.
(b) Techniques for handling:
 Clustering: Group some of the intermediate nodes into one meganode.
 Pro: Perhaps the best way to get an exact evaluation.
 Con: Conditional probability tables may increase exponentially in size.
 Cutset conditioning: Obtain simpler polytrees by instantiating variables as constants.
 Con: May obtain an exponential number of simpler polytrees.
 Pro: It may be safe to ignore trees with low probability (bounded cutset
conditioning).

 Stochastic simulation: Run through the network with randomly chosen values for each node
(weighted by prior probabilities).
 The probability of any atomic event (its joint probability) can be obtained from the
network.
 The correct order to add the nodes is "root causes" first, then the variables they influence,
until we reach the "leaves", which have no direct causal influence on the other variables.
 If we don't, the network will need more links and less natural probability judgements.
 Example: Scenario is about a new burglar alarm installed at home. It also
responds to minor earthquakes. Two neighbors, John and Mary, are always available in
case of any emergency. John always calls when he hears the alarm, but sometimes confuses
it with the telephone ringing. Mary likes loud music and sometimes fails to hear the alarm.
The probabilities actually summarize a potentially infinite set of circumstances in
which the alarm might fail to go off (e.g. high humidity, power failure, dead battery,
cut wires, a dead mouse stuck inside the bell, etc.) or in which John or Mary might fail to
call and report it (out for lunch, on vacation, temporarily deaf, an airplane passing
near the home, etc.).
Conditional independence in this network : P ( Burglary | Alarm , JohnCalls , MaryCalls ) = P ( Burglary | Alarm ) :
once the state of the alarm is known, the calls add no further evidence about a burglary.
Similarly, P ( MaryCalls | JohnCalls , Alarm , Earthquake , Burglary ) = P ( MaryCalls | Alarm ),
so only Alarm is needed as a parent of MaryCalls.
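The burglar alarm network can be sketched in Python to show how one entry of the joint distribution is obtained by multiplying conditional probabilities. These notes do not give the CPT values, so the numbers below are assumptions: they are the values commonly quoted with this textbook example. The function names are illustrative as well.

```python
from itertools import product

# Minimal sketch of the burglar alarm belief network.
# Assumption: the CPT numbers are not given in these notes; the values below
# are the ones commonly quoted with this textbook example.

P_B = 0.001                                   # P(Burglary = true)
P_E = 0.002                                   # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)
P_J = {True: 0.90, False: 0.05}               # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}               # P(MaryCalls = true | Alarm)

def prob(p_true, value):
    """Probability that a boolean variable takes `value`, given P(variable = true)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Joint entry = product of P(node | parents(node)) over all five nodes."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))

def query_burglary(**evidence):
    """P(Burglary = true | evidence), by summing joint entries consistent with the evidence."""
    names = ["b", "e", "a", "j", "m"]
    numerator = denominator = 0.0
    for values in product([True, False], repeat=5):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world["b"]:
            numerator += p
    return numerator / denominator

# Alarm rang, both neighbors called, but there was no burglary and no earthquake.
p_entry = joint(b=False, e=False, a=True, j=True, m=True)
print(p_entry, query_burglary(a=True), query_burglary(a=True, j=True, m=True))
```

The two queries printed at the end agree, which illustrates the conditional independence statement above: once Alarm is known, the calls do not change the belief in Burglary. Enumerating all 32 worlds is only feasible for tiny networks; it is used here to keep the sketch short.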

[ END OF 3rd UNIT ]
Anuj Khanna, Assistant Professor ( CSE Deptt)
