UNIT-3: Knowledge Representation & Reasoning
UNIT-3
Knowledge Representation & Reasoning:
Propositional logic
Theory of first order logic
Inference in First order logic
Forward & Backward chaining,
Resolution.
Probabilistic reasoning
Utility theory
Hidden Markov Models (HMM)
Bayesian Networks
Belief: This is any meaningful and coherent expression that can be manipulated.
Hypothesis: This is a justified belief that is not known to be true. Thus a hypothesis is a belief
which is backed up with some supporting evidence.
Knowledge: True justified belief is called knowledge.
Epistemology: Study of the nature of knowledge.
Ques 3. What is formal logic? Give an example.
Ans : This is a technique for interpreting some sort of reasoning process. It is a symbolic manipulation
mechanism. Given a set of sentences taken to be true, the technique determines what other sentences can be
argued to be true. The logical nature or validity of an argument depends on the form of the argument.
Example: Consider the following two sentences: 1. All men are mortal. 2. Socrates is a man. So we can infer
that Socrates is mortal.
Ans : CNF (Conjunctive Normal Form): A formula P is said to be in CNF if it is of the form
P = P1 ˄ P2 ˄ P3 ˄ …. ˄ Pn-1 ˄ Pn ; n ≥ 1, where each Pi (i = 1 to n) is a disjunction of literals (atoms or negated atoms).
Example: (Q ˅ P) ˄ (T ˅ ~Q) ˄ (P ˅ ~T).
DNF (Disjunctive Normal Form): A formula P is said to be in DNF if it has the form
P = P1 ˅ P2 ˅ P3 ˅ …. ˅ Pn-1 ˅ Pn ; n ≥ 1, where each Pi (i = 1 to n) is a conjunction of literals.
Example: (Q ˄ P) ˅ (T ˄ ~Q) ˅ (P ˄ ~T)
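To make the two normal forms concrete, here is a minimal Python sketch that stores a CNF formula as a list of clauses and a DNF formula as a list of terms, each a set of literal strings (the "~" prefix for negation is an encoding assumed only for this illustration). It evaluates the two example formulas from the text under every assignment of P, Q and T.

```python
from itertools import product

def lit_value(lit, assignment):
    """Truth value of a literal such as 'P' or '~P' under an assignment dict."""
    if lit.startswith("~"):
        return not assignment[lit[1:]]
    return assignment[lit]

def eval_cnf(clauses, assignment):
    """CNF is true when every clause (a disjunction) has at least one true literal."""
    return all(any(lit_value(lit, assignment) for lit in clause) for clause in clauses)

def eval_dnf(terms, assignment):
    """DNF is true when at least one term (a conjunction) has all of its literals true."""
    return any(all(lit_value(lit, assignment) for lit in term) for term in terms)

# The two example formulas, written as lists of literal sets
cnf_example = [{"Q", "P"}, {"T", "~Q"}, {"P", "~T"}]   # (Q ˅ P) ˄ (T ˅ ~Q) ˄ (P ˅ ~T)
dnf_example = [{"Q", "P"}, {"T", "~Q"}, {"P", "~T"}]   # (Q ˄ P) ˅ (T ˄ ~Q) ˅ (P ˄ ~T)

# Print the value of both formulas for every assignment of P, Q, T
for p, q, t in product([True, False], repeat=3):
    a = {"P": p, "Q": q, "T": t}
    print(a, "CNF:", eval_cnf(cnf_example, a), "DNF:", eval_dnf(dnf_example, a))
```

The same literal sets are reused for both examples: only the way they are combined (AND of ORs versus OR of ANDs) differs.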
Ques 5.What are Horn Clauses ? What is its usefulness in logic programming?
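Ans : A Horn clause is a clause (disjunction of literals) with at most one positive literal. A Horn clause with
exactly one positive literal is called a definite clause. A Horn clause with no positive literals is sometimes
called a goal clause. A dual Horn clause is a clause with at most one negative literal.
Horn clauses are useful in logic programming because a definite clause ~P1 ˅ ~P2 ˅ … ˅ ~Pk ˅ Q is equivalent to the rule (P1 ˄ P2 ˄ … ˄ Pk) → Q; a Prolog program is a set of such clauses, and inference with them can be done efficiently by forward or backward chaining.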
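The definitions above can be checked mechanically. The sketch below (Python; the literal encoding and the helper names `classify_clause` and `to_prolog_rule` are hypothetical, chosen only for illustration) counts positive and negative literals to label a clause, and rewrites a definite clause as a Prolog-style rule.

```python
def positives(clause):
    """Literals without a leading '~' are positive."""
    return [lit for lit in clause if not lit.startswith("~")]

def classify_clause(clause):
    """Return the set of Horn-related labels that apply to a clause (a list of literals)."""
    pos = len(positives(clause))
    neg = len(clause) - pos
    labels = set()
    if pos <= 1:
        labels.add("horn")
        labels.add("definite" if pos == 1 else "goal")
    if neg <= 1:
        labels.add("dual horn")
    return labels

def to_prolog_rule(clause):
    """Write a definite clause ~P1 ˅ ... ˅ ~Pk ˅ H as the Prolog-style rule 'h :- p1, ..., pk.'"""
    heads = positives(clause)
    if len(heads) != 1:
        raise ValueError("only a definite clause maps to a single Prolog rule")
    body = [lit[1:].lower() for lit in clause if lit.startswith("~")]
    head = heads[0].lower()
    return f"{head} :- {', '.join(body)}." if body else f"{head}."

print(classify_clause(["~P", "~Q", "R"]))   # labels for a definite Horn clause
print(to_prolog_rule(["~P", "~Q", "R"]))    # r :- p, q.
```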
Ques 6. Determine whether the following PL formula is (a) Satisfiable (b) contradictory
(c) Valid : (p q) r q
p Q p˄q r q ~q r ~q (p q) r q
T T T T T F T T
T F F T F T T T
F T F F T F F T
F F F F F T T T
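The truth-table method used in Ques 6 can be sketched in Python as follows. The formulas passed to `classify` are hypothetical examples, not the one from the question; each formula is written as a Python function of its propositional variables.

```python
from itertools import product

def implies(a, b):
    """Material implication a → b."""
    return (not a) or b

def classify(formula, variables):
    """Build the truth table of formula (a Python function of the variables) and classify it."""
    column = [formula(**dict(zip(variables, row)))
              for row in product([True, False], repeat=len(variables))]
    if all(column):
        return "valid"            # true in every row (a valid formula is also satisfiable)
    if any(column):
        return "satisfiable"      # true in some rows but not all
    return "contradictory"        # false in every row

print(classify(lambda p: p or not p, ["p"]))                    # valid
print(classify(lambda p, q: p and q, ["p", "q"]))               # satisfiable
print(classify(lambda p: p and not p, ["p"]))                   # contradictory
print(classify(lambda p, q: implies(p and q, p), ["p", "q"]))   # valid
```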
Ques 7. Convert the following sentences into wff of Predicate Logic ( First order logic).
(i) Ruma dislikes children who drink tea.
(ii) Any person who is respected by every person is a king.
Ans : Knowledge: Knowledge is another form of data. Data are raw facts. When these raw facts are
organized systematically and are ready to be processed by the human brain or some machine, they become
knowledge. From this knowledge we can draw the desired conclusions, which can be used to solve
both simple and complex real-world problems.
Example: A doctor treating a patient requires both knowledge and data. The data is the patient's
record (i.e. the patient's history, measurements of vital signs, diagnostic reports, response to medicines etc.).
Knowledge is the information that the doctor gained in medical college during his studies.
Cycle of knowledge from data is as follows :
(a) Raw data, when refined, processed or analyzed, yields information, which becomes useful in
answering users' queries.
(b) With further refinement, analysis and the addition of heuristics, information may be converted into
knowledge, which is useful in problem solving and from which additional knowledge may be
inferred.
Role of Knowledge in AI : Knowledge is central to AI. The more knowledge a person has, the better the chances
of that person being more intelligent than others. Knowledge also improves the search efficiency of the
human brain. Knowledge is needed to support intelligence because:
(a) We can understand natural language with the help of it and use it when required.
(b) We can make decisions if we possess sufficient knowledge about a certain domain.
(c) We can recognize different objects with varying features quite easily.
(d) We can interpret various changing situations easily and logically.
(e) We can plan strategies to solve difficult problems.
(f) Knowledge is dynamic and Data is static.
An AI system must be capable of doing following three things :
(a) Store the knowledge in a knowledge base (both static and dynamic KB).
(b) Apply the stored knowledge to solve problems.
(c) Acquire new knowledge through experience.
These are the three key components of an AI system.
[Figure: approaches to knowledge representation: Simple relational knowledge, Inheritable knowledge, Inferential knowledge, Procedural knowledge]
(A) Simple Relational Knowledge: This is the simplest way to represent knowledge in static form, which is stored
in a database as a set of records. Facts about a set of objects and the relationships between objects are set out
systematically in columns. This technique offers very little opportunity for inference, but it provides a
knowledge base for other, more powerful inference mechanisms. Examples: a set of records of employees in an
organization; a set of records and related information about voters for elections.
(B) Inheritable Knowledge: One of the most useful forms of inference is property inheritance. In this
method, elements of specific classes inherit attributes and values from the more general classes in which they are
included. Features of inheritable knowledge are:
Property inheritance (objects inherit values from being members of a class; data must be organized
into a hierarchy of classes).
Boxed nodes (contain objects and values of attributes of objects).
Values can be objects with attributes, and so on.
Arrows (point from an object to its value).
This structure is known as a slot-and-filler architecture, a semantic network or a collection of
frames.
In semantic networks, nodes for classes or objects with some inherent meaning are connected in a
network structure.
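Property inheritance can be illustrated with a small Python sketch. The network, its class names and its slots are hypothetical; `get_value` follows the `isa` links upward until it finds the requested attribute, so a more specific class (Penguin) can override a value inherited from a more general class (Bird).

```python
# Hypothetical semantic network: each node has its own slots plus an "isa" link to its class.
network = {
    "Animal": {"breathes": True},
    "Bird": {"isa": "Animal", "can_fly": True, "legs": 2},
    "Penguin": {"isa": "Bird", "can_fly": False},
    "Tweety": {"isa": "Bird"},
    "Pingu": {"isa": "Penguin"},
}

def get_value(node, attribute):
    """Return the attribute value for node, inheriting from more general classes if needed."""
    while node is not None:
        slots = network[node]
        if attribute in slots:
            return slots[attribute]      # the most specific value found wins
        node = slots.get("isa")          # move up one level in the class hierarchy
    return None                          # attribute not found anywhere on the chain

print(get_value("Tweety", "can_fly"))   # True, inherited from Bird
print(get_value("Pingu", "can_fly"))    # False, Penguin overrides Bird
print(get_value("Pingu", "legs"))       # 2, inherited from Bird
```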
(C) Inferential Knowledge: Knowledge is useless unless there is some inference process that can exploit
it. The required inference process implements the standard logical rules of inference. It represents knowledge
in the form of formal logic. Example: All dogs have tails: ∀x : dog(x) → hastail(x)
This knowledge supports automated reasoning. Advantages of this approach are:
It has a set of strict rules.
It can be used to derive more facts.
The truth of new statements can be verified.
Correctness is guaranteed.
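A minimal Python sketch of how a rule such as ∀x : dog(x) → hastail(x) derives new facts: each ground fact dog(c) is matched against the rule (universal instantiation followed by modus ponens) to add hastail(c). The constants and the pair-based fact encoding are hypothetical.

```python
# Ground facts as (predicate, constant) pairs; the constants are hypothetical.
facts = {("dog", "tommy"), ("dog", "bruno"), ("cat", "kitty")}

# The rule  ∀x : dog(x) → hastail(x)  encoded as (premise predicate, conclusion predicate)
rules = [("dog", "hastail")]

def apply_rules(facts, rules):
    """Repeatedly instantiate each one-premise rule on matching facts (modus ponens) until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, const in list(derived):
                if pred == premise and (conclusion, const) not in derived:
                    derived.add((conclusion, const))
                    changed = True
    return derived

derived = apply_rules(facts, rules)
print(sorted(derived))
```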
(D) Procedural Knowledge: This is knowledge encoded in the form of procedures. Example: small programs that
know how to do specific things and how to proceed, e.g. a parser in a natural language system has the
knowledge that a noun phrase may contain articles, adjectives and nouns. This is represented by calls to
routines that know how to process articles, adjectives and nouns.
Advantages:
Heuristic or domain-specific knowledge can be represented.
Extended logical inferences, like default reasoning, can be incorporated.
Side effects of actions may be modeled.
Disadvantages:
Not all cases may be represented (completeness).
Not all deductions may be correct (consistency).
Modularity is sacrificed, and control information is tedious to specify.
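The noun-phrase example can be sketched as procedural knowledge in Python: each routine knows how to consume one kind of word. The tiny lexicon and the grammar assumed here (an optional article, any number of adjectives, then one noun) are simplifications chosen for illustration, not a full parser.

```python
# Hypothetical tiny lexicon; a real natural language system would use a much larger one.
ARTICLES = {"a", "an", "the"}
ADJECTIVES = {"big", "red", "old"}
NOUNS = {"dog", "ball", "house"}

def parse_article(words, i):
    """Consume an optional article; return the next position."""
    return i + 1 if i < len(words) and words[i] in ARTICLES else i

def parse_adjectives(words, i):
    """Consume any number of adjectives; return the next position."""
    while i < len(words) and words[i] in ADJECTIVES:
        i += 1
    return i

def parse_noun(words, i):
    """Consume exactly one noun; return the next position, or None if there is no noun."""
    if i < len(words) and words[i] in NOUNS:
        return i + 1
    return None

def parse_noun_phrase(sentence):
    """True if the whole input is: optional article, adjectives, then a noun."""
    words = sentence.lower().split()
    i = parse_article(words, 0)
    i = parse_adjectives(words, i)
    end = parse_noun(words, i)
    return end == len(words)

print(parse_noun_phrase("the big red ball"))   # True
print(parse_noun_phrase("the big"))            # False, no noun
```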
Ques 9 : Define the term logic. What is the role of logic in Artificial Intelligence? Compare
Propositional logic with First order logic (Predicate Calculus).
Ans : Logic is defined as the scientific study of the process of reasoning and of the system of rules and
procedures that help in the reasoning process. In logic-based reasoning, the required knowledge is represented
as expressions in formal logic, and inference rules and proof procedures
apply this knowledge to solve specific problems.
We can derive a new piece of knowledge by proving that it is a consequence of knowledge that is already
known. We generate logical statements to prove certain assertions.
Role of Logic in AI
Computer scientists are familiar with the idea that logic provides techniques for analyzing the
inferential properties of languages. Logic can provide a specification for a programming language by
characterizing a mapping from programs to the computations that they implement.
A compiler that implements the language can be incomplete as long as it approximates the logical
requirements of the given problem. This makes it possible for the use of logic in AI applications to vary
from relatively weak uses, in which logic only informs the implementation process, to in-depth analysis.
Logical theories in AI are independent of implementations. They provide insights into the reasoning
problem without directly informing the implementation.
Ideas from logic, such as theorem proving and model construction techniques, are used in AI.
Logic works as an analysis tool and as a knowledge representation technique for automated reasoning and
for developing Expert Systems. It also provides the basis for programming languages like Prolog used to develop
AI software.
Ques 10 (A) Convert the following sentences to wff in first order predicate logic.
(i) No coat is water proof unless it has been specially treated.
(ii) A drunkard is an enemy of himself.
(iii) Any teacher is better than a lawyer.
(iv) If x and y are both greater than zero, so is the product of x and y.
(v) Everyone in the purchasing department over 30 years old is married.
(B) Determine whether each of the following sentences is satisfiable, contradictory or valid
S1 : (p q) (p q) p S2 : p q p
Ans : (A) (i) No coat is water proof unless it has been specially treated.
(iv) If x and y are both greater than zero, so is the product of x and y.
∀x ∀y [ GT(x, 0) ˄ GT(y, 0) → GT(times(x, y), 0) ].
Where: GT(a, b) is the predicate "a is greater than b", and times(x, y) is a function giving the product of x and y
(it could equally be named product_of(x, y)).
T T T F T T
T F T T T T
F T T F F F
F F F T T T
Hence by last column of truth table, the above statement is satisfiable.
Ques 11: Using the inference rules of Propositional logic , Prove the validity of following
axioms:
(i) If either algebra is required or geometry is required then all students will study
mathematics.
(ii) Algebra is required and trigonometry is required; therefore all students will study
mathematics.
Ans : Converting above sentences to propositional logic and applying inference rules :
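(i) (A ˅ G) → S
(ii) (A ˄ T) To prove that: S is true
where A: algebra is required, G: geometry is required, T: trigonometry is required, S: all students will study mathematics.
(iii) A (By applying Simplification to (ii)).
(iv) A ˅ G (By applying Addition to (iii)).
(v) S (By applying Modus Ponens between (i) and (iv)).
Hence S follows from the given axioms, so the argument is valid.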
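As a cross-check, the same conclusion can be confirmed by model checking (enumerating every truth assignment), which is a different technique from the inference-rule proof. This Python sketch uses the symbols A, G, T and S of Ques 11; the helper names are hypothetical.

```python
from itertools import product

def implies(a, b):
    """Material implication a → b."""
    return (not a) or b

def entails(premises, conclusion, variables):
    """Model checking: the conclusion must hold in every model in which all premises hold."""
    for values in product([True, False], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False    # a counter-model was found
    return True

premises = [
    lambda m: implies(m["A"] or m["G"], m["S"]),   # (A ˅ G) → S
    lambda m: m["A"] and m["T"],                   # A ˄ T
]
conclusion = lambda m: m["S"]                      # S

print(entails(premises, conclusion, ["A", "G", "T", "S"]))   # True
```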
Ques 12 : Determine whether the following argument is valid or not. “If I work the whole night on this
problem, then I can solve it. If I solve the problem, then I will understand the topic.
Therefore, if I work the whole night on this problem, then I will understand the topic.”
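Ans : Converting above sentences to propositional logic and applying inference rules: (i) W → S, where W: I work the whole night on this problem, S: I solve the problem; (ii) S → U, where U: I understand the topic. By applying Hypothetical Syllogism between (i) and (ii) we get W → U, which is the conclusion. Hence the argument is valid.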
Ans : Converting above sentences to propositional logic and applying inference rules :
(i) (A ˅ ~I), where A: Smith attended the meeting, ~I: Smith was not invited to the meeting.
(ii) (D → I), where D: the Directors wanted Smith in the meeting, I: Smith was invited to the meeting.
(iii) ~A, Smith did not attend the meeting.
(iv) (~D ˄ ~I) → W, where W: Smith is on his way out of the company. To prove that W is true.
(v) ~I (By applying Disjunctive Syllogism between axioms (i) & (iii)).
(vi) ~D (By applying Modus Tollens between axioms (ii) & (v)).
(vii) (~D ˄ ~I) (By applying Conjunction between (v) & (vi)).
(viii) W (By applying Modus Ponens between axiom (iv) & (vii)). (Hence Proved.)
Ques 14 : What is the clause form of a wff (well-formed formula)? Convert the following formula into
clause form: ∀x ∀y [ ∃z P(f(x), y, z) → { ∃u Q(x, u) ˄ ∃v R(y, v) } ].
Ans : Clause Form: In the theory of logic, whether propositional or predicate logic, when proving the
validity of statements using the resolution principle it is required to convert well-formed formulas into
clause form. Clause form is a set of axioms in which propositions or formulas are connected only through the
OR (˅) connector.
Step 1: Elimination of Implication: Applying P → Q ≡ ~P ˅ Q
∀x ∀y ( ~∃z P(f(x), y, z) ˅ ( ∃u Q(x, u) ˄ ∃v R(y, v) ) )
Ans : Resolution Principle: This is also called proof by refutation. To prove that a statement is valid,
resolution attempts to show that the negation of the statement produces a contradiction with the known
statements. At each step two clauses, called PARENT CLAUSES, are compared / resolved,
yielding a new clause that has been inferred from them.
Example: Let two clauses C1 and C2 in PL be given as:
C1 : winter ˅ summer, C2 : ~winter ˅ cold. The assumption is that both C1 and C2
are true. From C1 and C2 we can infer/deduce summer ˅ cold. This is the RESOLVENT CLAUSE.
The resolvent clause is obtained by combining all of the literals of the two parent clauses except the
ones that cancel. If the clause that is produced is the empty clause, then a contradiction has been found.
E.g.: winter and ~winter will produce an empty clause.
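A single resolution step can be written directly from the description above. In this Python sketch (clauses as frozensets of literal strings with a "~" prefix for negation, an encoding assumed for illustration), `resolve` returns every resolvent obtainable by cancelling one complementary pair.

```python
def negate(lit):
    """'winter' <-> '~winter'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents of two clauses (frozensets of literals)."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            # keep every literal of both parents except the complementary pair
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

c1 = frozenset({"winter", "summer"})
c2 = frozenset({"~winter", "cold"})
print(resolve(c1, c2))                                        # one resolvent: summer ˅ cold
print(resolve(frozenset({"winter"}), frozenset({"~winter"})))  # the empty clause
```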
Algorithm of resolution in propositional logic:
Step 1: Convert all the propositions of F to clause form, where F is the set of axioms.
Step 2: Negate the proposition P to be proved and convert the result to clause form. Add it to the set of clauses
obtained in step 1.
Step 3: Repeat until either a contradiction is found or no progress can be made:
(a) Select two clauses. Call these the parent clauses.
(b) Resolve them together. The resolvent is the disjunction of all the literals of both parent clauses, except that
if there are any pairs of literals L and ~L such that one of the parent clauses
contains L and the other contains ~L, then select one such pair and eliminate both L
and ~L from the resolvent clause.
(c) If the resolvent is the empty clause, then a contradiction has been found. If it is not, then add it
to the set of clauses available to the procedure.
Ans : (B) Assume ~R is true and add it to the set of clauses formed from the given axioms (as a set of support).
C1 : P, C2 : ~P ˅ ~Q ˅ R (by eliminating the implication in
(P ˄ Q) → R),
C3 : ~S ˅ Q, C4 : ~T ˅ Q, C5 : T, C6 : ~R.
(C3 and C4 come from eliminating the implication in (S ˅ T) → Q:
~(S ˅ T) ˅ Q ≡ (~S ˄ ~T) ˅ Q (by De Morgan's law). Now applying the distributive law
we obtain (~S ˅ Q) ˄ (~T ˅ Q), which is converted into the two clauses C3 and C4 after
removing the AND connector.)
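Resolution steps:
(1) ~P ˅ ~Q ˅ R (C2) and ~R (C6) give the resolvent clause ~P ˅ ~Q.
(2) ~P ˅ ~Q and P (C1) give ~Q.
(3) ~T ˅ Q (C4) and ~Q give ~T.
(4) ~T and T (C5) give the Empty Clause (Contradiction Found).
So the assumption that ~R is true is false.
Hence R is true.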
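The whole refutation can be automated. The Python sketch below is a simple saturation variant of the algorithm: it resolves every pair of clauses, succeeds when the empty clause appears, and fails when no new clauses can be produced. Clauses C1 to C6 are those of the worked example; the string encoding of literals is an assumption.

```python
from itertools import combinations

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of two clauses, cancelling one complementary pair at a time."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]

def refutes(clauses):
    """Saturating propositional resolution: True if the empty clause is derivable."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True        # empty clause: contradiction found
                new.add(r)
        if new <= clauses:
            return False               # no progress can be made
        clauses |= new

kb = [frozenset(c) for c in (
    {"P"},                 # C1
    {"~P", "~Q", "R"},     # C2
    {"~S", "Q"},           # C3
    {"~T", "Q"},           # C4
    {"T"},                 # C5
    {"~R"},                # C6: the negated goal
)]
print(refutes(kb))       # True, so R follows from C1 to C5
```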
Ques 16: How is resolution in first order predicate logic different from that performed in propositional
logic? What is the Unification Algorithm and why is it required?
Ans : In FOPL, resolution is more complicated, since we must consider all
the possible ways of substituting values for variables. The presence of existential and universal
quantifiers in wffs and of arguments in predicates makes things more complicated.
One way to find a contradiction is to try the possible substitutions systematically and see if each produces a
contradiction. To apply resolution in predicate logic, we first need to apply the unification technique,
because in FOPL literals with arguments are to be resolved, so matching of the arguments is also
required.
Unification Algorithm: The unification algorithm is a recursive procedure. Let two literals in
FOPL be P(x, x) and P(y, z). Here the predicate name P matches in both literals, but the arguments do
not match, so a substitution is required. The 1st arguments x and y do not match, so we
substitute y for x (written y/x), giving P(y, y) and P(y, z). Now the 2nd arguments y and z do not match, so we
substitute z for y (z/y). The composition (z/y)(y/x) makes both literals identical: P(z, z).
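A compact recursive unification sketch in Python, assuming a hypothetical term encoding: variables are strings starting with a lowercase letter, constants are capitalized strings, and a predicate or function application is a tuple whose first element is its name. It includes the standard occurs check, which the text does not discuss.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def substitute(t, theta):
    """Apply substitution theta (a dict var -> term) to term t, following chains of bindings."""
    if is_var(t):
        return substitute(theta[t], theta) if t in theta else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, theta) for a in t[1:])
    return t

def occurs(v, t, theta):
    """Occurs check: does variable v appear inside term t?"""
    t = substitute(t, theta)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, theta) for a in t[1:])

def unify(a, b, theta=None):
    """Return a substitution making a and b identical, or None if they do not unify."""
    if theta is None:
        theta = {}
    a, b = substitute(a, theta), substitute(b, theta)
    if a == b:
        return theta
    if is_var(a):
        return None if occurs(a, b, theta) else {**theta, a: b}
    if is_var(b):
        return unify(b, a, theta)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):     # unify the arguments left to right
            theta = unify(x, y, theta)
            if theta is None:
                return None
        return theta
    return None                            # different predicates, functions or constants

theta = unify(("P", "x", "x"), ("P", "y", "z"))
print(theta)                                # {'x': 'y', 'y': 'z'}
print(substitute(("P", "x", "x"), theta))   # ('P', 'z', 'z')
```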
Ques 17: Given the following set of facts, Prove that “ Some who are intelligent can’t read ”.
Ques 19 : Explain Backward and Forward Chaining, with examples in logic representation. Also mention the
advantages and disadvantages of both algorithms.
Ans : The process in which the output of one rule activates another rule is called chaining. The chaining technique
breaks the task into small procedures and then informs each procedure in the sequence by itself. Two
types of chaining techniques are known: forward chaining and backward chaining.
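Forward chaining is data-driven: start from the known facts and keep firing rules whose premises are all satisfied, so the output of one rule can activate another. The Python sketch below works on propositional definite clauses; the rule base is hypothetical.

```python
def forward_chain(rules, facts, goal):
    """Fire every rule whose premises are all known until the goal is known or nothing changes."""
    known = set(facts)
    changed = True
    while changed and goal not in known:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)   # this new fact may activate further rules
                changed = True
    return goal in known

# Hypothetical rule base: each rule is (list of premises, conclusion)
rules = [
    (["A", "B"], "C"),
    (["C"], "D"),
    (["D", "E"], "F"),
]
print(forward_chain(rules, {"A", "B", "E"}, "F"))   # True
print(forward_chain(rules, {"A", "B"}, "F"))        # False
```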
Backward Chaining:
The opposite of forward chaining is backward chaining.
In contrast to forward chaining, backward chaining is a goal-driven reasoning method.
Backward chaining starts from the goal (from the end), which is a hypothetical solution, and the
inference engine tries to find the matching evidence.
When a rule whose conclusion matches the goal is found, its conditions become sub-goals, and then rules are searched to prove
these sub-goals. That is, the goal is matched against the RHS (conclusion) of the rules. This process continues until all the
sub-goals are proved; if a sub-goal cannot be proved, the search backtracks to the previous step where a rule was chosen.
If there is no rule that can establish an individual sub-goal, another rule is chosen.
Backward chaining is good for cases where there are not many facts and
the information (facts) should be generated by the user. Backward chaining is
also effective in diagnostic tasks.
In many cases, logic programming languages such as Prolog are implemented using the
backward chaining technique. The combination of backward chaining with forward
chaining provides better results in many applications.
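Backward chaining can be sketched recursively in Python over propositional definite clauses. The rule base is hypothetical; the `visited` set is an addition (not described in the text) that stops the search from looping on cyclic rules.

```python
def backward_chain(rules, facts, goal, visited=frozenset()):
    """Prove goal from facts, or find a rule concluding goal and prove its premises as sub-goals."""
    if goal in facts:
        return True
    if goal in visited:              # already trying to prove this goal: avoid infinite loops
        return False
    for premises, conclusion in rules:
        if conclusion == goal:       # match the goal against the rule's conclusion (RHS)
            if all(backward_chain(rules, facts, p, visited | {goal}) for p in premises):
                return True
            # otherwise backtrack and try the next rule for this goal
    return False

# Hypothetical rule base: each rule is (list of premises, conclusion)
rules = [
    (["A", "B"], "C"),
    (["C"], "D"),
    (["D", "E"], "F"),
]
print(backward_chain(rules, {"A", "B", "E"}, "F"))   # True
print(backward_chain(rules, {"A", "B"}, "F"))        # False, sub-goal E cannot be proved
```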
Ques 20: What is Utility theory and its importance in AI ? Explain with the help of suitable examples.
Ans : Utility theory is concerned with people's choices and decisions. It is also concerned with people's
preferences and with judgments of preferability, worth, value, goodness or any of a number of similar
concepts. Utility means the quality of being useful. According to this theory, each state in the environment has a degree of
usefulness to an agent, and the agent will prefer states with higher utility.
Interpretations of utility theory are often classified under two headings, prediction and prescription:
(i) The predictive approach is interested in the ability of a theory to predict actual choice behavior.
(ii) The prescriptive approach is interested in saying how a person ought to make a decision.
E.g : Psychologists are primarily interested in prediction.
Economists in both prediction and prescription. In statistics the emphasis is on prescription
in decision making under uncertainty. The emphasis in management science is prescriptive
also.
Sometimes it is useful to ignore uncertainty and focus on ultimate choices. At other times, we must model
uncertainty explicitly. Examples: insurance markets, financial markets, game theory. Rather than
choosing an outcome directly, the decision-maker chooses an uncertain prospect (or lottery). A lottery is a probability
distribution over outcomes.
This has two basic components: consequences (or outcomes) and lotteries.
(a) Consequences: These are what the decision-maker ultimately cares about.
Example: “I get pneumonia, my health insurance company covers most of the costs, but I have to pay
a $500 deductible.” The consumer does not choose consequences directly; instead, the consumer chooses a
lottery p.
(b) Lotteries are probability distributions over consequences: p : C → [0, 1],
with ∑_{c ∈ C} p(c) = 1. The set of all lotteries is denoted by P. Example: “A gold-level health insurance
plan, which covers all kinds of diseases, but has a $500 deductible.” This makes sense because the consumer is
assumed to rank health insurance plans only insofar as they lead to different probability distributions over
consequences.
Mr. Anuj Khanna
Assistant Professor (KIOT, Kanpur)
www.uptunotes.com (Artificial Intelligence)UNIT-3
u : C → ℝ such that U(p) = ∑_{c ∈ C} p(c) u(c) for all p ∈ P. In this case, the function U is called an
expected utility function, and the function u is called a von Neumann-Morgenstern utility function. These
functions are used to capture an agent's preferences between various world states. The utility function assigns a
single number to express the desirability of a state. Utilities are combined with the outcome probabilities of
actions to give an expected utility for each action. U(S) means the utility of state S for the agent's decision.
Maximum Expected Utility (MEU): This principle states that a rational agent should select an action that
maximizes the agent's expected utility. The MEU principle says: “If an agent maximizes a utility function that
correctly reflects the performance measure by which its behavior is being judged, then it will achieve the
highest possible performance score if we average over the environments of the agent.”
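The expected utility U(p) = ∑ p(c) u(c) and the MEU principle can be illustrated in a few lines of Python. All utilities and probabilities below are hypothetical numbers for an insurance decision like the one in the text.

```python
# Hypothetical utilities u(c) for each consequence c
u = {
    "healthy_insured": 95,     # healthy, but paid the premium
    "sick_insured": 70,        # sick, insurer covers most costs
    "healthy_uninsured": 100,
    "sick_uninsured": 10,      # sick and paying all costs
}

# Each action leads to a lottery: a probability distribution over consequences (hypothetical numbers)
actions = {
    "buy_insurance": {"healthy_insured": 0.9, "sick_insured": 0.1},
    "no_insurance": {"healthy_uninsured": 0.9, "sick_uninsured": 0.1},
}

def expected_utility(lottery, u):
    """U(p) = sum over consequences c of p(c) * u(c)."""
    if abs(sum(lottery.values()) - 1.0) > 1e-9:
        raise ValueError("lottery probabilities must sum to 1")
    return sum(p * u[c] for c, p in lottery.items())

def meu_action(actions, u):
    """MEU principle: choose the action whose lottery has the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], u))

for a in actions:
    print(a, expected_utility(actions[a], u))
print("MEU choice:", meu_action(actions, u))
```

With these numbers the expected utilities are about 92.5 and 91, so buying insurance is the MEU choice; different utilities could reverse the decision.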
Ques 21: What are constraint notations in utility theory ? Define the term Lottery. Also mention the
following axioms of Utility Theory :
(i) Orderability (ii) Substitutability (iii) Monotonicity (iv)Decomposability.
Ans : Constraint notations in utility theory for two outcomes / consequences A and B are as follows:
A ≻ B : A is preferred over B.
A ~ B : The agent is indifferent between A and B.
A ≿ B : The agent prefers A to B or is indifferent between them.
A lottery L with possible outcomes C1, C2, C3, ….., Cn that occur with probabilities p1, p2, ….., pn is written as
L = [p1, C1; p2, C2; …..; pn, Cn]. Each outcome of a lottery can be an atomic state or another lottery.
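(i) Orderability: Given any two states, a rational agent must prefer one to the other or else rate the
two as equally preferable. So the agent can't avoid the decision.
(A ≻ B) ˅ (B ≻ A) ˅ (A ~ B)
(ii) Substitutability: If an agent is indifferent between two lotteries A and B, then the agent is
indifferent between two more complex lotteries that are the same except that B is substituted for A in
one of them.
(A ~ B) ⇒ [p, A; 1 – p, C] ~ [p, B; 1 – p, C]
(iii) Monotonicity: Suppose two lotteries have the same two possible outcomes, A and B. If an agent prefers A to B,
then the agent must prefer the lottery that has a higher probability for A.
(A ≻ B) ⇒ (p > q ⇔ [p, A; 1 – p, B] ≻ [q, A; 1 – q, B])
(iv) Decomposability: Compound lotteries can be reduced to simpler ones using the laws of probability.
[p, A; 1 – p, [q, B; 1 – q, C]] ~ [p, A; (1 – p) q, B; (1 – p)(1 – q), C]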
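Decomposability can be checked numerically: the Python sketch below flattens a compound lottery into a simple one by multiplying probabilities along each branch. Lotteries are encoded, as an assumption, as lists of (probability, outcome) pairs, where an outcome may itself be a lottery; p and q are hypothetical values.

```python
def flatten(lottery, weight=1.0, out=None):
    """Reduce a compound lottery [(p, outcome_or_lottery), ...] to a simple one {outcome: probability}."""
    if out is None:
        out = {}
    for p, outcome in lottery:
        if isinstance(outcome, list):          # a nested lottery: scale its probabilities by p
            flatten(outcome, weight * p, out)
        else:
            out[outcome] = out.get(outcome, 0.0) + weight * p
    return out

p, q = 0.5, 0.4   # hypothetical probabilities
compound = [(p, "A"), (1 - p, [(q, "B"), (1 - q, "C")])]
simple = flatten(compound)
print(simple)     # A: p, B: (1 - p) q, C: (1 - p)(1 - q), up to floating-point rounding
```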
Ans : Probabilistic reasoning in intelligent systems deals with the theoretical foundations and
computational methods that underlie plausible reasoning under uncertainty.
Intelligent agents almost never have access to the whole truth about their environment, so agents act under
uncertainty. The agent's knowledge can only provide a degree of belief. The main tool for dealing with degrees
of belief is PROBABILITY THEORY.
If the probability is 0, the belief is that the statement is false.
If the probability is 1, the belief is that the statement is true.
Percepts received from the environment form the evidence on which probability assertions are based.
As the agent receives new percepts, its probability assessments are updated to reflect the new evidence.
Before the evidence is obtained, we talk about prior (unconditional) probability.
After the evidence is given, we deal with posterior (conditional) probability.
The prior probability associated with a proposition (sentence) P is the degree of belief associated with it in the
absence of any other information.
• In AI applications, sample points are defined by a set of random variables
– Random variables: Boolean, discrete, continuous
Probability Distribution: For a random variable, we talk about the probabilities of all
possible outcomes of that variable. E.g.: Let Weather be a random variable. Given that:
P(Weather = sunny) = 0.7, P(Weather = rainy) = 0.2, P(Weather = cloudy) = 0.08,
P(Weather = snowy) = 0.02
Joint Probability Distribution: The joint probability distribution for a set of random variables gives the
probability of every atomic event on those random variables (i.e., every sample point). For example, with
Cavity a Boolean variable, P(Weather, Cavity) can be given by a 4 × 2 matrix of values.
If a complete set of random variables is covered, it is called the “Full Joint Probability Distribution”.
Conditional Probability:
Definition of conditional probability: P(a ∣ b) = P(a ∧ b) / P(b), if P(b) ≠ 0.
The product rule gives an alternative formulation: P(a ∧ b) = P(a ∣ b) P(b) = P(b ∣ a) P(a).
A general version holds for whole distributions, e.g., P(Weather, Cavity) = P(Weather ∣ Cavity) P(Cavity).
The chain rule is derived by successive application of the product rule:
$P(X_1, \dots, X_n) = P(X_1, \dots, X_{n-1})\, P(X_n \mid X_1, \dots, X_{n-1})$
$= P(X_1, \dots, X_{n-2})\, P(X_{n-1} \mid X_1, \dots, X_{n-2})\, P(X_n \mid X_1, \dots, X_{n-1}) = \dots = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})$
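The definitions of conditional probability and the product rule can be checked on a small joint distribution. In this Python sketch, P(Weather) uses the numbers from the text, while P(Cavity) and the independence of the two variables are assumptions made only to build a 4 × 2 joint table.

```python
from itertools import product

# P(Weather) is taken from the text; P(Cavity) is a hypothetical value.
weather = {"sunny": 0.7, "rainy": 0.2, "cloudy": 0.08, "snowy": 0.02}
p_cavity = {True: 0.2, False: 0.8}

# Hypothetical full joint P(Weather, Cavity): a 4 x 2 table, built assuming the two are independent
joint = {(w, c): weather[w] * p_cavity[c] for w, c in product(weather, p_cavity)}

def prob(event):
    """P(event): sum of the joint entries (atomic events) in which the event holds."""
    return sum(p for (w, c), p in joint.items() if event(w, c))

def cond(a, b):
    """P(a | b) = P(a ∧ b) / P(b), defined only when P(b) ≠ 0."""
    pb = prob(b)
    if pb == 0:
        raise ValueError("P(b) = 0, so P(a | b) is undefined")
    return prob(lambda w, c: a(w, c) and b(w, c)) / pb

def sunny(w, c):
    return w == "sunny"

def cavity(w, c):
    return c

p_sunny_given_cavity = cond(sunny, cavity)
print(p_sunny_given_cavity)   # about 0.7, because of the independence assumption
# Product rule check: P(sunny ∧ cavity) = P(sunny | cavity) P(cavity)
print(prob(lambda w, c: sunny(w, c) and cavity(w, c)), p_sunny_given_cavity * prob(cavity))
```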
Ques 23: Explain in detail the Markov Model and its applications in Artificial Intelligence.
A Markov model is a stochastic model that is used for systems that do not have any fixed
pattern of occurrence, i.e. randomly changing systems.
A Markov model is based on a random probability distribution or pattern that can
be analysed statistically but cannot be predicted precisely.
In a Markov model, it is assumed that future states depend only on the current state and not on the
previously occurred states. In a first-order Markov model, the current state depends only on the immediately previous state, i.e.
the conditional probability is: $P(X_t \mid X_{0:t-1}) = P(X_t \mid X_{t-1})$
Set of states: { S1, S2, S3, …., Sn }. The process moves from one state to another, generating a sequence of
states.
An observable state sequence leads to a Markov Chain Model. Non-observable states lead to Hidden
Markov Models.
Transition Probability Matrix: Each time a new state is reached, the system is said to have
moved one step ahead. Each step represents a time period, which would result in another possible
state. Let Si be state i of the environment, for i = 1, 2, …, n.
Markov chain property: the probability of each subsequent state depends only on the
previous state: $P(S_{i_k} \mid S_{i_1}, S_{i_2}, \dots, S_{i_{k-1}}) = P(S_{i_k} \mid S_{i_{k-1}})$.
To define a Markov model, the following probabilities have to be specified:
Transition probabilities: $a_{ij} = P(S_j \mid S_i)$, i.e. the probability of a transition from state i to state j.
Initial probabilities: $\pi_i = P(S_i)$. The probability of a state sequence is then calculated as below:
$P(S_{i_1}, \dots, S_{i_{k-1}}, S_{i_k}) = P(S_{i_k} \mid S_{i_1}, \dots, S_{i_{k-1}})\, P(S_{i_1}, \dots, S_{i_{k-1}})$
$= P(S_{i_k} \mid S_{i_{k-1}})\, P(S_{i_1}, \dots, S_{i_{k-1}})$
$= P(S_{i_k} \mid S_{i_{k-1}})\, P(S_{i_{k-1}} \mid S_{i_{k-2}}) \cdots P(S_{i_2} \mid S_{i_1})\, P(S_{i_1})$
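The sequence-probability formula above reduces to the initial probability of the first state times a product of transition probabilities. A minimal Python sketch with a hypothetical two-state weather chain (the numbers are illustrative only):

```python
# Hypothetical two-state weather chain
initial = {"Sunny": 0.6, "Rainy": 0.4}                 # π_i = P(S_i)
transition = {                                         # a_ij = P(S_j | S_i)
    "Sunny": {"Sunny": 0.8, "Rainy": 0.2},
    "Rainy": {"Sunny": 0.4, "Rainy": 0.6},
}

def sequence_probability(states):
    """P(S_i1, ..., S_ik) = P(S_i1) * product over t of P(S_it | S_i(t-1))."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[prev][cur]
    return p

print(sequence_probability(["Sunny", "Sunny", "Rainy"]))   # 0.6 * 0.8 * 0.2, about 0.096
```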
There are four common Markov-Models:
(i)Markov Decision Models (ii) Markov Chains (iii) Hidden Markov Model (iv)Partially
observable Markov Decision Process
Example : Consider a Problem of weather conditions, Transition diagram is as given below :
states of the model are hidden. Each state can emit an output, which is observed. This model is used
because a simple Markov chain is too restrictive for complex applications.
In a Hidden Markov Model, every individual state has a limited number of transitions and emissions.
State sequences are not directly observable; rather, they can be recognized from the sequence of
observations produced by the system.
A probability is assigned to each transition between states.
Given the current state, past states are independent of future states.
An HMM is called hidden because its state sequence cannot be observed directly; only the outputs emitted by the
states are visible. The underlying state process is memoryless (Markovian): given the present state, its
future and past states are independent of each other.
The probability of an observation sequence can be computed with two algorithms, called:
(i) Forward Algorithm. (ii) Backward Algorithm.
Components of HMM:
Set of states: { S1, S2, S3, …., Sn }.
Sequence of states generated by the system: { Si1, Si2, ……., Sik-1, Sik }
Markov property of the state chain:
$P(S_{i_k} \mid S_{i_1}, S_{i_2}, \dots, S_{i_{k-1}}) = P(S_{i_k} \mid S_{i_{k-1}})$
Observations / Visible states: { V1, V2, …, Vm-1, Vm }
Ques 25 : Consider the following data provided for a Weather Forecasting Scenario.
Two states (hidden): ‘Low’ and ‘High’ atmospheric pressure.
Two observations (visible states): ‘Rain’ and ‘Dry’.
Suppose we want to calculate the probability of a sequence of observations in our
example, {‘Dry’, ‘Rain’}.
Ans : Solution :
Transition probabilities:
P(‘Low’|‘Low’) = 0.3
P(‘High’|‘Low’) = 0.7,
P(‘Low ’|‘High’) = 0.2 ,
P(‘High ’|‘High’) = 0.8
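Observation probabilities:
P(‘Rain’|‘Low’) = 0.6
P(‘Dry’|‘Low’) = 0.4
P(‘Rain’|‘High’) = 0.4
P(‘Dry’|‘High’) = 0.6 (since P(‘Rain’|‘High’) + P(‘Dry’|‘High’) must equal 1).
The probability of {‘Dry’, ‘Rain’} is the sum over all four hidden state sequences {Low, Low}, {Low, High},
{High, Low} and {High, High}. For example, the first term is
P({‘Dry’, ‘Rain’}, {‘Low’, ‘Low’}) = P(‘Dry’|‘Low’) . P(‘Rain’|‘Low’) . P(‘Low’) . P(‘Low’|‘Low’)
= 0.4 * 0.6 * 0.4 * 0.3 = 0.0288, where P(‘Low’) = 0.4 is the initial probability of ‘Low’.
With P(‘High’) = 1 – 0.4 = 0.6, the four terms give 0.0288 + 0.0448 + 0.0432 + 0.1152 = 0.232.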
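The sum over hidden state sequences can be computed efficiently with the forward algorithm. This Python sketch uses the transition and observation probabilities of Ques 25 with P(‘Dry’|‘High’) = 0.6 (so that the observation probabilities for ‘High’ sum to 1), and assumes initial probabilities P(‘Low’) = 0.4 and P(‘High’) = 0.6. A brute-force sum over all state sequences is included as a check.

```python
from itertools import product

states = ["Low", "High"]
initial = {"Low": 0.4, "High": 0.6}                    # assumed initial probabilities
transition = {"Low": {"Low": 0.3, "High": 0.7},        # P(next state | current state)
              "High": {"Low": 0.2, "High": 0.8}}
emission = {"Low": {"Rain": 0.6, "Dry": 0.4},          # P(observation | state)
            "High": {"Rain": 0.4, "Dry": 0.6}}

def forward(observations):
    """Forward algorithm: alpha[s] = P(observations so far, current state = s)."""
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[r] * transition[r][s] for r in states) * emission[s][obs]
                 for s in states}
    return sum(alpha.values())

def brute_force(observations):
    """Sum P(observations, state sequence) over every possible hidden state sequence."""
    total = 0.0
    for seq in product(states, repeat=len(observations)):
        p = initial[seq[0]] * emission[seq[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= transition[seq[t - 1]][seq[t]] * emission[seq[t]][observations[t]]
        total += p
    return total

print(forward(["Dry", "Rain"]), brute_force(["Dry", "Rain"]))   # both about 0.232
```

The forward algorithm needs work proportional to (number of states)² × (sequence length), whereas the brute-force sum grows exponentially with the sequence length.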
Ques 26 : Explain in detail Bayesian Theory and its use in AI. Define Likelihood ratio.
Ans : In probabilistic reasoning, our conclusions are generally based on available evidence and past
experience. This information is mostly incomplete. When outcomes are unpredictable, we use probabilistic
reasoning, e.g. weather forecasting systems, disease diagnosis, traffic congestion control systems.
A doctor examines a patient's history, symptoms and test results as evidence of a possible disease.
In weather forecasting, tomorrow's cloud coverage, wind speed and direction, and sun
heat intensity are predicted.
A business manager must take decisions, such as when to launch a new product, based on uncertain predictions.
Factors can be: the target consumers' lifestyle, population growth in a specific city / state,
the average income of consumers and the economic scenario of the country. All of these depend on past
experience of the market.
From the product rule of probability theory we express the following equations:
P ( a ∧ b ) = P(a ∣ b) . P( b ) ……….Eq 1.
P( a ∧ b ) = P( b ∣ a ) P( a ) …………Eq 2.
On equating both the equations: $P(b \mid a) = \dfrac{P(a \mid b)\, P(b)}{P(a)}$
Bayes' rule is used in modern AI systems for probabilistic inference. It uses the notion of conditional
probability: P(H | E). This expression is read as “the probability of hypothesis H given that we have
observed evidence E”. For this we require the prior probability of H, P(H) (if we have no evidence), and the extent to which
E provides evidence of H.
Bayes' theorem states (for hypotheses H1, …, Hk that are mutually exclusive and exhaustive):
$P(H_i \mid E) = \dfrac{P(E \mid H_i)\, P(H_i)}{\sum_{n=1}^{k} P(E \mid H_n)\, P(H_n)}$
Writing Bayes' rule for H and for ~H and dividing the first by the second (the common denominator cancels), we get:
$\dfrac{P(H \mid E)}{P(\neg H \mid E)} = \dfrac{P(E \mid H)\, P(H)}{P(E \mid \neg H)\, P(\neg H)}$ ………….Eq (iii)
The ratio of the probability of an event to the probability of its negation is known as the “ODDS” of the event, e.g. O(H) = P(H) / P(~H).
The ratio $L(E \mid H) = \dfrac{P(E \mid H)}{P(E \mid \neg H)}$ is known as the Likelihood ratio with respect to H.
The odds-likelihood form of Bayes' rule, from Eq (iii), is: O(H | E) = L(E | H) . O(H)
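A short Python sketch of Bayes' theorem and its odds-likelihood form, using hypothetical diagnosis numbers (P(H) = 0.01, P(E|H) = 0.9, P(E|~H) = 0.05). Both routes should give the same posterior probability.

```python
def posterior(prior, likelihood):
    """Bayes' theorem over mutually exclusive, exhaustive hypotheses.
    prior[h] = P(h), likelihood[h] = P(E | h); returns {h: P(h | E)}."""
    evidence = sum(likelihood[h] * prior[h] for h in prior)   # denominator: sum of P(E | Hn) P(Hn)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}

def odds(p):
    """O = P / (1 - P): probability of an event over probability of its negation."""
    return p / (1 - p)

# Hypothetical numbers: H = "patient has the disease", E = "test is positive"
p_h = 0.01
p_e_given_h = 0.9
p_e_given_not_h = 0.05

post = posterior({"H": p_h, "~H": 1 - p_h}, {"H": p_e_given_h, "~H": p_e_given_not_h})

likelihood_ratio = p_e_given_h / p_e_given_not_h      # L(E | H)
posterior_odds = likelihood_ratio * odds(p_h)         # O(H | E) = L(E | H) . O(H)
p_from_odds = posterior_odds / (1 + posterior_odds)   # convert odds back to a probability

print(post["H"], p_from_odds)   # both about 0.154
```

Even with a fairly accurate test, the low prior keeps the posterior small, which is exactly what the odds form O(H | E) = L(E | H) . O(H) makes visible.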
Disadvantages of Bayes' Theorem: For a complex problem, the number of joint probabilities that we
require to compute this function grows as 2^n if there are n different propositions.
Knowledge acquisition is difficult: too many probabilities are needed.
The space needed to store all the probabilities is too large.
The time needed to compute all the probabilities is too large.
Ques 27 : What is Bayesian Network or Belief Network ? Explain its importance with the help of
an example.
Ans : To describe the real world, it is not necessary to use a huge joint probability table in which the
probabilities of all possible outcomes are stored. To represent the relationships between independent and
conditionally independent variables, a systematic approach in the form of a data structure called a Bayesian
Network is used. It is also known as a Causal network, Belief network, probabilistic network or Knowledge
Map. An extension of this is the decision network or influence diagram.
“A Bayesian network is a directed graph in which each node is annotated with quantitative probability
information.” This network is supported by CPTs, known as conditional probability tables. Bayesian networks are used
for representing knowledge in an uncertain domain.
A belief network is used to encode the meaningful dependencies between variables.
1. Nodes represent random variables. 2. Arcs represent direct influence.
3. Each node has a conditional probability table that gives that variable's probability given the
different states of its parents.
The Semantics of Belief Networks
1. To construct the network, think of it as representing the joint probability distribution.
Inference in Belief Networks: After constructing such a network, an inference engine
can use it to maintain and propagate beliefs. When new information is received, its effects can be
propagated throughout the network until equilibrium probabilities are reached.
(a) Diagnostic inference: symptoms to causes
(b) Causal inference: causes to symptoms
(c) Intercausal inference
(d) Mixed inference: mixes those above
Inference in Multiply Connected Belief Networks
(a) Multiply connected graphs have 2 nodes connected by more than one path.
(b) Techniques for handling them:
Clustering: Group some of the intermediate nodes into one meganode.
Stochastic simulation: run through the net with randomly chosen values for each node
(weighted by prior probabilities).
The probability of any atomic event (its joint probability) can be computed from the
network as the product of the conditional probability of each node given its parents.
The correct order to add the nodes is "root causes" first, then the variables they influence,
until we reach the "leaves", which have no direct causal influence on the other variables.
If we don't, the network will have more links and will need less natural probabilities.
Example: The scenario is about a new burglar alarm installed at home. It also
responds to minor earthquakes. Two neighbors, John and Mary, are always available in
case of any emergency. John always calls when he hears the alarm, but sometimes confuses
the telephone ring with it. Mary likes loud music and sometimes misses hearing the alarm
sound. The probabilities actually summarize a potentially infinite set of circumstances in
which the alarm might fail to go off (e.g.: high humidity, power failure, dead battery,
cut wires, a dead mouse stuck inside the bell etc.) or John or Mary might fail to
call and report it (e.g.: out for lunch, on vacation, temporarily deaf, an
airplane passing near the home etc.).
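The CPT values for this example are not given in these notes. The Python sketch below assumes the values commonly used for the burglary-alarm network in Russell and Norvig's textbook. It computes the probability of one atomic event as the product of each node's conditional probability given its parents, and answers P(Burglary | JohnCalls, MaryCalls) by enumerating the hidden variables Earthquake and Alarm.

```python
from itertools import product

# Assumed CPT values (standard textbook version of the burglary-alarm example)
P_B = 0.001                                   # P(Burglary)
P_E = 0.002                                   # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | Burglary, Earthquake)
P_J = {True: 0.90, False: 0.05}               # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}               # P(MaryCalls | Alarm)

def pr(p_true, value):
    """Probability that a Boolean variable takes value, given P(variable = true)."""
    return p_true if value else 1 - p_true

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

def p_burglary_given_calls(j=True, m=True):
    """P(Burglary = true | j, m) by summing out the hidden variables E and A, then normalizing."""
    score = {b: sum(joint(b, e, a, j, m) for e, a in product([True, False], repeat=2))
             for b in (True, False)}
    return score[True] / (score[True] + score[False])

print(joint(False, False, True, True, True))   # P(j, m, a, ~b, ~e), about 0.00063
print(p_burglary_given_calls())                # about 0.284
```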