
ARTIFICIAL INTELLIGENCE

Dr. Anand M
Assistant Professor
Department of Data Science and Business Systems
SRM Institute of Science and Technology
Procedural v/s Declarative Knowledge
A declarative representation is one in which knowledge is specified, but the use to which that knowledge is to be put is not given.
A procedural representation is one in which the control information necessary to use the knowledge is embedded in the knowledge itself.
To use a procedural representation, we need to augment it with an interpreter that follows the instructions encoded in the knowledge.
The difference between the declarative and procedural views of knowledge lies in where the control information resides.

Procedural v/s Declarative Knowledge
Consider the example:
man(Marcus)
man(Caesar)
person(Cleopatra)
∀x : man(x) → person(x)
Now we want to extract from this knowledge base the answer to the question:
∃y : person(y)
Marcus, Caesar, and Cleopatra can all be answers.
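To make this concrete, here is a minimal sketch in Python (all names hypothetical) of answering the existential query against this knowledge base, Prolog-style: clauses are scanned top to bottom, and the first answer found depends on where the rule sits relative to the person(Cleopatra) fact.

    # Clauses in assertion order: three facts and one rule.
    # ("rule", "person_from_man") stands for: person(x) :- man(x).
    clauses = [
        ("man", "Marcus"),
        ("man", "Caesar"),
        ("person", "Cleopatra"),
        ("rule", "person_from_man"),
    ]

    def first_person(clauses):
        """Return the first y proving person(y), scanning clauses top to bottom."""
        for clause in clauses:
            if clause == ("rule", "person_from_man"):
                # Prove the body man(x) against the database, top to bottom.
                for pred, arg in clauses:
                    if pred == "man":
                        return arg
            elif clause[0] == "person":
                return clause[1]

    print(first_person(clauses))  # Cleopatra under this ordering

Moving the rule above person(Cleopatra) makes the answer Marcus, which is exactly the order-dependence discussed on the next slide.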

Procedural v/s Declarative Knowledge
As there is more than one value that satisfies the predicate, but only one value is needed, the answer depends on the order in which the assertions are examined during the search for a response.
If we view the assertions as declarative, they do not themselves say how they will be examined. If we view them as procedural, they do.
We could view these assertions as a nondeterministic program whose output is simply not defined; in that case there would be no difference between procedural and declarative statements. But most machines do not work that way: they hold on to whatever method they have, whether sequential or parallel. The focus is therefore on the control model.

Procedural v/s Declarative Knowledge
man(Marcus)
man(Caesar)
∀x : man(x) → person(x)
person(Cleopatra)
If we view this as declarative, there is no difference from the previous knowledge base. But viewed procedurally, using the same control model, we previously got Cleopatra as the answer; now the answer is Marcus.
The answer can also vary by changing the way the interpreter works.
The distinction between the two forms is often very fuzzy. Rather than trying to prove which technique is better, we should figure out the ways in which rule formalisms and interpreters can be combined to solve problems.

Logic Programming
Logic programming is a programming paradigm in which logical assertions are viewed as programs, e.g. PROLOG.
A PROLOG program is described as a series of logical assertions, each of which is a Horn clause.
A Horn clause is a clause that has at most one positive literal.
E.g., p and ¬p ∨ q are Horn clauses.
The fact that PROLOG programs are composed only of Horn clauses and not of arbitrary logical expressions has two important consequences:
Because of the uniform representation, a simple and effective interpreter can be written.
The logic of Horn clause systems is decidable.

[Figure: A Declarative and a Procedural Representation]

[Figure: Answering Questions in PROLOG]
Logic Programming
PROLOG works on backward reasoning.
The program is read top to bottom, left to right, and search is performed depth-first with backtracking.
There are some syntactic differences between the logic and the PROLOG representations, as shown in Fig 6.1.
The key difference between the logic and PROLOG representations is that the PROLOG interpreter has a fixed control strategy, so the assertions in a PROLOG program define a particular search path to an answer.
Logical assertions, by contrast, define only the set of answers that they justify; there can be more than one answer, and reasoning can proceed forward or backward.

Logic Programming
The control strategy for PROLOG states that we begin with a problem statement, which is viewed as a goal to be proved.
Look for assertions that can prove the goal.
To decide whether a fact or a rule can be applied to the current problem, invoke a standard unification procedure.
Reason backward from the goal until a path is found that terminates with assertions in the program.
Consider paths using a depth-first search strategy, with backtracking.
Propagate back to the answer by satisfying the conditions.
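The following is a minimal sketch in Python (hypothetical names, not PROLOG's actual implementation) of this control strategy: goals are proved depth-first, top to bottom, with backtracking falling out of the generator protocol. Unification is reduced to matching of ground terms to keep the sketch short.

    # Knowledge base: ground facts, plus single-condition rules
    # of the form head_pred :- body_pred (e.g. person(X) :- man(X)).
    facts = [("man", "Marcus"), ("man", "Caesar"), ("person", "Cleopatra")]
    rules = {"person": "man"}

    def prove(pred):
        """Depth-first proof of pred(Y); yields each Y in the order found.
        Backtracking happens automatically: if a caller rejects an
        answer, the next one is produced on demand."""
        for fpred, arg in facts:          # scan the program top to bottom
            if fpred == pred:
                yield arg
        if pred in rules:                 # reason backward through a rule
            yield from prove(rules[pred])

    print(next(prove("person")))   # first answer found: Cleopatra
    print(list(prove("person")))   # all answers: Cleopatra, Marcus, Caesar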

Forward v/s Backward Reasoning
The objective of any search is to find a path through a problem space from the initial state to the goal state.
There are two directions in which we can search for the answer:
Forward
Backward
Consider the 8-puzzle problem.
Reason forward from the initial states: Begin building a tree of move sequences that might be solutions by starting with the initial configuration(s) at the root of the tree. Generate the next level of the tree by finding all the rules whose left sides match the root node, and use their right sides to create the new configurations. Generate each succeeding level by taking each node generated at the previous level and applying to it all of the rules whose left sides match it. Continue.

Forward v/s Backward Reasoning
Reason backward from the goal states: Begin building a tree of move sequences that might be solutions by starting with the goal configuration(s) at the root of the tree. Generate the next level of the tree by finding all the rules whose right sides match the root node, and use their left sides to create the new configurations. Generate each succeeding level by taking each node generated at the previous level and applying to it all of the rules whose right sides match it. Continue. This is also called goal-directed reasoning.
To summarize: to reason forward, the left sides (preconditions) are matched against the current state and the right sides (the results) are used to generate new nodes until the goal is reached. To reason backward, the right sides are matched against the current node and the left sides are used to generate new nodes.

A Sample of the Rules for Solving the 8-Puzzle

[Figure: sample rules for solving the 8-puzzle, with an example Start and Goal configuration]
Forward v/s Backward Reasoning
Factors that influence whether to choose forward or backward reasoning:
Are there more possible start states or goal states? We would like to move from the smaller set of states to the larger set of states.
In which direction is the branching factor (the average number of nodes that can be reached directly from a single node) greater? We would like to proceed in the direction with the lower branching factor.
Will the program be asked to justify its reasoning process to the user? If so, it is important to proceed in the direction that corresponds more closely with the way the user will think.
What kind of event is going to trigger a problem-solving episode? If it is the arrival of a new fact, forward reasoning should be used. If it is a query to which a response is desired, use backward reasoning.

Forward v/s Backward Reasoning
Home-to-unknown-place example.
MYCIN
Bidirectional search (the two searches must pass each other).
Forward rules: encode knowledge about how to respond to certain input configurations.
Backward rules: encode knowledge about how to achieve particular goals.

Forward v/s Backward Reasoning
Backward-Chaining Rule Systems
PROLOG and MYCIN are examples.
These are good for goal-directed problem solving.
Forward-Chaining Rule Systems
These work on incoming data.
The left sides of rules are matched against the state description.
The rules that match dump their right-side assertions into the state.
Matching is more complex for forward-chaining systems than for backward-chaining ones.
OPS5 is an example.
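A minimal sketch in Python (hypothetical rules, not the OPS5 engine) of the forward-chaining loop just described: left sides are matched against the state, and the right sides of matching rules dump new assertions into the state until nothing new can be added.

    # Each rule: (set of preconditions, assertion to add).
    rules = [
        ({"man"}, "mortal"),
        ({"mortal", "famous"}, "remembered"),
    ]

    def forward_chain(state, rules):
        """Fire every rule whose left side matches the state, adding
        its right side, until a fixed point is reached."""
        changed = True
        while changed:
            changed = False
            for conditions, conclusion in rules:
                if conditions <= state and conclusion not in state:
                    state.add(conclusion)   # dump right side into the state
                    changed = True
        return state

    print(forward_chain({"man", "famous"}, rules))
    # {'man', 'famous', 'mortal', 'remembered'}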

Combining Forward and Backward Reasoning
Patient-diagnosis example.
In some systems, this is only possible with reversible rules.

MATCHING
Till now we have used search to solve problems by applying appropriate rules: we applied them to individual problem states to generate new states, to which the rules were then applied, until a solution was found.
We have suggested that clever search involves choosing from among the rules that can be applied at a particular point, but we have not discussed how to extract from the entire collection of rules those that can be applied at a given point.
To do this we need matching.

MATCHING
Indexing
One approach is to do a simple search through all the rules, comparing each one's preconditions to the current state and extracting all the ones that match.
But this has two problems:
In order to solve very interesting problems, it will be necessary to use a large number of rules; scanning through all of them at every step of the search would be hopelessly inefficient.
It is not always immediately obvious whether a rule's preconditions are satisfied by a particular state.
To solve the first problem, use simple indexing. E.g., in chess, index the moves by the board positions at which they apply, so all moves applicable to one position are grouped together.
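A minimal sketch in Python (hypothetical representation) of such an index: rules are grouped in a dictionary keyed by the state feature in their precondition, so retrieval at each step is a lookup rather than a scan over all rules.

    from collections import defaultdict

    # Rules keyed by the state feature their precondition mentions.
    index = defaultdict(list)
    index["white_king_in_check"].append("move king out of check")
    index["pawn_on_7th_rank"].append("promote pawn")
    index["pawn_on_7th_rank"].append("advance and underpromote")

    def applicable_rules(state_features):
        """Look up candidate rules by index instead of scanning every rule."""
        candidates = []
        for feature in state_features:
            candidates.extend(index[feature])
        return candidates

    print(applicable_rules({"pawn_on_7th_rank"}))
    # ['promote pawn', 'advance and underpromote']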

[Figure: Representation Affects Matching]
MATCHING
Matching with Variables
The problem of selecting applicable rules is made more difficult when preconditions are not stated as exact descriptions of particular situations but rather describe properties that the situations must have.
Then we need to match between a particular situation and the preconditions of the rules.
In many rule-based systems, we need to compute the whole set of rules that match the current state description. Backward-chaining systems usually use depth-first backtracking to select individual rules, but forward-chaining systems use conflict-resolution strategies.
One efficient many-to-many match algorithm is RETE.

Many-to-Many Matching
RETE gains efficiency by exploiting three features of rule-based systems:
The temporal nature of data. Rules usually do not alter the state description radically. Instead, a rule will typically add one or two elements, or delete one or two, but most of the state description remains the same.
Structural similarity in rules. Different rules may share a large number of preconditions.
Persistence of variable-binding consistency. Even when all the individual preconditions of a rule are met, there may be variable-binding conflicts that prevent the rule from firing. E.g., given son(Mary, Joe) and son(Bill, Bob), the preconditions son(x, y) and son(y, z) of a rule are each satisfied individually, but no single binding of y satisfies both, so the rule cannot fire. A sketch of this consistency check follows.
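Here is a minimal sketch in Python (the rule son(x, y) ∧ son(y, z) is hypothetical, filling in the son/son example above) of checking binding consistency across two preconditions that share the variable y.

    facts = [("son", "Mary", "Joe"), ("son", "Bill", "Bob")]

    def consistent_matches(facts):
        """Find bindings (x, y, z) satisfying son(x, y) AND son(y, z).
        Each precondition matches some fact on its own, but the shared
        variable y must take the same value in both."""
        results = []
        for _, x, y in facts:            # candidate for son(x, y)
            for _, y2, z in facts:       # candidate for son(y, z)
                if y == y2:              # binding-consistency check on y
                    results.append((x, y, z))
        return results

    print(consistent_matches(facts))  # [] -- no consistent binding, rule can't fire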
MATCHING
Complex and Approximate Matching
A more complex matching process is required when the preconditions of a rule specify required properties that are not stated explicitly in the description of the current state. In this case, a separate set of rules must be used to describe how some properties can be inferred from others.
An even more complex matching process is required if rules should be applied when their preconditions only approximately match the current situation, as in listening to a recording of a telephone conversation.
For some problems, almost all the action is in the matching of the rules to the problem state. Once that is done, so few rules apply that the remaining search is trivial. Example: ELIZA.
[Figure: A Bit of a Dialogue with ELIZA]
Some ELIZA-like Rules

[Figure: some ELIZA-like rules]

Input: My brother is mean to me.
ELIZA response: Who else in your family is mean to you? Or: Tell me more about your family.
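A minimal sketch in Python (hypothetical patterns, in the spirit of the rules above, not Weizenbaum's actual program) of ELIZA-style matching: a pattern with a wildcard is matched against the input, and the captured fragment is substituted into a response template.

    import re

    # (pattern, response template) pairs, most specific first.
    rules = [
        (r"my (\w+) is mean to me", "Who else in your family is mean to you?"),
        (r"my (\w+)", "Tell me more about your {0}."),
    ]

    def respond(sentence):
        """Apply the first rule whose pattern matches; substitute captures."""
        for pattern, template in rules:
            match = re.search(pattern, sentence.lower())
            if match:
                return template.format(*match.groups())
        return "Please go on."

    print(respond("My brother is mean to me"))  # Who else in your family is mean to you?
    print(respond("My dog ran away"))           # Tell me more about your dog.

Note that the rule ordering itself implements a conflict-resolution policy: the more specific pattern is tried before the more general one, a point taken up in the next slide.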
Conflict Resolution
The result of the matching process is a list of rules whose antecedents have matched the current state description, along with whatever variable bindings were generated by the matching process.
It is the job of the search method to decide on the order in which the rules will be applied. But sometimes it is useful to incorporate some of that decision making into the matching process. This phase is called conflict resolution.
There are three basic approaches to the problem of conflict resolution in a production system:
Assign a preference based on the rule that matched.
Assign a preference based on the objects that matched.
Assign a preference based on the action that the matched rule would perform.

Preferences based on rules:
- Rule order, e.g. PROLOG.
- Prefer special cases over more general ones. There are two ways to decide which rule is more general (a sketch of the first follows below):
  - If the set of preconditions of one rule contains all the preconditions of another (plus some others), then the second rule is more general than the first.
  - If the preconditions of one rule are the same as those of another except that, where the first has variables, the second has constants, then the first rule is more general than the second.
Preferences based on objects:
- Prefer some objects to others. E.g., if an input sentence contains a variety of keywords that ELIZA knows, ELIZA makes use of the fact that some keywords have been marked as more significant than others.
P: I know everybody laughed at me.
ELIZA: You say you know everybody laughed at you. Or: Who in particular are you thinking of?
Preferences based on states:
- Apply a heuristic evaluation to the states that the matching rules would produce, and prefer the best.
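As promised above, a minimal sketch in Python (hypothetical rules) of the first specificity test: among the matching rules, the one whose precondition set is largest, i.e. a superset of the others', is the most specific and is preferred.

    # Each rule: (name, set of preconditions).
    rules = [
        ("general", {"bird"}),                 # bird -> can fly
        ("special", {"bird", "penguin"}),      # bird & penguin -> cannot fly
    ]

    def most_specific_match(rules, state):
        """Among rules whose preconditions all hold in the state,
        prefer the one with the largest (most specific) precondition set."""
        matching = [r for r in rules if r[1] <= state]
        return max(matching, key=lambda r: len(r[1]), default=None)

    print(most_specific_match(rules, {"bird", "penguin"}))  # ('special', ...)
    print(most_specific_match(rules, {"bird"}))             # ('general', ...)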
Control Knowledge
• Intelligent programs require search, and search is computationally intractable unless it is constrained by knowledge about the world.
• Knowledge about which paths are more likely to lead quickly to a goal state is often called search control knowledge. It can take many forms:
1. Knowledge about which states are more preferable to others.
2. Knowledge about which rule to apply in a given situation.
3. Knowledge about the order in which to pursue subgoals.
4. Knowledge about useful sequences of rules to apply.

Control Knowledge
• A number of AI systems represent their control knowledge with rules.
• SOAR is a general architecture for building intelligent systems. SOAR is based on a set of specific, cognitively motivated hypotheses about the structure of human problem solving. In SOAR:
1. Long-term memory is stored as a set of productions (or rules).
2. Short-term memory (also called working memory) is a buffer analogous to the state description in problem solving.
3. All problem-solving activity takes place as state-space traversal.
4. All intermediate and final results of problem solving are remembered (or chunked) for future reference.

[Figure: Control Knowledge]
Representing Knowledge in an Uncertain Domain
Uncertain knowledge and reasoning:
• Probability theory
• Bayesian networks

1. Probability theory

1.1 Uncertain knowledge

∀p symptom(p, Toothache) → disease(p, cavity)
∀p symptom(p, Toothache) → disease(p, cavity) ∨ disease(p, gum_disease) ∨ …
• Predicate logic fails in such domains because of:
- laziness
- theoretical ignorance
- practical ignorance
• Probability theory provides a degree of belief or plausibility of a statement – a numerical measure in [0, 1]
• Degree of truth (fuzzy logic) is distinct from degree of belief
1.2 Definitions

• Unconditional or prior probability of A – the degree of belief in A in the absence of any other information – P(A)
• A – a random variable
• Probability distribution – P(A), P(A, B)
Example:
P(Weather = Sunny) = 0.1
P(Weather = Rain) = 0.7
P(Weather = Snow) = 0.2
Weather – a random variable
• P(Weather) = (0.1, 0.7, 0.2) – a probability distribution
• Conditional probability – posterior – once the agent has obtained some evidence B for A – P(A|B)
• P(Cavity | Toothache) = 0.8
Definitions - cont.

• Axioms of probability
• The measure of the occurrence of an event (random variable) A is a function P : S → R satisfying the axioms:
• 0 ≤ P(A) ≤ 1
• P(S) = 1 (or P(true) = 1 and P(false) = 0)
• P(A ∨ B) = P(A) + P(B) − P(A ∧ B)

From these, P(A ∨ ¬A) = P(A) + P(¬A) − P(false) = P(true), hence
P(¬A) = 1 − P(A)
Definitions - cont.

A and B mutually exclusive ⇒ P(A ∨ B) = P(A) + P(B)

P(e1 ∨ e2 ∨ … ∨ en) = P(e1) + P(e2) + … + P(en)

The probability of a proposition a is equal to the sum of the probabilities of the atomic events in which a holds:
e(a) – the set of atomic events in which a holds
P(a) = Σ_{ei ∈ e(a)} P(ei)
1.3 Product rule

Conditional probabilities can be defined in terms of unconditional probabilities.
The conditional probability of the occurrence of A given that event B occurs:
• P(A|B) = P(A ∧ B) / P(B)
This can also be written as:
• P(A ∧ B) = P(A|B) * P(B)
For probability distributions:
• P(A=a1 ∧ B=b1) = P(A=a1 | B=b1) * P(B=b1)
• P(A=a1 ∧ B=b2) = P(A=a1 | B=b2) * P(B=b2) …
• P(X, Y) = P(X|Y) * P(Y)
1.4 Bayes' rule and its use

P(A ∧ B) = P(A|B) * P(B)
P(A ∧ B) = P(B|A) * P(A)

Bayes' rule (theorem):
• P(B|A) = P(A|B) * P(B) / P(A)

Bayes' theorem with several hypotheses and several items of evidence:
hi – hypotheses (i = 1..k)
e1, …, en – evidence
P(hi) – prior probability of hypothesis hi
P(hi | e1, …, en) – posterior probability of hi given the evidence
P(e1, …, en | hi) – likelihood of the evidence given hi

P(hi | e1, e2, …, en) = [ P(e1, e2, …, en | hi) * P(hi) ] / [ Σ_{j=1..k} P(e1, e2, …, en | hj) * P(hj) ],  i = 1..k
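A minimal sketch in Python (hypothetical numbers) of this formula: each prior is multiplied by its likelihood, and the results are normalized by the sum in the denominator to obtain the posteriors.

    def posteriors(priors, likelihoods):
        """Bayes' theorem over k hypotheses.
        priors[i] = P(h_i); likelihoods[i] = P(e_1,...,e_n | h_i)."""
        joint = [p * l for p, l in zip(priors, likelihoods)]
        total = sum(joint)                  # the normalizing denominator
        return [j / total for j in joint]

    # Two hypotheses with priors 0.3 / 0.7 and likelihoods 0.8 / 0.1.
    print(posteriors([0.3, 0.7], [0.8, 0.1]))
    # [0.774..., 0.225...]  -- the posteriors sum to 1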
Bayes' Theorem - cont.

If e1, …, en are independent items of evidence given each hypothesis, then
P(e1, e2, …, en | hj) = P(e1 | hj) * P(e2 | hj) * … * P(en | hj),  j = 1..k

This independence assumption was used in systems such as PROSPECTOR.
1.5 Inferences

Joint probability distribution P(Cavity, Tooth):

           Tooth   ¬Tooth
Cavity     0.04    0.06
¬Cavity    0.01    0.89

P(Cavity) = 0.04 + 0.06 = 0.1
P(Cavity ∨ Tooth) = 0.04 + 0.01 + 0.06 = 0.11
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) = 0.04 / 0.05 = 0.8
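A minimal sketch in Python (using the table above) of reading marginals and conditionals off a full joint distribution; every query reduces to summing the right cells.

    # Full joint distribution P(Cavity, Tooth), keyed by truth values.
    joint = {
        (True, True): 0.04,  (True, False): 0.06,
        (False, True): 0.01, (False, False): 0.89,
    }

    def p(pred):
        """Sum the probabilities of all atomic events where pred holds."""
        return sum(pr for event, pr in joint.items() if pred(*event))

    p_cavity = p(lambda c, t: c)                              # 0.1
    p_tooth = p(lambda c, t: t)                               # 0.05
    p_cavity_or_tooth = p(lambda c, t: c or t)                # 0.11
    p_cavity_given_tooth = p(lambda c, t: c and t) / p_tooth  # 0.8
    print(p_cavity, p_cavity_or_tooth, p_cavity_given_tooth)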
Inferences

Joint probability distribution P(Cavity, Tooth, Catch):

                 Tooth                ¬Tooth
           Catch    ¬Catch      Catch    ¬Catch
Cavity     0.108    0.012       0.072    0.008
¬Cavity    0.016    0.064       0.144    0.576

P(Cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
P(Cavity ∨ Tooth) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth)
  = [P(Cavity ∧ Tooth ∧ Catch) + P(Cavity ∧ Tooth ∧ ¬Catch)] / P(Tooth)
2 Bayesian networks
• Represent dependencies among random variables
• Give a compact specification of the conditional probability distributions
• Many random variables are conditionally independent
• This simplifies computations
• Graphical representation
• DAG – causal relationships among random variables
• Allows inferences based on the network structure

2.1 Definition of Bayesian networks

A BN is a DAG in which each node is annotated with quantitative probability information, namely:
• Nodes represent random variables (discrete or continuous)
• Directed links X → Y: X has a direct influence on Y; X is said to be a parent of Y
• Each node Xi has an associated conditional probability table, P(Xi | Parents(Xi)), that quantifies the effects of the parents on the node

Example: Weather, Cavity, Toothache, Catch
• Weather is independent of the others; Cavity → Toothache, Cavity → Catch
Bayesian network - example
P(B) Burglary P(E)
0.001 Earthquake
0.002
B E P(A)
T T 0.95
T F 0.94 Alarm
F T 0.29
F F 0.001
A P(J) A P(M)
T 0.9 JohnCalls MaryCalls T 0.7
F 0.05 F 0.01

B E P(A | B, E)
T F
Conditional probability T T 0.95 0.05
table T F 0.94 0.06
F T 0.29 0.71
F F 0.001 0.999
44
2.2 Bayesian network semantics

A) Represents the joint probability distribution
B) Specifies conditional independence – used to build the network

A) Each value of the joint probability distribution can be computed as:
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1, …, xn) = Π_{i=1..n} P(xi | Parents(xi))
where Parents(xi) represents the specific values of Parents(Xi)
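A minimal sketch in Python of this product formula for the burglary network above; it reproduces the worked example that appears later in these slides, P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) ≈ 0.00062.

    # CPTs of the burglary network, storing P(X=True | parents).
    p_b = {True: 0.001, False: 0.999}
    p_e = {True: 0.002, False: 0.998}
    p_a = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
    p_j = {True: 0.9, False: 0.05}                        # P(J=T | A)
    p_m = {True: 0.7, False: 0.01}                        # P(M=T | A)

    def cond(table, value, *parents):
        """P(X=value | parents) from a table storing P(X=True | parents)."""
        key = parents[0] if len(parents) == 1 else parents
        pt = table[key]
        return pt if value else 1 - pt

    def joint(b, e, a, j, m):
        """Product of each variable conditioned on its parents."""
        return (p_b[b] * p_e[e] * cond(p_a, a, b, e)
                * cond(p_j, j, a) * cond(p_m, m, a))

    print(joint(b=False, e=False, a=True, j=True, m=True))  # ~0.00062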
2.3 Building the network

P(X1=x1 ∧ … ∧ Xn=xn) = P(x1, …, xn)
  = P(xn | xn-1, …, x1) * P(xn-1, …, x1) = …
  = P(xn | xn-1, …, x1) * P(xn-1 | xn-2, …, x1) * … * P(x2 | x1) * P(x1)
  = Π_{i=1..n} P(xi | xi-1, …, x1)

• We can see that P(Xi | Xi-1, …, X1) = P(Xi | Parents(Xi)) provided that Parents(Xi) ⊆ {Xi-1, …, X1}
• The condition can be satisfied by labeling the nodes in an order consistent with the DAG
• Intuitively, the parents of a node Xi must be all those nodes Xi-1, …, X1 that have a direct influence on Xi
Building the network - cont.

• Pick a set of random variables that describe the problem
• Pick an ordering of those variables
• While there are still variables, repeat:
(a) choose a variable Xi and add a node associated with Xi
(b) assign Parents(Xi) a minimal set of nodes already in the network such that the conditional independence property is satisfied
(c) define the conditional probability table for Xi

• Because each node is linked only to previous nodes, the result is a DAG
• P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) = P(MaryCalls | Alarm)
Compactness and node ordering

• A Bayesian network is far more compact than a full probability distribution
• It is an example of a locally structured (sparse) system: each component interacts directly with only a limited number of other components
• Local structure is usually associated with linear rather than exponential growth in complexity
• The order in which the nodes are added is important
• The correct order in which to add nodes is to add the "root causes" first, then the variables they influence, and so on, until we reach the leaves
2.4 Probabilistic inferences

Three simple topologies over variables A, V, B:

Chain A → V → B:
  P(A ∧ V ∧ B) = P(A) * P(V|A) * P(B|V)

Common cause V → A, V → B:
  P(A ∧ V ∧ B) = P(V) * P(A|V) * P(B|V)

Common effect A → V, B → V:
  P(A ∧ V ∧ B) = P(A) * P(B) * P(V|A,B)
Probabilistic inferences

Using the burglary network above:

P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
  = P(J|A) * P(M|A) * P(A | ¬B, ¬E) * P(¬B) * P(¬E)
  = 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00062
Probabilistic inferences

Using the burglary network above:

P(A|B) = P(A | B, E) * P(E|B) + P(A | B, ¬E) * P(¬E|B)
       = P(A | B, E) * P(E) + P(A | B, ¬E) * P(¬E)     (B and E are independent)
       = 0.95 * 0.002 + 0.94 * 0.998 = 0.94002
Dempster-Shafer Theory
• Limitations of probability theory:
• It assigns a single number to measure any situation, no matter how complex
• It cannot deal with missing evidence, heuristics, and limited knowledge
• Dempster-Shafer theory:
• Extends probability theory
• Considers a set of propositions as a whole
• Assigns a set of propositions an interval [belief, plausibility] that constrains the degree of belief in each individual proposition in the set
• The belief measure bel is in [0, 1]:
• 0 – no supporting evidence for a set of propositions
• 1 – full supporting evidence for a set of propositions
• The plausibility of p:
• pl(p) = 1 − bel(not(p))
• Reflects how evidence for not(p) relates to the possibility of belief in p
• bel(not(p)) = 1: full support for not(p), no possibility for p
• bel(not(p)) = 0: no support for not(p), full possibility for p
• Plausibility also ranges over [0, 1]
Properties of Dempster-Shafer
• Initially, with no supporting evidence for either of two competing hypotheses h1 and h2:
• Dempster-Shafer: [bel, pl] = [0, 1]
• Probability theory: p(h1) = p(h2) = 0.5
• Dempster-Shafer belief functions satisfy weaker axioms than probability functions
• Two fundamental ideas:
• Obtaining degrees of belief for one question from subjective probabilities for related questions
• Using Dempster's rule to combine these degrees of belief when they are based on independent items of evidence
An Example
• Two persons M and B, of known reliability, examine a computer and report their findings independently. How should we believe their claims?
• Question (Q): the detection claim
• Related question (RQ): each detector's reliability
• Dempster-Shafer approach:
• Obtain degrees of belief for Q from subjective (prior) probabilities for RQ for each person
• Combine the degrees of belief from the two persons
• Person M:
• reliability 0.9, unreliability 0.1
• claims h1
• degree of belief in h1: bel(h1) = 0.9
• degree of belief in not(h1): bel(not(h1)) = 0.0 – unlike probability theory, since there is no evidence supporting not(h1)
• pl(h1) = 1 − bel(not(h1)) = 1 − 0 = 1
• Thus the belief measure for M's claim h1 is [0.9, 1]
• Person B:
• reliability 0.8, unreliability 0.2
• claims h2
• bel(h2) = 0.8, bel(not(h2)) = 0, pl(h2) = 1 − bel(not(h2)) = 1 − 0 = 1
• The belief measure for B's claim h2 is [0.8, 1]
Combining Belief Measures
• Set of propositions: M claims h1 and B claims h2
• Case 1: h1 = h2
• Both M and B reliable: 0.9 × 0.8 = 0.72
• Both M and B unreliable: 0.1 × 0.2 = 0.02
• The probability that at least one of the two is reliable: 1 − 0.02 = 0.98
• Belief measure for h1 = h2 is [0.98, 1]
• Case 2: h1 = not(h2)
• M and B cannot both be correct and reliable, so at least one is unreliable:
• M reliable and B unreliable: 0.9 × (1 − 0.8) = 0.18
• B reliable and M unreliable: 0.8 × (1 − 0.9) = 0.08
• Both M and B unreliable: (1 − 0.9) × (1 − 0.8) = 0.02
• At least one unreliable: 0.18 + 0.08 + 0.02 = 0.28
• Given that at least one is unreliable, the posterior probabilities are:
• M reliable and B unreliable: 0.18 / 0.28 = 0.643
• B reliable and M unreliable: 0.08 / 0.28 = 0.286
• Belief measure for h1:
• bel(h1) = 0.643, bel(not(h1)) = bel(h2) = 0.286
• pl(h1) = 1 − bel(not(h1)) = 1 − 0.286 = 0.714
• Belief measure: [0.643, 0.714]
• Belief measure for h2:
• bel(h2) = 0.286, bel(not(h2)) = bel(h1) = 0.643
• pl(h2) = 1 − bel(not(h2)) = 1 − 0.643 = 0.357
• Belief measure: [0.286, 0.357]
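A minimal sketch in Python of the Case 2 arithmetic above: mass is assigned to the three ways in which at least one witness is unreliable, renormalized to discard the impossible both-reliable combination, and turned into [bel, pl] intervals.

    def combine_conflicting(r_m, r_b):
        """Witnesses with reliabilities r_m, r_b make contradictory claims
        h1 and h2 = not(h1). Returns ([bel, pl] for h1, [bel, pl] for h2)."""
        m_only = r_m * (1 - r_b)          # only M reliable -> supports h1
        b_only = r_b * (1 - r_m)          # only B reliable -> supports h2
        neither = (1 - r_m) * (1 - r_b)   # neither reliable -> supports nothing
        total = m_only + b_only + neither # both-reliable case is impossible
        bel_h1 = m_only / total
        bel_h2 = b_only / total
        return (bel_h1, 1 - bel_h2), (bel_h2, 1 - bel_h1)

    print(combine_conflicting(0.9, 0.8))
    # approximately ((0.643, 0.714), (0.286, 0.357))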



Dempster’s Rule
• Assumption:
• the questions being combined are independent a priori
• As new evidence is collected and conflicts arise, this independence may disappear
• Two steps:
1. Sort the uncertainties into a priori independent pieces of evidence
2. Apply Dempster's rule
• Consider the previous example:
• After M and B have made their claims, a repair person is called to check the computer, and both M and B witness this
• Three independent items of evidence must now be combined
• Not all evidence directly supports individual elements of a set of hypotheses; often it supports different subsets of hypotheses, in favor of some and against others
General Dempster’s Rule
• Q – an exhaustive set of mutually exclusive hypotheses
• Z – a subset of Q
• M – a probability density function assigning a belief measure to Z
• Mn(Z) – the degree of belief assigned to Z, where n is the number of sources of evidence
Approximate Reasoning
• This is a theory of uncertainty based on fuzzy logic, concerned with quantifying and reasoning using natural language, in which words have ambiguous meanings.
• Fuzzy logic is a superset of conventional logic, extended to handle partial truth.
• Soft computing means computing not based on classical two-valued logic; it includes fuzzy logic, neural networks, and probabilistic reasoning.
Fuzzy Sets and Natural Language

• A discrimination function is a way to represent which objects are members of a set:
• 1 means an object is an element
• 0 means an object is not an element
• Sets using this type of representation are called "crisp sets", as opposed to "fuzzy sets".
• Fuzzy logic occupies the middle ground, like human reasoning: everything comes in degrees – beauty, height, grace, etc.
Fuzzy Sets and Natural Language

• In a fuzzy set, an object may partially belong to the set, as measured by the membership function – the grade of membership.
• A fuzzy truth value is called a fuzzy qualifier.
• Compatibility means how well one object conforms to some attribute.
• There are many types of membership functions.
• The crossover point is where μ = 0.5.
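A minimal sketch in Python (hypothetical numbers) of one common membership-function shape, a triangular function for a fuzzy set such as "tall": grades of membership vary continuously between 0 and 1, and the crossover point is where μ = 0.5.

    def triangular(x, a, b, c):
        """Triangular membership function: 0 at a and c, peak 1 at b."""
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)   # rising edge
        return (c - x) / (c - b)       # falling edge

    # Fuzzy set "tall" (heights in cm): membership rises from 160 to 190.
    for height in (160, 175, 182, 190):
        print(height, triangular(height, 160, 190, 220))
    # 175 -> 0.5: a crossover point, where membership is exactly 0.5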
Fuzzy Set Operations

• An ordinary crisp set is a special case of a fuzzy set whose membership function takes only the values 0 and 1.

• All definitions, proofs, and theorems of fuzzy sets must be compatible in the limit as the fuzziness goes to 0 and the fuzzy sets become crisp sets.
Fuzzy Set Operations

Common operations (a sketch of three basic ones follows below):
Set Equality          Set Complement
Set Containment       Proper Subset
Set Union             Set Intersection
Set Product           Power of a Set
Probabilistic Sum     Bounded Sum
Bounded Product       Bounded Difference
Concentration         Dilation
Intensification       Normalization
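A minimal sketch in Python of the three most basic operations under the standard max/min definitions: union takes the pointwise maximum of membership grades, intersection the minimum, and complement 1 − μ.

    # Fuzzy sets as dicts mapping element -> membership grade in [0, 1].
    A = {"x": 0.2, "y": 0.7, "z": 1.0}
    B = {"x": 0.5, "y": 0.3, "z": 0.0}

    def f_union(A, B):
        return {e: max(A[e], B[e]) for e in A}

    def f_intersection(A, B):
        return {e: min(A[e], B[e]) for e in A}

    def f_complement(A):
        return {e: 1 - mu for e, mu in A.items()}

    print(f_union(A, B))         # {'x': 0.5, 'y': 0.7, 'z': 1.0}
    print(f_intersection(A, B))  # {'x': 0.2, 'y': 0.3, 'z': 0.0}
    print(f_complement(A))       # {'x': 0.8, 'y': 0.3, 'z': 0.0} (up to float rounding)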
Fuzzy Relations

• A relation from a set A to a set B is a subset of the Cartesian product:
A × B = {(a, b) | a ∈ A and b ∈ B}

• If X and Y are universal sets, then a fuzzy relation is
R = { μR(x, y) / (x, y) | (x, y) ∈ X × Y }
Fuzzy Relations

• The composition of relations is the net effect of applying one relation after another.

• For two binary relations P and Q, the composition of their relations is the binary relation:
R(A, C) = Q(A, B) ∘ P(B, C)
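A minimal sketch in Python of the commonly used max-min composition of two fuzzy relations given as membership matrices: μR(a, c) = max over b of min(μQ(a, b), μP(b, c)). The numbers are hypothetical.

    # Fuzzy relations as nested lists: Q over A x B, P over B x C.
    Q = [[0.8, 0.3],
         [0.4, 0.9]]
    P = [[0.5, 1.0],
         [0.7, 0.2]]

    def max_min_composition(Q, P):
        """R(a, c) = max_b min(Q(a, b), P(b, c))."""
        n, m, k = len(Q), len(P), len(P[0])
        return [[max(min(Q[a][b], P[b][c]) for b in range(m))
                 for c in range(k)]
                for a in range(n)]

    print(max_min_composition(Q, P))
    # [[0.5, 0.8], [0.7, 0.4]]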
[Table 5.7: Some Applications of Fuzzy Theory]

[Table 5.8: Some Fuzzy Terms of Natural Language]
Linguistic Variables

• One application of fuzzy sets is computational linguistics: calculating with natural language statements.
• Fuzzy sets and linguistic variables can be used to quantify the meaning of natural language, which can then be manipulated.
• Linguistic variables must have a valid syntax and semantics.
Extension Principle

• The extension principle defines how to extend the domain of a given crisp function to include fuzzy sets.
• Using this principle, ordinary (crisp) functions can be extended to work over a fuzzy domain with fuzzy sets.
• This principle makes fuzzy sets applicable to all fields.
Fuzzy Logic

• Just as classical logic forms the basis of expert systems, fuzzy logic forms the basis of fuzzy expert systems.

• Fuzzy logic is an extension of multivalued logic: the logic of approximate reasoning – inference of possibly imprecise conclusions from a set of possibly imprecise premises.
Possibility, Probability, and Fuzzy Logic
• In fuzzy logic, possibility refers to allowed values.

• Possibility distributions are not the same as probability distributions, which describe the frequency of expected occurrence of some random variable.
Planning
• Find a sequence of actions to accomplish some specific task
• Knowledge intensive
• Organize pieces of knowledge and partial plans into a solution procedure
• Applications
• Robotics
• Expert systems in reasoning about events occurring over time
• Process control and monitoring
• Natural language understanding, when discussing plans, goals, and intentions
Robotics
• A plan is a set of atomic actions
• Atomic actions are domain-dependent
• Blocks world robot:
• atomic actions may be
• Pick up object a
• Go to location x
• Task: go get block a from room b
• A plan:
• Put down whatever is now held
• Go to room b
• Go over to block a
• Pick up block a
• Leave room b
• Return to the original location
• Planning is searching through a space of possible actions to find the sequence necessary to accomplish the task
Issues of Planning
• State description of the world
• Possible states
• Atomic actions
• Effects of actions on the world
• State transitions
• Which parts change
• Which parts stay unchanged
• Frame problem: specifying exactly what is changed by performing an action on the world
• Generating, saving, and optimizing plans
• Generalizing plans
• Recovering from unexpected plan failure
• Maintaining consistency between the world and the system's internal model of the world
The Blocks World
The blocks world consists of 5 blocks, 1 table, and 1 gripper (hand).
Atomic Actions
• Goto(X, Y, Z)
• go to location (X, Y, Z)
• this location might be implicit in Pickup(W), where block W has location (X, Y, Z)
• Pickup(W)
• pick up and hold block W from the current location
• preconditions: W is clear on top, the gripper is empty, and the location of W is known
• Putdown(W)
• place W at the current location on the table; W must be held; record the new location for W
• Stack(U, V)
• place U on top of V; the gripper must be holding U, and V must be clear on top
• Unstack(U, V)
• remove U from the top of V; U must be clear of other blocks, V must have U on top of it, and the gripper must be empty
State Representation
• A set of predicates and predicate relationships:
location(W, X, Y, Z): block W is at (X, Y, Z)
on(X, Y): block X is immediately on top of block Y
clear(X): block X has nothing on top of it
gripping(X): the robot arm is holding block X
gripping(): the gripper is empty
ontable(W): block W is on the table

• Initial state: a number of truth relations, or rules for performance, are created for clear(X), ontable(X), and gripping().

Rules can be interpreted either logically or procedurally. Consider the first rule:
– Logical interpretation: block X is clear if there does not exist a block Y such that Y is on top of X.
– Procedural interpretation: to clear X, go and remove any block Y that might be on top of X.
Rules to operate on states and produce new states:

A → (B ← C) means that A produces B when C is true.

Consider rule 4:
– For all blocks X, pickup(X) produces gripping(X) if the gripper is empty (gripping nothing) and X is clear.

Frame rules (axioms) describe which predicates are not changed by rule applications and are thus carried over to new states.

Two such rules say:
– ontable is not affected by the stack and unstack operators.
There may be other frame axioms, such as:
– on and clear are affected by stack and unstack operators only when that particular on relation is unstacked or when a clear relation is stacked.
– Thus, on(b, a) is not affected by unstack(c, d).
New State
Operators and frame axioms define a state space.
A new state, STATE 1, is produced by applying the unstack operator and the frame axioms to the nine predicates of the initial state.
Summarization on Planning
• Planning may be seen as state-space search
• New states are produced by general operators such as stack and unstack, plus frame rules
• The techniques of graph search may be applied to find a path from the start state to the goal state; the operators on this path constitute a plan
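A minimal sketch in Python (hypothetical, much simplified state encoding) of planning as state-space search: states are sets of block stacks, operators move a clear block onto the table or onto another clear block, and breadth-first graph search returns the operator sequence, i.e. the plan.

    from collections import deque

    def successors(state):
        """Yield (action, new_state). A state is a frozenset of stacks,
        each stack a tuple of blocks bottom-to-top; stack tops are clear."""
        stacks = list(state)
        for i, src in enumerate(stacks):
            block = src[-1]                       # only the clear top moves
            rest = [s for j, s in enumerate(stacks) if j != i]
            reduced = rest + ([src[:-1]] if src[:-1] else [])
            # putdown: move the block onto the table as its own stack
            if len(src) > 1:
                yield (f"putdown({block})",
                       frozenset(map(tuple, reduced + [(block,)])))
            # stack: move the block onto any other clear top
            for k, dst in enumerate(reduced):
                new = [s for j, s in enumerate(reduced) if j != k]
                yield (f"stack({block},{dst[-1]})",
                       frozenset(map(tuple, new + [dst + (block,)])))

    def plan(start, goal):
        """Breadth-first graph search; the path of operators is the plan."""
        frontier, seen = deque([(start, [])]), {start}
        while frontier:
            state, path = frontier.popleft()
            if state == goal:
                return path
            for action, nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [action]))

    start = frozenset({("c", "b"), ("a",)})   # b on c; a on the table
    goal = frozenset({("c",), ("b", "a")})    # a on b; b and c on the table
    print(plan(start, goal))                  # ['putdown(b)', 'stack(a,b)']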
[Figure: portion of the state space for the blocks world]
