
Artificial Intelligence

Rehearsal Lesson

Ram Meshulam, 2004


Solving Problems with Search
Algorithms
• Input: a problem P.
• Preprocessing:
– Define states and a state space
– Define Operators
– Define a start state and a set of goal states.
• Processing:
– Activate a search algorithm to find a path from the start state to one of the goal states.



Uninformed Search
• Uninformed search methods use only
information available in the problem
definition.
– Breadth First Search (BFS)
– Depth First Search (DFS)
– Iterative Deepening DFS (DFID)
– Bi-directional search
– Uniform Cost Search (a.k.a. Dijkstra alg.)



Breadth-First-Search Attributes
• Completeness – yes (if b and d are finite)
• Optimality – yes, if graph is un-
weighted.
• Time Complexity: O(1 + b + b^2 + … + b^d + (b^(d+1) − b)) = O(b^(d+1))
• Memory Complexity: O(b^(d+1))
– Where b is branching factor and d is the
solution depth



Depth-First-Search Attributes
• Completeness – No. Infinite loops or infinite depth can occur.
• Optimality – No.
• Time Complexity: O(b^m)
• Memory Complexity: O(bm)
– Where b is branching factor and m is the maximum depth of the search tree
(Figure: a search tree in which DFS returns a deep solution even though the optimal solution lies at a shallower node.)
Limited DFS Attributes
• Completeness – Yes, if d≤l
• Optimality – No.
• Time Complexity: O(b^l) = 1 + b + b^2 + b^3 + … + b^l
– If d < l, this is larger than in BFS, where 1 + b + b^2 + … + b^d = O(b^d)
• Memory Complexity: O(bl )
– Where b is branching factor and l is the
depth limit.



Depth-First Iterative-Deepening
(Figure: a tree whose nodes are labeled with the steps at which DFID generates them across successive iterations; the root is generated first in every iteration.)
The numbers represent the order in which the nodes are generated by DFID.


Iterative-Deepening Attributes
• Completeness – Yes

• Optimality – yes, if the graph is un-weighted.
• Time Complexity: O(d·b + (d−1)·b^2 + … + 1·b^d) = O(b^d)

• Memory Complexity: O(db)


– Where b is branching factor and d is the maximum
depth of search tree
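The following is a minimal Python sketch of depth-first iterative deepening as described above; the example graph, the node names and the successor representation are illustrative assumptions, not part of the lesson.

def depth_limited(node, goal, successors, limit, path):
    # Depth-limited DFS that returns the path to the goal, or None on cut-off.
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in successors.get(node, []):
        if child in path:          # avoid cycles along the current branch
            continue
        found = depth_limited(child, goal, successors, limit - 1, path + [child])
        if found:
            return found
    return None

def iterative_deepening(start, goal, successors, max_depth=50):
    for limit in range(max_depth + 1):   # grow the depth limit by 1 each iteration
        result = depth_limited(start, goal, successors, limit, [start])
        if result:
            return result                # shallowest solution found first
    return None

if __name__ == "__main__":
    graph = {"s": ["a", "b"], "a": ["c"], "b": ["d"], "c": ["g"], "d": []}
    print(iterative_deepening("s", "g", graph))   # ['s', 'a', 'c', 'g']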
State Redundancies
• Closed list - a hash table which holds the
visited nodes.
• For example, BFS: expanded nodes go into the closed list; the frontier is kept in the open list.
(Figure: BFS search tree partitioned into the closed list and the open list / frontier.)



Uniform Cost Search Attributes
• Completeness: yes, for positive weights
• Optimality: yes
• Time & Memory complexity: O(b^(c/e))
– Where b is the branching factor, c is the optimal solution cost and e is the minimum edge cost



Best First Search Algorithms
• Principle: Expand node n with the best
evaluation function value f(n).
• Implement via a priority queue
• Algorithms differ with definition of f :
– Greedy Search: f(n) = h(n)
– A*: f(n) = g(n) + h(n)
– IDA*: iterative deepening version of A*
– etc.



Best-FS Algorithm Pseudo code
1. Start with open = [initial-state].
2. While open is not empty do
1. Pick the best node on open.
2. If it is the goal node then return with success.
Otherwise find its successors.
3. Assign the successor nodes a score using the
evaluation function and add the scored nodes
to open
General Framework using Closed-list (Graph-Search)
GraphSearch(Graph graph, Node start, Vector goals)
1. O ← make_data_structure(start) // open list
2. C ← make_hash_table() // closed list
3. While O not empty loop
   1. n ← O.remove_front()
   2. If goal(n) return n
   3. If n is found on C → continue
   4. // otherwise
   5. O ← successors(n)
   6. C ← n
4. Return null // no goal found
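As a hedged illustration of this framework, here is a small Python sketch that keeps the open list in a priority queue ordered by a caller-supplied evaluation function f: with f = g it behaves like Uniform Cost, with f = g + h like A*, and with f = h like Greedy search. The function names, the toy graph and the heuristic values are my own assumptions.

import heapq

# Generic best-first Graph-Search with an open list (priority queue) and a
# closed list (set). `f(node, g)` is the evaluation function.

def graph_search(start, is_goal, successors, f):
    open_list = [(f(start, 0), 0, start, [start])]   # (f-value, g, node, path)
    closed = set()
    while open_list:
        _, g, node, path = heapq.heappop(open_list)
        if is_goal(node):
            return path, g
        if node in closed:          # duplicate pruning via the closed list
            continue
        closed.add(node)
        for child, cost in successors(node):
            new_g = g + cost
            heapq.heappush(open_list, (f(child, new_g), new_g, child, path + [child]))
    return None, float("inf")

if __name__ == "__main__":
    edges = {"s": [("a", 1), ("b", 3)], "a": [("g", 2)], "b": [("g", 1)], "g": []}
    h = {"s": 2, "a": 2, "b": 1, "g": 0}
    path, cost = graph_search("s", lambda n: n == "g", lambda n: edges[n],
                              f=lambda n, g: g + h[n])      # A*
    print(path, cost)    # ['s', 'a', 'g'] 3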



Greedy Search Attributes
• Completeness: No. Inaccurate heuristics can cause loops (unless a closed list is used) or lead the search down an infinite path
• Optimality: No. Inaccurate heuristics can lead to a non-optimal solution.
• Time & Memory complexity: O(b^m)
(Figure: a small graph with start s, intermediate nodes a (h=2) and b (h=1), goal g, and edge costs 1, 3, 2, 1, on which greedy search picks a non-optimal path.)
A* Algorithm (1)
• Combines greedy h(n) and uniform cost g(n)
approaches.
• Evaluation function: f(n)=g(n)+h(n)
• Completeness:
– In a finite graph: Yes
– In an infinite graph: if all edge costs are finite and have
a minimum positive value, and all heuristic values are
finite and non-negative.
• Optimality:
– In tree-search: if h(n) is admissible
– In graph-search: if it is also consistent



Heuristic Function h(n)
• Admissible/Underestimate: h(n) never overestimates the actual cost from n to the goal

• Consistent/monotonic (desirable):
h(m) − h(n) ≤ w(n,m), where m is the parent of n. This ensures f(n) ≥ f(m).



A* Algorithm (2)
• Optimally efficient: A* expands the minimal number of nodes possible with any given (consistent) heuristic.
• Time and space complexity:
– Worst case, cost function f(n) = g(n): O(b^(c/e))
– Best case, cost function f(n) = g(n) + h*(n): O(bd)
Duplicate Pruning
• Do not re-enter the parent of the current state
– With or without using a closed list

• Using a closed list, check the closed list before inserting new nodes into the open list
– Note: in A*, h has to be consistent!
– Do not remove the original check

• Using a stack, check the current branch and the stack status before inserting new nodes
IDA* Algorithm
• Each iteration is a depth-first search that
keeps track of the cost evaluation f = g + h
of each node generated.
• The cost threshold is initialized to the
heuristic of the initial state.
• If a node is generated whose cost exceeds
the threshold for that iteration, its path is cut
off.



IDA* Attributes
• The cost threshold increases in each iteration to
the total cost of the lowest-cost node that was
pruned during the previous iteration.
• The algorithm terminates when a goal state is
reached whose total cost does not exceed the
current threshold.
• Completeness and Optimality: Like A*
• Space complexity: O(c)
• Time complexity*: O(b^(c/e))
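A minimal Python sketch of IDA* following this description (threshold initialized to h(start) and raised to the lowest pruned f-value); the toy graph, heuristic and names are illustrative assumptions.

import math

# IDA*: repeated depth-first searches bounded by an f = g + h threshold.

def ida_star(start, is_goal, successors, h):
    def dfs(node, g, threshold, path):
        f = g + h(node)
        if f > threshold:
            return None, f                  # cut off; report the pruned f-value
        if is_goal(node):
            return path, f
        next_threshold = math.inf
        for child, cost in successors(node):
            if child in path:               # avoid cycles on the current branch
                continue
            found, t = dfs(child, g + cost, threshold, path + [child])
            if found:
                return found, t
            next_threshold = min(next_threshold, t)
        return None, next_threshold

    threshold = h(start)                    # initial threshold = heuristic of the start state
    while True:
        found, t = dfs(start, 0, threshold, [start])
        if found:
            return found
        if t == math.inf:
            return None                     # no goal reachable
        threshold = t                       # lowest f-value pruned in the last iteration

if __name__ == "__main__":
    edges = {"s": [("a", 1), ("b", 3)], "a": [("g", 2)], "b": [("g", 1)], "g": []}
    h = {"s": 2, "a": 2, "b": 1, "g": 0}
    print(ida_star("s", lambda n: n == "g", lambda n: edges[n], h.__getitem__))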
Local Search – Cont.
• In order to avoid local maxima and plateaus we permit moves to states with lower values, with probability p.
• The different algorithms differ in p:
– p = 0: Hill Climbing, GSAT
– p = 1: Random Walk
– p = c (domain specific): Mixed Walk, Mixed GSAT
– p = acceptor(dh, T): Simulated Annealing



Hill Climbing
• Always choose the next best successor
• Stop when no improvement possible
• In order to avoid plateaus and local maxima (see the sketch after this slide):
- Sideways moves
- Stochastic hill climbing
- Random-restart algorithm
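Below is a small Python sketch of hill climbing with random restarts; the one-dimensional objective, the neighbourhood and the restart count are toy assumptions chosen only to illustrate the control flow.

import random

# Hill climbing with random restarts on a toy 1-D integer objective.

def value(x):
    return -(x - 7) ** 2          # single global maximum at x = 7

def neighbours(x):
    return [x - 1, x + 1]

def hill_climb(start):
    current = start
    while True:
        best = max(neighbours(current), key=value)
        if value(best) <= value(current):     # no improving successor: stop
            return current
        current = best

def random_restart_hill_climb(restarts=10, low=-100, high=100):
    best = None
    for _ in range(restarts):
        candidate = hill_climb(random.randint(low, high))
        if best is None or value(candidate) > value(best):
            best = candidate
    return best

if __name__ == "__main__":
    print(random_restart_hill_climb())   # 7 on this simple objective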



Simulated Annealing – Pseudo code (Cont.)
• Acceptor function example:
c = e^(dh / t), 0 ≤ c ≤ 1

• Schedule function example:
t = c^round · startTemp, 0 < c < 1
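As a hedged illustration of the acceptor and schedule examples above (assuming the acceptance probability e^(dh/t) and the geometric schedule t = c^round · startTemp reconstructed from the slide), here is a short Python sketch; the constants are arbitrary.

import math
import random

# Simulated-annealing move acceptance with a geometric cooling schedule.
# start_temp and c below are made-up constants for illustration.

def acceptance_probability(delta_h, t):
    if delta_h >= 0:                    # improving (or equal) moves are always taken
        return 1.0
    return math.exp(delta_h / t)        # worsening moves accepted with probability e^(dh/t)

def temperature(round_number, start_temp=10.0, c=0.9):
    return start_temp * (c ** round_number)    # schedule: t = c^round * startTemp

def maybe_move(current_value, candidate_value, round_number):
    delta_h = candidate_value - current_value
    t = temperature(round_number)
    return random.random() < acceptance_probability(delta_h, t)

if __name__ == "__main__":
    # Early rounds (high t) accept bad moves often; late rounds rarely do.
    for r in (0, 10, 50):
        print(r, acceptance_probability(-2.0, temperature(r)))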



Search Algorithms Hierarchy



Exercise
• What are the different data structures used to implement the open list in BFS, DFS and Best-FS?
– BFS: Queue
– DFS: Stack
– Best-FS (Greedy, A*, Uniform-Cost): Priority Queue



Minimax
• Perfect play for deterministic games
• Idea: choose move to position with highest minimax value
= best achievable payoff against best play
• E.g., 2-ply game:



Properties of minimax
• Complete? (= will not run forever) Yes (if the tree is finite)
• Optimal? (= will find the optimal response) Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration), O(bm) for saving the optimal response

For chess, b ≈ 35, m ≈ 100 for "reasonable" games
→ exact solution completely infeasible
α-β pruning example

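A compact Python sketch of minimax with α-β pruning on an explicit 2-ply game tree; the tree shape and leaf values are illustrative assumptions (the standard textbook example), not taken from these slides.

import math

# Minimax with alpha-beta pruning on an explicit game tree.
# Leaves are numbers; internal nodes are lists of children.

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):          # leaf: return its utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:               # beta cut-off: MIN will avoid this branch
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:               # alpha cut-off
                break
        return value

if __name__ == "__main__":
    # 2-ply tree: MAX to move, three MIN nodes with three leaves each.
    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    print(alphabeta(tree, maximizing=True))   # 3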


Planning
• Traditional search methods do not fit large, real-world problems
• We want to use general knowledge
• We need general heuristics
• Problem decomposition



STRIPS – Representation
• States and goal – sentences in FOL.
• Operators – are composed of 3 parts:
– Operator name
– Preconditions – a sentence describing the conditions that must hold so that the operator can be executed.
– Effect – a sentence describing how the world has changed as a result of executing the operator. Has 2 parts:
• Add-list
• Delete-list
– Optionally, a set of (simple) variable constraints
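For concreteness, a hypothetical STRIPS-style operator coded as a small Python structure; the Move(x, a, b) operator, its predicates and the dataclass layout are my own illustration, not part of the slides.

from dataclasses import dataclass, field

# A STRIPS-style operator: name, preconditions, add-list and delete-list.

@dataclass
class Operator:
    name: str
    preconditions: set = field(default_factory=set)
    add_list: set = field(default_factory=set)
    delete_list: set = field(default_factory=set)

    def applicable(self, state):
        return self.preconditions <= state          # all preconditions hold in the state

    def apply(self, state):
        return (state - self.delete_list) | self.add_list

move_x_a_b = Operator(
    name="Move(x, a, b)",
    preconditions={"At(x, a)", "Path(a, b)"},
    add_list={"At(x, b)"},
    delete_list={"At(x, a)"},
)

state = {"At(x, a)", "Path(a, b)"}
if move_x_a_b.applicable(state):
    print(move_x_a_b.apply(state))   # {'Path(a, b)', 'At(x, b)'}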



Choosing an attribute
• Idea: a good attribute splits the examples into subsets that
are (ideally) "all positive" or "all negative"

• Patrons? is a better choice



Using information theory
• To implement Choose-Attribute in the DTL
algorithm
• Information Content of an answer (Entropy):
I(P(v1), …, P(vn)) = Σ_{i=1..n} −P(vi) log2 P(vi)
• For a training set containing p positive examples
and n negative examples:
I(p/(p+n), n/(p+n)) = −(p/(p+n))·log2(p/(p+n)) − (n/(p+n))·log2(n/(p+n))



Information gain
• A chosen attribute A divides the training set E into subsets
E1, … , Ev according to their values for A, where A has v
distinct values.
remainder(A) = Σ_{i=1..v} ((pi + ni)/(p + n)) · I(pi/(pi + ni), ni/(pi + ni))
• Information Gain (IG) or reduction in entropy from the
attribute test:
IG(A) = I(p/(p + n), n/(p + n)) − remainder(A)
• Choose the attribute with the largest IG
Information gain
For the training set, p = n = 6, I(6/12, 6/12) = 1 bit

Consider the attributes Patrons and Type (and others too):

IG(Patrons) = 1 − [ (2/12)·I(0,1) + (4/12)·I(1,0) + (6/12)·I(2/6, 4/6) ] ≈ 0.541 bits
IG(Type) = 1 − [ (2/12)·I(1/2,1/2) + (2/12)·I(1/2,1/2) + (4/12)·I(2/4,2/4) + (4/12)·I(2/4,2/4) ] = 0 bits
Patrons has the highest IG of all attributes and so is chosen by the DTL
algorithm as the root
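A short Python check of these numbers, computing I(), remainder() and IG() exactly as defined on the previous slides; the per-value subset counts below follow the restaurant example.

import math

# Entropy I(p1, ..., pn) and information gain for the Patrons/Type attributes.

def I(*probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

def remainder(subsets, total):
    # subsets: list of (p_i, n_i) pairs, one per attribute value
    return sum((p + n) / total * I(p / (p + n), n / (p + n)) for p, n in subsets)

def info_gain(subsets, p, n):
    return I(p / (p + n), n / (p + n)) - remainder(subsets, p + n)

# Restaurant training set: p = n = 6.
patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
types   = [(1, 1), (1, 1), (2, 2), (2, 2)]  # French, Italian, Thai, Burger

print(round(info_gain(patrons, 6, 6), 3))   # 0.541
print(round(info_gain(types, 6, 6), 3))     # 0.0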



Bayes’ Rule
P(B|A) = P(A|B) · P(B) / P(A)

Computing the denominator:
#1 approach – compute relative likelihoods:
• If M (meningitis) and W (whiplash) are two possible explanations
#2 approach – using M & ~M:
• Check the probabilities of M and ~M given S
– P(M|S) = P(S|M) * P(M) / P(S)
– P(~M|S) = P(S|~M) * P(~M) / P(S)
• P(M|S) + P(~M|S) = 1 (must sum to 1)
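A tiny Python illustration of approach #2, computing the denominator P(S) by summing over M and ~M; the prior and likelihood numbers are made up purely for illustration.

# Normalization over M and ~M: P(S) = P(S|M)P(M) + P(S|~M)P(~M).
# The numbers below are made-up assumptions, not from the slides.

p_m = 1 / 50_000          # prior P(M)
p_s_given_m = 0.7         # P(S | M)
p_s_given_not_m = 0.01    # P(S | ~M)

p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)   # the denominator P(S)
p_m_given_s = p_s_given_m * p_m / p_s
p_not_m_given_s = p_s_given_not_m * (1 - p_m) / p_s

print(p_m_given_s, p_not_m_given_s, p_m_given_s + p_not_m_given_s)  # sums to 1.0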

Perceptrons
• Linear separability
– A set of (2D) patterns (x1, x2) of two classes is linearly
separable if there exists a line on the (x1, x2) plane
• w0 + w1 x1 + w2 x2 = 0
• Separates all patterns of one class from the other class
– A perceptron can be built with
• 3 inputs x0 = 1, x1, x2 with weights w0, w1, w2
– n dimensional patterns (x1,…, xn)
• Hyperplane w0 + w1 x1 + w2 x2 +…+ wn xn = 0 dividing the
space into two regions
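A minimal Python sketch of a perceptron decision using the hyperplane above; the particular weights (which realize logical AND) and the test patterns are illustrative assumptions.

# Perceptron decision: classify a pattern by the sign of w0 + w1*x1 + ... + wn*xn.

def perceptron_output(weights, pattern):
    # weights = [w0, w1, ..., wn]; the bias input x0 = 1 is implicit.
    activation = weights[0] + sum(w * x for w, x in zip(weights[1:], pattern))
    return 1 if activation >= 0 else 0

weights = [-1.5, 1.0, 1.0]     # separates the AND-positive pattern from the rest
for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pattern, perceptron_output(weights, pattern))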



Backpropagation example

(Network: inputs x1 and x2 feed hidden units x3 and x4 through weights w13, w14, w23, w24; x3 and x4 feed the output unit x5 through w35 and w45.)

Sigmoid as activation function with x=3:


• g(in) = 1/(1 + e^(−3·in))
• g’(in) = 3g(in)(1-g(in))



Adding the threshold

(Same network with bias units added: a constant input x0 = 1 feeds x3 and x4 through w03 and w04, and a constant unit x6 = 1 feeds the output x5 through w65.)



Training Set
• Logical XOR (exclusive OR) function
x1 x2 output
0 0 0
0 1 1
1 0 1
1 1 0

• Choose random weights


• <w03,w04,w13,w14,w23,w24,w65,w35,w45> =
<0.03,0.04,0.13,0.14,-0.23,-0.24,0.65,0.35,0.45>

• Learning rate: 0.1 for the hidden layers, 0.3 for the output layer



First Example
• Compute the outputs
• a0 = 1 , a1= 0 , a2 = 0
• a3 = g(1*0.03 + 0*0.13 + 0*-0.23) = 0.522
• a4 = g(1*0.04 + 0*0.14 + 0*-0.24) = 0.530
• a6 = 1, a5 = g(0.65*1 + 0.35*0.522 + 0.45*0.530) = 0.961
• Calculate ∆5 = 3*g(1.0712)*(1-g(1.0712))*(0-0.961) = -0.108
• Calculate ∆6, ∆3, ∆4
• ∆6 = 3*g(1)*(1-g(1))*(0.65*-0.108) = -0.010
• ∆3 = 3*g(0.03)*(1-g(0.03))*(0.35*-0.108) = -0.028
• ∆4 = 3*g(0.04)*(1-g(0.04))*(0.45*-0.108) = -0.036
• Update weights for the output layer
• w65 = 0.65 + 0.3*1*-0.108 = 0.618
• w35 = 0.35 + 0.3*0.522*-0.108 = 0.333
• w45 = 0.45 + 0.3*0.530*-0.108 = 0.433
First Example (cont)
• Calculate ∆0, ∆1, ∆2
• ∆0 = 3*g(1)*(1-g(1))*(0.03*-0.028 + 0.04*-0.036) = -0.001
• ∆1 = 3*g(0)*(1-g(0))*(0.13*-0.028 + 0.14*-0.036) = -0.006
• ∆2 = 3*g(0)*(1-g(0))*(-0.23*-0.028 + -0.24*-0.036) = 0.011
• Update weights for the hidden layer
• w03 = 0.03 + 0.1*1*-0.028 = 0.027
• w04 = 0.04 + 0.1*1*-0.036 = 0.036
• w13 = 0.13 + 0.1*0*-0.028 = 0.13
• w14 = 0.14 + 0.1*0*-0.036 = 0.14
• w23 = -0.23 + 0.1*0*-0.028 = -0.23
• w24 = -0.24 + 0.1*0*-0.036 = -0.24
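The first worked example can be reproduced (up to rounding) with a few lines of Python; the dictionary-based weight layout and variable names are my own, but the formulas follow the slides: g(in) = 1/(1 + e^(−3·in)), output delta g'(in)·(target − a5), hidden deltas g'(in)·w·Δ5, and updates w ← w + α·a·Δ.

import math

# Forward and backward pass for the first XOR example: input (0, 0), target 0.

def g(x):
    return 1.0 / (1.0 + math.exp(-3.0 * x))

def g_prime(x):
    return 3.0 * g(x) * (1.0 - g(x))

w = {"03": 0.03, "04": 0.04, "13": 0.13, "14": 0.14, "23": -0.23,
     "24": -0.24, "65": 0.65, "35": 0.35, "45": 0.45}
a0, a1, a2, a6 = 1.0, 0.0, 0.0, 1.0
target = 0.0

# Forward pass
in3 = a0 * w["03"] + a1 * w["13"] + a2 * w["23"]
in4 = a0 * w["04"] + a1 * w["14"] + a2 * w["24"]
a3, a4 = g(in3), g(in4)
in5 = a6 * w["65"] + a3 * w["35"] + a4 * w["45"]
a5 = g(in5)
print("a3, a4, a5 =", round(a3, 3), round(a4, 3), round(a5, 3))   # ~0.522 0.53 0.961

# Backward pass
d5 = g_prime(in5) * (target - a5)                 # ~ -0.107 (slide rounds to -0.108)
d3 = g_prime(in3) * w["35"] * d5                  # ~ -0.028
d4 = g_prime(in4) * w["45"] * d5                  # ~ -0.036
print("d5, d3, d4 =", round(d5, 3), round(d3, 3), round(d4, 3))

# Weight updates (learning rate 0.3 for the output layer, 0.1 for the hidden layer)
w["65"] += 0.3 * a6 * d5
w["35"] += 0.3 * a3 * d5
w["45"] += 0.3 * a4 * d5
w["03"] += 0.1 * a0 * d3
w["04"] += 0.1 * a0 * d4
print("w65, w35, w45 =", round(w["65"], 3), round(w["35"], 3), round(w["45"], 3))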



Second Example
• Compute the outputs
• a0 = 1, a1= 0 , a2 = 1
• a3 = g(1*0.027 + 0*0.13 + 1*-0.23) = 0.352
• a4 = g(1*0.036 + 0*0.14 + 1*-0.24) = 0.352
• a6 = 1, a5 = g(0.618*1 + 0.333*0.352 + 0.433*0.352) = 0.935
• Calculate ∆5 = 3*g(0.888)*(1-g(0.888))*(1-0.935) = 0.012
• Calculate ∆6, ∆3, ∆4
• ∆6 = 3*g(1)*(1-g(1))*(0.618*0.012) = 0.001
• ∆3 = 3*g(-0.203)*(1-g(-0.203))*(0.333*0.012) = 0.003
• ∆4 = 3*g(-0.204)*(1-g(-0.204))*(0.433*0.012) = 0.004
• Update weights for the output layer
• w65 = 0.618 + 0.3*1*0.012 = 0.623
• w35 = 0.333 + 0.3*0.352*0.012 = 0.334
• w45 = 0.433 + 0.3*0.352*0.012 = 0.434
Second Example (cont)
• Calculate ∆0, ∆1, ∆2
• Skipped, we do not use them
• Update weights for the hidden layer
• w03 = 0.027 + 0.1*1*0.003 = 0.027
• w04 = 0.036 + 0.1*1*0.004 = 0.036
• w13 = 0.13 + 0.1*0*0.003 = 0.13
• w14 = 0.14 + 0.1*0*0.004 = 0.14
• w23 = -0.23 + 0.1*1*0.003 = -0.23
• w24 = -0.24 + 0.1*1*0.004 = -0.24



Bayesian networks
• Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ "directly influences")
– a conditional distribution for each node given its parents:
P (Xi | Parents (Xi))- conditional probability table (CPT)



Calculation of Joint Probability
• Given its parents, each node is conditionally independent
of everything except its descendants

• Thus,
P(x1, x2, …, xn) = Π_{i=1..n} P(xi | parents(Xi))
→ the full joint distribution table
• Every BN over a domain implicitly represents some joint
distribution over that domain
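A tiny Python illustration of this product for a three-node chain X → Y → Z; the structure and CPT numbers are illustrative assumptions, not a network from the lesson.

# Joint probability as a product of per-node CPTs: P(x, y, z) = P(x) P(y|x) P(z|y).

p_x = {True: 0.3, False: 0.7}
p_y_given_x = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
p_z_given_y = {True: {True: 0.5, False: 0.5}, False: {True: 0.05, False: 0.95}}

def joint(x, y, z):
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

# Sanity check: the implicitly represented joint distribution sums to 1.
total = sum(joint(x, y, z) for x in (True, False)
            for y in (True, False) for z in (True, False))
print(joint(True, True, False), total)   # 0.135, ~1.0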



Connection Types
• Causal chain (X → Y → Z): X ind. Z given Y? Yes. X ind. Z? Not necessarily.
• Common Cause (X ← Y → Z): X ind. Z given Y? Yes. X ind. Z? No.
• Common Effect (X → Y ← Z): X ind. Z given Y? No. X ind. Z? Yes.



Reachability (the Bayes Ball)
• Shade evidence nodes
• Start at source node
• Try to reach target by search
• States: node, along with previous arc
• Successor function:
– Unobserved nodes:
• To any child of X
• To any parent of X if coming from a child
– Observed nodes:
• From parent of X to parent of X
• If you can’t reach a node, it’s conditionally independent of
the start node
Naive Bayes Classifiers
Task: Classify a new instance D based on a tuple of attribute values into one of the classes cj ∈ C

D = ⟨x1, x2, …, xn⟩

cMAP = argmax_{c ∈ C} P(c | x1, x2, …, xn)
     = argmax_{c ∈ C} P(x1, x2, …, xn | c) P(c) / P(x1, x2, …, xn)
     = argmax_{c ∈ C} P(x1, x2, …, xn | c) P(c)



Robots Environment Assumptions
• Static - to be able to guarantee completeness
• Inaccessible - greater impact on the on-line version
• Non-deterministic (commanded to move 5 m, but may actually move 5.1 m)
• Continuous
– Exact cellular decomposition
– Approximate cellular decomposition

MSTC- Multi Robot Spanning Tree
Coverage
• Complete - with approximate cellular decomposition
• Robust
– Coverage completed as long as one robot is alive
– The robustness mechanism is simple
• Off-line and On-line algorithms
– Off-line:
o Analysis according to initial positions
o Efficiency improvements
– On-line:
o Implemented in a simulation of real robots

Off-line Coverage, Basic Assumptions
• Area division – n cells
• k homogenous robots
• Equal associated tool size
• Robots movement

STC: Spanning Tree Coverage
(Gabriely and Rimon 2001)
• Area division
• Graph definition
• Building the spanning tree

Non-backtracking MSTC
• Initialization phase: Build STC, distribute to robots
• Distributed execution: Each robot follows its section
– Low risk of collisions

(Figure: robots A, B and C each cover their own section of the spanning tree; "Robot A is done!", "Robot B is done!", "Robot C is done!")
Backtracking MSTC
• Similar initialization phase
• Robots backtrack to assist others
• No point is covered more than twice
(Figure: robots A, B, C and D on the spanning tree, backtracking to assist one another.)

