Artificial Intelligence Final Exam May 9, 2012

Problem 1: 15 points
The following problem is known as the BETWEENNESS problem. You are given a universal set U of elements and a collection of constraints of the form Between(x,y,z), where x, y, z are specified elements of U, meaning that either x < y < z or x > y > z. The problem is to find a linear ordering of the elements satisfying all the constraints (or, as a decision problem, to determine whether such an ordering exists). The problem is NP-complete.

For instance, suppose that U is the set {a,b,c,d,e,f,g,h} and you are given the following constraints:

Between(d,a,h), Between(f,c,g), Between(a,b,c), Between(a,h,f), Between(h,g,d), Between(a,c,h), Between(a,e,f), Between(f,h,c)

One solution for these constraints is the ordering d,a,g,b,c,h,e,f.

A. Describe a state space that would be suitable for an exhaustive blind search. Your description should specify (a) what a state is; (b) what the successors to a given state are; (c) what the start state is; (d) how goal states are recognized.

Answer: State: the first k items in the ordering. Successors: add a (k+1)st item at the end of the list, in such a way that none of the Between constraints is violated. Start state: the null list. Goal state: all the items are in the list, with all the constraints satisfied.

B. In the state space in (A), what is the depth of the space? What is the branching factor? Is the space a tree? Which search strategy would be most suitable: depth-first search, breadth-first search, or iterative deepening?

Answer: Let n be the number of elements. Depth: n. Branching factor: at most n. Tree? Yes. Strategy: DFS.
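To make the search concrete, here is a minimal Python sketch of the blind depth-first search described in (A) and (B), run on the example instance. It is not part of the exam; the helper names (consistent, dfs) are my own.

```python
def consistent(prefix, constraints):
    """Check that no Between(x,y,z) constraint is violated by the elements
    placed so far. A violation is detected once all three elements of a
    constraint appear in the prefix."""
    pos = {e: i for i, e in enumerate(prefix)}
    for x, y, z in constraints:
        if x in pos and y in pos and z in pos:
            if not (pos[x] < pos[y] < pos[z] or pos[x] > pos[y] > pos[z]):
                return False
    return True

def dfs(prefix, universe, constraints):
    """Extend the prefix one element at a time (the successor function);
    a goal is a complete, consistent ordering."""
    if len(prefix) == len(universe):
        return prefix
    for e in universe:
        if e not in prefix and consistent(prefix + [e], constraints):
            result = dfs(prefix + [e], universe, constraints)
            if result is not None:
                return result
    return None

U = list("abcdefgh")
C = [("d","a","h"), ("f","c","g"), ("a","b","c"), ("a","h","f"),
     ("h","g","d"), ("a","c","h"), ("a","e","f"), ("f","h","c")]
print(dfs([], U, C))  # prints one satisfying ordering, not necessarily d,a,g,b,c,h,e,f
```

Checking a constraint only once all three of its elements have been placed gives less pruning than also rejecting prefixes that can no longer be extended (e.g. x and z placed but y not yet), but the search remains correct.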

C. The BETWEENNESS problem can be transformed into a satisfiability problem, using propositional atoms of the form "Element-Place", meaning that the specified element is in the specified place. In the above example, there would be 64 atoms a1 ... a8, b1 ... h8, and the valuation corresponding to the above solution would be d1=TRUE, a2=TRUE, ..., f8=TRUE, with all the other atoms FALSE. Describe the constraints over these atoms needed to solve the problem. Your description should apply to any problem instance, not just to the one illustrated. For each category of constraint, give one instance of the constraint from the above example (or describe the instance; in some formulations, the constraints get very long). Your formulation should generate a set of propositional sentences that is polynomial in the size of the problem; however, the sentences do not have to be in conjunctive normal form.

Answer:
Category 1: Every element is somewhere in the list. For each element x, assert that x is at one of the places. Example: a1 ∨ a2 ∨ ... ∨ a8.
Category 2: No element is at two places in the list. For each element x and each pair of places i < j, assert ¬(xi ∧ xj). Example: ¬(a1 ∧ a2).
Category 3: No place has two elements. For each place i and each pair of elements x ≠ y, assert ¬(xi ∧ yi). Example: ¬(a1 ∧ b1).
Category 4: The betweenness constraints. For each constraint Between(x,y,z) and each pair of places i, k that are at least 2 apart, assert that if x is at place i and z is at place k, then y is at one of the places between i and k. Example: the constraint Between(d,a,h) gives rise to (among others) the sentence (d7 ∧ h3) ⇒ (a4 ∨ a5 ∨ a6).
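A sketch of a generator for this encoding, writing each sentence as a plain string; the connective spellings ~, ^, v, => and the function name are my own:

```python
from itertools import combinations, permutations

def encode(universe, constraints):
    """Emit the four categories of sentences; atom "a3" means element a
    is in place 3. The output is polynomial in the instance size."""
    n = len(universe)
    sentences = []
    # Category 1: every element is somewhere.
    for x in universe:
        sentences.append(" v ".join(f"{x}{i}" for i in range(1, n + 1)))
    # Category 2: no element is at two places.
    for x in universe:
        for i, j in combinations(range(1, n + 1), 2):
            sentences.append(f"~({x}{i} ^ {x}{j})")
    # Category 3: no place has two elements.
    for x, y in combinations(universe, 2):
        for i in range(1, n + 1):
            sentences.append(f"~({x}{i} ^ {y}{i})")
    # Category 4: if x is at place i and z at place k, y lies strictly between.
    for x, y, z in constraints:
        for i, k in permutations(range(1, n + 1), 2):
            between = " v ".join(f"{y}{m}" for m in range(min(i, k) + 1, max(i, k)))
            if between:
                sentences.append(f"({x}{i} ^ {z}{k}) => ({between})")
            else:
                # adjacent places leave no room for y at all
                sentences.append(f"~({x}{i} ^ {z}{k})")
    return sentences

print(len(encode(list("abc"), [("a", "b", "c")])))  # 27 sentences for this tiny instance
```

The adjacent-place negations in the last loop cover the case the Category 4 description leaves implicit: an empty disjunction of middle places is simply false.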

Problem 2: 10 points

Consider the following set of propositional formulas: P ∨ Q; P ⇒ ¬(Q ∨ R); ¬X ⇒ (Q ∨ W); W ⇒ ¬P; (¬W ∧ X) ⇒ R.

A. Convert these to CNF.

Answer:
1. P ∨ Q.
2. ¬P ∨ ¬Q.
3. ¬P ∨ ¬R.
4. X ∨ Q ∨ W.
5. W ∨ ¬X ∨ R.
6. ¬W ∨ ¬P.

B. Give a trace of the execution of the Davis-Putnam algorithm on these clauses. When a choice point is reached, choose the first unbound atom alphabetically, and try "TRUE" before "FALSE".

Answer:
State S0: the full set of clauses above, with the empty valuation. No easy operations (no singleton clauses, no pure literals). Choice point: try P=TRUE. Delete (1); delete ¬P from (2), (3), and (6).
State S1: 2. ¬Q; 3. ¬R; 4. X ∨ Q ∨ W; 5. W ∨ ¬X ∨ R; 6. ¬W. Clauses 2, 3, and 6 are singleton clauses. Set Q=FALSE, R=FALSE, W=FALSE; delete Q and W from (4), and R and W from (5), leaving 4. X and 5. ¬X. Now 4 is a singleton clause. Set X=TRUE; delete (4); delete ¬X from (5). Now 5 is the null clause. Backtrack to the last choice point and try P=FALSE. Delete (2), (3), (6); delete P from (1).
State S2: 1. Q; 4. X ∨ Q ∨ W; 5. W ∨ ¬X ∨ R. Q and W are pure literals. Set Q=TRUE and W=TRUE; delete (1), (4), (5). All clauses are satisfied.
Solution: P=FALSE, Q=TRUE, W=TRUE; X and R are arbitrary.
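For reference, a compact Python sketch of the procedure following the conventions above: singleton (unit) clauses and pure literals are handled first, and choice points branch on the alphabetically first unbound atom, trying TRUE before FALSE. The representation of clauses as lists of string literals is my own choice.

```python
def assign(clauses, lit):
    """Make a literal true: drop satisfied clauses, shrink the rest."""
    neg = lit[1:] if lit.startswith("~") else "~" + lit
    return [[l for l in c if l != neg] for c in clauses if lit not in c]

def dp(clauses, val):
    if not clauses:
        return val                         # every clause satisfied
    if any(c == [] for c in clauses):
        return None                        # null clause: backtrack
    for c in clauses:                      # singleton clauses are forced
        if len(c) == 1:
            lit = c[0]
            return dp(assign(clauses, lit),
                      {**val, lit.lstrip("~"): not lit.startswith("~")})
    lits = {l for c in clauses for l in c}
    for l in sorted(lits):                 # pure literals are safe to set
        neg = l[1:] if l.startswith("~") else "~" + l
        if neg not in lits:
            return dp(assign(clauses, l),
                      {**val, l.lstrip("~"): not l.startswith("~")})
    atom = min(l.lstrip("~") for l in lits)
    for choice in (True, False):           # choice point: TRUE first
        result = dp(assign(clauses, atom if choice else "~" + atom),
                    {**val, atom: choice})
        if result is not None:
            return result
    return None

clauses = [["P", "Q"], ["~P", "~Q"], ["~P", "~R"],
           ["X", "Q", "W"], ["W", "~X", "R"], ["~W", "~P"]]
print(dp(clauses, {}))
```

Run on the six clauses, it tries P=TRUE, derives the null clause, backtracks, and returns {'P': False, 'Q': True, 'R': True}, which satisfies all six clauses; it happens to close with the pure literal R where the hand trace above picked W.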

Problem 3: 20 points

Let U be a domain of people and books, and let L be a first-order language over U with the following symbols:
B(b,p) — book b is a biography of person p.
G(p1,p2,b) — person p1 gave a copy of book b to person p2.
R(p,b) — person p has read book b.
W(p,b) — person p wrote book b.
F(p1,p2) — persons p1 and p2 are friends.
A, C, P, S — Anne, Charles, Pamela, Sam.

A. Express the following in L:
i. Sam gave a copy of a book by Anne to Pamela.
Answer: ∃b G(S,P,b) ∧ W(A,b).
ii. The only books that Anne has written are biographies of friends of Charles.
Answer: ∀b W(A,b) ⇒ ∃p F(p,C) ∧ B(b,p).
iii. Sam only gives people a book if he has read it.
Answer: ∀p,b G(S,p,b) ⇒ R(S,b).
iv. Sam has read some biography of a friend of Charles.
Answer: ∃b,p R(S,b) ∧ B(b,p) ∧ F(p,C).

B. Show how (iv) can be proven from (i-iii) using resolution theorem proving. Your answer should show the clauses generated and the resolutions that lead to the solution. You need not show the intermediate steps of Skolemization or the substeps of the resolution process (unification etc.).

Answer: Skolemizing (i-iii) and the negation of (iv) gives the following clauses:
1. G(S,P,sk1).
2. W(A,sk1).
3. ¬G(S,p,b) ∨ R(S,b).
4. ¬W(A,b) ∨ B(b,sk2(b)).
5. ¬W(A,b) ∨ F(sk2(b),C).
6. ¬R(S,b) ∨ ¬B(b,p) ∨ ¬F(p,C).
Resolving (1) with (3) gives 7. R(S,sk1).
Resolving (2) with (4) gives 8. B(sk1,sk2(sk1)).
Resolving (2) with (5) gives 9. F(sk2(sk1),C).
Resolving (7) with (6) gives 10. ¬B(sk1,p) ∨ ¬F(p,C).
Resolving (9) with (10) gives 11. ¬B(sk1,sk2(sk1)).
Resolving (8) with (11) gives the null clause.
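Because sk1 is a constant and every step instantiates b to sk1 and p to sk2(sk1) (or to P in clause 3), the refutation is essentially ground, so it can be checked by a toy propositional saturation. A sketch, with the ground clauses as frozensets of string literals (a representation of my own, assuming those instantiations):

```python
def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 against c2 on one literal."""
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            yield (c1 - {lit}) | (c2 - {comp})

clauses = {
    frozenset({"G(S,P,sk1)"}),                    # 1
    frozenset({"W(A,sk1)"}),                      # 2
    frozenset({"~G(S,P,sk1)", "R(S,sk1)"}),       # 3 with p -> P, b -> sk1
    frozenset({"~W(A,sk1)", "B(sk1,sk2(sk1))"}),  # 4 with b -> sk1
    frozenset({"~W(A,sk1)", "F(sk2(sk1),C)"}),    # 5 with b -> sk1
    frozenset({"~R(S,sk1)", "~B(sk1,sk2(sk1))",   # 6 with b -> sk1,
               "~F(sk2(sk1),C)"}),                #   p -> sk2(sk1)
}

saturated = False
while not saturated:               # saturate: resolve until nothing new appears
    new = {r for c1 in clauses for c2 in clauses for r in resolvents(c1, c2)}
    saturated = new <= clauses
    clauses |= new

print(frozenset() in clauses)      # True: the null clause is derivable
```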

Problem 4: 15 points

Consider the following Bayesian network:

[Figure: a Bayesian network over the variables P, Q, R, S, W, with arcs P→Q, S→Q, S→W, Q→R, and W→R.]

A. Which of the following statements are true (more than one may be true)?
i. P and W are conditionally independent given S. True.
ii. P and W are absolutely independent. True.
iii. S and R are absolutely independent. False.
iv. S and R are conditionally independent given Q. False.
v. S and R are conditionally independent given Q and W. True.
vi. Q and W are absolutely independent. False.
vii. Q and W are conditionally independent given S. True.

B. Suppose that all the variables take 5 values. How many numeric values are recorded in the network? Use the fact that the probabilities of the outcomes of a random variable on a fixed condition add up to 1 to achieve as compact a representation as possible.

Answer: 4 at P; 4 at S; 4*25 = 100 at Q; 4*25 = 100 at R; 4*5 = 20 at W. Total: 228.

C. Suppose that all the random variables are Boolean. Give an expression for Prob(Q=F|P=F) in terms of quantities that are recorded in the Bayesian network. I am not interested in your ability to do arithmetic.

Answer:
Prob(Q=F|P=F) = Prob(Q=F,S=T|P=F) + Prob(Q=F,S=F|P=F)
= Prob(Q=F|S=T,P=F)*Prob(S=T|P=F) + Prob(Q=F|S=F,P=F)*Prob(S=F|P=F)
= Prob(Q=F|S=T,P=F)*Prob(S=T) + Prob(Q=F|S=F,P=F)*Prob(S=F),
using the fact that S and P are absolutely independent; Prob(Q|S,P) and Prob(S) are quantities recorded in the network.

Problem 5: 10 points

Consider a classification problem where W, Y, and Z are the predictive attributes and C is the classification attribute; all attributes are Boolean, and you have the following data set:

W  Y  Z  C  | number of instances
T  T  T  T  | 5
T  T  T  F  | 8
T  F  F  T  | 2
T  F  ⊥  F  | 3
F  T  F  F  | 6
F  F  F  T  | 1
F  ⊥  F  F  | 2

The value ⊥ is the null value. How does the Naive Bayes classifier predict the value of C for W=F, Y=F, Z=T? You can leave your answer as an unevaluated arithmetic expression. Note that the instances with a null value for attribute A are discounted in the calculation for A, but are included in the calculations for the other attributes.

Answer: Using Naive Bayes,
P(C=T | W=F,Y=F,Z=T) ∝ P(W=F|C=T) · P(Y=F|C=T) · P(Z=T|C=T) · P(C=T) = (1/8) · (3/8) · (5/8) · (8/27);
P(C=F | W=F,Y=F,Z=T) ∝ P(W=F|C=F) · P(Y=F|C=F) · P(Z=T|C=F) · P(C=F) = (8/19) · (3/17) · (8/16) · (19/27).
The second product is the larger (about 0.026 versus 0.009), so the classifier predicts C=F.
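A sketch of the same Naive Bayes computation in Python, including the discounting rule for null values; the data is the table above, and the helper names are mine:

```python
NULL = None
# Each row is (W, Y, Z, C, number of instances); NULL marks the value ⊥.
data = [(True,  True,  True,  True,  5), (True,  True,  True,  False, 8),
        (True,  False, False, True,  2), (True,  False, NULL,  False, 3),
        (False, True,  False, False, 6), (False, False, False, True,  1),
        (False, NULL,  False, False, 2)]

def cond_prob(attr_idx, attr_val, c_val):
    """P(attribute = attr_val | C = c_val); instances where this
    attribute is null are discounted from both counts."""
    known = sum(n for *a, c, n in data if c == c_val and a[attr_idx] is not NULL)
    match = sum(n for *a, c, n in data if c == c_val and a[attr_idx] == attr_val)
    return match / known

def score(c_val, w, y, z):
    """Unnormalized Naive Bayes score P(w|C) * P(y|C) * P(z|C) * P(C)."""
    total = sum(n for *_, n in data)
    prior = sum(n for *a, c, n in data if c == c_val) / total
    return (cond_prob(0, w, c_val) * cond_prob(1, y, c_val)
            * cond_prob(2, z, c_val) * prior)

t = score(True, False, False, True)    # (1/8)(3/8)(5/8)(8/27)
f = score(False, False, False, True)   # (8/19)(3/17)(8/16)(19/27)
print("predict C=F" if f > t else "predict C=T", t, f)
```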

Problem 6: 15 points

Suppose that you are trying to do classification learning from a labelled data set. There are predictive attributes A, B1, ..., Bk plus the classification attribute C. As it happens, A is a very good predictor of C, whereas B1, ..., Bk are entirely irrelevant: each Bi and C are both absolutely independent and conditionally independent given A. One can imagine first learning a classifier for C based just on A, and then learning a classifier for C based on A and all the Bi's. The question then is, for each of the learning algorithms below, does adding the Bi's significantly degrade the classifier? Assume that the data set contains a large number of instances.

Case A. All the attributes are Boolean, and the learning algorithm is Naive Bayes.

Answer: In Naive Bayes, if you use only A, then you compare P(A|C=T) · P(C=T) to P(A|C=F) · P(C=F). If you use all the attributes, then you compare P(A|C=T) · P(B1|C=T) · ... · P(Bk|C=T) · P(C=T) to P(A|C=F) · P(B1|C=F) · ... · P(Bk|C=F) · P(C=F). Since each of the Bi's is independent of C, it is the case that P(Bi|C=T) ≈ P(Bi|C=F), so the added factors are nearly the same in both products. Adding the additional attributes adds a certain amount of noise, but is unlikely to change the outcome.

Case B. All the attributes are Boolean, and the learning algorithm is ID3. Assume that ID3 is implemented so that, if no remaining attribute gives rise to a significant reduction in average entropy, no split is carried out.

Answer: The top-level split will be on A. Once you have split on A, none of the Bi's adds more information, so they do not lower the average entropy, and no further splits are carried out. In either case the algorithm returns the identical decision tree, consisting of a single node testing on A, so there is no difference at all between just using A and using all the attributes.

Case C. All the attributes are numeric, and the learning algorithm is nearest neighbors.

Answer: The distance from one point to another will be largely determined by the differences in the B dimensions, which are useless. The degradation is severe: including the Bi's in the distance calculation turns the classifier into a practically random choice.

Problem 7: 15 points

A. As discussed in class, the k-means algorithm can be viewed as doing a form of hill-climbing search, where the objective function is Σ_i d(s_i, C(s_i))², summed over the data points s_1, ..., s_n, where C(s_i) is the location of the center of the cluster containing s_i. What is a state in this state space?

Answer: A state is a set of centers.
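For concreteness, a minimal Python sketch of the k-means (Lloyd) iteration viewed as hill-climbing on this objective: each step reassigns every point to its nearest center and then moves each center to the mean of its cluster, which never increases the objective, and the loop stops at a state from which no step improves. The representation (points and centers as tuples of coordinates) is my own.

```python
def kmeans(points, centers, steps=100):
    """Run Lloyd's algorithm from the given initial centers."""
    clusters = [[] for _ in centers]
    for _ in range(steps):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster.
        new = [tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centers[j]
               for j, c in enumerate(clusters)]
        if new == centers:
            break                  # local minimum: no step improves
        centers = new
    return centers, clusters

# e.g. kmeans([(0, 0), (1, 0), (9, 0), (10, 0)], [(0, 0), (1, 0)])
# converges to centers (0.5, 0) and (9.5, 0)
```

Random restart, the strategy proposed in (C) below, amounts to rerunning this from several random initial sets of centers and keeping the result with the lowest objective value.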

B. As it happens, given the data set (solid dots) and the starting pair of center locations (stars) shown below, the 2-means clustering algorithm returns the pair of clusters {w,x} and {y,z}, even though the solution with clusters {w,y} and {x,z} has a lower value of the objective function. Does this non-intuitive result illustrate a failing of the objective function or of the search strategy?

[Figure: four data points labelled w, y, z, x (solid dots) and a starting pair of center locations (stars).]

Answer: This is a failure of the search strategy.

C. If your answer in (B) was that this is a failing of the search strategy, propose a more effective search strategy. If your answer was that it was a failing of the objective function, propose a better objective function.

Answer: Random restart.

D. If you apply the 2-means clustering algorithm to a data set like that shown below, you will probably get a division of the data into the left-hand and right-hand sets at the dashed line, as shown, rather than the intuitively natural one. (The line was eyeballed; it is not exact.) Explain why the algorithm gives this division.

[Figure: a data set with a small, tight cluster on the left and a large, loosely scattered cluster on the right; a dashed line marks the division returned by 2-means.]

Answer: The five loosely scattered points just to the left of the line, although intuitively part of the right-hand cluster, are in fact closer to the center of the left cluster than to the center of the right cluster, so the algorithm assigns them to the left cluster.

E. Is the unintuitive answer in (D) a failing of the objective function or of the search strategy?

Answer: A failing of the objective function; as it happens, this clustering does actually minimize the objective function. The intuition is that the right cluster has a large diameter and the left cluster has a small diameter, but the k-means objective does not take that into account.
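A hypothetical one-dimensional illustration of (D) and (E): the numbers below are invented for this sketch, not taken from the figure, but they show how a partition that steals the wide cluster's nearest points can score better on the k-means objective than the natural partition.

```python
def cost(*clusters):
    """k-means objective: sum of squared distances to each cluster's mean."""
    total = 0.0
    for c in clusters:
        mean = sum(c) / len(c)
        total += sum((x - mean) ** 2 for x in c)
    return total

tight = [0.0, 0.2, 0.4, 0.6, 0.8]           # small-diameter cluster
wide = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]     # large-diameter cluster

natural = cost(tight, wide)                 # split at the natural boundary
shifted = cost(tight + wide[:2], wide[2:])  # boundary pushed into the wide cluster
print(natural, shifted)                     # 70.4 versus about 40.9
```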
