
National Institute of Business Management

Master of Business Administration (MBA)

Artificial Intelligence
(Part I)
CONTENTS

Chapter TITLE

I INTRODUCTION
II ARTIFICIAL INTELLIGENCE
III THE HISTORY OF ARTIFICIAL INTELLIGENCE
IV FRONTIERS OF ARTIFICIAL INTELLIGENCE
V PHILOSOPHY OF ARTIFICIAL INTELLIGENCE
VI THE LAW FOR ARTIFICIAL INTELLIGENCE
VII THE IMPACT OF ARTIFICIAL INTELLIGENCE ON HUMAN LABOUR
VIII REASONING
IX LEARNING
X SUPERVISED LEARNING
XI UNSUPERVISED LEARNING


CHAPTER 1

INTRODUCTION
Intelligence is the ability to learn and cope, or the ability to contemplate, think and reason. Human intelligence is the most complex form of intelligence we know, and today's world is largely the result of the intelligent behaviour of the human brain.
Artificial intelligence is a branch of computer science that studies how to program computers to exhibit apparently intelligent behaviour. According to Rich, artificial intelligence is the study of how to make computers do things which, at the moment, people do better.
Artificial intelligence (AI) is the science and engineering of making intelligent machines, and especially intelligent computer programs. Since ancient times, human beings have been fascinated by the idea of creating intelligent machines. With the arrival of much more efficient computers, combined with over 50 years' worth of research into artificial intelligence programming techniques, this dream is fast becoming a reality. Researchers are building systems that can imitate human thought, understand speech, beat a chess player and perform countless other feats never before possible. Many areas of science have contributed to artificial intelligence, among them computer science, physiology and philosophy.
According to McCarthy, 'It (AI) is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but artificial intelligence does not have to confine itself to methods that are biologically observable.' This definition includes the study of human intelligence through the assembly of artificial intelligence systems. The word 'artificial' is in the definition because artificial intelligence is about studying and constructing man-made intelligence, which can help in our understanding of human intelligence; artificial intelligence is not necessarily about replicating human intelligence.

Problems of Artificial Intelligence

Almost all problems stated in artificial intelligence make use of the term 'state'. A state describes the status of the solution of the given problem at a particular step, so the solution of a problem can be summarised as a collection of problem states. The problem-solving procedure applies an operator to a state to obtain the next state, then applies another operator to the new state, and so on; this process of deriving a new state from the current state by applying an operator continues until the desired state is reached. This form of problem solving is termed the 'state space' approach.
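The loop of applying operators until the desired state appears can be sketched in a few lines of Python (a minimal illustrative sketch, not from the original text; the successor function and goal test are hypothetical placeholders):

    # Minimal state-space search loop (illustrative sketch).
    # apply_operators(state) and is_goal(state) are hypothetical placeholders.
    def state_space_search(initial_state, apply_operators, is_goal):
        frontier = [initial_state]              # states awaiting expansion
        visited = {initial_state}
        while frontier:
            state = frontier.pop(0)             # take the current state
            if is_goal(state):
                return state                    # desired state reached
            for next_state in apply_operators(state):   # derive new states
                if next_state not in visited:
                    visited.add(next_state)
                    frontier.append(next_state)
        return None                             # goal not reachable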
The subject area of artificial intelligence spans a broad spectrum. The study of artificial intelligence involves various knowledge representation schemes, intelligent search methods, techniques for automating machine learning and much more. Its application fields include expert systems, image recognition, game playing, theorem proving, natural language processing, robotics and many other disciplines. Artificial intelligence draws on a wide range of disciplines, such as Cognitive Science, Computer Science, Mathematics, Philosophy, Psychology and Engineering.
What are artificial intelligence problems? How are artificial intelligence problems
different from others?
The answers can be summarised as follows:
1. If the solution to a problem needs symbolic representation in computers, then we can consider it an artificial intelligence problem.
Example: Understanding of English sentences by a computer when they are represented in symbolic form.
2. If there is a combinatorial explosion in computing the result of a traditional problem, then we can consider such a problem an artificial intelligence problem (a brute-force sketch illustrating this appears after this list).
Example 1 (8-Queens problem): Given a chess board of size 8 x 8, we have to place 8 queens on the board such that no queen attacks another. There are 92 possible solutions.
Example 2 (The travelling salesman problem): A salesman needs to visit N cities. He starts from one place and returns to the starting place, in such a way that the distance travelled, or the cost incurred, is minimum. Note that as N increases, the number of possible tours grows explosively.
3. It is hard to characterise the data in artificial intelligence problems. Adjectives like white, beautiful, etc., cannot be quantified directly, so we use fuzzy set theory to handle such quantification.
4. The knowledge base of an artificial intelligence problem is voluminous, and it grows as knowledge changes from time to time and new knowledge is adapted.
5. The data or knowledge base changes fast, and its size increases day by day. This is achieved through the learning process.
6. In doing dirty jobs, humans get fatigued, whereas computers can perform such jobs without any tiredness. For example, an ATM (automatic teller machine) of a bank performs the job of a clerk cum cashier for the entire day.
Questions of these types are the focus of artificial intelligence, and they arise widely across the science and engineering disciplines.
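To make the combinatorial explosion of point 2 concrete, the following brute-force Python sketch (illustrative, not from the original text) enumerates placements for the 8-queens problem and counts the valid ones; even restricted to one queen per column, it examines 40,320 candidate arrangements:

    # Brute-force 8-queens: place one queen per column and count the
    # placements in which no queen attacks another.
    from itertools import permutations

    def count_solutions(n=8):
        count = 0
        # A permutation assigns a distinct row to each column, so only
        # the diagonal attacks remain to be checked.
        for rows in permutations(range(n)):
            if all(abs(rows[i] - rows[j]) != j - i
                   for i in range(n) for j in range(i + 1, n)):
                count += 1
        return count

    print(count_solutions())   # prints 92 for the standard 8 x 8 board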

What are the characteristics of an artificial intelligence problem? (How will you analyse a problem?)
1. Is the problem decomposable or not?
Suppose we have to determine whether a given problem is decomposable. The problem is decomposable if we can solve its subproblems and combine their solutions to give the solution of the given problem.
2. Can the solution steps be ignored? In theorem proving, the solution steps can be ignored. In bridge or chess, the solution steps are not ignorable, nor are they recoverable. In the 8-puzzle, however, the solution steps are recoverable.
3. Is the universe predictable?
In the 8-puzzle, or in any scientific problem, the universe (everything in the state-space representation) is certain, whereas in playing cards the outcome is not certain; that is, the universe is not predictable.
4. Is the solution to the goal absolute or relative?
In the 8-puzzle, any path leading to the solution may be taken, since the aim is simply to reach the goal state; here the solution to the goal is absolute. In the travelling salesman problem, we have to specify the path rather than just the goal; here the solution to the goal is relative.
5. Is the knowledge base consistent or not? If a knowledge base contains some fact A and
¬A, then it is not consistent.
6. What is the role of knowledge?
In the 8-puzzle, chess, etc., we reach a solution with the help of a knowledge base. In predicting the election of a politician, however, a knowledge base collected from newspapers or any other resources will not help to decide whether he wins or loses; in this context, the knowledge base does not play a decisive role.
7. Is interaction with a human necessary?
Some problems require human interaction, for example, playing computer chess or computer bridge. In solving scientific problems through programs, human interaction is not necessary.
Topics of Artificial Intelligence
The subject of artificial intelligence originated with game-playing and theorem-proving programs and was gradually enriched with theories and solutions from a number of parent disciplines. Being a young discipline of science, the importance of the topics covered under the subject changes considerably with time. Artificial intelligence also covers areas of industrial application, including process plants, and many subsystems are included in this area. The major topics, which we find important and valuable for understanding the subject, are outlined below.

Learning Systems
Learning systems deserve special mention within the subject of artificial intelligence. 'Learning is any process by which a system improves performance from experience.' We can illustrate the concept of learning by reference to a natural problem.

There are many types of learning methods, which can adapt knowledge in different ways. For supervised learning problems, artificial neural networks (ANN) are gaining importance; an ANN represents an electrical analogue of the biological nervous system. Inductive learning and analogy-based learning are other common learning methods. For example, from 'a pigeon flies', 'a parrot flies' and 'an eagle flies', the system learns that a bird flies; this is what is called inductive learning. On the other hand, learning that the motion of electrons in an atom is similar to the movement of planets in the solar system is an example of analogy-based learning.

Knowledge Representation and Reasoning


Knowledge is the expertise and skill acquired by a person, and it is very hard to represent in computers. A knowledge base is similar to a database and is necessary for expert system building and for reasoning. In reasoning problems, one has to achieve a pre-defined target state from one or more given start states; the fewer the transitions, the more efficiently the reasoning system reaches the goal state. Increasing the efficiency of a reasoning system therefore requires minimising the intermediate states, which ultimately calls for an organised and complete knowledge base: with a complete and organised storehouse of knowledge, a minimum number of search attempts is needed to identify the proper knowledge at a given problem state, and this yields the right next state at each step of the problem-solving process. The organisation of knowledge is thus of supreme importance in knowledge engineering. A variety of knowledge representation techniques are used in artificial intelligence, such as production rules, predicate logic, frames with slots and fillers, and semantic nets. For a particular type of representational schema, the selection depends both on the nature of the application and on the choice of the users. Testing the satisfiability of a goal from a given set of data and knowledge is the main concern of a reasoning problem.

Planning
Planning is a significant area of artificial intelligence. Though the problems of reasoning and planning share many similar issues, there is a basic difference between their definitions: planning involves determining a methodology by which a successful result can be achieved from the known initial states. Automated planning has extensive applications in robotics and navigational problems. Planning focuses on the decomposition of an original problem into appropriate subunits and uses the interactions among these subunits during the problem-solving process. The major components of a planning system include heuristic functions, a set of rules over the problem states and a deduction process. The deduction process finds a solution to a problem when it has found a sequence of operators that leads from an initial state to the goal state.

Knowledge Acquisition
Acquisition, or 'elicitation', of knowledge is as hard for human beings as it is for computers. Knowledge acquisition includes the generation of new pieces of knowledge from a given knowledge network, setting up active structures for existing knowledge, learning knowledge from the environment and fine-tuning of knowledge. Using machine learning approaches, automated acquisition of knowledge is possible, and this is a dynamic area of current research in artificial intelligence. A set of knowledge-storing mechanisms is required to hold the acquired knowledge: frames, conceptual dependency, scripts, semantic nets, etc., are such data structures for knowledge storage. The captured knowledge is stored in these frameworks accordingly, and each of these data structures corresponds to a particular knowledge representation scheme.

Intelligent Search

The search problems traditionally encountered in computer science are of a deterministic nature; that is, the order of visiting the elements of the search space is known. For example, in the basic graph traversal algorithms such as DFS and BFS, one knows the sequence in which the nodes of a tree will be visited. The search problems in artificial intelligence, by contrast, are non-deterministic, and the order of visiting the elements in the search space depends entirely on the given data sets. A variety of search techniques are therefore used, depending on the application; A*, best-first search, greedy search, depth-first search and breadth-first search are some examples.
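A best-first search can be sketched as follows in Python (illustrative, not from the original text); it always expands the frontier node that looks best according to an evaluation function f(n). Greedy best-first search takes f to be a heuristic h alone; A* would instead order nodes by g(n) + h(n), additionally tracking the path cost g(n):

    # Best-first search: expand the frontier node with the lowest f-value.
    import heapq
    from itertools import count

    def best_first(start, successors, is_goal, f):
        tie = count()                                # tie-breaker for the heap
        frontier = [(f(start), next(tie), start, [start])]
        seen = set()
        while frontier:
            _, _, node, path = heapq.heappop(frontier)
            if is_goal(node):
                return path
            if node in seen:
                continue
            seen.add(node)
            for nxt in successors(node):             # the data, not a fixed order,
                if nxt not in seen:                  # decide what gets expanded
                    heapq.heappush(frontier,
                                   (f(nxt), next(tie), nxt, path + [nxt]))
        return None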
Logic Programming
Logic symbols have evolved with use: mathematicians have designed various tools for representing logical statements by symbolic operators. Propositional logic is one such attempt, which deals with a set of binary statements (propositions) connected by Boolean operators. To handle complex situations of the real world, the logic of propositions has been enriched further; the result is termed predicate logic or first-order logic. PROLOG (PROgramming in LOGic) is a classical computer language that supports logic programs built on predicate logic, and it was widely used for artificial intelligence programming. Logic programming is one of the prime areas of research in artificial intelligence that covers knowledge representation structures; one eventual aim of this research is to extend the PROLOG compiler to handle spatio-temporal models and to support a parallel programming environment. LISP (LISt Processor) is another artificial intelligence programming language, one that can handle symbolic expressions and strings. These are old languages in the programming sense, as present-day language shells can perform much better than them.
Soft Computing
Soft computing is a popular problem-solving technique that attempts to mimic the intelligence found in nature. Statistical learning, fuzzy logic methods, genetic or evolutionary algorithms, fractals, probabilistic reasoning and chaos theories are its most important constituents, and most soft computing techniques are inspired by nature.
The human brain can process millions of data items, learn from experience, generalise learned rules, recognise patterns and make decisions. It is in effect a good engineering model, in that it performs these tasks using heuristic approximations, low precision or reduced generality, depending on the problem to be solved. We want to transfer some of these abilities into our designed models, algorithms, machines and artificial intelligence systems, that is, to solve a given task based on previous experience with reasonable accuracy at reasonable cost and time.
The guiding principle of soft computing is, 'Exploit tolerance for imprecision, uncertainty and partial truth to achieve tractability, robustness and low solution cost.' In general, soft computing is a collection of computing techniques and tools, shared by closely related disciplines, that comprises fuzzy logic, artificial neural nets, genetic algorithms, belief calculus, inductive logic programming and so on. These tools are used separately as well as jointly, depending on the domain of application.
Management of Imprecision and Uncertainty
Data and knowledge bases in many typical artificial intelligence problems, such as reasoning and planning, are often contaminated with various forms of incompleteness, which poses a standing difficulty for these tasks. Incompleteness of data is called imprecision; it generally occurs in a database owing to the lack of appropriate data or the poor authenticity of sources. Incompleteness of knowledge is called uncertainty; it originates in the knowledge base owing to the lack of certainty in the pieces of knowledge. The joint presence of imprecise data and uncertain knowledge makes reasoning a complex problem. Numerous techniques and tools have been developed for reasoning under incomplete data and knowledge; fuzzy, stochastic and belief network models are among the techniques employed. In every such problem, the system continually undergoes changes of state.
Timelines of Artificial Intelligence
The romantic period
The period from the mid-1960s to the mid-1970s is termed 'the romantic period'. The romantic period of artificial intelligence is known for 'machine understanding': during this period, the major activity was to make the machine 'understand', by which was usually meant the understanding of natural languages.
The modern period
The modern period of artificial intelligence runs from the latter half of the 1970s to the present day. More complex practical problems were solved, or at least attempted, in this period, and expert systems began to deal with real-life problems. To solve real-world problems, researchers also engaged in theoretical research on artificial intelligence, including heuristic search, uncertainty modelling, machine learning and various forms of reasoning. In summary, this period of artificial intelligence includes research on both theory and practical aspects.
These periods roughly summarise the history of artificial intelligence and its evolution. A summary of the incidents and milestones in artificial intelligence follows:
1. 1917: Karel Capek coined the term 'robot' (from the Czech 'robota', meaning forced work; 'robotnik' means worker).
2. 1928: John von Neumann proved the MINIMAX theorem, which is used in many game-playing programs.
3. 1943: A neural network architecture for intelligence was proposed by McCulloch and Pitts.
4. 1950: Isaac Asimov published 'I, Robot'. In the same year, Claude Shannon published a paper on programming a computer to play chess, and Alan Turing proposed the Turing test in his paper 'Computing Machinery and Intelligence'.
5. 1954: Isaac Asimov published 'The Caves of Steel', his first robot novel.
6. 1955: Newell, Shaw and Simon developed IPL-II, the first artificial intelligence language.
7. 1956: Newell, Shaw and Simon created the Logic Theorist (LT), a program that proves theorems in mathematical logic; the first running artificial intelligence program, it was demonstrated at Carnegie Mellon University (CMU). Also, the first commercial robot was produced by the Unimation company, founded by George Devol and Joseph Engelberger and based on Devol's patents.
8. 1957: Newell, Shaw and Simon proposed the General Problem Solver (GPS), based on means-ends analysis. Noam Chomsky wrote 'Syntactic Structures', a foundation for language processing.
9. 1952-62: The first checkers-playing program was written by Arthur Samuel at IBM; it was a major achievement, eventually acquiring sufficient skill to challenge a world champion.
10. 1958: John McCarthy at MIT invented the LISP language. Another achievement was a geometry theorem prover developed by Herbert Gelernter and Nathan Rochester at IBM, which exploits a semantic model of the domain in the form of diagrams of 'typical' cases.
11. Early 60's: Margaret Masterman at Cambridge University designed semantic nets for machine translation.
12. 1961: The first symbolic integration program (SAINT), which could solve calculus problems, was written by James Slagle at MIT.
13. 1962: Unimation, the first industrial robot company, was founded.
14. 1963: Thomas Evans at MIT developed ANALOGY, a program that could solve the same analogy problems as are given on IQ tests. During the same year, Ivan Sutherland at MIT introduced the idea of interactive graphics into computing, and the first collection of articles about artificial intelligence, 'Computers and Thought', was published by E.A. Feigenbaum and Julian Feldman.
15. 1964: Danny Bobrow at MIT showed that computers can understand natural language well enough to solve algebra word problems correctly. In the same year, the logical representation of knowledge for question-answering systems was demonstrated by Bert Raphael at MIT.
16. 1965: J. Alan Robinson invented the resolution method, a mathematical procedure that allowed programs to work efficiently with formal logic as a representation language. At about the same time, ELIZA, an interactive program that carries on a dialogue in English on any topic, was developed by Joseph Weizenbaum at MIT.
17. 1966: Ross Quillian demonstrated a new knowledge representation scheme, semantic nets. The first Machine Intelligence workshop, the beginning of an influential annual series, was organised.
18. 1967: An expert system named Dendral was demonstrated at Stanford University to interpret the mass spectra of organic chemical compounds; Dendral is considered one of the first successful knowledge-based programs for scientific reasoning. In the same year, Joel Moses demonstrated the power of symbolic reasoning for integration problems in the Macsyma program at MIT, a knowledge-based program in mathematics. Richard Greenblatt, a scientist at MIT, built a knowledge-based chess-playing program, MacHack, which was good enough to play in tournaments.
19. Late 60's: Doug Engelbart at the Stanford Research Institute (SRI) invented the mouse.
20. 1968: Marvin Minsky and Seymour Papert published 'Perceptrons', demonstrating the limitations of simple neural nets.
21. 1969: SRI introduced a new robot that demonstrated the combination of locomotion, perception and problem solving. Roger Schank at Stanford University defined the conceptual dependency model for natural language understanding. Another milestone was the first International Joint Conference on Artificial Intelligence (IJCAI), a conference series that continues successfully to the present day.
22. 1970: SCHOLAR, an interactive program for computer-aided instruction, was developed by Jaime Carbonell; the program was built on semantic nets as the representation of knowledge. Augmented Transition Networks (ATNs), a representation scheme for natural language understanding, were proposed by Bill Woods. ARCH, a program that learns concepts from examples in the world of children's blocks, was written by Patrick Winston at MIT.
23. Early 70's: An influential natural language processing group was established by Jane Robinson and Don Walker at SRI.
24. 1971: The ability of computers to understand English sentences was demonstrated by Terry Winograd at MIT: his program SHRDLU carried out instructions typed in English, moving a simulated robot arm in a blocks world accordingly.
25. 1972: PROLOG, another popular AI language, was developed by Alain Colmerauer. PROLOG has its roots in first-order logic and is declarative: most expressions are represented in terms of relations, as facts and rules, and a computation is carried out by running a query over these relations.
26. 1973: Freddy, a robot, was developed by the Assembly Robotics group at Edinburgh University. Freddy was one of the first robots able to use vision to identify and locate parts and to assemble wooden models from them. The KUKA Robot Group developed the first industrial robot with six electro-mechanically driven axes.
27. 1974: A rule-based system for knowledge representation and inference in the domain of medical diagnosis was developed by Ted Shortliffe at the Stanford University artificial intelligence laboratory. This expert system, known as MYCIN, was perhaps the first expert system to perform at a professional level. At about the same time, one of the first planning programs was proposed by Earl Sacerdoti; it popularised the techniques of hierarchical planning.
28. 1975: Marvin Minsky published his paper on frames, a new knowledge representation structure. The Meta-Dendral learning program, part of the Dendral expert-system effort, produced new results in chemistry. The Programmable Universal Manipulation Arm (PUMA) was developed by Victor Scheinman.
29. Mid 70's: Barbara Grosz at SRI established limits of traditional approaches to discourse modelling in natural language processing (NLP). The Xerox corporation developed the Smalltalk language, which exploited icon-oriented interfaces and the power of object-oriented programming. MIT's publication of the 'primal sketch' for visual perception was another step in artificial intelligence research.
30. 1976: Doug Lenat's AM (Automated Mathematician) program at Stanford University demonstrated the discovery model, a loosely guided search for interesting conjectures. Randall Davis at Stanford University demonstrated the power of meta-level reasoning.
31. Late 70's: A group at Stanford University demonstrated the power of the ARPAnet for scientific collaboration.
32. 1978: Tom Mitchell, at Stanford University, developed version spaces for describing the search space of a concept-formation program. One of the cornerstones of artificial intelligence, 'satisficing', was proposed by Herbert Simon, who received the Nobel Prize in Economics for his theory of bounded rationality. The MOLGEN program, an object-oriented representation of knowledge that can be used to plan gene-cloning experiments, was demonstrated at Stanford University.
33. 1979: The EMYCIN program was developed by Bill van Melle at Stanford University; it demonstrated the generality of MYCIN's representation of knowledge and style of reasoning, and it became the model for many commercial expert system 'shells'. INTERNIST, a knowledge-based medical diagnosis program built on Jack Myers' clinical knowledge, was developed by Jack Myers and Harry Pople at the University of Pittsburgh. The CHI system for automatic programming was developed at Stanford University, and the same university developed a computer-controlled, autonomous vehicle. Work on non-monotonic logics and the formal aspects of truth maintenance was published by John McCarthy at Stanford University. Raj Reddy founded the Robotics Institute at Carnegie Mellon University.
34. 1980's: LISP machines were designed and marketed, and the first expert system shells for commercial applications were developed in the same decade.
35. 1980: The blackboard model, the framework for the HEARSAY-II speech understanding system, was published.
36. 1981: Danny Hillis designed the Connection Machine, a massively parallel architecture that brought new power to artificial intelligence and to computation in general.
37. 1983: SOAR, an artificial intelligence research project, was developed by John Laird and Paul Rosenbloom; SOAR is a general cognitive architecture for developing systems that exhibit intelligent behaviour. Another landmark was James Allen's interval calculus, the first widely used formalisation of temporal events.
38. Mid 80's: Neural networks became widely used with the back-propagation algorithm.
39. 1985: An autonomous drawing program was developed by Harold Cohen.
40. 1987: Marvin Minsky published 'The Society of Mind', a book in which he described the human mind as a collection of cooperating agents.
41. 1989: ALVINN (An Autonomous Land Vehicle In a Neural Network) was developed by Dean Pomerleau at CMU. ALVINN, a computer-controlled system that later steered a car coast-to-coast, was designed as a single-hidden-layer perceptron that uses the back-propagation algorithm for learning.
42. 1990's: Major advances were noted in all areas of artificial intelligence in this decade, with significant demonstrations in machine learning, computer games, case-based reasoning, uncertain reasoning, intelligent tutoring, multi-agent planning, scheduling, data mining, natural language understanding and translation, computer vision, virtual reality and other topics. Collaborative research produced significant developments in humanoid robots.
43. Early 90's: Gerry Tesauro developed a backgammon program that used reinforcement learning and played at championship level against world-class players.
44. 1997: The Deep Blue chess program beat Garry Kasparov, the world chess champion. Another event was the first official RoboCup soccer match, featuring 40 teams of interacting robots and over 5,000 spectators.

45. Late 90's: Artificial intelligence found a role in web crawlers, and AI-based information-extraction programs became widely used on the World Wide Web. Another milestone was the demonstration of an 'intelligent room' and emotional agents at MIT.
46. 1999: The Remote Agent, an artificial intelligence system, was developed and given significant autonomous control of spacecraft operations.
47. 2000: 'Smart toys', or interactive robot pets, were commercially developed and marketed. A robot with a face that expressed emotions was developed by Cynthia Breazeal at MIT.
48. 2002: iRobot, a company founded by MIT researchers, introduced 'Roomba', a vacuum-cleaning robot.
49. 2004: The Defence Advanced Research Projects Agency (DARPA) sponsored a competition for autonomous (driverless) vehicles.
50. 2005: Honda's ASIMO, an artificially intelligent humanoid robot, became able to walk as fast as a human; ASIMO was created at Honda's Research and Development Wako Fundamental Technical Research Centre in Japan. Recommendation technology based on tracking web activity or media usage brought artificial intelligence to marketing. Another development was 'Blue Brain', a project to simulate the brain at molecular detail by reverse-engineering the mammalian brain down to the molecular level.
51. 2006: Artificial intelligence was applied to the development of a cognitive radio engine, and artificial intelligence was also used to understand biological intelligence.
52. 2008: Researchers at the University of Alberta solved the game of checkers.
53. 2009: The largest and strongest six-axis industrial robot was developed by FANUC Robotics America.
54. Today: ?

Production Systems

The production system is a model of computation that has proved particularly important in artificial intelligence, both for implementing search algorithms and for modelling human problem solving. A production system consists of the following components:
1. A global database (working memory)
2. A set of rules
3. A goal
There are four basic types of production systems:
Monotonic production system: A monotonic production system (MPS) is one in which the application of a rule never prevents the later application of another rule that could also have been applied at the time the first rule was selected.

Non-monotonic production system: A non-monotonic production system is one in which the application of a rule may prevent the later application of another rule that could have been applied at the time the first rule was selected; that is, it is a system in which the monotonicity property above does not hold.

Partially commutative production system: A partially commutative production system is one with the property that, if the application of a particular sequence of rules transforms state 'x' into state 'y', then any allowable permutation of those rules also transforms state 'x' into state 'y'.
Commutative production system: A commutative production system is one that is both monotonic and partially commutative.
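The three components can be seen working together in a minimal Python sketch (illustrative, not from the original text; the rule representation is a hypothetical choice):

    # Minimal production system: global database + rules + goal test.
    from collections import namedtuple

    Rule = namedtuple('Rule', ['condition', 'action'])   # hypothetical rule format

    def run_production_system(database, rules, is_goal):
        while not is_goal(database):
            # Match phase: find rules whose condition holds in the database.
            applicable = [r for r in rules if r.condition(database)]
            if not applicable:
                return None                      # no rule applies: failure
            rule = applicable[0]                 # conflict resolution: take the first
            database = rule.action(database)     # act phase: apply the rule
        return database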

Advantages
Production systems work as an excellent tool for structuring artificial intelligence programs. The technique is highly modular, as individual rules can be added, modified or removed independently. Production rules are expressed in a natural form, close to the way an expert thinks out loud while solving a problem, so the statements contained in the knowledge base can be recorded almost as an expert states them.

Disadvantages
It may be very difficult to analyse the flow of control within a production system, because individual rules do not call each other; this is a major drawback of production system building. Production systems always describe the sort of operations that are being performed while searching for the solution to a problem. Several production systems exist for various types of problems, and their efficiency varies from problem to problem. In a loose sense, we may not feel that there is a relation between the problem and the production system, but this is not so. For ignorable problems, we make use of partially commutative, monotonic production systems. These systems are important from the implementation point of view: as they can be implemented without the ability to backtrack to previous states, efficiency increases because the system need not keep track of the changes made during the search process. In systems like puzzle games, where the changes can be reversed and the order of operations is not critical, we can make use of non-monotonic, partially commutative systems. Production systems that are not partially commutative are useful for many problems in which irreversible changes occur, such as chemical analysis; in such systems, the order of operations is very important, and the correct decisions need to be taken at the first encounter itself.

Travelling Salesman Problem

Enumerate the global database for the following problem and find a solution.
A salesman must visit each of five cities, as shown in the figure. There is a road between every pair of cities, and the corresponding distances are given. The problem is to find a minimal path such that he visits each of the cities only once and returns to the starting city.
The goal is to find the minimum-cost path. The rule is to branch out from the node which has minimal distance from the root node.
Solution
Let A be selected as the starting node. We can specify the global database in the form of a graph; Figure 1.4(b) shows the global database for the travelling salesman problem. The minimum path is A→C→D→E→B→A, with cost = 34.
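Since the distance figure is not reproduced here, the following Python sketch (illustrative, not from the original text) uses a hypothetical symmetric distance matrix, chosen so that the tour A→C→D→E→B→A costs 34, and enumerates every tour by brute force:

    # Brute-force travelling salesman: try every tour starting from city A.
    # The distances below are hypothetical stand-ins for the book's figure.
    from itertools import permutations

    dist = {('A','B'): 7, ('A','C'): 6, ('A','D'): 10, ('A','E'): 13,
            ('B','C'): 7, ('B','D'): 10, ('B','E'): 10,
            ('C','D'): 5, ('C','E'): 9,  ('D','E'): 6}
    dist.update({(b, a): d for (a, b), d in list(dist.items())})   # roads run both ways

    def tour_cost(tour):
        return sum(dist[leg] for leg in zip(tour, tour[1:] + tour[:1]))

    best = min((('A',) + p for p in permutations(['B','C','D','E'])), key=tour_cost)
    print(best, tour_cost(best))   # a minimum tour of cost 34 with these distances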

Water-Jug Problem
Specify the global database and the set of rules for the following water-jug problem.
There are two jugs, of 4-L and 3-L capacities. Neither has any markings on it. There is a pump that can be used to fill the jugs with water. How can we obtain exactly 2 L of water in the 4-L jug?
Solution
Rules of the problem are as follows:

1. We can empty a jug fully, either onto the ground or from one jug into the other.
2. We cannot fill up a jug, or empty it, half way or fractionally from the pump or onto the ground.
The global database can be represented by an ordered pair of numbers (x, y), for x = 0, 1, 2, 3, 4 and y = 0, 1, 2, 3, which represent the amounts of water in the 4-L jug and the 3-L jug, respectively.
The applicable rules of the production system are as follows:
1. Fill the 4-L jug if it is partially filled.
((x, y) / x < 4) → (4, y)
2. Fill the 3-L jug if it is partially filled.
((x, y) / y < 3) → (x, 3)
3. Pour d litres of water from the 4-L jug onto the ground.
((x, y) / x > 0) → (x − d, y)
4. Pour d litres of water from the 3-L jug onto the ground.
((x, y) / y > 0) → (x, y − d)
5. Empty the 4-L jug onto the ground.
((x, y) / x > 0) → (0, y)
6. Empty the 3-L jug onto the ground.
((x, y) / y > 0) → (x, 0)
7. Pour water from the 3-L jug into the 4-L jug until the 4-L jug is full.
((x, y) / (x + y ≥ 4) & (y > 0)) → (4, y − (4 − x))
8. Pour water from the 4-L jug into the 3-L jug until the 3-L jug is full.
((x, y) / (x + y ≥ 3) & (x > 0)) → (x − (3 − y), 3)
9. Pour all the water from the 3-L jug into the 4-L jug.
((x, y) / (x + y ≤ 4) & (y > 0)) → (x + y, 0)
10. Pour all the water from the 4-L jug into the 3-L jug.
((x, y) / (x + y ≤ 3) & (x > 0)) → (0, x + y)
A solution derived from these rules is the sequence of states (0, 0) → (0, 3) → (3, 0) → (3, 3) → (4, 2) → (0, 2) → (2, 0), obtained by applying rules 2, 9, 2, 7, 5 and 9 in turn.
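As a cross-check, the rules can be encoded and searched breadth-first in Python (an illustrative sketch, not from the original text):

    # Water-jug problem: states (x, y) are litres in the 4-L and 3-L jugs.
    from collections import deque

    def successors(x, y):
        states = {(4, y), (x, 3),        # rules 1-2: fill a jug from the pump
                  (0, y), (x, 0)}        # rules 5-6: empty a jug onto the ground
        pour = min(y, 4 - x)             # rules 7 and 9: pour 3-L jug into 4-L jug
        states.add((x + pour, y - pour))
        pour = min(x, 3 - y)             # rules 8 and 10: pour 4-L jug into 3-L jug
        states.add((x - pour, y + pour))
        return states - {(x, y)}

    def solve(start=(0, 0)):
        frontier, parent = deque([start]), {start: None}
        while frontier:
            state = frontier.popleft()
            if state[0] == 2:            # goal: exactly 2 L in the 4-L jug
                path = []
                while state is not None:
                    path.append(state)
                    state = parent[state]
                return path[::-1]
            for nxt in successors(*state):
                if nxt not in parent:
                    parent[nxt] = state
                    frontier.append(nxt)

    print(solve())   # one shortest state sequence reaching 2 L in the 4-L jug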

The Knight's Tour Problem
In a chess game, a knight moves two squares either horizontally or vertically, followed by one square in an orthogonal direction, as long as it does not move off the board. Find a series of legal moves in which the knight lands on each square of the board exactly once, given a 3 x 3 chess board whose squares are numbered 1 to 9.
Solution
The 16 production rules (rule number, condition and action) are given below:
1. Knight on square 1 → move knight to 8
2. Knight on square 1 → move knight to 6
3. Knight on square 2 → move knight to 9
4. Knight on square 2 → move knight to 7
5. Knight on square 3 → move knight to 4
6. Knight on square 3 → move knight to 8
7. Knight on square 4 → move knight to 9
8. Knight on square 4 → move knight to 3
9. Knight on square 6 → move knight to 1
10. Knight on square 6 → move knight to 7
11. Knight on square 7 → move knight to 2
12. Knight on square 7 → move knight to 6
13. Knight on square 8 → move knight to 3
14. Knight on square 8 → move knight to 1
15. Knight on square 9 → move knight to 2
16. Knight on square 9 → move knight to 4
The goal is to visit every reachable square of the 3 x 3 board exactly once. Note that no rule moves the knight to or from the centre square 5, which a knight can never reach on a 3 x 3 board, so a tour covers the eight outer squares. One such tour, derived from the rules, is 1 → 8 → 3 → 4 → 9 → 2 → 7 → 6 (rules 1, 13, 5, 7, 15, 4 and 12).

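The rules and the search can be sketched in Python as follows (illustrative, not from the original text); the move table encodes the 16 production rules, and depth-first search with backtracking finds a tour:

    # Knight's tour on the 3 x 3 board; square 5 (the centre) is unreachable.
    MOVES = {1: [8, 6], 2: [9, 7], 3: [4, 8], 4: [9, 3],
             6: [1, 7], 7: [2, 6], 8: [3, 1], 9: [2, 4]}

    def tour(path):
        if len(path) == len(MOVES):          # all eight reachable squares visited
            return path
        for nxt in MOVES[path[-1]]:          # try each applicable rule in turn
            if nxt not in path:
                result = tour(path + [nxt])
                if result:
                    return result
        return None                          # dead end: backtrack

    print(tour([1]))   # a tour from square 1, e.g. [1, 8, 3, 4, 9, 2, 7, 6]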

State Space Representation


A production system consists of a global database, a set of rules and a goal. The initial state must be represented in a form the computer can understand; this representation is known as the state space representation. By applying an applicable rule, we can derive another state on the solution path of the problem, and this state too has to be represented in a form the computer can understand. Thus all the intermediate configurations obtained by applying the rules of a production system, known as the set of states in the global database (the space), can be represented in a form the computer can understand. The following sections discuss example problems using state space representation.

State Space Search


A problem in a state space is represented in terms of states and operators that change states. A state space consists of the following:
1. A representation of the states in the system, which can take the form of a data structure. For example, in a board game, the board represents the current state of the game.
2. A set of operators that transform one state into another. In a board game, the operators are the legal moves from any given state; often the operators are implemented as programs that change a state representation into the representation of the new state.
3. An initial state.
4. A set of final states, desirable in some cases and undesirable in others. This set is often represented implicitly by a program that detects terminal states.

Branches of Artificial Intelligence


Some common branches of artificial intelligence are described below. These main branches are useful in research and development for problem solving.
Logical artificial intelligence: What a program knows about the world is represented by sentences of some mathematical logical language. To achieve its goals, the program decides what to do by inferring that certain actions are appropriate.

Search: Artificial intelligence programs often examine large numbers of possibilities, for example, the moves in a chess game or the inferences made by a theorem-proving program. Discoveries about how to do this more efficiently are being made continuously. Various search techniques are highly useful in the development of intelligent applications, as they cut the time spent searching.
Pattern recognition: When a program makes observations of some kind, it is often programmed to compare what it sees with a pattern. For example, to find a face, a vision program may try to match a pattern of eyes and a nose in a scene. More complex patterns, like a natural language text, a chess position or the history of some event, are also studied, and more complex methods are required to find such patterns. Various pattern recognition algorithms are useful for pattern detection.
Representation: Facts about the world have to be represented in some way; usually, the languages of mathematical logic are used, and most facts are represented symbolically. These representations can be used by artificial intelligence algorithms in constructing a solution. Facts are tightly coupled with rules when they are put into action on real problems.
Inference: Inference is deriving logical conclusions from premises known or assumed to be true. In some cases, mathematical logical deduction is adequate, but new methods of non-monotonic inference have been added to logic. One of the simplest kinds of non-monotonic reasoning is default reasoning: a conclusion is drawn by default, but it can be withdrawn if there is evidence to the contrary. For example, on hearing the word 'bird', we infer that the creature can fly and has wings; for a penguin, we must withdraw the flying conclusion, and this withdrawal constitutes the non-monotonic character of the reasoning. In ordinary logic, by contrast, the set of conclusions that can be drawn from a set of premises is a monotonically increasing function of the premises.
Common sense knowledge and reasoning: From the 1950s onwards, this has been one of the areas of artificial intelligence farthest from the human level, in spite of being an active field of research. New ideas are necessary, although there has been considerable progress in developing systems of non-monotonic reasoning and theories of action. An example of such a system is the CYC system, which contains a large but spotty collection of common sense facts.
Learning from experience: Artificial intelligence programs can learn from experience, and there are many learning approaches based on the specialised application; there is also learning of laws expressed in logic. Programs can only learn what facts or behaviours their formalisms can represent, and, unfortunately, almost all learning systems are based on very limited abilities to represent information.
Planning: Planning programs in artificial intelligence start from general facts about the world, especially facts about the effects of actions, together with facts about the particular situation and a statement of a goal. From these, they generate a strategy for achieving the goal; in the most common real-world case, the strategy is a sequence of actions.

Epistemology: Epistemology deals with the philosophy of knowledge. In artificial intelligence programs, it involves the study of the kinds of knowledge that are required for solving problems in the world.
Ontology: Ontology is the study of the kinds of things that exist. In artificial intelligence, programs and sentences deal with various kinds of objects, and we study what these kinds are and what their basic properties are. Emphasis on ontology-related programs began in the 1990s.
Heuristics: A heuristic is a way of trying to discover something, or an idea embedded in a program; the term has many meanings in artificial intelligence. For example, in shortest-path problems, a heuristic is a function h(n) that estimates the cost of the cheapest path from a particular node n to the goal node.
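For instance, in route finding the straight-line distance to the goal is a common heuristic, since it never overestimates the true road distance. A small Python sketch, with hypothetical coordinates (not from the original text):

    # Straight-line-distance heuristic h(n) for route finding (illustrative).
    import math

    coords = {'S': (0, 0), 'A': (2, 1), 'G': (5, 1)}   # hypothetical node positions

    def h(n, goal='G'):
        (x1, y1), (x2, y2) = coords[n], coords[goal]
        return math.hypot(x2 - x1, y2 - y1)            # Euclidean distance estimate

    # A search such as A* would then order nodes by f(n) = g(n) + h(n),
    # the path cost so far plus this estimate of the remaining cost.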
Genetic programming: Genetic programming is a soft computing technique for getting programs to solve a task by mutating and mating random programs and selecting the fittest over many generations. Genetic programming (GP) evolved from machine learning techniques such as genetic algorithms, and it is used to optimise a population of computer programs.
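The flavour of the approach can be seen in a tiny genetic-algorithm sketch (illustrative, not from the original text); it applies the same selection, crossover and mutation loop to bit strings that genetic programming applies to programs:

    # Tiny genetic algorithm: evolve 16-bit strings toward all ones.
    import random

    def fitness(bits):
        return sum(bits)                         # count of ones: the trait selected for

    def evolve(pop_size=20, length=16, generations=50):
        pop = [[random.randint(0, 1) for _ in range(length)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            pop = pop[:pop_size // 2]            # selection: keep the fittest half
            while len(pop) < pop_size:           # refill with offspring
                a, b = random.sample(pop[:pop_size // 2], 2)
                cut = random.randrange(1, length)
                child = a[:cut] + b[cut:]        # crossover ('mating')
                if random.random() < 0.1:        # occasional mutation
                    i = random.randrange(length)
                    child[i] = 1 - child[i]
                pop.append(child)
        return max(pop, key=fitness)

    print(evolve())   # usually a string consisting mostly of ones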

Applications of Artificial Intelligence


Every branch of science, engineering and technology shares the tools and techniques
available in the domain of artificial intelligence. However, for the sake of convenience of
readers, we can see here a few typical applications where artificial intelligence plays a
significant and decisive role in engineering automation.
Game Playing
In game theory, a formal model of an interactive situation is defined. There may be two or more players in the system, each facing a decision problem. The players' preferences, the information available to them and their strategic actions when they meet for the game together influence the outcome of the game.
Game playing is one of the leading domains in which artificial intelligence has been applied with great success. Games can generate extremely large search spaces, and powerful techniques are required to determine which alternatives to explore in the problem space. These techniques are based on heuristics (proceeding to a solution by trial and error) and constitute a major area of artificial intelligence research. The best example is a computer chess program that can defeat even world chess champions. A computer game can be formally defined as a kind of search problem with an initial state, a set of operators, terminal states and a utility function.

Expert Systems
An expert system is software that manipulates encoded knowledge to solve problems in a specialised domain that normally requires human expertise. An expert system is an artificial intelligence program in which the system's knowledge is obtained from an expert source such that the machine can offer intelligent advice or take an intelligent decision in solving a problem; here, artificial intelligence performs a decision-making task much faster than humans can. For example, MYCIN is one of the first expert systems, able to diagnose infections and suggest treatments, sometimes better than medical practitioners. Developing an expert system means extracting, from humans who can be considered experts, the knowledge, rules, strategies and procedures needed to solve a pre-determined group of problems, and aggregating this knowledge into a computer software system. This process is known as knowledge engineering, and the person responsible for it (the knowledge engineer) must understand both the problem domain and the available knowledge representation techniques.
The main components of an expert system shell are the knowledge base, the inference engine and the user interface. There are rule-based, knowledge-based and hybrid expert systems. One of the major bottlenecks in the development of an expert system is the acquisition of the expert's knowledge about the problem. Typically, the expert's knowledge is elicited through questionnaires, interviews, protocol analysis or a combination of these; such direct knowledge acquisition methods can consume a great deal of time and effort. Modern expert systems use self-learning methods to dynamically learn new rules, which can be validated by an expert and added to the knowledge base. Many applications incorporate expert systems.

Natural Language Processing


Natural language processing (NLP) is a technique that builds the ability for machines to read and understand the languages that humans speak. NLP is a fusion of computer science and linguistics, concerned with the interactions between computers and natural languages. A powerful NLP system is able to acquire new knowledge by reading text and understanding the situations it describes, and the area is a hotbed of research. Major areas of NLP include information retrieval (IR), information extraction (IE), speech processing, stemming, text simplification, text-to-speech, text-proofing, natural language search, query expansion and truecasing.
The important tasks of NLP are automatic summarisation, coreference resolution, discourse analysis, machine translation, morphological segmentation, named entity recognition (NER), natural language generation, natural language understanding, optical character recognition (OCR), part-of-speech tagging, parsing, question answering, relationship extraction, sentence breaking, sentiment analysis, speech recognition, speech segmentation, topic segmentation, word segmentation and word sense disambiguation. The major application domains of NLP include text processing (e.g., word processing, e-mail, spelling and grammar checkers), interfaces to databases (e.g., query languages, information retrieval, data mining, text summarisation), expert systems (e.g., explanations, disease diagnosis) and linguistics (e.g., machine translation, content analysis, writers' assistants, language generation).

Machine learning algorithms are used in NLP tasks. All of these algorithms use a large number of features generated from the input data. A typical feature is the identity of the word being processed, its part-of-speech (POS) tag, the POS tags of the words immediately to its left and right, and whether the word being considered or its immediate neighbours are content words or function words. The algorithms differ, but they generate rules of a similar nature.

Image Understanding
Many artificial intelligence programs are engineered to solve useful problems without regard for their similarity to human cerebral architecture. A digital image captured by a video camera can be represented as a two-dimensional array of pixels containing grey levels. To interpret a scene, the image is passed through three basic processing stages: low-, medium- and high-level vision. Low-level vision pre-processes the image by filtering out noise. Medium-level vision deals with the enhancement of detail and with segmentation, that is, partitioning the image into objects of interest. High-level vision includes three further steps: recognition of the objects from the segmented image, labelling of the image and, finally, interpretation of the scene. It is in high-level vision that we make use of artificial intelligence tools and techniques. Pattern classification, presently realised by supervised learning algorithms, helps in the recognition of objects from their images; the interpretation process, on the other hand, requires knowledge-based computation.

Robotics
Artificial intelligence is applied in robotics to enable robots to see, hear and react to other sensory stimuli. Robots are employed in tasks that require a high degree of precision or that are risky for humans. Another important factor is the size of the space within which the robot makes possible sequences of moves: even an extremely simple robot is capable of generating a vast number of potential move sequences.
Machine Learning
Machine learning is a branch of artificial intelligence that deals with the design and development of algorithms. As a scientific discipline, it is concerned with algorithms that allow computers to evolve behaviours based on empirical data, such as sensor data or databases. The major focus of the study is to make the system learn automatically from the available data and environment and to train itself to make intelligent decisions. It thus helps in creating, with ease and reduced effort, systems that would otherwise be difficult to construct manually.
Learning is one of the most important components of intelligent behaviour. A learner takes advantage of training examples (data) to capture the characteristics of interest of their unknown underlying probability distribution. After training, the learner should check itself by testing. The training and testing data can be seen as examples that illustrate associations between observed variables.
A key focus of machine learning research is to automatically learn to recognise complex patterns and to make intelligent decisions based on data. The difficulty lies in the fact that the set of all possible behaviours is too large to be covered by the set of observed examples. Decision tree learning, association rule learning, support vector machines, hidden Markov models, neural networks, inductive logic programming, genetic algorithms and fuzzy logic are some examples of machine learning techniques.

Neural networks: Neural networks, or artificial neural networks (ANN), are modelled on the biological workings of the human brain. An ANN is composed of a number of artificial neurons, and the learned information is stored in the weights of the connections between the nodes; not all learned knowledge is perfect, but the ability to generalise to new problems is the great advantage of the ANN. First we design the network, then train it and, finally, test it. The performance of the network depends on the training, in particular on how well the network can generalise, which in turn depends on how well it was designed and trained. Another important criterion for an ANN is the selection of the dataset used for its training. Researchers concerned with computerised prediction and classification problems are well aware of ANNs.
A neural network is an interconnected group of nodes, similar to the vast network of neurons in the human brain. Computers are programmed to simulate intelligence by copying the types of physical connections that occur in human brains. Neural computing systems mimic the brain through a network of highly interconnected processing units, which afford learning capabilities to computers and enable them to recognise and understand very complex or mysterious patterns.
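A single artificial neuron can be written in a few lines of Python (an illustrative sketch, not from the original text); here it learns the logical AND function with the classic perceptron rule, storing what it learns in the weights:

    # One perceptron learning AND: output 1 only when both inputs are 1.
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, b, rate = [0.0, 0.0], 0.0, 0.1            # weights, bias, learning rate

    def predict(x):
        return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

    for _ in range(20):                          # a few passes over the examples
        for x, target in data:
            error = target - predict(x)          # perceptron rule: correct by error
            w[0] += rate * error * x[0]
            w[1] += rate * error * x[1]
            b += rate * error

    print([predict(x) for x, _ in data])         # [0, 0, 0, 1] after training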
Inductive logic programming (ILP): This is the technique of defining a problem by examining the results of a function. The process succeeds only if there are sufficient examples, and its difficulty depends on how complex the functions are: inductive programming algorithms can learn simple logic programs, even recursively defined ones, but when applied to real-life data rather than computer-generated functions, more complex programs prove challenging to learn. Inductive logic programming is a subfield of machine learning which makes use of logic programming for rule creation. It thus makes systems capable of predicting results by learning from background knowledge and from various examples.
Inductive logic programming, in simple words, is the process of reasoning from specific observations to create a general hypothesis. There is a logic program B representing the background knowledge, together with a set of positive (E+) and negative (E−) examples; from these, a hypothesis H, or rule, is generated. There are mainly two strategies for learning a rule: the top-down approach and the bottom-up approach. The top-down approach starts with the most general (empty) rule, and constraints are added until all positive examples are covered; FOIL is a system that works on the top-down strategy. The bottom-up approach starts with the most specific rule, and constraints are removed in order to cover more positive examples; GOLEM is an ILP system that works on the bottom-up approach. Inductive logic programming is mainly used for prediction and classification problems.
Decision tree learning: A decision tree is a structure that allows the learning of decisions, binary or otherwise, about objects based on their attributes, such as length, colour or age. From a series of examples, the learning algorithm builds a decision tree that can classify new examples. If a new example is handled correctly, no modification is carried out; otherwise, the tree structure is modified until the correct results are produced. The major challenges are handling errors in attribute values (noise), getting the algorithm to perform well on very large sets of data and determining the optimal fit of the tree to the training and test data.
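A decision tree of this kind can be learned in a few lines of Python (illustrative; scikit-learn is an assumed library, not mentioned in the text, and the attribute values are made up):

    # Learning a decision tree from examples and classifying new ones.
    from sklearn.tree import DecisionTreeClassifier

    X = [[2.0, 1], [2.5, 2], [9.0, 5], [8.5, 7]]   # attributes: length (cm), age
    y = ['small', 'small', 'large', 'large']       # the labels to be learned

    tree = DecisionTreeClassifier().fit(X, y)      # build the tree from examples
    print(tree.predict([[3.0, 2], [9.5, 6]]))      # classify two new examples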
Association learning: Association mining is the discovery of frequent patterns, associations, correlations or causal structures among sets of items or objects in transaction databases and other information repositories. Given a set of transactions, it finds rules that predict the occurrence of an item based on the occurrences of other items in the transaction. An association rule must satisfy both a user-specified minimum support and a user-specified minimum confidence. Association rule generation is usually split into two steps: (1) minimum support is applied to find all frequent item sets in the database; (2) these frequent item sets and the minimum confidence constraint are used to form the rules.
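The two steps can be sketched in Python on a toy transaction database (illustrative, not from the original text; the transactions and thresholds are made up):

    # Two-step association rule mining over toy transactions.
    from itertools import combinations

    transactions = [{'bread', 'milk'}, {'bread', 'butter'},
                    {'bread', 'milk', 'butter'}, {'milk'}]
    min_support, min_confidence = 0.5, 0.7

    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    # Step 1: find frequent item sets (pairs here) meeting minimum support.
    items = set().union(*transactions)
    frequent = [set(p) for p in combinations(items, 2)
                if support(set(p)) >= min_support]

    # Step 2: form rules meeting minimum confidence,
    # where confidence(a -> b) = support({a, b}) / support({a}).
    for pair in frequent:
        for a in pair:
            b = (pair - {a}).pop()
            confidence = support(pair) / support({a})
            if confidence >= min_confidence:
                print(a, '->', b, ' confidence', round(confidence, 2))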
Association rule learning is a machine learning technique which discovers relations
between variables; the rules are formed for a given problem scenario. For example,
association rules can examine the causal correlations between non-fatal and fatal events.
Such a rule takes the form e1, e2, ..., ek → f (conf), where each ei is a non-fatal event, f is a
fatal event and conf is the corresponding confidence value.
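A minimal sketch of the two-step procedure just described, computed directly over a toy set of transactions; the items, thresholds and transactions are invented for illustration.

    # Step (1): frequent itemsets by minimum support.
    # Step (2): rules meeting minimum confidence.
    from itertools import combinations

    transactions = [{"bread", "milk"}, {"bread", "butter"},
                    {"bread", "milk", "butter"}, {"bread", "milk"}]
    min_support, min_confidence = 0.5, 0.7

    def support(itemset):
        # fraction of transactions containing every item in the itemset
        return sum(itemset <= t for t in transactions) / len(transactions)

    items = {i for t in transactions for i in t}
    frequent_pairs = [set(c) for c in combinations(sorted(items), 2)
                      if support(set(c)) >= min_support]

    for pair in frequent_pairs:
        for lhs in pair:
            rhs = pair - {lhs}
            confidence = support(pair) / support({lhs})
            if confidence >= min_confidence:
                print(f"{lhs} -> {rhs} (confidence {confidence:.2f})")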

Pattern Recognition
It is 'the act of taking in raw data and taking an action based on the category of the
pattern'. Most research in pattern recognition is about methods for supervised learning and
unsupervised learning.
The major aim of pattern recognition is the classification of data (patterns) based
either on a priori knowledge or on statistical information extracted from the patterns.
The patterns to be classified are usually groups of measurements or observations, defining
points in a suitable multidimensional space. This is in distinction to pattern matching, where
the pattern is rigidly specified. Some of the application areas of pattern recognition are as
follows (a small classifier sketch appears after the list):

1. Optical character recognition
2. Handwriting recognition
3. Speech recognition
4. Face recognition
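A minimal sketch of a classifier in the sense described above: a new observation, a point in a two-dimensional feature space, is assigned the label of its nearest training point. The feature points and labels are invented for illustration.

    # Nearest-neighbour pattern recognition over toy feature vectors.
    import math

    training = [((1.0, 1.0), "cat"), ((1.2, 0.9), "cat"),
                ((5.0, 5.2), "dog"), ((4.8, 5.1), "dog")]

    def classify(point):
        # pick the training example whose features are closest
        dist = lambda example: math.dist(example[0], point)
        return min(training, key=dist)[1]

    print(classify((1.1, 1.0)))   # -> 'cat'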

Virtual Reality
Virtual reality is an attempt at creating natural interaction with a computer. For this
purpose, a variety of specialized input and output devices are necessary. Normally, a
tracker is used as the input device; it is capable of reporting its location in space and its
orientation. The tracking device can be optical, magnetic or acoustic. The tracker is often
combined with traditional input devices like a joystick or mouse. The attempt to create a
truly natural input device was furthered by the development of data gloves. A data glove
has numerous sensors that are capable of reading the angle of each of the finger joints.
Thus, by wearing the gloves, the user can interact with the virtual world through hand
gestures. For creating the visual world experience, stereoscopic displays are used: two
views of the simulated world are generated, one for each eye, and a stereoscopic device
is used to show the correct view to each eye.
We can apply the virtual reality environment in a variety of ways. In research, virtual
environments are used to visually explore physical-world phenomena. They can also be
used, through simulation, for training personnel to work in dangerous environments like
nuclear power stations or with expensive equipment; airplane pilots, for example, train in
flight simulators. Virtual reality can enable medical personnel to practice new surgical
procedures on simulated individuals. Virtual reality is also used in the field of
entertainment, where it provides a highly engaging way to experience imaginary worlds
and to play games. When designing prototypes for new products, virtual reality provides
a way to experiment with these designs.
Hardware
A variety of specialized input and output devices are required to realize a virtual
reality environment. One of the most important input devices used in a virtual environment
is a tracker. This device is capable of reporting its location in space and its orientation.
The mode of operation of the tracking device can be optical, magnetic or acoustic. The
tracker can be combined with an input device, such as a mouse or a joystick, to operate
within the virtual reality environment. Another virtual reality input device is the data glove.
It is a natural input device outfitted with sensors that can read the angle of each of the
finger joints in the hand. The user wears such a glove to interact with the virtual world
through hand gestures, such as pointing or making a fist. Real-world visual
experience is approximated using stereoscopic displays: a view of the simulated world is
generated for each eye of the viewer, and a stereoscopic display device shows the correct
view to each eye. Together, this variety of specialized devices creates a convincing virtual
reality environment.
Computer Vision
Computer vision is the science and technology of machines that can see. This means
that, to solve tasks, the machine is able to extract information from an image; computer
vision enables it to recognise objects in the image and act on that information. It is a
scientific discipline concerned with the theory behind artificial systems that extract
information from images. The image data can be in the form of pictures, video sequences,
multidimensional data from a medical scanner or views from multiple cameras.
As a technological discipline, computer vision seeks to apply its theories and models
to the construction of computer vision systems. Examples of applications of computer
vision include systems for:
1. controlling processes, for example, an industrial robot or an autonomous vehicle;
2. detecting events, for example, for visual surveillance or people counting;
3. organizing information, for example, for indexing databases of images and image
sequences;
4. modelling objects or environments, for example, industrial inspection, medical
image analysis or topographical modelling; and
5. interaction, for example, as the input to a device for computer-human interaction.

Nature Inspired Computing


Researchers need new computing techniques to solve artificial intelligence
problems that existing methods have not yet covered. Nature inspired computing is one
such technique: it absorbs ideas by observing how nature behaves in various situations to
solve complex problems. Research on this computing technique has opened new branches
such as neural networks, artificial immune systems, biological computation, evolutionary
computation, swarm intelligence, and so on.
The emergence of nature inspired computing was made possible by the advancement
of computer science and the remarkable growth of computing techniques. Nature inspired
computing techniques are applied in almost all fields of biology, physics, engineering,
economy and management. The models are swarms, colonies or other natural analogues:
software agents that are suitable for modelling particularly complex and dynamic systems.
Algorithms that find optimal paths to food sources have already been developed, inspired
mainly by ant colonies that exhibit swarm intelligence. This approach is called ant colony
optimization (ACO) and is widely used in optimization problems. Similarly, particle swarm
optimization (PSO) is used in a variety of problems. Molecular electronics
attempts to develop solid-state components similar to molecular structures. A biological
molecule such as DNA has inspired researchers to work on DNA-based computing.
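A minimal sketch of the particle swarm optimization mentioned above, in one conventional formulation: a population of particles explores a search space, each drawn toward its own best position and the swarm's best. The inertia and attraction coefficients below are common textbook defaults, chosen for illustration.

    import random

    def pso(f, dim=2, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5):
        pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
        vel = [[0.0] * dim for _ in range(n_particles)]
        pbest = [p[:] for p in pos]              # each particle's best position
        pbest_val = [f(p) for p in pos]
        gbest = min(pbest, key=f)[:]             # the swarm's best position
        for _ in range(iters):
            for i in range(n_particles):
                for d in range(dim):
                    r1, r2 = random.random(), random.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d]))
                    pos[i][d] += vel[i][d]
                val = f(pos[i])
                if val < pbest_val[i]:           # update personal best
                    pbest[i], pbest_val[i] = pos[i][:], val
                    if val < f(gbest):           # update swarm best
                        gbest = pos[i][:]
        return gbest

    # minimise the sphere function; the optimum is at the origin
    print(pso(lambda x: sum(v * v for v in x)))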
A typical nature inspired computing system is based on self-organization and on
complex systems observed in nature. It can be described as a computing system operated
by a population of autonomous entities situated in an environment. Nature inspired
computing techniques can be applied to a wide range of problems like classification,
prediction or optimisation. Such systems adapt to unseen data and are capable of learning,
and they are robust enough to handle incomplete data. Control of computational activities
is decentralized. One subset of nature inspired computing is biologically inspired
computing. There are three important differences between traditional computing systems
and biological information processing systems. First, the components of biological systems
respond slowly, yet they implement higher-level operations effectively. Second, a major
ability of a biological system is to assemble and grow its own interconnections. Third, a
biological system's implementation is not a planned one. Biologically inspired computing
is a field of study that loosely knits together subfields related to the topics of connectivity,
social behaviour and emergence. It is related to the field of artificial intelligence and links
to machine learning as well as to the fields of biology, computer science and mathematics.
To summarise, nature inspired computing is the use of computers to model nature and,
simultaneously, the study of nature to improve the usage of computers.
The major difference between bio-inspired computing and traditional artificial
intelligence lies in its more evolutionary approach to machine learning. In traditional
artificial intelligence, the programmer is a creator who makes something and imbues it
with intelligence. Bio-inspired techniques instead specify a set of simple rules (adapted
from the behaviour of simple organisms) and a method of iteratively applying those rules;
complex behaviour then emerges after several generations of rule application. The
complexity of such systems can be very high.

Intelligent Control
In a process control system, the controller is designed from the known models of a
process and the required control objective. The existing techniques for controller design are
not valid when the dynamics of the plant is not completely known. In such situations, rule-
based control is mostly used: for controller realization, an expert engineer sets the
production rules. The antecedent part of the rules in a rule-based system is then matched
against the dynamic response of the process plant parameters. Rules whose antecedent part
matches the plant response are selected and fired. There will be situations where more
than one rule is firable; in such cases, a set of conflict-resolution strategies set by the
controller is used to resolve it. The other extreme may also occur, when no rule matches
the plant response. Such situations are handled by fuzzy logic, which is capable of partially
matching the antecedent parts of rules with the plant response. Fuzzy control has been
successfully implemented in various industrial plants; for example, various controllers have
been implemented for power control in nuclear reactors. A further issue in process control
is to design a plant (process) estimator, which attempts to follow the actual plant response
when both the plant and the estimator are jointly excited by a common input signal.
Recently developed tools for plant estimation include fuzzy and artificial neural network-
based learning.
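A minimal sketch of the idea of partial rule matching described above: each rule's antecedent is matched against the plant response to a degree, and the firing strengths weight the control actions, so some action is produced even when no rule matches exactly. The membership functions and action values are invented for illustration.

    # Rule-based control with fuzzy (degree-of-match) antecedents.
    def mf_low(e):   return max(0.0, min(1.0, (10 - e) / 10))   # "error is low"
    def mf_high(e):  return max(0.0, min(1.0, e / 10))          # "error is high"

    rules = [(mf_low,  0.2),    # IF error is low  THEN small control action
             (mf_high, 0.9)]    # IF error is high THEN large control action

    def control(error):
        # weight each rule's action by how well its antecedent matches
        strengths = [(mf(error), action) for mf, action in rules]
        total = sum(s for s, _ in strengths)
        return sum(s * a for s, a in strengths) / total

    print(control(7.0))   # mostly the 'high error' action -> 0.69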

Artificial intelligence tree


To summarise, artificial intelligence can be looked upon as a tree with
Fruits: Applications
Branches: Expert systems, natural language processing, speech understanding, robotics and
sensory systems, computer vision, neural computing, fuzzy logic, genetic algorithms.
Roots: Psychology, philosophy, electrical engineering, management science, computer
science and linguistics.

Questions

1. Define artificial intelligence.


2. What is the difference between intelligence and artificial intelligence?
3. Explain the architecture of an artificial intelligence system.
4. Define intelligence. What is the intelligent behaviour of a machine?
5. Describe the Turing test for artificial intelligence and justify its validity from a
modern standpoint.
6. Write a note on the birth and development of present day artificial intelligence
research.
7. Write a note on important events in the history of artificial intelligence research.
8. What are the characteristics of intelligence?
9. What is an artificial intelligence technique?
10. What are the knowledge characteristics?
11. What do you know about production systems?
12. How can you analyse a problem?
13. Explain the concept of operators or production rules.
14. Define the problem as a state space search. Examine in detail.
15. Define the problem as a state space problem.
16. Define the concept of the water-jug problem. Find three solutions to the water-jug problem.
17. If there is water scarcity, which solution is better, and if there is no scarcity of water,
which solution is better?
18. Define the problem as a state space problem and solve the missionaries and cannibals
problem.
19. By using the state space approach, write one solution to the Tower of Hanoi problem.
20. Explain the concept of artificial intelligence problem characteristics in detail.
21. List out the various machine learning techniques.
22. There are various machine learning techniques. How will you select an appropriate
one for your problem?
23. What is intelligent control? What is its role in artificial intelligence?
24. Why is planning important in artificial intelligence?
25. What about IQ? Do computer programs have IQs?
26. How far is AI from reaching human-level intelligence? When will it happen?
27. What about other comparisons between human and computer intelligence?
28. What is an artificial intelligence technique? Briefly explain how artificial intelligence
techniques can be represented.
29. List out some of the task domains of artificial intelligence.
CHAPTER 2
ARTIFICIAL INTELLIGENCE
What is artificial intelligence?
John McCarthy, a founding father, described the process in 1955 as "that of making
a machine behave in ways that would be called intelligent if a human were so behaving."
INTRODUCTION
Using human capabilities as a yardstick for Artificial Intelligence (AI) can be
misleading. Machines are able to perform lots of tasks that people can't do at all, and many
such performances certainly feel like displays of intelligence. A security program may
suspect a cyber attack based on
an unusual pattern of data access requests in a span of just five hundred milliseconds; a
tsunami warning system may sound an alarm based on barely perceptible changes in ocean
heights that mirror complex undersea geography; a drug discovery program may propose a
novel admixture by finding a previously unnoticed pattern of molecular arrangements in
successful cancer treatment compounds. The behavior exhibited by systems like these,
which will become ever more common in the near future, doesn't lend itself to comparison
with human capabilities. Nonetheless, we are likely to regard such systems as artificially
intelligent.
The essence of AI-indeed, the essence of intelligence-is the ability to make
appropriate generalizations in a timely fashion based on limited data. The broader the
domain of application and the quicker conclusions are drawn from minimal information,
the more intelligent the behavior.
Can a computer ever really be smarter than a human being?
In a word, yes-but most likely in limited ways. It's possible that at some point in the
future public sentiment will have shifted sufficiently to accept the idea that computers are in
general superior to humans in some fairly broad classes of intellectual tasks, but this doesn't
mean that machines will dominate or obsolete us. ATMs can count bills faster than we can,
cameras can see in the dark, but we don't regard any of these as threatening our primacy.
Computer programs can already play games, scan a crowd for familiar faces, and
recommend movies as well or better than we can, yet few people are intimidated by these
competencies.
Intelligence, as the word is applied to machines, is likely to refer to well-defined
activities in which the goals can be easily specified and measured, but not to others in
which success is more subjective.
No doubt computer programs can be written that will at least plausibly attempt to
answer these sorts of subjective or judgmental questions, but their answers will not be
regarded as preferable to, more perceptive than, or wiser than those of humans.
The idea of machines as "more intelligent" than humans may seem uncomfortable, but by
the time it happens it will likely be no more remarkable than many prior technological advances
anticipated with horror, such as in vitro fertilization ("test-tube babies"), the supposedly
numbing and dumbing effects of television on children, and (my personal favorite) the
menace of recorded music.
To understand why computers are likely to exceed human intelligence in a variety of
practical applications, it's helpful to start with the straightforward observation that
computers surpass human capabilities in many tasks today, including some that we might
have believed require human intelligence. Driving cars, playing Jeopardy, predicting wars,
and summarizing press releases are but a few examples.
With their superior speed, accuracy, and memory capacity, computers may more
effectively perform tasks like playing chess or translating text by searching for answers,
whereas humans may perform better using other techniques. But as automated methods
continue to chip away at a long list of abilities previously considered the unique and sole
province of humans, the distinction between human and machine intelligence for the average
person may blur into irrelevance. Sometimes the best man or woman for the job may be a
machine.
CHAPTER 3
THE HISTORY OF ARTIFICIAL INTELLIGENCE
INTRODUCTION
The first use of "artificial intelligence" can be attributed to a specific individual: John
McCarthy, who in 1956 proposed a study based on "the conjecture that every aspect of
learning or any other feature of intelligence can in principle be so precisely described that
a machine can be made to simulate it," and which aimed to make machines use language,
form abstractions and concepts, solve kinds of problems now reserved for humans, and
improve themselves.
Also relevant is symbolic logic, the branch of mathematics that deals with representing
concepts and statements as symbols and then defines various transformations to manipulate
these symbols, so as to reason deductively from hypotheses to conclusions (or inductively
from conclusions back to hypotheses). For instance, symbols might represent "Socrates," "man,"
and "mortal," as well as the statements "Socrates is a man," and "All men are mortal." From
this, you could formally derive that "Socrates is mortal." Most mathematicians who studied
symbolic logic were concerned not with actually proving statements like this or applying the
techniques to specific problems, but rather with studying the theoretical properties of logical
systems-such as delineating what such systems can and can't do.
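To make the example concrete, here is one conventional first-order rendering (the notation is a standard textbook convention, not a quotation from any particular system):

    ∀x (Man(x) → Mortal(x)),  Man(Socrates)  ⊢  Mortal(Socrates)

That is, from the rule that all men are mortal and the fact that Socrates is a man, a single inference step (modus ponens) yields that Socrates is mortal.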
But the advent of electronic computing devices raised the possibility that all this
theory might actually have a practical use. After all, computers had proved their mettle
during World War II as an efficient way to calculate ballistics tables (how to aim big guns)
and to encrypt, decrypt, and even break secret codes. It was natural to attempt to expand
the use of computers beyond crunching numbers and processing data to manipulating symbols.
McCarthy himself went on to create a number of seminal inventions in the field, most
notably the elegant programming language LISP, which stood for "list processing," not a
speech impediment.
McCarthy believed that a computer could simulate many or all advanced human
cognitive functions. As he put it, "The speeds and memory capacities of present computers
may be insufficient to simulate many of the higher functions of the human brain, but the
major obstacle is not lack of machine capacity, but our inability to write programs taking
full advantage of what we have .... Probably a truly intelligent machine will carry out
activities which may best be described as self improvement.... A fairly attractive and yet
clearly incomplete conjecture is that the difference between creative thinking and
unimaginative competent thinking lies in the injection of some randomness. The randomness
must be guided by intuition to be efficient. In other words, the educated guess or the hunch
include controlled randomness in otherwise orderly thinking." All these somewhat off-the-
cuff remarks presaged important areas of study within the field.
Progress in AI has gone through several highly visible cycles of boom and bust,
creating periodic so-called "AI winters" in which the field was substantially out of favor
with governmental and industrial patrons.
Some of the early groundbreaking work involved highly visible accomplishments
such as Arthur Samuel's 1959 checkers player. This remarkable program demonstrated to
the world the novel proposition that a computer could be programmed to learn to play a
game better than its creator. It could improve its performance by playing and could do
something that humans could not-play against itself to practice-eventually reaching
advanced amateur status.
What is the “Physical Symbol System Hypothesis”?
"Symbols lie at the root of intelligent action, which is, of course, the primary topic
of artificial intelligence .... A physical symbol system is a machine that produces through
time an evolving collection of symbol structures”.
A physical symbol system has the necessary and sufficient means for general
intelligent action. By "necessary" they meant that any system exhibiting general intelligence
will prove upon analysis to be a physical symbol system. By "sufficient" they meant that any
physical symbol system of sufficient size can be organized further to exhibit general
intelligence. By "general intelligent action" they meant "the same scope of intelligence as we
see in human action: ... in any real situation behavior appropriate to the ends of the system
and adaptive to the demands of the environment can occur, within some limits of speed and
complexity."
While their characterization of the dominant approach to AI at the time is insightful
and inspiring, in retrospect it suffers from a significant defect. Despite the fact that it is
presented as an empirical hypothesis, it is not, by itself, subject to confirmation or refutation.
Alternative approaches to AI not based on anything like their proposed methodology could
be equally or more effective in achieving their aspirations for the field, calling into question
whether "symbols lie at the root of intelligent action." Yet, their rebuttal could be that an
equivalent (or better) physical symbol system solution may exist; it just hasn't been
developed yet. In other words, their description of the field is a little like a prescription for
how to address a golf ball in order to drive it as straight and as far as possible (keep your
head steady and your eye on the ball, use your leading arm for power and your following
arm for control). Equating this with the optimal (or only) way to play the game would seem
to overreach. At least one alternative approach to AI (machine learning) that has no obvious
relationship to their hypothesis did emerge, but not before another major wave of systems
targeted at practical applications, following the symbolic systems approach, arose in the
early 1980s.
What is (or was) an expert system?
Around 1980, a new class of systems, called at the time "expert systems" or
"knowledge systems," arose. The idea was to capture and duplicate scarce human expertise
in a computable form, in the hope of making this capability available more widely and
inexpensively.
Typically, expert systems were highly specialized or, in the jargon of the time,
"domain specific." You might wonder why any program that performs a sufficiently
sophisticated task is not considered an expert system, or at least wasn't back when the term
was popularized. The main difference is in how the expertise is represented. In contrast to
the procedural method
main difference is in how the expertise is represented. In contrast to the procedural method
of computer programming common at the time (and still today), where a problem is broken
down into a series of sequential steps, expert systems instead employed a different approach,
a natural application of the symbolic systems concept. These computer programs
deconstructed tasks requiring expertise into two components: the "knowledge base" -a
collection of facts, rules, and relationships about a specific domain of interest represented in
symbolic form-and a general purpose "inference engine" that described how to manipulate
and combine these symbols. Representing the facts and rules explicitly had the advantage
that the systems could be more easily modified as new facts or knowledge were incorporated.
In particular, the people programming expert systems-who became known as "knowledge
engineers" -could create these systems by interviewing practitioners and incrementally
incorporating their expertise into computer programs, whose performance could then be
tested, evaluated, and improved accordingly. The common approach to programming
required the programmer him- or herself to be an expert in the domain, not to mention be
readily available to make changes, both obvious practical impediments. By contrast, the
concept behind expert systems was to represent the knowledge of the domain explicitly,
making it available for inspection and modification. This approach also allowed programs
to be more fault tolerant, that is, they tended to be more forgiving of programming mistakes.
Equally important, this structure provided a convenient framework for the program to
"explain" its reasoning.
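A minimal sketch in Python of the two components just described: a symbolic "knowledge base" of facts and rules, and a forward-chaining "inference engine" that applies them. The facts, rules and names are invented for illustration.

    # Knowledge base: known facts plus rules of the form
    # (set of antecedents, conclusion).
    facts = {"fever", "cough"}
    rules = [({"fever", "cough"}, "flu_suspected"),
             ({"flu_suspected"}, "recommend_rest")]

    def infer(facts, rules):
        # repeatedly fire any rule whose antecedents are all known,
        # adding its conclusion, until nothing new can be derived
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for antecedents, conclusion in rules:
                if antecedents <= derived and conclusion not in derived:
                    derived.add(conclusion)
                    changed = True
        return derived

    print(infer(facts, rules))   # includes 'flu_suspected', 'recommend_rest'

Because the facts and rules are explicit symbols rather than buried in procedural code, they can be inspected, modified, and used to "explain" a conclusion, which is exactly the advantage claimed for expert systems above.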
What is planning?
The symbol system approach is by no means dead. It is alive and well, most
prominently in a subfield of AI called "planning," which is concerned with developing
techniques to address problems that require formulating a series of steps to accomplish some
desired goal. Examples include giving driving directions, playing games, packing odd-sized
boxes into a truck, proving mathematical theorems, analyzing legal contracts and
regulations, cooking recipes, laying out transistors on computer chips, assembling
equipment, describing regulations and rules in computable form, and controlling air traffic.
The common element of these challenges is that there is usually a known initial state,
one or more desired final states, a specific set of operations or "moves" available to proceed
from initial to final state(s), and some measure of the value of a solution, such as minimizing
the number of steps required. In other words, planning systems figure out what to do. While
you might suspect that anything goes in solving planning problems, in practice these
challenges mostly fall into well defined classes with characteristic mathematical properties
that are amenable to different techniques.
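A minimal sketch of this formulation in Python, using breadth-first search over the classic water-jug puzzle as a stand-in domain: a known initial state, a goal test, a fixed set of moves, and "fewest steps" as the measure of solution value.

    from collections import deque

    # States are (litres in 4-litre jug, litres in 3-litre jug);
    # the goal is to measure exactly 2 litres in the large jug.
    def moves(s):
        a, b = s
        return {(4, b), (a, 3), (0, b), (a, 0),                 # fill / empty
                (a - min(a, 3 - b), b + min(a, 3 - b)),          # pour a -> b
                (a + min(b, 4 - a), b - min(b, 4 - a))}          # pour b -> a

    def plan(start=(0, 0), goal=lambda s: s[0] == 2):
        frontier, seen = deque([[start]]), {start}
        while frontier:
            path = frontier.popleft()
            if goal(path[-1]):
                return path            # shortest sequence of states
            for nxt in moves(path[-1]) - seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

    # one shortest plan, e.g. [(0,0), (0,3), (3,0), (3,3), (4,2), (0,2), (2,0)]
    print(plan())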

Most planning systems engage in symbolic inference enhanced with what's called
heuristic reasoning. Heuristic reasoning tackles a common, if not universal, problem
plaguing the symbolic systems approach: the number of possible sequences of steps can
be very large, a problem called a "combinatorial explosion."
Planning systems employ a variety of strategies. Some start with the goal and reason
backward, in an attempt to find the initial conditions that will get there. For instance, if you
have several errands to run but want to be sure to get home in time for dinner at six, you
might work backward in time, subtracting how long each stop is going to take you, to figure
out when you have to leave. Others reason forward, from hypotheses to conclusions, or
attempt to simplify the task by first solving smaller sub problems, then connecting them
together into a comprehensive solution.
One active field of AI research that employs planning techniques is "general game
playing." This is exactly what it sounds like. A program is presented with a set of rules for
playing a game that it has no previous knowledge of, but is told nothing at all about how to
play it well. Then it has to figure out its own playing strategies by reasoning about what's
likely to work well. Other widespread modern applications of planning that use heuristic
techniques are the navigation programs that provide you with driving directions, and what
are called "nonplayer characters" (NPCs) in computer games: those seemingly cunning and
wily animated characters that often shoot at or antagonize you.
Planning systems, and more generally the symbol systems approach, are what is
somewhat derisively (or affectionately, depending on what flavor of AI you happen to be
most fond of) today called "Good Old-Fashioned AI," or GOFAI. In any case, subsequent
developments have demonstrated that for all its appeal, the physical symbol system
hypothesis is not the only game in town.
What is machine learning?
The ability to learn is an important aspect of human intelligence. The question is
how do people learn? And can we program computers to learn the same way?
Learning is not inconsistent with the physical symbol system hypothesis; it's just not
obvious a priori how it fits in. Typically, in an AI application following the symbol systems
approach, the learning (if any) is done up front, to help develop the symbols and rules that
are ultimately packaged up and used for the intended application. But just as the role of
knowledge may have been underappreciated in the earliest AI systems, the importance and
value of learning-not only in advance but as an ongoing part of solving many problems of
practical interest-may not have received the attention it deserved.
Learning presumably comes mainly from experience, practice, or training, not solely
from reasoning, though this can certainly be helpful. To say that something is learned implies
that it is more than just captured and stored, as data is in a database-it must be represented
in some way that it can be put to use. As a general description, computer programs that learn
extract patterns from data. That data may take a seemingly infinite variety of forms-video
taken from a moving car, reports of emergency room visits, surface temperatures in the
Arctic, Facebook likes, ant trails, recordings of human speech, clicks on online ads, birth
records from the Middle Ages, sonar soundings, credit card transactions, the dimming of
distant stars when transited by orbiting planets, stock trades, phone calls, ticket purchases,
transcripts of legal proceedings, tweets (from both Twitter and birds)-just about anything
that can be captured, quantified, or represented in digital form.
The new data-centric approach to AI goes by several names, most commonly
"machine learning," though you may have heard it referred to in the press as "big data" or as
"neural networks" -a specific approach to machine learning (but not the only one).
What are artificial neural networks?
An artificial neural network is a computer program inspired by certain presumed
organizational principles of a real neural network (such as your brain). The relationship
between artificial neural networks and real ones is mostly aspirational.
The structure of the brain at the detail level is composed of a mostly homogeneous
mass of cells called neurons, which interconnect with each other at "synapses" to send and
receive electrical or chemical signals. When these signals exceed a certain level or form a
certain pattern, a neuron “fires”, meaning that it in turn signals other neurons that it is
connected to. The gross structure of the brain determines which layers and regions are
typically involved in various activities, such as seeing, getting hungry, doing arithmetic,
adjusting your heart rate, recognizing faces, and wiggling your big toe; exactly how the
neurons are connected to perform these tasks is far less well understood.
Neurons in an artificial neural network are commonly organized into a series of
layers. The neurons at each level are connected only to those at the level above and below
them in the hierarchy, and the interconnections are usually modeled as numeric weights,
with (for instance) 0 representing "not connected" and 1 representing "strongly connected."
The lowest level actually receives input from outside the network-for instance, each low-
level neuron might process information about a specific dot (pixel) from a camera. The
neurons at higher levels-in what are called the "hidden layers" –receive input only from
neurons below them. The entire structure is then presented with examples, such as pictures
of cats, and the weights are propagated up (and often back down) the hierarchy until it is
"tuned" to recognize cats-which is indicated by a particular neuron (or pattern of neurons)
firing, usually at the highest level?
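A minimal sketch of the layered structure just described: a forward pass in which each neuron sums weighted inputs from the layer below and "fires" through a squashing function. The layer sizes and random weights below are illustrative only; a real network would be trained on examples.

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, w, b):
        # each neuron sums weighted inputs, then squashes through a sigmoid
        return 1.0 / (1.0 + np.exp(-(w @ x + b)))

    x = rng.random(4)                  # input layer, e.g. four pixel values
    w1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(3)   # hidden layer
    w2, b2 = rng.standard_normal((1, 3)), rng.standard_normal(1)   # output layer

    print(layer(layer(x, w1, b1), w2, b2))   # activation of the output neuron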
What an artificial neural network actually detects is a set of incredibly sophisticated and
complex correlations between the images, regardless of whether they are rotated, stretched,
partially obscured, or the like. After training on perhaps millions and millions of images, it
develops
the ability to detect similar patterns in pictures not presented to it previously. In other words,
it learns to identify pictures of cats all by itself. Whether this has anything to do with how
we learn to recognize cats is an open question, but what's not open to dispute is that it works,
and it works quite well. The most recent crop of such systems can actually outperform
humans at many recognition tasks.
Artificial neural networks are constructions that resonate with arbitrarily complex
patterns present in their inputs. They are mirrors of their experience. In this sense, they don't
"learn how to do things" in the common sense of the phrase-developing a principled
understanding of the underlying relationships and properties of their world.
Instead, they are incredibly skilled mimics, finding correlations and responding to
novel inputs as if to say, "This reminds me of ... ," and in doing so imitate successful
strategies gleaned from a large collection of examples. An open philosophical question is
whether this approach is equivalent to understanding causation. Are we really doing the
same thing, or is there something more to the way humans learn and interact with the world?
And if the end result-the behavior-is the same, does any such distinction matter?
How did machine learning arise?
It turns out that a network of brain neurons can be modeled by, of all things, logical
expressions. Despite the fact that brains are soft, wet, gelatinous masses, the signaling in
the brain is digital; indeed, it appears to be binary. This is another example of the role that
mathematical formalization can play in propelling science forward: specification of the
nervous net provides the law of necessary connection whereby one can compute from the
description of any state that of the succeeding state. It also raised the natural possibility
that modeling the brain mathematically could lead to progress in treating psychiatric
disorders.
A major current focus in machine learning, "deep learning," refers to the use of artificial
neural networks that have many internal layers (referred to as hidden layers). But the main driver
of renewed interest in the field was the growing availability of example data in computer-
readable form, not to mention that computers were improving at a blistering pace in terms
of both storage and processing capacity. In particular, a new class of powerful parallel
processing supercomputers called connection machines could simulate the behavior of
multiple artificial neurons at the same time.
Despite the promise of these novel machines, they were ultimately overtaken by
standard nonparallel commercial processors because the economics of mass production
accelerated their development faster than such specialized machines. The same fate was to
befall another related computer engineering development at the time-machines designed
specifically to process McCarthy's AI language LISP. A new attempt to develop processors
targeted to artificial neural networks is currently under way, most notably at IBM. Its latest
effort is a 5.4-billion-transistor chip with 4,096 neurosynaptic cores that integrates 1 million
neurons and 256 million synapses. Each chip supports a thousand times as many artificial
neurons as earlier designs; the chips can be tiled and stacked in two dimensions, and they
probably operate at least 1 million times faster.
"It's a new landmark of the brain-inspired computers ... [it] approximate[s] the
structure and function of the brain in silicon." Time will tell if this proves to be the computer
architecture of the future or simply another misguided attempt to build computers tailored
for a specialized class of applications.
While machine learning systems are currently experiencing a boom in commercial
investment and being applied to an incredible variety of problems with significant success,
perhaps the most remarkable application is some recent work in which the techniques are
used not to simulate the brain but to reverse engineer it. Jack Gallant at the Henry H.
Wheeler Jr. Brain Imaging Center of the University of California at Berkeley is succeeding
in using machine learning techniques to read minds. His team trained a machine learning
system to look for patterns in an array of brain sensors while showing test subjects pictures
of various objects, like scissors, bottles, or shoes. They then put a new subject into the test
rig and showed him or her a picture. Once trained, the program can correctly identify what
the subject is looking at with significant accuracy.
There are two promising aspects of this work. First, the techniques currently used to
measure brain activity are quite crude, mainly detecting blood flow in cubic brain segments
three millimeters on a side (called "voxels"), the modern equivalent of Rosenblatt's low-
resolution twenty-by-twenty grid of photocells. As brain activity measurement instruments become
more sensitive and detailed, potentially even detecting the firing of individual neurons, the
quality of the interpretations is likely to improve dramatically. Second, the results are not
specific to a particular person-the system can train on one set of test subjects, then use those
results to interpret what a different subject is looking at with high accuracy. This means that
at least to the level of detail researchers are currently studying, human brains are not as
idiosyncratic as one might suppose.
Unless (or until) fundamental limitations emerge, this work opens up the real
prospect of integrating our own brains with the electronic world-in other words,
communicating with and controlling computers, machines, and robots simply with our
minds, just as we do our own bodies. It also raises the scary prospect that our own thoughts
may no longer be entirely private.
Symbolic reasoning vs. machine learning?
While most work has focused exclusively on one or the other of these two approaches,
there's no reason in principle that they can't both be profitably integrated into a single
design. Indeed, there is considerable effort in this direction, and conferences are devoted to it.
But the plain fact is that these approaches have different strengths and weaknesses.
In general, symbolic reasoning is more appropriate for problems that require abstract
reasoning, while machine learning is better for situations that require sensory perception or
extracting patterns from noisy data. For instance, suppose you want to build a robot that can
ride a bike (that is, one that can control the pedals and handlebars, and is able to balance).
Representing this problem in symbolic terms may be possible, but imagine trying to
interview a human expert in an effort to build an expert system to do this. There certainly
are experts at riding bikes, but the nature of their expertise simply doesn't lend itself to
description in words. Clearly, knowledge and expertise can take forms that resist
codification into human language or any explicitly symbolic form.
But there are other issues for which machine learning techniques aren't well suited.
To state the obvious, machine learning is not useful for problems where there's no data, just
some initial conditions, a bunch of constraints, and one shot to get it right. For example,
mistakes made in the design of computer chips can be very expensive and damaging. After
Intel was forced to recall its Pentium processor in 1994 due to a bug in certain math
functions, interest surged in formal methods for verifying that circuits performed as
expected. There are two parts to this problem: first, how to describe in an abstract way the
functions of the circuit that you are trying to verify; and second, how to perform the test in
a practical amount of time and at an acceptable cost while still guaranteeing that the results
are correct. After a decade of work on this problem, a language for specifying the desired
behavior was accepted and standardized by the IEEE (Institute of Electrical and Electronics
Engineers) in 2005, followed by a variety of commercial and proprietary programs to
perform the actual verification. But the field of AI suffers from an unusual deficiency: once
a particular problem is considered solved, it often is no longer considered AI. So "formal
verification" and "model checking," at least as applied to computer hardware, are now
independent fields, though they trace their intellectual roots back to early theorem provers
like that of Newell and Simon.
That said, many problems that you might think of as requiring logic and reasoning
are surprisingly amenable to machine learning techniques.
In the early days of AI, the available computers were simply not powerful enough to
machine learn much of interest. They offered only a minuscule fraction of the processing
speed of today's computers and sported an equally vanishing amount of memory available
to store data. But most important, there simply weren't many sources of machine-readable
data available to learn from. Most communication was on paper and available only at
specific locations-as anyone who has tried to get their mother's birth certificate can attest.
For real time learning, the data from sensors was equally primitive or available only in an
analog form that resisted processing digitally. So four trends-improvements in computing
speed and memory, the transition from physically to electronically stored data, easier access
(mainly due to the Internet), and low-cost high resolution digital sensors-were prime drivers
in the refocusing of effort from symbolic reasoning to machine learning.
Important historical milestones in AI
There have been technical and scientific breakthroughs that are significant
intellectual achievements underlying many of the great advances in the field. There are also
many highly successful applications with great impact on society that are secret, proprietary,
or otherwise hidden from view. Examples include programs that scan our communications
(for better or for worse), trade securities, detect cyber attacks, review our credit card
transactions for fraud, and no doubt many others.
Probably the first objective and easily comprehensible milestone to capture the
public's imagination was the program Deep Blue, which beat Garry Kasparov, then the world
chess champion, in a six-game tournament in 1997. The match was a nail-biter: Deep Blue
triumphed only in the final game. Adding to the drama, Kasparov, a child prodigy considered
possibly the greatest chess player of all time (and apparently a bit of a prima donna at the
age of thirty-four), promptly accused IBM of cheating, based mainly on his conviction that
a machine could never have formulated such brilliant strategies.
In any case, this victory, after decades of missed predictions by overly optimistic
prognosticators, received widespread attention and sparked endless debates about what it
"meant" for human supremacy over machines. Chess had long been held out as a bastion of
intellectual achievement likely to resist any attempt at automation. But like most if not all
such encroachments by technology into formerly exclusively human domains, the
accomplishment was soon accepted as routine rather than a call to arms that mechanical
minds were approaching from all directions to take over the world. Those downplaying the
import of the victory mostly focused on the role of the specially designed supercomputer
used for the task rather than the sophisticated programming techniques developed by the
team, which suited IBM just fine, since the company was in the business of selling the latest
and greatest hardware. Explanations of the programming techniques used also helped to
demystify this fear: if you can see the emperor's naked body, perhaps he's not so superhuman
after all. Today, expert-level computer chess-playing programs are commonplace and so
powerful that they are no longer routinely pitted against human players. Chess programs
capable of grandmaster-level play could be run on a garden-variety smartphone.
With computer chess now regarded as a "solved problem," attention moved on to a
completely different sort of challenge: driving a car without human intervention. The main
technological barrier is not control of the car-most modern vehicles already interpose
electronics between the driver and the controls- but rather the ability to sense the
environment in sufficient detail and respond quickly enough. An emerging technology called
LIDAR (for light/laser detection and ranging), mainly used for military mapping and
targeting, proved just the ticket for sensing, but interpreting the results was another matter.
Integrating the stream of data into features and obstructions of interest-such as trees, cars,
people, and bicycles-required significant advances in the state of the art of computer vision.
CHAPTER 4
FRONTIERS OF ARTIFICIAL INTELLIGENCE

What are the main areas of research and development in AI?


Work in artificial intelligence is generally divided into a number of subfields that
address common, though difficult, practical problems or require different tools or skills.
Some of the more prominent are robotics, computer vision, speech recognition, and natural
language processing. A brief explanation of each follows.
What is robotics?
Robotics should require little description-it involves building machines that are
capable of performing physical tasks. Most people think of robots as mimicking human
form, but of course that's not necessary. Much ongoing work seeks to develop lighter-weight,
more flexible, stronger materials and methods of control as well as novel designs (often
inspired by nature), but what really distinguishes robotic research in AI from more
pedestrian mechanical automation is the attempt to build devices that are capable of more
general classes of tasks. For instance, all sorts of special-purpose machines exist that pack
specific foods and products into shipping cartons and containers. But creating a single device
capable of handling a wide variety of shapes, sizes, weights, and fragility remains a
challenge at the forefront of AI. The main issue here is adapting to changing or chaotic
environments as they continually shift. The signature accomplishment of robotics research
in this regard is the autonomous vehicle, which navigates roads and negotiates spaces in
concert with human-controlled vehicles, bicycles, and pedestrians, despite all the attendant
novelty and unpredictability.
AI technology opens whole new vistas of economic opportunity by enabling robots
to work where people can't. Robots are of great value for all sorts of tasks that are too
dangerous or costly for people to do. These might be mining or farming the sea bottom,
eliminating agricultural pests by targeting them with insect-specific mechanical predators,
or cleaning up industrial accidents.
One obvious such area is the exploration of space. In 1993, NASA sent the space shuttle
with seven people on board on a mission to repair the Hubble Space Telescope; the objective
was to perform an exceptionally precise operation to correct the space telescope's optics.
This was the first of five manned Hubble maintenance missions. In 2004, serious
consideration was given to using a two-armed Canadian robot called Dextre instead of
astronauts for the final mission, but it was judged too risky given the state of the art at the
time. Nonetheless, robotic devices are likely to be much more practical for the sorts of tasks
we are likely to want to perform off the earth in the near future, such as analyzing geological
samples, searching for biological life, mining asteroids, and diverting astronomical bodies
whose paths threaten earth. The NASA Mars rovers, Opportunity and Curiosity, are prime
examples of this approach, though the degree to which they rely on AI technology is less
clear.
Closer to home, the most recent DARPA robotics challenge was motivated by the
difficulty of getting human personnel into the Fukushima nuclear plants following their
meltdown. Teams competed to have their robots perform a variety of common tasks such
as driving a utility vehicle, opening a door, locating and closing a valve, and connecting a
fire hose to a standpipe.
Home robots are a very long way from providing the sort of general assistance that
a human caregiver typically provides.
There are robots for entertainment. These usually take anthropomorphic forms like
the preprogrammed animatronic figures common in theme parks such as Disneyland, but are
considerably more flexible and interactive. Pepper, from Aldebaran Robotics and SoftBank
Mobile, tries to read your intentions and respond appropriately. It is currently used to greet
visitors to SoftBank stores in Japan, where it can answer limited questions about products
and services, but its primary value is to engage and delight customers. There have also been
many generations of interactive toy robots.
One of the most exciting recent developments in the field is known as "swarm robotics."
Large collections of relatively simple uniform robots are programmed with rules, and when
these are applied in aggregate to the entire group, the robots exhibit complex behavior, called
"emergent behavior." This same effect is observed in anthills and beehives, whose members
as communities solve problems that are far beyond the comprehension or capabilities of any
individual. While swarm robots could be any size, much research is focused on small (insect
sized) or microscopic ("nanorobotic") scales. Collections of these devices can work together
to perform some task, for instance, locating people trapped in collapsed buildings or
detecting toxic spills. They typically coordinate by forming ad hoc networks or
communicating peer to peer with nearby units.
It's difficult to exaggerate the potential benefits and dangers of this technology. On
the positive side, it could facilitate tremendous medical advances, such as performing
noninvasive surgical procedures from inside the body. Imagine a syringe full of robots the
size of T-cells that mimic the function of the immune system, able to seek out and attack
blood-borne cancers. Or a shoebox full of robots the size of cockroaches that scurry around
collecting dust from floors and walls, stuffing their bounty into a small bag for easy disposal.
Imagine releasing thousands of mole-sized robots to explore for minerals underground,
followed by tiny robotic miners.
But there are also significant dangers. The same technology that might cure blood-
borne cancers can be used to kill you, or perhaps even to control you. Anyone who has tried
to eliminate ants from the kitchen knows how difficult it can be to prevent an invasion by a
tiny, organized army. The potential military or terrorist applications of swarm robotics are
truly too horrific to contemplate.
Military applications are too numerous, and perhaps dangerous, to mention. While
the popular imagination conjures up visions of Terminator-style robotic soldiers running
around a theater of battle bearing arms, the truth is very different. Military robots will not
be designed to use weapons, they are the weapons. Examples include guns that can identify
targets and shoot autonomously, flying drones that can deliver explosive charges to precise
locations, and land mines that explode only when specific enemy vehicles are within range.
The possibilities are so disturbing that significant efforts are under way by the United
Nations and the military establishment to study the ethics and efficacy of using such precise
munitions to support or replace personnel in war zones. The current consensus is that as a
matter of caution, a human should be "in the loop" for all targeting decisions before pulling
the trigger, but it's not entirely clear that this is practical, or ethically defensible, since
requiring such review may put lives at risk.
Unlike more clear-cut applications of AI, robotics shades from simple devices that perform
rote actions (as are common in factories) to complex systems that sense their environment,
reason, take action, and adjust their plans in response to new observations, so the boundaries
of the field are far from clear. But it's helpful to bear in mind that actual progress lags
behind public perception considerably. It’s easy to shoot a video of an engaging robot with
big eyes and an expressive face interacting in socially appropriate ways with a trained
demonstrator, but for the most part these systems are far more fragile than people expect, at
least so far.
What is computer vision?
Computer vision is primarily focused on equipping computers with the ability to
“see”, in the sense of interpreting visual images. Work in the field of computer vision has
paralleled the transition from symbolic systems to machine learning. Early efforts focused
on crafting algorithms that used specialized knowledge of visual images and descriptions of
objects of interest to look for semantically meaningful elements like lines, regions, and so
on, which were often then aggregated into larger and more general entities. For instance, a
program designed to identify a chair might look for legs, a seat, a back, and the like. But
the more modern approach is to use machine learning, often specialized types of neural nets
(called convolutional neural nets, or CNNs), to build models of objects from large
collections of examples. Very loosely speaking, CNNs look for patterns in small,
overlapping sections of an image, then can spread what they “learn” first to neighboring
sections and then to progressively larger regions of the image.
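A minimal sketch of the core operation loosely described above: sliding a small filter over overlapping patches of an image and recording the response at each position. The image and kernel values below are invented; real CNNs learn their filters from examples.

    import numpy as np

    # a tiny 4x4 "image": dark columns on the left, light on the right
    image = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [0, 0, 1, 1]], dtype=float)
    kernel = np.array([[1, -1],
                       [1, -1]], dtype=float)   # responds to vertical edges

    h, w = kernel.shape
    out = np.array([[np.sum(image[i:i+h, j:j+w] * kernel)
                     for j in range(image.shape[1] - w + 1)]
                    for i in range(image.shape[0] - h + 1)])
    print(out)   # largest-magnitude response where the edge sits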
Using these techniques, recent progress in the field has been quite rapid. For
instance, accuracy on the annual ImageNet Large Scale Visual Recognition Challenge,
whose goal is to detect two hundred types of objects and localize them ("point them out") in
150,000 photographs containing a thousand object categories, has increased dramatically.
Error rates are in the range of 5 percent, down from several times that only a few years
ago. The contest is now expanding to the identification of objects in videos and to more
narrative descriptions of scenes, such as "The boy kicked the ball but missed the goal."
But the promise of this field extends beyond just visual imagery. A different way to
think about computer vision, or visual processing in general, is that it takes as input flat, two
dimensional images representing light reflected off of three dimensional surfaces, then
interprets or reconstructs a model of the original scene. It may reconstruct a scene based on,
for instance, multiple images from different viewpoints (stereo vision), knowledge of
geometry and physics of light, reflectivity of various surfaces, and an understanding of the
characteristics of real-world objects (people usually ride horses, not the other way around).
The real, three-dimensional world obeys certain rules of composition, and these rules
constrain the simplified two-dimensional view projected to the human eye or a digital
camera. (These are the rules that optical illusions violate.) However, the same techniques
have much broader application. While our eyes and most cameras sample reflected light,
there are all sorts of sensors that collect data about the real world beyond what humans can
see. Special devices, for instance, can measure infrared (heat), and reflected signals (e.g.,
radar and vibrations). The same basic rules and techniques that are used to process light,
suitably adapted, can be applied to interpreting and reconstructing scenes based on these
invisible signals.
Examples are the location and shape of oil formations underground, brain tumors,
and imperfections in concrete dams under stress. As long as we have sufficient knowledge
regarding the material characteristics of the domain we are examining, and have some
method to collect signals that project these domains into images in ways that we understand,
we can use computer vision techniques, broadly construed, to process them. In principle,
neither the scenes nor the images need be physical. As long as the domains obey certain
rules, and the images represent a lower dimensional array of data points known to correspond
to elements of the domain, the data can be processed to provide insight into the structure of
the domain.
In other words, computers can "see" things that we can't. This isn't as mystical as
it sounds; the same is true of lots of animals. For instance, bats see using reflected sounds,
and most birds are capable of seeing colors that humans can't, a skill they use to select mates,
signal hunger, and foil nest parasites.
What are the main applications of computer vision technology? A myriad of real
world problems depends on identifying and locating objects of interest in a given setting.
Seemingly simple tasks, such as swinging a hammer at nails, stacking dishes, painting
houses, mowing lawns, and picking ripe fruit depend on knowing where things are. The
technology to act on this information-basic mechanical engineering and robotics-has been
available for some time, but has been limited to environments where objects of interest were
in predefined, fixed positions, such as on factory floors. But the recent advances in computer

vision make it possible to perform physical tasks such as these in less structured, real-world
environments. Over the next few decades, we are likely to witness a dramatic expansion of
the classes of tasks-and therefore jobs-that can be performed by machines.
A second major area of application is to information itself. The data we are
collecting, storing, and sharing is becoming increasingly visual in nature. The development
of the digital camera, particularly when integrated into ubiquitous communications devices
such as smartphones, has lowered the cost of taking and sharing photos to near zero, so
instead of tapping out "I'm visiting the Golden Gate Bridge with my parents" on a tiny
keyboard, many people simply click and send a picture. As a result, the proportion of visual
information flowing through the Internet has ballooned. Video alone is projected to comprise
84 percent of all Internet traffic by 2018, according to a recent industry study.
The problem is that unlike textual data, which we can interpret electronically for
purposes of cataloging and retrieval, we have no way to manage pictures and videos unless
they come labeled at the source or are categorized by a human. As the bulk of electronic data
shifts from textual to visual forms, we are in danger of "going dark" on the information
flowing through our expanding digital networks.
But computer vision techniques offer the promise of managing all this automatically.
Face recognition programs are already used for purposes as diverse as national security and
flagging your friends in Facebook pictures. But soon the ability to interpret and label images
will expand to include nearly every recognizable object, event, product, person, or scene that
you may care to inquire about. Computer vision technology may arrive just in time to keep
us from drowning in an ocean of our own information.
What is speech recognition?
In contrast to humans, who presumably spoke before they wrote, computer use of
language has been the other way around. Speech recognition is considerably more difficult
than processing written language, in large part because of the variability and noise inherent
in audio streams of spoken language. Separating the “signal” from the “noise”, and
transcribing it into the proper written words, is a daunting task for humans as well as
computers, as any consumer of closed-captioning on TV can attest. But separating the
vocalizations from background sounds is only the start of the problem. As early researchers
in this field quickly discovered, there’s no obvious break between words, contrary to what
you may think when you listen to someone talk. Considerable meaning is also conveyed by
how you vary your volume and tone (called “prosody” by linguists). In English, you can
change the meaning of an utterance completely by raising your pitch at the end of a sentence-
consider the difference between the way you say, “This is true” and “This is true?” Then
there’s the problem of distinguishing homonyms-different words or phrases that sound the
same, such as “died” and “dyed”. Who the speaker is, the domain of discourse, the previous
context (if any), different voices, cadences, speed, and inflections further complicate this
task.

The problem of recognizing speech differs fundamentally from interpreting a picture
in that the former presents a single variable (sound waves) that changes dynamically over
time, while the latter is a snapshot (so to speak) of reflected light in two dimensions at a
single point in time. The information contained in the data is also fundamentally different.
Speech is a man-made artifact intended to communicate a thought or idea as expressed in a
specific sequence of words that are encoded as human-generated sounds. Sometimes this is
enhanced with additional information-expressed through tonality, pacing, accent,
vocabulary, and so on-that may signal the emotional state of speakers, their status relative
to listeners, or their “tribal affiliation”.
By contrast, pictures are naturally occurring patterns that obey the law of physics.
Thus different tools and techniques can be appropriately applied.
With all these challenges, it’s a miracle that the problem can be solved at all. Most
early speech recognition efforts attempted to simplify the task by limiting the vocabulary,
operating in a simplified domain (like playing chess), requiring the speaker to pause between
words, and either designing for a specific speaker or requiring extensive training sessions
(for both the human speaker and the machine).
The application of modern machine learning techniques-once again driven by the
ability to capture and analyze large collections of speech samples – increased the precision
and utility of these systems.
The combination of more powerful computers, access to large amounts of training
data, and machine learning techniques conspired to crack the problem and to deliver systems
of practical and commercial importance. While the current state of the art in computer speech
recognition is decidedly less capable than human speakers, the utility of this technology for
limited domains is quite impressive, for example, in Google Voice and Apple's Siri, each
available on the respective company's smartphones.
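As a rough illustration of what such systems expose to programmers, the following sketch transcribes an audio file with the open-source Python package SpeechRecognition, which wraps cloud recognizers such as Google's; the package choice is an assumption rather than something named in the text, and the file name is a placeholder:

# Transcribing a recorded audio file (pip install SpeechRecognition).
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:
    # Estimate and subtract background noise -- the "signal vs. noise"
    # problem described above.
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.record(source)

try:
    # Send the audio to a cloud recognizer and get back the best-guess text.
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Speech was unintelligible")

Note that all the hard parts-separating speech from noise, segmenting words, resolving homonyms-happen inside the recognizer, which is why a few lines of code suffice at the application level.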
What is natural language processing?
A primary distinguishing factor between humans and other animals is our ability to
use language. We use our words not only to communicate but also to help us think,
remember, assign things to categories, and label individuals. Language serves not only to
describe but also to educate, create, imagine, indicate intentions, make commitments, and
identify people of similar heritage, among many other things. Like us, languages evolve and
tailor themselves to our needs, almost as though they were living creatures in their own right.
Language arose as a natural extension of gestures as a means of communication-ones
performed with the tongue and mouth instead of the hands and arms. And indeed, gesturing
and talking frequently co-occur in common use. (Some people have considerable trouble
articulating their thoughts while sitting on their hands.) The appeal of this innovation as an
aid to hunting and gathering is obvious: you free up your limbs to use for other purposes,
and you can communicate without being in the line of sight. Better language means more

food, so the motivation to bone up, so to speak, must have been strong. Not to mention the
selective advantages language confers in promoting romance, trading, training, and
codifying social conventions (rules and laws), which are reasons enough for it to catch on
like wildfire, regardless of its origins.
But none of this has anything to do with machines or computers. While we talk about
computer languages, the use of the term for these formal constructions is little more than an
analogy, similar to the terms machine learning or information superhighway. Computer
languages are designed for one purpose: to make it easier to program computers in a precise
and unambiguous way. Programs that process computer languages, called compilers, are
really formal methods for converting a more abstract but nonetheless rigorous specification
of some computational process into a form that can be executed on a particular computing
device. We don’t use Java (a popular computer language) to write poetry.
There's a strong relationship between computer and human languages-or so people
have believed until fairly recently. Descriptive linguists have long sought to codify the
structure of language, dating back at least to the Indian grammarian Panini, who codified the
syntax of the Sanskrit language into 3,996 rules in the fourth century BC. And indeed, today
we continue to teach grammar in, well, grammar school. But as every student quickly
discovers, the rules don't always hold-you have to remember lots of exceptions. What these
didactic failures indicate is that our attempts to reduce language to rules are at best an
oversimplification, and at worst just plain faulty.
But given the nearly universally accepted view that language obeys syntactic rules,
it’s no surprise that early researchers in computational linguistics approached the problem
of processing natural (human) language on a computer by codifying it into a more
sophisticated form of the same basic word categories and sentence structures you learned in
school: nouns, verb phrases, subordinate clauses, and the like.
The processing of natural language by computer limped along for many decades until
someone tried a completely different approach: machine learning, and more particularly
statistical machine learning methods. While earlier approaches required the hand-crafting
of rules, the new approach mainly required access to large bodies of text, and such “corpora”,
as collections of text are called, became larger and easier to gather as more and more written
language was available in computer-readable form.
Work in the field focused on some problems of considerable practical significance,
such as translating text from one language to another, generating summaries of documents,
or answering questions, usually from a database of facts about some area of interest.
To focus on translation as an example, the big advantage is that you can start with
pairs of correctly translated text to learn from, with limited need for other forms of
knowledge or information about the subject matter. By automatically finding correlations
between the source and target examples, statistical machine translation programs (as they

are called) can learn not only the underlying structure of the input samples but how these
correlate with the correct translation in the output samples. These techniques don't
definitively say that one phrase translates into another, but they provide a probability that
various potential translations are correct.
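A toy sketch in Python conveys the flavor of this approach, in the spirit of the classic IBM Model 1 algorithm (an assumption-the text names no particular method). From a handful of invented sentence pairs, expectation-maximization estimates word-translation probabilities purely from co-occurrence, with no dictionary or grammar:

# Toy statistical translation: estimate t(f|e) from aligned sentence pairs.
from collections import defaultdict

corpus = [("the house".split(), "la maison".split()),
          ("the book".split(), "le livre".split()),
          ("a house".split(), "une maison".split())]

t = defaultdict(lambda: 0.25)  # start with uniform translation probabilities

for _ in range(10):  # a few EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for eng, fra in corpus:
        for f in fra:
            norm = sum(t[(f, e)] for e in eng)
            for e in eng:
                frac = t[(f, e)] / norm  # expected alignment of f with e
                count[(f, e)] += frac
                total[e] += frac
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]  # re-estimate from expected counts

print(round(t[("maison", "house")], 2))  # converges toward 1.0

Notice that the program never definitively says "maison" means "house"; it merely assigns that pairing a high probability, exactly as described above.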
It's hard to believe that a computer program, with no real-world experience and no knowledge
of what the text is about, could do a reasonable job of translating one language into another,
much less beat out a program crafted by a human who is an expert speaker of both languages.
But given enough examples, that's exactly what these systems can do. One of the remarkable
achievements of modern AI could be couched as a discovery in search of an explanation:
how simply finding correlations between enough examples can yield insights and solve
problems at a superhuman level, with no deeper understanding or causal knowledge about a
domain. It raises the possibility that our human efforts at explanation are little more than
convenient fictions, grand yet often imperfect summaries of myriad correlations and facts
beyond the capacity of the human mind to comprehend. Yet the success of machine
translation, along with numerous other problem domains currently under study, suggests that
the way we organize our thoughts may be only one of many possible ways to understand our
world-and indeed may not be the best way. In general, what machine translation programs
actually learn and how they perform their task is currently as incomprehensible and
impenetrable as the inner workings of the human brain.

CHAPTER 5
PHILOSOPHY OF ARTIFICIAL INTELLIGENCE

What is the philosophy of AI?


Artificial Intelligence does pose real challenges to philosophical and religious
doctrine about human uniqueness and our place in the universe. Intelligent machines offer
the potential to shine an objective light on fundamental questions about the nature of our
minds, the existence of free will, and whether nonbiological agents can be said to be alive.
The philosophy of AI asks the question of whether computers, machines in general,
or for that matter anything that is not of natural origin can be said to have a mind, and / or
to think. The answer, simply put, depends on what you mean by "mind" and "think." The
debate has raged on in various forms-unabated and unresolved-for decades, with no end in
sight.
What is “strong” versus “weak” AI?
Strong AI posits that machines do or ultimately will have minds, while weak AI
asserts that they merely simulate, rather than duplicate, real intelligence. Stated another
way, the distinction is between whether machines can be truly intelligent or simply able to
act "as if" they are intelligent.
Can a computer “think”?
We might consider thinking to be the ability to manipulate symbols to reason from
initial assumptions to conclusions. From this perspective, it should be noncontroversial that
computer programs, as we currently interpret them, are capable of such manipulations and
therefore are capable of thinking. But surely just stirring up a brew of symbols isn't sufficient-
it has to mean something or do something. Otherwise, there's not much justification for
distinguishing one computer program from another, and any program that we interpret as
processing symbols-no matter how trivial-would qualify as thinking, which doesn't seem
right. But how does a computer program mean or do something?
Consider the use of symbols for reasoning and communication. A distinction is commonly
made between syntax, the rules for arranging and manipulating symbols, and semantics, the
meaning of the symbols and rules. While syntax is pretty easy to understand, semantics is
not-even the experts don't agree on what "meaning" means. Most theories propose that
meaning requires some way of relating the symbols themselves to the things they denote in
the real world.
An example may help. To visualize this, consider the following symbols-!, @, #, and $-and
some rules for combining them:
!+!=@
!+@=#
@+!=#
!+#=$
#+!=$
@+@=$
Now substitute different symbols, leaving everything else the same:
1+1=2
1+2=3
2+1=3
1+3=4
3+1=4
2+2=4
We all know what 1, 2, 3 and 4 mean, except for the minor inconvenience that they
don't actually mean anything more or less than !, @, #, and $ do. They derive their meaning
from how we connect them to other concepts or real-world objects. If we connect $ with any
collection of four things, an expanded set of the above rules is exceedingly useful for solving
certain problems of great practical significance.
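A few lines of Python make the point concrete: the rule table below is pure syntax, and renaming every symbol leaves the "facts" intact (the code and symbol choices are purely illustrative):

# The same rule table under two different sets of symbols.
rules = {("!", "!"): "@", ("!", "@"): "#", ("@", "!"): "#",
         ("!", "#"): "$", ("#", "!"): "$", ("@", "@"): "$"}

# Substitute different symbols, leaving everything else the same.
rename = {"!": "1", "@": "2", "#": "3", "$": "4"}
familiar = {(rename[a], rename[b]): rename[c] for (a, b), c in rules.items()}

print(rules[("@", "@")])     # prints $
print(familiar[("2", "2")])  # prints 4 -- the "same" fact, different symbols

Nothing in the program knows anything about quantities; the connection to counting comes entirely from us.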
And to do something requires some connection between the actor manipulating the
symbol system and something external to that actor. In the case of computer programs, this
could (for instance) be figuring out how much you owe the phone company this month, the
movement of a chess piece (physically or virtually), or a robot picking up a pencil. Only in
this context can you say that the symbol manipulations have meaning.
Now, common arithmetic is one thing, but a vastly expanded concept of symbols and
rules is a reasonable description of just about any computer program at some level, even if
it's possible to make other interpretations of those same programs. It's an incredible eye-
opener for most computer science majors when they first discover that all the math they ever
learned in high school is simply a special case of some surprisingly easy-to-understand, more
general rules.
Computers, by themselves, can't "think" in this sense at all, since they don't actually
mean or do anything-at best, they manipulate symbols. We're the ones associating their
computations with the external world. Electrons may be floating around in circuits, but we
are the ones interpreting this activity as symbol manipulation.
While the symbols themselves may be devoid of any semantics, perhaps the meaning
arises out of their relationships to other symbols, just as the definition of a word in a
dictionary is expressed in terms of other words. Machine learning algorithms suffer from
the same conceptual (though not practical) shortcoming-they reflect the complexity of the
real world, but without some connection to that world, it's just so much unmoored structure.
What's the difference between ideas swirling around in your brain and bytes zipping
around in a computer? In both cases, information is going in, represented in some form that
can plausibly be called symbolic (discrete nerve signals from your eyes, for example),
getting processed, and coming back out (nerve signals to your hand to press keys on your
keyboard, resulting in a spreadsheet of total monthly sales).

These must, in fact, be different things, but we just don't understand yet what the
brain is doing. The program is simulating thinking, not duplicating the process that occurs
in human minds when they engage in these activities.
Computer programs, taken by themselves, don't really square with our commonsense
intuition about what it means to think. They are "simply" carrying out logical, deterministic
sequences of actions, no matter how complex, changing their internal configurations from
one state to another. But here's where we get into trouble: if you believe that our brains are
little more than symbol manipulators composed of biological material, then you are naturally
forced to conclude that your brain, by itself, can't think either. Disconnect it from the outside
world, and it would be doing just what a computer does. But that doesn't square with our
commonsense intuition that even if we sit in a dark, quiet room, deprived of all input and
output, we can still sit there and think. We can't have it both ways: if symbol manipulation
is the basis of intelligence, either both people and machines can think (in principle, if not in
practice today), or neither can.
But if you prefer to maintain the comforting conceit that we are special-different
from machines in some fundamental way yet to be determined, or imbued with
some mystical qualities quite distinct from the rest of the natural world-then you can cling
to the notion that thinking is uniquely human, and machines are simply pretenders to our
cognitive abilities. It's your choice. But before you make up your mind, bear in mind that
there's an accumulating body of evidence chipping away at our seemingly obvious intuitions
about our most quintessentially human abilities-for example, that we actually have free will.
Can a computer have free will?
Virtually everyone believes that humans, and possibly some animals, have free will,
but can a machine or a computer also have free will? There is a long (intellectual and
religious history of debate about the nature and existence of free will. We have the ability
to make considered choices, possibly swayed but not determined by forces outside of
ourselves. So the first thing to observe is that we make a distinction between inside and
outside: to understand free will, we have to wrap a box around what is "us" to separate it
from what is "not us." But that alone is not enough. Inside the box, we must be free to
consider our options without undue influence so we can make a thoughtful choice, without
having a particular conclusion preordained or forced upon us. An important consequence of
this concept is that our decisions must not, in principle, be predictable. If they were, we
wouldn't really be making a free choice.
One might assume that computers cannot have free will because they are different from us in
two key respects. First, they work according to well-understood engineering principles and
so can always be predicted. Second, they can't really be said to consider choices in the same
sense that people do. The problem is, both of these assertions are questionable at best.

The physical world operates in accordance with certain laws of nature, whether or
not we know or can know what those laws are. This is not to say that everything is
predetermined-indeed, randomness may in fact be a fundamental part of nature. But
randomness is just that-random-not a free pass for things to happen in accordance with some
grander plan or principle that is somehow outside of the laws of nature. Otherwise those
plans would simply be part of the laws. In other words, there is no such thing as magic. Let
us assume, then, that your mind arises from your brain, and that your brain is a physical
object subject to the laws of nature.
Another way to say this is that given a particular state of mind, there will be an
equally distinct state of the brain-two different, incompatible thoughts or beliefs can't arise
from a single physical arrangement of matter and energy in your brain. I'm not aware of any
objective evidence to the contrary, but that doesn't mean for certain that these assumptions
are correct-indeed, much of the historical debate over free will focuses on precisely these
assumptions, so to some degree I've baked in my conclusions by taking these positions.
Now suppose we build a robot designed to defy prediction. No matter how we program that
robot to make decisions, no matter how predictable that robot is, as long as it has access to
an outside forecast of its own actions, that forecast can't always be correct. The robot can
simply wait for that forecast, then do the opposite. So a sufficiently capable robot can't
always be predicted, where "sufficiently capable" means it has access to the attempt to
predict what it will do.
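A minimal sketch of this argument, with invented names, shows why the forecast fails by construction:

# An agent that can read a forecast of its own behavior can always falsify it.
def contrary_robot(forecast: bool) -> bool:
    """Wait for the published forecast, then do the opposite."""
    return not forecast

for forecast in (True, False):
    action = contrary_robot(forecast)
    print(f"forecast={forecast}, action={action}, "
          f"forecast correct? {forecast == action}")
# Both lines report the forecast as incorrect: no predictor can be
# right every time about an agent that sees its own forecast.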
You might think that a deterministic machine, whose behavior is completely specified and
understood, can always be predicted. Any given state of a computer program may transition to its next
state in an entirely predictable way, but surprisingly, we can't simply string knowledge of
these states together to get a complete picture of what the program will ultimately do. And
the same, of course, is true for you: in particular, you can never accurately predict your own
behavior. It's possible that this is why we have the strong intuition that we have free will,
but this is simply an interesting hypothesis, not a proven fact. Other possibilities are that our
subjective sense of free will has arisen to serve some yet to be identified evolutionary
purpose(s), like desiring sweets or being attracted to the opposite sex.
Now look at the question of how computers make decisions-using, for instance, metaphor and
analogy to solve problems. Perhaps our sense of choice is nothing more than an illusion. Presumably
your brain, as a physical object, plays by the same rules as the rest of the physical world,
and so may be subject to inspection and analysis. And if your mind arises from your brain,
at some level it too must operate according to some laws of nature, whether we understand
those laws yet or not. Introducing randomness into the picture doesn't get around this
problem, and neither does the peculiar fact that lots of deterministic processes are
nonetheless not subject to prediction even in principle. Finally, there's no reason other than
wishful thinking to suggest that machines are in this regard any different from us. This is

not to say that people and machines are equivalent in all respects-they clearly aren't. But
when it comes to making choices, so far, at least, there aren't good reasons to believe they
operate according to different natural or scientific principles.
Either both people and computers can have free will, or neither can-at least until we
discover some evidence to the contrary.
Can a computer be conscious?
It's tempting to assert that people are conscious but machines cannot be. But it's equally
unjustified to assert that machines can be conscious. At the present time there's no credible way
to establish whether computers and animals-or other people, for that matter-experience
consciousness the same way we feel that we do. Perhaps the notion of consciousness, or
subjective experience more generally, simply doesn't apply to machines.
Can a computer "feel"?
The answers to our questions hinge largely on whether you regard words like
intelligence, thinking, and feeling as connoting something sacrosanct about humans (or at
least biological creatures), or whether you are comfortable expanding their applicability to
certain artifacts.
The challenge posed by AI is how to describe, and therefore how to understand and
reason about, a phenomenon never before encountered in human experience-computational
devices capable of perception, reasoning, and complex actions. But the words that seem to
most closely fit these new developments are colored with implications about humanity's
uniqueness.
It’s relatively simple to build a robot that flinches, cries out, and / or simply says,
"Ouch, that hurts" when you pinch it. Since computers are so different from us (at least
today) and are designed by us for specific purposes (as opposed to naturally occurring), it
seems logical to say they don't, and most likely never will, have real feelings.
On the one hand, our intuitions lead us to believe that machines, no matter how sophisticated,
raise no ethical concerns in their own right. On the other, we can't comfortably exclude certain
entities from the community of living things based solely on what materials they are
composed of.

CHAPTER 6
THE LAW FOR ARTIFICIAL INTELLIGENCE

How will AI affect the law?


AI will significantly impact a wide variety of human activities and have a dramatic
influence on many fields, professions, and markets. Any attempt to catalog these would
necessarily be incomplete and go quickly out of date. Here we consider the potential effects
of AI on the nature, practice, and application of the law: how AI will change the practice of
law as well as the way laws will be formulated and administered, and why the emergence of
AI systems will require modification and extension of current legal concepts and principles.
How will AI change the practice of law?
To understand how AI is likely to impact the practice of law, it's helpful to
understand how it is currently practiced, at least in the United States. The American Bar
Association (ABA), an influential trade organization, was formed in 1878 by seventy-five
prominent lawyers from around the country, and today has over four hundred thousand
members. As of 2014, there were nearly 1.3 million lawyers licensed to practice in the
United States, 75 percent of whom were in private practice. While the ABA engages in
many laudable efforts to ensure that the practice of law meets high ethical and professional
standards, its primary mission is to promote the interests of lawyers ("Goal 1: Serve Our
Members"). Like virtually all professional guilds, the ABA, along with a patchwork of state
and local counterparts, influences if not controls who can practice law, how they can
promote their services, and how much they can charge. It serves as the gatekeeper to the
profession by accrediting law schools, from which most states require aspiring lawyers to
obtain a law degree before they take their bar exams and therefore become licensed to
practice law. To maintain this control, the ABA proposes model rules regarding the
unauthorized practice of law, which is considered a criminal-as opposed to civil-offense in
most jurisdictions.
In essence, society has struck a bargain with the legal profession: it is permitted to
operate a monopoly, controlling access and maintaining price integrity, in return for making
legal assistance available to those unable to afford a lawyer "pro bono" (free), mainly via a
network of public and private legal aid services. The problem is, the profession has largely
failed to keep up its end of the bargain. As of 2009, a study found that one legal aid attorney
was available for every 6,415 low-income people, while there was one attorney in private
practice for every 429 people above the poverty line. Other studies show that
30 percent of low-income Americans have little or no access to legal assistance, and even
nonindigent citizens cannot afford to pursue appropriate legal redress a significant

percentage of the time. It’s just plain expensive to hire a lawyer, and often difficult to
manage him or her when you do.
Technology to serve the legal profession has advanced tremendously over the past
few decades, if not centuries. The ability to collect and widely disseminate legal statutes and
judicial decisions that serve as precedents is a relatively recent occurrence. In an earlier era,
a lawyer's practice was largely limited by the number of books he could carry, and court
arguments were often little more than reciting aphorisms like "What's good for the goose is
good for the gander." Today, not only do attorneys have near-instant access to virtually all
case law, but a wide variety of information systems also support their work in drafting
contracts, briefs, and all manner of legal documents.
Because most lawyers bill by the hour, they are disinclined to adopt technology that speeds
their work unless they are paid on contingency or through fixed fees. In other words, the main impediment to making
legal services more broadly available and affordable is the economic structure of the legal
profession. Because of this, many lawyers are understandably resistant to any technology,
no matter how effective and efficient, that can help people to help themselves. But creating
that technology is exactly where AI is heading.
Many of the tasks lawyers perform are routine enough that an AI system can do them as well as or better than the average lawyer. At the very least,
such automated systems can handle the bulk of the work, reserving only the exceptions and
complex cases for human review. The impact of automation-whether AI-based or not-is at
least one underlying reason that enrollment in law schools, and the starting salaries of
lawyers, has been dropping, precipitating a crisis in the profession.
The most obvious way to assist consumers with legal matters is to provide them with
sample "fill in the blanks" forms. As a general matter, these are considered legal, though
even that has been challenged by at least one bar association. It was a short hop from
providing such forms on paper to providing them online over the Internet. But from there,
the trouble starts. If you are going to provide the forms, why not help the customer fill them
out? And since lots of "blanks" are contingent, based on the contents of other "blanks," why
not have the software skip the inappropriate ones? But even this obvious step toward
efficiency, using so-called decision trees, has been ferociously resisted by the legal
profession. While it's generally acceptable for software programs to provide forms, it is not
acceptable for them to do "document preparation." Other valuable online legal services hide
under the fig leaf that they are "referral services," which are permitted, though heavily
regulated.
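For illustration only, a decision-tree interview of the kind described above amounts to nothing more than asking the next question based on the previous answer; the questions here are invented:

# A minimal decision-tree form interview (illustrative questions only).
def collect_answers():
    answers = {"state": input("State of residence? ")}
    answers["married"] = input("Are you married? (y/n) ") == "y"
    if answers["married"]:
        # Contingent blank: asked only when an earlier answer requires it.
        answers["spouse_name"] = input("Spouse's full name? ")
    else:
        answers["spouse_name"] = None  # inappropriate blank is skipped
    return answers

print(collect_answers())

That such trivially simple logic has drawn fierce opposition says more about the economics of the profession than about the technology.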
Many technologists focus their automation efforts on peripheral problems, such as
resolving disputes before they rise to the level of a legal action or get to trial. Courts and
litigants have long encouraged the use of quasi-judicial pretrial resolution forums to reduce
caseloads (known as alternative dispute resolution). If conflicts can be resolved privately,
all parties are better off. To date, this involves the use of professional negotiators, mediators,
and arbitrators essentially acting as private judges. However, new techniques are moving the

role of technology beyond simply facilitating communication between the parties to actively
participating in the resolution process. Such systems employ game theory, analysis of
successful outcomes, and negotiation strategies to resolve issues using methodology that
litigants perceive to be objective and unbiased, making them more amenable to settlement.
How is AI used to help lawyers?
The economics favor adoption of technology by lawyers. One such thriving area is
called "e-discovery." In the course of litigation, both plaintiffs and defendants are permitted
access to each other's relevant documents to look for evidence pertinent to the case. The
problem is, this document production may be voluminous. Until fairly recently, the review
of discovery documents was done by lawyers, or at least trained specialists such as
paralegals. Due to the ease of maintaining electronic documents (indeed, it's a challenge to
get rid of them), not to mention that so much of today's business is transacted in electronic
form, the volumes produced in response to discovery requests can be staggering. For
example, in one antitrust case, Microsoft produced over 25 million pages of documents, all
of which had to be reviewed not only for relevance but often to redact nonmaterial
confidential information which might be subject to a so-called protective order prohibiting
even the client from viewing the contents. How could this possibly be completed in a
practical time frame at a reasonable cost (meaning one that the lawyer's clients can
stomach)? AI to the rescue.
A technique called "predictive coding" can permit a computer to perform this mind
numbing task with speed, diligence, and accuracy far exceeding that of human reviewers.
First, human attorneys review a set of sample documents statistically selected to represent
the characteristics of the entire collection. Then a machine learning program goes to work
identifying criteria that will permit it to match the human performance as closely as possible.
The criteria may involve everything from simple phrase matching to very sophisticated
semantic analysis of the text, context, and participants. The newly trained program is then
run on a subset of the remaining items to produce a new set of documents, and these in turn
are reviewed by the attorneys. This process iterates until the program is capable of selecting
adequately relevant documents on its own. (The technique is similar to the way email spam
filters are tuned using feedback from users who mark messages as "junk.") E-discovery has
spawned an entire mini-industry of service providers. Indeed, Microsoft itself recently
acquired Equivio, one of the leading companies in the field.
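A toy sketch of the predictive-coding loop, using the open-source scikit-learn library (a choice of convenience-the text names no particular tool), trains a classifier on lawyer-labeled samples and then scores unreviewed documents:

# Predictive coding in miniature: learn "relevant vs. not" from a sample.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Lawyer-reviewed sample: 1 = relevant to the case, 0 = not relevant.
reviewed_docs = ["meeting notes on pricing agreement",
                 "cafeteria menu for March",
                 "email discussing market allocation",
                 "holiday party invitation"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviewed_docs)
classifier = LogisticRegression().fit(X, labels)

# Score the remaining collection (millions of documents in practice),
# iterating with further lawyer review until the results are acceptable.
unreviewed = ["memo discussing the pricing agreement"]
probabilities = classifier.predict_proba(vectorizer.transform(unreviewed))[:, 1]
print(probabilities)  # estimated probability each document is relevant

Real systems add far more sophisticated semantic features, but the train-score-review-retrain cycle is the same.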
Other efforts include ways of predicting the outcome of lawsuits. For instance, a
recent effort to apply machine learning techniques to predicting the outcome of U.S.
Supreme Court decisions was able to correctly guess the judges' decisions more than 70
percent of the time, using data only from cases prior to the case predicted. It does so by
analyzing the voting behavior of each individual justice from a database of sixty-eight
thousand such votes. Such information is critical for lawyers in preparing their cases and
advising their clients.

What is computational law?
Ideally, laws would be objective, easy to understand, and simple to apply to particular situations. To this
end, those who draft legislation and regulations attempt to be as precise as possible as they
craft the statutes. But the fact remains that natural language tends to be inexact. In many
cases, it would be more useful to express the intent in a formal language-akin to, if not in
fact, a computer language. Note that this need transcends the law; it applies to any
circumstances where rules, regulations, and just plain business processes are used.
The advantage of taking a more formal approach to expressing procedures,
requirements, and restrictions goes beyond just being clear and exact-it opens the door to
interpretation and application by automatic systems. Consider, for instance, tax regulations.
Luckily, lawyers are not mandated to prepare your taxes. A great deal of effort has gone into
computer programs capable of assisting you in filling out your tax forms and calculating
what you owe. Some jurisdictions, primarily in Europe, go further by providing you with
provisional returns prepopulated with information already reported by third parties. All you have to do
is review, approve, and make the required payment(s). Note that the Internal Revenue
Service already possesses this information for your taxes and uses it to verify your returns,
so in principle it would be a simple step to give you access to it. Currently, over 90 percent
of all U.S. individual tax returns are filed electronically.
The advantages of enacting tax laws and regulations expressed not only in prose but
in computable form are obvious. But there are many areas of laws, rules, and processes
besides straightforward calculations that could benefit from formal procedural
specifications. Representation in this form makes it possible for the laws themselves to be
formally studied-for example, for completeness and consistency-and to be reasoned about,
explained, and applied. This area of study in AI is called computational law. One potential
application, colloquially referred to as "the cop in the backseat," considers how traffic laws
might be automatically transmitted and displayed-or simply acted upon-in your car as they
are applicable. For instance, your future self-driving car should automatically obey the speed
limit-but where, exactly, will that information come from? If it's collected in the style of
electronic maps by third parties, it could easily go out of date. But if it's queried and
transmitted to your car as it drives, it can always be up to speed, so to speak.
Once such systems are available, not only will it be possible to make the laws much
easier to comply with, the law will become much more responsive and flexible. For instance,
a new driver might be issued a restricted driver's license, one that is limited to certain
thoroughfares and hours as determined by traffic enforcement authorities on a dynamic
basis. If traffic is light enough and conditions are clear and dry, perhaps your sixteen-year-
old child should be permitted to drive at night unaccompanied, except on certain holidays
like New Year's Eve.
The implications for drafting, enacting, communicating, and enforcing the law could
change dramatically. One such opportunity, currently being studied by the Office of

Financial Research of the U.S. Treasury Department, is known as "computable contracts."
The basic idea is to allow relatively straightforward agreements, such as loans and leases, to
be represented in a logical form amenable to formal analysis and dispute resolution.
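As a rough illustration of the idea, a simple loan can be represented as data plus rules that either party, or a court, can evaluate mechanically; all terms and numbers below are invented:

# A "computable contract" in miniature: a loan as data plus checkable rules.
from dataclasses import dataclass
from datetime import date

@dataclass
class Loan:
    principal: float
    annual_rate: float
    start: date
    due: date

    def payoff_amount(self, on: date) -> float:
        # Simple interest, pro-rated by day.
        years = (on - self.start).days / 365
        return self.principal * (1 + self.annual_rate * years)

    def in_default(self, on: date, paid: float) -> bool:
        # A dispute-resolution rule both parties evaluate identically.
        return on > self.due and paid < self.payoff_amount(self.due)

loan = Loan(10_000, 0.05, date(2024, 1, 1), date(2025, 1, 1))
print(round(loan.payoff_amount(loan.due), 2))     # about 10501.37 at maturity
print(loan.in_default(date(2025, 2, 1), 10_000))  # True: underpaid past due

Because the terms are formal, questions like "is this loan in default?" have mechanically checkable answers rather than arguable ones.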
Can a computer program enter into agreements and contracts?
They already do. When you purchase something online, no human makes the
decision to contract with you, yet the commitment is binding. The Uniform Electronic
Transactions Act (UETA), which has been adopted by all U.S. states except Washington,
Illinois, and New York, specifically validates contracts formed by electronic agents
authorized by their principals. Similarly, programs trade stocks, approve credit card
purchases, issue credits, and so on.
Currently they do so "on behalf" of a principal (corporation or person) who is bound
by their actions, but this is likely to change as increasingly autonomous intelligent agents
engage in activities that are further detached from those whom they represent. This can
change in two opposing ways. One, as they become more capable, we may wind up limiting
the class of transactions they will be legally permitted to engage in on behalf of natural
people. But in other circumstances, we may permit them to enter into contracts by and for
themselves, without requiring that a natural person be the legal entity bound by the
commitment.
An intelligent agent limited in what it is permitted to do
There are many situations where the law (or rule) implicitly assumes that you, or at
least a human agent representing you, are the only potential actor-usually to ensure that
everyone has an equal opportunity to access some scarce resource, or at least to extract the
same personal cost from all takers. The whole concept of standing in line is based on this
principle. But intelligent systems may violate this assumption. For instance, many
commercially available passenger vehicles are capable of parking themselves, and street
parking is often limited to a couple of hours precisely to ensure that this free resource is
distributed equitably and is used for temporary periods, such as while you are shopping or
eating out, as opposed to all-day parking for employees who work nearby. The time
limitation is intended to extract a cost-you have to return to your car and, if you desire more
time in the area, repark it. So is it fair to permit a self-driving car to repark itself every two
hours? This would seem to violate the intent, though not the letter, of the law. A less visible
though more annoying example is the use of so-called bots to purchase
scarce resources online, such as concert tickets. Responding to consumer complaints,
several jurisdictions have outlawed the practice, though to limited or no practical effect.
Imagine legislation requiring people to personally review and approve all voting decisions,
regardless of how the decision is made, prior to voting or arranging for their vote to be cast.
In other words, such a law would make it illegal for a computer to vote on your behalf without
your affirmative review-one of the first of many areas where the law may begin to regulate what
you can and cannot use an intelligent machine for, even if the underlying action is perfectly
legal for you to perform yourself.
This illustrates why the use of intelligent agents to act on your behalf may be reasonably
restricted in the future and, moreover, why these restrictions may ultimately be put in place
on an ad hoc, as-needed basis.
People's responsibility for their intelligent agents
Bearing the risks and costs of permitting your robotic personal assistant to engage in
simple transactions for your benefit-like making dinner reservations, renewing a
prescription, or booking travel-may be a reasonable tradeoff for the increased convenience,
but there are circumstances where you may be less happy about accepting full responsibility
for its actions. For example, what if your robot inadvertently pushes someone into the path
of an oncoming bus, breaks an expensive vase at Tiffany's, or pulls a fire alarm handle after
mistaking a table-side cherries jubilee flambé for a flash fire? Would you feel as responsible
for these actions as if you had done them personally? The question is, of course, if you're
not responsible, who is? You may suddenly become a proponent of establishing a legal
framework for assigning the blame to the autonomous agent itself. To consider this
possibility, it's helpful to note that we already hold some nonnatural entities accountable for
their actions: corporations. Indeed, they have considerable rights and responsibilities under
the law as entities unto themselves.
Corporations are legal entities that serve several purposes, most notably to generate
profits. But that's not all-they provide a mechanism for limiting liability, sharing costs and
benefits, and serving as a vehicle for groups of people to act in concert, not to mention
potentially serving the needs of customers or broader society in general. Corporations can
enter into contracts, own assets, and more recently are entitled to limited rights of free speech.
In addition to rights, corporations also have responsibilities, which may include registration,
licensing and reporting, paying taxes, and obeying all relevant laws and regulations.
For many purposes, corporations exist under the legal rubric of "persons," though
they are, of course, distinct from natural persons. Indeed, the word itself derives from the
Latin corpus, meaning "body."
Corporate law is a reasonable model for the possibility of extending rights and
responsibilities to intelligent machines. Indeed, there's nothing to stop you from creating
such a device and forming a corporation to own it. But why would you want to? Many
professionals, such as doctors and lawyers, form LLCs (limited liability companies) that
insulate their personal assets from their professional activities, in case of malpractice suits.
In some places, individual taxis are separate corporations for just this reason. Consider how
much stronger this motivation might be when you own a fleet of autonomous taxis. You may
feel a personal sense of responsibility, or at least control, if you or a family member is driving
and causes an accident. But if the car is out there on its own, cruising around and looking

for fares, you might be more concerned: it's ten p.m.-do you know where your taxi is? What
if it's picked up a fare wearing a ski mask and holding a gun, who instructs it to drive him
or her to the nearest bank and wait outside with the motor running? Does that make you an
accessory to robbery? Should it? Why risk losing your business because of a programming
mistake by some anonymous AI engineer?
In this case, your autonomous taxi is still entering into transactions on behalf of
another legally sanctioned entity-the corporation that owns it. But could it ever make sense
to permit such an artifact to actually have such rights and responsibilities by itself? The key
to permitting this is providing a legally sanctioned source of restitution. In most cases, that
means that some pool of assets must be available to compensate an aggrieved party.
Should an AI system be permitted to own property?
A significant function of incorporation is to shield the stockholders from liability. In
their stead, the corporation's own assets are at risk in the event of a legal claim. These assets
may take many forms-cash, inventory, real estate, loans, and so on. But unless we permit AI
systems to own property, the only evident asset available is the system itself. Though this
may be quite valuable-it may, for instance, include unique expertise or data or, in the case
of a robotic system, its physical embodiment (hardware) or ability to perform labor of some
sort-this may be cold comfort to someone who simply prefers cash compensation for a loss.
The obvious solution is to permit the system itself to own assets, just as a taxi wrapped in a
corporation may have some accumulation of receipts in a bank account in addition to the
vehicle itself and rights in the form of a "medallion" (basically a license to operate).
However, permitting AI systems capable of independent action to own assets is
potentially quite dangerous. In contrast to corporations, which are entirely dependent on
humans to take actions, these systems are, in principle, capable of taking actions by
themselves. They can potentially devise business strategies, make investments, develop new
products or processes, patent inventions and, most important, own property-notably
including other AI systems.
This is not to say that machines cannot be granted rights, including the right to own
assets, but such rights should be limited and go hand in hand with responsibilities, such as
passing competency tests and obtaining operating licenses. Corporations have rights (such
as limited free speech), but these go hand in hand with responsibilities (such as preserving
the environment). For instance, a computer program could be granted the right to draft
contracts if and only if it passes the bar exam. In this sense, it may be appropriate for
sufficiently capable AI systems, like corporations, to be limited "persons" under the law.
Can an AI system commit a crime?
Yes, it can. Society also designates certain behavior as crimes, that is, actions that
are prohibited either for moral reasons or because they cause harm to the social order or the
public interest. For example, it is a crime in California to eat dogs and cats but not chickens

or fish, though all of these are commonly held as pets. It is also a crime to operate a vehicle
off roads in a manner that may cause environmental damage. Obviously, an autonomous
vehicle could cause environmental damage, even if inadvertent, and that's a crime. (Note
that some actions can be both torts and crimes, such as shooting someone.)
Some crimes, such as murder (as opposed to manslaughter) are considered more
serious because they involve an ethical transgression. That is, the actor is expected to know
that what he or she is doing is morally wrong. The law presumes that the person committing
the crime has what's called "moral agency." Moral agency requires two things: that the actor
be capable of understanding the consequences of their behavior, and that they have a choice
of actions. Surprisingly, you don't have to be human to have moral agency.
Many people don't realize that corporations, as distinct from their managers,
employees, or stockholders, can be held responsible for committing crimes. For example,
the oil company Chevron has a long rap sheet of criminal convictions, mostly for willful
pollution, though its employees have rarely been charged individually in connection with
these actions. In at least some of these cases, the corporation itself is considered to have
moral agency because the institution is capable of understanding the consequences of its
behavior and has a choice of actions (whether or not to commit the crime), though this
concept is not without some controversy.
So can a computer program be a moral agent? It can, because it meets the definition.
There's no reason you can't write a program that knows what it is doing, knows it is illegal
(and presumably therefore unethical), and can make a choice as to what actions to take.
There's nothing that requires a moral agent to "feel" anything about right and wrong-the
requirement is simply that it knows the difference.
Without proper programming, machines are natural psychopaths, but they don't have
to behave that way. It's entirely possible to program a machine to respect an ethical theory
and apply it to a pattern of facts, so it follows that machines can know right from wrong and
make moral decisions. Indeed, this area of inquiry, called computational ethics, seeks to
create artificial moral agents. It's a special case of a broader problem we are going to face as
AI systems increasingly interact with people-how to ensure that they respect often implicit
human conventions of politeness, such as waiting your turn to get on the bus or taking only
one free newspaper. Creating computer programs that are properly socialized and respect
our sense of right and wrong is likely to be a significant technological challenge.
Computers and the law?
Legal transgressions are sometimes expected, or possibly even required. Obeying
rules isn't sufficient to ensure moral behavior. For instance, we wouldn't want a dog-walking
robot whose dog is mauling a child to stop in its tracks because of a "Keep off the grass"
sign. Nearer term, autonomous vehicles raise a host of troubling behavioral issues. For
example, would you want your self-driving car to patiently wait for stoplights when it's

rushing you to the hospital in a life-threatening emergency? Should it cross a double-yellow
center line to avoid hitting a dog running across the street? The behavioral rules we live by
aren't created in a vacuum-they are formulated on the assumption that people are capable of
recognizing when a more important goal justifies bending or breaking them.
While it's possible to design machines that can modify their own rules in response to
observations of circumstances, the question arises as to what principles these modifications
should follow. Deeper precepts are required to provide guidance, particularly when rules do
not apply or rules should be broken in the service of some higher ethical imperative. So it's
critical for us to develop explicit, implementable moral theories to guide the behavior of
intelligent machines.
How can an AI system be held accountable for criminal acts?
Anything that is capable of pursuing a goal can be punished. You simply have to
interfere with its ability to attain its goal. If it is capable of adapting in any way, it will, at
the very least, alter its behavior. By interfering in the right way, you can accomplish what
you are trying to achieve.
Legal theory offers four primary objectives for punishment: deterrence,
rehabilitation, restitution, and revenge. In the case of an AI system, deterrence is simple:
shut it off or otherwise prevent it from doing what you don't want it to do.
Rehabilitation is more interesting. Consider, for example, a machine learning system that has
been tuned up over a period of time, an effort that might be difficult or impossible to
re-create, perhaps because the training data was ephemeral. For instance, imagine a system
designed to deflect cyber attacks on critical
infrastructure, such as the electrical grid. It detects unusual patterns of activity against a
constantly changing background of legitimate activity. (This is a real application.) The
problem is, it starts blocking legitimate traffic from a newly inaugurated distributed
electrical grid management system designed to avoid sudden systemic blackouts. (This is a
hypothetical example.) How can you fix this? Basically, you have to retrain it. For example,
you might challenge it with fake transactions that mirror legitimate ones, and indicate that
these should not be blocked.
More generally, if you introduce a cost for undesirable actions into an AI system that
changes its calculation as to how it can best achieve its goals, it will alter its behavior
accordingly. An autonomous taxi whose objective is to maximize revenue might find that
speeding through yellow lights reduces travel time and increases tips, but if a fine is levied
for doing so, this "punishment" will change its reasoning and therefore its conduct.
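A worked example with invented numbers shows how the fine flips the calculation:

# Expected value of running a yellow light, before and after a fine exists.
extra_tip = 2.00   # expected gain from speeding through a yellow light
fine = 40.00       # fine levied when caught
p_caught = 0.10    # probability of being caught

ev_without_fine = extra_tip                 # 2.00: running the light pays
ev_with_fine = extra_tip - p_caught * fine  # 2.00 - 4.00 = -2.00
print(ev_without_fine, ev_with_fine)

Once the expected value turns negative, a revenue-maximizing agent stops running yellow lights-punishment has changed its conduct.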
Restitution is mainly a question of identifying a pool of assets exposed to potential
forfeiture. Whether the restitution is paid to an injured party as a result of a tort or constitutes
a fine levied by some appropriate governmental authority, it is still a legitimate way to hold
an AI system accountable for its behavior.

Revenge, however, is another matter. In principle it is in the eye of the beholder, but
commonly, the goal is to create a negative emotional state in the bad actor-such as remorse
or a longing for restored personal liberty (lost as a result of incarceration). None of this
makes sense when dealing with a nonbiological entity, even if it may be tempting to throw
your computer out the window when you feel it is misbehaving. But emotional satisfaction
need not be rational to be effective, as anyone who has kicked a broken vending machine
can attest.

CHAPTER 7
THE IMPACT OF ARTIFICIAL INTELLIGENCE
ON HUMAN LABOR
INTRODUCTION

Are robots going to take away our jobs?


Technological improvements have raised productivity and increased economic
output throughout human history, most notably during the industrial revolution. In plain
language, this means that fewer people are needed to perform the same amount of work. But
it's equally true that historically, the increased wealth resulting from these improvements has
created new jobs, though this effect is rarely immediate. More important, the new jobs are
seldom comparable to the ones lost, so the displaced workers often lack the skills needed to
fill the newly created positions. As long as these effects are gradual, the labor markets can
adapt gracefully, but if they are rapid or abrupt, significant dislocations can occur.
The actual process by which machines displace human workers is much more subtle.
In practice, automation replaces skills, not jobs, and correspondingly, what employers need
is not workers but the results obtained by applying those skills. To be successful, makers of
robots don't have to replace people; they have to provide machines with the requisite skills
to perform useful tasks. And while their products may not replace workers one to one, they
nonetheless put people out of work because fewer workers are required-what makes some
workers more productive also puts other workers out of their jobs. But the process also
changes the jobs of those still employed by eliminating the need for a particular skill, and
possibly adding requirements for new skills.
A good example of this process is as close as the checkout stand at your local
supermarket. The clerks who total your bill (cashiers) and package your groceries (baggers)
are engaged in a number of skill-based tasks that have changed significantly over the past
few decades. The cashiers used to examine each item in your shopping cart and key the price
into their cash register, whereas now they simply swipe the items across a bar-code reader.
The advantages of the new system in terms of accuracy, time, and labor are obvious. But the
main reason the cashiers are still there to greet you is that certain items require special
handling. In particular, bags of loose produce need to be identified and weighed to determine
a price. And these are skills that so far, at least, have resisted automation.
Nothing about AI changes the fundamentals of how labor markets evolve with
technology. From an economic standpoint, AI technology is just another advance in
automation. But its potential to rapidly encroach on current workers' skills is unparalleled
in the recent history of technological innovation, with the possible exception of the invention
of the computer itself.

In summary, to understand whether AI is going to put someone "out of a job" it's
necessary to understand what skills, in aggregate, that worker utilizes, whether those skills
are separable from the rest of the work he or she performs, and how susceptible those skills
are to automation, with or without the application of AI. As a general principle, the fewer
unique skills a given worker utilizes, the more vulnerable he or she may be to replacement
by a machine, depending on the skills, of course. But even if only a portion of a worker's
expertise or experience is amenable to replacement, improving productivity has the effect of
reducing overall employment.
So yes, robots are going to take our jobs, but a more useful way to think about it is
that they are obsoleting our skills, a process economists call "de-skilling," appropriately
enough. And there's nothing new about this process. The magnitude of the impact of AI in
particular will depend on how quickly and widely the new technologies facilitate automation
of workers' skills. And on that front, the news is not good for humans.
What new tasks will AI systems automate?
What sorts of skills currently resistant to automation are likely to be susceptible to
AI techniques? The most obvious are tasks that require simple perceptual skills, such as the
ability to see. It has long been possible for a mechanical arm to pick up a known object in a
known orientation at a known location, but many practical tasks involve the simple act of
looking at the object to figure out where it is before taking some equally simple action, such
as picking fruit off a tree, collecting trash, straightening and restocking items on shelves,
packing merchandise into boxes for shipping, setting roof tiles, separating recyclable
materials, loading and unloading trucks, and cleaning up spilled items. Anyone employed
to perform these tasks today is in imminent danger of replacement due to advances in
computer vision.
There is another broad class of jobs in which we employ people just to pay attention.
The potential of facial recognition in crowds to flag possible suspects is well known, but the
accuracy and broad deployment of such systems are increasing dramatically, heightening
privacy concerns. Visual recognition systems will be able to identify and discriminate
permitted from prohibited activities, such as a customer in a store walking into an area
restricted to employees or attempting to carry out merchandise without paying. A number
of supervisory functions fall into this class. For example, an AI system will be able to
summon a waiter to refill a patron's water glass or clear a table for the next guest. One
computer vision system currently being tested on the Stanford University campus counts the
number of people entering a restroom in order to customize the schedule for attendants to
service the facility. Future traffic lights will be able to anticipate your arrival, change
dynamically to facilitate vehicle flow, and stop traffic when a pedestrian or obstruction (such
as a dog) appears.
Historically, the jobs most susceptible to automation have been characterized as
those that are routine-meaning that they involve performing the same activity or task

repeatedly or, with the advent of computers, are readily described in an explicit set of steps
or rules, and so can be more easily reduced to a programmatic formulation. But AI is
extending automation to many tasks that are decidedly less routine. For example,
successfully driving a car may be well defined, but the task is hardly routine. The same could
be said for reading handwritten documents or translating text between languages. Yet,
machine learning techniques have proven very adept at these challenges, often equaling or
exceeding human capabilities.
Using so-called big data, many tasks that might otherwise appear to require insight
and experience are now within the purview of today's or tomorrow's machines. Indeed,
detecting patterns too subtle or fleeting for human analysis, such as the flow of data in a
network, the movement of potentially hostile troops near a contested national border, or the
activity of credit card charges signaling the possibility of fraud, is now practical.
In short, new AI technologies promise to dramatically improve productivity in a
number of areas previously resistant to automation-and they therefore also run the risk of
devastating many professions.
Which jobs are most and least at risk?
According to a widely cited Oxford study, a remarkable 47 percent of today's jobs are at
high risk of automation over the next decade or two, and another 19 percent are at medium
risk. The study's authors regard only one third of current workers as relatively safe from
replacement over the next one to two decades.
How will AI affect blue-collar workers?
Industrial robots have long been used for simple, repetitive tasks like welding and
assembling, but recent breakthroughs in sensory systems permit these mechanical servants
to escape the factory floor and seek wider employment. The missing part is the "brains." We
can connect low-cost sensors up to dexterous manipulators, but translating the stream of data
into action is another matter. This is a hard problem, but AI engineers have a trick up their
sleeves: plenty of economically valuable undertakings can be deconstructed into a series of
smaller, simpler tasks, each performed by separate devices and techniques. Complex tasks
such as installing a new lawn sprinkler system can be broken down into more easily
automated components. One robot may deliver the materials, another may dig the ditch, a
third might lay and connect the pipe, and a fourth might backfill. A human worker might
still be needed to design the layout and supervise the work, but that's cold comfort to the
myriad manual laborers previously required to complete this job. Industrial and commercial
automation on a grand scale doesn't require some magical recreation of human intelligence-
it just has to get the job done.
And that's where AI comes in. For the most part, one-off solutions that don't
generalize are perfectly fine as long as they deliver economic value.

At the current state of the art, there's no fundamental scientific breakthrough required
for an AI-based system to tackle the vast majority of blue-collar jobs-the constraint is mostly
just painstaking engineering. As long as the task is well defined, relies on available sensory
input, and falls within the operational capability of available mechanical technology, it's
simply a matter of time until some clever inventor configures the components and writes the
software to replace human workers.
A selection of blue-collar professions that the Oxford study ranks among the most susceptible to automation:
• sewer diggers
• watch repairers
• machine operators (many different subcategories)
• tellers
• shipping, receiving, and traffic clerks
• drivers
• inspectors, testers, sorters, and samplers
• projectionists
• cashiers
• grinders and polishers
• farm laborers
• lobby attendants, ticket takers
• cooks
• gaming dealers
• locomotive engineers
• counter attendants (at cafeterias, coffee shops, and the like)
• postal clerks
• landscapers and groundskeepers
• electrical and electronic equipment assemblers
• print binding and finishing workers
While it may be possible to automate these professions in the near future, it's worth
noting that it may not always be desirable to do so. In many cases, the value delivered by a
worker is social as well as physical.
By contrast, there’s a similar selection of blue-collar jobs that the Oxford study lists
among the least susceptible to automation:
• recreational therapists
• audiologists
• occupational therapists
• orthotists and prosthetists
• choreographers

• physicians and surgeons
• dentists and orthodontists
• fabric and apparel patternmakers
• athletic trainers
• foresters
• registered nurses
• makeup artists
• pharmacists
• coaches and scouts
• physical therapists
• photographers
• chiropractors
• veterinarians
• fine artists and craft artists
• floral designers

How will AI affect white-collar professions?


White-collar jobs are characterized by the processing of information, so many of
them are natural targets for automation by a computer. Some are engaged in a rote process,
such as transcription of handwritten notes into electronic form. Sometimes this process
requires skills that come naturally to humans but are more difficult for machines (at least
today), such as converting spoken language into written words. A subset of white-collar
workers are so-called knowledge workers, whose main value is expertise but whose output
is still information, such as software engineers and accountants.
In some ways, applying AI techniques to white-collar tasks is less challenging than
to blue-collar tasks. As a general matter, manipulating information is easier than integrating
with the physical world, and more natural for computers. In addition, white-collar tasks tend
not to be as real-time as blue-collar tasks.
In terms of their impact on human employment, AI technologies do not align
naturally with our human inclination to accord status and respect to certain professions over
others. Many low-status professions are exceptionally difficult to automate, while some
high-status ones are relatively straightforward. For instance, while the skills and experience
required to write a cogent news article would seem to be the exclusive purview of
professional journalists, computer programs can now write at a level that is difficult to
distinguish from articles created by human writers, at least in certain domains.
The Oxford study includes the following selection of white-collar professions as
among the most susceptible to automation:
• tax preparers

• title examiners
• insurance underwriters and claims processors
• data entry and brokerage clerks
• loan officers
• credit analysts
• bookkeeping, accounting, and auditing clerks
• payroll clerks
• file clerks
• switchboard operators
• benefits managers
• library assistants
• nuclear power reactor operators
• budget analysts
• technical writers
• medical transcriptionists
• cartographers
• proofreaders
• word processors and typists
And the study counts the following among those the hardest to automate:
• computer systems analysts
• engineers
• multimedia artists and animators
• computer and information research scientists
• chief executives
• composers
• fashion designers
• photographers
• database administrators
• purchasing managers
• lawyers
• writers and authors
• software developers
• mathematicians
• editors
• graphic designers
• air traffic controllers
• sound engineers
• desktop publishers

Missing from the above lists are what are called "pink-collar" workers. These are
people who work primarily in service industries where face-to-face contact is an essential
component of their duties, or in which the observation or expression of human emotions is
important. Examples are waiters and waitresses (who provide table service as opposed to
simply processing orders), clinical psychologists, police, administrative assistants,
classroom teachers, real estate agents, consultative sales professionals, clergy, supervisors,
and nurses. While some aspects of their jobs may be subject to computerization, the
remaining portions, mainly those that require an intuitive connection with other people, are
likely to resist being automated for the foreseeable future.

CHAPTER 8
REASONING
INTRODUCTION
Reasoning is the mental ability, found in humans, to generate conclusions from
assumptions or premises. In other words, it is, amongst other things, the means by which
rational beings propose specific reasons or explanations of cause and effect. In contrast to
reason as an abstract noun, a reason is a consideration which explains or justifies. When we
need a system to perform a task it has not explicitly been told how to do, the system must
reason: it must figure out what to do from the knowledge it already has. We have already
come across such examples of reasoning in the previous chapters. For example, if we know
that a parrot is a bird and all birds have wings, then we may ask, do parrots have wings?
This is a simple example of reasoning; in many situations we must keep answering such
questions to reach an inference.
Types of Reasoning
There are many types of reasoning methodologies which generate knowledge in
artificial intelligence. The major types of reasoning are explained below with the help of
natural sentences.
Abduction
Abduction is the process of creating explanatory hypotheses. It is the process by
which we try to explain things that were observed or felt in some manner but for which we
have no existing knowledge or proof to substantiate the phenomenon. It thus produces a
hypothesis that must be worked on further to verify whether or not it is true.
For example, when a doctor encounters a patient who has the same symptoms as
someone treated before, the doctor may hypothesise that the present condition is something
similar to the earlier one.
Analogical reasoning
Analogical reasoning relates known situations to new ones in order to solve them. For
each new situation, a similar, previously understood situation is taken as an analogy, and a
mapping is then created from the attributes of the analogous situation to the present problem.
To use an analogy, the following steps are involved:
1. Fix and start with the target problem (domain) for which you need to create new
understanding.
2. Find a similar domain so that we get some attributes similar to our target.
3. Find exact items from the matching domain.
4. Find associated items in the target domain.

5. Transfer attributes from the matching domain to the target domain. A mapping must
be created from the corresponding matching and target domain.
For example,
"This organisation is like an express train. It runs fast and reaches the destination in-
time, and now it needs to clean and rest for a while".
"Today is neither sunny day nor rainy day. We don't need an umbrella”.
Cause-and-effect reasoning
This reasoning shows causes and resulting effect. When we deal with a problem, it
will show a set of causes and effects that are involved in it. It, thus, gives an understanding
why a particular event happened or how this will happen if we move forward. This also
draws a link between what happens first and what happens next. Note that a correlation may
indicate mere coincidence rather than giving certain evidence of causality. If you cannot
demonstrate a causal linkage, you may still succeed simply by asserting one, but be aware
that people may well challenge a cause-and-effect assertion.
For example,
"If I transfer the money to your bank account, you will be more rich. I will transfer
the money".
Comparative reasoning
Comparative reasoning compares one thing against another to establish the
importance of an attribute. The size of the gap between the things compared indicates
importance: if we compare against a high-grade attribute, ours feels unimportant, while
comparing against a low-grade one makes our attribute look good. To create logical
arguments, we must establish the validity of the comparison benchmark; in less rigorous
arguments, the benchmark is merely assumed. There are various ways to compare. For example,
• Comparison of what people have got (or not got) against what others have.
• Comparison of past with future.
• Comparison of what is actual with what is ideal.
• Comparison of words and actions against values.
Conditional Reasoning
Conditional reasoning is based on if-then statements. For example, if we reason using
'if A then B', then B is asserted only when condition A is true. The rule leaves open what
happens when A is false, since it does not say whether B is then true or false. This reasoning
system is highly useful when we design a rule-based system for a problem, with the rules
formulated from an expert's point of view. For example,

If the tank is full, then switch off the pump.
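As a minimal illustration, the pump rule might be encoded as follows. This is only a sketch in Python; the names tank_level, TANK_CAPACITY and pump_on are hypothetical, as the original rule does not specify them.

TANK_CAPACITY = 100.0  # litres; an assumed capacity for illustration

def control_pump(tank_level, pump_on):
    # Conditional rule: if A (tank is full) then B (switch off the pump).
    if tank_level >= TANK_CAPACITY:
        return False          # B is asserted: pump switched off
    return pump_on            # the rule says nothing when A is false

print(control_pump(100.0, True))   # False: the rule fires
print(control_pump(40.0, True))    # True: A is false, state unchanged

Note that, as discussed above, the rule is silent when the condition is false, so the pump's previous state is simply retained.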
Criteria reasoning
Criteria reasoning is the process of comparison against an already established
criterion. It begins by defining the criteria against which the outcome of a particular decision
will be judged. The possible outcomes are then evaluated against these criteria and the best
is selected. In logical arguments, time is taken to establish the criteria properly, while in less
rigorous situations we simply assume the criteria and start working with them. Criteria which
appeal to common values are likely to be easily accepted.
De-compositional reasoning
As the name suggests, this is reasoning in which you break down the whole problem
into small, understandable parts. These small parts are analysed, conclusions are drawn on
how they fit together and, finally, a conclusion on the whole matter is drawn. For example,
"To know how a Rubik's cube operates, I will pull it apart to see its hidden workings.
By slowly reassembling it, I am able to explain its apparently magical organisation as a
whole in terms of geometry".
Deductive reasoning
Deductive reasoning starts with a general rule and moves to the specific: the process
starts with a general case and deduces specific instances from it. 'Hypothetical deduction'
starts with an assumed hypothesis or theory whose assumption goes unquestioned, whether
because it is well accepted or because it is rather shakier. Scientists use deduction; they
take a general scientific law, assume it to be true, and apply it to a particular case. Deduction
can also be used to test an induction by applying it, although in such test cases the initial
theory is assumed to be true only provisionally.
For example,
"You have tickets for the cricket match. You agree to meet John and Jolly at the corner of
special box or at the seats".
If you see Jolly at the corner of the special box, you expect to see John as well.
If you do not see either of them at the corner, you expect to see them at the seats when you
get to the stadium.

Logical arguments:
(John AND Jolly) located at corner OR (John AND Jolly) located at seats
NOT ((John AND Jolly) located at corner)
Therefore, (John AND Jolly) located at seats

Exemplar reasoning
Exemplar reasoning is the use of examples in argument. The example may be a story
or a short comparator; it may closely duplicate the situation at hand or be a relatively distant
metaphor. A known person, a recognised situation or even something not directly known to
the other person can be useful as an example.
For example,
"You should go out more frequently. I have a pal who used to stay in and was never really
joyful".
Inductive reasoning
Starting from specifics and deriving a general rule is the key to induction. Inductive
reasoning, or induction, is a reasoning technique that proceeds from a specific case or cases
to a general rule. From observations, it draws inferences in order to make generalisations.
Inference can be done in four stages:
Observation: Collect facts without bias.
Analysis: Classify the facts and identify patterns of regularity.
Inference: Use these patterns to infer generalisations about the relations between the facts.
Confirmation: Test the inference through further observation.
In an argument, we might take the following steps:
Derive a general rule in an accepted area and then apply the rule in the area where we want
the person to behave.
Give those details, and then explain what it all means.
Think about the benefits of all the parts and only get to the overall benefits later.
Take situations that have already happened and provide explanations on why they happened.
Inductive arguments can include the following:
Part-to-whole: Here the whole is assumed to be like individual parts (only bigger).
Extrapolations: Unstudied areas are assumed to behave like those already studied.
Predictions: Here the future is assumed to be like the past.
Modal logic
Modal logic is argument about necessity and possibility. It describes situations in
terms of what is necessary and what is possible, and explores how the two intertwine. The
argument proceeds not by stating that something is true but by stating how likely or necessary
it is, using words like can, may, should, ought, must, have to, etc. Arguing about how true
or necessary something is gives you more room in an argument, because you work with an
analogue continuum of alternatives rather than the black-and-white binary decision of
whether something is simply true or false. Necessity and possibility are the major concerns
of this logic.
Traditional logic
Traditional logic originated with Aristotle. It follows formal rules applied to accepted
facts: start with premises that are assumed to be true, use only logical rationale to derive a
conclusion, always check that the rules have been applied correctly, and keep emotion well
out of it.
For example,
"Some apples are green. Some apples are red. I don't know if there are any green and red
apples”.
Pros-vs-cons reasoning
Pros are arguments in favour of a case and cons are arguments against it.
Pros-vs-cons reasoning seeks to weigh up the arguments for a case against the arguments
against it. A conclusion is reached by checking whether the pros or the cons are stronger,
and a 'reasonable' conclusion is thus formed. Things that will make a 'pro' stronger (and
vice versa) include the following:
1. Include logical arguments.
2. Provide evidence that can be well displayed. This includes actions and perceptions
of other people.
3. Put greater emphasis on key words.
4. Stick to the subject and provide more arguments for the case.
We may start a discussion either by supporting the situation or by standing against it.
Starting with the favoured side allows you to fill the other person's mind with the key points,
so that the second list becomes less easy to absorb. Starting with the disfavoured side, on the
other hand, allows you to acknowledge it and then knock down each of its arguments with
stronger arguments for the contrary case. Either way, the final conclusion comes from
checking which side was stronger, the pros or the cons.
For example,
"Vinod likes it, John likes it, Reji likes it, Hari likes it. Only Anand and Jolly don't like it.
Most them are like it".
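A sketch of how this weighing might be mechanised is given below (Python). The names come from the example above; counting heads as a measure of strength is a simplifying assumption, since real pros and cons may carry unequal weight.

pros = ["Vinod", "John", "Reji", "Hari"]   # arguments for the case
cons = ["Anand", "Jolly"]                  # arguments against the case

# Conclude for whichever side is stronger; here strength is a head-count.
conclusion = "accept" if len(pros) > len(cons) else "reject"
print(conclusion)   # accept: four pros outweigh two cons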
Set-based reasoning
Set-based reasoning is normally found in set theory. Its arguments revolve around
whether things are members of named groups. For instance, a cat is an 'animal' but not a
'vegetable'. A basic assumption of set-based reasoning is that an item can be categorised
into a given group or set. This also assumes that both the item and the set exist in the first
place. Subsequent arguments may then consider the relationships between sets and the
implications of these relationships. Set reasoning thus often includes statements along the
lines of
A is a B
If A is a B then ...
A is not a B, but it is a C
A is both C and D, therefore ...
For example,
"He works for defence. Defence people are intelligent. Therefore, he is intelligent".
"If this is an international standard product, then it will use ISO standard encryption coding".
Systemic reasoning
This is the most generic approach: we try to understand a case by considering the
whole system. The various attributes are checked and analysed, and the relations between
them are validated. You can use the de-compositional scheme of reasoning to exclude cases
that do not belong to your consideration.
For example, "I am planning to purchase a villa in the middle of the city, considering
the landscaping, the swimming pool and the tall buildings around it. I am also considering
the movement of people during the daytime, especially on Sundays".
Syllogistic reasoning
During conversation, we make many sentences and imply logical connections
between them. But many a time, the truth and logic that we assume are not actually there.
The following statements and conclusions help in understanding this.
Statement 1: "All human beings are animals".
Statement 2: "Some animals are aggressive".
Conclusion: "Some people are aggressive".
This seems to be a reasonable conclusion, but then consider the following:
Statement 1: "All human beings are animals".
Statement 2: "Some animals are male".
Conclusion: "Some human beings are male".
The reasoning is exactly the same as in the first example, and although the second
conclusion happens to be true, the argument form is invalid in both cases. In the first
example, the conclusion does not follow: the animals that are aggressive are not necessarily
human beings. Such arguments can nevertheless seem persuasive when our assumptions and
beliefs carry us to less acceptable ends. We can use a Venn diagram to visualise this kind of
set-based reasoning.

Non-monotonic Inference Methods
These methods are not logically valid forms of inference, but they are important
because they are used in everyday reasoning. They are as follows:
Abductive inference: This is based on the use of known knowledge to explain or justify a
conclusion that need not be logically valid: given the truth of proposition Q and the
implication P → Q, conclude P.
For example, people who are drunk stagger when they walk. On seeing a person
staggering, abduction concludes that the person is drunk, a conclusion that is useful but not
logically guaranteed.
Inductive inference: This is based on the assumption that a recurring pattern observed for
some event or entity implies that the pattern is true for all entities in their class.
For example, given p(a1), p(a2), ..., p(an), conclude ∀x p(x).
Analogical inference: This is a form of inference learned through experience. When we find
that A is related in a certain way to B, and that A' is similar in some context to A, we
conclude that there is some B' related to A' in the same way.
For example, if a 2 x 2 system of linear equations can be solved in some manner,
then the same method can be adopted for a 3 x 3 system also.
Non-monotonic Reasoning
Conventional reasoning using predicate logic is based on three important
assumptions.
1. The predicate descriptions must be sufficient with respect to our application domain.
That is, all information necessary to solve the problem must be represented.
2. The information base must be consistent, i.e., the pieces of knowledge cannot
contradict each other.
3. Through the use of inference rules, the known information grows monotonically.
If any of these is not satisfied, the conventional logic-based approach will not work.
Non-monotonic systems address each of these three issues:
1. Reasoning systems are often faced with incomplete knowledge of the domain.
2. The knowledge supporting reasoning may become inconsistent as the world, or our
picture of it, changes.
3. The knowledge base must be updated, and previously drawn conclusions may have
to be withdrawn, when new information arrives.
Truth maintenance systems (TMS) have been developed as a means of implementing
non-monotonic reasoning systems.
Truth Maintenance Systems
The idea of a truth maintenance system (TMS) grew out of dependency-directed
backtracking. A TMS may be employed to protect the logical integrity of the conclusions of

an inferring system. A TMS allows assertions to be connected via a spreadsheet-like network
of dependencies. In a justification-based TMS (JTMS), the TMS itself does not know
anything about the structure of the assertions themselves. The TMS's only role is to serve
as a bookkeeper for a separate problem-solving system, which in turn provides it with both
assertions and dependencies among assertions.
The applications of a TMS are as follows:
1. Solve search problems, e.g., schedule meetings so that everyone can attend.
2. Determine what derived conclusions are still true when some facts change.
3. Determine what part failures could have caused a machine to fail.
ATMS
Assumption-based truth maintenance system or ATMS is a powerful tool for
different applications in the domain of uncertain reasoning. An ATMS keeps, for each
proposition, the sets of assumptions under which the proposition is true. A proposition is
believed at a given time iff one of the assumption sets supporting it is satisfied. A traditional
ATMS is based on propositional logic, but it is limited to Horn clauses and to simple queries.
A Horn clause is a clause (a disjunction of literals) with at most one positive literal. An
ATMS requires a subset of propositions declared as assumptions. Assumptions are needed
to express uncertainty. The fundamental ATMS problem is to identify the combinations of
assumptions (labels) such that a given query holds. The weak point of the traditional ATMS
concept is the restriction to Horn clauses. In many cases, non-Horn clauses are needed to
express the knowledge. The advantage of ATMS lies in its efficient computations.
JTMS
Justification-based truth maintenance system (JTMS) keeps for each derived
proposition a list of sets of propositions from which it was derived, and for each proposition,
a list of propositions it supports. When a proposition is retracted, the system checks each
proposition that depends on it, and if the proposition no longer has support, retracts it. There
are three main operations that are performed by the JTMS.
1. The JTMS inspects the network of justifications. This inspection can be triggered by
queries from the problem solver such as:
"Should I believe proposition P?",
"Why should I believe proposition P?" and
"What assumptions underlie proposition P?"
2. The second operation of the JTMS is to modify the dependency network, where
modifications are driven by information supplied by the problem solver.
Modifications include adding new propositions, adding or removing premises,
adding contradictions and justifying the belief in a proposition.
3. The final operation of the JTMS is to update the network. This operation is executed
whenever a change is made in a dependency network. The update operation

recomputes the labels of all propositions in a manner that is consistent with existing
justifications.
Consider an example of a non-monotonic (default) rule:
∀x (good-student(x) ∧ M study-hard(x) → study-hard(x))
∀y (party-person(y) → ¬ study-hard(y))
good-student(David)
Here, M study-hard(x) is read as "it is consistent to believe that x studies hard". Given
good-student(David) and no evidence to the contrary, we conclude study-hard(David); if we
later learn party-person(David), this conclusion must be retracted.
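A minimal sketch of this default rule in Python is given below. The belief-store representation is an assumption made for illustration; a real truth maintenance system would also record the justifications for each belief.

facts = {("good-student", "David")}

def consistent_to_assume(pred, who):
    # M p holds when the negation of p is not currently believed.
    return ("not-" + pred, who) not in facts

def study_hard(who):
    # Default rule: good-student(x) and M study-hard(x) imply study-hard(x).
    return (("good-student", who) in facts
            and consistent_to_assume("study-hard", who))

print(study_hard("David"))      # True: the default conclusion holds

# New information arrives: party-person(David), hence not study-hard(David).
facts.add(("not-study-hard", "David"))
print(study_hard("David"))      # False: the earlier conclusion is retracted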
Reasoning with Fuzzy Logic
Fuzzy logic uses a different approach to represent uncertainty in problem solving. It
focuses on describing the ambiguity of events rather than uncertainty about whether an event
occurs. Fuzzy set theory results from modifying the definitions of classical set theory and
logic; traditional set theory defines set membership as a Boolean predicate.
Fuzzy Sets
In the usual set theory, an object either is or is not a member of a set. Traditional
logics are based on the notion that P(a) is true if a is a member of the set, and false otherwise;
there is no partial containment. This amounts to the use of a characteristic function for a set A:
f_A(x) = 1 if x ∈ A; f_A(x) = 0 otherwise.
We may generalise this notion by allowing the characteristic function to take values
other than 0 or 1. We define a fuzzy set by a membership function u which maps the
universal set U to values in the interval [0, 1]:
u: U → [0, 1]
Let U be a set and x be an element of U. A fuzzy set A is a set of ordered pairs
(x, u_A(x)), ∀x ∈ U, where u_A is known as the membership function, whose values lie in
the interval [0, 1].
Example: Let the fuzzy set A = TALL represent the height of a person. We can define the
membership function values as:
u_A(0) = u_A(10) = u_A(20) = u_A(30) = u_A(40) = 0
u_A(50) = 0.2, u_A(60) = 0.4, u_A(70) = 0.6, u_A(100) = 1
All the properties and theorems holding good for the set theory will also hold good
for fuzzy sets.
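As an illustration, the TALL membership function above might be coded as follows (Python). The membership values for heights 80 and 90 are not given in the example, so they are linearly interpolated here, which is an assumption.

def u_tall(height):
    # Membership of 'height' in the fuzzy set TALL, a value in [0, 1].
    points = [(40, 0.0), (50, 0.2), (60, 0.4), (70, 0.6), (100, 1.0)]
    if height <= 40:
        return 0.0
    if height >= 100:
        return 1.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= height <= x1:
            # Linear interpolation between the listed membership values.
            return y0 + (y1 - y0) * (height - x0) / (x1 - x0)

print(u_tall(30))    # 0.0: not a member of TALL at all
print(u_tall(65))    # 0.5: a partial member of TALL
print(u_tall(100))   # 1.0: fully a member of TALL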
Fuzzy Reasoning
There are various methods of fuzzy inference. We can consider Zadeh's
compositional rule of inference.

Rule-based Reasoning
Resolution by refutation requires the set of wffs to be converted to clause form, but
a clause form may not directly give the answer required by the problem. Instead, one can
work with the wffs themselves and adopt the methods of a rule-based system. In a rule-based
system, we have a set of facts, a set of rules and a goal sentence. The set of rules forms a
rule base, and each rule can be used to produce a new fact (i.e., a conclusion) from one or
more established facts. The process can work in two different ways.
Advantages of rules
1. Knowledge comes in meaningful chunks.
2. New knowledge can be added incrementally.
3. Rules can make conclusions based on different kinds of data, depending on what is
available.
4. Rule conclusions provide "islands" that give multiplicative power.
5. Rules can be used to provide explanations, control problem-solving process and
check new rules for errors.
Instead of representing knowledge in a relatively declarative, static way, as a
collection of things that are true, a rule-based system represents knowledge in terms of a set
of rules that tell the system what to do in different situations. A rule-based system consists of
IF-THEN rules, a group of facts and an interpreter controlling the application of the rules,
given the facts.
A rule-based reasoning system works in two ways: forward reasoning (forward
chaining) systems and backward reasoning (backward chaining) systems. In a forward
reasoning system, you start with the initial facts and keep using the rules to draw new
conclusions (or take certain actions) given those facts. In a backward reasoning system, we
start with some hypothesis (or goal), to prove and keep looking for rules that will allow us
to conclude that hypothesis, perhaps setting new sub-goals to prove as we go. Forward
reasoning systems are primarily data-driven, while backward reasoning systems are goal-
driven. Both methods are useful for reasoning.
In general, the reasoning in problem solving is mainly in the forward direction.
However, when the forward reasoning cannot solve the problem, an alternative way is to use

the backward reasoning technique. Sometimes, the forward and backward reasoning are
used together for solving a problem.
Backward Reasoning
We can adopt either forward chaining (bottom-up resolution) or backward chaining
(top-down resolution). In top-down resolution, we obtain a sequence of derived inferences
as sentences are resolved. In backward chaining, we start with the goal sentence and work
back towards the given facts through the rule base, whereas in forward chaining we start
with the facts given in the rule base and work towards the goal sentence. A backward-
chaining architecture means that no rules are fired upon the assertion of new knowledge;
instead, when an unknown predicate about a known piece of knowledge is detected in an
operator's condition list, all rules appropriate to the knowledge in question are fired until the
question is answered or until quiescence. Backward-chaining systems generally work from
a goal state back to the start state. The backward procedure is given in Algorithm 13.
For example, consider the facts (F1, F2, F3 and F4) and rules (R1, R2, R3, R4 and R5).
Facts
F1: Dog(Fido)
F2: ¬ Barks(Fido)
F3: Wags-Tail(Fido)
F4: Mews(Myrtle)

Algorithm 13 Backward chaining procedure


Inputs: A finite list of atomic sentences, q1, q2, ..., qn
Outputs: Yes or no, depending on whether a given knowledge base entails all of the qi
SOLVE[q1, q2, ..., qn] =
if n = 0 then return yes
for each clause c in the knowledge base do
if c = [q1, ¬p1, ..., ¬pm] and
SOLVE[p1, p2, ..., pm, q2, ..., qn]
then return yes
end for
return no
Rules
R1: Wags-Tail(x1) ∧ Dog(x1) → Friendly(x1)
R2: Friendly(x2) ∧ ¬ Barks(x2) → ¬ Afraid(y2, x2)
R3: Dog(x3) → Animal(x3)
R4: Cat(x4) → Animal(x4)
R5: Mews(x5) → Cat(x5)
"If there a cat and a dog, the cat is not afraid of the dog". Solve the problem by backward
reasoning.
First we convert the goal into wff.
∃x: ∃y: Cat (x) ∧ Dog (y) ∧ Afraid (x, y)
i.e., Cat (x) ∧ Dog (y) ∧ ¬ Afraid (x, y)
By backward chaining, we start from this goal state.
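The following Python sketch carries out this backward chaining with the rules instantiated for Fido and Myrtle. A full system would unify the variables x1, x2, y2 and x5 automatically, and negated literals such as ¬Barks(Fido) are encoded here as separate positive symbols; both are simplifying assumptions.

facts = {"Dog(Fido)", "NotBarks(Fido)", "WagsTail(Fido)", "Mews(Myrtle)"}
rules = {   # conclusion -> premises (R1, R2 and R5, instantiated)
    "Friendly(Fido)":          ["WagsTail(Fido)", "Dog(Fido)"],
    "NotAfraid(Myrtle, Fido)": ["Friendly(Fido)", "NotBarks(Fido)"],
    "Cat(Myrtle)":             ["Mews(Myrtle)"],
}

def prove(goal):
    # A goal holds if it is a known fact, or if every premise of some
    # rule concluding it can itself be proved (backward chaining).
    if goal in facts:
        return True
    premises = rules.get(goal)
    return premises is not None and all(prove(p) for p in premises)

print(prove("Cat(Myrtle)"))              # True
print(prove("NotAfraid(Myrtle, Fido)"))  # True: the cat is not afraid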
Forward Reasoning
In forward reasoning, we obtain a sequence of asserted inferences as sentences are
resolved. Forward chaining implies that upon the assertion of new knowledge,
all relevant inductive and deductive rules are fired exhaustively, effectively making all
knowledge about the current state explicit within the state. Forward chaining may be
regarded as progress from a known state (the original knowledge) towards a goal state(s).
The forward procedure is given in Algorithm 14.
Consider the fact (F1) and rules (R1 and R2). Solve the following problem by forward
chaining.
F1: Fido barks and bites, or Fido is not a dog.
R1: All bulldogs are dogs.
R2: Anyone who barks is noisy.
Algorithm 14 Forward reasoning procedure
Inputs: A finite list of atomic sentences, q1, q2, ..., qn
Outputs: Yes or no, depending on whether a given KB entails all of the qi
1. If all of the goals qi are marked as solved, then return yes.
2. Check if there is a clause [p, ¬p1, ..., ¬pn] in the KB,
such that all of its negative atoms p1, ..., pn are marked as solved,
and such that the positive atom p is not marked as solved.
3. If there is such a clause, mark p as solved and go to step 1;
otherwise, return no.
The goal is to prove that "There exists someone who is not a bulldog or who is noisy".
Equivalent wffs are
F1: ¬ Dog(Fido) ∨ (Barks(Fido) ∧ Bites(Fido))
R1: Bulldog(x) → Dog(x)
R2: Barks(y) → Noisy(y)
Goal: ¬ Bulldog(z) ∨ Noisy(z)
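A Python sketch of forward chaining on this example follows. Forward chaining works from definite facts, so the disjunctive F1 is handled by exploring its two cases separately; the rule set below covers the case Dog(Fido), which is an assumption made to keep the sketch simple.

rules = [  # (premises, conclusion), instantiated for Fido
    (["Dog(Fido)"], "Barks(Fido)"),     # from F1, case Dog(Fido)
    (["Dog(Fido)"], "Bites(Fido)"),
    (["Bulldog(Fido)"], "Dog(Fido)"),   # R1
    (["Barks(Fido)"], "Noisy(Fido)"),   # R2
]

def forward_closure(facts):
    # Fire every applicable rule until no new fact can be derived.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if all(p in facts for p in premises) and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Case Dog(Fido): chaining derives Noisy(Fido), so the goal holds.
print("Noisy(Fido)" in forward_closure({"Dog(Fido)"}))   # True
# Case not Dog(Fido): R1's contrapositive gives NotBulldog(Fido),
# so the goal also holds there (handled outside this sketch).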
Comparison of forward and backward reasoning
The choice of reasoning strategy depends on the properties of the rule set. If there is
a single goal (conclusion), backward chaining will normally be more efficient, as there is no
wasteful generation of irrelevant conclusions. But if there are many different ways of
demonstrating any particular fact, backward chaining may be wasteful. Forward chaining
is likely to be more efficient if there are many conclusions to be drawn or where we have a
small set of initial facts. It may also be preferable if many rules feed into the same conclusions.
Backward chaining is likely to be more efficient where there is a single conclusion to be
drawn or where the initial set of facts is large.
In summary, forward reasoning is the process of searching for a solution path
connecting initial fact(s) with desired conclusion(s). States are combinations of facts and
each rule is a method for generating a single successor (i.e., it defines a single transition).
Backward reasoning is the process of searching for a solution path connecting some final
conclusion with one or more initial facts. States are combinations of required conclusions
and the transitions are defined by the rules. Each rule is a method for generating further
'required conclusions' from existing required conclusions.
Diagnosis Reasoning
Diagnosis is the process of determining the cause of any abnormal or unexpected
behaviour.
Case-based Reasoning Systems
Case-based reasoning is a particular technique used for the development of a
knowledge based system. Case-based reasoning compares the current situation (or case) with
situations that have been encountered in the past to see if one or more of the earlier situations
can provide a model for how to act in the current situation. Case-based reasoning can trace
its roots to the field of psychology and theories about how human memory works. Episodic
memory provides a method for storing and recalling large chunks of related information
such as events, scenes, occurrences and stories. Schema reasoning is the process of applying
chunks of information to new situations. A case ultimately refers to an experience;
everything is a case! We can define it as follows:
"A case is a contextualised piece of knowledge representing an experience that
teaches a lesson fundamental to achieving the goals of the reasoner".
The major advantages are as follows:
1. Increased availability and reusability of expertise at reduced cost.

2. Fast and consistent response.
3. Increased safety.
4. Learning component enables adaptation to similar situations.
5. Works well in conjunction with a human operator (system can make suggestions in
unusual situations).
Major challenges with practical issues of case-based reasoning are as follows:
1. Domain knowledge acquisition.
2. Indexing and retrieving case information.
3. Completeness of case base.
4. Maintenance of case base.
5. How many cases are needed?
6. How to remove overlapping cases?
7. How to search efficiently?
8. What features to use for indexing?
9. How to weigh the features?
10. How to acquire adaptation knowledge?
Model-based Reasoning Systems
Model-based reasoning refers to an inference method used in expert systems.
Basically, model-based reasoning uses a model of the physical world. The main focus of
application development is creating a model with observed data to derive conclusions such
as a diagnosis or a prediction. A model-based reasoning schematic is shown in the figure below.

Figure: Model-based reasoning


In engineering, models form the basis for diagnosis, in which the interrogation of
fault propagation graphs is very efficient. A hybrid approach can combine these techniques,
giving the system more flexibility. Major challenges include model building, difficult
validation and scalability.
Questions
1. Explain the difference between forward chaining and backward chaining and the
situations where each one is more appropriate.
2. Discuss the main advantages of applying fuzzy logic in knowledge-based systems.
3. Why is backward chaining used for diagnostic problems?

CHAPTER 9
LEARNING
INTRODUCTION
Learning is the process by which a system improves its performance from experience
or environment. Learning may also be defined as the art of acquiring new or modifying
existing knowledge and skills. In short, we can define learning as a change in behaviour.
Learning has many roles in artificial intelligence applications; for example, in robotics or
expert system development, learning forms a separate module in the system. Machine
learning is a computerised process that improves a system's performance through
experience, and it supports application development involving human-computer interaction.
What is learning?
If a computer is able to learn new things, and adapt to new situations, then we say
that the computer has been equipped with 'learning programs'. Learning is an area of
artificial intelligence that focuses on the processes of self-improvement, through which
information processes improve their performance or enlarge their knowledge bases.
Computers carry out the learning process in the following ways.
1. One way of learning is the acquisition of new knowledge. Any program that does
such acquisition is a learning program.
2. Problem solving is another component of learning.
Learning is hard because of the following reasons:
1. Intelligence implies that an organism or machine must be able to adapt to new
situations.
2. It must be able to learn and to do new things.
3. This requires knowledge acquisition, inference, updating/refinement of knowledge
base, acquisition of heuristics, applying faster searches, etc.
How computers learn?
Many approaches have been attempted to provide a machine with learning
capabilities. This is because learning tasks cover a wide range of phenomena. A few
examples of how one may learn are as follows:
Skill refinement: One can learn by practice, e.g., playing the guitar.
Knowledge acquisition: Learning through experience and storing it in a knowledge base is
called knowledge acquisition. Rote learning can be cited as an example of this.
Taking advice: This is similar to rote learning, but the input knowledge may need to be
transformed into an effective, operational form before it can be used.
Problem solving: If we solve a problem systematically, we will get experience to solve
similar problems. For the next time, we can solve similar problems more efficiently. Usually,

this does not involve new knowledge gathering but may involve reorganisation of data or
remembering how to achieve the solution.
Induction: One can learn from examples. Humans often classify things in the world without
knowing explicit rules. Induction usually involves a teacher or a trainer to aid the classification.
Discovery: Here one learns without the aid of a teacher or supervisor; new knowledge
elements are inferred from the existing knowledge base.
Analogy: A system may transfer a little knowledge from a similar, previously solved task to
obtain a better solution to the task in hand. Such a system finds and stores the similarities in
the information it holds.
Types of Learning
Learning is acquiring new or modifying existing knowledge, behaviours, skills,
values or preferences and may involve synthesising different types of information. The
ability to learn is possessed by humans, animals and some machines.
Rote Learning
The simplest kind of machine learning is recording of data. Data storage programs
can be used for making the machines learn. Such a type of learning is known as rote learning.
In games such as chess, checkers or tic-tac-toe, one can find the backed-up value of
a configuration (board state) by the MINIMAX procedure. If the backed-up value is stored
along with the board position, it can be used in the next game. Hence, when such a
configuration arises again, its value is available without running the time-consuming
MINIMAX procedure once more: the machine has learned the best move for a given board
position. A few more capabilities for rote learning are included in the following.
1. Organized storage: There is organized storage of information. In order to use the
stored value, the board position should be organized in such a way that the process
of retrieval is the fastest. This is done by indexing the board position by the number
of pieces on the board.
2. Generalisation: To keep the number of stored objects at a manageable level, some
kind of generalisation is necessary.
3. Direction: The program should intelligently find the required stored value for a
given board position, focusing attention on a single processing direction.
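A minimal Python sketch of rote learning as organised storage follows: once a position's backed-up value has been computed, it is stored and thereafter simply retrieved. The function minimax_value below is a hypothetical stand-in for the real MINIMAX search.

stored_values = {}   # organised storage: board position -> backed-up value

def minimax_value(position):
    # Placeholder for an expensive MINIMAX search (illustrative only).
    return sum(ord(ch) for ch in position) % 10

def learned_value(position):
    if position not in stored_values:        # unseen: compute and store
        stored_values[position] = minimax_value(position)
    return stored_values[position]           # seen before: just recall

print(learned_value("X.O|.X.|O.."))   # computed by search, then stored
print(learned_value("X.O|.X.|O.."))   # retrieved without searching again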
Learning by Parameter Adjustment
Game playing determines the best moves by using a static evaluation function
together with the MINIMAX procedure. In a chess game, pattern classification does this by
assigning weights to the various features that a particular board pattern contains, using a
polynomial of the form c1t1 + c2t2 + c3t3 + ... + c16t16, where t1, ..., t16 are the values of
16 features and c1, ..., c16 are the 16 corresponding weights. The adjustment of c1, c2, ...,
c16 depending on the configuration is known as parameter adjustment. The weights are set
to the best values obtained by experience. The computer plays the game against another
copy of itself that was configured
earlier. If the new set of parameter values is better, in the sense of winning more games, the
weights are changed to the new values and a copy is made. This process is repeated several
times; this is the way the parameters are adjusted for a game. This learning process
is applicable to building an initial game knowledge base.
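A minimal sketch of this parameter adjustment cycle is given below (Python). The functions features and wins_against are hypothetical placeholders for the real feature extractor and self-play machinery, and random perturbation is only one of several possible adjustment schemes.

import random

def features(board):
    return [random.random() for _ in range(16)]   # placeholder features

def evaluate(weights, board):
    # Linear evaluation c1*t1 + ... + c16*t16 over the 16 features.
    return sum(c * t for c, t in zip(weights, features(board)))

def wins_against(new_w, old_w):
    return random.random() < 0.5    # placeholder self-play match

weights = [1.0] * 16
for _ in range(100):    # repeat the adjustment cycle several times
    candidate = [c + random.gauss(0, 0.1) for c in weights]
    if wins_against(candidate, weights):
        weights = candidate         # keep the better-performing weights

print(evaluate(weights, "some board"))   # score under the adjusted weights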
Learning by General Problem Solving
Learning can be thought of as a form of problem solving. The general problem
solver (GPS) provides a design for a learning program. GPS requires a table that connects
operators to the important differences they can reduce between a given state and the goal
state. Suppose we decide that, rather than providing such a table to GPS, we want the program
to learn the table itself from its own experience; then GPS becomes another type of
learning program.
GPS was an attempt to construct a general problem solving mechanism that could
solve problems in a new area given domain-specific knowledge about that area.
GPS is a generalized state-space search mechanism. It has the following components:
1. A set of states and operators that change states. Specification of start and goal states.
2. A procedure for identifying differences between states.
3. A table of connections, which connects observed differences with operators that may
be relevant for reducing those differences.
GPS works by determining the difference between the current state and the goal state
and selecting operators relevant to reducing that difference. Hopefully, this will give
direction to the search and make it more efficient than a blind state-space search.
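A minimal Python sketch of this difference-reduction loop follows. States are sets of conditions, each operator lists its preconditions, additions and deletions, and the table of connections is implicit in the operators' add-lists; the door-opening domain is an illustrative assumption, and the sketch omits loop detection.

operators = {  # name: (preconditions, add-list, delete-list)
    "walk-to-door": (set(),         {"at-door"},   set()),
    "open-door":    ({"at-door"},   {"door-open"}, set()),
    "go-through":   ({"door-open"}, {"inside"},    set()),
}

def achieve(state, goal, plan):
    # Reduce the difference between state and goal by choosing an
    # operator relevant to the difference, recursively achieving its
    # preconditions first (means-ends analysis).
    if goal <= state:
        return state
    for name, (pre, add, delete) in operators.items():
        if add & (goal - state):                   # relevant to the difference
            new_state = achieve(state, pre, plan)  # achieve preconditions
            if new_state is not None:
                plan.append(name)
                return (new_state - delete) | add
    return None                                    # no operator reduces it

plan = []
achieve(set(), {"inside"}, plan)
print(plan)   # ['walk-to-door', 'open-door', 'go-through']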
To apply GPS to the problem of discovering a good set of differences for another
copy of GPS working in a given problem domain, it is necessary to do the following:
1. Input a set of operators that will be available in the application domain.
2. Define a goal for the learning task. This goal is to provide a good set of difference
for the application task.
3. Define a set of differences used by the learning task.
4. Define a set of operators for the learning task.
Concept Learning
Classification is the process of assigning to a particular input the name of the class
to which it belongs. It is an important component of the problem solving task. The idea of
producing a classification program that can evolve and construct its own class definitions is
called concept learning. The technique used depends on the way the classes are described.
For example, consider the rule: if the current goal is to go from place A to place B, and there
is a wall separating the two places, then look for a doorway in the wall and go through it.

To use this rule successfully, the system's matching routine must be able to identify
an object as a "wall". Without recognising the object to be a "wall", the rule can never be
invoked.
Learning by Analogy
Analogy involves a complicated mapping between what might appear to be two
dissimilar concepts. Analogy is a powerful inference rule. It allows similarities between the
objects to be stated. An analogical learning system uses a frame structure.
For example, suppose we wish to say that "John is like a turbo-jet".
This means that John is a person whose movement or speech is very fast. Here, the
turbo-jet is an object that maps to the person 'John'.
Analogical problem solving proceeds by mapping in two different ways. The two
methods of analogical problem solving that have been studied in artificial intelligence are
transformational analogy and derivational analogy.
Transformational analogy
Look for a similar solution and copy it to the new situation, making suitable
substitutions wherever appropriate (for example, in geometry). Carbonell described this
method in 1983. The idea is to transform the solution to a previously solved problem into a
solution for the current problem. Transformational analogy does not look at how the problem
was solved; it only looks at the final solution. For example, if we know how to solve a
problem related to an isosceles triangle, the same methodology can be adapted to solve a
problem related to an equilateral triangle.
Derivational analogy
The history of the problem solution, that is, the steps involved, is often relevant. In
1986, Carbonell showed that derivational analogy is a necessary component in the transfer
of skills in complex domains. For example, if we know how to find the area of a triangle
and of a square, derivational analogy can help in working out the volume of a pyramid from
this knowledge.

Machine Learning
Machine learning is the branch of artificial intelligence concerned with the design
and development of algorithms that allow a computer to adapt its behaviour based on
empirical data, such as data from electronic sensors or databases. A major focus of machine
learning research is to automatically train the system to recognise complex patterns and
to make intelligent decisions based on the provided inputs. The set of all possible inputs is
usually far larger than the set of examples we can supply, which makes it difficult to
guarantee effective outputs on unseen data.

Why Machine Learning?
Machine learning is the field of study that gives computers the ability to learn without
being explicitly programmed. The main aim of machine learning is to develop systems that
would be too difficult or expensive to construct manually. The systems must automatically
adapt and customise themselves to individual users, for example by personalising news or
filtering mail.
Computers are used in various fields of science and for various tasks. Normally, the
programmers design and implement software that they need for particular functions.
However, there are many tasks for which the implementation is difficult or impossible. We
can generally classify such problems into four types.
First, the problems where no human experts exist, for example, in modern automated
manufacturing facilities, we need to predict the failure of machines before it happens by
analysing the sensor details. As the machines are new and no experts exist for such a task, a
machine learning system can be employed to study the sensor data and to predict subsequent
machine failures.

Figure: Branches of machine learning


Second, there are problems where human experts exist, but they are unable to explain
their expertise. There are various tasks like speech recognition, hand-writing recognition,
natural language understanding, etc. Almost every human exhibits expert-level abilities
at these tasks, but nobody can describe the detailed steps that they follow while performing
them. Fortunately, we can provide the machine with the corresponding output that we
expect for a particular input. Thus, machine learning algorithms can help in mapping the
inputs to the outputs.
Third, there are problems where the environment changes frequently. For example,
consider people investing in the share market. The behaviour of the market changes
frequently, and investors must buy or sell their shares accordingly. To predict the
future behaviour of the stock market, even an expert programmer would need to rewrite the
codes of his program frequently. A learning program can relieve the programmer from the
burden of constantly modifying and tuning a set of learned prediction rules.
Fourth, the applications that need to be customised for individual users or for a group
of users. For example, a program to filter unwanted electronic mail messages. The
advertisement about refrigerators and air-coolers will always be a spam message for the
people in Alaska. Similarly, the advertisement about sweaters and jackets will be of no use

for people living in extremely hot regions. Thus, different users will need different filters. It
is not reasonable to expect each user to program his or her own rules, and it is infeasible to
provide every user with a software engineer to keep the rules up to date. In such a situation,
a machine learning system that can automatically learn which mail messages to reject and
which to keep, by altering the filtering rules, will be more useful.
Machine learning addresses many of the same research questions as the fields of
statistics, psychology and data mining, but with a different outlook. Statistics focuses on
understanding the phenomena that generated the data. Data mining studies are interested in
finding patterns and drawing conclusions that are easily understandable by people.
Psychological studies focus on the mechanisms that underlie the various types of behaviour
shown by different people. We can illustrate this with an example; consider the case of
speech recognition. A machine learning approach focuses on building an accurate and
efficient speech recognition system. A statistician might collaborate with a psychologist to
form hypotheses about the mechanisms underlying speech recognition. A data mining
technique would look for patterns in speech data and be interested in whether these apply to
some group of people according to gender or age. Now, let us identify the types of problems
in which machine learning is being used.
Types of Problems in Machine Learning
Machine learning is all about designing algorithms that train computers to improve
their performance. It is not just a search for consciousness in machines; it also comes close
to the traditional goals defined by artificial intelligence. Various machine learning algorithms
are available, and selecting an apt algorithm for the kind of problem we need to solve is the
first rule in the process of learning. Learning is largely a matter of finding statistical
regularities or other patterns in the data. In a broad sense, we can classify machine learning
problems as follows:

1. Classification problems
2. Decision problems
3. Learning problems
The types of learning algorithms fall into several classifications. One classification
is by the type of result expected from the algorithm: for example, some problems are
classification problems, while others are decision or general problem-solving tasks. Let us
take some examples of classification problems. Hand-writing recognition and digit
recognition are problems where the computer needs to learn how to classify the given inputs
by matching them against patterns that have already been fed to it. Various hand-writing
recognition programs give extremely good results. The neural network community works
on the digit recognition problem; for the same problem, support vector machines offer an
even better classifier.

We can use machine learning for problem solving too. It is mainly used in medical
diagnosis and in computational biology. For instance, if the computer is trained to
distinguish between cancer cells and normal tissues, then, given a set of attributes, the
system will be able to distinguish between likely cancer cells and normal tissues. Neural
networks or support vector machines can carry out the learning process for this problem.
Another learning technique that can solve the problem is the decision tree.
A decision tree is a relatively simple classification technique that follows a tree of
questions and answers. If the answer to a question is yes, the algorithm proceeds down one
branch of the tree; if the answer is no, the algorithm takes the other branch. Finally, the
algorithm reaches a leaf node with the final classification, as the sketch below illustrates.
Learning decision trees is straightforward and requires less work than neural nets, so even
the simplest learning algorithms can accomplish a great deal. Thus, a decision tree learning
algorithm can be used in any scenario where you can collect a reasonable number of
attributes and can expect a classification system to give effective results.
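
To make the idea concrete, here is a minimal Python sketch of a decision tree as a tree
of yes/no questions. The attributes and thresholds are purely hypothetical, chosen for
illustration rather than taken from any real system.

# A decision tree as nested yes/no questions: each answer sends us down
# one branch until a leaf node gives the final classification.
def classify(example):
    if example["has_attachment"]:            # first question
        if example["sender_known"]:          # yes-branch question
            return "legitimate"
        return "spam"
    if example["contains_offer_words"]:      # no-branch question
        return "spam"
    return "legitimate"                      # leaf: final classification

print(classify({"has_attachment": False,
                "sender_known": False,
                "contains_offer_words": True}))    # -> spam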
Let us look at an example of classification learning: speech recognition. Usually, the
computer is given a set of training instances, each consisting of a sound and the word that
sound corresponds to. This type of learning can be carried out well with neural networks,
though the problem is hardly simple enough for decision trees. One machine learning
technique, the hidden Markov model, is designed specifically for working with time series
data like this and has shown good results in speech processing.
There is a common type of learning which is designed not to create classifications of
inputs but to make decisions. This type of learning is called a decision problem. It normally
requires various assumptions about the present state of the system in order to predict the
rules. Decision problems sometimes require a single decision and sometimes multiple
decisions. When there are multiple decisions to be made, the problem becomes trickier
because it needs to take into account all the possibilities and all the consequences of the
actions that the system performs.
For understanding both classification and decision problems, we make use of the
concept of a utility function. The basic idea is that the computer gains a particular amount
of value while performing some actions. The agent needs to learn which actions result in a
positive utility pay-off and which result in negative utility. For example, if you are building
a robot to explore a jungle, sometimes you will never know where exactly the robot is
located. We are not aware of what will happen to the robot when it steps forward or turns
left. Thus, we are either not sure of the state of the world or not sure whether the robot's
actions will work as expected.
The computer (agent) needs to learn the various aspects of the model which are not
provided. In our example, either the location or the action needed will be unknown at a
particular instant. The system needs to track this and learn what the world looks like and
what effect a particular action can cause. Once these different functions are learned, the
correct action to take is simply a matter of deciding which action maximises the "expected
utility" of the agent. The expected utility is calculated by multiplying the probability of each
pay-off by the pay-off itself and summing the results, as in the sketch below. In cases where
multiple decisions must be made and a reasonably small set of possible states is available,
an algorithm known as a Markov decision process is commonly used.
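
As a small illustration of this calculation, the sketch below computes the expected utility
of two hypothetical robot actions (the probabilities and pay-offs are made-up numbers)
and picks the action that maximises it.

# Expected utility of an action = sum over outcomes of P(outcome) * pay-off.
actions = {
    "step_forward": [(0.8, +10), (0.2, -50)],   # (probability, pay-off) pairs
    "turn_left":    [(0.9, +2),  (0.1, -5)],
}

def expected_utility(outcomes):
    return sum(p * payoff for p, payoff in outcomes)

for name, outcomes in actions.items():
    print(name, expected_utility(outcomes))      # -2.0 and 1.3

# The rational choice is the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print("best action:", best)                      # -> turn_left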
In some cases, the actual calculation of the utilities can be avoided, as some learning
algorithms incorporate the ability to predict the future from the information they have
learned. Reinforcement learning is a common method for these kinds of situations.
Supervised learning is the type of learning where training is done by providing the
correct results. Thus, a feedback system is in place to supply the correct result and to check
the progress of learning. In unsupervised learning, the goal is harder because there are no
predetermined outputs or sample patterns; the system needs to undergo a lot of training to
get results.
Machine learning is attractive whenever a system is too difficult or expensive to
construct manually because it requires specific detailed skills or knowledge tuned to a
particular task; this is one form of the knowledge engineering bottleneck. It is also attractive
for developing systems that can automatically adapt and customise themselves to individual
users. Data mining techniques are used to discover new knowledge from large databases.
Training may draw on direct or indirect experience: direct experience consists of given
sample input-output pairs for a useful target function, while indirect experience consists of
feedback that is not a direct input-output pair for the target function.

History of Machine Learning


The 1950s can be termed an exploratory period during which many general techniques
were born. It was during this time that Samuel's checkers-playing program, considered the
world's first self-learning program, was developed. It demonstrated the fundamental concept
of artificial intelligence. Selfridge's Pandemonium model, proposed in 1959, also noted
success in modelling human pattern recognition.
During the 1960s, the concept of neural networks was developed. Rosenblatt developed
the perceptron, which was modelled after neurons; it was the precursor to later work in neural
networks. In 1969, Minsky and Papert published a seminal analysis proving that the perceptron
was inherently limited, which discouraged research in this area for almost 20 years.
The 1970s were characterised by the development of more practical algorithms, often
using symbolic techniques. Winston's important work on learning (Winston's arch learner)
in the blocks-world domain appeared in 1970. The idea of macro-operators was developed
by Fikes in 1972. Mathematical discovery with AM by Lenat in 1977 and scientific
discovery with BACON by Langley in 1978 were the major discoveries of the decade.
An explosion of different techniques and an increased emphasis on evaluation
characterised the 1980s. The notion of version spaces was developed by Mitchell in 1982.
Quinlan created the ID3 decision tree algorithm in 1983. Valiant in 1984 defined the idea of
probably approximately correct (PAC) learning, still the dominant theoretical paradigm.
Connectionist research became active again: the back-propagation algorithm (Rumelhart,
1986) overcame many of the limitations of the perceptron. The COBWEB clustering system
was developed by Fisher in 1987. Explanation-based learning, speed-up learning and case-
based learning became very popular during this era.
The rapid changes of the 1990s were triggered by data mining applications. Data
mining emerged as an important tool for many business activities. Reinforcement learning
and inductive logic programming were also developed during this period, and Bayes' net
learning gathered pace during the second half of the 1990s.
It was during the early 2000s that studies on support vector machines (SVMs) and
kernel methods became prominent. Compilers, debugging tools and graphical interfaces
were introduced into the learning process. Applications to robotics and adaptive software
agents, along with bagging, boosting and stacking, also gained pace during this period.
The data, information, knowledge and wisdom hierarchy is summarised in Figure
7.5. Data accumulates rapidly and needs to be analysed: computational intelligence schemes
extract the hidden patterns (useful information) in bodies of data for use in decision support
and estimation. Data mining is the automated extraction of hidden predictive information
from large databases.
Aspects of Inputs to Training
The major classifications are supervised versus unsupervised training (class
information on the training examples is or is not available), incremental versus non-
incremental training (training examples are given all at once or sequentially) and
homogeneous versus heterogeneous training (examples are described by only one type of
data, such as numeric or symbolic, or by several types).
Supervised versus Unsupervised Training
As we have seen, supervised learning is employed when we have sample data, so it is
most commonly used in classification problems: the goal is to have the agent learn a
classification system that we have created. The digit recognition problem is an example of
supervised learning. Neural networks and decision trees work with predetermined
classifications, and hence supervised learning is commonly used with them. In the case of
neural networks, the classification is mainly used for determining the error of the network
and minimising it; in the case of decision trees, the classification is used for determining
which attributes provide the most information for solving the puzzle. In short, supervised
learning always involves some "supervision" for problem solving. Hidden Markov models
and Bayesian networks also rely heavily on supervised learning to minimise errors and
adjust input parameters. Thus, in classification problems, the goal of the learning algorithm
is to minimise the error with respect to the given inputs. These inputs, an example set, are
called the training set. The training set should contain both positive and negative examples;
only then can we understand the behaviour of the system in both situations. A minimal
sketch of such error-driven training follows.
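
The sketch below assumes a perceptron-style learner and a hypothetical two-dimensional
training set with positive (+1) and negative (-1) examples; it is an illustration of the idea,
not any particular system from the text.

# Supervised learning: the weights are adjusted whenever the prediction
# disagrees with the supervised (correct) label, reducing training error.
training_set = [((1.0, 1.0), +1), ((2.0, 1.5), +1),
                ((-1.0, -0.5), -1), ((-1.5, -1.0), -1)]

w = [0.0, 0.0]
b = 0.0
for epoch in range(10):
    for (x1, x2), label in training_set:
        prediction = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
        if prediction != label:        # the "supervision": a known answer
            w[0] += label * x1         # nudge the boundary toward the
            w[1] += label * x2         # correct side for this example
            b += label

print(w, b)    # a separating line for this toy data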
Unsupervised learning is normally treated as difficult because the goal is to have the
computer learn something without being told how to do it. Two approaches are mainly
employed for unsupervised learning.
The first is to train the system not by giving sample categories but by providing a
positive flag, a reward, to denote success. This type of training is mainly used for decision-
making problems rather than classification, because the system always tries to maximise its
rewards. The approach is similar to the real-world scenario where some actions are rewarded
and others are punished. The system tracks the rewards for its previous actions and learns
what needs to be done for a similar problem. This type of learning is not always practical,
however, because the trial-and-error process is very time consuming.

The second type of unsupervised learning is clustering, where the system tries to find
similar data and group them into a single cluster. The goal here is not to maximise a utility
function; instead, similarity in the training data is identified for further training. For
example, clustering individuals based on gender will result in a cluster of males in one group
and females in another. This data-driven approach works well only if there is sufficient data.
The principle is used by websites like amazon.com to recommend certain books to a
particular group of users; algorithms used for these purposes are called information filtering
algorithms. Information about the members of a cluster always helps to produce meaningful
results. Unsupervised learning also suffers from the problem of over-fitting the training data.
A minimal clustering sketch follows.
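
As a rough illustration of clustering, here is a one-dimensional 2-means sketch on
hypothetical data; the algorithm groups similar values with no labels given.

data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centres = [data[0], data[3]]          # crude initialisation

for _ in range(10):
    clusters = [[], []]
    for x in data:                    # assign each point to its nearest centre
        nearest = min((0, 1), key=lambda k: abs(x - centres[k]))
        clusters[nearest].append(x)
    # Move each centre to the mean of its cluster (keep it if the cluster
    # is empty).
    centres = [sum(c) / len(c) if c else centres[k]
               for k, c in enumerate(clusters)]

print(centres)    # roughly [1.0, 8.07]: two groups of similar values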
Learning Systems
We can use machine learning approaches for decision-making, classification and
prediction problems. How can we evaluate the performance of the developed algorithms?
How do we quantify the performance of these models? Many standard methods are suitable
for measuring the performance of a classification system.
It was Arthur Samuel who developed the first machine learning system, at IBM in
1959. An important notion here is sample complexity: the number of training examples
needed to learn an accurate function. The major advantage of the data-driven approach is
that it is able to transform high-dimensional noisy data into low-dimensional information.
This provides monitoring capability and facilitates model-building via identification of
dynamic relationships among data elements. One of the major challenges is that such an
approach is highly dependent on the quantity and quality of system operational data.
Machine Learning Applications
Major applications of machine learning can be summarised as follows:
1. Optical character recognition: categorising images of handwritten characters by the
letter represented. ANNs and HMMs are widely used for optical character
recognition.
2. Face detection: finding faces in images, or indicating whether a face is present. An
HMM is suitable for developing such an application.
3. Spam filtering: identifying e-mail messages as spam or non-spam. Any machine
learning technique, such as an SVM, ANN or HMM, can be used to develop such
applications.
4. Topic spotting: categorising news articles as to whether they are about politics,
sports, entertainment, etc. HMMs or SVMs can make an impact in this application.
5. Spoken language understanding: within a limited domain, determining the meaning
of something uttered by a speaker to the extent that it can be classified into one of a
fixed set of categories. For language classification, HMMs are highly useful.
6. Medical diagnosis: diagnosing a patient as a sufferer or non-sufferer of some disease;
any rule-based system can be useful here.
7. Customer segmentation: predicting which customers will respond to a particular
promotion. This prediction problem can be addressed with any prediction technique,
such as an SVM or ANN.
8. Fraud detection: examining a credit card transaction and identifying whether fraud
has occurred. Any supervised machine learning method can be applied to build such
a prediction system.
9. Weather prediction: predicting whether or not it will rain tomorrow is a type of
application that can be developed with an HMM.
Quantification of Classification
A standard method is available to quantify the classification produced by a learned
model. A classification model is a mapping from instances to predicted classes. A classifier
produces real-valued outputs that are compared against a threshold. Consider a two-class
prediction problem (binary classification), where each instance actually belongs to a
positive (p) or negative (n) class and the model's output is a predicted class in {Y, N}. For
a binary classifier, there are four possible outcomes. If the prediction and the actual value
are both positive, the result is a true positive (TP); if the prediction is positive but the actual
value is negative, it is a false positive (FP). Similarly, if the prediction and the actual value
are both negative, the result is a true negative (TN); if the prediction is negative but the
actual value is positive, it is a false negative (FN). The sketch below counts these four
outcomes.
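
The four counts can be computed directly, as in this sketch with hypothetical label
sequences:

actual    = ["p", "p", "n", "n", "p", "n"]   # true classes
predicted = ["Y", "N", "Y", "N", "Y", "N"]   # classifier outputs

TP = sum(a == "p" and y == "Y" for a, y in zip(actual, predicted))
FP = sum(a == "n" and y == "Y" for a, y in zip(actual, predicted))
TN = sum(a == "n" and y == "N" for a, y in zip(actual, predicted))
FN = sum(a == "p" and y == "N" for a, y in zip(actual, predicted))

print(TP, FP, TN, FN)    # -> 2 1 2 1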

The receiver operating characteristic (ROC) graph is a technique for visualising,
organising and selecting classifiers based on their performance. ROC analysis is widely
used to analyse machine learning classifiers and is generally useful as a performance
graphing method. A ROC graph plots the false positive rate (1 − specificity) on the X-axis
against the true positive rate (sensitivity) on the Y-axis.
Some classifiers, such as neural networks or hidden Markov model classifiers, yield
a score: a numeric value that represents the degree to which an instance is a member of a
class. Such a ranking or scoring classifier can be used with a threshold to produce a discrete
(binary) classifier: if the classifier output is above the threshold, the classifier produces a Y,
else an N. In ROC space, each threshold produces a different point; by varying the threshold
we trace a curve through ROC space, as the sketch below shows. Any ROC curve generated
from a finite set of instances is actually a step function, which approaches a true curve as
the number of instances approaches infinity.
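
A minimal sketch of this threshold sweep, using hypothetical scores and labels; each
threshold yields one (FPR, TPR) point of the ROC step function.

scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.3]     # classifier scores
labels = ["p", "p", "n", "p", "n", "n"]      # true classes
P = labels.count("p")
N = labels.count("n")

for threshold in sorted(set(scores), reverse=True):
    tp = sum(s >= threshold and l == "p" for s, l in zip(scores, labels))
    fp = sum(s >= threshold and l == "n" for s, l in zip(scores, labels))
    print(f"threshold={threshold}: TPR={tp / P:.2f}, FPR={fp / N:.2f}")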
Intelligent Agents
Machine learning processes produce numerous models in the form of software, and
these models are very useful in various applications. For example, a spam filter in an e-mail
server acts as an intelligent software component; we call such a system an intelligent agent.
Basically, these agents are software components incorporated within an application.
Anything which can perceive its environment through sensors and act upon the environment
through actuators can be termed an agent. A human agent has eyes, ears and other organs as
sensors, and hands, legs, mouth and other body parts as actuators. A robotic agent has
cameras and infrared range finders as sensors and various motors as actuators. Intelligent
agents are described as abstract functional systems, similar to computer programs. The
scientist trio Smith, Cypher and Spohrer, in 1994, described an agent as a persistent software
entity dedicated to a specific purpose.
Intelligent agents can be called software entities that perform some operations on
behalf of an end user or another program with some degree of independence. While doing
so, the agents employ some knowledge or representation of the user's goals or desires. The
major differences between a program and an agent lie in size, purpose, persistence and
autonomy. A program's size is smaller than that of an agent; programs are multifunctional,
whereas an agent has a particular purpose. The duration of an agent's life does not depend
wholly on a user launching and quitting it, and an autonomous agent does not need the user's
input to function. Agents are divided into five classes based on their degree of perceived
intelligence and capability.
A biological agent performs biological activities in the human cell; for example, an
enzyme or a virus carries out a biological function there. Robotic agents perform their
operations automatically, like a crawler in the Web architecture. Computational agents are
classified into software agents and artificial-life agents; virtual reality is an application of
artificial-life agents. Agent software that assists with specific tasks is called a task-specific
agent. An environment agent is a program that replicates itself across the network, like
spam.
1. Simple reflex agents
Decisions are taken based on the current percept and not on the history of percepts.
This behaviour can be expressed as a condition-action rule, that is, if <condition>
then <action>. There is a lookup table of actions from which an action is taken for
every possible state of the environment, as in the sketch below. If the environment
has n variables, each with t possible states, then the table size is t^n. This approach
works only for a small number of possible environment states, and only if the
environment is fully observable.
Simple reflex agents have the admirable property of being simple, but they turn out
to be of very limited intelligence.
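
A minimal sketch of such a lookup table, using a hypothetical two-room vacuum world
(the percepts and actions are invented for illustration):

# Condition-action rules: the action depends only on the current percept.
rules = {
    ("A", "dirty"): "suck",
    ("A", "clean"): "move_right",
    ("B", "dirty"): "suck",
    ("B", "clean"): "move_left",
}

def simple_reflex_agent(percept):
    return rules[percept]            # if <condition> then <action>

print(simple_reflex_agent(("A", "dirty")))    # -> suck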
2. Model-based reflex agents
If the world is not fully observable, the agent must remember observations about the
parts of the environment it cannot currently see. This requires an internal
representation of the world, or internal state. Since this representation is a model of
the world, we call such an agent a model-based agent.
The most effective way to handle partial observability is to keep track of the part of
the world the agent cannot see now. That is, the agent should maintain some sort of
internal state that depends on the percept history and thereby reflects at least some
of the unobserved aspects of the current state. The knowledge required is as follows:
first, some information about how the world evolves independently of the agent,
and second, some information about how the agent's own actions affect the world.
Knowledge about "how the world works" is called a model of the world, and an
agent that uses such a model is called a model-based agent.
3. Goal-based agents
The agent has a purpose, and the action to be taken depends on the current state and
on what it is trying to accomplish (the goal). In some cases, the goal is easy to
achieve; in others, it involves planning, sifting through a search space of possible
solutions and developing a strategy.
The agent sometimes needs goal information that describes desirable situations. The
agent program can combine this goal with information about the results of possible
actions. Sometimes goal-based action selection is straightforward, when goal
satisfaction results immediately from a single action; sometimes it is trickier, when
the agent must consider long sequences of twists and turns to find a way to achieve
the goal. Searching and planning are the subfields of artificial intelligence devoted
to finding action sequences that achieve an agent's goals.
4. Utility-based agents
The agent is aware of a utility function that estimates how close the current state is
to the agent's goal. If there is more than one way to reach the goal, then we need to
choose the best way of doing it. A utility function maps a state (or a sequence of
states) onto a real number, which describes the associated degree of happiness. A
complete specification of the utility function allows rational decisions in two kinds
of cases where goals alone are inadequate. First, when there are conflicting goals,
only some of which can be achieved (for example, speed and safety), the utility
function specifies the appropriate trade-off. Second, when there are several goals
that the agent can aim for, none of which can be achieved with certainty, utility
provides a way in which the likelihood of success can be weighed against the
importance of the goals.
5. Learning agents
These agents are capable of acquiring new competence through observations and
actions. The basic components are the learning element (modifies the performance
element), the performance element (selects actions), the critic (provides feedback)
and the problem generator (drives exploration). Learning agents operate initially in
an unknown environment and become more competent than their initial knowledge
alone would allow; they do not need to be hand-programmed. The major parts of a
learning agent are as follows:
a) The performance element is what we have previously considered to be the
whole agent: it takes in percepts and decides on the agent's actions.
b) The learning element takes feedback from the critic and determines how the
performance element should be modified to do better in the future.
c) The critic tells the learning element how well the agent is performing with
respect to a fixed performance standard.
d) The problem generator is responsible for suggesting actions that will lead to
new and informative experiences.
Questions
1. Is the learner guaranteed to converge to the correct hypothesis in the limit as the
number of training examples increases indefinitely?
2. How many training examples are needed for a learner to construct (with high
probability) a highly accurate concept?
3. How will machine learning be helpful in job scheduling?
4. What are robotic agents? How do they reduce human efforts?
5. What are the advantages when machine performs intelligently?
6. Cite areas where a machine cannot surpass a human being?
7. How will machines make decisions in various problems?
8. What is knowledge-based learning? Describe explanation-based generalization. In
what sense is this learning? What is the role of the data in driving explanation-based
generalisation?
9. What is stratification? What is cross-validation? What is "leave-one-out" cross-
validation? Why is cross-validation useful and what are its drawbacks?

CHAPTER 10
SUPERVISED LEARNING
INTRODUCTION
Learning is the process of acquiring knowledge. As mentioned in the earlier chapters,
there are basically three types of learning: supervised, unsupervised and reinforcement
learning. In simple words, supervised learning can be defined as the process of learning with
a supervisor or teacher. In supervised classification, the classes are predefined, and the test
data are assigned to these classes; in short, the process already has a defined schema. Let us
explain the concept with a simple example. Consider a situation where A wants to travel
from place X to place Y. A is totally unaware of the route and has no prior experience of
travelling to Y. Supervised learning is the process where a supervisor gives A the route map
and the knowledge needed for travelling from X to Y. Thus, A is fully equipped for the
journey.
Supervised learning, or classification, can normally be employed in situations where
the training data are fully available. In cases where part of the data is hidden or missing, this
type of learning will fail. Throughout this chapter, we focus on various types of supervised
classifiers, namely support vector machines, inductive logic programming, case-based
reasoning, ensemble classifiers, nearest neighbourhood classifiers and fuzzy networks.
Support Vector Machines
Support vector machines (SVMs) are supervised learning models used for classification
and regression problems. An SVM operates in a linear fashion and can be used as a
prediction model. In other words, an SVM is a model that uses machine learning theory to
maximise predictive accuracy while automatically avoiding over-fitting to the data. SVMs
use statistical learning methodology with a hypothesis space of linear functions in a high-
dimensional feature space. A training algorithm can be incorporated with an SVM for
learning, or for automatically setting the values in its model. Vladimir Vapnik and co-
workers at AT&T Bell Laboratories developed the SVM as a statistical learning model in
1995.
First, we introduce the margin between two classes; this section gives the intuition
behind margins and the "confidence" of our predictions. Suppose a logistic regression model
p(y = 1 | x; θ) is given by h_θ(x) = g(θ^T x), and we would like to predict 1 on an input x if
and only if h_θ(x) > 0.5, or equivalently, if and only if θ^T x > 0. Consider a positive training
example (y = 1). The larger θ^T x is, the larger h_θ(x) = p(y = 1 | x; θ) is, and thus the higher
our degree of confidence that the label is 1.
Consider the figure, in which x's represent positive training examples and o's denote
negative training examples. A decision boundary (a separating hyperplane) divides these
points. Consider three points A, B and C in the figure. Point A is far from the hyperplane,
so when predicting the value of y at A, we can be quite confident that y = 1. Point C is very
close to the decision boundary: while it is on the side on which we would predict y = 1, a
small change to the decision boundary could easily have caused our prediction to be y = 0.
Point B lies between points A and C. In general, if a point is far from the separating
hyperplane, we can be significantly more confident in our prediction.

Figure 13.3 Separating Hyperplane


Margins
Let us formalise the notions of the functional and geometric margins. Given a
training example (x_i, y_i), we define the functional margin of (w, b) with respect to the
training example as

γ̂_i = y_i (w^T x_i + b)

If y_i = 1, then for the functional margin to be large and our prediction to be accurate, we
need w^T x_i + b to be a large positive number. Conversely, if y_i = −1, then for the
functional margin to be large, we need w^T x_i + b to be a large negative number. A larger
functional margin therefore corresponds to a more confident, correct prediction. Formally,
we can define a linear classification system that halves the input space with a hyperplane P
defined by the equation

w^T x + b = 0

The class y_i is then given by

y_i = sign(w^T x_i + b)
From the possible hyperplanes, the SVM selects the one for which the distance to the
nearest data points is as large as possible. For a good training dataset, every test vector is
located within radius r of a training vector. If the chosen hyperplane is located as far as
possible from the data, the desired hyperplane maximises the margin; it also bisects the line
between the closest points on the convex hulls of the two datasets. A small numeric sketch
follows.
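
The following sketch only evaluates the functional margins y_i(w^T x_i + b) of one
hypothetical hyperplane on toy data; a real SVM solver would search for the (w, b) whose
smallest margin is as large as possible.

data = [((2.0, 2.0), +1), ((3.0, 3.0), +1),
        ((-2.0, -1.0), -1), ((-3.0, -2.0), -1)]
w, b = (1.0, 1.0), 0.0               # a hypothetical candidate hyperplane

margins = [y * (w[0] * x[0] + w[1] * x[1] + b) for x, y in data]
print(margins)       # all positive: every point is correctly classified
print(min(margins))  # the smallest margin, which an SVM tries to maximise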
Inductive Logic Programming
A learning process may be either deductive or inductive. Deductive learning works
on existing facts and knowledge and deduces new knowledge from the old. In contrast,
inductive learning uses examples and generates hypotheses based on similarities between
them. Inductive logic programming (ILP) is a subarea of machine learning, which in turn is
part of artificial intelligence. ILP draws on contributions from logic programming and
statistics and tries to automate the induction process. Simply, we can say that inductive logic
programming is the combination of inductive machine learning and logic programming.
Inductive logic programming is a technology combining the principles of inductive
machine learning with the representational power of logic programming. It aims at inducing
general rules starting from specific observations and background knowledge.
Inductive logic programming is regarded as one of the most interesting machine
learning techniques. ILP = I ∩ LP; that is, ILP is the intersection of the techniques and
interests of induction and logic programming. This makes inductive logic programming
more powerful than traditional techniques that learn from examples. Inductive logic
programming uses an expressive first-order logic framework instead of the traditional
attribute-value framework, and it facilitates the use of background knowledge. The first
point is important because many domains of expertise cannot be formulated in an attribute-
value framework; the second is significant because background knowledge is the key to
success in nearly all applications of artificial intelligence. At the same time, inductive logic
programming has room for both theory and practice: it has a strong theoretical basis,
inheriting many results from logic programming and computational learning theory, and it
has very impressive applications in scientific discovery, knowledge synthesis and logic
programming.
Background knowledge plays a central role in relational learning, where the task is
to define, from given examples, an unknown relation (i.e., the target predicate) in terms of
known relations from background knowledge. If the hypothesis language of a relation
learner is the language of logic programs, then learning is, in fact, logic program synthesis
and has been named inductive logic programming (ILP). In ILP systems, the training
examples, the background knowledge and the induced hypothesis are all expressed in a logic
program form, with additional restrictions imposed on each of the three languages. For
example, training examples are typically represented as ground facts of the target predicate,
and most often background knowledge is restricted to be of the same form.

ILP automates the learning of logic rules from examples and background knowledge.
For example, it can learn the rule for grandparents, given background knowledge about
parents and examples of grandparents:

parent_of(charles, george)
parent_of(george, diana)
parent_of(bob, harry)
parent_of(harry, elizabeth)
grandparent_of(X, Y) :- parent_of(X, Z), parent_of(Z, Y)

From the program, we can ask queries about grandparents.
Query: grandparent_of(X, Y)?
Answers:
grandparent_of(charles, diana)
grandparent_of(bob, elizabeth)
Formal definition:
Given: a logic program B representing background knowledge, a set of positive examples
E+ and a set of negative examples E−.
Find a hypothesis H such that
1. B ∪ H ⊨ e for all e ∈ E+.
2. B ∪ H ⊭ f for all f ∈ E−.
3. B ∪ H is consistent.
We assume that B ⊭ e for some e ∈ E+ (otherwise there is nothing to learn).
Generic ILP Algorithm
Procedure ILP(Examples):
1. INITIALISE(Theories, Examples)
repeat
2. T = SELECT(Theories, Examples)
3. {T1, ..., Tn} = REFINE(T, Examples)
4. Theories = REDUCE(Theories ∪ {T1, ..., Tn}, Examples)
until STOPPINGCRITERION(Theories, Examples)
5. return Theories

1. INITIALISE: initialise a set of theories (e.g., Theories = {true} or Theories =
Examples)
2. SELECT: select the most promising candidate theory
3. REFINE: apply refinement operators that guarantee new theories (specialisation,
generalisation)
4. REDUCE: discard unpromising theories
5. STOPPINGCRITERION: determine whether the current set of theories is already
good enough (e.g., when it contains a complete and consistent theory). SELECT and
REDUCE together implement the search strategy (e.g., for hill-climbing, REDUCE
keeps only the best theory). A Python rendering of this skeleton follows.
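
A skeletal Python rendering of this loop, assuming the five operations are supplied as
functions; everything here is schematic, not a working ILP system.

def ilp(examples, initialise, select, refine, reduce_, good_enough):
    theories = initialise(examples)
    while not good_enough(theories, examples):       # STOPPINGCRITERION
        t = select(theories, examples)               # most promising theory
        refinements = refine(t, examples)            # specialise/generalise
        theories = reduce_(theories + refinements,   # discard unpromising
                           examples)                 # theories
    return theories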
Principal Approaches to ILP
The first approach uses techniques based on inverting a resolution proof, and the
second uses a generalisation of decision-tree methods.
Inverse resolution
Resolution is based on the observation that if an example follows from Background
∧ Hypothesis ∧ Descriptions, then one must be able to prove this fact. Inverse resolution
turns this around: if we can "run the proof backwards", then we can find a Hypothesis such
that the proof goes through, starting from the descriptions and the background knowledge.
Generating inverse proofs
The backward proof process consists of individual backward steps. An ordinary
resolution step takes two clauses C1 and C2 and resolves them to produce the resolvent C.
An inverse resolution step takes a resolvent C and produces two clauses C1 and C2, such
that C is the result of resolving C1 and C2.
Furthermore, the clauses that participate in each step can be chosen from the
background knowledge, from the example Descriptions, from the negated Classifications or
from hypothesised clauses that have already been generated in the inverse resolution tree.
An exhaustive search process for inverse resolution would be extremely inefficient.
Inductive logic programming systems use a number of restrictions to make the process more
manageable, including the elimination of function symbols, generating only the most
specific hypotheses possible, and the use of Horn clauses. One can also consider inverting
restricted resolution strategies. With a restricted but complete strategy, such as linear
resolution, the inverse resolution process becomes more efficient because certain clauses
are ruled out as candidates for C1 and C2. Other useful constraints include the requirement
that all the hypothesised clauses be consistent with each other, and that each hypothesised
clause agree with the observations.
Discovering new predicates and new knowledge
An inverse resolution procedure that inverts a complete resolution strategy is, in
principle, a complete algorithm for learning first-order theories. That is, if some unknown
hypothesis generates a set of examples, then an inverse resolution procedure can generate
that hypothesis from the examples. This observation suggests an interesting possibility.
Suppose the available examples include a variety of trajectories of falling bodies. Would an
inverse resolution program be theoretically capable of inferring the law of gravity? The
answer is clearly yes, because the law of gravity allows one to explain the examples, given
suitable background mathematics. Similarly, one can imagine that electromagnetism,
quantum mechanics and the theory of relativity are also within the scope of ILP programs.
However, such imaginings are on a par with the proverbial monkey with a typewriter, at
least until we find ways to overcome the very large branching factors and the lack of
structure in the search space that characterise current systems.
One thing that inverse resolution systems will do for you is invent new predicates.
This ability is often seen as somewhat magical, because computers are often thought of as
"merely working with what they are given". In fact, new predicates fall directly out of the
inverse resolution step. The simplest case arises when hypothesising two new clauses C1
and C2, given a clause C. The resolution of C1 and C2 eliminates a literal that the two
clauses share; hence, it is quite possible that the eliminated literal contained a predicate that
does not appear in C. Thus, when working backwards, one possibility is to generate a new
predicate from which to reconstruct the missing literal.
As we mentioned earlier, the invention of new predicates can significantly reduce
the size of the definition of the goal predicate. Hence, by including the ability to invent new
predicates, inverse resolution systems can often solve learning problems that are infeasible
with other techniques. Some of the deepest revolutions in science come from the invention
of new predicates and functions, for example, Galileo's invention of acceleration or Joule's
invention of thermal energy. Once these terms are available, the discovery of new laws
becomes (relatively) easy. The difficult part lies in realising that some new entity, with a
specific relationship to existing entities, will allow an entire body of observations to be
explained with a much simpler and more elegant theory than previously existed.
As yet, ILP systems have not been applied to such difficult tasks. It does appear,
however, that the ability to use background knowledge provides significant advantages. In
several applications, ILP techniques have outperformed knowledge-free methods. For
example, in molecular biology, it is useful to have background knowledge about typical
molecular bonding patterns, valences of atoms, bond strengths and so on. Using such
knowledge, Stephen Muggleton's GOLEM system has been able to generate high-quality
predictions of protein structure from sequence information and of the therapeutic efficacy
of various drugs based on their molecular structures. These results, like Meta-DENDRAL's,
were considered sufficiently interesting in their own right to be published in leading
scientific journals. The differences between GOLEM's and Meta-DENDRAL's performance
are that (i) the new domains are much more difficult and (ii) GOLEM is a completely
general-purpose program that is able to make use of background knowledge about any
domain whatsoever.
Top-Down Learning Systems
The second approach to ILP is essentially a generalisation of the techniques of
decision-tree learning to the first-order case. Rather than starting from the observations and
working backwards, we start with a very general rule and gradually specialise it so that it
fits the data. This is essentially what happens in decision-tree learning, where a decision
tree is gradually grown until it is consistent with the observations. In the first-order case, we
use first-order literals instead of attributes, and the hypothesis is a set of clauses instead of
a decision tree. FOIL was one of the first programs to use this approach.
Characteristics of ILP Systems
ILP systems can be incremental or non-incremental and interactive or non-interactive.
These dimensions describe the way the evidence E (the examples) is obtained. In non-
incremental or empirical ILP, the evidence is given at the start and not changed afterwards.
In incremental ILP, the examples are input one by one by the user, in a piecewise fashion.
In interactive ILP, the learner is allowed to pose questions to an oracle (i.e., the user)
about the intended interpretation. Usually, these questions query the user for the intended
interpretation of an example or a clause. The answers to the queries allow pruning large parts
of the search space. Most systems are non-interactive.
Other ILP systems
• FOIL (Quinlan and Cameron-Jones, 1993): top-down hill-climbing search
• Progol (Muggleton, 1995): top-down best-first search with bottom clause
• Golem (Muggleton and Feng, 1992): bottom-up hill-climbing search
• LINUS (Lavrac and Dzeroski, 1994): propositionalisation
• Aleph (a Progol-style system), Tilde (relational decision trees)
ILP applications
• Life sciences
  • Mutagenicity, toxicology prediction
  • Protein structure/folding
• Natural language processing
  • English verb past tense
  • Document analysis and classification
• Engineering
  • Finite element mesh design
• Environmental sciences
  • Biodegradability of chemical compounds
ILP is a subfield of machine learning which uses logic programming as a uniform
representation for Examples, Background knowledge and Hypotheses. There are many ILP
systems including PROGOL, FOIL, LINUS, etc. Given an encoding of the known
background knowledge and a set of examples represented as a logical database of facts, an
ILP system will derive a hypothesized logic program which entails all the positive and none
of the negative examples. The various ILP applications include bioinformatics, natural
language processing and engineering.
Case-based Reasoning
Case-based reasoning (CBR), as the name implies, is reasoning based on cases. We
can define CBR as a process of problem solving where results are based on the solutions of
previous similar problems. To find the solution to a problem, the solutions adopted for
similar cases in the past are retrieved and used as a baseline for solving the present problem.
The solution derived is then tested and retained. CBR is well accepted in many cases because
we can create a solution from background knowledge rather than from scratch.
How does CBR Work?
Aamodt and Plaza described case-based reasoning as a cyclic process comprising
four REs (Figure):
• Retrieve
• Reuse
• Revise
• Retain

Figure CBR steps


Whenever a new problem arrives, the database is checked for similar cases.
Whenever a match is found, the particular case is retrieved. The retrieved case already has
a result, which is then reused and tested for validity. If the retrieved case is not sufficiently
similar to the current problem, the solution needs to be revised, and the result thus obtained
is retained for future reference. A minimal sketch of this cycle follows.
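
The sketch below assumes cases are (problem, solution) pairs and similarity is a simple
numeric distance; real systems use far richer case representations and indexing.

case_base = [({"fever": 1, "cough": 1}, "treatment_A"),
             ({"fever": 0, "cough": 1}, "treatment_B")]

def distance(case_problem, problem):
    return sum(abs(case_problem[k] - problem[k]) for k in case_problem)

def solve(problem):
    # Retrieve: find the most similar past case.
    case, solution = min(case_base, key=lambda c: distance(c[0], problem))
    # Reuse: adopt its solution. Revise: an expert would adapt and
    # validate it here. Retain: store the new experience.
    case_base.append((problem, solution))
    return solution

print(solve({"fever": 1, "cough": 0}))    # -> treatment_A (nearest case)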
There are various CBR tools for the automatic retrieval of cases, but an expert is
always required to revise the cases and produce valid results.
The simple CBR cycle can be further divided into a hierarchy of CBR tasks. The
figure gives a general idea of the various subtasks involved.

Figure Tasks included in the CBR


Case Base and Case Representation
The case base can simply be termed a database of cases. A case is a contextualised
piece of knowledge representing an experience: it contains the past lesson (the content of
the case) and the context in which the lesson can be used. Typically, a case comprises the
following:
Problem description: It details the state of the world when the case occurred and, if
appropriate, what problem needed to be solved at that time.
Solution: It details the solution to the problem specified in the problem description or the
reaction to the given situation.
Outcome: It details the resulting state of the world when the solution was applied.
However, the concepts of problem and solution have no general definitions and vary
from application to application. There must be some method by which cases are stored for
easy retrieval; the most commonly used method is indexing. Indexes need to be predictive
and concrete enough to be recognised in future. The cases are normally stored in a dynamic
memory model, which helps in finding cases easily.

Reuse the case(s) to attempt to solve the problem
When a match is found, the system tries to apply the retrieved solution to the current
problem. However, the current problem will rarely be identical to the one retrieved, so a
straight copy of the retrieved solution may not give a proper result. Hence, the solution needs
to be adjusted and modified depending on the parameters and the situation in which it is
being used; this is termed adaptation of the retrieved solution to fit the current problem. For
the process of adaptation, the reviewer must have good in-depth domain knowledge and
must be able to adjust the parameters and configuration according to the rules and
constraints. The adaptation required depends on the type of application.
Revise the proposed solution
A major task in CBR is refining the result. As case-based reasoning only suggests the
solution of a previous match, validating the result is a hard task. Correctness can be checked
by applying the revised solution in a real environment or in a simulation. If the result is
successful, the solution is retained and stored for future reference. When the generated result
turns out to be a failure, an opportunity arises for learning from failure: the case is revised
by an expert, who changes the inputs and tries to create a solution specific to the needed
domain. This involves not only repairing the current solution but also providing
explanations for similar problems, which can also be retained for future reference.
Retain the new experience as a new case in the case base
Retaining is the process of storing or incorporating the valuable knowledge we have
framed, for future reuse. This is the final step in the CBR cycle. It can be considered a
feedback session where the system stores data according to the degree of success. Learning
takes place from the feedback given to the overall system in order to improve performance
using experience or instruction. Learning can appear in many forms, such as inductive
inference or decision trees; induction helps to refine the solution based on the available
examples and their solutions. Machine learning methods can be used to improve the
knowledge containers of a CBR system: the case base (adding, creating and deleting cases),
the similarity measure (adjusting weights) and the solution transformation (new adaptation
rules), along with techniques from statistics and information theory.
Where to Use CBR?
In which situations can we employ CBR? The most common are listed below.
• When we have a previous record of solved problems.
• When an expert finds it difficult to construct a solution from scratch.
• When we have large domain knowledge obtained through examples.
• When we have experience of handling similar situations.

While using CBR for categorising cases, we do not really need a big knowledge
acquisition phase. Case-based reasoning lets the case base grow incrementally, and such
cases are maintained in CBR libraries.
Applications
There is a wide range of applications where CBR is employed. In the medical field,
CBR is most commonly used for diagnosis: the system tries to retrieve a past case with the
same symptoms and nature, thus providing faster service. In customer service, the same
complaint or the same help is often needed by many customers, so if we already have a
scheme built by checking past records, a perfect help desk can be set up. In financial markets,
CBR is used to determine the values of variables by comparison with something similar.
Assessment tasks are quite common and make use of CBR. Even in decision-making
applications and in creating new designs in architecture and industrial settings, we always
check for similar cases; thus, CBR is used in those fields as well.
CBR Issues
We have stated the various advantages of CBR and the areas where it is used, but we
should keep in mind the issues CBR can run into if a proper scheme is not employed. The
sensitive problems CBR faces include the following:
• The way in which the cases are represented.
• The indexing used for easy retrieval.
• Problems in handling larger databases and multimedia data.
• A general adaptation scheme to make the retrieved solution fit the current
situation.
Hence, there is still a wide area open for exploring case-based reasoning.
Ensemble Classifiers
An ensemble of classifiers is a set of classifiers whose individual decisions are
combined in some way to classify new examples: for each example, the predicted outputs
of the base classifiers are combined to produce the output of the ensemble classifier.
Bagging and boosting are common methods used in ensemble classifiers. Ensemble
classifiers use multiple models to obtain better predictive performance; they combine
multiple hypotheses to form a better hypothesis. In other words, an ensemble is a technique
for combining many weak learners in an attempt to produce a strong learner. Since an
ensemble can be trained and then used to make predictions, it is a supervised learning
algorithm.
Why ensemble learning?
• Combining the predictions of an ensemble is often more accurate than the predictions
of the individual classifiers that make it up.
• The base classifiers should be accurate and diverse.
• An accurate classifier is one whose error rate is better than random guessing.
• Uncorrelated errors of individual classifiers can be eliminated by averaging.
How to construct an ensemble?
The two basic questions in designing ensembles are how to generate the base
hypotheses h1, h2, ... and how to combine them. Usually, we take a weighted vote:
ensemble(x) = sign(Σ_i w_i h_i(x)), where w_i > 0 is the weight of hypothesis h_i;
w_i > w_j means h_i is more reliable than h_j. A small sketch of such a vote follows.
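
For instance, the sketch below combines three hypothetical one-feature rules by a
weighted vote, exactly in the sign-of-weighted-sum form given above.

h1 = lambda x: 1 if x[0] > 0 else -1
h2 = lambda x: 1 if x[1] > 0 else -1
h3 = lambda x: 1 if x[0] + x[1] > 1 else -1
weights = [0.5, 0.3, 0.2]            # w_i: larger weight = more reliable

def ensemble(x):
    total = sum(w * h(x) for w, h in zip(weights, (h1, h2, h3)))
    return 1 if total > 0 else -1    # sign of the weighted vote

print(ensemble((2.0, -0.5)))         # h1 and h3 outvote h2 -> +1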
Types of ensemble
There are different types of ensemble learning methods. Some of the common
methods are
• AdaBoost
• Bayes optimal classifier
• Bayesian model averaging
AdaBoost
AdaBoost (adaptive boosting) is an ensemble machine learning algorithm formulated
by Yoav Freund and Robert Schapire in 1997. The algorithm is used in conjunction with
other machine learning algorithms in order to improve learning performance. AdaBoost is
sensitive to noisy input data, but in many problems it is less susceptible to over-fitting than
most learning algorithms. A major advantage of a combined classifier is better classification
performance than the individual classifiers. There are serial, parallel and hybrid classifier
combinations. For these, we can use the same dataset and manipulate the training set:
multiple training sets are created by resampling the original data according to some
distribution.
Two basic approaches for constructing ensembles are bagging and boosting. In 1996,
Breiman introduced the bagging method, which creates classifiers using training sets that
are bootstrapped (drawn with replacement). Bagging improves generalisation performance
by reducing the variance of the base classifiers; its performance depends on the stability of
the base classifier. For an unstable base classifier, bagging helps to reduce the errors
associated with random fluctuations in the training data. If the base classifier is stable,
bagging may not improve, and could even degrade, performance. Bagging is less susceptible
to model over-fitting when applied to noisy data.
Boosting is a sequential production of classifiers, each dependent on the previous one
and focused on the previous one's errors: examples that the previous classifiers predicted
incorrectly are chosen more often or weighted more heavily.
Algorithm
Consider a training set T of data points x_i and their labels y_i in a two-class problem
(+1, −1). We assign initial weights w_i = 1 to all data points x_i. There is a pool of
classifiers, from which we perform M iterations. Let W be the sum of the weights of all data
points, and W_e the sum of the weights of those data points for which the classifier being
considered yields the wrong label.
For m = 1 to M:
1. Select and extract from the pool the classifier k_m which minimises the weighted
error W_e.
2. Set the weight β_m of the classifier to β_m = (1/2) ln((1 − e_m)/e_m), where
e_m = W_e / W.
3. Update the weights of the data points for the next iteration: if k_m(x_i) is a miss, set
w_i ← w_i · exp(β_m); otherwise set w_i ← w_i · exp(−β_m).
A worked sketch of this loop follows.
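
The sketch below runs the stated loop on toy data, using a pool of decision stumps; the
data and the choice of M = 3 are arbitrary, and a real implementation would guard against
a zero weighted error before taking the logarithm.

import math

X = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [-1, -1, 1, 1, 1, -1]

# Pool of stump classifiers: threshold tests in both directions.
pool = [lambda x, t=t, s=s: s * (1 if x > t else -1)
        for t in (-1.5, -0.75, 0.0, 0.75, 1.5) for s in (1, -1)]

w = [1.0] * len(X)                   # initial weights w_i = 1
committee = []                       # (beta_m, k_m) pairs
for m in range(3):                   # M = 3 iterations
    def weighted_error(k):           # W_e for classifier k
        return sum(wi for xi, yi, wi in zip(X, y, w) if k(xi) != yi)
    k_m = min(pool, key=weighted_error)
    e_m = weighted_error(k_m) / sum(w)            # e_m = W_e / W
    beta_m = 0.5 * math.log((1 - e_m) / e_m)
    committee.append((beta_m, k_m))
    # Misses are weighted up by exp(beta_m), hits down by exp(-beta_m).
    w = [wi * math.exp(beta_m if k_m(xi) != yi else -beta_m)
         for xi, yi, wi in zip(X, y, w)]

predict = lambda x: 1 if sum(b * k(x) for b, k in committee) > 0 else -1
print([predict(x) for x in X])   # the hard point x = 2.0 may need more rounds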

Bayes Optimal Classifier
The Bayes optimal classifier is an optimal classification technique. It is an ensemble
of all the hypotheses in the hypothesis space, so we can regard it as an ideal ensemble: it
outperforms any other ensemble in this category. Each hypothesis is given a vote in
proportion to the likelihood that it is a valid hypothesis. To take account of training data of
finite size, the vote of each hypothesis is also multiplied by the prior probability of that
hypothesis.
Limitations
1. The hypothesis space is usually far too large to iterate over.
2. Many hypotheses yield only a predicted class rather than a probability.
3. Computing an unbiased estimate of the probability of the training set given a
hypothesis is non-trivial.
4. Estimating the prior probability for each hypothesis is rarely feasible.

Bayesian model averaging
Bayesian model averaging is an ensemble technique that seeks to approximate the
Bayes optimal classifier by sampling hypotheses from the hypothesis space and combining
them using Bayes' law. Whereas the Bayes optimal classifier is more of a virtual classifier,
Bayesian model averaging can actually be implemented; the hypotheses are sampled using
MCMC techniques. When hypotheses are sampled in this manner and averaged according
to Bayes' law, the expected error is bounded to be at most twice the expected error of the
Bayes optimal classifier. Despite this theoretical guarantee, the technique has a tendency to
promote over-fitting, and simpler ensemble techniques such as bagging can perform better
in practice.
Summary
Some issues relate to ensemble classifiers, such as parallelism: bagging is easily
parallelised, but boosting is not. Variations of boosting are needed to handle noisy data, and
the diversity of an ensemble affects its generalisation performance. Ensemble classifiers can
be summarised as follows:
• Basic motivation: a committee of technical experts is more effective than a single
super genius trying to derive a solution alone.
• Popular ensemble techniques:
  • Manipulate the training data: bagging and boosting
  • Manipulate the output values: error-correcting output coding
  • Manipulate the input feature space: train classifiers using different subsets of
    features; this works when the input features are highly redundant
Nearest Neighbourhood
In binary classification, we have a training dataset containing both attributes x and a
class label c. For example, the vector x might represent an image of a digit and c might label
which digit it is, c ∈ {0, 1, ..., 9}. A dataset D of P training data points is given by
D = {(x^i, c^i)}, i = 1, ..., P. The major aim of the method is to return the correct class c(x)
for a novel x.
A dissimilarity function d(x, y), symmetric in its arguments (d(x, y) = d(y, x)),
measures the dissimilarity between the data points x and y.
To classify a new vector x, given a set of training data (x^i, c^i), i = 1, ..., P:
• Calculate the dissimilarity of the test point x to each of the stored points,
d_i = d(x, x^i).
• Find the training point x^{i*} which is closest to x, i.e., find i* such that
d_{i*} ≤ d_i for all i = 1, ..., P.
• Assign the class label c(x) = c^{i*}.

The nearest neighbourhood algorithm is simple and powerful, and it is used in many
applications, including heuristics for the travelling salesman problem. A minimal sketch
follows.
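
The sketch below is a direct rendering of the rule, using hypothetical (weight, power)
data in the spirit of the vehicle example that follows.

train = [((2.1, 150), "SUV"), ((2.4, 160), "SUV"),
         ((5.5, 300), "truck"), ((6.0, 320), "truck")]

def d(a, b):                          # squared Euclidean dissimilarity
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_neighbour(x):
    # Copy the class label of the stored point closest to x.
    _, label = min(train, key=lambda pair: d(x, pair[0]))
    return label

print(nearest_neighbour((2.2, 155)))    # -> SUV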
Let us discuss nearest neighbourhood learning briefly. Nearest neighbourhood
learning is a method used in pattern recognition. Assume that we are using it to classify a
set of pictures from a traffic jam as trucks and SUVs, and that the pictures of all other
vehicles have been removed by some other system. Our system somehow obtains the weight
and power of each vehicle. Now, we need to assign a new vehicle to one of the existing
classes, truck or SUV, based on its weight and power. The problem is that some people call
a big SUV a truck, or perhaps a mini-truck, and others call a small truck an SUV. The only
thing our system has is data on a dozen vehicles that are already classified as truck or SUV.
Consider the figure, where the X-axis represents the weight of the vehicle and the
Y-axis represents its power. Class 1 represents SUVs and class 2 represents trucks. With
this given data, our system can classify an unclassified vehicle X as either a truck or an
SUV. What we do is pick the element from each class which is closest to X and compare
the distances of these two elements from X. Then we assign X to the class of the vehicle
which is nearest to it. Formally, we define a function f(X) = closest(class1) −
closest(class2); if f(X) is positive, then X belongs to class 2, and otherwise to class 1.
Assume that we have a really heavy SUV in our class of SUVs. This heavy SUV, which has membership of one class but in fact lies closer to the other class, is not typical of its class type but is included nonetheless. In this instance, if our unclassified input is measured against this rogue sample, i.e., the heavy SUV, it will invariably result in misclassification. The solution to this fairly basic problem is to take several distance measures against many class samples, so that the effect of any rogue measurement is likely to be averaged out. This is K nearest neighbour classification, where K is the number of neighbouring samples against which we decide to measure, as sketched below.
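A sketch of this K nearest neighbour idea in Python, assuming a simple majority vote over the K closest samples (the weight/power data below are hypothetical):

import math
from collections import Counter

def classify_knn(x, training_data, k=3):
    # Measure against the K closest samples so that a single rogue
    # sample, such as the unusually heavy SUV, is averaged out.
    def d(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    neighbours = sorted(training_data, key=lambda pair: d(x, pair[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# (weight, power) pairs labelled 'SUV' or 'truck'.
train = [((2.1, 180), 'SUV'), ((2.3, 190), 'SUV'), ((3.8, 300), 'SUV'),
         ((4.0, 280), 'truck'), ((4.2, 310), 'truck')]
print(classify_knn((3.9, 290), train, k=3))  # the majority vote decides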
Now, let us say that presently there are n elements in class1 and m elements in class2. We have to calculate the average distance of our new element to both classes; that is, we have to find its distance to the n + m elements distributed among the two classes. There are different ways in which we can measure these distances. They are explained in the following sections.
Hamming distance measure
For two vectors, the Hamming distance is found by evaluating the difference between each component of one vector and the corresponding component of the other, and summing these differences to provide an absolute value for the variation between the two vectors. The Hamming distance is often used to compare binary vectors; in this case, it provides a value for the number of bits that differ between the two vectors. In actual fact, the Hamming distance measure for binary data can be computed simply by the exclusive-OR function.
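A small Python illustration of the XOR-based computation (the example bit vectors are ours):

def hamming_distance(a, b):
    # XOR each pair of bits and count the ones that differ.
    return sum(x ^ y for x, y in zip(a, b))

print(hamming_distance([1, 0, 1, 1], [0, 0, 1, 0]))  # prints 2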
Euclidean distance measure
One of the most common metrics used is the Euclidean distance measure. Consider an example in a rectangular coordinate system where we have two vectors (X and Y) and we wish to find the distance between them, d(X, Y). For n-dimensional vectors,
d(X, Y) = √( Σᵢ₌₁ⁿ (xᵢ − yᵢ)² )
For a two-dimensional vector, the equation becomes
d(X, Y) = √( (x₁ − y₁)² + (x₂ − y₂)² )
The Euclidean metric is widely used mainly because it is simple to calculate. For binary input vectors, the metric reduces to a special case which is mathematically equivalent to the square root of the Hamming distance.
City block distance (Manhattan)
A simplified version of the Euclidean distance measure is the city block measure. This method performs the Euclidean measure without calculating the square or square root functions. Thus, we have
d(X, Y) = Σᵢ₌₁ⁿ |xᵢ − yᵢ|
The effect of this, apart from the obvious one that it is much faster to compute than the Euclidean, is that points of equal distance from a vector lie on a square boundary about the vector, as opposed to a circular boundary for the Euclidean. The enclosing circle is the Euclidean boundary for equidistant points about the vector; for the city block distance, anything falling on the square boundary will yield the same distance value. As you no doubt realise, this does introduce some error into the measure, but this is accepted as a compromise between accuracy and speed of calculation.
Square distance
Further simplification of the Euclidean distance measure consequently adds still more error. In the square distance, the distance between two vectors is defined as the absolute maximum of the differences between their components:
d(X, Y) = maxᵢ |xᵢ − yᵢ|
This again defines a square boundary for the points equidistant from a vector. It is,
however, a larger square than that of the city block, and is, consequently, a coarser measure.
As before, however, the error is tolerated as a compromise between speed and accuracy.
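The three geometric metrics can be compared side by side in a short Python sketch (the example vectors are illustrative):

def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def city_block(x, y):
    # No squares or square roots: just summed absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def square_distance(x, y):
    # The coarsest measure: the single largest component difference.
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1.0, 2.0), (4.0, 6.0)
print(euclidean(x, y), city_block(x, y), square_distance(x, y))  # 5.0 7.0 4.0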
That concludes a brief look at distance metrics; it is by no means exhaustive, but we hope that it at least indicates the possible techniques available for comparing the similarity of two vectors. In the following section, we focus again on the idea of discriminating functions using decision boundaries rather than comparison methods.
Fuzzy Network
Every intelligent technique has various computational attributes, like the ability to learn, the explanation of decisions, operational speed, etc. These properties vary from one technique to another, and the best suited ones are selected for suitable problems. For example, neural networks are good for pattern recognition, but they are not suited to explaining how they reach their decisions. Meanwhile, fuzzy logic systems are good at explaining their decisions, but cannot acquire the rules used to make those decisions. These limitations were the trigger for creating hybrid systems in which the individual drawbacks could be easily removed.
Fuzzy network refers to the combination of an artificial neural network and fuzzy logic. It was proposed by J.S.R. Jang. It is a combined network which incorporates the human-like reasoning style of a fuzzy system within the network structure of a neural network. This type of hybrid structure is called neuro-fuzzy hybridisation or simply a fuzzy neural network (FNN). Interpretability and accuracy are the two major strengths that fuzzy neural networks possess.
A fuzzy neural network is a machine learning system that finds the parameters of a fuzzy system. The fuzzy system includes fuzzy sets and fuzzy rules; the basic concept supporting fuzzy theory is the fuzzy set. Fuzzy sets were initially introduced into neural networks in 1974 by Lee. A neural network is considered a fuzzy neural network if the signals and/or the weights in the system are based around fuzzy sets. One feature of fuzzy systems is the ability to realise a complex non-linear input-output relation as a synthesis of multiple simple input-output relations; this idea is similar to that of neural networks. Each simple input-output relation is described in a rule, and the boundary between the rule areas is not sharp but fuzzy. Another feature of fuzzy systems is the ability to separate logic and fuzziness. Since conventional two-value logic-based systems cannot do this, their rules must be modified when either the logic or the fuzziness should be changed.
Fuzzy Systems
Fuzzy systems can be represented as networks whose computing units implement fuzzy operators. Figure 13.9 shows a network with four hidden units. Each one of them receives the inputs x1, x2 and x3, which correspond to the fuzzy categorisation of a specific number. The fuzzy operators are evaluated in parallel in the hidden layer of the network, which corresponds to the set of inference rules. The last unit in the network is the defuzzifier, which transforms the fuzzy inferences into a specific control variable. The importance of each fuzzy inference rule is weighted by the numbers α1, α2 and α3, as in a weighted centroid computation (Figure).
Figure: Weighted centroid computations
More complex rules can be implemented, and this can lead to networks with several layers. However, fuzzy systems do not usually lead to very deep networks: since the precision of the conclusion is reduced at each fuzzy inference step, it is not advisable to build too long an inference chain. Fuzzy operators cannot be computed exactly by sigmoidal units, but for some of them a relatively good approximation is possible, for example for the bounded sum or bounded difference. A fuzzy inference chain using these operators can therefore be approximated by a neural network.
The defuzzifier operator in the last layer can be approximated with standard units. If the membership functions are triangles, the surface of a triangle grows quadratically with its height. A quadratic function of this form can be approximated in the pertinent interval using sigmoidal units, and the parameters of the approximation can be set with the help of a learning algorithm.
Info Fuzzy Networks
Info fuzzy network (IFN) is a greedy machine learning algorithm for supervised learning. The data structure produced by this learning algorithm is also called an IFN. The construction is similar to decision tree construction, but it constructs a directed graph.
IFN construction algorithm
Input: the input variables, a list of data records, and a minimal statistical significance used to check whether a node needs to be split. Create the root node and the layer of the target variable.
1. Until all the attributes are used, or the system is delivering sufficiently high performance, do:
a) Find the attribute with maximal conditional mutual information.
b) Validate the contribution of that attribute for statistical significance using the likelihood ratio test.
c) If the contribution of the current attribute is statistically significant, split the node in the previous layer. Otherwise, using the majority rule, create an edge from that node to one of the value nodes of the target variable.
2. Return the list of variables that can be used by the network.
A rough sketch of the selection step follows.
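This Python sketch simplifies the algorithm considerably: it uses plain mutual information with the target and a fixed threshold in place of the conditional mutual information and likelihood ratio test of the real IFN, and all names and data shapes are our assumptions:

import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X; Y) for two discrete sequences, in nats.
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def ifn_select(attributes, target, threshold=0.01):
    # Greedily pick the attribute most informative about the target,
    # stopping when the best remaining contribution is insignificant.
    remaining, chosen = dict(attributes), []
    while remaining:
        name, mi = max(((k, mutual_information(v, target))
                        for k, v in remaining.items()), key=lambda kv: kv[1])
        if mi < threshold:
            break
        chosen.append(name)
        del remaining[name]
    return chosen

attrs = {'colour': ['r', 'r', 'b', 'b'], 'size': ['s', 'l', 's', 'l']}
print(ifn_select(attrs, target=['y', 'y', 'n', 'n']))  # ['colour']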
Fuzzy Neural Systems
A neuro-fuzzy network is a fuzzy inference system (FIS) in the body of an artificial neural network. Fuzzy inference systems and neural networks are complementary technologies in the design of adaptive intelligent systems. An ANN learns from scratch by adjusting the interconnections between the layers, while a fuzzy inference system is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules and fuzzy reasoning. The learning algorithm fine-tunes the underlying fuzzy inference system. Several layers simulate the processes involved in fuzzy inference, such as fuzzification, inference, aggregation and defuzzification, depending on the fuzzy inference system. Embedding a fuzzy inference system in the general structure of an ANN has the benefit of using available ANN training methods to find the parameters of the fuzzy system. A comparison between neural networks and fuzzy systems is shown in Table 13.1.
Architecture
We can represent the neuro-fuzzy system as a three-layer feed-forward network, as shown in the figure.
Figure: Neuro-fuzzy system architecture
Layer one corresponds to the input variables, fuzzy rules are represented in the middle layer, and the third layer represents the output variables. The fuzzy sets are encoded as the fuzzy connection weights between the layers; in a design with more layers, fuzzy sets are also encoded in the units of the inner layers.
There are basically two types of fuzzy neural networks: cooperative and hybrid. In a cooperative neuro-fuzzy system, the artificial neural network and the fuzzy system work separately from each other: the ANN learns the parameters of the fuzzy system, which can be performed either off-line or on-line while the fuzzy system is applied. Hybrid neuro-fuzzy systems are homogeneous and usually resemble neural networks; here, the fuzzy system is interpreted as a special kind of neural network. The advantage of such a hybrid neuro-fuzzy system is its unified architecture, since the fuzzy system and the neural network no longer have to communicate with each other. Another modified version, the concurrent fuzzy neural network, incorporates the schemes of both hybrid and cooperative networks.
There are some mandatory constraints that the learning algorithm must enforce to guarantee the characteristics of a fuzzy system:
• Fuzzy sets must stay normal and convex.
• Fuzzy sets must not exchange their relative positions.
• Fuzzy sets must always overlap.
Additionally, there are some optional constraints, like the following:
• Fuzzy sets must stay symmetric.
• The membership degrees must sum up to 1.
The basic concept supporting fuzzy theory is the fuzzy set. The major research fields include fuzzy sets, fuzzy logic and fuzzy measures. Fuzzy reasoning is used for knowledge processing and approximate reasoning, and is one of the major applications in this field; fuzzy control is the application of fuzzy reasoning to control. Although most applications of fuzzy theory have been biased toward engineering, these applications have recently reached other disciplines, such as medical diagnostics, psychology, education, economics, management and sociology.
Questions
1. How is background knowledge important in ILP systems?
2. How is the hyperplane calculated for an SVM?
3. How exactly do neural network and support vector machine methods reduce
dimensionality?
4. Explain the various ILP systems.
5. Describe the main principles of a case-based reasoning system, its operation and the
process.
6. Describe the type of problems where rule-based systems are more appropriate than
CBR systems.
7. What is an ensemble and why do we use ensemble? What makes a good ensemble?
8. Explain the concepts of bagging/boosting.
9. How are classifications merged in an ensemble classifier?
10. Explain info fuzzy networks.
11. Explain adaptive fuzzy systems.
12. Diagrammatically illustrate and discuss the major modules involved in fuzzy logic
control system.
13. Discuss the application of fuzzy logic control in image processing with relevant
examples.
14. What are the key components in an ensemble system?
15. Describe the nearest neighbour algorithm.
CHAPTER 11
UNSUPERVISED LEARNING
INTRODUCTION
In supervised learning, the classes for prediction are predetermined; they can be viewed as a finite set already defined by humans. Hence, the main task of machine learning is to search for patterns and to create models according to these predetermined formats. Decision tree induction, naive Bayes, etc., are examples of supervised learning algorithms.
In unsupervised learning, no predefined classes are given; the major task is to develop the classification automatically. The method examines the data to find similarities and groups similar items into particular clusters. It has been found successful in many applications, like creating and training machines capable of driving automobiles. Clustering is useful when we have enough data to form clusters, and the data about the members of a cluster can be used to produce more accurate rules or to draw conclusions. Classification learning is used in cases where the decision made by an algorithm is used as input somewhere else. Both techniques can give accurate results; the only thing we need to be careful about is the selection of the learning algorithm, which depends on the circumstances and the kind of problem to which it is applied.
Expectation Maximisation
It was Dempster who, in 1977, first introduced the expectation maximisation (EM) algorithm. Since then, EM has been widely used in many fields, like bioinformatics, reinforcement learning and cognitive mapping. One reason that makes the EM algorithm so appealing is that it allows computation even for problems with hidden or missing variables. Various machine learning algorithms, like decision trees and neural networks, need an observed value for every attribute before it can be used in classification. The EM algorithm, however, not only allows you to use data that is only occasionally observed, but also data that is never directly observed within the problem. By introducing hidden variables, the complexity of problems can be reduced. EM is an elegant and powerful method for finding the maximum likelihood of models with hidden variables.
The expectation maximisation algorithm begins by assigning random values to all the parameters that need to be estimated. It then alternates between two steps, called the expectation step (the E-step) and the maximisation step (the M-step), respectively.
General Expectation Maximisation
To start with, let us fix the notation. Let the entire training data constitute one outcome x of a random variable X, and let the model, with parameters θ, be p(x; θ). According to the principle of maximum likelihood, our goal is to choose the θ which maximises the likelihood function, that is, L(θ; x) = p(x; θ).
Now, let us introduce Z, a discrete auxiliary random variable whose distribution is a function of θ, the same as that of X. Let z range over the possible outcomes of Z. By definition,
p(x; θ) = Σz p(x, z; θ)
In general, the E-step of an EM algorithm is to compute p(x, z; θᵗ) for all z. The M-step is then to find the θ that maximises Σz p(x, z; θᵗ) log p(x, z; θ).
EM Algorithm
1. For each i = 1, 2, 3, ..., do
2. q = pᵢ₋₁
3. E-step: compute the complete data sample fq : X → R expected by q,
fq(x) = f(y) · q(x | y), where y = yield(x)
4. M-step: compute a maximum likelihood estimate p̂ of the model M on fq, so that
L(fq; p̂) = maxₚ L(fq; p)
5. pᵢ = p̂
6. End for each i
7. Print p₀, p₁, p₂, ...
Remark: each pᵢ is an instance of the model M.
EM Example
Parameter estimation for both complete and incomplete data is possible using the EM algorithm.
Maximum likelihood estimation: when the data are complete, the maximum likelihood procedure calculates the counts of heads and tails for coins A and B separately over every toss; these counts are then used to estimate the coin biases.
Expectation maximisation:
I. The EM algorithm begins with an initial guess of the parameters.
II. In the E-step, a probability distribution over possible completions is computed using the current parameters.
III. In the M-step, new parameters are determined using the current completions.
IV. The algorithm converges after several iterations of the E-step and the M-step.
Consider an example: two coins (say coin A and coin B) are flipped in a series of trials, four tosses per trial. The observed sequence patterns, with the coin labels missing, are
HHHT, TTTH, THTT, TTHT, THHH, HTTH, HTHH, HTTT, HHHH, HTHT
The probability of the data D for a given model M is built up from the individual sequences. Let h and t denote the numbers of heads and tails in a sequence, and let
p: probability of H from coin A
q: probability of H from coin B
Then a sequence contributes p^h (1 − p)^t if it came from coin A, and q^h (1 − q)^t if it came from coin B.
The completed data, with the missing coin labels identified, are
Coin A Coin B
HHHT HTTH
TTTH HTHH
THTT HTTT
TTHT HHHH
THHH HTHT
If we knew the labels (i.e., which flips came from which coin), we could find the maximum likelihood values for p and q directly from the counts; with these labels, the values come out as p = 3/4 and q = 3/10.
When the labels are missing, we instead compute expected labels: for each sequence, the posterior probability that it came from coin A or coin B is computed from the current parameters, the counts of heads and tails are accumulated as fractions weighted by these probabilities, and dividing the expected head counts by the expected totals gives the new probabilities.
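A minimal Python sketch of these steps for the two-coin problem, using the ten sequences above; it assumes, for simplicity, an equal prior over the two coins (the text does not state the prior):

def coin_em(sequences, p, q, iterations=20):
    # p, q: current head probabilities of coins A and B; the coin that
    # produced each sequence is the hidden (missing) variable.
    for _ in range(iterations):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for seq in sequences:
            h, t = seq.count("H"), seq.count("T")
            like_a = (p ** h) * ((1 - p) ** t)   # P(seq | coin A)
            like_b = (q ** h) * ((1 - q) ** t)   # P(seq | coin B)
            w_a = like_a / (like_a + like_b)     # E-step: P(coin A | seq)
            heads_a += w_a * h; tails_a += w_a * t
            heads_b += (1 - w_a) * h; tails_b += (1 - w_a) * t
        # M-step: re-estimate the biases from the expected counts.
        p = heads_a / (heads_a + tails_a)
        q = heads_b / (heads_b + tails_b)
    return p, q

seqs = ["HHHT", "TTTH", "THTT", "TTHT", "THHH",
        "HTTH", "HTHH", "HTTT", "HHHH", "HTHT"]
print(coin_em(seqs, p=0.6, q=0.5))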
Features of EM
The major features of the expectation maximisation algorithm fall under three points. First, convergence: EM is guaranteed to converge to at least a local maximum of the likelihood, which is non-decreasing across iterations. Second, efficiency: EM takes big steps initially, although other algorithms may do better later. Third, generality: EM can be defined for many probabilistic models and can be combined with a prior estimate. There are also some limitations of EM when applied to real situations. Local maxima are one of the major problems that occur in the EM algorithm: for example, one component may poorly fit two clusters, while two components split up a single cluster. Another problem is degeneracy: two components may merge, or a component may lock onto one data point, with its variance going to zero. EM may also be intractable for complex models; this is an active research topic.
Applications
EM is used for data clustering in machine learning. In psychometrics, EM is almost indispensable for estimating the item parameters and latent abilities of item response theory models. EM is capable of dealing with missing data and unidentified variables, which makes it prominent in the price and risk management of portfolios. The EM algorithm is also used for medical image reconstruction, in positron emission tomography and single photon emission computed tomography.
Self Organising Maps
Self-organising maps (SOMs) are neural networks that are effective in dealing with unexpected and changing conditions. A self-organising map is a type of artificial neural network trained using unsupervised learning methods. It was invented by Teuvo Kohonen, a professor of the Academy of Finland; hence, it is also called a Kohonen map. It can be viewed as a data visualisation technique that helps us understand high-dimensional data by reducing the dimensions of the data and plotting it onto a map. It can thus be used for representing the clustering concept by grouping similar data together. In short, we can say that a SOM reduces data dimensions and displays similarities among the various data.
Structure of a Self-organising Map
This has a feed-forward structure. A single computational layer of neurons is packed
in rows and columns. Each neuron is fully connected to all the source units in the input layer.
A one dimensional map will just have a single row or column in the computational layer.
Self-organising Maps in Detail
Self-organising maps comprise an input neuron layer and a competition layer. The weights of the connections from the input neurons to a single neuron in the competition layer are taken as that neuron's reference vector in the input space. The training method used to train a self-organising map is called competitive learning. When an input is given to the network, the neuron in the competition layer whose reference vector is closest to the input pattern is determined. This neuron is termed the winner neuron and is the focal point of the weight changes.
A neighbourhood relation in self-organising maps is defined on the competition layer. It helps in identifying the weights that need to be changed. The neighbourhood relations are normally represented as a two-dimensional grid. The vertices of the grid represent the neurons, and the grid can take any shape, such as rectangular or hexagonal. The weights of all the neurons in the competition layer which are situated within a certain radius around the winner neuron are also adapted during the learning process, but the strength of the adaptation of such close neurons may depend on their distance from the winner neuron.
The basic self-organising map has two operating modes:
1. Building the map is the primary step in the training process. The neural network organises itself using the competitive process: a large number of input vectors are fed to the network, representing, as far as possible, all the kinds of vectors expected during mapping.
2. During the mapping process, each new input vector is quickly given a location on the map, and the vector is automatically classified or categorised. There will be one single winning neuron: the neuron whose weight vector lies closest to the input vector. This can be simply determined by calculating the Euclidean distance between the input vector and each weight vector.
Learning Process of SOM
We can generally divide the learning process into six steps.
1. Initialise each node's weights.
2. Choose a random vector from the training data and present it to the map.
3. Find the best matching unit (BMU) by calculating the distance between the input
vector and the weights of each node.
4. The radius of the neighbourhood around the BMU is calculated. The size of the
neighbourhood decreases with each iteration.
5. Each node in the BMU's neighbourhood has its weights adjusted to become more
like the BMU. Nodes closest to the BMU are altered more than the nodes furthest
away in the neighbourhood.
6. Repeat from step 2 for enough iterations for convergence.
After initialising the weights of the nodes and choosing a random vector, we need to find the best match. We find the BMU by calculating the Euclidean distance between each node's weights (W1, W2, ..., Wn) and the input vector's values (V1, V2, ..., Vn). The Euclidean distance serves as a measure of similarity between two sets of data:
Dist = √( Σᵢ (Vᵢ − Wᵢ)² ), where i varies from 1 to n.
After finding the BMU, we need to find the BMU's neighbourhood. The size of the neighbourhood is an exponential decay function that shrinks on each iteration, and the process continues until the neighbourhood is just the BMU itself. We can represent the width of the lattice as
σ(t) = σ₀ exp(−t/λ)
where σ(t) is the width of the lattice at time t, σ₀ is its initial width, and λ is a time constant that governs how quickly the neighbourhood shrinks over the iterations.
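A two-line Python helper makes the decay concrete (the initial width of 4.0 and time constant of 100 are illustrative values):

import math

def neighbourhood_width(t, sigma0=4.0, lam=100.0):
    # sigma(t) = sigma0 * exp(-t / lam): the radius shrinks each iteration.
    return sigma0 * math.exp(-t / lam)

print([round(neighbourhood_width(t), 2) for t in (0, 50, 100)])  # [4.0, 2.43, 1.47]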
Algorithm: SOM
We can formalise the algorithm as follows:
1. Select the output layer network topology. Initialise the current neighbourhood distance, D(0), to a positive value.
2. Initialise the weights from inputs to outputs to small random values.
3. Let t = 1.
4. While computational bounds are not exceeded, do
a) Select an input sample iₜ.
b) Compute the square of the Euclidean distance of iₜ from the weight vector (wⱼ) associated with each output node,
dⱼ = Σₖ (iₜ,ₖ − wₖⱼ(t))²
c) Select the output node j* whose weight vector gives the minimum value from step (b).
d) Update the weights of all nodes within a topological distance D(t) from j*, using the weight update rule
wⱼ(t + 1) = wⱼ(t) + η(t)(iₜ − wⱼ(t)), where η(t) is the learning rate.
e) Increment t.
5. End while
SOM Example
Let us now look at a simple example illustrating the process of a self-organising map. Let i₁, i₂, i₃ and i₄ be four training vectors
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
and, for simplicity, let us assume there are only 2 neurons in the output layer. The learning rate is initialised to 0.6.
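A minimal Python sketch of this example follows; two simplifications are assumed for brevity: the neighbourhood radius is taken as zero (only the winner is updated) and the learning rate simply halves each epoch:

import random

def train_som(inputs, n_outputs=2, alpha=0.6, epochs=10, seed=0):
    rng = random.Random(seed)
    dim = len(inputs[0])
    # Small random initial weights, one weight vector per output neuron.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_outputs)]
    for _ in range(epochs):
        for v in inputs:
            # Find the best matching unit by squared Euclidean distance.
            dists = [sum((vi - wi) ** 2 for vi, wi in zip(v, w)) for w in weights]
            j = dists.index(min(dists))
            # Move the winning neuron's weights towards the input.
            weights[j] = [wi + alpha * (vi - wi) for vi, wi in zip(v, weights[j])]
        alpha *= 0.5  # shrink the learning rate each epoch
    return weights

data = [(1, 1, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (0, 0, 1, 1)]
for w in train_som(data):
    print([round(x, 2) for x in w])

Each output neuron's weight vector ends up near the centre of one group of inputs, illustrating how the map clusters the data.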
Advantages
Self-organising maps are very easy to interpret and understand. The scheme is simple and widely used in classification; since similar groups lie close together, it is also very helpful for understanding the data. Unlike multidimensional scaling or N-land, people can quickly pick up how to use them in an effective manner. The process works well and produces results very efficiently.
Disadvantages
The major drawback of self-organising maps is the problem of getting the right data. We are given multidimensional data and need to map it onto a simple map; hence, we need a value for each dimension for the creation of the map, and it may be difficult to find all the values that correspond to each object. This problem is referred to as the missing data issue. Another problem is that every self-organising map is different and finds different similarities among the sample vectors. Computational cost is also a major setback in self-organising maps: as the dimension increases, the cost of implementation and the time for computation also increase.
Applications
Self-organising maps have been applied in many areas. Here are just some of them.
• Bibliographic classification.
• Image browsing systems.
• Medical diagnosis.
• Interpreting seismic activity.
• Speech recognition (this is what Kohonen used them for initially).
• Data compression.
• Separating sound sources
• Environmental modelling.
• Natural language processing: linguistic analysis, parsing, learning languages, hyphenation patterns.
• Optimisation problems such as the configuration of telephone connections, VLSI design, time series prediction, scheduling algorithms.
• Signal processing: adaptive filters, real-time signal analysis, radar, sonar, seismic, USG, EKG, EEG and other medical signals.
• Image recognition and processing: segmentation, object recognition, texture recognition, etc.
• Content-based retrieval, e.g., WebSOM and similarity-based image retrieval.
Adaptive Resonance Theory
Adaptive resonance theory (ART) was invented by Dr. Stephen Grossberg in 1976. It is a basic unsupervised learning model. ART is a family of neural architectures; the basic architecture is ART1, proposed by Carpenter and Grossberg in 1987, which is able to learn and recognise valid binary patterns. A network for categorising arbitrary sequences of analog patterns, also introduced by Carpenter and Grossberg in 1987, was termed ART2.
Data change and so do databases, so learning is always difficult in such ever-changing databases, and incorporating learning in such changing situations has always been a point of interest. The difficulty is called the stability-plasticity dilemma: the basic problem that a network is not able to learn new information on top of the old. ART was proposed to overcome this dilemma. For example, consider a multilayer perceptron which is fully trained and generating its outputs well. If we try to introduce a new training vector into such a system, the results can be poor, and it may even destroy the previous learning. ART is a self-organising network which solves the stability-plasticity dilemma.
Before going deep into the ART, let us have a diagrammatic presentation on the
various types of ART networks available.
Important ART Networks
We are only considering the unsupervised ART systems in our following
discussions.
Figure: Types of ART networks
Adaptive Resonance Theory: Architecture and Operation
As mentioned above, ART is a self-organising network and is modelled for generating massively parallel architectures. The system is able to resonate between the plastic and the stable states. Plastic is the mode where the system is in its learning phase and the internal parameters can be modified, whereas the stable state can be seen as a fixed classification set. Let us now look at the basic architecture of an ART network. ART is basically implemented in three versions, namely ART1, ART2 and ART3; the figure above shows the basic classification of ART networks. Here, let us discuss the ART1 network.
ART Architecture
The basic ART architecture is given in the figure. Basically, it consists of two layers: an input layer and an output layer. The input layer is also called the comparison layer; similarly, the output layer is termed the recognition layer. These two layers are connected extensively by feed-forward and feedback networks. Let W represent the weights in the feed-forward path and T the weights in the feedback network. The layers also have control signals, namely control 1 in the input layer and control 2 in the output layer. The inputs are common to each node in the output and input layers. The network also has a reset circuit, which is responsible for comparing the input to a vigilance threshold that determines whether a new pattern needs to be created for a particular input.
Figure: ART architecture
The operations in an ART network can be classified as a four-stage process: the initialisation phase, recognition phase, comparison phase and search phase.
The initialisation phase involves more than the initialisation steps of other neural networks. Control 1 and control 2 are used for controlling data flow through the layers. Control 1 determines data flow through the input layer, and its value alternates between 1 and 0: whenever an input is present in the comparison layer, the value of control 1 is 1, and it is 0 when a node in the recognition layer is active. Similarly, control 2 enables its layer with the value 1 and disables it with the value 0. The feed-forward and feedback weights are also initialised during this phase: the feed-forward weights are initialised to 1/(1 + n), where n is the number of nodes, and the feedback weights are initialised to binary one.
In the recognition phase, the inputs are passed through the network and matched against the best match available in the output layer. If we look closely at the layer, we can see that it has three inputs: the input vector, the control signal and the feedback signal. The network works on the two-thirds rule, that is, a node gives a positive result when any two of the signals acting on it are active. Each weight vector W at the output layer can be seen as a stored pattern. The input vector is compared with the stored pattern at each node, and the node with the closest comparison, i.e., the one which generates the maximum result, is selected as the winning node.
In the comparison phase, the two-thirds rule is used for calculating the output from each node. The stored pattern and the input vector are ANDed together to produce a new vector at the output of the comparison layer, termed the comparison vector. The comparison vector is passed to the reset circuit together with the current input, and the reset circuit is now responsible for testing the similarity between the input and the comparison vector. The test result is a ratio, denoted by S, and if S is greater than the vigilance threshold, the classification is complete and the class membership is denoted by the active node in the output layer.
The search phase is about finding new matching vectors for the inputs presented in the comparison layer. The best match is calculated as described in the comparison step. If no node has the same pattern as that of the input layer, the process starts again with the creation of a new pattern from the latest input vector presented in the comparison layer. The process is repeated until the patterns are fully recognised. This completes the working of ART networks.
ART Algorithm
As we have seen the architecture and the operations of ART networks, let us now discuss the ART algorithm. The steps are just a symbolic representation of the above-mentioned phases.
1. Initialise the network: W and T are initialised, and the vigilance threshold is set to a value between 0 and 1.
2. Apply new inputs.
3. Compute the matching, µⱼ = Σᵢ Wᵢⱼ(t)Xᵢ.
4. Select the best match, i.e., µⱼ* = max[µⱼ].
5. Check whether the match corresponds to a node in the recognition layer; if yes, go to step 6, else go to step 2.
6. Adapt the best match*, and repeat the process.
*0.5 is the bias value used in the weight adaptation.
ART networks continue to add new information until they utilise all of the available memory, and they continually refine the knowledge stored within them as new information is presented. This makes the network reliable, and it is used in various applications; it has also proved to produce results in noisy environments. This was one of the most discussed and controversial works of Grossberg. A compact sketch of an ART1-style procedure follows.
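The Python sketch below follows the phases described above for binary patterns; the update constants (including the 0.5 bias noted in the algorithm) and the tie-breaking are simplifying assumptions, so this is an illustration rather than a definitive ART1 implementation:

def art1(patterns, vigilance=0.7, max_nodes=10):
    top_down = []    # T: binary feedback prototype per category node
    bottom_up = []   # W: feed-forward weights per category node
    labels = []
    for x in patterns:
        # Recognition phase: rank nodes by the matching score sum(W_ij * X_i).
        order = sorted(range(len(top_down)),
                       key=lambda j: -sum(w * xi for w, xi in zip(bottom_up[j], x)))
        assigned = None
        for j in order:
            # Comparison phase: AND the stored pattern with the input and
            # test the similarity ratio S against the vigilance threshold.
            overlap = [ti & xi for ti, xi in zip(top_down[j], x)]
            if sum(overlap) / max(sum(x), 1) >= vigilance:
                # Resonance: adapt the winning node (0.5 is the bias value).
                top_down[j] = overlap
                bottom_up[j] = [o / (0.5 + sum(overlap)) for o in overlap]
                assigned = j
                break
        if assigned is None and len(top_down) < max_nodes:
            # Search phase failed: create a new category from this input.
            top_down.append(list(x))
            bottom_up.append([xi / (0.5 + sum(x)) for xi in x])
            assigned = len(top_down) - 1
        labels.append(assigned)
    return labels

print(art1([(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 0, 1)]))  # e.g. [0, 1, 2]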
Questions
1. Why is the naive Bayesian classification called naive?
2. Why does the EM (expectation maximisation) algorithm necessarily converge?
What are some of its common applications?
3. What does it mean to say: "In information geometry, the E step and the M step are
interpreted as projections under dual affine connections?"
4. Describe the concept of exponential decay in case of self-organising maps.
5. How is the process of learning carried out in SOM?
6. What are the main similarities and differences between feed-forward neural networks
and self-organising maps?
7. What are the main features of an SOM that can help to analyse the business data in
a data warehouse?
8. What is the main difference between RBF neural networks and SVMs?
9. Explain fixed-centre RBF networks and adaptive RBF networks.