
Artificial Intelligence (6CS6.2)
Unit 5 : Learning

Topics

5.1 Introduction to Learning and Learning Methods


5.1.1 Rote learning
5.1.2 Learning by Taking Advice
5.1.3 Learning in Problem Solving Experience
5.1.4 Learning from Examples or Induction
5.1.5 Explanation Based Learning
5.1.6 Discovery
5.1.7 Analogy
5.2 Formal Learning Theory
5.3 Introduction to Neural Networks and Applications
5.4 Common sense reasoning
5.5 Expert systems

5.1 Introduction to Learning


An AI system cannot be called intelligent until it can learn to
do new things and adapt to new situations, rather than simply
doing as it is told to do.
Definition of Learning: changes in the system that are
adaptive in the sense that they enable the system to do the
same task more efficiently and more effectively the next time.
Learning covers a wide range of phenomena:
1. Skill Refinement : Practice improves skills. The more
you play tennis, the better you get.
2. Knowledge Acquisition : Knowledge is generally
acquired through experience.

Various Learning Mechanisms


1. Simple storing of computed information, or rote learning, is
the most basic learning activity. Many computer programs,
e.g., database systems, can be said to learn in this sense,
although most people would not call such simple storage
learning.
2. Another way we learn is by taking advice from others.
Advice taking is similar to rote learning, but high-level advice
may not be in a form simple enough for a program to use
directly in problem solving.
3. People also learn through their own problem-solving
experience.
4. Learning from examples : we often learn to classify things
in the world without being given explicit rules. Learning from
examples usually involves a teacher who helps us classify
things by correcting us when we are wrong.

5.1.1 : Rote Learning *


When a computer stores a piece of data, it is performing a
basic form of learning. In data caching, we store computed
values so that we need not re-compute them later.
When computation is more expensive than recall, this
strategy can save a significant amount of time. Caching has
been used in AI programs to produce some surprising
performance improvements. Such caching is known as rote
learning.
Rote learning does not involve any sophisticated
problem-solving capabilities. It shows the need for some
capabilities required of complex learning systems, such as:
Organized storage of information
Generalization
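The caching idea can be sketched in a few lines of Python. The scoring function below is a made-up stand-in for any expensive computation; a dictionary serves as the "memory":

```python
# Rote learning as caching: store computed values so they need not
# be re-computed later. A minimal sketch.

cache = {}

def expensive_evaluation(position):
    """Stand-in for a costly computation, e.g. scoring a game position."""
    return sum(ord(c) for c in position)  # dummy scoring function

def evaluate(position):
    if position not in cache:           # not seen before: compute and store
        cache[position] = expensive_evaluation(position)
    return cache[position]              # otherwise: simple recall

score1 = evaluate("A")   # computed and stored
score2 = evaluate("A")   # recalled, not re-computed
```

The second call retrieves the stored value instead of re-running the computation, which is all rote learning amounts to.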

Rote Learning : Example

In figure 1, the value of A is
computed as 10. This value
is stored for future use.
In figure 2, the value of A is
required again to solve a new
game tree.
Instead of re-computing the
value of node A, rote
learning is used and the
stored value of node A is
applied directly.

5.1.2 : Learning by taking Advice


A computer can do very little without a program to run. When
a programmer writes a series of instructions into a computer, a
basic type of learning is taking place: the programmer is a sort of
teacher and the computer is a sort of student.
After being programmed, the computer is able to do
something it previously could not.
Executing a program may not be such a simple matter. Suppose
the program is written in a high-level language such as Prolog; some
interpreter or compiler must intervene to change the teacher's
instructions into code that the machine can execute directly.
People process advice in an analogous way. In chess, the advice
"take control of the chess board center" is useless unless the
player can translate it into concrete moves and plans. A
computer program might make use of the advice by adjusting its
static evaluation function to include a factor based on the number
of center squares attacked by its own pieces.

Learning by advice cont


FOO is a program that accepts advice for playing hearts,
a card game. A human user first translates the advice from English
into a representation that FOO can understand.
A human can watch FOO play, detect new mistakes, and correct
them through yet more advice, such as "play high cards when it is
safe to do so".
Limitation: The ability to operationalize the knowledge is a
challenging job in learning by advice.

5.1.3 : Learning In Problem Solving Experience


Can a program get better without the aid of a teacher?
It can, by generalizing from its own experiences.
Various techniques are as follows:
5.1.3.a Learning by Parameter Adjustment
5.1.3.b Learning with Macro Operators
5.1.3.c Learning by Chunking

5.1.3.a : Learning by Parameter Adjustment

Many programs rely on an evaluation procedure that combines
information from several sources into a single summary statistic.
1. Game-playing programs do this in their static evaluation functions, in
which a variety of factors such as piece advantage and mobility are
combined into a single score reflecting the desirability (goodness or
usefulness) of a particular board position.
2. Pattern classification programs often combine several features to
determine the correct category into which a given stimulus should be
placed.

In designing such programs, it is often difficult to know a priori how much
weight should be attached to each feature being used. One way of
finding the correct weights is to begin with some estimate of the correct
settings and then let the program modify the settings on the basis of its
experience.

Features that appear to be good predictors of overall success will have
their weights increased, while those that do not will have their weights
decreased. Samuel's checkers program uses a static evaluation function
in polynomial form: c1·t1 + c2·t2 + … + c16·t16
The t terms are the values of the sixteen features that contribute to the
evaluation. The c terms are the coefficients attached to each of these
values. As learning progresses, the c values change.
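The weight-update idea can be sketched as follows. The simple additive update rule is an illustrative assumption, not Samuel's exact scheme:

```python
# Learning by parameter adjustment: an evaluation function is a
# weighted sum c1*t1 + ... + cn*tn; weights of features that correlate
# with success are nudged up, others down.

def evaluate(features, weights):
    return sum(c * t for c, t in zip(weights, features))

def adjust(weights, features, outcome, rate=0.1):
    # outcome = +1 if the position led to a win, -1 if it led to a loss
    return [c + rate * outcome * t for c, t in zip(weights, features)]

weights = [0.5, 0.5]     # initial guesses for two features
features = [1.0, 0.0]    # feature 1 was active, feature 2 was not
weights = adjust(weights, features, outcome=+1)  # position led to a win
```

After the update, the weight of the active feature rises while the inactive feature's weight is untouched, so features that keep appearing in winning positions gradually dominate the evaluation.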

5.1.3.b Learning by Macro-Operators


Sequences of actions that can be treated as a whole are called
macro-operators.
Example: suppose we want to go to the main post office of the
city. Our solution may involve getting into the car, starting it, and
driving along a certain route. Substantial planning may go into
choosing the appropriate route.
Here, we need not plan how to start the car. We can use
START-CAR as an atomic action. It actually consists of several
primitive actions: (1) sitting down, (2) adjusting the mirror, (3)
inserting the key, and (4) turning the key.
Macro-operators were used in the early problem-solving system
STRIPS. After each problem-solving episode, the learning
component takes the computed plan and stores it away as a
macro-operator, or MACROP.
A MACROP is just like a regular operator, except that it consists of a
sequence of actions, not just a single one.
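The START-CAR example can be sketched in Python: primitive actions are functions on a state, and a MACROP simply bundles a solved plan into one reusable operator. The state representation and action names are illustrative assumptions:

```python
# Primitive actions update a state dictionary.
def sit_down(s):      return {**s, "seated": True}
def adjust_mirror(s): return {**s, "mirror_ok": True}
def insert_key(s):    return {**s, "key_in": True}
def turn_key(s):      return {**s, "engine_on": True}

def make_macrop(*actions):
    """Bundle a solved sequence of actions into a single operator."""
    def macrop(state):
        for action in actions:
            state = action(state)
        return state
    return macrop

START_CAR = make_macrop(sit_down, adjust_mirror, insert_key, turn_key)
state = START_CAR({})    # the whole plan applied as one atomic step
```

Future plans can now treat START_CAR as one operator instead of re-planning its four primitive steps.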

5.1.3.c Learning by Chunking


Chunking is a process similar in flavor to macro-operators.
The idea of chunking comes from the psychological literature
on memory and problem solving. Its computational basis is in
production systems.
When a system detects a useful sequence of production firings,
it creates a chunk, which is essentially a large production that
does the work of an entire sequence of smaller ones.
SOAR is an example of a production system that uses
chunking.
Chunks learned during the initial stages of solving a problem
are applicable in the later stages of the same problem-solving
episode. After a solution is found, the chunks remain in
memory, ready for use in the next problem.
At present, chunking is inadequate for duplicating the
contents of large, directly-computed macro-operator tables.
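A minimal sketch of chunking over a toy production system follows; the rules and state names are made up for illustration:

```python
# Chunking sketch: a chain of production firings that proved useful
# is replaced by one large production that does the same work.

rules = {
    "has_goal": "subgoal_a",
    "subgoal_a": "subgoal_b",
    "subgoal_b": "solved",
}

def solve(state, rules, trace=None):
    """Fire productions until no rule matches; optionally record firings."""
    while state in rules:
        if trace is not None:
            trace.append(state)
        state = rules[state]
    return state

trace = []
result = solve("has_goal", rules, trace)   # three productions fire

# The chunk: one production now does the work of the recorded sequence.
rules[trace[0]] = result
```

On the next problem, the chunked rule takes the system from "has_goal" to "solved" in a single firing.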

5.1.4 : Learning from Examples: Induction **

Classification is the process of assigning, to a particular input, the name
of a class to which it belongs. The classes from which the classification
procedure can choose can be described in a variety of ways. Their
definition will depend on the use to which they are put. Classification
is an important component of many problem-solving tasks.
Before classification can be done, the classes it will use must be defined:
1. Isolate a set of features that are relevant to the task domain. Define
each class by a weighted sum of values of these features. Example: if
the task is weather prediction, the parameters can be measurements
such as rainfall, location of cold fronts, etc.
2. Isolate a set of features that are relevant to the task domain. Define
each class as a structure composed of these features. Example: in
classifying animals, the features can be such things as color, length
of neck, etc.
The idea of producing a classification program that can evolve its own
class definitions is called concept learning or induction.

Winston's Learning Program


This program is an early structural concept learning program.
It operates in a simple blocks-world domain. Its goal is to
construct representations of the definitions of concepts in the
blocks domain.
For example, it learned the concepts House, Tent and Arch.
A near miss is an object that is not an instance of the
concept in question but that is very similar to such instances.

Winston's Learning Program

The figure shown above gives the structural description of a house.

Winston's Learning Program

The figure shown above gives the structural description of an arch. Figure (b)
shows a brick supported by two bricks. Figure (c) shows a wedge supported by
two bricks.

Basic approach of Winston's Program


1. Begin with a structural description of one known instance of
the concept. Call that description the concept definition.
2. Examine descriptions of other known instances of the
concept. Generalize the definition to include them.
3. Examine the descriptions of near misses of the concept.
Restrict the definition to exclude these.
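The three steps can be sketched with flat attribute-value descriptions. Winston's actual program used structural (relational) descriptions; the flat form, the attribute names, and the example objects here are simplifying assumptions:

```python
# Step 1: the first known instance becomes the initial definition.
def make_definition(instance):
    return dict(instance)

# Step 2: generalize by dropping requirements a new positive violates.
def generalize(defn, positive):
    return {k: v for k, v in defn.items() if positive.get(k) == v}

def matches(defn, instance):
    return all(instance.get(k) == v for k, v in defn.items())

arch1 = {"top": "brick", "supports": 2, "touching": False}
arch2 = {"top": "wedge", "supports": 2, "touching": False}
near_miss = {"top": "brick", "supports": 2, "touching": True}  # sides touch

defn = make_definition(arch1)
defn = generalize(defn, arch2)   # "top" is dropped: any top piece is fine
# Step 3 (near miss): the attribute that excludes the near miss
# ("touching": False) must be kept as a hard requirement in defn.
```

After generalization the definition still matches both arches but excludes the near miss, which differs only in the attribute that makes it a non-arch.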

5.1.5 : Explanation-Based Learning
Limitation of learning by example: induction learning requires a
substantial number of training instances for describing a complex
concept.
But human beings can learn quite a bit from single examples.
Humans don't need to see dozens of positive and negative
examples of fork (chess) positions in order to learn to avoid this
trap in the future, and perhaps use it to their advantage.
What makes such single-example learning possible? The answer
is knowledge. Much of the recent work in machine learning has
moved away from the empirical, data-intensive approach described
in the last section toward this more analytical, knowledge-intensive
approach.
A number of independent studies led to the characterization of this
approach as explanation-based learning (EBL). An EBL system
attempts to learn from a single example x by explaining why x is an
example of the target concept. The explanation is then
generalized, and the system's performance is improved through
the availability of this knowledge.

EBL cont

We can think of EBL programs as accepting the following as input:
A training example
A goal concept: a high-level description of what the program is
supposed to learn
An operational criterion: a description of which concepts are usable
A domain theory: a set of rules that describe relationships between
objects and actions in a domain

From this, EBL computes a generalization of the training example that is
sufficient to describe the goal concept and also satisfies the
operationality criterion.

Explanation-based generalization (EBG) is an algorithm for EBL and has
two steps: (1) explain, (2) generalize.

During the explanation step, the domain theory is used to prune away all
the unimportant aspects of the training example with respect to the goal
concept. What is left is an explanation of why the training example is an
instance of the goal concept. This explanation is expressed in terms that
satisfy the operationality criterion. The next step is to generalize the
explanation as far as possible while still describing the goal concept.

5.1.6 : Discovery
Learning is the process by which one entity acquires
knowledge. Usually that knowledge is already possessed by
some number of other entities who may serve as teachers.
Discovery is a restricted form of learning in which one
entity acquires knowledge without the help of a teacher.
Various types are as follows:
5.1.6.a Theory-Driven Discovery
5.1.6.b Data Driven Discovery
5.1.6.c Clustering

5.1.6.a : Theory-driven Discovery (AM)

Discovery is certainly a type of learning. More clearly, we can say it is a
type of problem solving. Suppose that we want to build a program to
discover things in mathematics; such a program would have to rely heavily
on problem-solving techniques.

AM, written by Lenat, worked from a few basic concepts of set
theory to discover a good deal of standard number theory.

AM exploited a variety of general-purpose AI techniques. It used a frame
system to represent mathematical concepts. One of the major activities
of AM is to create new concepts and fill in their slots.

AM uses heuristic search, guided by a set of 250 heuristic rules
representing hints about activities that are likely to lead to interesting
discoveries.

In one run, AM discovered the concept of prime numbers. How did it do it?
Having stumbled onto the natural numbers, AM explored operations
such as addition, multiplication and their inverses. It created the
concept of divisibility and noticed that some numbers had very few
divisors.

5.1.6.b Data Driven Discovery (BACON)

AM showed how discovery might occur in a theoretical setting. This type of
scientific discovery has inspired several computer models. BACON is one
such model.

Langley presented a model of data-driven scientific discovery that has been
implemented as a program called BACON (named after Sir Francis Bacon,
a philosopher of science). BACON begins with a set of variables for a
problem.

Example: in the study of the behavior of gases, some variables are p
(pressure), V (volume), n (amount in moles) and T (temperature). A law,
called the Ideal Gas Law, relates these variables. BACON can derive this
law on its own.

First, BACON holds the variables n and T constant, performing experiments
at different pressures p1, p2 and p3. BACON notices that as the pressure
increases, the volume V decreases. For all values of n, p, V and T,
pV/nT = 8.32, which is the Ideal Gas Law as derived by BACON.

BACON has also discovered a wide variety of scientific laws, such as
Kepler's third law, Ohm's law, the conservation of momentum and Joule's
law.

Limitation: much more work must be done in the areas of science that
BACON does not model.
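The core of the BACON run above can be sketched as: generate observations while holding n and T fixed, then test whether a candidate term stays constant across all observations. The synthetic data below is an assumption made for illustration:

```python
# BACON-style data-driven discovery: look for a combination of
# variables whose value is invariant across experiments.

R = 8.32  # the constant the notes quote (modern value: ~8.314 J/(mol.K))

# Hold n and T fixed, vary p; V is generated from the law being
# "discovered", standing in for experimental measurements.
n, T = 1.0, 300.0
data = [(p, R * n * T / p, n, T) for p in (1000.0, 2000.0, 3000.0)]

# Candidate term pV/(nT) is (nearly) the same for every observation,
# so the program conjectures the law pV/(nT) = constant.
ratios = [p * V / (n_ * T_) for p, V, n_, T_ in data]
is_law = max(ratios) - min(ratios) < 1e-9
```

A real BACON run searches over many candidate terms (products, ratios, powers); only the invariance test that accepts pV/(nT) is shown here.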

5.1.6.c : Clustering
Clustering is very similar to induction. In inductive learning, a
program learns to classify objects based on the labels
provided by a teacher.
In clustering, no class labels are provided. The program
must discover for itself the natural classes that exist for the
objects, in addition to a method for classifying instances.
AUTOCLASS is one program that accepts a number of
training cases and hypothesizes a set of classes. For any
given case, the program provides a set of probabilities that
predict into which classes the case is likely to fall.
In one application, AUTOCLASS found meaningful new
classes of stars from their infrared spectral data. This was an
instance of true discovery by computer, since the facts it
discovered were previously unknown to astronomy.
AUTOCLASS uses statistical (Bayesian) reasoning.
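Discovering classes from unlabeled data can be illustrated with a minimal k-means sketch on 1-D points. AUTOCLASS itself uses Bayesian mixture models; k-means is merely the simplest stand-in for the idea:

```python
# Clustering without labels: assign each point to its nearest center,
# then move each center to the mean of its cluster, and repeat.

def kmeans_1d(points, centers, iterations=10):
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for x in points:   # assign each point to the nearest center
            i = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[i].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, clusters = kmeans_1d(points, centers=[0.0, 10.0])
```

The program is never told there are "small" and "large" points; it discovers the two natural groups on its own, which is the essence of clustering.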

5.1.7 : Analogy

Analogy is a powerful inference tool. Our language and reasoning
are full of analogies:
1. Last month, the stock market was a roller coaster.
2. Virat Kohli was like a fire engine in the last match.
3. Mamta Banarjee flip-flops on the railway budget.

To understand the above examples, a complicated mapping is done
between what appear to be dissimilar concepts. To understand
example 1, it is necessary to do two things:
1. Pick out one key property of a roller coaster, namely that it
travels up and down rapidly.
2. Realize that physical travel is itself an analogy for the numerical
fluctuations of the stock market.

Limitation: this is no easy trick. The space of possible analogies is
very large. An AI program that is unable to grasp analogy will be
difficult to talk to and consequently difficult to teach. Thus analogical
reasoning is an important factor in learning by advice taking.
Humans often solve problems by making analogies to things they
already understand how to do.

5.1.7.a : Transformational Analogy

Suppose we are asked to prove a theorem in plane
geometry. We look for a previous theorem that is very
similar, copy its proof, and make substitutions where
required.
Here the idea is to transform the solution of a previous problem
into a solution of the current problem.

5.1.7.b : Derivational Analogy

Transformational analogy does not look at how the old problem
was solved; it only looks at the final solution of the old problem.
Often the internal details of the old solution are also relevant to
solving the new problem. The detailed history of a problem-solving
episode is called its derivation.
Analogy that uses these derivations in the new problem-solving
process is referred to as derivational analogy.

5.2 Formal Learning Theory *


Learning has attracted the attention of mathematicians and
theoretical computer scientists. Inductive learning in
particular has received considerable attention.
Formally, a device learns a concept if, given positive
and negative examples, it can produce an algorithm that will
classify future examples correctly with probability 1/h.
The complexity of learning a concept is a function of 3 factors:
1. error tolerance (h)
2. number of binary features present in the examples (t)
3. size of the rule necessary to make the discrimination (f)
If the number of training examples required is polynomial in
h, t, and f, then the concept is said to be learnable.
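The "polynomial number of examples" idea can be made concrete with the standard sample bound from Valiant's PAC framework (the formula is a well-known external result, not stated in these notes): to reach error below eps with probability at least 1 − delta over a finite hypothesis space of size |H|, m ≥ (1/eps)·(ln|H| + ln(1/delta)) examples suffice.

```python
import math

# Standard PAC sample bound for a finite hypothesis space.
def sample_bound(eps, delta, H_size):
    return math.ceil((1.0 / eps) * (math.log(H_size) + math.log(1.0 / delta)))

# With t binary features, conjunctions over t literals give at most
# 3**t hypotheses, so ln|H| grows only linearly in t.
t = 10
m = sample_bound(eps=0.1, delta=0.05, H_size=3 ** t)
```

Since ln(3**t) = t·ln 3, the required number of examples is polynomial in 1/eps and t, which is exactly the learnability criterion stated above.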

Formal Learning Theory cont..


Example: given positive and negative examples of strings in
some regular language, can we efficiently induce the finite
automaton that accepts all and only the strings in the
language? The answer is no; an exponential number of
computational steps is required.
It is difficult to tell how such mathematical studies of learning
will affect the ways in which we solve AI problems in practice.
After all, people are able to solve many exponentially hard
problems by using knowledge to constrain the space of
possible solutions.
Perhaps mathematical theory will one day be used to quantify
the use of such knowledge, but this prospect seems far off.

Formal Learning Theory cont..


3 positive and 3 negative examples of the concept Elephant are
shown in a table (not reproduced here). Each example (three
Elephants as positives; Mouse, Giraffe and Dinosaur as negatives)
is described by five binary features: Gray?, Mammal?, Large?,
Vegetarian? and Wild?.

5.3 Neural Network *


Benefits of NN:
Pattern recognition, learning, classification, generalization and
abstraction, and interpretation of incomplete and noisy inputs
Character, speech and visual recognition
Can provide some human problem-solving characteristics
Can tackle new kinds of problems
Robust and fast
Flexible and easy to maintain
Powerful hybrid systems
Limitations of NN:
Do not do well at tasks that are not done well by people
Lack explanation capabilities
Limitations and expense of hardware technology restrict most applications
to software simulations
Training time can be excessive and tedious
Usually require large amounts of training and test data
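The core neural-network ideas (weighted sums, thresholding, and iterative weight adjustment from labeled data) can be shown with a single perceptron trained on the AND function; this pure-Python sketch is illustrative, not a practical network:

```python
# A single perceptron learning the AND function by error correction.

def predict(w, b, x):
    """Threshold unit: fires iff the weighted sum plus bias exceeds zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(samples, epochs=10, rate=1.0):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(w, b, x)   # -1, 0 or +1
            w = [wi + rate * error * xi for wi, xi in zip(w, x)]
            b += rate * error
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)
```

After training, the learned weights classify all four input pairs correctly; note this single unit cannot learn non-separable functions such as XOR, one reason practical networks use multiple layers.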

5.4 Commonsense Reasoning

5.5 Expert Systems **


Definition: an expert system (ES) is a specific type of AI system that
solves problems usually solved by human
experts.
Humans need to acquire special training to solve
expert-level problems. Examples of expert-level problems are
architecture design, engineering tasks, medical diagnosis etc.
To solve an expert-level problem, an ES requires the following:
1. Access to a domain-specific knowledge base
2. One or more reasoning mechanisms to exploit
3. A mechanism for explanation
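The interaction between a knowledge base and a reasoning mechanism can be sketched with a toy forward-chaining inference engine; the rules and facts below are illustrative assumptions, not a real medical system:

```python
# A toy forward-chaining inference engine: rules fire whenever all
# their conditions are present in the fact base, adding conclusions
# until nothing new can be derived.

rules = [
    ({"fever", "rash"}, "measles_suspected"),
    ({"measles_suspected"}, "refer_to_specialist"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                  # keep firing until no rule adds anything
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # fire the rule
                changed = True
    return facts

facts = forward_chain({"fever", "rash"}, rules)
```

The rule set plays the role of the knowledge base and the loop plays the role of the inference engine; a real ES adds an explanation facility by recording which rules fired and why.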

Architecture of Expert Systems

Components of Expert Systems:
1. Knowledge Acquisition Subsystem
2. Knowledge Base
3. Inference Engine
4. User Interface
5. Blackboard (Workplace)
6. Explanation Subsystem (Justifier)
7. Knowledge Refining System
8. User

(The architecture figure shows the User Interface, Inference Engine
and Knowledge Base connected in sequence.)

1. Knowledge Acquisition Subsystem

Knowledge acquisition is the accumulation, transfer and
transformation of problem-solving expertise from experts and/or
documented knowledge sources to a computer program, for
constructing or expanding the knowledge base.
Requires a knowledge engineer.

2. Knowledge Base
The knowledge base contains the knowledge necessary for
understanding, formulating and solving problems.
Two basic knowledge base elements:
Facts
Special heuristics, or rules that direct the use of knowledge
Knowledge is the primary raw material of an ES.
Knowledge is incorporated using a chosen knowledge representation.

3. Inference Engine:
The brain of the ES
The control structure (rule interpreter)
Provides methodology for reasoning
Major Elements are as follows:
1. Interpreter
2. Scheduler
3. Consistency Enforcer
4. User Interface:
Language processor for friendly, problem-oriented
communication
Consists of menus and graphics.

5. Blackboard (Workplace):
An area of working memory used to:
1. Describe the current problem
2. Record intermediate results
It records intermediate hypotheses and decisions:
1. Plan
2. Agenda
3. Solution

6. Explanation Subsystem (Justifier):
Traces responsibility and explains the ES behavior by interactively
answering questions: Why? How? What? (Where? When? Who?)

7. Knowledge Refining System:
Learning for improving performance

Human Element in Expert Systems


1. Expert: the expert has the special knowledge, judgment, experience and
methods to give advice and solve problems. He provides knowledge
about task performance.
2. Knowledge Engineer: the KE helps the expert structure the problem area by
interpreting and integrating human answers to questions, drawing
analogies, posing counterexamples, and bringing to light conceptual
difficulties. The KE is usually the system builder.
3. User: possible classes of users:
A non-expert client seeking direct advice (ES acts as a Consultant)
A student who wants to learn (ES acts as an Instructor)
An ES builder improving or increasing the knowledge base (ES acts as a Partner)
An expert (ES acts as a Colleague or Assistant)
The expert and the knowledge engineer should anticipate users'
needs and limitations when designing the ES.
4. Others: System Builder, Systems Analyst, Tool Builder, Vendors,
Support Staff, Network Expert

ES Shell

Includes all generic ES components, but no knowledge.
Example: EMYCIN from MYCIN (E = Empty)

Expert Systems Benefits:
1. Increased Output and Productivity
2. Enhancement of Problem Solving and Decision Making
3. Improved Decision Quality
4. Increased Process and Product Quality
5. Flexibility
6. Easier Equipment Operation
7. Operation in Hazardous Environments
8. Integration of Several Experts' Opinions
9. Can Work with Incomplete or Uncertain Information

Limitations of Expert Systems:
1. ES work well only in a narrow domain of knowledge
2. Lack of trust by end-users
3. Knowledge is not always readily available; expertise can be hard
to extract from humans
4. Each expert's approach may be different, yet correct
5. Expert system users have natural cognitive limits
6. Knowledge engineers are rare and expensive
7. ES may not be able to arrive at valid conclusions

Thanks
