
NPTEL

Video Course on Machine Learning

Professor Carl Gustaf Jansson, KTH

Week 4 Inductive Learning based on Symbolic Representations and Weak Theories

Video 4.2 Generalization as Search Part 1


Agenda for the Lecture
Generalization as search in general
The hypothesis language
The 'more specific than' relation
Partial orderings
Generate-and-test strategies for generalization
Data-driven strategies for generalization:
  Depth-first search
  Breadth-first search
  Version Space approach
Overview
The purpose of this lecture is to compare various approaches to generalization in terms of a single framework, which we term Generalization as Search. Toward this end, generalization is cast as a search problem, and alternative methods for generalization are characterized in terms of the search strategies that they employ.

The lecture is based on the seminal paper by Tom Mitchell: "Version Spaces: A Candidate Elimination Approach to Rule Learning", T. M. Mitchell, Proceedings of the 5th International Joint Conference on Artificial Intelligence, Cambridge, MA, August 1977, pp. 305-310.
Generalization as Search
Given:
• A language in which to describe instances = instance language.
• A set of positive and negative training instances of a target generalization to be learned.
• A language in which to describe generalizations = hypothesis language.
• The hypothesis language spans a hypothesis space H.
• H is a partially ordered set (poset) under the 'more specific than' relation, which relates a more general concept to a more specific one.
• A matching predicate M that can test whether a given instance and generalization match.
Determine:
Generalizations within the provided hypothesis language that are consistent with the presented training instances; a generalization is consistent if and only if it matches every positive instance and no negative instance in the set.
Constraints for the feasibility of the approach:
(1) the training instances contain no errors, and
(2) it is possible to formulate a correct description of the target generalization within the given generalization language.
[Figure: a hypothesis space of hypotheses H1-H10, ordered from specific (H1, H2, H3) to general (H7, H8, H9, H10).]
The Hypothesis (Generalization) Language
The choice of hypothesis language has a major influence on the capabilities and efficiency of the learning system. In choosing a generalization language, the designer fixes the domain of generalizations which the program may describe, and therefore learn.

Most current systems employ generalization languages that are biased, in the sense that they are capable of representing only some of the possible sets of describable instances. This bias constitutes both a strength and a weakness for the system:
1. If the bias is inappropriate, it can prevent the system from ever inferring correct generalizations.
2. If the bias is appropriate, it can provide the basis for important inductive leaps beyond information directly available from the training instances.

The choice of generalization language also has a strong influence on the resource requirements of the system:
3. The complexity of operations on hypotheses represented as graphs can be exponential, while for hypotheses represented as feature vectors it is linear.
4. A language for which the general-to-specific ordering is shallow and branchy will typically yield a larger hypothesis space than a language in which the ordering is narrow but deep.
The 'more-specific-than' relation between Hypotheses
Given two generalizations G1 and G2, G1 is more-specific-than G2 if and only if the set of instances that G1 matches is a proper subset of the set of instances that G2 matches, considering all instances i and the matching predicate M.

Note that the definition of the relation is extensional, based upon the instance sets that the generalizations cover. Also note that the definition of this relation depends only on the defined instance language, hypothesis language and matching predicate.

[Figure: e.g., Car is more-specific-than Vehicle.]

The more-specific-than relation imposes a partial ordering over the generalizations in the hypothesis space. It provides a powerful basis for organizing the search through the hypothesis space.

In order for the 'more-specific-than' relation to be practically computable by a computer program, it must be possible to determine whether G1 is more-specific-than G2 by examining the descriptions of G1 and G2, without computing the (possibly infinite) sets of instances that they match. This requirement places restrictions upon the nature of generalization languages.
Partial ordering
In mathematics, a partially ordered set (poset) formalizes the ordering of the elements of a set.

A poset consists of a set together with a binary relation that orders pairs of elements in the set, such that one of the elements precedes the other in the ordering.

The word "partial" in the name "partially ordered set" indicates that not every pair of elements needs to be directly ordered. That is, there may be pairs of elements for which neither element precedes the other in the poset. In contrast, in a total ordering every pair is directly ordered.

[Figure: the Hasse diagram of the set of all subsets of a three-element set {x, y, z}, ordered by inclusion.]
Example
Throughout this lecture a simple example will be used, based on a pair of instance and hypothesis languages.
In this example, the instances are unordered pairs of simple objects characterized
by three properties. Each object is described by
- its shape (e.g., square, circle, triangle),
- its color (e.g., red, orange, yellow) and
- its size (e.g., large, small).
The instance language
The observation (instance) language describes each instance as an unordered pair of feature vectors.
Example instance 1: { (Large Red Square) (Small Yellow Circle) }
The hypothesis language
The hypothesis language is equivalent to the instance language, with the only addition that feature values can be a wildcard (question mark).
Example generalization 1: { (Small ? Circle) (Large ? ?) }
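As a sketch, the matching predicate M for these languages might be implemented as below. The representation (tuples of feature strings, `'?'` as wildcard) and the function names are illustrative assumptions, not from the original lecture material:

```python
def feature_match(pattern, obj):
    """True if every feature of obj is matched by pattern ('?' matches anything)."""
    return all(p == '?' or p == f for p, f in zip(pattern, obj))

def M(hypothesis, instance):
    """Matching predicate: the pair is unordered, so try both assignments of
    instance objects to hypothesis patterns."""
    (p1, p2), (o1, o2) = hypothesis, instance
    return ((feature_match(p1, o1) and feature_match(p2, o2)) or
            (feature_match(p1, o2) and feature_match(p2, o1)))

g1 = (('Small', '?', 'Circle'), ('Large', '?', '?'))
i  = (('Large', 'Red', 'Square'), ('Small', 'Yellow', 'Circle'))
print(M(g1, i))  # True: the small yellow circle matches the first pattern
```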
Example of a partially ordered hypothesis space

G1: { (Large Red Circle) (Large ? ?) }
G2: { (? ? Circle) (Large ? ?) }
G3: { (? ? Circle) (Large Blue ?) }

[Figure: G1, G2 and G3 in the hypothesis space, ordered from specific to general.]

Here, G1 is more-specific-than G2: the constraints in G2 are logically implied by those in G1, and therefore any instance which matches G1 must also match G2.

In contrast, G3 and G1 are not comparable generalizations according to the more-specific-than relation. Although the sets of instances characterized by G3 and G1 intersect, neither set contains the other.
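For this language, the relation can be computed syntactically, as required above, by comparing descriptions feature by feature. The sketch below is an illustrative assumption (names and representation are mine); strictly, it computes 'more specific than or equal to', since the proper-subset condition would additionally require the instance sets to differ:

```python
def at_least_as_specific(a, b):
    """Syntactic check on single patterns: a is at least as specific as b if
    every constant feature of b agrees with a (b may have '?' where a is constant)."""
    return all(bf == '?' or bf == af for af, bf in zip(a, b))

def more_specific(g, h):
    """g more-specific-than (or equal to) h, checked over both pairings of the
    two patterns, since the pairs are unordered."""
    (a1, a2), (b1, b2) = g, h
    return ((at_least_as_specific(a1, b1) and at_least_as_specific(a2, b2)) or
            (at_least_as_specific(a1, b2) and at_least_as_specific(a2, b1)))

G1 = (('Large', 'Red', 'Circle'), ('Large', '?', '?'))
G2 = (('?', '?', 'Circle'), ('Large', '?', '?'))
G3 = (('?', '?', 'Circle'), ('Large', 'Blue', '?'))

print(more_specific(G1, G2))                         # True: G1 precedes G2
print(more_specific(G1, G3), more_specific(G3, G1))  # False False: not comparable
```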
Generalization Strategies

Data-driven strategies: the instance space is traversed and, as a consequence, the hypothesis space is searched.
- Depth-first search
- Breadth-first search
- Version Space strategy

Generate-and-test strategies: the hypothesis space is traversed and, as a consequence, the instance space is searched.
- General to specific
- Specific to general
Generate-and-Test Strategies
Generate-and-test strategies generate new hypotheses according to a predetermined procedure that is independent of the input data. Each newly generated hypothesis is then tested against the entire set of available training data, and identified as either an acceptable generalization, a node to be expanded further by the generator, or a node to be pruned from the search.

Generate-and-test strategies typically consider all available training instances at each step of the search to test newly generated hypotheses. Because they judge the generated hypotheses by their performance over many instances rather than making decisions based upon individual training instances, they can accommodate quite severe errors in the training data.

On the other hand, generate-and-test strategies are not well suited to incremental processing of training data: should unexpected data become available, the generate-and-test search may have to be completely re-executed. Furthermore, since the generation of hypotheses is not influenced by the data, the search can be quite branchy and expensive.
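A generate-and-test strategy can be sketched as follows, here for a deliberately simplified language of single feature vectors rather than pairs: hypotheses are enumerated in a fixed, data-independent order, and each is tested against the full training set. Feature values and function names are illustrative assumptions:

```python
from itertools import product

# The generator enumerates every hypothesis in a fixed order,
# independent of the training data.
SIZES  = ['Large', 'Small', '?']
COLORS = ['Red', 'Blue', '?']
SHAPES = ['Triangle', 'Circle', '?']

def matches(h, obj):
    return all(x == '?' or x == f for x, f in zip(h, obj))

def generate_and_test(pos, neg):
    """Yield every hypothesis consistent with all positive and no negative instances."""
    for h in product(SIZES, COLORS, SHAPES):
        if all(matches(h, i) for i in pos) and not any(matches(h, i) for i in neg):
            yield h

pos = [('Large', 'Red', 'Triangle'), ('Large', 'Blue', 'Triangle')]
neg = [('Small', 'Red', 'Circle')]
print(list(generate_and_test(pos, neg)))
# -> [('Large', '?', 'Triangle'), ('Large', '?', '?'), ('?', '?', 'Triangle')]
```

Note how every hypothesis is tested against all instances at once; this is what makes the strategy robust to noisy data but expensive to re-run incrementally.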
Comparison of three Data-Driven Generalization Strategies
Many generalization programs employ search strategies that are data-driven, in the sense that
they consider discrepancies between the current hypothesis and available data in order to
determine appropriate revisions to the current hypothesis.

The aim of this lecture is to study three alternative classes of data-driven strategies:
- Depth-first search
- Specific-to-general breadth-first search
- The Version Space approach

For each one of these we will:
- Give a qualitative characterization
- Sketch a prototypical algorithm
- Trace a simple example
- Give some comments on the trace
Depth-first Search Strategy
One data-driven strategy for generalizing from examples is depth-first search. In this strategy, a single generalization is kept as the current best hypothesis (CBH) for describing the identity of the target generalization.

The search starts by choosing a CBH consistent with the first observation (instance). The CBH is then tested against each new training instance, and is altered as needed, making the definition either more restrictive (for negative observations) or more relaxed (for positive observations). In doing so, one must look back at all of the previous observations to ensure that consistency is maintained.

In each step there are many different ways of changing the CBH. Depth-first search explores all of these options through traditional backtracking.

Drawbacks of the depth-first approach:
• It is costly to test each alteration to the current hypothesis for consistency with past training instances.
• In the event that subsequent instances reveal an incorrect choice has been made, the program must backtrack to reconsider previously processed training instances and generalizations.
Depth-first Search Strategy

Initialize CBH to a generalization consistent with the first positive instance.
For each subsequent instance i:
begin
  if i is a negative instance and i matches CBH then
  begin
    - Consider ways of making CBH more specific so that i no longer matches it.
    - Test these possible revisions to find those that match all earlier positive instances.
    - Choose one acceptable revision as the new CBH.
  end
  else if i is a positive instance and i does not match CBH then
  begin
    - Consider ways of making CBH more general so that i matches it.
    - Test these possible revisions to find those that do not match any earlier negative instance.
    - Choose one acceptable revision as the new CBH.
  end
  if none of the considered revisions to CBH results in a generalization consistent with previous instances as well as i,
  then backtrack to an earlier version of CBH, try a different branch in the search, and reprocess the instances that have been processed since that point.
end
Depth-first Search trace
Training instances:

Instance 1: { (Large Red Triangle) (Small Blue Circle) } positive
Instance 2: { (Large Blue Circle) (Small Red Triangle) } positive
Instance 3: { (Large Blue Triangle) (Small Blue Triangle) } negative

Search:

CBH1: { (Large Red Triangle) (Small Blue Circle) }
CBH2: { (Large ? ?) (Small ? ?) }
CBH3: { (? Red Triangle) (? Blue Circle) }
Comments to the Depth-first Search trace
The first positive training instance leads to initializing the current best hypothesis to
CBH1, which matches no instances other than the first positive instance.

When the second positive instance is observed, CBH1 must be revised so that it will
match the new positive instance. Notice that there are many plausible revisions to
CBH1 in addition to CBH2.

The third training instance conflicts with CBH2. In this case, although CBH2 could be specialized to exclude the new negative instance, no such revision is consistent with the observed positive instances.

Therefore, the system must backtrack to an earlier version of the CBH, reconsidering its previous revisions to determine a revision, CBH3, that is consistent with the new negative instance as well as the observed positive instances.
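The trace above can be reproduced with a small Python sketch of the depth-first strategy. This is a deliberate simplification of the algorithm, with all names my own: when a negative instance conflicts with the CBH, the sketch simply fails and backtracks to an earlier choice point rather than attempting to specialize the CBH, which is sufficient for this example:

```python
def feature_match(p, o):
    return all(x == '?' or x == f for x, f in zip(p, o))

def matches(h, inst):
    (p1, p2), (o1, o2) = h, inst
    return ((feature_match(p1, o1) and feature_match(p2, o2)) or
            (feature_match(p1, o2) and feature_match(p2, o1)))

def merge(p, o):
    """Minimal generalization of pattern p covering object o."""
    return tuple(x if x == f else '?' for x, f in zip(p, o))

def generalizations(h, inst):
    """Minimal relaxations of h covering inst: one per pairing of objects to patterns."""
    (p1, p2), (o1, o2) = h, inst
    return [(merge(p1, o1), merge(p2, o2)), (merge(p1, o2), merge(p2, o1))]

def consistent(h, pos, neg):
    return all(matches(h, i) for i in pos) and not any(matches(h, i) for i in neg)

def dfs(cbh, remaining, pos, neg):
    """Depth-first search over revision choices; returns a consistent CBH or None."""
    if not remaining:
        return cbh
    (inst, label), rest = remaining[0], remaining[1:]
    if label == '+':
        pos = pos + [inst]
        if matches(cbh, inst):
            return dfs(cbh, rest, pos, neg)
        for h in generalizations(cbh, inst):   # choice point: backtrack on failure
            if consistent(h, pos, neg):
                result = dfs(h, rest, pos, neg)
                if result:
                    return result
        return None
    neg = neg + [inst]
    if not matches(cbh, inst):
        return dfs(cbh, rest, pos, neg)
    return None  # simplification: fail and backtrack instead of specializing CBH

i1 = (('Large', 'Red', 'Triangle'), ('Small', 'Blue', 'Circle'))
i2 = (('Large', 'Blue', 'Circle'), ('Small', 'Red', 'Triangle'))
i3 = (('Large', 'Blue', 'Triangle'), ('Small', 'Blue', 'Triangle'))

# CBH1 is initialized to the first positive instance itself.
result = dfs(i1, [(i2, '+'), (i3, '-')], [i1], [])
print(result)  # -> (('?', 'Red', 'Triangle'), ('?', 'Blue', 'Circle')), i.e. CBH3
```

The search first tries CBH2 = { (Large ? ?) (Small ? ?) }, finds it matches the negative instance, backtracks, and settles on CBH3, exactly as in the trace.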
To be continued in Part 2
