The lecture is based on the seminal paper by Tom Mitchell from 1977: T. M. Mitchell, "Version Spaces: A Candidate Elimination Approach to Rule Learning", Proceedings of the 5th International Joint Conference on Artificial Intelligence, Cambridge, MA, August 1977, pp. 305-310.
Generalization as Search
Given
• A language in which to describe instances (the instance language).
• A set of positive and negative training instances of a target generalization to be learned.
• A language in which to describe generalizations (the hypothesis language).
• The hypothesis language spans a hypothesis space H.
• H is a partially ordered set (poset) under the "more specific than" order relation, which relates a more general concept to a more specific one.
• A matching predicate M that can test whether a given instance and generalization match.

[Figure: hypotheses H1-H10 arranged in the hypothesis space, from specific (H1, H2, H3) to general (H7, H8, H9, H10).]

Determine
Generalizations within the provided hypothesis language that are consistent with the presented training instances. A generalization is consistent if and only if it matches every positive instance and no negative instance in the set.
The approach is feasible under two constraints:
(1) the training instances contain no errors, and
(2) it is possible to formulate a correct description of the target generalization within the given generalization language.
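The consistency criterion above can be stated directly as a small predicate. The following is a minimal sketch; the simple wildcard matcher and the example data are illustrative assumptions, not part of the lecture's formal languages.

```python
def matches(h, x):
    # Illustrative matcher: '?' is a wildcard, any other feature value
    # must agree exactly with the instance's value.
    return all(p == "?" or p == v for p, v in zip(h, x))

def consistent(h, positives, negatives):
    # A generalization is consistent iff it matches every positive
    # instance and no negative instance.
    return (all(matches(h, x) for x in positives)
            and not any(matches(h, x) for x in negatives))

positives = [("Large", "Red"), ("Large", "Yellow")]
negatives = [("Small", "Red")]
consistent(("Large", "?"), positives, negatives)   # → True
```

The hypothesis ("Large", "?") covers both positives while excluding the negative, so it satisfies the definition; ("?", "Red") would fail because it misses the second positive instance.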
The Hypothesis (Generalization) Language
The choice of a hypothesis language has a major influence on the capabilities and efficiency of the learning system. In choosing a generalization language, the designer fixes the domain of generalizations which the program may describe, and therefore learn.
Most current systems employ generalization languages that are biased in the sense
that they are capable of representing only some of the possible sets of describable
instances. This bias constitutes both a strength and a weakness for the system:
1. if the bias is inappropriate, it can prevent the system from ever inferring
correct generalizations.
2. if the bias is appropriate, it can provide the basis for important inductive leaps
beyond information directly available from the training instances.
The choice of generalization language also has a strong influence on the resource
requirements of the system:
3. The complexity for hypotheses represented by graphs can be exponential, while
the complexity for hypotheses represented by feature vectors is linear.
4. A language for which the general-to-specific ordering is shallow and branchy will typically yield a larger hypothesis space than a language in which the ordering is narrow but deep.
The "more-specific-than" relation between hypotheses
Given two generalizations G1 and G2, G1 is more-specific-than G2 if and only if the set of instances that G1 matches is a proper subset of the set of instances that G2 matches, considering all instances I and the matching predicate M.

[Figure: e.g., Car is more-specific-than Vehicle.]

Note that the definition of the relation is extensional: it is based upon the instance sets that the generalizations cover. Also note that the definition of this relation depends only on the defined instance language, hypothesis language, and matching predicate.
The word "partial" in the name "partially ordered set indicates that not
every pair of elements needs to be directly ordered.
Example: The Hasse diagram of
That is, there may be pairs of elements for which neither element the set of all subsets of a three-
precedes the other in the Poset. In contrast in a total ordering every pair element set {x, y, z}, ordered by
is directly ordered. inclusion.
Example
Throughout this lecture a simple example will be used, with a fixed pair of instance and hypothesis languages.
In this example, the instances are unordered pairs of simple objects characterized by three properties. Each object is described by
- its shape (e.g., square, circle, triangle),
- its color (e.g., red, orange, yellow) and
- its size (e.g., large, small).
The instance language
The observation (instance) language describes each instance as an unordered
pair of feature vectors.
Example Instance 1: { (Large Red Square) (Small Yellow Circle) }
The hypothesis language
The hypothesis language is the same as the instance language, with the one addition that any feature value may be a wildcard (question mark).
Example Generalization 1: {(Small ? Circle) (Large ? ?) }
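The matching predicate M for this pair language must account for the instances being unordered: the two patterns in a generalization may be paired with the two objects in either assignment. The following is a small sketch of such a matcher; the function names and the tuple encoding of objects are illustrative choices.

```python
def object_match(pat, obj):
    # A pattern such as ('Small', '?', 'Circle') matches an object's
    # (size, color, shape) vector feature by feature; '?' matches anything.
    return all(p == "?" or p == v for p, v in zip(pat, obj))

def M(h, instance):
    # Matching predicate for unordered pairs: the two patterns in h must
    # match the two objects of the instance, in either assignment.
    (p1, p2), (o1, o2) = h, instance
    return ((object_match(p1, o1) and object_match(p2, o2)) or
            (object_match(p1, o2) and object_match(p2, o1)))

# Example Generalization 1 and Example Instance 1 from the text:
g1 = (("Small", "?", "Circle"), ("Large", "?", "?"))
i1 = (("Large", "Red", "Square"), ("Small", "Yellow", "Circle"))
M(g1, i1)   # → True
```

Here (Small ? Circle) matches the second object and (Large ? ?) matches the first, so the generalization matches the instance even though the written order differs.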
Example of a partially ordered hypothesis space
[Figure: G1, G2, and G3 arranged from specific to general; G1 lies below G2 in the ordering, while G3 is comparable to neither.]

G1: {(Large Red Circle) (Large ? ?)}
G2: {(? ? Circle) (Large ? ?)}
G3: {(? ? Circle) (Large Blue ?)}

Here, G1 is more-specific-than G2: the constraints in G2 are logically implied by those in G1, and therefore any instance which matches G1 must also match G2. Although the sets of instances characterized by G3 and G1 intersect, neither set contains the other.
Generalization Strategies
Generate-and-test strategies judge the generated hypotheses by their performance over many instances rather than making decisions based upon individual training instances, so they can accommodate quite severe errors in the training data.
On the other hand, generate-and-test strategies are not well suited to incremental
processing of training data -- should unexpected data become available, the
generate-and-test search may have to be completely re-executed.
Furthermore, since the generation of hypotheses is not influenced by the data, the
search can be quite branchy and expensive.
Comparison of three Data-Driven Generalization Strategies
Many generalization programs employ search strategies that are data-driven, in the sense that
they consider discrepancies between the current hypothesis and available data in order to
determine appropriate revisions to the current hypothesis.
The aim of this lecture is to study three alternative classes of data-driven strategies:
- Depth-first search
- Specific-to-general breadth-first search
- The version space approach
The search starts by choosing a current best hypothesis (CBH) consistent with the first observation (instance). The CBH is then tested against each new training instance and altered as needed, making the definition either more restrictive (for negative observations) or more relaxed (for positive observations). In doing so, one must look back at all of the previous observations to ensure that consistency is maintained.
At each step there are many different ways of changing the CBH. Depth-first search explores these options through traditional backtracking.
Depth-first search example
When the second positive instance is observed, CBH1 must be revised so that it will
match the new positive instance. Notice that there are many plausible revisions to
CBH1 in addition to CBH2.
The third training instance conflicts with CBH2. In this case, although CBH2 could be specialized to exclude the new negative instance, no such revision is consistent with the observed positive instances.
Therefore, the system must backtrack to an earlier version of the CBH, reconsidering
its previous revisions to determine a revision CBH3 that will be consistent with the
new negative instance as well as the observed positive instances.
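The revision-with-backtracking scheme described above can be sketched in a few lines. To keep the sketch short, hypotheses here are single feature vectors with '?' wildcards rather than the lecture's unordered pairs, positive misses are repaired by minimal generalization, and negative matches by trying each way of filling a wildcard with a value from a previously seen positive; all names are illustrative assumptions, not Mitchell's notation.

```python
def matches(h, x):
    # '?' is a wildcard; other features must agree with the instance.
    return all(hf == "?" or hf == xf for hf, xf in zip(h, x))

def consistent(h, seen):
    # h must classify every already-processed (instance, label) pair correctly.
    return all(matches(h, x) == label for x, label in seen)

def dfs_learn(h, pending, seen):
    # Depth-first revision of the current best hypothesis (CBH).
    # Returns a consistent hypothesis, or None if this branch fails.
    if not pending:
        return h
    x, label = pending[0]
    rest, seen2 = pending[1:], seen + [(x, label)]
    if matches(h, x) == label:                 # no revision needed
        return dfs_learn(h, rest, seen2)
    if label:                                  # missed positive: relax
        g = tuple(hf if hf == xf else "?" for hf, xf in zip(h, x))
        return dfs_learn(g, rest, seen2) if consistent(g, seen2) else None
    for i, hf in enumerate(h):                 # matched negative: restrict
        if hf != "?":
            continue
        for p, lbl in seen:                    # borrow a value from a positive
            if lbl and p[i] != x[i]:
                s = h[:i] + (p[i],) + h[i + 1:]
                if consistent(s, seen2):
                    result = dfs_learn(s, rest, seen2)
                    if result is not None:     # else: backtrack, try next choice
                        return result
    return None

data = [(("Large", "Red", "Square"), True),
        (("Small", "Red", "Circle"), True),
        (("Small", "Yellow", "Circle"), False)]
h = dfs_learn(data[0][0], data[1:], [data[0]])   # h == ("?", "Red", "?")
```

The first positive instance seeds the CBH; the second forces a relaxation to ("?", "Red", "?"); the third (negative) already fails to match, so no further revision or backtracking is needed in this run.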
To be continued in Part 2