
Chapter 2

Concept Learning
What is a Concept?
• Much of learning involves acquiring general
concepts from specific training examples
• Concept: subset of objects from some space
• Example: The concept of a bird is the subset of all objects (i.e., the set
of all things or all animals) that belong to the category of bird.

• Alternatively, a concept is a boolean-valued function defined over this
larger set
What is a Concept?
• Let there be a set of objects, X.
X = {White Fang, Scooby Doo, Wile E, Lassie}
• A concept C is…
A subset of X
C = dogs = {Lassie, Scooby Doo}
• A function that returns 1 only for elements in
the concept
C(Lassie) = 1, C(Wile E) = 0

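As a minimal sketch (in Python, with the illustrative names from this example), the two views are equivalent: a concept as a subset of X, or as the indicator function of that subset:

```python
# The set of all objects under consideration (the instance space X).
X = {"White Fang", "Scooby Doo", "Wile E", "Lassie"}

# View 1: the concept "dogs" as a subset of X.
dogs = {"Lassie", "Scooby Doo"}

# View 2: the same concept as a boolean-valued function over X.
def C(x):
    return 1 if x in dogs else 0

assert C("Lassie") == 1 and C("Wile E") == 0
```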
What is Concept-Learning?
• Concept learning: Defining a function that
specifies which elements are in the concept
set.
• Given a set of examples labeled as members
or non-members of a concept, concept-
learning consists of automatically inferring
the general definition of this concept.
• In other words, concept-learning consists of
approximating a boolean-valued function
from training examples of its input and
output.
Instance Representation
• Represent an object (or instance) as an n-
tuple of attributes
• Example: Days, represented as 6-tuples of the attributes Sky, AirTemp,
Humidity, Wind, Water, Forecast
Example Concept Function
• “Days on which my friend Ted enjoys his
favorite water sport”
Sky: Sunny, Cloudy, Rainy
AirTemp: Warm, Cold
Humidity: Normal, High
Wind: Strong, Weak
Water: Warm, Cool
Forecast: Same, Change
▪ Instances in the training set (out of the 3·2·2·2·2·2 = 96 possible):
x1: Sunny, Warm, Normal, Strong, Warm, Same → EnjoySport = Yes
x2: Sunny, Warm, High, Strong, Warm, Same → EnjoySport = Yes
x3: Rainy, Cold, High, Strong, Warm, Change → EnjoySport = No
x4: Sunny, Warm, High, Strong, Cool, Change → EnjoySport = Yes
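The instance space can be enumerated directly from the attribute domains above. A small Python sketch (representing each day as a 6-tuple of attribute values) confirms the count of 96 possible instances:

```python
from itertools import product

# Attribute domains of the EnjoySport example, in a fixed order.
domains = [
    ["Sunny", "Cloudy", "Rainy"],   # Sky
    ["Warm", "Cold"],               # AirTemp
    ["Normal", "High"],             # Humidity
    ["Strong", "Weak"],             # Wind
    ["Warm", "Cool"],               # Water
    ["Same", "Change"],             # Forecast
]

# Every instance (day) is one 6-tuple of attribute values.
instances = list(product(*domains))
print(len(instances))  # 96 = 3 * 2 * 2 * 2 * 2 * 2
```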
Hypotheses representation
• h is a set of constraints on attributes:
– a specific value: e.g. Water = Warm
– any value allowed: e.g. Water = ?
– no value allowed: e.g. Water = Ø

• Example hypothesis:
Sky AirTemp Humidity Wind Water Forecast
Sunny, ?, ?, Strong, ?, Same
Corresponding to boolean function:
Sunny(Sky) ∧ Strong(Wind) ∧ Same(Forecast)

• H, the hypothesis space: the set of all possible h
Hypothesis satisfaction
• An instance x satisfies an hypothesis h iff all
the constraints expressed by h are satisfied by
the attribute values in x.
• Example 1:
x1: Sunny, Warm, Normal, Strong, Warm, Same
h1: Sunny, ?, ?, Strong, ?, Same Satisfies? Yes

• Example 2:
x2: Sunny, Warm, Normal, Strong, Warm, Same
h2: Sunny, ?, ?, Ø, ?, Same Satisfies? No
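One way to make the satisfaction test concrete is to encode a hypothesis as a 6-tuple of constraints, using "?" for "any value allowed" and None for Ø ("no value allowed"). The sketch below (the function name `satisfies` is illustrative) reproduces the two examples above:

```python
def satisfies(x, h):
    """True iff instance x satisfies every attribute constraint of hypothesis h."""
    # A None constraint (standing for Ø) is never equal to "?" nor to an
    # attribute value, so a hypothesis containing Ø is satisfied by no instance.
    return all(c == "?" or c == v for c, v in zip(h, x))

x1 = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
h1 = ("Sunny", "?", "?", "Strong", "?", "Same")
h2 = ("Sunny", "?", "?", None, "?", "Same")      # Ø in the Wind position

print(satisfies(x1, h1))  # True
print(satisfies(x1, h2))  # False
```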
Formal task description
• Given:
– X all possible days, as described by the attributes
– A hypothesis space H: each hypothesis is a conjunction of constraints on the
attributes, representing a function h: X → {0, 1}
[h(x) = 1 if x satisfies h; h(x) = 0 if x does not satisfy h]
– A target concept c: X → {0, 1}, where
c(x) = 1 iff EnjoySport = Yes;
c(x) = 0 iff EnjoySport = No
– A training set of examples D, made of pairs ⟨x, c(x)⟩
• Goal: find a hypothesis h in H such that
h(x) = c(x) for all x in D
Hopefully h will be able to predict outside D…
The inductive learning assumption
▪ We can at best guarantee that the output
hypothesis fits the target concept over the
training data
▪ Assumption: a hypothesis that approximates the target function well over the
training data will also approximate it well over unobserved examples
▪ i.e., given a sufficiently large training set, the output hypothesis can be
used to make predictions on unseen instances
Concept learning as search
▪ Concept learning can be viewed as the task of searching a space of hypotheses
▪ The representation chosen for hypotheses determines the search space
▪ In the example we have:
▪ 3 × 2^5 = 96 possible instances (6 attributes)
▪ 1 + 4 × 3^5 = 973 semantically distinct hypotheses,
considering that all the hypotheses containing some Ø are semantically
equivalent (each of them classifies every instance as negative)
▪ Structuring the search space may help in searching more efficiently
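A quick sketch to check the two counts quoted above for the EnjoySport attributes:

```python
sizes = [3, 2, 2, 2, 2, 2]    # number of values per attribute

instances = 1
for s in sizes:
    instances *= s             # 3 * 2^5 = 96 possible instances

# All hypotheses containing Ø classify every instance as negative, so they
# collapse into a single semantic hypothesis; every other hypothesis allows
# either one specific value or "?" per attribute.
hypotheses = 1
for s in sizes:
    hypotheses *= (s + 1)
hypotheses += 1                # 1 + 4 * 3^5 = 973

print(instances, hypotheses)   # 96 973
```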
General to specific ordering
• Consider:
h1 = Sunny, ?, ?, Strong, ?, ?
h2 = Sunny, ?, ?, ?, ?, ?
• Any instance classified positive by h1 will also be classified
positive by h2
• Example - x1: Sunny, Warm, Normal, Strong, Warm, Same
• h2 is more general than h1
• Definition: hj ≥g hk iff (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
≥g: more general than or equal; >g: strictly more general
• Most general hypothesis: ?, ?, ?, ?, ?, ?
• Most specific hypothesis: Ø, Ø, Ø, Ø, Ø, Ø
More-General-Than Relation
• For any instance x in X and hypothesis h in H,
we say that x satisfies h if and only if h(x) = 1.
• More-General-Than-Or-Equal Relation:
Let h1 and h2 be two boolean-valued
functions defined over X.
Then h2 is more-general-than-or-equal-to h1
(written h2 ≥ h1) if and only if any instance
that satisfies h1 also satisfies h2.
• h2 is more-general-than h1 (h2 > h1) if and
only if h2 ≥ h1 is true and h1 ≥ h2 is false. We
also say that h1 is more-specific-than h2.
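A brute-force sketch of the relation, checking the implication in the definition by enumerating all 96 instances (encoding and helper names as in the earlier snippets):

```python
from itertools import product

DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]

def satisfies(x, h):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h2, h1):
    """h2 >= h1 iff every instance satisfying h1 also satisfies h2."""
    return all(satisfies(x, h2) for x in product(*DOMAINS) if satisfies(x, h1))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True:  h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False: the relation is not symmetric
```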
General to specific ordering: induced
structure

[Figure: hypotheses of H arranged by the more-general-than partial ordering,
from the most specific up to the most general]
Find-S: finding the most specific
hypothesis
• Exploiting the structure we have alternatives to
enumeration …
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
for each attribute constraint a in h:
if the constraint a is satisfied by x, then do nothing
else replace a in h by the next more general constraint satisfied by x
(move towards a more general hypothesis)
3. Output hypothesis h
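A sketch of FIND-S under the same tuple encoding ("?" = any value, None = Ø); run on the four EnjoySport training examples it produces the hypothesis Sunny, Warm, ?, Strong, ?, ? reported later:

```python
def find_s(examples):
    """examples: list of (instance_tuple, label) pairs, with label 1 or 0."""
    n = len(examples[0][0])
    h = [None] * n                       # most specific hypothesis (all Ø)
    for x, label in examples:
        if label != 1:                   # FIND-S ignores negative examples
            continue
        for i, v in enumerate(x):
            if h[i] is None:             # first positive example: take its value
                h[i] = v
            elif h[i] not in ("?", v):   # conflicting value: generalize to "?"
                h[i] = "?"
    return tuple(h)

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   1),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   1),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),
]
print(find_s(D))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```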
Find-S in action

h0: Ø, Ø, Ø, Ø, Ø, Ø
x1 = Sunny, Warm, Normal, Strong, Warm, Same (+) → h1: Sunny, Warm, Normal, Strong, Warm, Same
x2 = Sunny, Warm, High, Strong, Warm, Same (+) → h2: Sunny, Warm, ?, Strong, Warm, Same
x3 = Rainy, Cold, High, Strong, Warm, Change (−) → h3 = h2 (negative examples are ignored)
x4 = Sunny, Warm, High, Strong, Cool, Change (+) → h4: Sunny, Warm, ?, Strong, ?, ?
Problems with FIND-S Algorithm
• Has FIND-S converged to the correct target concept?
– Although FIND-S will find a hypothesis consistent with the training
data, it has no way to determine whether it has found the only
hypothesis in H consistent with the data (i.e., the correct target
concept), or whether there are many other consistent hypotheses as
well.
– We would prefer a learning algorithm that could determine whether it
had converged and, if not, at least characterize its uncertainty
regarding the true identity of the target concept.
• Why prefer the most specific hypothesis?
– In case there are multiple hypotheses consistent with the training
examples, FIND-S will find the most specific.
– It is unclear whether we should prefer this hypothesis over, say, the
most general, or some other hypothesis of intermediate generality.

Problems with FIND-S Algorithm
• Are the training examples consistent?
– In most practical learning problems there is some chance that the training
examples will contain at least some errors or noise.
– Such inconsistent sets of training examples can severely mislead FIND-S, given
the fact that it ignores negative examples.
– We would prefer an algorithm that could at least detect when the training
data is inconsistent and, preferably, accommodate such errors.
• What if there are several maximally specific consistent
hypotheses?
– In the hypothesis language H for the EnjoySport task, there is always a unique,
most specific hypothesis consistent with any set of positive examples.
– However, for other hypothesis spaces there can be several maximally specific
hypotheses consistent with the data.
– In this case, FIND-S must be extended to allow it to backtrack on its choices
of how to generalize the hypothesis, to accommodate the possibility that the
target concept lies along a different branch of the partial ordering than the
branch it has selected.
Candidate Elimination Algorithm
▪ FIND-S outputs one hypothesis from H that is consistent with the training
examples; this is just one of many hypotheses from H that might fit the
training data equally well
▪ The idea: output a description of the set of all
hypotheses consistent with the training examples
(correctly classify training examples).
▪ Version space: a representation of the set of
hypotheses which are consistent with training
examples D
1. an explicit list of hypotheses (List-Then-Eliminate)
2. a compact representation of the hypotheses which exploits the
more_general_than partial ordering (Candidate-Elimination)
Version space
• The version space VSH,D is the subset of hypotheses from H consistent with
the training examples in D:
VSH,D ≡ {h ∈ H | Consistent(h, D)}
• A hypothesis h is consistent with a set of training examples D iff
h(x) = c(x) for each example in D:
Consistent(h, D) ≡ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x)
Note: "x satisfies h" (h(x) = 1) is different from "h is consistent with x":
in particular, when a hypothesis h is consistent with a negative example
d = ⟨x, c(x) = No⟩, then x must not satisfy h
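The consistency test follows directly from this definition; a short sketch with the same encoding (D is a list of pairs of an instance and its label):

```python
def satisfies(x, h):
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, D):
    """True iff h(x) = c(x) for every training example <x, c(x)> in D."""
    return all(satisfies(x, h) == bool(label) for x, label in D)

h = ("Sunny", "Warm", "?", "Strong", "?", "?")
D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   1),
     (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0)]
print(consistent(h, D))  # True: h covers the positive and excludes the negative
```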
The List-Then-Eliminate algorithm
Version space as list of hypotheses
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example ⟨x, c(x)⟩:
remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace

• Problems
– The hypothesis space must be finite
– It enumerates all the hypotheses: rather inefficient
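A sketch of List-Then-Eliminate on the EnjoySport space: it enumerates the hypotheses without Ø (the single all-Ø hypothesis is left out, since it could only survive on all-negative data) and keeps those consistent with D:

```python
from itertools import product

DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]

def satisfies(x, h):
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, D):
    return all(satisfies(x, h) == bool(label) for x, label in D)

def list_then_eliminate(D):
    # Each attribute constraint is either a specific value or "?".
    all_hypotheses = product(*[values + ["?"] for values in DOMAINS])
    return [h for h in all_hypotheses if consistent(h, D)]

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   1),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   1),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),
]
for h in list_then_eliminate(D):
    print(h)   # prints the 6 hypotheses of the version space shown below
```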
Example Version Space
S: { ⟨Sunny, Warm, ?, Strong, ?, ?⟩ }
⟨Sunny, ?, ?, Strong, ?, ?⟩    ⟨Sunny, Warm, ?, ?, ?, ?⟩    ⟨?, Warm, ?, Strong, ?, ?⟩
G: { ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩ }

Note: the output of Find-S is just ⟨Sunny, Warm, ?, Strong, ?, ?⟩

▪ The version space is represented by its most general members G and its most
specific members S (the boundaries)
General and specific boundaries
▪ The Specific boundary, S, of version space VSH,D is the set of its minimally
general (most specific) members:
S ≡ {s ∈ H | Consistent(s, D) ∧ ¬∃s' ∈ H [(s >g s') ∧ Consistent(s', D)]}
Note: every positive example satisfies every member of S, but more specific
hypotheses fail to cover some positive example
▪ The General boundary, G, of version space VSH,D is the set of its maximally
general members:
G ≡ {g ∈ H | Consistent(g, D) ∧ ¬∃g' ∈ H [(g' >g g) ∧ Consistent(g', D)]}
Note: no negative example satisfies any member of G, but more general
hypotheses cover some negative example
Candidate Elimination Algorithm
• Initialize G to the set of maximally general hypotheses
in H
• Initialize S to the set of maximally specific hypotheses
in H
• For each training example d, do:
– If d is a positive example:
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d:
– remove s from S
– add to S all minimal generalizations h of s such that h is consistent with d
and some member of G is more general than h
– remove from S any hypothesis that is more general than another hypothesis in S
– If d is a negative example:
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d:
– remove g from G
– add to G all minimal specializations h of g such that h is consistent with d
and some member of S is more specific than h
– remove from G any hypothesis that is less general than another hypothesis in G
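A compact sketch of Candidate-Elimination for conjunctive hypotheses, maintaining only the S and G boundaries as described above (same tuple encoding as the earlier snippets; helper names are illustrative). For this hypothesis language the minimal generalization of a hypothesis is unique, so S stays a singleton and the "remove more general members of S" step is omitted:

```python
DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]

def satisfies(x, h):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(g, s):
    # Attribute-wise test, sufficient for the hypotheses handled here.
    return all(gc == "?" or gc == sc or sc is None for gc, sc in zip(g, s))

def min_generalization(s, x):
    # The unique minimal generalization of s that covers the positive instance x.
    return tuple(xv if sc is None else (sc if sc == xv else "?")
                 for sc, xv in zip(s, x))

def min_specializations(g, x):
    # All minimal specializations of g that exclude the negative instance x.
    return [g[:i] + (v,) + g[i + 1:]
            for i, gc in enumerate(g) if gc == "?"
            for v in DOMAINS[i] if v != x[i]]

def candidate_elimination(examples):
    S = {(None,) * 6}                  # most specific boundary: all Ø
    G = {("?",) * 6}                   # most general boundary: all ?
    for x, label in examples:
        if label:                      # positive example
            G = {g for g in G if satisfies(x, g)}
            S = {s if satisfies(x, s) else min_generalization(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:                          # negative example
            S = {s for s in S if not satisfies(x, s)}
            new_G = set()
            for g in G:
                if satisfies(x, g):    # g covers the negative example: specialize
                    new_G |= {h for h in min_specializations(g, x)
                              if any(more_general_or_equal(h, s) for s in S)}
                else:
                    new_G.add(g)
            # keep only the maximally general members of G
            G = {g for g in new_G
                 if not any(g != g2 and more_general_or_equal(g2, g)
                            for g2 in new_G)}
    return S, G

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   1),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   1),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),
]
S, G = candidate_elimination(D)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```

Running it on the four EnjoySport examples reproduces, step by step, the boundary sets S0–S4 and G0–G4 traced on the following slides.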
Example: initially
S0: ⟨Ø, Ø, Ø, Ø, Ø, Ø⟩

G0: ⟨?, ?, ?, ?, ?, ?⟩
Example:
after X1 = Sunny, Warm, Normal, Strong, Warm, Same  +

S0: , , , , . 

S1: Sunny,Warm, Normal, Strong, Warm, Same

G0 , G1 ?, ?, ?, ?, ?, ?

26
Example:
after X2 = Sunny, Warm, High, Strong, Warm, Same +

S1: Sunny,Warm, Normal, Strong, Warm, Same

S2: Sunny,Warm, ?, Strong, Warm, Same

G1 , G2 ?, ?, ?, ?, ?, ?

27
Example:
after x3 = Rainy, Cold, High, Strong, Warm, Change  −

S2, S3: Sunny, Warm, ?, Strong, Warm, Same

G3 : Sunny, ?, ?, ?, ?, ? ?, Warm, ?, ?, ?, ? ?, ?, ?, ?, ?, Same

G2: ?, ?, ?, ?, ?, ?

• Make general hypothesis (G2) more specific


• Comparing all the attributes of the negative instance with the positive
instance
• If attribute found different, create a dedicated set for the attribute. 28
• Specific hypothesis (S2) remains the same
Example:
after x4 = Sunny, Warm, High, Strong, Cool Change  +

S3 Sunny, Warm, ?, Strong, Warm, Same

S4 Sunny, Warm, ?, Strong, ?, ?

G4 : Sunny, ?, ?, ?, ?, ? ?, Warm, ?, ?, ?, ?

G3 : Sunny, ?, ?, ?, ?, ? ?, Warm, ?, ?, ?, ? ?, ?, ?, ?, ?, Same

One member of G3 is also removed because it fails to cover this new


positive training example
29
Learned Version Space
S: { ⟨Sunny, Warm, ?, Strong, ?, ?⟩ }
⟨Sunny, ?, ?, Strong, ?, ?⟩    ⟨Sunny, Warm, ?, ?, ?, ?⟩    ⟨?, Warm, ?, Strong, ?, ?⟩
G: { ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩ }

This is the version space obtained after the four training examples: the six
hypotheses lying between the S and G boundaries.
Observations
▪ The learned Version Space correctly describes the
target concept, provided:
1. There are no errors in the training examples
2. There is some hypothesis that correctly describes the
target concept
▪ If S and G converge to a single hypothesis the
concept is exactly learned
▪ In case of errors in the training data, useful hypotheses are discarded and
no recovery is possible
▪ An empty version space means that no hypothesis in H is consistent with the
training examples
Ordering on training examples
▪ The learned version space does not change
with different orderings of training examples
▪ Efficiency does
▪ Optimal strategy (if you are allowed to choose):
generate instances that satisfy exactly half the hypotheses in the current
version space. For example:
Sunny, Warm, Normal, Light, Warm, Same satisfies 3 of the 6 hypotheses
▪ Ideally the version space can be halved at each experiment
▪ The correct target is then found in about log2 |VS| experiments
Use of partially learned concepts
Example to Classify: an instance that satisfies every hypothesis in S

• Classified as positive by every hypothesis in the version space, since it
satisfies every hypothesis in S (and every member of the version space is at
least as general as some member of S)
Classifying new examples
Example to Classify: an instance that satisfies no hypothesis in G

• Classified as negative by every hypothesis in the version space, since it
does not satisfy any hypothesis in G (and every member of the version space is
at least as specific as some member of G)
Classifying new examples
Example to Classify:
Sunny, Warm, Normal, Light, Warm, Same

• Uncertain classification: half the hypotheses (3 of 6) classify it as
positive, half classify it as negative
Classifying new examples
Example to Classify:
Sunny, Cold, Normal, Strong, Warm, Same

• 4 hypotheses not satisfied; 2 satisfied

• Probably a negative instance. Majority vote?
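A sketch of this majority-vote idea over the version space learned earlier (the six hypotheses listed under "Learned Version Space"), using the same tuple encoding:

```python
def satisfies(x, h):
    return all(c == "?" or c == v for c, v in zip(h, x))

# The six hypotheses of the version space learned from the four examples.
VERSION_SPACE = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),   # S boundary
    ("Sunny", "?",    "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?",      "?", "?"),
    ("?",     "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?",    "?", "?",      "?", "?"),   # G boundary
    ("?",     "Warm", "?", "?",      "?", "?"),   # G boundary
]

def vote(x):
    positive = sum(satisfies(x, h) for h in VERSION_SPACE)
    return positive, len(VERSION_SPACE) - positive

x = ("Sunny", "Cold", "Normal", "Strong", "Warm", "Same")
print(vote(x))  # (2, 4): most hypotheses classify it as negative
```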
