
Machine Learning

Bilal Khan
SEARCH IN CONCEPT SPACE


Concept Learning by Induction


• Learning has been classified into several types: deductive, inductive, etc.

• Much of human learning involves acquiring general concepts from specific training examples (this is called inductive learning)



• Example: Concept of ball
* red, round, small
* green, round, small
* red, round, medium



• Each concept can be thought of as a Boolean-valued function whose
value is true for some inputs and false for all the rest
(e.g. a function defined over all the animals, whose value is true for
birds and false for all the other animals)
• This lecture is about the problem of automatically inferring the general definition of some concept, given examples labeled as members or nonmembers of the concept. This task is called concept learning, or approximating (inferring) a Boolean-valued function from examples


• Target Concept to be learnt: “Days on which Ahmed enjoys his favorite water sport”

• Training Examples present (Mitchell, Table 2.1) are:

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
  1      Sunny  Warm     Normal    Strong  Warm   Same      Yes
  2      Sunny  Warm     High      Strong  Warm   Same      Yes
  3      Rainy  Cold     High      Strong  Warm   Change    No
  4      Sunny  Warm     High      Strong  Cool   Change    Yes



• The training examples are described by the values of seven “Attributes”

• The task is to learn to predict the value of the attribute EnjoySport for an arbitrary day, based on the values of its other attributes


Concept (Hypothesis) Representation


Concept Learning by Induction: Hypothesis Representation

• The possible concepts are called Hypotheses and we need an appropriate representation for the hypotheses

• Let the hypothesis be a conjunction of constraints on the attribute-values


• If
    (sky = sunny) ∧ (temp = warm) ∧ (humidity = ?) ∧
    (wind = strong) ∧ (water = ?) ∧ (forecast = same)
  then
    EnjoySport = Yes
  else
    EnjoySport = No

• Alternatively, this can be written as:
    {sunny, warm, ?, strong, ?, same}



• For each attribute, the hypothesis will have either
    ?      Any value is acceptable
    Value  Only that single value is acceptable
    ∅      No value is acceptable


• If some instance (example/observation) satisfies all the constraints of a hypothesis, then it is classified as positive (belonging to the concept)

• The most general hypothesis is {?, ?, ?, ?, ?, ?}
  It would classify every example as a positive example

• The most specific hypothesis is {∅, ∅, ∅, ∅, ∅, ∅}
  It would classify every example as negative
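As a sketch, the matching rule above can be written in a few lines of Python. The attribute values and the use of None to stand in for the ∅ constraint are choices made here for illustration, not part of the lecture:

```python
# A hypothesis is a tuple of constraints: '?' means any value is
# acceptable, a specific string means only that value is acceptable,
# and None stands in for the "no value is acceptable" constraint.
def matches(hypothesis, instance):
    """Classify an instance as positive iff it satisfies every constraint."""
    return all(c == '?' or c == v for c, v in zip(hypothesis, instance))

most_general = ('?', '?', '?', '?', '?', '?')         # accepts everything
most_specific = (None, None, None, None, None, None)  # rejects everything

day = ('sunny', 'warm', 'normal', 'strong', 'warm', 'same')
print(matches(most_general, day))    # True: every instance is positive
print(matches(most_specific, day))   # False: every instance is negative
```

Because None never equals any attribute value, a single ∅ constraint is enough to make the hypothesis reject every instance, which matches the semantics described above.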


By selecting a hypothesis representation, the space of all hypotheses (that the program can ever represent and therefore can ever learn) is implicitly defined

In our example, the instance space X can contain 3·2·2·2·2·2 = 96 distinct instances


If we use the hypothesis representation described above, there are 5·4·4·4·4·4 = 5120 syntactically distinct hypotheses

Since every hypothesis containing even one ∅ classifies every instance as negative, the number of semantically distinct hypotheses is 4·3·3·3·3·3 + 1 = 973
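These counts can be checked with a short Python calculation; the per-attribute value counts are the ones from the EnjoySport example:

```python
from math import prod

values_per_attribute = [3, 2, 2, 2, 2, 2]   # Sky has 3 values, the rest 2

# Instance space: one choice of value per attribute
instances = prod(values_per_attribute)                     # 96

# Syntactically distinct hypotheses: each attribute may also be
# '?' or the empty constraint, giving (n + 2) choices per attribute
syntactic = prod(n + 2 for n in values_per_attribute)      # 5120

# Semantically distinct: all hypotheses with an empty constraint collapse
# into one "everything negative" hypothesis, so count (n + 1) choices
# per attribute and add 1
semantic = prod(n + 1 for n in values_per_attribute) + 1   # 973

print(instances, syntactic, semantic)
```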

Most practical learning tasks involve much larger, sometimes infinite, hypothesis spaces


Search in Concept (Hypothesis) Space


Concept Learning by Induction: Search in Hypotheses Space

Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation

The goal of this search is to find the hypothesis that best fits the training examples


Concept Learning by Induction: Basic Assumption

Once a hypothesis that best fits the training examples is found, we can use it to predict the class label of new examples

The basic assumption while using this hypothesis is:

Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples


Concept Learning by Induction: General to Specific Ordering

If we view learning as a search problem, then it is natural that our study of learning algorithms will examine different strategies for searching the hypothesis space

Many algorithms for concept learning organize the search through the hypothesis space by relying on a general-to-specific ordering of hypotheses


Example:
Consider h1 = {sunny, ?, ?, strong, ?, ?}
         h2 = {sunny, ?, ?, ?, ?, ?}
Any instance classified positive by h1 will also be classified positive by h2 (because h2 imposes fewer constraints on the instance)
Hence h2 is more general than h1, and h1 is more specific than h2
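A sketch of this “more general than or equal to” test in Python, valid under the simplifying assumption that neither hypothesis contains an empty constraint:

```python
def more_general_or_equal(hg, hs):
    """True if every instance accepted by hs is also accepted by hg
    (assumes neither hypothesis uses the empty constraint)."""
    return all(g == '?' or g == s for g, s in zip(hg, hs))

h1 = ('sunny', '?', '?', 'strong', '?', '?')
h2 = ('sunny', '?', '?', '?', '?', '?')

print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))   # False
```

Each constraint of the more general hypothesis must be either ‘?’ or identical to the corresponding constraint of the more specific one; otherwise some instance accepted by hs would be rejected by hg.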


Consider the three hypotheses (Mitchell, Figure 2.1):
h1 = {Sunny, ?, ?, Strong, ?, ?}
h2 = {Sunny, ?, ?, ?, ?, ?}
h3 = {Sunny, ?, ?, ?, Cool, ?}



• Neither h1 nor h3 is more general than the other

• h2 is more general than both h1 and h3

• Note that the “more-general-than” relationship is independent of the target concept. It depends only on which instances satisfy the two hypotheses and not on the classification of those instances according to the target concept


List-then-Eliminate Algorithm


This algorithm first lists all possible hypotheses, then eliminates any hypothesis found inconsistent with any training example

The set of candidate hypotheses thus shrinks as more examples are observed, until only those hypotheses remain that are consistent with all the observed examples



For the EnjoySport data we can list 973 possible hypotheses

Then we can test each hypothesis to see whether it conforms to our training data set or not


For this data we will be left with the following hypotheses

h1 = {Sunny, Warm, ?, Strong, ?, ?}
h2 = {Sunny, ?, ?, Strong, ?, ?}
h3 = {Sunny, Warm, ?, ?, ?, ?}
h4 = {?, Warm, ?, Strong, ?, ?}
h5 = {Sunny, ?, ?, ?, ?, ?}
h6 = {?, Warm, ?, ?, ?, ?}
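A runnable sketch of List-then-Eliminate in Python. The four training examples are assumed to be Mitchell's EnjoySport data, since the table itself appears only as an image in the original slides:

```python
from itertools import product

def matches(h, x):
    """An instance satisfies a hypothesis if every constraint is '?' or equal."""
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(examples, attribute_values):
    # Start with every semantically distinct hypothesis (the single
    # all-empty hypothesis is omitted, since a positive example exists)
    candidates = list(product(*[vals + ['?'] for vals in attribute_values]))
    # Eliminate any hypothesis inconsistent with some training example
    for x, label in examples:
        candidates = [h for h in candidates if matches(h, x) == label]
    return candidates

attribute_values = [
    ['Sunny', 'Cloudy', 'Rainy'], ['Warm', 'Cold'], ['Normal', 'High'],
    ['Strong', 'Weak'], ['Warm', 'Cool'], ['Same', 'Change'],
]
examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
for h in list_then_eliminate(examples, attribute_values):
    print(h)   # prints six surviving hypotheses
```

Starting from 972 candidate hypotheses, the four examples eliminate all but six, matching the h1–h6 listed in the slide.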


If insufficient data is available to narrow the set of hypotheses down to a single consistent hypothesis, then the algorithm can output the entire set of hypotheses consistent with the observed data

It has the advantage that it is guaranteed to output all the hypotheses consistent with the training data

Unfortunately, it requires an exhaustive listing of all hypotheses, an unrealistic requirement for practical problems

Find-S Algorithm


How can we find a hypothesis consistent with the observed training examples?
- A hypothesis is consistent with the training examples if it correctly classifies these examples

A positive training example is an example of the concept to be learnt

Similarly, a negative training example is not an example of the concept


We say that a hypothesis covers a positive training example if it correctly classifies the example as positive

One way is to begin with the most specific possible hypothesis, then generalize it each time it fails to cover a positive training example (i.e. classifies it as negative)

The algorithm based on this method is called Find-S
It finds a maximally specific hypothesis
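Find-S can be sketched in Python as follows. Again, the training data is assumed to be Mitchell's four EnjoySport examples, and None stands in for the ∅ constraint:

```python
def find_s(examples):
    """Return the maximally specific hypothesis consistent with the
    positive training examples."""
    n = len(examples[0][0])
    h = [None] * n                      # start with the most specific hypothesis
    for instance, positive in examples:
        if not positive:
            continue                    # negative examples are ignored
        for i, value in enumerate(instance):
            if h[i] is None:
                h[i] = value            # first positive example: copy its values
            elif h[i] != value:
                h[i] = '?'              # generalize only as far as necessary
    return tuple(h)

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
print(find_s(examples))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

On this data the result is the most specific member of the six consistent hypotheses found by List-then-Eliminate.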


The nodes shown in the diagram are the possible hypotheses allowed by our hypothesis representation scheme

Note that our search is guided by the positive examples and we consider only those hypotheses which are consistent with the positive training examples

The search moves from hypothesis to hypothesis, searching from the most specific to progressively more general hypotheses

At each step, the hypothesis is generalized only as far as necessary to cover the new positive example

Therefore, at each stage the hypothesis is the most specific hypothesis consistent with the training examples observed up to this point

Hence, it is called Find-S


Note that the algorithm does not get any help from the negative examples

However, since at each step our current hypothesis is maximally specific, it will never cover (falsely classify as positive) any negative example. In other words, it will always be consistent with each negative training example

However, the data must be noise-free and our hypothesis representation should be such that the true concept can be described by it

Problems with Find-S:

1. No way of knowing if there are other possible target concepts
2. Why prefer the most specific hypothesis?
3. If the training examples are not consistent, the algorithm may fail
4. What if there are several maximally specific consistent hypotheses?


Definition: Version Space

The Version Space is the set of hypotheses consistent with the training examples of a problem

The Find-S algorithm finds one hypothesis present in the Version Space; however, there may be other consistent hypotheses


References

Sections 2.1–2.3 and 2.5.2 of T. Mitchell, Machine Learning, McGraw-Hill, 1997
