
Module 3 AI & ML

Basics of Learning Theory


3.1 Introduction to Learning and its Types

The process of acquiring knowledge and expertise through study, experience, or being taught is
called learning. Humans learn in many different ways. To make machines learn, we need to
simulate the strategies of human learning in machines. But can machines learn? This depends
on the nature of the problems that computers can solve. There are two kinds of problems: well-
posed and ill-posed. Computers can solve only well-posed problems, as these have well-defined
specifications and have the following components inherent to them:

1. Class of learning tasks (T)


2. A measure of performance (P)
3. A source of experience (E)

The standard definition of learning, proposed by Tom Mitchell, is that a computer program is said
to learn from experience E with respect to a class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E.

Let x be an input and X be the input space, the set of all possible inputs; let Y be the output
space, the set of all possible outputs. Let the unknown target function be f: X → Y, which
maps the input space to the output space. The objective of the learning program is to pick a
function g: X → Y that approximates the target function f. The candidate functions come from a
hypothesis space; in short, let H be the set of all hypotheses from which the learning algorithm
chooses. The choice is good when the hypothesis g replicates f for all samples.

Fig 3.1 Learning Environment

It can be observed that the training samples and the target function are dependent on the given
problem, while the learning algorithm and the hypothesis set are independent of it. Thus, a
learning model is, informally, the hypothesis set together with the learning algorithm, and can be
stated as follows:

Learning model = Hypothesis Set + Learning Algorithm
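As a toy illustration of this decomposition, the following sketch uses a hypothetical hypothesis set of threshold classifiers and a learning algorithm that simply picks the g in H with the fewest errors on the training samples (the names `H`, `learn`, and `samples` are illustrative, not from the text):

```python
# Learning model = Hypothesis Set + Learning Algorithm (toy sketch).
# Hypothesis set H: hypothetical threshold classifiers g(x) = (x >= t).
H = [lambda x, t=t: x >= t for t in range(6)]

# Learning algorithm: choose the g in H with the fewest training errors.
def learn(H, samples):
    return min(H, key=lambda g: sum(g(x) != y for x, y in samples))

# Training samples generated by a target function f(x) = (x >= 3),
# which is unknown to the learner.
samples = [(x, x >= 3) for x in range(6)]
g = learn(H, samples)
```

Here the chosen g replicates f on every training sample, which is exactly the sense in which the choice of hypothesis is "good".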

3.2 Learning Types


There are different types of learning. Some of the common learning methods are as follows:

Dept of CSE, Geetha N



1. Learning by memorization, or learning by repetition, also called rote learning, is done by
memorizing without understanding the underlying logic or concept. Although rote learning is
basically learning by repetition, from a machine learning perspective, learning occurs by simply
comparing the input with the existing knowledge for the same data and producing the stored
output if it is present.

2. Learning by examples, also called learning by experience or from previously acquired
knowledge, is like finding an analogy: it means performing inductive learning, where observations
are used to formulate a general concept. Here the learner learns by inferring general rules from a
set of observations or examples. Therefore, inductive learning is also called discovery learning.

3. Learning by being taught by an expert or a teacher is generally called passive learning.
However, there is a special kind of learning called active learning, where the learner can
interactively query a teacher or expert to label unlabelled data instances with the desired output.

4. Learning by critical thinking, also called deductive learning, deduces new facts or
conclusions from related known facts and information.

5. Self-learning, also called reinforcement learning, is a self-directed learning that normally
learns from mistakes, punishments, and rewards.

6. Learning to solve problems is a type of cognitive learning where learning happens in the
mind and is achieved by devising a methodology to reach a goal. Here, the learner initially knows
only the goal, not the solution or the way to achieve it. The learning happens either directly, from
the initial state by following the steps that achieve the goal, or indirectly, by inferring the
behaviour.

7. Learning by generalizing explanations, also called explanation-based learning (EBL), is
another learning method that exploits domain knowledge from experts to improve the accuracy of
concepts learned by supervised learning.

3.3 Introduction to Computational Learning Theory


Many questions have been raised by mathematicians and logicians about how computers learn.
Some of these questions are as follows:

1. How can a learning system predict an unseen instance?


2. How close is the hypothesis h to the target function f, when f itself is unknown?
3. How many samples are required?
4. Can we measure the performance of a learning system?
5. Is the solution obtained local or global?

These questions are the basis of a field called 'computational learning theory', or COLT for short.
It is a specialized field of study within machine learning. COLT deals with formal methods for
learning systems: it provides frameworks for quantifying learning tasks and learning algorithms,
and a fundamental basis for the study of machine learning. Computational learning theory draws
on concepts from diverse areas such as theoretical computer science, artificial intelligence,
and statistics. The core concept of COLT is the learning framework. One such
important framework is Probably Approximately Correct (PAC) learning. COLT focuses on
supervised learning tasks. Since the complexity of analysis is high, binary classification tasks
are normally considered for analysis.

3.4 Design of a Learning System


A system that is built around a learning algorithm is called a learning system. The design of a
learning system focuses on the following steps:

Step 1) Choosing the Training Experience: The first and most important task is to choose the
training data or training experience that will be fed to the machine learning algorithm. It is
important to note that the data or experience we feed to the algorithm has a significant impact
on the success or failure of the model, so the training data or experience should be chosen
wisely.
Below are the attributes that impact the success or failure of the model:
 The training experience should provide direct or indirect feedback regarding choices. For
example: while playing chess, the training data provides feedback such as "instead of this
move, if that move is chosen, the chances of success increase."
 The second important attribute is the degree to which the learner controls the sequence of
training examples. For example: when training data is first fed to the machine, accuracy is
very low, but as the machine gains experience by playing again and again with itself or an
opponent, the algorithm receives feedback and controls the chess game accordingly.
 The third important attribute is how well the training experience represents the distribution
of examples over which performance will be measured. A machine learning algorithm gains
experience by going through a number of different cases and examples; by passing through
more and more examples, its performance increases.
Step 2) Choosing the Target Function: The next important step is choosing the target function.
According to the knowledge fed to the algorithm, the machine will choose a NextMove function
that describes what type of legal move should be taken. For example: while playing chess, when
the opponent plays, the machine learning algorithm decides which of the possible legal moves to
take in order to succeed.
Step 3) Choosing a Representation for the Target Function: Once the machine knows all the
possible legal moves, the next step is to choose an optimized move using some representation,
e.g. linear equations, hierarchical graph representation, tabular form, etc. The NextMove
function then selects, out of these moves, the one that provides a higher success rate. For
example: if while playing chess the machine has 4 possible moves, it will choose the optimized
move that leads it to success.


Step 4) Choosing a Function Approximation Algorithm: An optimized move cannot be chosen
just from the raw training data. The training data has to go through a set of examples, and
through these examples the system approximates which steps should be chosen, after which it
receives feedback. For example: when training data for playing chess is fed to the algorithm,
the machine does not initially know whether a move will fail or succeed; from each failure or
success it learns which step should be chosen for the next move and what its success rate is.
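Function approximation of this kind is often illustrated with the LMS (least mean squares) weight-update rule for a linear evaluation function. A minimal sketch, assuming hypothetical numeric board features x1..xn and a training value v_train for each board (the names `predict` and `lms_update` are illustrative):

```python
# LMS weight update for a linear evaluation function
# V_hat(b) = w0 + w1*x1 + ... + wn*xn  (hypothetical board features).
def predict(w, features):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))

def lms_update(w, features, v_train, lr=0.1):
    # Nudge each weight in the direction that reduces the prediction error.
    error = v_train - predict(w, features)
    w[0] += lr * error
    for i, xi in enumerate(features):
        w[i + 1] += lr * error * xi
    return w
```

Repeated updates shrink the error between the predicted value and the training value, which is the feedback loop described above.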
Step 5) Final Design: The final design is created when the system has gone through a number of
examples, failures and successes, and correct and incorrect decisions, learning what the next
step should be. Example: Deep Blue is an ML-based intelligent computer that won a chess game
against the chess expert Garry Kasparov, becoming the first computer to beat a human chess
expert.

3.5 Introduction to Concept Learning


Concept learning is a learning strategy of acquiring abstract knowledge, inferring a general
concept, or deriving a category from the given training samples. Inducing general functions from
specific training examples is a main issue of machine learning. Concept learning: acquiring the
definition of a general category from given positive and negative training examples of the
category. Concept learning can be seen as a problem of searching through a predefined space of
potential hypotheses for the hypothesis that best fits the training examples. The hypothesis space
has a general-to-specific ordering of hypotheses, and the search can be efficiently organized by
taking advantage of this naturally occurring structure over the hypothesis space. A formal
definition of concept learning: inferring a boolean-valued function from training examples of its
input and output. An example of concept learning is learning the bird concept from the given
examples of birds (positive examples) and non-birds (negative examples): we are trying to learn
the definition of a concept from given examples.
Concept learning requires three things:
1. Input – a training dataset, i.e. a set of training instances, each labelled with the name of the
concept or category to which it belongs. This past experience is used to train and build the model.
2. Output – the target concept or target function f. It is a mapping function f(x) from input x to
output y. The task is to determine the specific or common features that identify an object; in
other words, to find the hypothesis that determines the target concept. For example, the
specific set of features that identifies the EnjoySport class label in a data sample.
3. Test – new instances to test the learned model.
Formally, concept learning is defined as: "Given a set of hypotheses, the learner searches through
the hypothesis space to identify the best hypothesis that matches the target concept."


Consider a set of example days, each described by six attributes. The task is to learn to predict
the value of EnjoySport for an arbitrary day, based on the values of its attributes.

3.5.1 Hypothesis Representation


Each hypothesis consists of a conjunction of constraints on the instance attributes. Each
hypothesis will be a vector of six constraints, specifying the values of the six attributes
– (Sky, AirTemp, Humidity, Wind, Water, and Forecast).
Each attribute constraint can be:
? – indicating any value is acceptable for the attribute (don't care)
a single value – specifying a single required value, e.g. Warm (specific)
ø – indicating no value is acceptable for the attribute (no value)
A hypothesis over (Sky, AirTemp, Humidity, Wind, Water, Forecast):
< Sunny, ? , ? , Strong , ? , Same >
The most general hypothesis – that every day is a positive example <?,?,?,?,?,?>
The most specific hypothesis – that no day is a positive example < ø , ø , ø , ø , ø , ø >
The EnjoySport concept learning task requires learning the set of days for which
EnjoySport = Yes, describing this set by a conjunction of constraints over the instance attributes.
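The semantics of ?, single values, and ø can be captured by a small hypothesis-matching function (a sketch; the hypothesis and instance below reuse the EnjoySport attributes, and the name `matches` is illustrative):

```python
def matches(hypothesis, instance):
    """True when every constraint is the wildcard '?' or equals the
    instance's value; a 'ø' constraint can never equal a value, so any
    hypothesis containing 'ø' rejects every instance."""
    return all(c == '?' or c == v for c, v in zip(hypothesis, instance))

h = ('Sunny', '?', '?', 'Strong', '?', 'Same')
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
# matches(h, x) is True; the all-'?' hypothesis accepts every instance,
# and the all-'ø' hypothesis accepts none.
```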
Given – Instances X : set of all possible days, each described by the attributes
• Sky – (values: Sunny, Cloudy, Rainy)
• AirTemp – (values: Warm, Cold)
• Humidity – (values: Normal, High)
• Wind – (values: Strong, Weak)
• Water – (values: Warm, Cool)
• Forecast – (values: Same, Change)

Target Concept (Function) c : EnjoySport : X → {0,1}


Hypotheses H : Each hypothesis is described by a conjunction of constraints on the attributes.
Training Examples D : positive and negative examples of the target function
Determine – A hypothesis h in H such that h(x) = c(x) for all x in D.

Concept learning can be viewed as the task of searching through a large space of hypotheses
implicitly defined by the hypothesis representation. The goal of this search is to find the
hypothesis that best fits the training examples. By selecting a hypothesis representation, the
designer of the learning algorithm implicitly defines the space of all hypotheses that the program
can ever represent and therefore can ever learn.
Sky has 3 possible values, and the other 5 attributes have 2 possible values each. There are
96 (= 3·2·2·2·2·2) distinct instances in X. Adding two more values per attribute (? and ø) gives
5120 (= 5·4·4·4·4·4) syntactically distinct hypotheses in H. Every hypothesis containing one or
more ø symbols represents the empty set of instances; that is, it classifies every instance as
negative. Hence there are 973 (= 1 + 4·3·3·3·3·3) semantically distinct hypotheses in H: only
one extra value (?) per attribute, plus a single hypothesis representing the empty set of
instances. Although EnjoySport has a
small, finite hypothesis space, most learning tasks have much larger (even infinite) hypothesis
spaces, so we need efficient search algorithms over the hypothesis space.
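The counts above follow directly from the attribute cardinalities; a quick check:

```python
# Sky has 3 values; the other five attributes have 2 values each.
n_instances = 3 * 2**5        # distinct instances in X
n_syntactic = 5 * 4**5        # each attribute also allows '?' and 'ø'
n_semantic = 1 + 4 * 3**5     # all ø-containing hypotheses collapse to one
print(n_instances, n_syntactic, n_semantic)   # 96 5120 973
```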

3.5.2 Version Spaces


The subset of hypotheses in H consistent with the training examples is called the version space
with respect to the hypothesis space H and the training examples D, because it contains all
plausible versions of the target concept.

3.5.3 Generalization and Specialization


By generalization of the most specific hypothesis and by specialization of the most general
hypothesis, the hypothesis space can be searched for an approximate hypothesis that matches all
positive instances but does not match any negative instance.
Searching the Hypothesis Space: There are two ways of learning a hypothesis consistent with
all the training instances from the large hypothesis space.
1. Specialization – General to Specific learning: this learning methodology searches
through the hypothesis space for an approximate hypothesis by specializing the most
general hypothesis.
2. Generalization – Specific to General learning: this learning methodology searches
through the hypothesis space for an approximate hypothesis by generalizing the most
specific hypothesis.

3.5.4 Hypothesis Space Search by FIND-S Algorithm


The FIND-S algorithm starts from the most specific hypothesis and generalizes it by considering
only positive examples; negative examples are ignored. As long as the hypothesis space contains
a hypothesis that describes the true target concept, and the training data contain no errors,
ignoring negative examples does not cause any problem. FIND-S finds the most specific
hypothesis within H that is consistent with the positive training examples. The final hypothesis
will also be consistent with the negative examples, provided the correct target concept is in H
and the training examples are correct.

FIND-S Algorithm

1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   For each attribute constraint aᵢ in h
      If the constraint aᵢ is satisfied by x
      Then do nothing
      Else replace aᵢ in h by the next more general constraint that is satisfied by x
3. Output hypothesis h

Training Examples:


Step 1: Initialize h to the most specific hypothesis in H - h0 = < ø , ø , ø , ø , ø , ø >

Step 2 of Find-S Algorithm


First iteration
h0 = (ø, ø, ø, ø, ø, ø)

X1 = <Sunny, Warm, Normal, Strong, Warm, Same>

h1 = <Sunny, Warm, Normal, Strong, Warm, Same>

Second iteration h1 = <Sunny, Warm, Normal, Strong, Warm, Same>

X2 = <Sunny, Warm, High, Strong, Warm, Same>

h2 = <Sunny, Warm, ?, Strong, Warm, Same>

Third iteration h2 = <Sunny, Warm, ?, Strong, Warm, Same>

X3 = <Rainy, Cold, High, Strong, Warm, Change> – No

X3 is a negative example, hence ignored.

h3 = <Sunny, Warm, ?, Strong, Warm, Same>

Fourth iteration h3 = <Sunny, Warm, ?, Strong, Warm, Same>

X4 = <Sunny, Warm, High, Strong, Cool, Change>

h4 = <Sunny, Warm, ?, Strong, ?, ?>

Step 3 - The final maximally specific hypothesis is <Sunny, Warm, ?, Strong, ?, ?>
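The trace above can be reproduced with a short Python sketch of FIND-S (the function name `find_s` is illustrative):

```python
def find_s(examples):
    """FIND-S: start from the most specific hypothesis and minimally
    generalize it for each positive example; negatives are ignored."""
    h = None                      # None stands for the all-ø hypothesis
    for x, positive in examples:
        if not positive:
            continue              # FIND-S ignores negative examples
        if h is None:
            h = list(x)           # first positive example: copy it as-is
        else:
            # Keep agreeing values; generalize conflicts to '?'.
            h = [hi if hi == xi else '?' for hi, xi in zip(h, x)]
    return h

enjoysport = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
# find_s(enjoysport) -> ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```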

Example 2 :

1. How many concepts are possible for this instance space?


Solution: 2 * 3 * 2 * 2 * 3 = 72
2. How many hypotheses can be expressed by the hypothesis language?
Solution: 4 * 5 * 4 * 4 * 5 = 1600
Semantically Distinct Hypothesis = ( 3 * 4 * 3 * 3 * 4 ) + 1 = 433


3. Apply the FIND-S algorithm by hand on the given training set. Consider the examples in the
specified order and write down your hypothesis each time after observing an example.
Step 1: h0 = (ø, ø, ø, ø, ø)
Step 2:
First iteration X1 = (some, small, no, expensive, many) – No
Negative example, hence ignored.

h1 = (ø, ø, ø, ø, ø)

Second iteration h1 = (ø, ø, ø, ø, ø)

X2 = (many, big, no, expensive, one) – Yes

h2 = (many, big, no, expensive, one)

Third iteration h2 = (many, big, no, expensive, one)

X3 = (some, big, always, expensive, few) – No (negative example, hence ignored)

h3 = (many, big, no, expensive, one)

Fourth iteration h3 = (many, big, no, expensive, one)

X4 = (many, medium, no, expensive, many) – Yes

h4 = (many, ?, no, expensive, ?)

Fifth iteration h4 = (many, ?, no, expensive, ?)

X5 = (many, small, no, affordable, many) – Yes

h5 = (many, ?, no, ?, ?)

Step 3: Final Maximally Specific Hypothesis is: h5 = (many, ?, no, ?, ?)

3.5.5 Hypothesis Space Search by Candidate Elimination Algorithm


The candidate elimination algorithm incrementally builds the version space given a hypothesis
space H and a set E of examples. The examples are added one by one; each example possibly
shrinks the version space by removing the hypotheses that are inconsistent with the example.
The candidate elimination algorithm does this by updating the general and specific boundaries
for each new example.
 You can consider this as an extended form of the FIND-S algorithm.
 It considers both positive and negative examples.

 Positive examples are used here as in the FIND-S algorithm (generalizing from the specific
boundary), while negative examples are used to specialize the general boundary.

Algorithm

Step 1: Initialize the General hypothesis boundary G (most general) and the Specific hypothesis
boundary S (most specific).

Step 2: For each training example:
Step 3: If the example is positive:
if attribute_value == hypothesis_value:
do nothing
else:
replace the attribute value with '?' (generalizing it)
Step 4: If the example is negative:
make the general hypotheses more specific, so that the negative example is excluded.
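These boundary updates can be sketched in Python for conjunctive hypotheses. This is a simplified version: it assumes a positive example arrives before S must constrain the specializations, and it omits pruning of non-maximal boundary members; the attribute domains and data are taken from the EnjoySport task, and the function names are illustrative:

```python
def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general(h1, h2):
    # h1 covers at least everything h2 covers
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = ('ø',) * n               # specific boundary
    G = {('?',) * n}             # general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}   # drop inconsistent g
            S = tuple(x) if S == ('ø',) * n else tuple(
                s if s == v else '?' for s, v in zip(S, x))
        else:
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)
                    continue
                # minimally specialize g so that it rejects the negative x
                for i, c in enumerate(g):
                    if c != '?':
                        continue
                    for v in domains[i]:
                        if v != x[i]:
                            h = g[:i] + (v,) + g[i + 1:]
                            if more_general(h, S):
                                new_G.add(h)
            G = new_G
    return S, G

domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'),
           ('Normal', 'High'), ('Strong', 'Weak'),
           ('Warm', 'Cool'), ('Same', 'Change')]
enjoysport = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(enjoysport, domains)
```

For the EnjoySport data this yields S = (Sunny, Warm, ?, Strong, ?, ?) and G containing (Sunny, ?, ?, ?, ?, ?) and (?, Warm, ?, ?, ?, ?): every hypothesis between these boundaries is in the version space.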

Training Examples:


Example 2:
