From the perspective of inductive learning, we are given input samples (x) and output samples
(f(x)), and the problem is to estimate the function (f). Specifically, the problem is to generalize
from the samples so that the mapping is useful for estimating the output for new samples in the
future.
In practice it is almost always too hard to estimate the function, so we are looking for very good
approximations of the function.
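As a concrete sketch in Python (the target function here is invented purely to generate samples; in practice f is unknown and we only ever see the (x, f(x)) pairs), a learner can approximate f by fitting a simple linear hypothesis to the samples with least squares:

```python
# A minimal sketch of inductive learning as function approximation.
# target_f stands in for the unknown f; it is used only to generate samples.

def target_f(x):
    return 3 * x + 2

samples = [(x, target_f(x)) for x in range(10)]

# Fit the hypothesis h(x) = a*x + b by ordinary least squares over the samples.
n = len(samples)
sx = sum(x for x, _ in samples)
sy = sum(y for _, y in samples)
sxx = sum(x * x for x, _ in samples)
sxy = sum(x * y for x, y in samples)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

def h(x):
    # The learned approximation of f.
    return a * x + b

print(h(25))  # estimate for a sample never seen during fitting
```

Because the toy data happens to be exactly linear, the fit recovers the target; with noisy or more complex data, h would only be an approximation.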
There are problems where inductive learning is not a good idea. It is important to know when to
use and when not to use supervised machine learning.
Problems where there is no human expert. If people do not know the answer they
cannot write a program to solve it. These are areas of true discovery.
Humans can perform the task but no one can describe how to do it. There are
problems where humans can do things that computers cannot do, or cannot do well.
Examples include riding a bike or driving a car.
Problems where the desired function changes frequently. Humans could describe it
and they could write a program to do it, but the problem changes too often. It is not cost
effective. An example is the stock market.
Problems where each user needs a custom function. It is not cost effective to write a
custom program for each user. An example is recommending movies or books on
Netflix or Amazon.
The Essence of Inductive Learning
We can write a program that works perfectly for the data that we have. This function will be
maximally overfit. But we have no idea how well it will work on new data; it will likely do very
badly because we may never see the same examples again.
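A minimal sketch of such a maximally overfit "program" is a lookup table over invented toy data: it is perfect on the examples we have and useless on anything new:

```python
# A maximally overfit "learner": memorize every training example exactly.
train = {(0, 0): 0, (0, 1): 1, (1, 0): 1}  # invented toy data

def memorizer(x):
    # Perfect on the training data, but no opinion at all on unseen inputs.
    if x in train:
        return train[x]
    raise KeyError("never seen this example; no basis to generalize")

print(memorizer((0, 1)))  # works on seen data
# memorizer((1, 1)) would raise KeyError: it cannot generalize at all.
```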
The data alone is not enough: without assumptions about the problem, you can predict anything
you like. It would be naive to assume nothing about the problem.
In practice we are not naive. There is an underlying problem and we are interested in an accurate
approximation of the function. There is a double exponential number of possible classifiers in the
number of input states: with n boolean inputs there are 2^n input states, and 2^(2^n) possible
boolean functions over them. Finding a good approximation for the function is very difficult.
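The double-exponential count is easy to verify for boolean inputs: n inputs give 2^n input states, and each state can independently be mapped to 0 or 1:

```python
# With n boolean inputs there are 2**n input states, and each state can be
# assigned an output of 0 or 1 independently, giving 2**(2**n) distinct
# boolean functions (possible classifiers).
for n in range(1, 5):
    states = 2 ** n
    functions = 2 ** states
    print(n, states, functions)
# Already at n=4 there are 65536 classifiers; at n=10 there are 2**1024.
```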
There are classes of hypotheses that we can try. That is the form that the solution may take, or the
representation. We cannot know which is most suitable for our problem beforehand. We have to
use experimentation to discover what works on the problem.
In practice we start with a small hypothesis class and slowly grow the hypothesis class until we
get a good result.
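A minimal sketch of this strategy, using invented toy hypothesis classes: try a small class first, and move on to a larger class only if nothing in the small one fits the data:

```python
# Start with a small hypothesis class and grow it until some hypothesis
# fits the data. The samples and classes here are invented toy examples.

samples = [(1, 3), (2, 5), (3, 7)]  # drawn from the unknown target 2x + 1

# Hypothesis classes of increasing size: constants, then small linear models.
constants = [lambda x, c=c: c for c in range(10)]
linears = [lambda x, a=a, b=b: a * x + b for a in range(5) for b in range(5)]

def fits(h):
    return all(h(x) == y for x, y in samples)

for name, cls in [("constants", constants), ("linears", linears)]:
    consistent = [h for h in cls if fits(h)]
    if consistent:
        print("found a consistent hypothesis in class:", name)
        break
else:
    print("no hypothesis class fit the data")
```

No constant function matches all three samples, so the search moves on and finds a consistent hypothesis among the linear models.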
Training example: a sample from x including its output from the target function
Target function: the mapping function f from x to f(x)
Hypothesis: approximation of f, a candidate function.
Concept: A boolean target function, positive examples and negative examples for the 1/0
class values.
Classifier: The output of the learning program; a learned function that can be used to classify new samples.
Learner: Process that creates the classifier.
Hypothesis space: set of possible approximations of f that the algorithm can create.
Version space: subset of the hypothesis space that is consistent with the observed data.
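The last two terms can be made concrete with a small sketch: take the hypothesis space of all boolean functions on two inputs (16 of them, represented as truth tables) and keep only those consistent with two observed examples:

```python
from itertools import product

# Hypothesis space: every boolean function on two boolean inputs, each
# represented as a 4-entry truth table. Version space: the subset of those
# functions consistent with the observed training examples.

inputs = list(product([0, 1], repeat=2))            # the 4 input states
hypothesis_space = list(product([0, 1], repeat=4))  # 16 truth tables

observed = {(0, 0): 0, (1, 1): 1}                   # two training examples

def consistent(table):
    return all(table[inputs.index(x)] == y for x, y in observed.items())

version_space = [t for t in hypothesis_space if consistent(t)]
print(len(hypothesis_space), len(version_space))    # 16 hypotheses, 4 consistent
```

Each observed example pins down one entry of the truth table, so two examples leave 2^2 = 4 consistent hypotheses in the version space.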
Key issues in machine learning:
Search procedure
o Direct computation: No search, just calculate what is needed.
o Local: Search through the hypothesis space to refine the hypothesis.
o Constructive: Build the hypothesis piece by piece.
Timing
o Eager: Learning performed up front. Most algorithms are eager.
o Lazy: Learning performed at the time that it is needed.
Online vs Batch
o Online: Learning based on each pattern as it is observed.
o Batch: Learning over groups of patterns. Most algorithms are batch.
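The online/batch distinction can be sketched with something as simple as estimating a mean (a toy stand-in, not a full learning algorithm): the batch version computes over the whole group of patterns at once, while the online version updates its estimate after each pattern is observed:

```python
# Online vs batch learning, illustrated with mean estimation.

patterns = [2.0, 4.0, 6.0, 8.0]

# Batch: one computation over the whole group of patterns.
batch_mean = sum(patterns) / len(patterns)

# Online: incremental update as each pattern arrives; no need to store them.
online_mean, n = 0.0, 0
for x in patterns:
    n += 1
    online_mean += (x - online_mean) / n  # running-mean update rule

print(batch_mean, online_mean)  # both arrive at the same estimate
```

The online version never needs the whole dataset in memory, which is why online methods suit streams of patterns observed one at a time.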