Empirical risk minimization
Empirical risk minimization (ERM) is a principle in statistical learning
theory which defines a family of machine learning algorithms and is used
to give theoretical bounds on their performance.
The core idea is that we cannot know exactly how well an algorithm will
work in practice (the true "risk") because we don't know the true distribution
of data that the algorithm will work on, but we can instead measure its
performance on a known set of training data (the "empirical" risk).
The empirical risk is the average loss over the data points.
Written compactly, the empirical risk is R̂(f) = Ê[L(f(X), Y)], the sample
average of the loss L. Picking the function f* that minimizes this quantity
is known as empirical risk minimization.
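As a concrete sketch of this sample average, the following computes the empirical risk of a predictor; the squared loss, the predictor f, and the toy data are all assumed purely for illustration:

```python
def empirical_risk(f, xs, ys, loss):
    """Average loss of predictor f over the sample (xs, ys)."""
    return sum(loss(f(x), y) for x, y in zip(xs, ys)) / len(xs)

# Squared loss and a toy linear predictor -- both assumed for illustration.
squared = lambda y_hat, y: (y_hat - y) ** 2
f = lambda x: 2 * x

xs = [0.0, 1.0, 2.0]
ys = [0.0, 2.5, 3.5]
print(empirical_risk(f, xs, ys, squared))  # average of the losses 0, 0.25, 0.25
```

Any loss function can be dropped in: the empirical risk is always just the mean of the per-example losses.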
The size of the dataset has a big impact on empirical risk
minimization. If we get more data, the empirical risk will
approach the true risk.
The complexity of the underlying distribution affects how well
we can approximate it. If it's too complex, we would need more
data to get a good approximation.
ERM tries to imitate the observed data: it minimizes a cost averaged over
the training sample as a stand-in for the cost averaged over the joint
distribution of inputs and outputs.
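The convergence of empirical risk to true risk as the sample grows can be simulated on a toy distribution. The data-generating process below is an assumption chosen so the true risk is known exactly: X is uniform on [0, 1] and Y = X plus Gaussian noise with standard deviation 0.5, so for the predictor f(x) = x under squared loss the true risk equals the noise variance, 0.25:

```python
import random

random.seed(0)  # reproducibility

# Assumed toy distribution: X ~ Uniform(0, 1), Y = X + Normal(0, 0.5) noise.
# For f(x) = x with squared loss, the true risk is the noise variance, 0.25.
def sample(n):
    xs = [random.random() for _ in range(n)]
    ys = [x + random.gauss(0.0, 0.5) for x in xs]
    return xs, ys

def emp_risk(n):
    """Empirical squared-loss risk of f(x) = x on a fresh sample of size n."""
    xs, ys = sample(n)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / n

for n in (10, 1000, 100000):
    print(n, emp_risk(n))  # estimates move toward the true risk 0.25
```

With small n the estimate fluctuates noticeably; with large n it settles near 0.25, illustrating the law-of-large-numbers behaviour described above.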
Background
We have two spaces of objects X and Y and would like to learn a
function ℎ : X → Y (often called a hypothesis) which, given an input
x ∈ X, outputs an object y ∈ Y.
We have at our disposal a training set of n examples
(x₁, y₁), …, (xₙ, yₙ), where each xᵢ ∈ X is an input and yᵢ ∈ Y is the
corresponding response that we wish to get from ℎ(xᵢ).
We assume that there is a joint probability distribution P(x, y) over X and
Y, and that the training set consists of n instances drawn i.i.d. from P(x, y).
The risk associated with hypothesis ℎ(x) is then defined as the expectation of the
loss function:

R(ℎ) = E[L(ℎ(x), y)].

A loss function commonly used in theory is the 0–1 loss function,
L(ŷ, y) = 1 if ŷ ≠ y and 0 otherwise.
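Under the 0–1 loss, the empirical risk of a classifier is simply its misclassification rate on the sample. A minimal sketch, where the threshold classifier and the labeled data are assumed for illustration:

```python
def zero_one_loss(y_pred, y_true):
    """0-1 loss: 1 for a mistake, 0 for a correct prediction."""
    return 0 if y_pred == y_true else 1

def misclassification_rate(h, xs, ys):
    """Empirical risk under the 0-1 loss."""
    return sum(zero_one_loss(h(x), y) for x, y in zip(xs, ys)) / len(xs)

# A toy threshold classifier and labeled sample (assumed for illustration).
h = lambda x: 1 if x >= 0 else 0
xs = [-2.0, -1.0, 0.5, 1.0, 3.0]
ys = [0, 0, 0, 1, 1]
print(misclassification_rate(h, xs, ys))  # 1 mistake out of 5 -> 0.2
```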
The ultimate goal of a learning algorithm is to find a hypothesis ℎ* among a
fixed class of functions H for which the risk R(ℎ) is minimal:

ℎ* = argmin_{ℎ ∈ H} R(ℎ).
Empirical risk minimization calculation
In general, the risk R(ℎ) cannot be computed because the distribution
P(x,y) is unknown to the learning algorithm (this situation is referred
to as agnostic learning). However, we can compute an approximation,
called empirical risk, by averaging the loss function on the training
set;
more formally, computing the expectation with respect to the
empirical measure:

R_emp(ℎ) = (1/n) ∑_{i=1}^{n} L(ℎ(xᵢ), yᵢ).
The empirical risk minimization principle states that the learning
algorithm should choose a hypothesis ĥ which minimizes the
empirical risk:

ĥ = argmin_{ℎ ∈ H} R_emp(ℎ).
Thus the learning algorithm defined by the ERM principle
consists of solving the above optimization problem.
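For a finite hypothesis class, the optimization problem can be solved by exhaustive search. The class of threshold classifiers, the candidate thresholds, and the sample below are all assumptions chosen to keep the sketch tiny:

```python
def h(t, x):
    """Threshold classifier: predict 1 iff x >= t."""
    return int(x >= t)

def emp_risk(t, xs, ys):
    """Empirical 0-1 risk of the threshold-t classifier."""
    return sum(h(t, x) != y for x, y in zip(xs, ys)) / len(xs)

def erm(thresholds, xs, ys):
    """ERM over a finite class: return the threshold with smallest empirical risk."""
    return min(thresholds, key=lambda t: emp_risk(t, xs, ys))

# Toy hypothesis class and sample, assumed for illustration.
xs = [-2.0, -1.0, 0.5, 1.0, 3.0]
ys = [0, 0, 0, 1, 1]
best = erm([-1.0, 0.0, 0.75, 2.0], xs, ys)
print(best, emp_risk(best, xs, ys))  # threshold 0.75 separates this sample perfectly
```

For richer classes the same principle applies, but the minimization is done with numerical optimization (e.g. gradient methods on a surrogate loss) rather than enumeration.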
Advantages:
ERM is essential to understanding the limits of machine
learning algorithms and forms a good basis for practical
problem-solving.
It is grounded in the data actually observed, rather than in
theoretical assumptions alone.
Because the empirical risk is computed from real measurements,
conclusions drawn from it can be checked against what is actually
observed and experienced.