Understanding Empirical Risk Minimization

Empirical risk minimization

 Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family of machine learning algorithms and is used to give theoretical bounds on their performance.

 The core idea is that we cannot know exactly how well an algorithm will
work in practice (the true "risk") because we don't know the true distribution
of data that the algorithm will work on, but we can instead measure its
performance on a known set of training data (the "empirical" risk).

 The empirical risk is the average loss over the data points.

 For example: just as consumers first consult their smartphones before buying something in-store, trusting observed evidence over guesswork, ERM judges a model by its measured performance on data.
 Formally, $\hat{R}(f) = \hat{\mathbb{E}}\,L(f(X), Y)$, the empirical average of the loss over the sample. Picking the function $f^*$ that minimizes this quantity is known as empirical risk minimization.

 The size of the dataset has a big impact on empirical risk minimization: if we get more data, the empirical risk will approach the true risk.
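
A small simulation can illustrate this convergence; the distribution below (Y = 2X plus unit-variance Gaussian noise) is an assumption chosen so the true risk of the predictor f(x) = 2x under squared loss is exactly the noise variance, 1.0.

import numpy as np

rng = np.random.default_rng(0)

# Assumed data distribution: Y = 2*X + Gaussian noise (std 1).
def sample(n):
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + rng.normal(0.0, 1.0, n)
    return x, y

# Empirical risk of f(x) = 2x under squared loss, for growing n;
# the true risk is the noise variance, 1.0.
for n in [10, 100, 1000, 100000]:
    x, y = sample(n)
    print(n, np.mean((2.0 * x - y) ** 2))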

 The complexity of the underlying distribution affects how well we can approximate it: if it is too complex, we need more data to get a good approximation.

 ERM tries to imitate observations: it works with the cost averaged over the distribution of inputs and outputs.
Background
We have two spaces of objects $X$ and $Y$, and we would like to learn a function $h : X \to Y$ (often called a hypothesis) which, given an input $x \in X$, outputs an object $y \in Y$.

 We have at our disposal a training set of $n$ examples $(x_1, y_1), \ldots, (x_n, y_n)$, where each $x_i \in X$ is an input and $y_i \in Y$ is the corresponding response that we wish to get from $h(x_i)$.

We assume that there is a joint probability distribution $P(x, y)$ over $X$ and $Y$, and that the training set consists of $n$ instances $(x_i, y_i)$ drawn i.i.d. from $P(x, y)$.
The risk associated with hypothesis $h(x)$ is then defined as the expectation of the loss function:

$R(h) = \mathbb{E}[L(h(x), y)] = \int L(h(x), y)\,dP(x, y)$

A loss function commonly used in theory is the 0-1 loss function: $L(\hat{y}, y) = \mathbb{1}[\hat{y} \neq y]$, i.e. 0 when the prediction is correct and 1 otherwise.
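
As a sketch of how the 0-1 loss and the risk fit together, the toy joint distribution below is assumed to be fully known, which is exactly what real learning problems lack; every name here is illustrative.

# 0-1 loss: 1 if the prediction differs from the label, else 0.
def zero_one_loss(y_pred, y_true):
    return float(y_pred != y_true)

# Hypothetical discrete joint distribution P(x, y) over
# X = {0, 1} and Y = {0, 1}, given as {(x, y): probability}.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def h(x):
    # A candidate hypothesis: predict the label equal to the input.
    return x

# True risk R(h) = E[L(h(x), y)], computable here only because
# P is known.
risk = sum(p * zero_one_loss(h(x), y) for (x, y), p in P.items())
print(risk)  # 0.1 + 0.2 = 0.3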

 The ultimate goal of a learning algorithm is to find a hypothesis $h^*$ among a fixed class of functions $H$ for which the risk $R(h)$ is minimal:

$h^* = \arg\min_{h \in H} R(h)$


Empirical risk minimization: calculation
In general, the risk R(ℎ) cannot be computed because the distribution
P(x,y) is unknown to the learning algorithm (this situation is referred
to as agnostic learning). However, we can compute an approximation,
called empirical risk, by averaging the loss function on the training
set;

more formally, the empirical risk is the expectation of the loss with respect to the empirical measure:

$\hat{R}(h) = \frac{1}{n} \sum_{i=1}^{n} L(h(x_i), y_i)$
The empirical risk minimization principle states that the learning algorithm should choose a hypothesis $\hat{h}$ which minimizes the empirical risk:

$\hat{h} = \arg\min_{h \in H} \hat{R}(h)$

Thus the learning algorithm defined by the ERM principle consists in solving the above optimization problem.
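
The following sketch applies the ERM principle under assumed conditions: a finite hypothesis class of threshold classifiers and the 0-1 loss; the data-generating process and all names are illustrative, not from the slides.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training set: x in [0, 1], label 1 when x > 0.6,
# with 10% of labels flipped as noise.
n = 200
x = rng.uniform(0.0, 1.0, n)
y = (x > 0.6).astype(int)
flip = rng.random(n) < 0.1
y[flip] = 1 - y[flip]

# Finite hypothesis class H: threshold classifiers h_t(x) = 1[x > t].
thresholds = np.linspace(0.0, 1.0, 101)

def empirical_risk(t):
    # Average 0-1 loss of h_t on the training set.
    return np.mean((x > t).astype(int) != y)

# ERM: choose the hypothesis in H that minimizes the empirical risk.
best_t = min(thresholds, key=empirical_risk)
print(best_t, empirical_risk(best_t))

With 200 noisy samples the selected threshold typically lands near the true value 0.6, though the 10% label noise keeps the minimal empirical risk above zero.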
Advantages:
 ERM is essential for understanding the limits of machine learning algorithms, and it forms a good basis for practical problem-solving skills.

 It is more reliable because it represents real-life experience and not just theory.

 Empirical research is important in today's world because most people believe only in what they can see, hear, or experience.
