


MLOps Blog

Understanding Few-Shot Learning in Computer Vision: What You Need to Know

8 min

Vladimir Lyashenko

27th January, 2023

Computer Vision | ML Model Development

Since the first convolutional neural network (CNN) algorithms were created, they have drastically improved deep learning performance on computer vision (CV) tasks.

In 2015, Microsoft reported that their model was actually better than humans at classifying images from the ImageNet dataset.[1]

Nowadays, computers are unmatched when it comes to using billions of images to solve a specific task. Still, in the real world, you can rarely build or find a dataset with that many samples.

How do we overcome this problem? If we're talking about a CV task, we can use data augmentation (DA), or collect and label additional data.

DA is a powerful tool and might be a big part of the solution. Labeling additional samples is a time-consuming and expensive task, but it does deliver better results.


If the dataset is really small, both of these techniques might not help us. Imagine a task where we need to build a classifier with only one or two samples per class, and each sample is super difficult to find.

This would call for innovative approaches. Few-Shot Learning (FSL) is one of them.

In this article we’ll cover:

What Few-Shot Learning is – definition, purpose, and an FSL problem example
Few-Shot Learning variations – N-Shot Learning, Few-Shot Learning, One-Shot Learning, Zero-Shot Learning
Few-Shot Learning approaches – Meta-Learning, Data-level, Parameter-level
Meta-Learning algorithm – definition, Metric-Learning, Gradient-Based Meta-Learning
Algorithms for Few-Shot image classification – Model-Agnostic Meta-Learning, Matching, Prototypical and Relation Networks
Few-Shot Object Detection – YOLOMAML

What is Few-Shot Learning?

Few-Shot Learning is a sub-area of machine learning. It's about classifying new data when you have only a few training samples with supervised information.

FSL is a rather young area that needs more research and refinement. As of today, you can use it in CV tasks. A computer vision model can work quite well with relatively few training samples. Throughout this article, we'll be focusing on FSL in computer vision.

For example: say we work in healthcare and need to categorize bone diseases from X-ray images.

Some rare pathologies might lack enough images to be used in the training set. This is exactly the type of problem that can be solved by building an FSL classifier.

Few-Shot variations

Let's take a look at different variations and extreme cases of FSL. In general, researchers identify four types:

1. N-Shot Learning (NSL)
2. Few-Shot Learning
3. One-Shot Learning (OSL)
4. Less than one or Zero-Shot Learning (ZSL)

When we're talking about FSL, we usually mean N-way-K-Shot classification.

N stands for the number of classes, and K for the number of samples from each class to train on.

N-Shot Learning is seen as a broader concept than all the others. It means that Few-Shot, One-Shot, and Zero-Shot Learning are sub-fields of NSL.

Zero-Shot

To me, ZSL is the most interesting. The goal of Zero-Shot Learning is to classify unseen classes without any training examples.

It may seem a little crazy, but think about it this way: can you classify an object without ever seeing it? If you have a general idea of the object, its appearance, properties, and functionality, it shouldn't be a problem.

This is the approach that you use when doing ZSL and, according to current trends, Zero-Shot Learning will soon become more effective.

One-Shot and Few-Shot

By this point, you probably get the general concept, so it'll be no surprise that in One-Shot Learning, we only have a single sample of each class. Few-Shot has two to five samples per class, making it just a more flexible version of OSL.

When we talk about the overall concept, we use the term Few-Shot Learning. But this area is quite young, so people use these terms differently. Keep that in mind when you're reading articles.

Few-Shot Learning approaches

All right, time to move to a more practical field and talk about different approaches to the Few-Shot Learning problem.

First of all, let's define an N-way-K-Shot classification problem. Imagine that we have:

1. A training (support) set that consists of:
   N class labels
   K labeled images for each class (a small amount, less than ten samples per class)
2. Q query images

We want to classify the Q query images among the N classes. The N * K samples in the training set are the only examples that we have. The main problem here is not enough training data. [1]

[Image source: Few-Shot Image Classification with Meta-Learning]
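To make this setup concrete, here is a minimal sketch of how an N-way-K-Shot episode could be sampled from a labeled dataset. The function name `sample_episode` and the assumption that the dataset is a list of (image, label) pairs are ours, purely for illustration:

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=5, q_queries=15):
    """Sample one N-way-K-Shot task: a support set and a query set.

    `dataset` is assumed to be a list of (image, label) pairs.
    """
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    # Pick N classes, then K support and Q query images from each class.
    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        images = random.sample(by_class[cls], k_shot + q_queries)
        support += [(img, episode_label) for img in images[:k_shot]]
        query += [(img, episode_label) for img in images[k_shot:]]
    return support, query
```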

The first and most obvious step in an FSL task is to gain experience from other, similar problems. This is why Few-Shot Learning is characterized as a Meta-Learning problem.

Let's make this clear: in a traditional classification problem, we try to learn how to classify from the training data, and evaluate using test data.

In Meta-Learning, we learn how to learn to classify given a set of training tasks. We use experience from one set of classification problems to solve other, unrelated ones.

Generally, there are two approaches that you should consider when solving FSL problems:

Data-level approach (DLA)
Parameter-level approach (PLA)

Data-level approach

This approach is really simple. It's based on the concept that if you don't have enough data to build a reliable model and avoid overfitting and underfitting, you should simply add more data.

That is why many FSL problems are solved by using additional information from a large base-dataset. The key feature of the base-dataset is that it doesn't contain the classes that we have in our support set for the Few-Shot task. For example, if we want to classify a specific bird species, the base-dataset can have images of many other birds.

We can also produce more data ourselves. To reach this goal, we can use data augmentation, or even generative adversarial networks (GANs).
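As a rough illustration of the data-level idea, a randomized augmentation pipeline (sketched here with torchvision; the exact transforms and parameters are just an example, not a recommendation) turns each of the few labeled support images into many slightly different training views:

```python
from torchvision import transforms

# Each time a support image is loaded, a different random variant is produced,
# effectively multiplying the handful of labeled samples we have.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
```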

Parameter-level approach

From the parameter-level point of view, it's quite easy to overfit on Few-Shot Learning samples, as they often lie in extensive, high-dimensional spaces.

To overcome this problem, we should limit the parameter space and use regularization and proper loss functions, so that the model generalizes from the limited number of training samples.

On the other hand, we can enhance model performance by guiding it through the extensive parameter space. If we use a standard optimization algorithm, it might not give reliable results because of the small amount of training data.

That is why, at the parameter level, we train our model to find the best route through the parameter space to give optimal prediction results. As we have already mentioned above, this technique is called Meta-Learning.

Meta-Learning algorithm

In the classic paradigm, when we have a specific task, an algorithm is learning if its task performance improves with experience. In the Meta-Learning paradigm, we have a set of tasks. An algorithm is learning to learn if its performance at each task improves with experience and with the number of tasks. This algorithm is called a Meta-Learning algorithm.

Imagine that we have a test task TEST. We will train our Meta-Learning algorithm on a batch of training tasks TRAIN. The training experience gained from attempting to solve the TRAIN tasks will be used to solve the TEST task.

Solving an FSL task has a set sequence of steps. Imagine we have a classification problem as we mentioned before. To start, we need to choose a base-dataset. Choosing a base-dataset is crucial. You want to pick a good one, so be careful.

Right now we have the N-way-K-Shot classification problem (let's name it TEST) and a large base-dataset that we'll use as a Meta-Learning training set (TRAIN).

The whole Meta-Training process will have a finite number of episodes. We form an episode like this:

From TRAIN, we sample N classes and K support-set images for each class, along with Q query images. This way, we form a classification task that's similar to our ultimate TEST task.

At the end of each episode, the parameters of the model are trained to maximize the accuracy on the Q images from the query set. This is where our model learns the ability to solve an unseen classification problem. [1]

The overall efficiency of the model is measured by its accuracy on the TEST classification task.

[Image source: Few-Shot Image Classification with Meta-Learning]
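Putting the episode description above into code, the meta-training loop might look roughly like the sketch below. The `sample_episode` helper is the hypothetical one sketched earlier, and `episode_loss` is a placeholder for whatever per-episode loss your chosen algorithm defines (Matching Networks, Prototypical Networks, MAML, and so on):

```python
import torch

def meta_train(model, train_set, num_episodes=10_000, lr=1e-3):
    """Generic episodic training loop over the TRAIN base-dataset (a sketch)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_episodes):
        # Sample an N-way-K-Shot task that mimics the ultimate TEST task.
        support, query = sample_episode(train_set)

        # The model computes its own loss on the query set after
        # conditioning on the support set (algorithm-specific placeholder).
        loss = model.episode_loss(support, query)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```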

In recent years, researchers have published many Meta-Learning algorithms for solving FSL classification problems. All of them can be divided into two large groups: Metric-Learning and Gradient-Based Meta-Learning algorithms.

Metric-Learning

When we talk about Metric-Learning, we usually refer to the technique of learning a distance function over objects.

In general, Metric-Learning algorithms learn to compare data samples. In the case of a Few-Shot classification problem, they classify query samples based on their similarity to the support samples.

As you might have already guessed, if we're working with images, we basically train a convolutional neural network to output an image embedding vector, which is later compared to other embeddings to predict the class.
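In its simplest form, this amounts to a nearest-neighbour lookup in embedding space. The sketch below is a bare-bones illustration of that idea; `encoder` stands for any CNN that maps a batch of images to embedding vectors, and the function name is ours:

```python
import torch

def nearest_support_class(encoder, support_x, support_y, query_x):
    """Predict each query's class as the class of its closest support embedding."""
    with torch.no_grad():
        support_emb = encoder(support_x)             # (N*K, D)
        query_emb = encoder(query_x)                 # (Q, D)
        dists = torch.cdist(query_emb, support_emb)  # pairwise distances (Q, N*K)
        nearest = dists.argmin(dim=-1)               # index of the closest support sample
    return support_y[nearest]                        # predicted class label per query
```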

Gradient-Based Meta-Learning

For the Gradient-Based approach, you need to build a meta-learner and a base-learner.

The meta-learner is a model that learns across episodes, whereas the base-learner is a model that is initialized and trained inside each episode by the meta-learner.

Imagine an episode of Meta-Training with some classification task defined by an N * K image support set and a query set of Q images:

1. We choose a meta-learner model,
2. The episode starts,
3. We initialize the base-learner (typically a CNN classifier),
4. We train it on the support set (the exact algorithm used to train the base-learner is defined by the meta-learner),
5. The base-learner predicts the classes on the query set,
6. The meta-learner's parameters are trained on the loss resulting from the classification error,
7. From this point on, the pipeline may differ based on your choice of meta-learner. [1]

Algorithms for Few-Shot image classification

This section comes from “Few-Shot Image Classification with Meta-Learning”, written by Etienne Bennequin.

From the general picture, let's move on to the specific Meta-Learning algorithms that are used to solve Few-Shot Learning image classification problems.

In this section we’ll cover:

1. Model-Agnostic Meta-Learning (MAML)
2. Matching Networks
3. Prototypical Networks
4. Relation Network

Model-Agnostic Meta-Learning

MAML is based on the Gradient-Based Meta-Learning (GBML) concept. As we've already figured out, GBML is about the meta-learner acquiring prior experience from training the base-model and learning a common feature representation across all tasks.

Whenever there is a new task to learn, the meta-learner with its prior experience will be fine-tuned a little bit using the small amount of new training data brought by the new task.

Still, we don't want to start from a random parameter initialization. If we do so, our algorithm will not converge to good performance after a few updates.

MAML aims to solve this problem.

MAML provides a good initialization of the meta-learner's parameters, so that it can learn a new task quickly, with only a small number of gradient steps, while avoiding the overfitting that may happen when using a small dataset.

Here is how it’s done:

1. The meta-learner creates a copy of itself (C) at the beginning of each episode,
2. C is trained on the episode (just as we have previously discussed, with the help of the base-model),
3. C makes predictions on the query set,
4. The loss computed from these predictions is used to update the meta-learner's original parameters,
5. This continues until you've trained on all episodes.
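The sketch below illustrates one such episode. To keep it short, it uses the first-order approximation of MAML (often called FOMAML), where the adapted copy's gradients are simply copied back onto the meta-parameters instead of backpropagating through the inner-loop updates; the function name and hyperparameters are illustrative only:

```python
import copy
import torch
import torch.nn.functional as F

def fomaml_episode(meta_model, support_x, support_y, query_x, query_y,
                   inner_lr=0.01, inner_steps=5):
    # 1. The meta-learner creates a copy of itself for this episode.
    learner = copy.deepcopy(meta_model)
    inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)

    # 2. The copy is adapted on the support set with a few gradient steps.
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        F.cross_entropy(learner(support_x), support_y).backward()
        inner_opt.step()

    # 3. The adapted copy makes predictions on the query set.
    query_loss = F.cross_entropy(learner(query_x), query_y)
    query_loss.backward()

    # 4. First-order approximation: accumulate the copy's gradients onto the
    #    meta-parameters; an outer optimizer then steps on meta_model.
    for meta_p, p in zip(meta_model.parameters(), learner.parameters()):
        meta_p.grad = p.grad.clone() if meta_p.grad is None else meta_p.grad + p.grad
    return query_loss.item()
```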

The greatest advantage of this technique is that it's conceived to be agnostic of the choice of the underlying model. Thus, the MAML method is widely used with many machine learning algorithms that need fast adaptation, especially deep neural networks.

Matching Networks

Matching Networks (MN) was the first Metric-Learning algorithm designed to solve FSL problems.

For the Matching Networks algorithm, you need to use a large base-dataset to solve a Few-Shot Learning task. As shown above, this dataset is split into episodes. After that, for each episode, Matching Networks apply the following procedure:

1. Each image from the support and the query set is fed to a CNN that outputs an embedding for it,
2. Each query image is classified using the softmax of the cosine distance from its embedding to the support-set embeddings,
3. The Cross-Entropy Loss on the resulting classification is backpropagated through the CNN.
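A minimal sketch of that classification step is shown below, assuming `encoder` is the shared CNN and that the labels have already been remapped to 0..N-1 within the episode; the helper name and the attention-sum formulation are our illustrative choices, not the authors' exact code:

```python
import torch
import torch.nn.functional as F

def matching_networks_step(encoder, support_x, support_y, query_x, query_y, n_way):
    # Embed support and query images with the same CNN and normalize,
    # so a dot product equals cosine similarity.
    support_emb = F.normalize(encoder(support_x), dim=-1)      # (N*K, D)
    query_emb = F.normalize(encoder(query_x), dim=-1)          # (Q, D)

    # Softmax over cosine similarities acts as attention over support samples.
    attention = (query_emb @ support_emb.t()).softmax(dim=-1)  # (Q, N*K)

    # Sum the attention mass per class to obtain class probabilities.
    one_hot = F.one_hot(support_y, n_way).float()               # (N*K, N)
    class_probs = attention @ one_hot                           # (Q, N)

    # Cross-entropy on the resulting classification, backpropagated through the CNN.
    return F.nll_loss(torch.log(class_probs + 1e-8), query_y)
```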

This way, Matching Networks learn to compute image embeddings. This approach allows MN to classify images with no specific prior knowledge of the classes. Everything is done simply by comparing different instances of the classes.

Since the classes are different in every episode, Matching Networks compute the features of the images that are relevant to discriminate between classes. In contrast, in standard classification, the algorithm learns the features that are specific to each class.

It's worth mentioning that the authors actually proposed some improvements to the initial algorithm. For example, they augmented the algorithm with a bidirectional LSTM, so that the embedding of each image depends on the embeddings of the others.

All of the proposed improvements can be found in their original article. Still, you must remember that improving the performance of the algorithm may increase the computation time.

Prototypical Networks

Prototypical Networks (PN) are similar to Matching Networks. Still, there are small differences that help to enhance the algorithm's performance. PN actually obtains better results than MN.

The PN process is essentially the same, but the query image embeddings are not compared to every image embedding from the support set. Instead, Prototypical Networks propose an alternative approach.

In PN, you need to form class prototypes. These are basically class embeddings formed by averaging the embeddings of the images from each class. The query image embeddings are then compared only to these class prototypes.
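A minimal sketch of that idea, assuming the same episode layout and `encoder` as in the earlier snippets:

```python
import torch
import torch.nn.functional as F

def prototypical_networks_step(encoder, support_x, support_y, query_x, query_y, n_way):
    support_emb = encoder(support_x)                 # (N*K, D)
    query_emb = encoder(query_x)                     # (Q, D)

    # One prototype per class: the mean of that class's support embeddings.
    prototypes = torch.stack(
        [support_emb[support_y == c].mean(dim=0) for c in range(n_way)]
    )                                                # (N, D)

    # Squared Euclidean distance from each query to each prototype.
    dists = torch.cdist(query_emb, prototypes) ** 2  # (Q, N)

    # A closer prototype means a higher score; train with cross-entropy.
    return F.cross_entropy(-dists, query_y)
```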

It's worth mentioning that in the case of a One-Shot Learning problem, the algorithm is similar to Matching Networks.

Also, PN uses the Euclidean distance instead of the cosine distance. It's seen as a major part of the algorithm's improvements.

Relation Network

All the experiments carried out to build Matching and Prototypical Networks actually led to the creation of the Relation Network (RN). RN was built on the PN concept, but with significant changes to the algorithm.

In RN, the distance function is not defined in advance but learned by the algorithm. RN has its own relation module that does this. If you want to learn more, check out the original article.

The overall structure is as follows: the relation module is put on top of the embedding module, which is the part that computes the embeddings and class prototypes from the input images.
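A highly simplified sketch of such a relation module is below. The original paper operates on convolutional feature maps and trains the relation scores with a mean-squared-error loss; this flat-vector MLP version is only meant to illustrate the structure, and the class and parameter names are ours:

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Learns a similarity score between a query embedding and a class prototype."""

    def __init__(self, emb_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, query_emb, prototypes):
        # Pair every query with every class prototype and score each pair.
        q = query_emb.unsqueeze(1).expand(-1, prototypes.size(0), -1)   # (Q, N, D)
        p = prototypes.unsqueeze(0).expand(query_emb.size(0), -1, -1)   # (Q, N, D)
        pairs = torch.cat([q, p], dim=-1)                                # (Q, N, 2*D)
        return self.mlp(pairs).squeeze(-1)                               # (Q, N) relation scores
```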
