


MLOps Blog

Understanding Few-Shot Learning in Computer Vision: What You Need to Know

8 min

Vladimir Lyashenko

27th January, 2023

Computer Vision | ML Model Development

Since the first convolutional neural network (CNN) algorithms were created, they have drastically improved deep learning performance on computer vision (CV) tasks.

In 2015, Microsoft reported that their model was actually better than humans at classifying images from the ImageNet dataset.[1]

Nowadays, computers are unmatched when it comes to using billions of images to solve a specific task. Still, in the real world, you can rarely build or find a dataset with that many samples.

How do we overcome this problem? If we're talking about a CV task, we can use data augmentation (DA), or collect and label additional data.

DA is a powerful tool and might be a big part of the solution. Labeling additional samples is a time-consuming and expensive task, but it does deliver better results.


If the dataset is really small, both of these techniques might not help us. Imagine a task where we need to build a classifier with only one or two samples per class, and each sample is super difficult to find.

This would call for innovative approaches. Few-Shot Learning (FSL) is one of them.

In this article we’ll cover:

What Few-Shot Learning is – definition, purpose, and an FSL problem example
Few-Shot Learning variations – N-Shot Learning, Few-Shot Learning, One-Shot Learning, Zero-Shot Learning
Few-Shot Learning approaches – Meta-Learning, Data-level, Parameter-level
Meta-Learning algorithm – definition, Metric-Learning, Gradient-Based Meta-Learning
Algorithms for Few-Shot image classification – Model-Agnostic Meta-Learning, Matching, Prototypical and Relation Networks
Few-Shot Object Detection – YOLOMAML

What is Few-Shot Learning?

Few-Shot Learning is a sub-area of machine learning. It's about classifying new data when you have only a few training samples with supervised information.

FSL is a rather young area that needs more research and refinement. As of today, you can use it in CV tasks. A computer vision model can work quite well with relatively few training samples. Throughout this article, we'll be focusing on FSL in computer vision.

For example: say we work in healthcare and need to categorize bone diseases from X-ray images.

Some rare pathologies might lack enough images to be used in the training set. This is exactly the type of problem that can be solved by building an FSL classifier.

Few-Shot variations

Let's take a look at different variations and extreme cases of FSL. In general, researchers identify four types:

1. N-Shot Learning (NSL)
2. Few-Shot Learning
3. One-Shot Learning (OSL)
4. Less than one or Zero-Shot Learning (ZSL)

When we're talking about FSL, we usually mean N-way-K-Shot classification.

N stands for the number of classes, and K for the number of samples from each class to train on.

N-Shot Learning is seen as a broader concept than all the others. It means that Few-Shot, One-Shot, and Zero-Shot Learning are sub-fields of NSL.

Zero-Shot

To me, ZSL is the most interesting. The goal of Zero-Shot Learning is to classify unseen classes without any training examples.

It may seem a little crazy, but think about it this way: can you classify an object without ever seeing it? If you have a general idea of the object, its appearance, properties, and functionality, it shouldn't be a problem.

This is the approach that you use when doing ZSL and, according to current trends, Zero-Shot Learning will soon become more effective.

One-Shot and Few-Shot

By this point, you probably get the general concept, so it'll be no surprise that in One-Shot Learning, we only have a single sample of each class. Few-Shot has two to five samples per class, making it just a more flexible version of OSL.

When we talk about the overall concept, we use the term Few-Shot Learning. But this area is quite young, so people use these terms differently. Keep that in mind when you're reading articles.

Few-Shot Learning approaches

All right, time to move to a more practical field and talk about different approaches to the Few-Shot Learning problem.

First of all, let's define an N-way-K-Shot classification problem. Imagine that we have:

1. A training (support) set that consists of:
   N class labels
   K labeled images for each class (a small amount, less than ten samples per class)
2. Q query images

We want to classify the Q query images among the N classes. The N * K samples in the training set are the only examples that we have. The main problem here is not enough training data. [1]

[Image source: Few-Shot Image Classification with Meta-Learning]
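To make this setup concrete, here is a minimal sketch of how an N-way-K-Shot episode could be sampled from a labeled dataset. The function name `sample_episode` and the assumption that the dataset is a list of (image, label) pairs are ours, purely for illustration:

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=5, q_queries=15):
    """Sample one N-way-K-Shot task: a support set and a query set.

    `dataset` is assumed to be a list of (image, label) pairs.
    """
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    # Pick N classes, then K support and Q query images from each class.
    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        images = random.sample(by_class[cls], k_shot + q_queries)
        support += [(img, episode_label) for img in images[:k_shot]]
        query += [(img, episode_label) for img in images[k_shot:]]
    return support, query
```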

The first and most obvious step in an FSL task is to gain experience from other, similar problems. This is why Few-Shot Learning is characterized as a Meta-Learning problem.

Let's make this clear: in a traditional classification problem, we try to learn how to classify from the training data, and evaluate using test data.

In Meta-Learning, we learn how to learn to classify given a set of training tasks. We use experience from one set of classification problems to solve other, unrelated ones.

Generally, there are two approaches that you should consider when solving FSL problems:

Data-level approach (DLA)
Parameter-level approach (PLA)

Data-level approach

This approach is really simple. It's based on the concept that if you don't have enough data to build a reliable model and avoid overfitting and underfitting, you should simply add more data.

That is why many FSL problems are solved by using additional information from a large base-dataset. The key feature of the base-dataset is that it doesn't contain the classes that we have in our support set for the Few-Shot task. For example, if we want to classify a specific bird species, the base-dataset can have images of many other birds.

We can also produce more data ourselves. To reach this goal, we can use data augmentation, or even generative adversarial networks (GANs).
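As a rough illustration of the data-level idea, a randomized augmentation pipeline (sketched here with torchvision; the exact transforms and parameters are just an example, not a recommendation) turns each of the few labeled support images into many slightly different training views:

```python
from torchvision import transforms

# Each time a support image is loaded, a different random variant is produced,
# effectively multiplying the handful of labeled samples we have.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
```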

Parameter-level approach

From the parameter-level point of view, it's quite easy to overfit on Few-Shot Learning samples, as they often lie in extensive, high-dimensional spaces.

To overcome this problem, we should limit the parameter space and use regularization and proper loss functions, so that the model generalizes from the limited number of training samples.

On the other hand, we can enhance model performance by guiding it through the extensive parameter space. If we use a standard optimization algorithm, it might not give reliable results because of the small amount of training data.

That is why, at the parameter level, we train our model to find the best route through the parameter space to give optimal prediction results. As we have already mentioned above, this technique is called Meta-Learning.

Meta-Learning algorithm

In the classic paradigm, when we have a specific task, an algorithm is learning if its task performance improves with experience. In the Meta-Learning paradigm, we have a set of tasks. An algorithm is learning to learn if its performance at each task improves with experience and with the number of tasks. This algorithm is called a Meta-Learning algorithm.

Imagine that we have a test task TEST. We will train our Meta-Learning algorithm on a batch of training tasks TRAIN. The training experience gained from attempting to solve the TRAIN tasks will be used to solve the TEST task.

Solving an FSL task has a set sequence of steps. Imagine we have a classification problem as we mentioned before. To start, we need to choose a base-dataset. Choosing a base-dataset is crucial. You want to pick a good one, so be careful.

Right now we have the N-way-K-Shot classification problem (let's name it TEST) and a large base-dataset that we'll use as a Meta-Learning training set (TRAIN).

The whole Meta-Training process will have a finite number of episodes. We form an episode like this:

From TRAIN, we sample N classes and K support-set images for each class, along with Q query images. This way, we form a classification task that's similar to our ultimate TEST task.

At the end of each episode, the parameters of the model are trained to maximize the accuracy on the Q images from the query set. This is where our model learns the ability to solve an unseen classification problem. [1]

The overall efficiency of the model is measured by its accuracy on the TEST classification task.

[Image source: Few-Shot Image Classification with Meta-Learning]
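Putting the episode description above into code, the meta-training loop might look roughly like the sketch below. The `sample_episode` helper is the hypothetical one sketched earlier, and `episode_loss` is a placeholder for whatever per-episode loss your chosen algorithm defines (Matching Networks, Prototypical Networks, MAML, and so on):

```python
import torch

def meta_train(model, train_set, num_episodes=10_000, lr=1e-3):
    """Generic episodic training loop over the TRAIN base-dataset (a sketch)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_episodes):
        # Sample an N-way-K-Shot task that mimics the ultimate TEST task.
        support, query = sample_episode(train_set)

        # The model computes its own loss on the query set after
        # conditioning on the support set (algorithm-specific placeholder).
        loss = model.episode_loss(support, query)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```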

In recent years, researchers have published many Meta-Learning algorithms for solving FSL classification problems. All of them can be divided into two large groups: Metric-Learning and Gradient-Based Meta-Learning algorithms.

Metric-Learning

When we talk about Metric-Learning, we usually refer to the technique of learning a distance function over objects.

In general, Metric-Learning algorithms learn to compare data samples. In the case of a Few-Shot classification problem, they classify query samples based on their similarity to the support samples.

As you might have already guessed, if we're working with images, we basically train a convolutional neural network to output an image embedding vector, which is later compared to other embeddings to predict the class.
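In its simplest form, this amounts to a nearest-neighbour lookup in embedding space. The sketch below is a bare-bones illustration of that idea; `encoder` stands for any CNN that maps a batch of images to embedding vectors, and the function name is ours:

```python
import torch

def nearest_support_class(encoder, support_x, support_y, query_x):
    """Predict each query's class as the class of its closest support embedding."""
    with torch.no_grad():
        support_emb = encoder(support_x)             # (N*K, D)
        query_emb = encoder(query_x)                 # (Q, D)
        dists = torch.cdist(query_emb, support_emb)  # pairwise distances (Q, N*K)
        nearest = dists.argmin(dim=-1)               # index of the closest support sample
    return support_y[nearest]                        # predicted class label per query
```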

Gradient-Based Meta-Learning

For the Gradient-Based approach, you need to build a meta-learner and a base-learner.

The meta-learner is a model that learns across episodes, whereas the base-learner is a model that is initialized and trained inside each episode by the meta-learner.

Imagine an episode of Meta-Training with some classification task defined by an N * K image support set and a query set of Q images:

1. We choose a meta-learner model,
2. The episode starts,
3. We initialize the base-learner (typically a CNN classifier),
4. We train it on the support set (the exact algorithm used to train the base-learner is defined by the meta-learner),
5. The base-learner predicts the classes on the query set,
6. The meta-learner's parameters are trained on the loss resulting from the classification error,
7. From this point on, the pipeline may differ based on your choice of meta-learner. [1]

Algorithms for Few-Shot image classification

This section comes from “Few-Shot Image Classification with Meta-Learning”, written by Etienne Bennequin.

From the general picture, let's move on to the specific Meta-Learning algorithms that are used to solve Few-Shot Learning image classification problems.

In this section we’ll cover:

1. Model-Agnostic Meta-Learning (MAML)
2. Matching Networks
3. Prototypical Networks
4. Relation Network

Model-Agnostic Meta-Learning

MAML is based on the Gradient-Based Meta-Learning (GBML) concept. As we've already figured out, GBML is about the meta-learner acquiring prior experience from training the base-model and learning a common feature representation across all tasks.

Whenever there is a new task to learn, the meta-learner with its prior experience will be fine-tuned a little bit using the small amount of new training data brought by the new task.

Still, we don't want to start from a random parameter initialization. If we do so, our algorithm will not converge to good performance after a few updates.

MAML aims to solve this problem.

MAML provides a good initialization of the meta-learner's parameters, so that it can learn a new task quickly, with only a small number of gradient steps, while avoiding the overfitting that may happen when using a small dataset.

Here is how it’s done:

1. The meta-learner creates a copy of itself (C) at the beginning of each episode,
2. C is trained on the episode (just as we have previously discussed, with the help of the base-model),
3. C makes predictions on the query set,
4. The loss computed from these predictions is used to update the meta-learner's original parameters,
5. This continues until you've trained on all episodes.
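The sketch below illustrates one such episode. To keep it short, it uses the first-order approximation of MAML (often called FOMAML), where the adapted copy's gradients are simply copied back onto the meta-parameters instead of backpropagating through the inner-loop updates; the function name and hyperparameters are illustrative only:

```python
import copy
import torch
import torch.nn.functional as F

def fomaml_episode(meta_model, support_x, support_y, query_x, query_y,
                   inner_lr=0.01, inner_steps=5):
    # 1. The meta-learner creates a copy of itself for this episode.
    learner = copy.deepcopy(meta_model)
    inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)

    # 2. The copy is adapted on the support set with a few gradient steps.
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        F.cross_entropy(learner(support_x), support_y).backward()
        inner_opt.step()

    # 3. The adapted copy makes predictions on the query set.
    query_loss = F.cross_entropy(learner(query_x), query_y)
    query_loss.backward()

    # 4. First-order approximation: accumulate the copy's gradients onto the
    #    meta-parameters; an outer optimizer then steps on meta_model.
    for meta_p, p in zip(meta_model.parameters(), learner.parameters()):
        meta_p.grad = p.grad.clone() if meta_p.grad is None else meta_p.grad + p.grad
    return query_loss.item()
```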

The greatest advantage of this technique is that it's conceived to be agnostic of the choice of the underlying model. Thus, the MAML method is widely used with many machine learning algorithms that need fast adaptation, especially deep neural networks.

Matching Networks

Matching Networks (MN) was the first Metric-Learning algorithm designed to solve FSL problems.

For the Matching Networks algorithm, you need to use a large base-dataset to solve a Few-Shot Learning task. As shown above, this dataset is split into episodes. After that, for each episode, Matching Networks apply the following procedure:

1. Each image from the support and the query set is fed to a CNN that outputs an embedding for it,
2. Each query image is classified using the softmax of the cosine distance from its embedding to the support-set embeddings,
3. The Cross-Entropy Loss on the resulting classification is backpropagated through the CNN.
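A minimal sketch of that classification step is shown below, assuming `encoder` is the shared CNN and that the labels have already been remapped to 0..N-1 within the episode; the helper name and the attention-sum formulation are our illustrative choices, not the authors' exact code:

```python
import torch
import torch.nn.functional as F

def matching_networks_step(encoder, support_x, support_y, query_x, query_y, n_way):
    # Embed support and query images with the same CNN and normalize,
    # so a dot product equals cosine similarity.
    support_emb = F.normalize(encoder(support_x), dim=-1)      # (N*K, D)
    query_emb = F.normalize(encoder(query_x), dim=-1)          # (Q, D)

    # Softmax over cosine similarities acts as attention over support samples.
    attention = (query_emb @ support_emb.t()).softmax(dim=-1)  # (Q, N*K)

    # Sum the attention mass per class to obtain class probabilities.
    one_hot = F.one_hot(support_y, n_way).float()               # (N*K, N)
    class_probs = attention @ one_hot                           # (Q, N)

    # Cross-entropy on the resulting classification, backpropagated through the CNN.
    return F.nll_loss(torch.log(class_probs + 1e-8), query_y)
```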

This way, Matching Networks learn to compute image embeddings. This approach allows MN to classify images with no specific prior knowledge of the classes. Everything is done simply by comparing different instances of the classes.

Since the classes are different in every episode, Matching Networks compute the features of the images that are relevant to discriminate between classes. In contrast, in standard classification, the algorithm learns the features that are specific to each class.

It's worth mentioning that the authors actually proposed some improvements to the initial algorithm. For example, they augmented the algorithm with a bidirectional LSTM, so that the embedding of each image depends on the embeddings of the others.

All of the proposed improvements can be found in their original article. Still, you must remember that improving the performance of the algorithm may increase the computation time.

Prototypical Networks

Prototypical Networks (PN) are similar to Matching Networks. Still, there are small differences that help to enhance the algorithm's performance. PN actually obtains better results than MN.

The PN process is essentially the same, but the query image embeddings are not compared to every image embedding from the support set. Instead, Prototypical Networks propose an alternative approach.

In PN, you need to form class prototypes. These are basically class embeddings formed by averaging the embeddings of the images from each class. The query image embeddings are then compared only to these class prototypes.
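A minimal sketch of that idea, assuming the same episode layout and `encoder` as in the earlier snippets:

```python
import torch
import torch.nn.functional as F

def prototypical_networks_step(encoder, support_x, support_y, query_x, query_y, n_way):
    support_emb = encoder(support_x)                 # (N*K, D)
    query_emb = encoder(query_x)                     # (Q, D)

    # One prototype per class: the mean of that class's support embeddings.
    prototypes = torch.stack(
        [support_emb[support_y == c].mean(dim=0) for c in range(n_way)]
    )                                                # (N, D)

    # Squared Euclidean distance from each query to each prototype.
    dists = torch.cdist(query_emb, prototypes) ** 2  # (Q, N)

    # A closer prototype means a higher score; train with cross-entropy.
    return F.cross_entropy(-dists, query_y)
```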

It's worth mentioning that in the case of a One-Shot Learning problem, the algorithm is similar to Matching Networks.

Also, PN uses the Euclidean distance instead of the cosine distance. It's seen as a major part of the algorithm's improvements.

Relation Network

All the experiments carried out to build Matching and Prototypical Networks actually led to the creation of the Relation Network (RN). RN was built on the PN concept, but with significant changes to the algorithm.

In RN, the distance function is not defined in advance but learned by the algorithm. RN has its own relation module that does this. If you want to learn more, check out the original article.

The overall structure is as follows: the relation module is put on top of the embedding module, which is the part that computes the embeddings and class prototypes from the input images.
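A highly simplified sketch of such a relation module is below. The original paper operates on convolutional feature maps and trains the relation scores with a mean-squared-error loss; this flat-vector MLP version is only meant to illustrate the structure, and the class and parameter names are ours:

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Learns a similarity score between a query embedding and a class prototype."""

    def __init__(self, emb_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, query_emb, prototypes):
        # Pair every query with every class prototype and score each pair.
        q = query_emb.unsqueeze(1).expand(-1, prototypes.size(0), -1)   # (Q, N, D)
        p = prototypes.unsqueeze(0).expand(query_emb.size(0), -1, -1)   # (Q, N, D)
        pairs = torch.cat([q, p], dim=-1)                                # (Q, N, 2*D)
        return self.mlp(pairs).squeeze(-1)                               # (Q, N) relation scores
```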
