
One Shot Learning

For example: can you develop a computer vision system that can look at two images it has
never seen before and say whether they represent the same object?
A key challenge in many computer vision tasks is that you don't have
many labeled images to train your neural network. For instance, a classic facial
recognition algorithm must be trained on many images of the same person to be able
to recognize her.

Imagine what this would mean for a facial recognition system used at an
international airport. You would need several images of every single person who
would possibly pass through that airport, which could amount to billions of images.

Instead of treating the task as a classification problem, one-shot learning turns
it into a difference-evaluation problem.

When a deep learning model is adjusted for one-shot learning, it takes two images
(e.g., the passport photo and the image of the person looking at the camera) and
returns a value that measures the difference between the two images. If the images
contain the same object (or the same face), the neural network returns a value that
is smaller than a specific threshold; if they're not the same object, the value
will be higher than the threshold.
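The decision rule above can be sketched in a few lines. This is a minimal illustration, not a real vision system: the encodings and the threshold value of 0.5 are made-up numbers standing in for what a trained network would produce.

```python
import math

def embedding_distance(emb_a, emb_b):
    """Euclidean distance between two feature encodings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))

def same_object(emb_a, emb_b, threshold=0.5):
    """Return True when the two encodings are closer than the threshold."""
    return embedding_distance(emb_a, emb_b) < threshold

# Two encodings of the same face should land close together...
print(same_object([0.1, 0.9, 0.3], [0.12, 0.88, 0.31]))  # True
# ...while encodings of different faces should be far apart.
print(same_object([0.1, 0.9, 0.3], [0.8, 0.1, 0.7]))     # False
```

In practice the threshold is tuned on a validation set to balance false accepts against false rejects.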

The key to one-shot learning is an architecture called the "Siamese neural
network." To train it, we use a function called "triplet loss." Basically, the
triplet loss trains the neural network by giving it three images: an anchor image,
a positive image, and a negative image. The neural network must adjust its
parameters so that the feature encodings of the anchor and positive image are very
close, while the encoding of the negative image is very different.

For instance, in the case of the facial recognition example, a trained Siamese
neural network should be able to compare two images in terms of facial features
such as distance between eyes, nose, and mouth.
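The defining trait of the Siamese architecture is that both images pass through the same encoder with shared weights. A toy sketch, where a weighted sum stands in for the convolutional encoder:

```python
def encode(image, weights):
    """Toy encoder: weighted sums of pixel values stand in for the CNN
    that maps an image to a feature encoding."""
    return [sum(w * p for w, p in zip(row, image)) for row in weights]

def siamese_distance(image_a, image_b, weights):
    """Run BOTH images through the same encoder (shared weights),
    then compare the resulting encodings."""
    enc_a = encode(image_a, weights)
    enc_b = encode(image_b, weights)
    return sum((x - y) ** 2 for x, y in zip(enc_a, enc_b))

weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
print(siamese_distance([1, 0, 1], [1, 0, 1], weights))  # 0.0 (identical inputs)
```

Because the weights are shared, the same facial feature produces the same encoding regardless of which branch it enters, which is what makes the distance comparison meaningful.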

Training the Siamese network still requires a fairly large set of APN trios. But
creating the training data is much easier than for classic datasets that need each
image to be individually labeled. Say you have a dataset of 20 face images from two
people, which means you have 10 images per person. You can generate 1,800 APN trios
from this dataset. (You use the 10 pictures of each person to create 10×9 ordered AP
pairs and combine each pair with the 10 images of the other person as negatives, for
a total of 10×9×10×2 = 1,800 APN trios.)
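The counting argument above is easy to verify by enumerating the trios directly. A minimal sketch, using a dict that maps each (hypothetical) person to a list of image identifiers:

```python
from itertools import permutations

def generate_apn_trios(images_by_person):
    """Enumerate every (anchor, positive, negative) trio from a dict
    mapping person -> list of image ids."""
    trios = []
    for person, images in images_by_person.items():
        # Every image of any OTHER person can serve as a negative.
        negatives = [img for other, imgs in images_by_person.items()
                     if other != person for img in imgs]
        for anchor, positive in permutations(images, 2):  # ordered AP pairs
            for negative in negatives:
                trios.append((anchor, positive, negative))
    return trios

# Two people with 10 images each -> 10*9 AP pairs * 10 negatives * 2 people
dataset = {"alice": [f"a{i}" for i in range(10)],
           "bob": [f"b{i}" for i in range(10)]}
print(len(generate_apn_trios(dataset)))  # 1800
```

In real training you would sample trios rather than materialize all of them, since the count grows rapidly with dataset size.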

With 30 images (three people, 10 images each), you can create 5,400 trios, and with
100 images (ten people, 10 images each), you can create 81,000 APNs. Ideally, your
dataset should contain a diverse set of face images so the network generalizes
across different features. Another good idea is to take a previously trained
convolutional neural network and fine-tune it for one-shot learning.

Limitations:
Each Siamese neural network is only useful for the one task it has been trained on.
A neural network tuned for one-shot learning for facial recognition can’t be used
for some other task, such as telling whether two pictures contain the same dog or
the same car.

The neural networks are also sensitive to other variations. For instance, the
accuracy can degrade considerably if the person in one of the images is wearing a
hat, scarf, or glasses, and the person in the other image is not.
