
In modern face recognition, the pipeline consists of four stages:

● Detect
● Align
● Represent
● Classify
This paper describes employing 3D face modeling, with which the images can be aligned
(frontalized) before being represented by the network. This explicit alignment of the images
before feeding them to the CNN is what distinguishes the approach from the rest of the deep
learning architectures and lets it closely match human accuracy.

Introduction
Face recognition in unconstrained images has always been a challenge, as brightness
variations and occlusions in the image deteriorate the performance of the classifier.
We present a system (DeepFace) that has closed the majority of the remaining gap in the most
popular benchmark in unconstrained face recognition, and is now at the brink of human level
accuracy. It is trained on a large dataset of faces acquired from a population vastly different
than the one used to construct the evaluation benchmarks, and it is able to outperform existing
systems with only very minimal adaptation. Moreover, the system produces an extremely
compact face representation.

Face Alignment
● Get rid of variations within the face images, so that every face seems to look straight into
the camera
● Two types of alignments done are:
1. 2D Alignment
2. 3D Alignment
● 2D Alignment searches for landmarks (fiducial points) on the face.
○ They use Support Vector Regressors (SVRs) to localize these points.
○ After each SVR pass, the localized landmarks are used to transform the face, and
the process is iterated.
● 3D Alignment
1. The 2D alignment normalizes variations within the 2D plane, but not
out-of-plane variations. To normalize out-of-plane variations, a 3D
transformation is needed.
2. They detect an additional 67 landmarks on the faces (again via SVRs).
3. They construct a human face mesh from a dataset (USF Human-ID).
4. They map the 67 landmarks to that mesh.
5. From these landmarks, a frontalized image can be recovered.
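The landmark-driven alignment above can be sketched with a minimal 2D similarity (Procrustes-style) fit between detected fiducial points and a canonical template. This is an illustrative stand-in, not the paper's exact warping pipeline; the `align_2d` helper and the toy template points are assumptions.

```python
import numpy as np

def align_2d(landmarks, template):
    """Estimate a 2D similarity transform (scale, rotation, translation)
    mapping detected landmarks onto canonical template positions.
    Solved in closed form via Procrustes analysis (SVD)."""
    src = landmarks - landmarks.mean(axis=0)
    dst = template - template.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    u, s, vt = np.linalg.svd(dst.T @ src)
    r = u @ vt
    if np.linalg.det(r) < 0:          # guard against reflections
        u[:, -1] *= -1
        r = u @ vt
    scale = s.sum() / (src ** 2).sum()
    t = template.mean(axis=0) - scale * (r @ landmarks.mean(axis=0))
    return scale, r, t

# Toy example: the "detected" landmarks are a rotated, scaled, shifted
# copy of the template, so the fit should recover the template exactly.
template = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
landmarks = 2.0 * (template @ rot.T) + np.array([3.0, -1.0])

scale, r, t = align_2d(landmarks, template)
aligned = scale * (landmarks @ r.T) + t
print(np.allclose(aligned, template))   # prints True
```

In the paper this idea is applied iteratively (localize landmarks, warp, re-localize); the closed-form fit here shows only a single alignment step.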
● CNN
1. CNN receives the frontalized image.
2. They use dropout only after the first fully connected layer.
3. Cross-entropy loss is used.
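A minimal sketch of the training objective described above: softmax cross-entropy over identities, with (inverted) dropout applied to a single early activation. Layer sizes and the stand-in "first-layer" features here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, training=True):
    """Inverted dropout: zero units with probability p, rescale the rest
    so the expected activation is unchanged at test time."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def softmax_cross_entropy(logits, label):
    """Cross-entropy loss for one example, computed with the
    max-subtraction trick for numerical stability."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

# Toy forward pass: a stand-in activation, dropout, then identity scores.
features = rng.normal(size=128)          # pretend first-layer output
features = dropout(features, p=0.5)      # dropout only after this layer
w = rng.normal(size=(4030, 128)) * 0.01  # SFC has roughly 4k identities
logits = w @ features
loss = softmax_cross_entropy(logits, label=7)
print(loss > 0)                          # prints True
```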
● Results
○ They train their network on the Social Face Classification (SFC) dataset.
This appears to be a Facebook-internal (i.e. not public) dataset with 4.4
million faces of 4k people.
○ When applied to the LFW dataset:
○ Face recognition ("which person is shown in the image?") (apparently they
retrained the whole model on LFW for this task?):
1. Simple SVM on LBP features (i.e. not their network): 91.4% mean accuracy.
2. Their model, no frontalization (only 2D alignment): 94.3% mean accuracy.
3. Their model, no frontalization, no 2D alignment: 87.9% mean accuracy.
○ Face verification (two images -> same/not same person) (apparently also
trained on LFW? unclear):
1. Method 1 (inner product + threshold): 95.92% mean accuracy.
2. Method 2 (χ² distance + SVM): 97.00% mean accuracy.
3. Method 3 (Siamese network): 96.17% mean accuracy alone, and 97.25%
when used in an ensemble with other methods (under a special training
schedule using the SFC dataset).
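The first two verification scores can be sketched as below. The χ² distance is the standard form for non-negative descriptors; the threshold value and toy descriptors are illustrative assumptions, and the SVM / Siamese stages are omitted.

```python
import numpy as np

def inner_product_score(f1, f2):
    """Method 1: inner product between L2-normalized face descriptors;
    verification then compares this score against a threshold."""
    f1 = f1 / np.linalg.norm(f1)
    f2 = f2 / np.linalg.norm(f2)
    return float(f1 @ f2)

def chi2_distance(f1, f2, eps=1e-12):
    """Method 2 input: chi-squared distance between non-negative
    descriptors (an SVM is then trained on top of this)."""
    return float(np.sum((f1 - f2) ** 2 / (f1 + f2 + eps)))

# Toy non-negative descriptors (as a ReLU representation layer would give).
a = np.array([0.9, 0.1, 0.0, 0.3])
b = np.array([0.8, 0.2, 0.1, 0.3])   # similar to a
c = np.array([0.0, 0.7, 0.9, 0.1])   # dissimilar from a

same = inner_product_score(a, b) > 0.8           # threshold is an assumption
print(same)                                      # prints True
print(chi2_distance(a, b) < chi2_distance(a, c)) # prints True
```

Method 3 instead learns the comparison itself: a Siamese network maps both images through shared weights and is trained on a weighted distance between the two descriptors.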
