Sunday, February 12, 2017

High Quality Face Recognition with Deep Metric Learning

Since the last dlib release, I've been working on adding easy to use deep metric learning tooling to dlib. Deep metric learning is useful for a lot of things, but the most popular application is face recognition. So obviously I had to add a face recognition example program to dlib. The new example comes with pictures of bald Hollywood action heroes and uses the provided deep metric model to identify how many different people there are and which faces belong to each person. The input images are shown below along with the four automatically identified face clusters:




Just like all the other example dlib models, the pretrained model used by this example program is in the public domain. So you can use it for anything you want. Also, the model has an accuracy of 99.38% on the standard Labeled Faces in the Wild benchmark. This is comparable to other state-of-the-art models and means that, given two face images, it correctly predicts if the images are of the same person 99.38% of the time.

For those interested in the model details, this model is a ResNet network with 29 conv layers. It's essentially a version of the ResNet-34 network from the paper Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun with a few layers removed and the number of filters per layer reduced by half.

The network was trained from scratch on a dataset of about 3 million faces. This dataset is derived from a number of datasets. The face scrub dataset[2], the VGG dataset[1], and then a large number of images I personally scraped from the internet. I tried as best I could to clean up the combined dataset by removing labeling errors, which meant filtering out a lot of stuff from VGG. I did this by repeatedly training a face recognition model and then using graph clustering methods and a lot of manual review to clean up the dataset. In the end, about half the images are from VGG and face scrub. Also, the total number of individual identities in the dataset is 7485. I made sure to avoid overlap with identities in LFW so the LFW evaluation would be valid.

The network training started with randomly initialized weights and used a structured metric loss that tries to project all the identities into non-overlapping balls of radius 0.6. The loss is basically a type of pair-wise hinge loss that runs over all pairs in a mini-batch and includes hard-negative mining at the mini-batch level. The training code is obviously also available, since that sort of thing is basically the point of dlib. You can find all details on training and model specifics by reading the example program and consulting the referenced parts of dlib.  There is also a Python API for accessing the face recognition model.



[1] O. M. Parkhi, A. Vedaldi, A. Zisserman Deep Face Recognition British Machine Vision Conference, 2015.
[2] H.-W. Ng, S. Winkler. A data-driven approach to cleaning large face datasets. Proc. IEEE International Conference on Image Processing (ICIP), Paris, France, Oct. 27-30, 2014

403 comments :

«Oldest   ‹Older   401 – 403 of 403
Tsai Joy said...

Hi Andrey, yes it's Euclidean score not euler, my bad ;)
(Euclidean score = 1.0 - Euclidean distance)

As for the second question, yes there were not-sure regions (Euclidean score 60~70) where the face recognition(FR) had trouble giving correct results. Therefore, I previously skipped these frames completely, due to the inference is a video, and I have many frames of face images to use.

In short, Euclidean score below 60 was set as "Unconfidence" where the inference face is labeled "Unknown" (Unknown means not enrolled in database). In contrast, Euclidean score above 70 will be labeled as the enrolled name in database e.g. "WILL". For Euclidean score between 60~70, false results will occur, i.e. the FR will think it's someone else "John" when it's actually "WILL".

But all these measures are just a work around for Euclidean score, as the SVM score now can give a more definite "-1" as different faces and "+1" as same faces between inference and enrolled. I guess you can say high varianced SVM score (I normalized from -1~1 to 0.00~1.00) is better in FR application, due to differ the "higher known confidence" and "lower unknown Unconfidence".

Andrey Zakharoff said...

@Tsay Joy Hi Joy, I use formula Probability=sqrt(1.- Euclidean distance), also this is not real probability, but probability-like value. Actually,in this way I de-linearize output value. Due to the changed slope of the function I get stretched scores near 0. and shrinked where Euclidean distance coming to 1. Don't you think, your SVM does something like this?

Andrey Zakharoff said...

Please do not SPAM here!

«Oldest ‹Older   401 – 403 of 403   Newer› Newest»