Sunday, February 12, 2017

High Quality Face Recognition with Deep Metric Learning

Since the last dlib release, I've been working on adding easy to use deep metric learning tooling to dlib. Deep metric learning is useful for a lot of things, but the most popular application is face recognition. So obviously I had to add a face recognition example program to dlib. The new example comes with pictures of bald Hollywood action heroes and uses the provided deep metric model to identify how many different people there are and which faces belong to each person. The input images are shown below along with the four automatically identified face clusters:




Just like all the other example dlib models, the pretrained model used by this example program is in the public domain. So you can use it for anything you want. Also, the model has an accuracy of 99.38% on the standard Labeled Faces in the Wild benchmark. This is comparable to other state-of-the-art models and means that, given two face images, it correctly predicts if the images are of the same person 99.38% of the time.
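To make that concrete, here is a minimal sketch of a same-person check using the Python API (assuming a dlib build recent enough to provide dlib.load_rgb_image and the 5-point landmark model; the image paths are placeholders):

import dlib

# Pretrained models, downloadable from dlib.net/files
detector = dlib.get_frontal_face_detector()
sp = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
facerec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def face_descriptor(path):
    # Detect the first face, align it via the landmarks, and return the 128D vector.
    img = dlib.load_rgb_image(path)
    det = detector(img, 1)[0]
    shape = sp(img, det)
    return list(facerec.compute_face_descriptor(img, shape))

a = face_descriptor("face_a.jpg")   # placeholder paths
b = face_descriptor("face_b.jpg")
dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
print("same person" if dist < 0.6 else "different people")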

For those interested in the model details, this model is a ResNet network with 29 conv layers. It's essentially a version of the ResNet-34 network from the paper Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun with a few layers removed and the number of filters per layer reduced by half.

The network was trained from scratch on a dataset of about 3 million faces. This dataset is derived from a number of sources: the FaceScrub dataset [2], the VGG dataset [1], and a large number of images I personally scraped from the internet. I tried as best I could to clean up the combined dataset by removing labeling errors, which meant filtering out a lot of stuff from VGG. I did this by repeatedly training a face recognition model and then using graph clustering methods and a lot of manual review to clean up the dataset. In the end, about half the images are from VGG and FaceScrub. The total number of individual identities in the dataset is 7485. I made sure to avoid overlap with identities in LFW so the LFW evaluation would be valid.

The network training started with randomly initialized weights and used a structured metric loss that tries to project all the identities into non-overlapping balls of radius 0.6. The loss is basically a type of pair-wise hinge loss that runs over all pairs in a mini-batch and includes hard-negative mining at the mini-batch level. The training code is obviously also available, since that sort of thing is basically the point of dlib. You can find all details on training and model specifics by reading the example program and consulting the referenced parts of dlib.  There is also a Python API for accessing the face recognition model.
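To illustrate the idea, here is a NumPy sketch of the kind of pair-wise hinge loss with mini-batch hard-negative mining described above. It is illustrative only, not dlib's actual loss_metric code; the 0.6 radius comes from the text, while the small margin value is just a plausible placeholder.

import numpy as np

def pairwise_hinge_loss(embeddings, labels, dist_thresh=0.6, margin=0.04):
    # embeddings: (N, 128) descriptors for one mini-batch; labels: (N,) identity ids.
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices(len(labels), k=1)          # count each pair once
    d, same = d[iu], same[iu]

    # Matching pairs should be closer than the threshold (minus a margin)...
    pos_loss = np.maximum(0.0, d[same] - (dist_thresh - margin))
    # ...and non-matching pairs farther than the threshold (plus a margin).
    neg_viol = np.maximum(0.0, (dist_thresh + margin) - d[~same])

    # Hard-negative mining at the mini-batch level: only the worst offending
    # negative pairs (as many as there are positive pairs) contribute.
    k = min(len(pos_loss), int(np.count_nonzero(neg_viol)))
    neg_loss = np.sort(neg_viol)[::-1][:k]

    n = max(1, len(pos_loss) + len(neg_loss))
    return (pos_loss.sum() + neg_loss.sum()) / n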



[1] O. M. Parkhi, A. Vedaldi, A. Zisserman. Deep Face Recognition. British Machine Vision Conference, 2015.
[2] H.-W. Ng, S. Winkler. A data-driven approach to cleaning large face datasets. Proc. IEEE International Conference on Image Processing (ICIP), Paris, France, Oct. 27-30, 2014.

275 comments:

Suren Tamrazyan said...

Hi Davis,
I'm interested in your opinion: does it make sense to take the first few layers of a ResNet (or VGG, Inception), freeze their weights, add new layers, and train on a set of faces?

Tapas said...
This comment has been removed by the author.
Tapas said...

Hello Davis,
I got it working. I simply created a dlib.rectangle object, giving the image dimensions as constructor arguments, and passed it as the second argument to facerec.compute_face_descriptor. It is working.
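A minimal sketch of this approach with the Python API, assuming the image is already a tight face crop (file names are placeholders). In the standard API the rectangle is what the shape predictor takes, and the predictor's output is what goes to compute_face_descriptor:

import dlib

sp = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
facerec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

img = dlib.load_rgb_image("already_cropped_face.jpg")   # placeholder path

# Treat the whole image as the face bounding box.
box = dlib.rectangle(0, 0, img.shape[1] - 1, img.shape[0] - 1)
shape = sp(img, box)
descriptor = facerec.compute_face_descriptor(img, shape)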

Thanks

Jumabek Alikhanov said...

Hi Davis,
Thanks for this cool stuff,
I wonder about the face descriptor computation time.
My machine (Core i7, SSD, GTX 1080 GPU) takes 0.35 sec to extract the descriptor for a single face, without any image jittering or augmentation.

Is that normal?
It seems too slow for real-time purposes.

Davis King said...

That's very slow. You are probably not using CUDA, BLAS, or any other such optimizations. When you compile, CMake will print messages telling you what it's doing, so you can check whether it's using these things.

miguel said...

Hi Davis,

First of all congratulations on your work! Really impressive, and dlib is for me one of the
greatest ML tools around.

I am trying to retrain your model for my type of images (it works very well, but I would still like to train on my own set). I have close to 500K identities; the problem is that I only have 1M images (two per subject). Do you think I can still get a good model without several images per subject?

Thanks in advance,
Miguel

Davis King said...

Thanks, I'm glad you like dlib :)

You can try with only 2 per subject, although I'm pretty sure the resulting model isn't going to be very good. The general consensus in the research community seems to be that you need a lot of within-class examples to learn this kind of model. That's been my experience as well.

miguel said...

I understand what you're saying, but I will need to give it a shot since my case is very specific. Another option might be transfer learning; I don't know exactly how to do it yet, but I will take a look.

Anyway, thank you a lot.

Cheers,
Miguel

AMG4ever said...

Hello Davis, I wonder, is it possible to "straighten" a detected face using dlib? Here's an example:
https://i.stack.imgur.com/4Y9HD.jpg

I only care about landmarks

Davis King said...

You can rotate them upright. But there isn't any 3D face warping in dlib if that's what you are asking about.

Siddhardha Saran said...
This comment has been removed by the author.
mehmet ali atici said...

instead of http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2

Can I use my own landmark detector .dat file, which was trained with dlib's shape predictor trainer but detects 90 points instead of 68?

mehmet ali atici said...
This comment has been removed by the author.
Davis King said...

You can do whatever you want so long as the faces are cropped and aligned the same way as the dlib example shows.
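For reference, a minimal sketch of that standard crop/alignment via the Python API (assuming a dlib build that exposes get_face_chip; file names and paths are placeholders). Note that dlib's built-in chip extraction expects the 5- or 68-point layouts, so a custom layout such as 90 points would need its own alignment code:

import dlib

detector = dlib.get_frontal_face_detector()
sp = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
facerec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

img = dlib.load_rgb_image("face.jpg")            # placeholder path
det = detector(img, 1)[0]
shape = sp(img, det)

# This is the crop the recognition model expects: a 150x150 chip with 25% padding,
# rotated and scaled according to the landmarks.
aligned_chip = dlib.get_face_chip(img, shape, size=150, padding=0.25)

# compute_face_descriptor performs the same alignment internally from the shape.
descriptor = facerec.compute_face_descriptor(img, shape)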

Саша Таранов said...

This is no doubt an awesome post! Great job Davis King! And thank you!

Davis King said...

Thanks :)

Саша Таранов said...

Thank you Davis King! It works pretty well!

Саша Таранов said...

I ran the so-called t-SNE on the FaceScrub database (5 pictures per person) with the dlib face descriptor. Maybe it will be interesting to someone else; the final picture can be found here: https://drive.google.com/file/d/0B7JrJeplhKLveFgyRDlPNHBUR28/view?usp=sharing

Giorgos B. said...

Hello Davis,
I would like to train an object detector with 8 classes (dog, cat, and some other animals), but I want to run it on video input, so it should be as fast as possible. I've already tested train_object_detector.cpp, but it is really slow and decreases the video's frame rate (due to the high resolution). Which is the fastest detector I could use? Is there any particular solution you could propose?
Thanks a lot; I've been working with dlib for months now, and I still have many things to learn... :)

Davis King said...

The HOG detector in that example is the fastest one available. Also, http://dlib.net/faq.html#Whyisdlibslow

Bill Klein said...

I noticed in my tests that:

1) A face without a mouth visible got detected as a face

2) When comparing the descriptor of said mouthless face with the descriptor of a face of the same person with a mouth, we still get a very close distance. I.e. it correctly finds them to be the same person!

Am I right in assuming that this is because the mouth data is not used in the descriptor?

(I know that this is not a dlib-specific question and has more to do with the deep learning involved, but I'm not sure where else to ask this. Any tips for other forums?)

Thank you!

Davis King said...

No, the whole face crop is used in the computation, including the mouth. The point of this thing is to be robust to all kinds of changes to someone's face that still preserve their identity. So it's good that it works like this.

kim said...

Hi, first I have to say: great work, Dlib has helped me a lot.

Now for the question, is it possible to identify which image gave which face after the clustering?

Thanks :)

Davis King said...

Thanks, I'm glad you like dlib.

Yes, you can find that out. Look at the code. It's trivially available information in the example program.

Sobhan Mahdavi said...

Hi dear Davis, thank you very much for your works,
The mmod_human_face_detector is a great model. As you mentioned, the face detector used for preprocessing in face recognition is the HOG-based frontal face detector.
Can I use the DNN face detector model for face recognition and get the same performance?

Davis King said...

You can use any detector so long as you are able to align the faces the same way. To do this with the CNN model you need to use the 5-point face landmarking model released with the newest dlib. When using the 5-point model you can use either the CNN or HOG face detector and they will both give the same performance.
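For example, a minimal sketch of that combination with the Python API (the model file names are the standard dlib downloads; the image path is a placeholder):

import dlib

cnn_detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
sp5 = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
facerec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

img = dlib.load_rgb_image("face.jpg")            # placeholder path
for det in cnn_detector(img, 1):
    # The CNN detector returns mmod_rectangles; the plain box is in .rect
    shape = sp5(img, det.rect)
    descriptor = facerec.compute_face_descriptor(img, shape)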

Sobhan Mahdavi said...

Dear Davis, Thanks for your quick reply
I used the CNN model with the 5-point face landmarking model, but I get an error:

Error detected in file c:\dlib\dlib\image_transforms/interpolation.h.
Error detected in function struct dlib::chip_details __cdecl dlib::get_face_chip_details(const class dlib::full_object_detection &,const unsigned long,const double).

Failing expression was det.num_parts() == 68.
chip_details get_face_chip_details()
You must give a detection with exactly 68 parts in it.
det.num_parts(): 5

The code is here:
auto shape = sp(img, det);
matrix<rgb_pixel> face_chip;
extract_image_chip(img, get_face_chip_details(shape, 150, 0.25), face_chip);
face = move(face_chip);
matrix<float,0,1> face_descriptor = net(face);

Can you help me?

Davis King said...

Think about it. You are trying to use a model that wasn't created until dlib 19.7, but you are using an older version of dlib. How can that work?

Mayur Patel said...

Hello Davis
I have installed CUDA and then compiled dlib and OpenCV; it's all working. I want to know if the "dlib.face_recognition_model_v1.compute_face_descriptor" function utilizes CUDA. If not, do I have to write a wrapper for Python, or something like that? I ask because I want to compare performance before and after installing CUDA.
Thank you.

Davis King said...

dlib.face_recognition_model_v1.compute_face_descriptor uses CUDA.

Mayur Patel said...

But there is no speed difference before and after compiling with CUDA. Does that mean CUDA makes no difference in this function?

Davis King said...

Look at the CMake output when dlib is built. It will tell you if it's using CUDA or not.

Duc Vo said...

hi Davis,

I found that this line of code: std::vector<matrix<float,0,1>> face_descriptors = net(faces); takes the most time. Each face image takes around 300 ms to convert into a face descriptor. Any chance to reduce that?

Thanks.

Davis King said...

Be sure to link to the Intel MKL if running on the CPU, or even better, use a fast GPU.

Duc Vo said...

Yeah, I compiled with CUDA and it runs faster now. Thanks :)

Mayur Patel said...
This comment has been removed by the author.
Mayur Patel said...

Hi Duc, how much time does it take now after compiling with CUDA? Mine is 170 ms both before and after CUDA, but in both cases with Intel MKL. How long does yours take now?

Duc Vo said...

This line of code

std::vector<matrix<float,0,1>> face_descriptors = net(faces);

takes 0.44 seconds on 17 faces, so it is around 26 ms per face. My GPU is an Nvidia GTX 870M.

Cheers,

Mayur Patel said...

Which compiler did you use? Is your platform Windows? And can you tell me how you compiled with CUDA?

Duc Vo said...

I compiled on Linux.

The standard way to compile is to use CMake, as recommended by Davis. I first run cmake-gui to enable DLIB_USE_CUDA and other options like DLIB_JPEG_SUPPORT, DLIB_PNG_SUPPORT, etc. After that, just run the following commands (see http://dlib.net/compile.html):

cd examples
mkdir build
cmake-gui ( this is when you enable CUDA as mentioned above )
cd build
cmake ..
cmake --build . --config Release

Good luck.

P.S. You said it takes 170 ms; is that per face or for the whole vector of faces?

Mayur Patel said...
This comment has been removed by the author.
Mayur Patel said...
This comment has been removed by the author.
Ashok Bugude said...

Hi Thanks for the great work

Can I please know if there is any way to get the name of the person for each clustered group?

Basically I want to train on, say, 5 sets of people and recognize them in an image.

Tapas said...

Hello Davis,
Thanks for such a nice library.
I am facing a problem, in continuation of my comment posted on August 24, 2017 at 3:21 AM.

Once I get the face bounding box from a video frame, I extract the 128D features from the bounding box. Since I have the bounding box of the detected face, I also save the face crop to disk. When I reload the saved face and extract the 128D features again (using a dlib.rectangle over the whole face image), these features do not match the features from the original bounding box. Why are the features not matching?

Thanks
Tapas

Bill Klein said...

I'm wondering if anyone has experimented to determine the minimum face resolution that will result in a reliable computation of the descriptor. For example, will a 40x40-pixel face be comparable to faces of higher resolution?

mehmet ali atici said...

Hi Davis;
In the Python example, what is the role of shape in the line ... compute_face_descriptor(img, shape)? Does the recognition model calculate the descriptor according to the five landmarks? If so, is it possible to use another shape model, for example one that finds 10 landmarks? In that case, does the distance threshold (0.6) change?

thanks in advance.

Davis King said...

The landmarks are only used to align the face before the DNN extracts the face descriptor. How many landmarks you use doesn't really matter.

mehmet ali atici said...

Hi Davis,

It seems that the size of the output layer of the network is 128, which corresponds to the 128D vector. How is the 128D vector extracted from an image for training?

Davis King said...

This example program shows how to train the model from images: http://dlib.net/dnn_metric_learning_on_images_ex.cpp.html

Jon Hauris said...

Hello Davis, how do I determine which image the "image index" refers to? Specifically:
I am training my own detector and received the following:
"RuntimeError: An impossible set of object labels was detected ..."
1. It said that the problem was "image index 1017". How do I find which image this refers to in the XML file?
2. It also gives the "truth rectangle" and "nearest detection template rect:" with their bounding box parameters, none of which match any of my bounding boxes. What are these rectangles referring to?
3. Where do I adjust match_eps?
Thank you, Jon

mehmet ali atici said...

Hello Davis;
Do you plan to provide a Python API for DNN metric learning?

Thanks.

Davis King said...

Like this? https://github.com/davisking/dlib/blob/master/python_examples/face_recognition.py

mehmet ali atici said...

No, I mean a Python equivalent of http://dlib.net/dnn_metric_learning_ex.cpp.html

Davis King said...

No, I'm not going to add that since it's impossible to define the network architecture from python.

Mike said...

Hi Davis,
have you ever tried it on a Jetson TX2? I wonder how fast it would be.
Is there any chance to optimize it on an ARM Cortex-A7 (dual core) to reach approx. 3-5 fps?
Or would you rather say forget it?
Thanks!

Davis King said...

I haven't used a Jetson, but that's a very popular way to run these DNNs. Most people find the performance to be quite reasonable.

Bill Klein said...

Mike, I had posted a question about this in the TX2 forum after I did a bit of testing:

https://devtalk.nvidia.com/default/topic/1025670/jetson-tx2/dnn-face-detection-performance-on-tx2/post/5217046/

At least for my particular test (dlib DNN-based face detection on high-res images), it appears that the Jetson TX2 is ~10x slower than a GTX 1070. Please do let us know what you find. :)

Mike said...

Bill, thanks for your feedback. I have measured the execution time for the dnn_face_recognition_ex on a Jetson TX2.
It is a Release compile (though with debug info) using CUDA.
The time is exclusive of loading the data and displaying the images, just the inner execution: 4204 ms.
Is that in line with your findings?
I will get rid of the debug info and play with compiler settings.

Bill Klein said...

Hey Mike,

Firstly, I haven't executed the dnn_face_recognition_ex example specifically. Sorry. Secondly, be sure to do a few iterations of whatever you are trying since the first few (?) may be much slower than the others, due to things being initialized the first time around...

Mike said...

Hi Bill,
I do run the example several times, but it does not get any better than 4.2 seconds. It detects 24 faces in total from 4 different guys, so my take is 5-6 fps. Any hints on compiler options to check?
Next, I will compare this result with a standard dual core ARMv7@1GHz, eventually using NEON and VFP support...

Mike said...

Hi Bill,
The same code on an ARM Cortex-A7 @ 1 GHz takes 130 seconds. It is a release compile with -march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -ffast-math -Ofast ...

Bill Klein said...

Sounds like a decent speed-up. :)

Mike said...

Hello Davis,
Am I right in assuming that dnn_face_recognition_ex as of v19.7 uses the HOG-based frontal face detector and NOT the CNN-based one? So I should be able to expect performance improvements for the landmark extraction by employing NEON?

Davis King said...

Yes, that example uses HOG. You could just as easily use the CNN face detector with it though.

Tsai Joy said...

Dear Davis:
Recently I've been wanting to insert an OpenMP "#pragma omp parallel for"
into dlib\matrix\matrix_default_mul.h,
specifically like this:

for (long r = lhs_block.top(); r <= lhs_block.bottom(); ++r)
{
    for (long c = lhs_block.left(); c <= lhs_block.right(); ++c)
    {
        const typename EXP2::type temp = lhs(r,c);
        #pragma omp parallel for // <---------------- inserted OpenMP line here
        for (long i = rhs_block.left(); i <= rhs_block.right(); ++i)
        {
            dest(r,i) += rhs(c,i)*temp;
        }
    }
}

However, when I run examples such as dnn_face_recognition_ex.cpp,
I don't see multi-core processing (via Intel's VTune tool) when the "get 128D" line runs:
std::vector<matrix<float,0,1>> face_descriptors = net(faces);

Where should I put the "#pragma omp parallel for" to enable OpenMP multi-core processing in dlib?

Davis King said...

Linking dlib with the Intel MKL is a much better approach to get the kind of speed boost you are looking for. The CMake scripts are already setup to do it if you install the MKL.

Mike said...

Hi Davis,
Are the dlib-for-ARM improvements by fastfastball (e.g. SIMD for NEON and threading) now part of dlib 19.7?
Or would I have to redo the changes for 19.7?

Davis King said...

All that stuff is now part of the main dlib codebase, so yes, it's there. You don't need to do anything to get it.

Kevin Tian said...

Hi Davis,

Thank you for your great work! It really helps me a lot in my project.

Regarding the accuracy rate of 99.38% on LFW, do you only test on the 1680 people pictured with more than one photo? The LFW page says there are some incorrectly labeled photos. How do you handle these photos: manually correct them or ignore them in your test?

How do you do the recognition testing? Do you calculate every photo's 128D feature, compare them with each other, and check whether the distance between photos of the same person is less than 0.6?

Thank you in advance!

Kevin

Davis King said...

I follow the exact evaluation protocol laid out by the LFW challenge. This file contains the entire test script for the dlib model: http://dlib.net/files/dlib_face_recognition_resnet_model_v1_lfw_test_scripts.tar.bz2. You can run it and see the LFW evaluation outputs.

Kevin Tian said...

Hi Davis,

Thank you for your reply!

I have a question about training face model. Could you give me some comments?

If I increase the number of face images from 3 million to 6 million, will the trained model verify faces better? For example, will the accuracy rate increase and the false positive rate decrease?

In your experience, is there an upper bound on recognition capability, i.e. a point beyond which it will not increase even though the number of training face images increases?

Best regards,
Kevin

Davis King said...

Yes, more data is better. For instance, Google trained a face recognizer on 200 million faces and got great results.

Yury Savitskiy said...

Hello Davis,

Could you please explain how you form a mini-batch?

Do you take some number of different persons and some number of their unique images? For example, do you choose 64 persons and take 8 images for each, so the size of the mini-batch is 512? Or do you just take some random images, so that for one person you have 10 images, for a second 3, and so on?

Yury Savitskiy

Davis King said...

You can make the mini-batches any way you want. To see what I did, refer to the metric learning example program.
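For reference, a rough sketch of the balanced-sampling idea in Python (illustrative only; dlib's example builds its mini-batches in C++, and the people-per-batch and images-per-person numbers below are just placeholders):

import random

def make_mini_batch(images_by_identity, people_per_batch=32, imgs_per_person=4):
    # Sample a fixed number of identities, then a fixed number of images from each,
    # so every mini-batch contains both matching and non-matching pairs.
    people = random.sample(list(images_by_identity), people_per_batch)
    batch, labels = [], []
    for label, person in enumerate(people):
        imgs = images_by_identity[person]
        picks = random.choices(imgs, k=imgs_per_person)  # with replacement if a person has few images
        batch.extend(picks)
        labels.extend([label] * imgs_per_person)
    return batch, labels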
