Sunday, February 12, 2017

High Quality Face Recognition with Deep Metric Learning

Since the last dlib release, I've been working on adding easy-to-use deep metric learning tooling to dlib. Deep metric learning is useful for a lot of things, but the most popular application is face recognition. So obviously I had to add a face recognition example program to dlib. The new example comes with pictures of bald Hollywood action heroes and uses the provided deep metric model to identify how many different people there are and which faces belong to each person. The input images are shown below along with the four automatically identified face clusters:




Just like all the other example dlib models, the pretrained model used by this example program is in the public domain. So you can use it for anything you want. Also, the model has an accuracy of 99.38% on the standard Labeled Faces in the Wild benchmark. This is comparable to other state-of-the-art models and means that, given two face images, it correctly predicts if the images are of the same person 99.38% of the time.

For those interested in the model details, this model is a ResNet network with 29 conv layers. It's essentially a version of the ResNet-34 network from the paper Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun with a few layers removed and the number of filters per layer reduced by half.

The network was trained from scratch on a dataset of about 3 million faces. This dataset is derived from a number of sources: the FaceScrub dataset [2], the VGG dataset [1], and a large number of images I personally scraped from the internet. I tried as best I could to clean up the combined dataset by removing labeling errors, which meant filtering out a lot of stuff from VGG. I did this by repeatedly training a face recognition model and then using graph clustering methods and a lot of manual review to clean up the dataset. In the end, about half the images are from VGG and FaceScrub. The total number of individual identities in the dataset is 7485. I made sure to avoid overlap with identities in LFW so the LFW evaluation would be valid.

The network training started with randomly initialized weights and used a structured metric loss that tries to project all the identities into non-overlapping balls of radius 0.6. The loss is basically a type of pair-wise hinge loss that runs over all pairs in a mini-batch and includes hard-negative mining at the mini-batch level. The training code is obviously also available, since that sort of thing is basically the point of dlib. You can find all details on training and model specifics by reading the example program and consulting the referenced parts of dlib.  There is also a Python API for accessing the face recognition model.
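To make the 0.6 threshold concrete, here is a minimal sketch (plain Python, with short made-up vectors standing in for the real 128D descriptors the network produces) of the verification decision: two faces are treated as the same person when the Euclidean distance between their descriptors is under 0.6.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two face descriptor vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(desc1, desc2, threshold=0.6):
    # The model is trained so descriptors of the same identity fall
    # within distance 0.6 of each other.
    return euclidean(desc1, desc2) < threshold

# Toy 4D stand-ins for real 128D descriptors (hypothetical values):
alice_a = [0.10, 0.20, 0.05, 0.30]
alice_b = [0.12, 0.18, 0.07, 0.28]   # close to alice_a -> same person
bob     = [0.90, 0.10, 0.70, 0.00]   # far away        -> different person

print(same_person(alice_a, alice_b))  # True
print(same_person(alice_a, bob))      # False
```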



[1] O. M. Parkhi, A. Vedaldi, A. Zisserman. Deep Face Recognition. British Machine Vision Conference, 2015.
[2] H.-W. Ng, S. Winkler. A data-driven approach to cleaning large face datasets. Proc. IEEE International Conference on Image Processing (ICIP), Paris, France, Oct. 27-30, 2014.

343 comments:

Suren Tamrazyan said...

Hi Davis,
I'm interested in your opinion: does it make sense to take the first few layers of a ResNet (or VGG, Inception), freeze their weights, add new layers, and train using a set of faces?

Tapas said...

Hello Davis,
I got it working. I simply created a dlib.rectangle object, giving the image information as constructor arguments, and passed it as the second argument to facerec.compute_face_descriptor. It is working.

Thanks

Jumabek Alikhanov said...

Hi Davis,
Thanks for this cool stuff,
I wonder about the face descriptor computation time.
My machine (Core i7, SSD, GTX 1080 GPU) takes 0.35 sec to extract the descriptor for a single face, without any image jittering or augmentation.

Is that normal?
It seems too slow to me for real-time purposes.

Davis King said...

That's very slow. You are probably not using CUDA, BLAS, or any other such optimizations. When you compile, CMake will print messages telling you what it's doing, so you can see if it's using these things.

miguel said...

Hi Davis,

First of all, congratulations on your work! Really impressive, and dlib is for me one of the greatest ML tools around.

I am trying to retrain your model for my type of images (it works very well, but I would still like to train on my own set). I have close to 500K identities; the problem is that I have 1M images (two per subject). Do you think I can still get a good model even without several images per subject?

Thanks in advance,
Miguel

Davis King said...

Thanks, I'm glad you like dlib :)

You can try with only 2 per subject, although I'm pretty sure the resulting model isn't going to be very good. The general consensus in the research community seems to be that you need a lot of within-class examples to learn this kind of model. That's been my experience as well.

miguel said...

I understand what you're saying, but I will need to give it a shot since my case is very specific. Another way might be to do some transfer learning; I don't know exactly how I can do it, but I will need to take a look.

Anyway, thank you a lot.

Cheers,
Miguel

AMG4ever said...

Hello Davis, I wonder, is it possible to "straighten" a detected face using dlib? Here's an example:
https://i.stack.imgur.com/4Y9HD.jpg

I only care about landmarks

Davis King said...

You can rotate them upright. But there isn't any 3D face warping in dlib if that's what you are asking about.

mehmet ali atici said...

instead of http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2

Can I use my own landmark detector .dat file, trained with dlib's shape predictor trainer, that detects 90 points rather than 68?

Davis King said...

You can do whatever you want so long as the faces are cropped and aligned the same way as the dlib example shows.

Саша Таранов said...

This is no doubt an awesome post! Great job, Davis King! And thank you!


Davis King said...

Thanks :)

Саша Таранов said...

Thank you Davis King! It works pretty well!

Саша Таранов said...

I have run the so-called t-SNE on the FaceScrub database (5 pictures per person) with the dlib face descriptor. Maybe it would be interesting to someone else; the final picture can be found here: https://drive.google.com/file/d/0B7JrJeplhKLveFgyRDlPNHBUR28/view?usp=sharing

Giorgos B. said...

Hello Davis,
I would like to train an object detector with 8 classes (dog, cat, and some other animals), but I want to run it on video input, so I would like it to be as quick as possible. I've already tested train_object_detector.cpp, but it is really slow and decreases the video's frame rate (due to the high resolution). Which is the fastest detector I could use? Is there any particular solution you could propose?
Really, thanks. I've been working with dlib for months now, and I still have many things to learn... :)

Davis King said...

The HOG detector in that example is the fastest one available. Also, http://dlib.net/faq.html#Whyisdlibslow

Bill Klein said...

I noticed in my tests that:

1) A face without a mouth visible got detected as a face

2) When comparing the descriptor of said mouthless face with the descriptor of a face of the same person with a mouth, we still get a very close distance. I.e., it correctly finds them to be the same person!

Am I right in assuming that this is because the mouth data is not used in the descriptor?

(I know that this is not a dlib-specific question and has more to do with the deep learning involved, but I'm not sure where else to ask this. Any tips for other forums?)

Thank you!

Davis King said...

No, the whole face crop is used in the computation, including the mouth. The point of this thing is to be robust to all kinds of changes to someone's face that still preserves their identity. So it's good that this works like this.

kim said...

Hi, first I have to say: great work, Dlib has helped me a lot.

Now for the question, is it possible to identify which image gave which face after the clustering?

Thanks :)

Davis King said...

Thanks, I'm glad you like dlib.

Yes, you can find that out. Look at the code. It's trivially available information in the example program.

Sobhan Mahdavi said...

Hi dear Davis, thank you very much for your works,
The mmod_human_face_detector is a great model. As you mentioned, your face detector for preprocessing of face recognition is HOG-based frontal face detector.
Can I use dnn face detector model in face recognition and have the same performance?

Davis King said...

You can use any detector so long as you are able to align the faces the same way. To do this with the CNN model you need to use the 5-point face landmarking model released with the newest dlib. When using the 5-point model you can use either the CNN or HOG face detector and they will both give the same performance.

Sobhan Mahdavi said...

Dear Davis, Thanks for your quick reply
I used CNN model with 5-point face landmarking, but I have an error:

Error detected in file c:\dlib\dlib\image_transforms/interpolation.h.
Error detected in function struct dlib::chip_details __cdecl dlib::get_face_chip_details(const class dlib::full_object_detection &,const unsigned long,const double).

Failing expression was det.num_parts() == 68.
chip_details get_face_chip_details()
You must give a detection with exactly 68 parts in it.
det.num_parts(): 5

The code is here:
auto shape = sp(img, det);
matrix<rgb_pixel> face_chip;
extract_image_chip(img, get_face_chip_details(shape, 150, 0.25), face_chip);
face = move(face_chip);
matrix<float,0,1> face_descriptor = net(face);

Can you help me?

Davis King said...

Think about it. You are trying to use a model that wasn't created until dlib 19.7, but you are using an older version of dlib. How can that work?

Mayur Patel said...

Hello Davis
I have installed CUDA and then compiled dlib and OpenCV, and everything is working. I want to know if the dlib.face_recognition_model_v1.compute_face_descriptor function utilizes CUDA. If not, do I have to write a Python wrapper or something like that? Because I see no performance difference before and after installing CUDA.
Thank you.

Davis King said...

dlib.face_recognition_model_v1.compute_face_descriptor uses CUDA.

Mayur Patel said...

But there is no speed difference before and after compiling with CUDA. Does that mean CUDA makes no difference in this function?

Davis King said...

Look at the CMake output when dlib is built. It will tell you if it's using CUDA or not.

Duc Vo said...

hi Davis,

I found that this line of code: std::vector<matrix<float,0,1>> face_descriptors = net(faces); takes the most time. Each face image takes around 300 ms to convert into a face descriptor. Any chance to reduce that?

Thanks.

Davis King said...

Be sure to link to the Intel MKL if running on the CPU, or even better, use a fast GPU.

Duc Vo said...

Yeh I compiled with CUDA and it runs faster now. Thanks :)

Mayur Patel said...

Hi Duc, how much time does it take now after compiling with CUDA? Mine is 170 ms both before and after CUDA, in both cases with Intel MKL. How fast is yours now?

Duc Vo said...

This line of code

std::vector<matrix<float,0,1>> face_descriptors = net(faces);

takes 0.44 seconds on 17 faces, so it is around 23 ms per face. My GPU is an Nvidia GTX 870M.

Cheers,

Mayur Patel said...

Which compiler did you use, and is your platform Windows? And can you tell me how you compiled with CUDA?

Duc Vo said...

I compile on Linux.

The standard way to compile is to use CMake, as recommended by Davis. I first run cmake-gui to enable DLIB_USE_CUDA and other options like DLIB_JPEG_SUPPORT, DLIB_PNG_SUPPORT, etc. After that, just run the following commands (refer here: http://dlib.net/compile.html):

cd examples
mkdir build
cmake-gui ( this is when you enable CUDA as mentioned above )
cd build
cmake ..
cmake --build . --config Release

Good luck.

PS: you said it takes 170 ms; is that per face or for the whole vector of faces?

Ashok Bugude said...

Hi Thanks for the great work

Can I please know if there is any way to get the names of the people for each clustered group?

Basically, I wanted to train, say, 5 sets of people and recognize them in an image.

Tapas said...

Hello Davis,
Thanks for such a nice library.
I am facing a problem, continuing from my comment posted on August 24, 2017 at 3:21 AM.

Once I get the face bounding box from a video frame, I extract 128D features from the bounding box. Having the bounding box of the detected face, I saved the face to disk. Reloading the saved face and again extracting 128D features (using a dlib.rectangle over the whole face image), the features do not match the previous bounding box features. Why are the features not matching?

Thanks
Tapas

Bill Klein said...

I'm wondering if anyone has experimented to determine the minimum face resolution that will result in a reliable computation of the descriptor. For example, will a 40x40-pixel face be comparable to faces of higher resolution?

mehmet ali atici said...

Hi Davis;
in the python example, what is the role of shape in the line ... compute_face_descriptor(img, shape)? Does the recognition model calculate the descriptor according to the five landmarks? If so, is it possible to use other shape models, for example one that finds 10 landmarks? In that case, does the distance threshold (0.6) change?

thanks in advance.

Davis King said...

The landmarks are only used to align the face before the DNN extracts the face descriptor. How many landmarks you use doesn't really matter.

mehmet ali atici said...

Hi Davis,

It seems that the size of the output layer of the network is 128, which corresponds to the 128D vector. How is the 128D vector extracted from an image for training?

Davis King said...

This example program shows how to train the model from images: http://dlib.net/dnn_metric_learning_on_images_ex.cpp.html

Jon Hauris said...

Hello Davis, how do I determine which image the "image index" refers to? Specifically:
I am training my own detector and received the following:
"RuntimeError: An impossible set of object labels was detected ..."
1. It said that the problem was "image index 1017". How do I find which image this refers to in the XML file?
2. It also gives the "truth rectangle" and "nearest detection template rect:" with their bounding box params, none of which match any of my bounding boxes. What are these rectangles referring to?
3. Where do I adjust the "match_eps"?
Thank you, Jon

mehmet ali atici said...

Hello Davis;
Do you plan to provide a Python API for DNN metric learning?

Thanks.

Davis King said...

Like this? https://github.com/davisking/dlib/blob/master/python_examples/face_recognition.py

mehmet ali atici said...

No, I mean python equivalent for http://dlib.net/dnn_metric_learning_ex.cpp.html

Davis King said...

No, I'm not going to add that since it's impossible to define the network architecture from python.

Mike said...

Hi Davis,
have you ever tried it on a Jetson TX2? I wonder how fast it would be.
Is there any chance to optimize it on an ARM Cortex-A7 (dual core), reaching approx. 3-5 fps?
Or would you rather say forget it?
Thanks!

Davis King said...

I haven't used a Jetson, but that's a very popular way to run these DNNs. Most people find the performance to be quite reasonable.

Bill Klein said...

Mike, I had posted a question about this in the TX2 forum after I did a bit of testing:

https://devtalk.nvidia.com/default/topic/1025670/jetson-tx2/dnn-face-detection-performance-on-tx2/post/5217046/

At least for my particular test (dlib DNN-based face detection on high-res images), it appears that the Jetson TX2 is ~10x slower than a GTX 1070. Please do let us know what you find. :)

Mike said...

Bill, thanks for your feedback. I have measured the execution time for the dnn_face_recognition_ex on a Jetson TX2.
It is a Release compile (though with debug info) using CUDA.
The time is exclusive of loading the data and displaying the images, just the inner execution: 4204 ms.
Is that in line with your findings?
I will get rid of the debug info and play with compiler settings.

Bill Klein said...

Hey Mike,

Firstly, I haven't executed the dnn_face_recognition_ex example specifically. Sorry. Secondly, be sure to do a few iterations of whatever you are trying since the first few (?) may be much slower than the others, due to things being initialized the first time around...

Mike said...

Hi Bill,
I do run the example several times, but it does not get any better than 4.2 seconds. It detects 24 faces in total of 4 different guys, so my take is 5-6 fps. Any hints on compiler options to check?
Next, I will compare this result with a standard dual-core ARMv7 @ 1 GHz, eventually using NEON and VFP support...

Mike said...

Hi Bill,
the same code on an ARM Cortex-A7 @1GHz takes 130 seconds. It is a release compile with -march=armv7-a -mtune=cortex-a7 -mfpu=neon-vfpv4 -ffast-math -Ofast ...

Bill Klein said...

Sounds like a decent speed-up. :)

Mike said...

Hello Davis,
Am I right in assuming that dnn_face_recognition_ex as of v19.7 uses the HOG-based frontal face detector and NOT the CNN-based one? So I should be able to expect performance improvements for the landmark extraction by employing NEON?

Davis King said...

Yes, that example uses HOG. You could just as easily use the CNN face detector with it though.

Tsai Joy said...

Dear Davis:
Recently I've been wanting to insert the OpenMP "#pragma omp parallel for"
into dlib\matrix\matrix_default_mul.h,
specifically like this:

for (long r = lhs_block.top(); r <= lhs_block.bottom(); ++r)
{
for (long c = lhs_block.left(); c<= lhs_block.right(); ++c)
{
const typename EXP2::type temp = lhs(r,c);
#pragma omp parallel for //<----------------inserted OpenMP line here
for (long i = rhs_block.left(); i <= rhs_block.right(); ++i)
{
dest(r,i) += rhs(c,i)*temp;
}
}
}
However, when I run examples, e.g. dnn_face_recognition_ex.cpp,
I don't see multi-core processing (via Intel's VTune tool) when the line that computes the 128D descriptors runs:
std::vector<matrix<float,0,1>> face_descriptors = net(faces);

Where should i put the "#pragma omp parallel for" to enable OpenMP multi-core processing in dlib?

Davis King said...

Linking dlib with the Intel MKL is a much better approach to get the kind of speed boost you are looking for. The CMake scripts are already setup to do it if you install the MKL.

Mike said...

Hi Davis,
are the dlib-for-ARM improvements by fastfastball now part of dlib-19.7, e.g. SIMD for NEON and threading?
Or would I have to redo the changes for 19.7?

Davis King said...

All that stuff is now part of the main dlib codebase, so yes, it's there. You don't need to do anything to get it.

Kevin Tian said...

Hi Davis,

Thank you for your great work! It really help me a lot in my project.

Regarding the accuracy rate of 99.38% on LFW, do you only test on the 1680 people pictured with more than one photo? The LFW says there are some incorrectly labeled photos. How do you process these photos, manually correct them or ignore them in your test?

How do you do the recognition testing? Do you calculate all photos' 128D features and then compare them with each other and see whether the distance between the same person's photos is less than 0.6?

Thank you in advance!

Kevin

Davis King said...

I follow the exact evaluation protocol laid out by the LFW challenge. This file contains the entire test script for the dlib model: http://dlib.net/files/dlib_face_recognition_resnet_model_v1_lfw_test_scripts.tar.bz2. You can run it and see the LFW evaluation outputs.

Kevin Tian said...

Hi Davis,

Thank you for your reply!

I have a question about training face model. Could you give me some comments?

If I increase the number of face images from 3 million to 6 million, will the trained model verify faces better? For example, will the accuracy rate increase and the false positive rate decrease?

In your experience, is there an upper bound on the recognition capability, i.e. a point where it stops improving even as the number of training images increases?

Best regards,
Kevin

Davis King said...

Yes, more data is better. For instance, Google trained a face recognizer on 200 million faces and got great results.

Yury Savitskiy said...

Hello Davis,

Could you please explain how you form a mini-batch?

Do you take some number of different persons and some number of their unique images? For example, you choose 64 persons and for each take 8 images, so the size of the mini-batch will be 512. Or do you just take random images, so that for one person you have 10 images, for a second 3, and so on?

Yury Savitskiy

Davis King said...

You can make the mini-batches any way you want. To see what I did, refer to the metric learning example program.
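For readers who want a starting point, the sampling scheme the question describes (a fixed number of identities, a fixed number of images each) can be sketched in plain Python. The dataset layout and the 64x8 numbers below are illustrative assumptions, not necessarily what dlib's example program does.

```python
import random

def make_mini_batch(dataset, num_ids=64, imgs_per_id=8, rng=random):
    # dataset: dict mapping identity -> list of images (paths here).
    # Pick num_ids distinct identities, then imgs_per_id images from each,
    # so every batch contains plenty of within-class pairs for the loss.
    identities = rng.sample(list(dataset), num_ids)
    batch, labels = [], []
    for label, ident in enumerate(identities):
        imgs = dataset[ident]
        # Sample with replacement if an identity has too few images.
        picks = (rng.sample(imgs, imgs_per_id) if len(imgs) >= imgs_per_id
                 else [rng.choice(imgs) for _ in range(imgs_per_id)])
        batch.extend(picks)
        labels.extend([label] * imgs_per_id)
    return batch, labels

# Hypothetical toy dataset: 100 identities with 3-12 images each.
rng = random.Random(0)
data = {f"person_{i}": [f"img_{i}_{j}" for j in range(rng.randint(3, 12))]
        for i in range(100)}
batch, labels = make_mini_batch(data, num_ids=64, imgs_per_id=8, rng=rng)
print(len(batch))  # 512 images per mini-batch
```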

arash samurayi said...

Hi, thank you Davis for your well-documented, great work.

I have encountered a problem. First, I built dlib with MinGW32 and I am using it in Qt. Everything is OK; when I use a dlib function it does its job, no problem. But after I close the application, its process has not exited: there is still a process named after my application.

Well, I couldn't find anything relevant to the problem.

I might add I tested it on Windows 7 and Windows 10, the same on both.

Do you have any idea what is going on?

Dario Ravarro said...

How can I pre-filter images during the recognition phase for liveness detection?
This is to avoid getting a positive recognition from photos shown to the webcam. We only want live people in front of the camera.
Dario

Davis King said...

There aren't any functions in dlib that do this, so you will have to roll your own.

Lucas Partisse said...

Hi, congrats for dlib, it rocks!
Can you be more precise about the DNN used for face comparison and what it does exactly? Thanks

Davis King said...

The entire network is defined in the example program linked to from this blog post. If you read the example program you will find all the details.

arash allahyari said...

Hi again

I wanted to run the bald guys face recognition example, but on this line:
faces.push_back(dlib::move(face_chip));
I get an error that says 'move is not a member of dlib'.

I should add the example works without the 'move' function. I was just wondering what the 'move' function does; will not using it reduce the accuracy of face recognition?

thank u for your great work

Davis King said...

Right, there is no dlib::move(). Happily, that's not what is in the example code. You must have put that there yourself. Get the unmodified example and it will work.

arash allahyari said...

Hi
regarding the 'move' function, you mentioned maybe I added that myself.
No, I didn't add anything to the example; it was like this:

http://dlib.net/dnn_face_recognition_ex.cpp.html

Davis King said...

This is what is in the example:

faces.push_back(move(face_chip));

There is no dlib::move in there.

arash allahyari said...

Oooh sorry,
I mixed up the namespaces...

Chris Nase said...

Hi Davis,

Very new to deep learning, but I'm used to seeing trained models with a .params and .json file. I see yours is a .dat. I'm trying to get this to work on the AWS deeplens and having some trouble. Is there a way to turn the model into a .params and .json format?

Davis King said...

That's not how this works. If you want to use dlib's models use dlib.

miguel said...

Hi Davis,

Just an open question: I saw that dlib has a repeat functionality that allows much less memory to be used during compilation (not sure about during execution). Is it possible to convert a model without this replication layer into one that uses it? Specifically, is it possible to convert this face model to use dlib::repeat?
Thanks,
Miguel

Davis King said...

repeat is just a convenience. It doesn't make things faster or slower. And it only makes a substantive impact on compile times when using visual studio since visual studio has not so great template compilation in general. For gcc or clang it doesn't matter.

No, there is no conversion.

Mike said...

Hello Davis,
you stated that the trained model has 99.38% accuracy on the standard LFW face recognition benchmark. Is there a metric for how that would translate into FAR/FRR values?
Thanks!
Michael

Davis King said...

The FAR and FRR rates you get are going to be heavily dependent on your application and how you use it. So no, there is no general FAR or FRR value. For example, the larger the database of faces you are searching the more likely you are to get a false positive.

k_man said...

Hello Davis!

I'm trying to reproduce your results on the LFW data set. I saw the code you provided, ran it, and got the same result. But when I look into the code, I see that your function runs on get_lfw_pairs(), which returns the pairs of images with rectangles for indexing; it then chooses the best_det according to overlap.
What are those rectangles that come from get_lfw_pairs()? (The other ones, I get, are from the detector...)

Thanks!

Davis King said...

If the detector didn't find a face then the box is just the box in the center of the image, which is where the face is nominally supposed to be.

Bill Klein said...

I've been playing with dnn_metric_learning_on_images_ex (having read all the docs / comments) but there are still a few things that I'm not sure about:

- load_mini_batch ensures that the batch doesn't re-include the same person twice. However, when choosing the samples for a given person, it doesn't try to avoid including the same sample twice. Is this by design / ok? Will we run into problems if some of the persons in the training set have fewer than samples_per_id samples available?

- I read that for dlib_face_recognition_resnet_model_v1 you used a mini-batch size of 35x15 instead of 5x5. Is that just for performance reasons or would the results have been significantly different?

Thanks!

Davis King said...

The images are jittered, so even if the same image is included it's fine. You could experiment to see if it would be better to avoid duplicates, but I doubt it matters, at least for most datasets.

Yes, the batch size is very significant. Some sizes lead to much higher accuracy models.

arash allahyari said...

Hi

What if I want to classify cars with the metric loss, using the dnn_metric_learning_on_images example?

Do you think I will achieve acceptable accuracy?

Davis King said...

It might work great, the only way to know is to try and see what happens though.

karan purohit said...

Thanks for this great package!
I am using the face recognition API. I want to know how it can need only one image for recognition, or does it just measure the difference between two images?

arash allahyari said...

Hi

I wonder how the length function calculates the difference between two face vectors?

And can I get an accuracy percentage from that number, which is less than 1? For example, does 0.5 mean 90 percent?

Lucas Partisse said...

How can I compute a matching metric between 2 faces?

Davis King said...

Read the blog post and the linked example program, it's literally about answering that question.

Lucas Partisse said...
This comment has been removed by the author.
Lucas Partisse said...

Yes, I saw that. I would like to make a matching metric. For example, I have a reference image named img0 and several others: img1, img2, img3. I would like to have a percentage of match between img0 and {img1, img2, img3}. I can't do this with graph clustering.

I made this, but I don't know if it's right:
perc_matching1 = 1 - [length(img0-img1)/length(img0)]
perc_matching2 = 1 - [length(img0-img2)/length(img0)]
perc_matching3 = 1 - [length(img0-img3)/length(img0)]

lucas martinez said...

The neural network was trained with 7485 persons, but it can recognize a person who is not in the database.
Will the tested person be close to the trained person with whom they share the most physical similarities?

KBN said...

Hi Davis,

I have a question about the 3 million image data set. When you say the network was trained on 3 million faces, does it mean there are 3 million distinct faces, or is the number of distinct identities smaller? If so, how many distinct identities are there in the 3 million images?

Thank you.
Marc

Davis King said...

3 million images. The number of identities is in this blog post.

Srinivasan Rajaraman said...

Hi Davis,

It is needless to say that your face recognition network works great! However, I am curious how you selected the distance threshold (0.6). Did that parameter affect the training rate, time, or accuracy? Also, what is the bound on the norm of the 128D embedding that you get from an image? The FaceNet paper (Google team) restricts the norm of the 128D embedding to be unity, but your network does not do that. I am wondering if you implicitly bound the norm of the 128D embedding. I look forward to your reply.

Davis King said...

I picked 0.6 empirically. It doesn't have any grand significance. 0.6 gave the best results.

I didn't place any bound on the norm. The loss function is constructed in a way that doesn't depend on there being any particular bound. Although, the relative setting of the threshold (the 0.6) and the amount of weight decay implicitly determine the scale of the norm. Look at the documentation for the loss layer for the details.
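As a rough illustration of the kind of loss being described (not dlib's exact implementation, which also mines the hardest non-match pairs in each mini-batch), a pair-wise hinge loss over precomputed descriptor distances can be sketched in plain Python. The 0.6 threshold is from the post; the 0.04 margin is an assumed value for illustration.

```python
def pairwise_hinge_loss(dists_same, dists_diff, threshold=0.6, margin=0.04):
    # Simplified sketch of a pair-wise hinge loss of this style:
    # same-identity pairs are pushed inside the threshold, and
    # different-identity pairs are pushed outside it, each by a margin.
    loss = 0.0
    for d in dists_same:   # matching pairs should satisfy d < threshold - margin
        loss += max(0.0, d - (threshold - margin))
    for d in dists_diff:   # non-matching pairs should satisfy d > threshold + margin
        loss += max(0.0, (threshold + margin) - d)
    return loss

# A well-separated configuration incurs no loss:
print(pairwise_hinge_loss([0.3, 0.5], [0.9, 1.2]))  # 0.0
# A non-match pair at distance 0.5 violates the threshold:
print(pairwise_hinge_loss([0.3], [0.5]))  # approximately 0.14
```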

Srinivasan Rajaraman said...

I couldn't find the documentation for the loss layer on your website. I typed "loss layer" / "metric" into the quick search bar of the documentation page but didn't find anything related. The only thing I can see is C++ source code, which is hard to parse through to understand the math behind your loss metric. Please correct me if I am wrong; I think you are using a hinge loss with a threshold of 0.6, as described in this paper (http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf).

If that is how your loss function works, did you consider all pairwise image combinations in your dataset, unlike the triplet loss chosen by the FaceNet paper from the Google team? The data size becomes huge when you have to consider all pairwise combinations.

Davis King said...

http://dlib.net/ml.html#loss_metric_

Click on more details and read the big comment about the loss layer.

Srinivasan Rajaraman said...

Hi Davis,

Thanks a lot! I went over the details, and it looks like your loss metric is very similar to the Hadsell 2006 paper in that you also force the embeddings of similar images to lie within the distance threshold, in addition to forcing the embeddings of dissimilar images to lie outside it. One last thing: the hard-negative mining of the non-match pairs. How do you decide the N worst non-match pairs? What is the metric here? Do you consider those N pairs among all non-match pairs whose mutual Euclidean distance is farthest from 0.6?

Davis King said...

You sort them by distance and take the pairs that are the most wrong.

KBN said...

I have a simple question, pardon my ignorance.

The face recognition example uses two .dat files: one for the face landmarks and one for the DNN.
If I were to replace the DNN model with one trained on 10 million faces, would I also need to rebuild the face landmark file? If so, which example file should I refer to?

Thank you.

arash allahyari said...
This comment has been removed by the author.
arash allahyari said...

Hi,

I was wondering which of the 5-point or 68-point face landmark algorithms gives better accuracy in face recognition.

I saw you used the 5-point model in the face classification example and the 68-point model in the code for testing overall accuracy.

Davis King said...

The documentation has links to all relevant examples for each object. E.g. http://dlib.net/ml.html#shape_predictor_trainer

Also, there is no reason why retraining a face recognition model would invalidate a shape predictor model.

The 5 and 68 point models should give the same face recognition accuracy. I recommend using the 5 point model because it's smaller and faster.

KBN said...

A question about the training database.
What is your opinion on noise in the training database? I have noticed that the large publicly available databases often contain noise (e.g., out of 100 images of person 1, some of them are not person 1).
Should I spend time manually removing that noise? If not, what percentage of noise can be considered within tolerance?
Does your 3 million face dataset also contain some noise? Do we know the percentage?

Thank you.

ismail josh said...

Hi Davis.
According to their paper, FaceNet training takes 1000-2000 hours. What about yours?
Can you compare your model with the FaceNet?

Davis King said...

The quality of the training data is very important. I spent a lot of time fixing errors in my dataset and each round of fixing errors notably improved the resulting model. You should spend time doing this, retraining to see how much the improvement is, and repeating that until you get tired of doing it or until the results stop improving. My resulting dataset is quite accurate.
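[The cleanup loop described above can be illustrated with a toy clustering pass. This is only a sketch: it uses connected components over a distance threshold, whereas dlib's face clustering example uses the chinese_whispers function, and real face descriptors are 128D rather than the 2D points used here.]

```python
def cluster_by_threshold(descriptors, thresh=0.6):
    """Group descriptors into connected components: two faces are linked
    when their Euclidean distance is below thresh. Images of one person
    split across several clusters, or different people merged into one
    cluster, are the candidates for manual review."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n = len(descriptors)
    parent = list(range(n))

    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist(descriptors[i], descriptors[j]) < thresh:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Two tight groups far apart produce two clusters.
descs = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
assert sorted(sorted(g) for g in cluster_by_threshold(descs)) == [[0, 1], [2, 3]]
```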


Training the model in dlib takes about a day on a 1080ti.

ismail josh said...

How and when do you decide that retraining is required? Any hints? Also, if we use a classifier (such as a one-class SVM or multiclass SVM) on top of your system (in the 128D embedding space), does it increase the system's performance?

Davis King said...

Do experiments to find out. If all else fails then you need to retrain.

Mike said...

Hello Davis,
on an ARM based embedded system we have neither CUDA nor AVX nor OpenCL. Would you encourage looking into employing the NEON and VFP units to achieve a worthwhile speed-up? Or is the structure of the computations unsuitable for either one?
Thanks!

Davis King said...

NEON is very good; I would use it, although the DNN code in dlib doesn't do anything with it currently. You can turn on gcc options that automatically use it, and use profile-driven optimization; maybe that will get you part of the way there.
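[For example, the relevant gcc options might look like the following. This is a sketch only: the exact flags depend on your toolchain and ARM core, -mfpu/-mfloat-abi apply to 32-bit ARM (AArch64 enables NEON by default), and myprog.cpp / typical_input are placeholders.]

```shell
# 32-bit ARM: enable NEON and let gcc's auto-vectorizer use it
g++ -O3 -mfpu=neon -mfloat-abi=hard -ffast-math myprog.cpp -o myprog

# Profile-driven optimization: build instrumented, run a representative
# workload, then rebuild using the collected profile
g++ -O3 -fprofile-generate myprog.cpp -o myprog
./myprog typical_input
g++ -O3 -fprofile-use myprog.cpp -o myprog
```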

Tapas said...

Hello Davis,
Thanks for the wonderful model and library.
In the recent version of dlib (19.9), I used face_clustering.py to clean up a big dataset; that is, I kept only the faces of the biggest cluster. Now when I try to run dnn_metric_learning_on_images_ex, I am getting this error:
EXCEPTION IN LOADING DATA
jpeg_loader: error while reading /tmp/cluster_sample_5/n005387/face_162.jpg

Please help.

Thanks
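[That jpeg_loader error usually means a truncated or otherwise corrupt image file. One way to locate bad files before training is a cheap marker check. This is only a rough sketch: it verifies the JPEG start and end markers, not that the file fully decodes.]

```python
import os

def looks_like_valid_jpeg(path):
    """Cheap sanity check: a JPEG should start with the SOI marker
    (FF D8) and end with the EOI marker (FF D9). Truncated downloads
    usually fail the EOI check. Passing this check does not guarantee
    the file decodes, but it catches the common corruption."""
    with open(path, "rb") as f:
        data = f.read()
    return data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"

def find_bad_jpegs(root):
    """Walk a dataset directory and list files that fail the check."""
    bad = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            if name.lower().endswith((".jpg", ".jpeg")):
                path = os.path.join(dirpath, name)
                if not looks_like_valid_jpeg(path):
                    bad.append(path)
    return bad
```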

arash allahyari said...
This comment has been removed by the author.
arash allahyari said...

Hi,
I wanted to convert a matrix<rgb_pixel> to an OpenCV Mat with the toMat function, but I get this error:

type is not a member of cv::DataType<dlib::rgb_pixel>

The error comes from to_open_cv.h.

What am I doing wrong?

Tapas said...
This comment has been removed by the author.
Davis King said...

The distance can be any number >= 0.

johnpuskin99 said...
This comment has been removed by the author.
johnpuskin99 said...
This comment has been removed by the author.
johnpuskin99 said...
This comment has been removed by the author.
Davis King said...

Copy the mmod_rects into rectangles.

Bill Klein said...

I have noticed that descriptors of b&w photos of people seem to skew towards being close to descriptors of other b&w photos. I assume that this is due to (at least partially) representation bias in the training set. This makes me wonder what would happen if everything were retrained on the same training set, but with all training data converted to grayscale beforehand. Would we lose the bias without losing any recognition accuracy? Has anyone tried it?

Davis King said...

Yes, it's not going to work as well with black and white pictures since it's trained on color pictures. It's likely that it would be somewhat better for black and white images if trained specifically on black and white images. I haven't done this though.

Mike said...

Measuring performance on the Jetson TX2, I found most of the processing time is still spent on face detection. I have already taken some shortcuts like limiting the pyramid levels to 2 (object_detector) and spatially subsampling the input image by 2 in each direction.

Here are the results (in ms):
face detection = 66
5 landmarks = 6
face chip extraction = 2
dnn vector calc = 10

Are you aware of the paper: Compact Convolutional Neural Network Cascade for Face Detection
(https://arxiv.org/ftp/arxiv/papers/1508/1508.01292.pdf)
The authors claim to have found a "new level of performance/speed ratio for the frontal face detection problem".
This would be an excellent candidate for dlib...

Davis King said...

I haven't seen that paper, but that kind of cascade is very common and certainly generally improves detector speeds. At some point I'll add a faster version of the CNN face detector to dlib that uses a cascade, but it wouldn't be until the end of the year at the earliest.

johnpuskin99 said...

Hi Davis,

Your model was trained on color images. When testing/comparing faces, I plan to use a color image and the corresponding grayscale one. In other words, for a given face, two (128x1) feature vectors will be constructed, and the smaller of the two distances between faces will be accepted as the real distance. What do you think about this scheme?

Le Xuan Tuan Anh said...

Hi Davis King,
I'm trying to find papers on what the target output for training is (i.e., how the 128D target values are set during training).
Can you point me to some papers that explain this?

arash allahyari said...

Hi,

I encountered an error in the face recognition process. It happens when two faces partially cover each other: I push both faces into the vector of matrices and then run them through the net.

But when I process the two faces separately there is no problem.

I wonder what the problem could be?

Tapas said...

Hi Davis King,

How can I change the distance threshold and margin for loss_metric_? The easiest way is to change them in loss.h. Is there a better way to do the same?

Thanks

Davis King said...

You can set them by calling the loss_metric_'s constructor.
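[For example, something along these lines. This is a sketch, not a complete program; I believe the constructor takes the margin first and the distance threshold second, but check the loss_metric_ declaration in dlib/dnn/loss.h to confirm the signature.]

```cpp
#include <dlib/dnn.h>
using namespace dlib;

// ... define net_type exactly as in dnn_metric_learning_on_images_ex.cpp ...

int main()
{
    // The defaults are margin = 0.04 and dist_thresh = 0.6. Passing a
    // loss_metric_ object to the network's constructor sets the loss
    // parameters without editing loss.h.
    net_type net(loss_metric_(0.1f, 0.8f));
    // ... set up the trainer and train as in the example program ...
}
```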

Tapas said...

Thanks a lot for the prompt reply. I am calling loss_metric_'s constructor in compute_loss_value_and_gradient, but both the margin and the threshold are not changing. Assigning new values directly to margin and dist_thresh in the same function (compute_loss_value_and_gradient) gives a read-only error for those two variables. Apologies if I am doing something wrong, but any hint would be highly appreciated.

Thanks
