Saturday, September 23, 2017

Fast Multiclass Object Detection in Dlib 19.7

The new version of dlib is out and the biggest new feature is the ability to train multiclass object detectors with dlib's convolutional neural network tooling.  The previous version only allowed you to train single-class detectors, but this release adds the option to create single CNN models that output multiple labels.  As an example, I created a small 894-image dataset where I annotated the fronts and rears of cars and used it to train a 2-class detector.  You can see the resulting detector running in this video:

If you want to run the car detector from this video on your own images you can check out this example program.

I've also improved the detector speed in dlib 19.7 by pushing more of the processing to the GPU. This makes the detector 2.5x faster.  For example, on the 928x478 image used in this example program, the detector ran at 39fps in the previous version of dlib, but now runs at 98fps (when run on an NVIDIA 1080ti).
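
As a quick check of the figures above, the following sketch converts the two frame rates into per-frame latency and recomputes the speedup. Only the 39fps and 98fps numbers come from this post; the helper names are made up for illustration.

```python
def ms_per_frame(fps):
    """Convert frames per second into milliseconds spent on each frame."""
    return 1000.0 / fps


def speedup(old_fps, new_fps):
    """Ratio of the new throughput to the old throughput."""
    return new_fps / old_fps


# Figures quoted in the post (measured on an NVIDIA 1080ti).
old_fps, new_fps = 39.0, 98.0

print(f"previous dlib: {ms_per_frame(old_fps):.1f} ms per frame")
print(f"dlib 19.7:     {ms_per_frame(new_fps):.1f} ms per frame")
print(f"speedup:       {speedup(old_fps, new_fps):.2f}x")
```

In other words, the time spent on each frame drops from roughly 25.6 ms to roughly 10.2 ms, which is consistent with the 2.5x figure.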

This release also includes a new 5-point face landmarking model that finds the corners of the eyes and the bottom of the nose:

Compared to the 68-point landmarking model included with dlib, this model is over 10x smaller: 8.8MB versus the 68-point model's 96MB.  It also runs faster and, even more importantly, works with the state-of-the-art CNN face detector in dlib as well as the older HOG face detector in dlib.  The central use-case of the 5-point model is to perform 2D face alignment for applications like face recognition.  In any of the dlib code that does face alignment, the new 5-point model is a drop-in replacement for the 68-point model and in fact is the new recommended model to use with dlib's face recognition tooling.
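
To make "2D face alignment" concrete, here is a minimal sketch of the general idea: find the similarity transform (rotation, uniform scale, translation) that moves two landmark points onto fixed target positions. This is not dlib's implementation. The function names, the example coordinates, and the choice to use just two eye points are all assumptions made for illustration.

```python
def fit_similarity(src_a, src_b, dst_a, dst_b):
    """Fit z -> a*z + b so that src_a maps to dst_a and src_b maps to dst_b.

    Points are (x, y) tuples treated as complex numbers: multiplying by `a`
    applies a rotation plus uniform scale, and adding `b` translates.
    """
    s0, s1 = complex(*src_a), complex(*src_b)
    d0, d1 = complex(*dst_a), complex(*dst_b)
    if s0 == s1:
        raise ValueError("source points must be distinct")
    a = (d1 - d0) / (s1 - s0)  # rotation and scale
    b = d0 - a * s0            # translation
    return a, b


def transform(a, b, point):
    """Apply the fitted transform to a single (x, y) point."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# Hypothetical eye positions found in a tilted face (pixel coordinates).
left_eye, right_eye = (100.0, 120.0), (160.0, 100.0)
# Hypothetical canonical eye positions inside an aligned face crop.
target_left, target_right = (45.0, 60.0), (105.0, 60.0)

a, b = fit_similarity(left_eye, right_eye, target_left, target_right)
aligned_left = transform(a, b, left_eye)
aligned_right = transform(a, b, right_eye)
print(aligned_left, aligned_right)
```

After the transform both eyes sit on the same horizontal line at the chosen positions, so every face reaches the recognition model in a consistent pose. With real landmarks one would typically fit the transform to all available points rather than just two.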


Anh Tuấn Hoàng said...

Thanks to Davis King for the library. It helps me in my work.

Bill Klein said...

Great new stuff. You say that the "new 5-point model is a drop-in replacement for the 68-point model and in fact is the new recommended model to use with dlib's face recognition tooling." However, two questions:

- Is it recommended because the results are better or just because it's faster/lightweight?

- I know you say that it is a drop-in replacement, but does that mean that a face aligned with the 68-point model can be compared directly (distance between descriptors) to a face aligned with the 5-point model without fear of any issues?


Davis King said...

The results should in general be the same, but it's faster and smaller. The alignment should actually be slightly more accurate in general, but not by a lot. The real benefit is speed, size, and ability to use it with the CNN face detector in addition to the HOG detector.

Yes, you can just replace the old shape model with the new model in any face recognition code that used the old one and it will work. I specifically made this new model to be a replacement for the old one. It will create the same kind of alignment as the old model and work with the previously trained face recognition model.

erm said...

Hello Davis King,

I was trying to compile the new release of dlib and ran into some problems that I want to share with you.

Compiling on Windows
I used "dnn_face_recognition_ex.cpp" as test code. I had no problem compiling it using dlib-19.3 and dlib-19.4 in Visual Studio 2015 with cuda 8, but with dlib-19.7 I had the following errors:

1) dlib.lib(gpu_data.obj) : error LNK2005: already defined "void __cdecl dlib::memcpy(class dlib::gpu_data &,class dlib::gpu_data const &)" (?memcpy@dlib@@YAXAEAVgpu_data@1@AEBV21@@Z) in dnn_face_recognition_ex.obj

2) dlib.lib(gpu_data.obj) : error LNK2005: already defined "public: void __cdecl dlib::gpu_data::set_size(unsigned __int64)" (?set_size@gpu_data@dlib@@QEAAX_K@Z) in dnn_face_recognition_ex.obj

I tried using cudnn5 and 7 (no difference) and using the CMakeLists.txt in the dlib folder from an older version that worked correctly for me (other errors appeared).

I was wondering if maybe we have to follow different steps in order to compile this new version, or maybe the minimum requirements of the required software have changed or maybe something happens with Policy CMP0007, because I had a warning that said it was not set.

Compiling on Linux
On Linux I had no problem compiling and running dlib-19.3 and 19.4 in the past. Now with dlib-19.7 the old problem with #define DLIB_JPEG_SUPPORT has reappeared. cmake runs successfully: I checked that DLIB_JPEG_SUPPORT was ON, that the JPEG FOUND branch in CMakeLists was entered, and that the libjpeg library was found, and all was right. The build in Release mode also completes correctly. But when I ran the code, it was unable to load jpeg images because of DLIB_JPEG_SUPPORT :( This can only be solved by putting a #define DLIB_JPEG_SUPPORT at the top of the cpp code.
I was wondering if something changed compared to previous releases. This is a bit strange to me because I had no problem with them.

Sorry for this long and boring text and thank you very much for your time and effort :)

Davis King said...

Nothing has changed in how dlib is built. You must just be making some kind of mistake. Follow the instructions at the top of this page to compile the example programs: Read the example cmakelists.txt file.

erm said...

Thank you for your fast answer and for your time. At least now I know that nothing has changed. I'll keep trying. Regards!

Anh Tuấn Hoàng said...

Hi Davis King, Can you give me some advice about system specification?

Davis King said...

Get an NVIDIA 1080ti.

Phil said...

Long time user - first time writer. Thanks very much for your code.
We have built and used dlib in many situations (CPU and GPU) on many systems.
We are running your classifier as serialized in the code,
but on one particular Windows box, when we run face_detection (close enough to dnn_mmod_face_detection_ex), we get the following error:
Error detected at line 682.
Error detected in file e:\src\9.0-2017\_extrnheaders\dlib\dnn/loss.h.
Error detected in function void __cdecl dlib::loss_mmod_::to_label,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::input_rgb_image_pyramid >,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,1,void>,classstd::vector >*>(const class dlib::tensor &,const class dlib::dimpl::subnet_wrapper,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::input_rgb_image_pyramid >,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,1,void> &,class std::vector > *,double) const.

Failing expression was output_tensor.k() == (long)options.detector_windows.size().

Any hints as to what could be a machine dependency here? This seems to me to be entirely software defined.
BTW, we are definitely seeing the 25x speedup with the GPU - great job!

Davis King said...

Thanks, glad you like dlib :)

This should definitely not happen and there shouldn't be anything machine specific in the code. If I had to guess I would check if there is something wrong with the GPU that is causing it to output empty tensors, which itself shouldn't happen, but maybe something is horribly wrong with CUDA on that machine.

Phil said...

Thanks very much for the quick response. To help others: I got this message when somebody moved the training file away from the filename we were expecting, so we were trying to classify with an unloaded classifier. dlib was not at fault in any way.

Davis King said...

Yeah that's a problem :) You should have gotten an exception though when you tried to read the file.

Stefanelus said...

hey Davis,

When running cmake on dlib, is there any way to force it to look for cuda? In most cases dlib is not built against cuda.

many thanks

Davis King said...

CMake looks for cuda by default. There is nothing you need to do for it to look for it.

Sobhan Mahdavi said...

Oh, you are right. Thank you for your great work.

Chris Underwood said...

Mr. King,

In your blog post you mentioned that you created a small 894 image dataset and annotated the fronts and rears of cars and used it to train a 2-class detector. Is that dataset available for download?

I'm interested in taking advantage of the multiclass training and detection that you have implemented, in this iteration of dlib, in my own project.