Saturday, September 23, 2017

Fast Multiclass Object Detection in Dlib 19.7

The new version of dlib is out and the biggest new feature is the ability to train multiclass object detectors with dlib's convolutional neural network tooling.  The previous version only allowed you to train single class detectors, but this release adds the option to create single CNN models that output multiple labels.  As an example, I created a small 894 image dataset where I annotated the fronts and rears of cars and used it to train a 2-class detector.  You can see the resulting detector running in this video:

If you want to run the car detector from this video on your own images you can check out this example program.

I've also improved the detector speed in dlib 19.7 by pushing more of the processing to the GPU. This makes the detector 2.5x faster.  For example, the detector processed the 928x478 image used in this example program at 39fps in the previous version of dlib, but now runs at 98fps (on an NVIDIA 1080ti).
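As a quick sanity check on those numbers, here is a minimal sketch that turns the two frame rates quoted above into a speedup factor and a per-frame latency. The helper names are made up for illustration and are not part of dlib.

```python
# Frame rates quoted in the post for the 928x478 test image on a 1080ti.
fps_before = 39.0   # previous dlib release
fps_after = 98.0    # dlib 19.7

def speedup(old_fps, new_fps):
    """Return how many times faster the new frame rate is (hypothetical helper)."""
    return new_fps / old_fps

def ms_per_frame(fps):
    """Convert frames per second to milliseconds per frame (hypothetical helper)."""
    return 1000.0 / fps

ratio = speedup(fps_before, fps_after)
print(f"speedup: {ratio:.2f}x")
print(f"latency: {ms_per_frame(fps_before):.1f} ms -> {ms_per_frame(fps_after):.1f} ms")
```

In other words, the per-frame cost drops from roughly 26 ms to roughly 10 ms, which is where the "2.5x" figure comes from.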

This release also includes a new 5-point face landmarking model that finds the corners of the eyes and the bottom of the nose:

Compared with the 68-point landmarking model included with dlib, this model is over 10x smaller: 8.8MB versus the 68-point model's 96MB.  It also runs faster, and even more importantly, works with the state-of-the-art CNN face detector in dlib as well as the older HOG face detector in dlib.  The central use-case of the 5-point model is to perform 2D face alignment for applications like face recognition.  In any of the dlib code that does face alignment, the new 5-point model is a drop-in replacement for the 68-point model and in fact is the new recommended model to use with dlib's face recognition tooling.


Anh Tuấn Hoàng said...

Thanks, Davis King, for the library. It helps me in my work.

Bill Klein said...

Great new stuff. You say that the "new 5-point model is a drop-in replacement for the 68-point model and in fact is the new recommended model to use with dlib's face recognition tooling." However, two questions:

- Is it recommended because the results are better or just because it's faster/lightweight?

- I know you say that it is a drop-in replacement, but does that mean that a face aligned with the 68-point model can be compared directly (distance between descriptors) to a face aligned with the 5-point model without fear of any issues?


Davis King said...

The results should in general be the same, but it's faster and smaller. The alignment should actually be slightly more accurate in general, but not by a lot. The real benefit is speed, size, and ability to use it with the CNN face detector in addition to the HOG detector.

Yes, you can just replace the old shape model with the new model in any face recognition code that used the old one and it will work. I specifically made this new model to be a replacement for the old one. It will create the same kind of alignment as the old model and work with the previously trained face recognition model.

erm said...

Hello Davis King,

I was trying to compile the new release of dlib and I ran into some issues that I want to share with you.

Compiling on Windows
I used "dnn_face_recognition_ex.cpp" as test code. I had no problem compiling it using dlib-19.3 and dlib-19.4 in Visual Studio 2015 with cuda 8, but with dlib-19.7 I had the following errors:

1) dlib.lib(gpu_data.obj) : error LNK2005: already defined "void __cdecl dlib::memcpy(class dlib::gpu_data &,class dlib::gpu_data const &)" (?memcpy@dlib@@YAXAEAVgpu_data@1@AEBV21@@Z) in dnn_face_recognition_ex.obj

2) dlib.lib(gpu_data.obj) : error LNK2005: already defined "public: void __cdecl dlib::gpu_data::set_size(unsigned __int64)" (?set_size@gpu_data@dlib@@QEAAX_K@Z) in dnn_face_recognition_ex.obj

I tried using cudnn5 and 7 (no difference). I also tried using the CMakeLists.txt in the dlib folder from an older version that worked correctly for me (other errors appeared).

I was wondering if maybe we have to follow different steps in order to compile this new version, or maybe the minimum requirements of the required software have changed or maybe something happens with Policy CMP0007, because I had a warning that said it was not set.

Compiling on Linux
On Linux I had no problems compiling and running dlib-19.3 and 19.4 in the past. Now with dlib-19.7 the old DLIB_JPEG_SUPPORT problem is back. cmake runs successfully; I checked that DLIB_JPEG_SUPPORT was ON, that the JPEG FOUND branch in CMakeLists was entered, and that the libjpeg library was found, and all of that was fine. The Release build also completes without errors. But when I run the code it is unable to load jpeg images because of DLIB_JPEG_SUPPORT :( The only fix I found is putting a #define DLIB_JPEG_SUPPORT at the top of the cpp code.
I was wondering if something changed compared to previous releases. This is a bit strange to me because I had no problems with them.

Sorry for this long and boring text and thank you very much for your time and effort :)

Davis King said...

Nothing has changed in how dlib is built. You must just be making some kind of mistake. Follow the instructions at the top of this page to compile the example programs: Read the example cmakelists.txt file.

erm said...

Thank you for your fast answer and for your time. At least now I know that nothing has changed. I'll keep trying. Regards!

Anh Tuấn Hoàng said...

Hi Davis King, Can you give me some advice about system specification?

Davis King said...

Get a NVIDIA 1080ti.

Phil said...

Long time user - first time writer. Thanks very much for your code.
We have built and used dlib in many situations (CPU and GPU) on many systems.
We are running your classifier as serialized in the code,
but on one particular Windows box, when we run face_detection (close enough to dnn_mmod_face_detection_ex), we get the following error:
Error detected at line 682.
Error detected in file e:\src\9.0-2017\_extrnheaders\dlib\dnn/loss.h.
Error detected in function void __cdecl dlib::loss_mmod_::to_label,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::input_rgb_image_pyramid >,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,1,void>,classstd::vector >*>(const class dlib::tensor &,const class dlib::dimpl::subnet_wrapper,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::add_layer,class dlib::input_rgb_image_pyramid >,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,void>,1,void> &,class std::vector > *,double) const.

Failing expression was output_tensor.k() == (long)options.detector_windows.size().

Any hints as to what could be a machine dependency here? This seems to me to be entirely software defined.
BTW, we are definitely seeing the 25x speedup with the GPU - great job!

Davis King said...

Thanks, glad you like dlib :)

This should definitely not happen and there shouldn't be anything machine specific in the code. If I had to guess I would check if there is something wrong with the GPU that is causing it to output empty tensors, which itself shouldn't happen, but maybe something is horribly wrong with CUDA on that machine.

Phil said...

Thanks very much for the quick response - to help others - I got this message when somebody moved the training file away from the filename we were expecting. So we were trying to classify with an unloaded classifier - dlib was not at fault in any way.

Davis King said...

Yeah that's a problem :) You should have gotten an exception though when you tried to read the file.

Stefanelus said...

hey Davis,

When running cmake on dlib, is there any way to force it to look for cuda? In most cases dlib is not built against cuda.

many thanks

Davis King said...

CMake looks for cuda by default. You don't need to do anything to make it look for it.

Sobhan Mahdavi said...

Oh, you are right. Thank you for your great work.

Chris Underwood said...

Mr. King,

In your blog post you mentioned that you created a small 894 image dataset and annotated the fronts and rears of cars and used it to train a 2-class detector. Is that dataset available for download?

I'm interested in taking advantage of the multiclass training and detection that you have implemented, in this iteration of dlib, in my own project.

Rao M said...

@Davis -- the new 5-point model looks very robust! Would you be able to elaborate on the shape_predictor_trainer params you used to achieve this?

Davis King said...

The exact program that created the model is archived here for posterity:

Davis King said...

To find out what dataset was used to create the car detector, read the example:

Amritanshu Sinha said...

Hi Davis King,
I have been using dlib for the last year. Thank you for your great work.

Currently I am using dlib 19.7 for face detection, with
dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat') through the Python interface.
This algorithm gives better accuracy.
I am using an NVIDIA Quadro P5000 GPU with 16 GB of RAM.

I am sending frames in batches. Each frame is 2.2 MB (3072 x 2048).
batchdets = detector(imglist, 1,batch_size = 3)
Up to batch size = 3 it works fine. Once we increase the batch size to 4, it gives a CUDA memory error:
RuntimeError: Error while calling cudaMalloc(&data, new_size*sizeof(float))

Can you suggest how we can increase the batch size to speed things up?

Davis King said...

You don't need to make the batch size bigger. You are already giving it a huge amount of work to do, so much that you are running out of GPU RAM.

Tapas said...

Hello Davis,
Thanks for the wonderful work.

Regarding the previous query by Mr. Amritanshu Sinha, it seems strange that dlib cannot detect faces in a batch of more than 3 frames (6 megapixels each) at one time on a GPU with a whopping 16 GB of memory. Please clarify.

Thanks for your time

Davis King said...

3072 X 2048 is a big image. On top of that Amritanshu Sinha upsampled it so it's 4x bigger. I don't know what you expect. It's obviously going to use a lot of RAM to process such a huge image.
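To put rough numbers on that, here is a back-of-the-envelope sketch of the input size alone. It assumes 3 color channels stored as 4-byte floats (the cudaMalloc error above does mention sizeof(float)) and that one round of upsampling doubles each dimension. It deliberately ignores the image pyramid and every intermediate layer of the network, which take far more memory than the input, so treat the result as a loose lower bound rather than a prediction of actual GPU usage. The function names are hypothetical.

```python
def upsampled_pixels(rows, cols, upsample_times):
    """Pixel count after doubling both dimensions `upsample_times` times."""
    scale = 2 ** upsample_times
    return (rows * scale) * (cols * scale)

def input_bytes(rows, cols, upsample_times, batch_size,
                channels=3, bytes_per_value=4):
    """Bytes for the input batch alone, assuming float32 RGB values."""
    pixels = upsampled_pixels(rows, cols, upsample_times)
    return pixels * channels * bytes_per_value * batch_size

rows, cols = 2048, 3072  # the 3072 x 2048 frames from the question

print(upsampled_pixels(rows, cols, 0))  # pixels in the original frame
print(upsampled_pixels(rows, cols, 1))  # pixels after upsampling once (4x)

for batch_size in (3, 4):
    mib = input_bytes(rows, cols, 1, batch_size) / 2**20
    print(f"batch {batch_size}: {mib:.0f} MiB of input before any layer runs")
```

Each upsampled frame is about 25 million pixels, so a batch of 4 is already over 1 GiB of raw input under these assumptions, before the network has computed anything.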

Tapas said...

Dear Davis,

Thanks for the reply. We are trying to do face recognition on an outdoor 6MP camera with a 12mm lens. More megapixels mean clearer faces, and we need to process at least 15 fps. Can you suggest any workaround?


Edoardo Gattei said...

Awesome! Man, I would so need an official Android port of this.

Rao M said...

@Davis, thanks for the pointer to the training params for the 5 point model. I notice the image flipping you're doing in here as well. Is this trained with the same dataset as the 100MB model from the original implementation?

Davis King said...

No, it's an entirely unrelated dataset. I created this new 5 point dataset myself with dlib's imglab tool.

Rao M said...

Any details on size of dataset, image size, person and pose variation, etc? I'm always curious what you're using to get such robust performance of the models you create!

Davis King said...

It's all documented here:

C-x-s said...

Hi Davis,

great work you are doing!

I have one question: I tried to train a dnn_mmod using another dataset with more than 2 classes but the training fails completely (1 0 0)
I have a static camera and moving objects away and towards the camera. So the scale of the objects is changing - thought your pyramid input should help here - and also the aspect ratios. The trainer complains a bit about the aspect ratios...
Nevertheless, do you have a quick tip for me how to target the problem?

Thanks in advance!

Davis King said...

Thanks, glad you like dlib. See

ethaniel said...

Hi Davis,

Have you ever heard of FindFace? It's a system made by a Russian company, NTechLab, which allows users to instantly find people on the Russian social network VK. They have a database of 500 million photos and they can return extremely accurate results within 2-3 seconds.

Here is how they work: they have trained a neural network to detect and output 300 facial features. They say that they have 1.5 kilobytes of data per face. Do you think DLib could do this some time in the future?

Once they have the facial features they store them in a database which can then be easily accessed through indexes.

Davis King said...

I haven't heard of FindFace. But if you get your hands on 500 million faces you can certainly train a model using dlib from such data.

Tsai Joy said...

Dear Davis:
After training on 1080 ti with CUDA I get this error:

Error while calling cudaMalloc(&data, n) in file C:\dlib-19.7\dlib\dnn\cuda_data_ptr.cpp:28. code: 2, reason: out of memory
PS C:\dlib-19.7\examples>

I have already changed cropper batch size from 87 to 20, and every 10 mini-batches do a testing mini-batch. What parameters do I need to change additionally to overcome this error?

The following is the final output steps while training:

step#: 67164 learning rate: 0.0001 train loss: 0.00216064 test loss: 0.00377144 steps without apparent progress: train=6135, test=989

done training
dnn_trainer details:
net_type::num_layers: 21
net size: 0.938306MB
net architecture hash: 53d6dea8baae770fc4ed0b8ed8c88dcd
loss: loss_mmod (detector_windows:(68x70,67x70), loss per FA:1, loss per miss:1, truth match IOU thresh:0.5, overlaps_nms:(0.1,0.1), overlaps_ignore:(0.5,0.95))
synchronization file: mmod_cars_sync
trainer.get_solvers()[0]: sgd: weight_decay=0.0001, momentum=0.9
learning rate: 1e-05
learning rate shrink factor: 0.1
min learning rate: 1e-05
iterations without progress threshold: 50000
test iterations without progress threshold: 1000
random_cropper details:
chip_dims.rows: 350
chip_dims.cols: 350
randomly_flip: true
max_rotation_degrees: 2
min_object_size: 0.2
max_object_size: 0.7
background_crops_fraction: 0.5
translate_amount: 0.1

sync_filename: mmod_cars_sync
num training images: 9
training results: 1 0.555556 0.555556
Error while calling cudaMalloc(&data, n) in file C:\dlib-19.7\dlib\dnn\cuda_data_ptr.cpp:28. code: 2, reason: out of memory
PS C:\dlib-19.7\examples>

Tsai Joy said...

As a follow up I commented out these lines:
//upsample_image_dataset<pyramid_down<2>>(images_train, boxes_train, 1800*1800);
//upsample_image_dataset<pyramid_down<2>>(images_test, boxes_test, 1800*1800);
and the CUDA memory error below was solved:
Error while calling cudaMalloc(&data, n) in file C:\dlib-19.7\dlib\dnn\cuda_data_ptr.cpp:28. code: 2, reason: out of memory
PS C:\dlib-19.7\examples>
FYI my largest image size is 1400x1600 training on 1080 ti. So I guess 1800x1800 is still too high for the limit.
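A small sketch of why those calls matter for memory. It rests on an assumption about the third argument that is not confirmed anywhere in this thread: that 1800*1800 is a pixel-count threshold, and that any image at or below it gets doubled in each dimension. The helper below is hypothetical and only mirrors that assumed rule; it does not call dlib.

```python
def would_upsample(rows, cols, max_image_size):
    """Assumed rule: upsample when the pixel count does not exceed the threshold."""
    return rows * cols <= max_image_size

threshold = 1800 * 1800      # third argument in the commented-out calls
rows, cols = 1400, 1600      # largest training image mentioned above

print(rows * cols, "pixels vs threshold", threshold)

if would_upsample(rows, cols, threshold):
    # Assumed 2x upsampling per side, i.e. 4x the pixels.
    up_rows, up_cols = rows * 2, cols * 2
    print(f"upsampled to {up_rows}x{up_cols} = {up_rows * up_cols} pixels")
```

If that reading is right, the 1400x1600 image falls under the threshold and would grow to roughly 9 million pixels, which fits with the out-of-memory error going away once the calls were commented out.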

Vadim Pepelka said...

Hi Davis,

Thank you for your library. I am trying to apply the multiclass CNN detector for OCR purposes. I've found that in some cases the orientation of the detector windows was changed. I suspect that the reason is a bug in the file loss.h, lines 432-433 (442-443):
if (detector_width < min_target_size)
detector_width = min_target_size;
detector_height = min_target_size/ratio;

Davis King said...

Yes, the option setup code was a little bit wonky in 19.7. Use the latest dlib from github and it will do the right thing.

Amritanshu Sinha said...

Hi Davis,

I am using dlib's "dlib_face_recognition_resnet_model_v1.dat" for feature extraction. We want to train the network further with another dataset. Can you suggest a way to load the weights of the model "dlib_face_recognition_resnet_model_v1.dat", so that we can continue training from those initial weights?

Thanks in advance..


mehmet ali atici said...

Hi Davis,
I want to train an object keypoint detector instead of face landmarks. Accuracy is low at the moment. The bounding box of the detected object is not square, and you say to use the find_affine_transform function in the shape_predictor.h file, but how can I do this from the Python module? Thanks.

Mathew i said...

Hi Davis,
Thank you for all the algorithms you have given to the developer world.

Have you already released the dataset you used to train the 5 point landmark detector?

Davis King said...

Yes, the data is available. See for links and more information about it.