Comments on dlib C++ Library: Fast Multiclass Object Detection in Dlib 19.7

Zipwax (2019-08-13):
I'm writing this here so other people can see the answer: I just compiled dlib with the CUDA support and cuDNN, and it built fine. Now, how do I pass my cv::cuda::GpuMat to dlib for detection? Do I have to call download() on it first to get it into CPU memory? Would it be faster to use OpenCV's CUDA-based HOG detector, since it does accept a GpuMat? Does anybody have timing comparisons for GPU-based face detection? I'm confused about which might be better, so I'm now looking at NVIDIA's site for their face-detection tools.

Anonymous (2019-03-15):
Hi Davis, by any chance does dlib 19.17 use threading to speed up face detection?

Anonymous (2018-07-16):
@sumit, perhaps this will be helpful:
https://www.pyimagesearch.com/2018/04/02/faster-facial-landmark-detector-with-dlib/
sumit (2018-07-03):
Hey Davis,
Does the 5-point landmark detection model implement the same algorithm as the 68-point landmark detection model, or is it based on something different?
Anonymous (2018-07-02):
Hi Davis!
Could you share FAR/FRR figures for dlib?
johnpuskin99 (2018-06-01):
Hi Davis,
Can we use a different bounding-box aspect ratio for each label?

Davis King (2018-04-11):
Turn on compiler optimizations and link to the Intel MKL.

Anonymous (2018-04-11):
I used your model on the sample image you gave. It takes 6 seconds to process one image on the CPU. Is there any way to make it faster?

Davis King (2018-04-02):
http://dlib.net/dlib/dnn/input_abstract.h.html#EXAMPLE_INPUT_LAYER

Anonymous (2018-04-02):
I'm not sure what you are referring to as the input layer. Is it
line 308 of https://github.com/davisking/dlib/blob/master/dlib/dnn/input.h,
line 2714 of https://github.com/davisking/dlib/blob/master/dlib/dnn/core.h,
or something else?

Davis King (2018-04-02):
Look at the input layer's code. It's not just copying the data to the tensor. You have to replicate its behavior.
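Davis's advice about compiler optimizations and the Intel MKL usually comes down to building dlib with CMake in Release mode and enabling its BLAS support, which lets the build link against the MKL when it is installed. A build-configuration sketch (the exact option names are worth verifying against dlib's CMakeLists.txt):

```shell
# Configure a Release build; with DLIB_USE_BLAS enabled, dlib's CMake
# scripts can link against the Intel MKL if it is installed.
cmake .. -DCMAKE_BUILD_TYPE=Release -DDLIB_USE_BLAS=ON
cmake --build . --config Release
```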
Dalei (2018-04-02):
Hi Davis,

I just tried using the operator() we spoke about, but I am running into a snag.

My prototype code, which works, looks like this:

    net_type net;

    // frame is a cv::Mat of type CV_8UC3
    dlib::cv_image<dlib::bgr_pixel> cvimg(frame);
    dlib::matrix<dlib::rgb_pixel> img;
    dlib::assign_image(img, cvimg);

    auto mmod_rectangles = net(img);

I would like my productized code to look something like:

    unsigned char* devPtr = ...; // pointer to CUDA memory on the GPU, where the input image data already lives

    dlib::resizable_tensor tensorInput;
    tensorInput.set_size(1, 3, h, w); // h, w are the height and width of the image

    myFancyConversionKernel<<<...>>>(tensorInput.device_write_only(), devPtr);

    // Here, myFancyConversionKernel is responsible for:
    //  1) converting the uchar8 pixel data to float32 in the range 0.0f to 255.0f, and
    //  2) deinterleaving the RGB channels so that tensorInput holds the data in planar format.
    std::vector<std::vector<dlib::mmod_rect>> mmod_rectangles;
    net(tensorInput, std::back_inserter(mmod_rectangles));

The new code builds and runs, but does not seem to produce any face bounding boxes (i.e. mmod_rectangles.size() == 1 and mmod_rectangles.at(0).size() == 0).

Is there anything that stands out as incorrect in what I am doing? I am uncertain about the uchar8-to-float32 conversion, since in the prototype code I did not perform any explicit conversion; I only added it because dlib::resizable_tensor appears to support only the float32 numerical format.

Thank you,
Dalei

Davis King (2018-04-01):
That's right. You could also pass a tensor to any of the other functions of the immediate sublayer, of which there are many options. But the operator() you mentioned is as good as any.
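Davis's earlier point that the input layer is "not just copying the data to the tensor" is likely the snag here: dlib's default RGB input layer subtracts a per-channel average and divides by 256, while the conversion kernel above only scales to 0-255. A minimal CPU sketch of the kind of transform a conversion kernel would need to replicate (the averages below are the defaults I believe input_rgb_image uses; verify the exact constants and layout against dlib/dnn/input.h before relying on them):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch: convert an interleaved 8-bit RGB image (HWC) to the planar
// float layout (C x H x W) of a dlib input tensor, applying the
// mean-subtract-and-divide-by-256 normalization that dlib's default
// input layer performs. Constants are assumptions to be checked
// against dlib/dnn/input.h.
std::vector<float> to_input_tensor(const std::vector<std::uint8_t>& rgb,
                                   int h, int w)
{
    const float avg[3] = {122.782f, 117.001f, 104.298f};
    std::vector<float> t(3u * h * w);
    for (int c = 0; c < 3; ++c)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                t[(c * h + y) * w + x] =
                    (rgb[(y * w + x) * 3 + c] - avg[c]) / 256.0f;
    return t;
}
```

The same per-pixel arithmetic, moved into the CUDA kernel, would make the hand-built tensor match what the network saw during training.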
Dalei (2018-04-01):
Hi Davis,

Thank you for your reply. Regarding your suggestion of passing a tensor to the network: my understanding is that the network is really an object of the dlib::loss_mmod templated class, dlib::loss_mmod is itself an alias for an instantiation of the dlib::add_loss_layer class, and dlib::add_loss_layer has an operator() that takes a dlib::tensor as input, which is the function you are referring to. Is my analysis correct?

Thank you,
Dalei

Davis King (2018-04-01):
You could write a custom input layer that takes input from your other source, which shouldn't be a big deal. You can also just call one of the network's member functions that takes a tensor as input rather than a matrix.

All the network computations run on the default CUDA stream, but you can just use per-thread default streams. Read the CUDA docs for details.
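The per-thread default streams Davis mentions are a CUDA compile-time setting rather than an API call. As a build-configuration sketch (the .cu file name is hypothetical), compiling device code with nvcc's per-thread default stream mode gives each host thread its own default stream, so kernels launched from different threads no longer serialize on the single legacy default stream:

```shell
# Hypothetical build line: give each host thread its own default CUDA
# stream (see the CUDA docs on default stream synchronization behavior).
nvcc --default-stream per-thread -c my_preprocess_kernels.cu -o my_preprocess_kernels.o
```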
Dalei Wang (2018-03-31):
Hi Davis,

I am trying to do something similar to the sample program dnn_mmod_face_detection_ex.cpp. In the sample, the input to the CNN is a matrix object allocated on the host. In my code, some CUDA kernels preprocess the image before the net is called, so the image data is already on the GPU. Is there a way to invoke the CNN on the image data without first copying it back to the host?

Also, is there a way to run the CNN in a specified CUDA stream (i.e. the stream I used to run my preprocessing kernels)?

Thank you,
Dalei Wang

Davis King (2018-03-14):
Yes, you can train whatever you want.
Bhomik (2018-03-14):
Hello Davis,

Nice to see the vehicle detector after the face detector :) I would like to know whether we can train dlib to detect different classes: two-wheeler, four-wheeler, pedestrian.

Thanks,
Bhomik

Davis King (2018-03-07):
Yes, that's what the boxes are.

Anonymous (2018-03-07):
I'm currently looking at train_face_5point_model and the associated dataset dlib_faces_5points.tar, and I notice that each file entry has two bounding boxes specified. Am I right in thinking that these are simply the two different bounding boxes detected by the CNN and the HOG detector? Otherwise, what do the two boxes represent? Thanks!

Davis King (2018-02-22):
Loading more data than will fit on your GPU.
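For readers wondering what the "two bounding boxes per file entry" in the question above look like on disk, here is a hypothetical sketch of a dlib imglab-style dataset entry with two boxes (file name and coordinates made up; check the real schema against the files imglab produces):

```xml
<?xml version='1.0' encoding='ISO-8859-1'?>
<dataset>
  <images>
    <image file='faces/example.jpg'>
      <box top='74' left='53' width='110' height='95'/>
      <box top='70' left='50' width='118' height='102'/>
    </image>
  </images>
</dataset>
```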
m (2018-02-22):
Hi Davis,

What cases would cause the message "Not enough memory to handle tile data" on a GPU box?

Thank you!

Sigmar Heldenhammer (2018-02-01):
OK, I will try to find out what the problem is. Thank you.

Davis King (2018-02-01):
It should be much better than that. I don't know what the problem is, but you aren't doing something right :)

Sigmar Heldenhammer (2018-02-01):
I mean that it is less accurate than reported in the paper. For example, on these images:

Original result from the paper:
https://ibb.co/mjW49R

The model I trained obtains this result:
https://ibb.co/dwwnUR

I suspect the problem is in the training phase. Must any parameter be tuned? Any other ideas? Thank you in advance.