Comments on dlib C++ Library: Easily Create High Quality Object Detectors with Deep Learning

The CNN in this context is f() in equation 3. It's...

2020-05-22T22:06:23.390-04:00

The CNN in this context is f() in equation 3. It's some big function, with a bunch of parameters, that given an image and a location in that image tells you how likely it is there is an object at that location. SGD optimizes over all the parameters of f(). In the paper f() is linear in the parameters, but in the CNN it's not. That's the only difference.
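To make the distinction concrete, here is a minimal C++ sketch. It is not dlib code: `linear_score`, `nonlinear_score`, the two gradient helpers and `sgd_step` are hypothetical names made up for illustration. It assumes the paper's f() has the linear form <w, phi>, as the discussion here implies, and it uses tanh as a toy stand-in for a CNN and a toy loss L = -f rather than the MMOD loss. The point is only that both scorers are functions of parameters w and both are updated by the same SGD rule; only the gradient computation differs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Linear scorer in the style of the paper: f(phi; w) = <w, phi>.
double linear_score(const Vec& w, const Vec& phi) {
    assert(w.size() == phi.size());
    double s = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) s += w[i] * phi[i];
    return s;
}

// Toy nonlinear scorer standing in for a CNN: f(phi; w) = tanh(<w, phi>).
double nonlinear_score(const Vec& w, const Vec& phi) {
    return std::tanh(linear_score(w, phi));
}

// Gradient of the toy loss L = -f with respect to w, linear case: -phi.
Vec linear_loss_grad(const Vec& /*w*/, const Vec& phi) {
    Vec g(phi.size());
    for (std::size_t i = 0; i < phi.size(); ++i) g[i] = -phi[i];
    return g;
}

// Same toy loss, nonlinear case: the chain rule adds the factor (1 - tanh^2).
Vec nonlinear_loss_grad(const Vec& w, const Vec& phi) {
    const double t = std::tanh(linear_score(w, phi));
    Vec g(phi.size());
    for (std::size_t i = 0; i < phi.size(); ++i) g[i] = -(1.0 - t * t) * phi[i];
    return g;
}

// The SGD update is identical in both cases: w <- w - lr * dL/dw.
void sgd_step(Vec& w, const Vec& grad, double lr) {
    assert(w.size() == grad.size());
    for (std::size_t i = 0; i < w.size(); ++i) w[i] -= lr * grad[i];
}
```

Calling `sgd_step(w, linear_loss_grad(w, phi), lr)` or `sgd_step(w, nonlinear_loss_grad(w, phi), lr)` raises the score of the example described by `phi` in either case. In a real CNN the gradient comes from backpropagation through every layer rather than from a closed-form expression.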

Hi Davis! I have a question in regards to the SGD...

2020-05-22T10:39:01.186-04:00

Hi Davis!

I have a question in regards to the SGD training of the CNN.

Equation (8) of the MMOD paper gives the optimization problem at hand.
I understand this optimization for linear models of the type shown in equation (3), but what do the model parameters w mean in the case of a CNN? Are they the actual filter weights?

My interpretation is that phi in equation (3) is the CNN, and w is the vector of SVM parameters. But if this is the case, how is the CNN trained?

Sorry if it is an unclear question.

The answers to your questions are all discussed in...

2019-01-29T07:40:30.599-05:00

The answers to your questions are all discussed in detail in the dlib documentation. In particular, the blog post you are commenting on discusses some of it, with links to additional relevant documentation.

I'm new to dlib and object detection (faces to...

2019-01-28T10:28:31.700-05:00

I'm new to dlib and object detection (faces too).
I recently read article and article and have some questions:
1. I can't figure out what network is used in dlib.
2. Is there any difference when using other networks with dlib? E.g. if I train a network (SSD+MobileNet, Faster R-CNN with MobileNet/Inception/ResNet, YOLO, etc.), what is required to use it?
3. How can I optimize the network and dlib for accuracy and speed?

The problems I have for now:
1. The "detector" takes a long time and uses too much CPU.
2. I cannot use the same "detector" in parallel (threads or TBB tasks, for example).

I would appreciate any help or suggestions.
Thanks in advance.

Thank you very much!

2018-10-09T12:55:59.491-04:00

Thank you very much!

I forget how long it took. Probably under a day o...

2018-10-09T07:43:22.783-04:00

I forget how long it took. Probably under a day on a 1080ti.

How long did it take to train the dataset of ~7000...

2018-10-09T05:39:30.400-04:00

How long did it take to train the dataset of ~7000 faces?

Yes, that's right. There are multiple output...

2018-09-09T09:36:59.370-04:00

Yes, that's right.

There are multiple output channels to support multiple output box shapes and types.

Those visualizations of the image pyramid and the ...

2018-09-09T09:12:55.396-04:00

Those visualizations of the image pyramid and the heatmaps are exactly what I was looking for, thanks!

So if I understand correctly: if you had a sliding window of size 40x40, for example, but some of your labeled images had bounding boxes of size 80x80, then the sliding window would detect those larger labels in the lower levels of the image pyramid, and the bounding box would be scaled up accordingly on the output?
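The scaling described in this question can be sketched in a few lines of C++. This is an illustration only, not dlib's implementation: `Box`, `level_for_object` and `to_original_scale` are hypothetical names, the downsampling ratio per pyramid level is a free parameter (dlib's `pyramid_down<6>` shrinks each level to 5/6 of the previous one, but other pyramid types use other ratios), and the sketch ignores any padding or tiling applied when the pyramid is laid out.

```cpp
#include <cassert>
#include <cmath>

// Axis-aligned box in pixels.
struct Box { double left, top, width, height; };

// Number of downsampling steps after which an object of `object_size` pixels
// is closest in size to a fixed sliding window of `window_size` pixels.
// `ratio` is the per-level scale factor, strictly between 0 and 1.
int level_for_object(double object_size, double window_size, double ratio) {
    assert(object_size > 0 && window_size > 0 && ratio > 0 && ratio < 1);
    if (object_size <= window_size) return 0;
    // Solve object_size * ratio^k = window_size for k, then round to a level.
    const double k = std::log(window_size / object_size) / std::log(ratio);
    return static_cast<int>(std::lround(k));
}

// Map a box detected at pyramid level `level` back to the original image.
Box to_original_scale(const Box& b, int level, double ratio) {
    assert(level >= 0 && ratio > 0 && ratio < 1);
    const double s = std::pow(1.0 / ratio, level);
    return {b.left * s, b.top * s, b.width * s, b.height * s};
}
```

With a ratio of 0.5, an 80x80 object matches a 40x40 window one level down, and the 40x40 detection found there maps back to an 80x80 box. With a ratio of 5/6 the closest level is the fourth, where the object is about 38.6 pixels across.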

And one more question: if the output of the CNN is only one channel, then how is the bounding box information (width, height, coordinates) retrieved to compare to ground truth and compute the IoU?
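For reference, the IoU (intersection over union) between a predicted box and a ground-truth box can be computed as below. This is a generic sketch, not dlib's own overlap test: `Rect` and `iou` are hypothetical names, and the rectangle uses floating-point edges with exclusive right and bottom coordinates, which is an assumption made here for simplicity.

```cpp
#include <algorithm>
#include <cassert>

// Rectangle given by its edges; right and bottom are exclusive.
struct Rect { double left, top, right, bottom; };

// Area of a rectangle, treating inverted (empty) rectangles as area 0.
double area(const Rect& r) {
    return std::max(0.0, r.right - r.left) * std::max(0.0, r.bottom - r.top);
}

// Intersection over union of two well-formed rectangles, in [0, 1].
double iou(const Rect& a, const Rect& b) {
    assert(a.left <= a.right && a.top <= a.bottom);
    assert(b.left <= b.right && b.top <= b.bottom);
    const Rect inter{std::max(a.left, b.left), std::max(a.top, b.top),
                     std::min(a.right, b.right), std::min(a.bottom, b.bottom)};
    const double i = area(inter);
    const double u = area(a) + area(b) - i;
    return u > 0.0 ? i / u : 0.0;
}
```

Two identical boxes give an IoU of 1, disjoint boxes give 0, and two 10x10 boxes offset by half their width give 50 / 150 = 1/3.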

Entire images are passed in. See http://blog.dlib...

2018-09-09T08:42:51.910-04:00

Entire images are passed in. See http://blog.dlib.net/2017/08/vehicle-detection-with-dlib-195_27.html for more details. Also go run the code and look at the images produced by the random cropper.

Hi Davis, I have a question about the sliding win...

2018-09-08T15:03:07.706-04:00

Hi Davis,

I have a question about the sliding window detection with MMOD. Specifically, one of your examples states that the receptive field of the CNN is 50x50 pixels, but then the random cropper is selecting random dimensions in the range 40x40-270x270. Are 50x50 windows of the cropped image being passed into the network one at a time, or is it the entire cropped image?

Maybe I have some misunderstanding, but I appreciate your help!

Yes, the DNN scans the image, more or less like HO...

2018-05-17T09:27:24.366-04:00

Yes, the DNN scans the image, more or less like HOG. There isn't much you can do to change the speed of the CNN without changing the network and retraining.

Yes, the CNN outputs confidence. Look at the documentation. It's all described in detail.
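As a rough illustration of what a per-detection confidence allows: each candidate box carries a score, and which boxes get reported depends on the threshold applied to that score. The sketch below is generic C++, not dlib's interface. `Detection` and `keep_above` are hypothetical names, and how to obtain scored candidates from a dlib detector is something to look up in the documentation mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A candidate box together with the score the detector assigned to it.
struct Detection { double left, top, width, height, confidence; };

// Return only the detections whose confidence is strictly above `threshold`.
// Lowering the threshold (e.g. below 0) lets weaker candidates through.
std::vector<Detection> keep_above(const std::vector<Detection>& all, double threshold) {
    assert(!std::isnan(threshold));
    std::vector<Detection> kept;
    for (const Detection& d : all)
        if (d.confidence > threshold) kept.push_back(d);
    return kept;
}
```

For example, with candidates scored 1.2, 0.1 and -0.4, a threshold of 0 keeps two of them and a threshold of -0.5 keeps all three.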

Hello Davis, I really like Dlib and I have used t...

2018-05-16T23:29:28.656-04:00

Hello Davis,
I really like dlib and I have used this library for training on my data for several months.
I have two questions about DNN face detection.

1. Does the DNN face detector scan the image just like the HOG detector does?
The scanning speed of HOG detection can be set by tuning the scanner's parameters.
The code is something like this:
--
typedef dlib::scan_image_pyramid > image_scanner_type;
image_scanner_type scanner;
scanner.copy_configuration ( HoGDetector.get_scanner() );
scanner.set_max_pyramid_levels ( 3 );
--
But what about the DNN detector?
I want to set the scanner's parameters to tune the DNN scanning speed.

2. When I use the DNN detector, can I get the confidence score of each sliding window?
Or can I get the detected boxes whose confidence is lower than 0?

Thank you.

Hi, how can I compile it by using mingw ? g++ -...

2018-04-16T13:49:49.243-04:00

Hi, how can I compile it using MinGW?

g++ -std=c++11 -O3 -I.. ../dlib/all/source.cpp -lpthread -lX11 example_program_name.cpp

produces lots of compile errors.

Hi Davis, I have problem on object detection, I re...

2018-03-26T20:58:59.503-04:00

Hi Davis,
I have a problem with object detection and would really appreciate a solution using dlib.
I am currently using YOLOv2 and YOLO-DenseNet for object detection. The results are poor at distinguishing objects with a similar or identical shape but a different color. Even after modifying the data augmentation parameters (hue, exposure, saturation, etc.) and other parameters, the results are still very bad. Perhaps this is because I could not get many images for each class (currently only 8 images per class), but I am not sure. It is not easy to get that many images per class, as we have more than 60,000 classes!

I remember using the object detector sample in dlib 2 years ago. It was robust to color, but I only used it to detect a single object. Could you please give me some advice on detecting objects from a huge number of classes (10,000 at least)? Thank you so much.

Thats awesome !! I am running it now on Jetson TK1...

2018-03-19T13:30:22.550-04:00

That's awesome!! I am now running it on a Jetson TK1 and it works like a charm, but when a face is being recognized on the live camera feed, it gets slower. Otherwise it's real-time. I am not sure whether it's using the CPU or the GPU to run the ResNet recognition model.

Anything with architecture 3.0 or newer should be ...

2018-03-19T13:02:12.925-04:00

Anything with architecture 3.0 or newer should be fine.

Awesome work Davis. I just want to know what is th...

2018-03-19T12:59:45.100-04:00

Awesome work, Davis. I just want to know the minimum CUDA requirement for running dlib on a GPU. I am planning to run it on a Jetson TK1, which has a Kepler GPU (compute capability 3.2) and supports CUDA 6.5.

Hello , excuse me , can anyone teach me how to ca...

2018-03-04T07:42:20.676-05:00

Hello, excuse me,
can anyone teach me how to work out how many layers this DNN model has?

You can see the network definition in the example ...

2018-03-01T06:57:41.133-05:00

You can see the network definition in the example program: http://dlib.net/dnn_mmod_face_detection_ex.cpp.html. For instance, there are 7 convolution layers.

Excuse me , I want to know how many layer does thi...

2018-03-01T01:15:14.386-05:00

Excuse me, I want to know how many layers this DNN face detection model has.

Look at the object_detector object. You can pull out...

2018-02-21T07:59:47.171-05:00

Look at the object_detector object. You can pull out whatever parts you want and pack them into a new object_detector. You don't need to retrain. http://dlib.net/dlib/image_processing/object_detector_abstract.h.html

Hi, I studied the examples of face detection and t...

2018-02-21T05:14:52.636-05:00

Hi, I studied the examples of face detection and training using HOG. I want to speed up face detection by using only 3 of the 5 detectors from your pretrained model. I was hoping that after deserializing I would get a std::vector, but when I look at frontal_face_detector.h it is only one object detector (object_detector<scan_fhog_pyramid<pyramid_down<6> > >).
Is it even possible to use only 3 detectors (I want only frontal faces, not side-looking ones)? Or do I have to train it on my own?
Thanks

Way ahead of you :), see http://blog.dlib.net/2014...

2018-01-26T09:04:05.994-05:00

Way ahead of you :), see http://blog.dlib.net/2014/04/dlib-187-released-make-your-own-object.html