Saturday, August 13, 2016

Dlib 19.1 Released

cuDNN 5.1 is out, and it isn't completely backwards compatible with cuDNN 5.0 due to a bug in cuDNN 5.1.  For the curious: in cuDNN 5.1, cudnnGetConvolutionBackwardFilterAlgorithm() will select the winograd algorithm even when the convolution descriptor has a stride not equal to 1, which is an error according to the cuDNN documentation.  If you then run the winograd algorithm, as cudnnGetConvolutionBackwardFilterAlgorithm() tells you to, you get wrong outputs and things don't work.  Fortunately, this was detected by dlib's unit tests :)

Therefore, dlib has been updated to work with cuDNN 5.1 and hence we have a dlib 19.1 release, which you can download from dlib's home page.

I also recently realized that the fancy std::async() in C++11, an API for launching asynchronous tasks, is not backed by any kind of load balancing at all.  For example, if you call std::async() at a faster rate than the tasks complete, then your program will create an unbounded number of threads, leading to an eventual crash.  That's awful.  But std::async() is a nice API and I want to use it.  So dlib now contains dlib::async(), which has the same interface, except instead of the half-baked launch policy as the first argument, dlib::async() takes a dlib::thread_pool, giving dlib::async() all the bounded resource use properties of dlib::thread_pool.  Moreover, if you don't give dlib::async() a thread pool it will default to a global thread pool instance that contains std::thread::hardware_concurrency() threads.  Yay.

23 comments :

Rasel said...

Hi,

Thanks for the nice and easy-to-use library. I was wondering how I can store the trained SVM model and later use it for prediction. I am particularly following the "svm_ex" example.

Thanks

Davis King said...

The svm_ex example talks about how to do exactly that.

TripleKimchi said...

I'm having trouble with compiling dnn_introduction_ex.cpp

Since the DNN code requires C++11, I'm trying to build it with MinGW.

It seems that MinGW's std::thread::hardware_concurrency() is not implemented.

Can you give me advice on how to replace it?

Currently I have replaced it with the number 0.

Davis King said...

You need a C++11 compiler. std::thread::hardware_concurrency() is part of C++11. I suppose you could replace it with the number of cores on your machine though. But you are much better off using a proper C++11 compiler.
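Since std::thread::hardware_concurrency() is allowed to return 0 when the value is not computable, a common portable pattern (rather than hard-coding 0) is to clamp the result to a sensible minimum:

```cpp
// Fallback pattern for std::thread::hardware_concurrency(), which may
// legally return 0 meaning "unknown" on some toolchains.
#include <thread>
#include <algorithm>

unsigned num_threads()
{
    unsigned n = std::thread::hardware_concurrency();
    return std::max(1u, n);  // treat 0 as "unknown" and use at least 1 thread
}
```

Hard-coding the actual core count of your machine also works, as suggested above, but then the binary won't adapt to other hardware.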

DAXIA said...

A question about shape_predictor training.
I have used all the parameters from the paper "One Millisecond ..." to train a shape predictor on the HELEN dataset. The result seems reasonable: I get a 0.015 training error and a 0.061 test error. However, there is still a gap compared with the paper, which reported a test error of 0.049. I have tried several parameter combinations but there is always a performance gap. I was wondering if your test results for 68-point shape prediction achieved performance comparable to the paper. And which parameters do you think are most likely to be overlooked in such an experiment?

Davis King said...

That's what I remember getting as well. I was never able to reproduce the 0.049 from the paper.

DAXIA said...

After fixing an error in how I prepared the bounding boxes, I get an error rate of 0.054 or 0.055, so there is still a gap. Thanks for the information.

HChu said...

Thanks for the excellent work. I tried SVM regression; it's really easy to use compared to LibSVM.

In real production code, we would like to have the dimension of the problem dynamically determined, which is not easy when it's a template parameter. I tried to use std::valarray or boost::ublas::vector, but neither would work. Is it possible to do so in the current release?

Davis King said...

http://dlib.net/faq.html#HowdoIsetthesizeofamatrixatruntime

HChu said...

Thank you very much. matrix works.

I have a comment regarding central difference. It's better to replace
der(i) = (delta_plus - delta_minus)/(2*eps);
by
der(i) = (delta_plus - delta_minus)/((old_val+eps)-(old_val-eps));

Davis King said...

Yeah, that's probably a good idea. Do you want to test it out, see what kind of numerical effect it has, and submit a pull request on github? :)

HChu said...

I have performed extensive numerical experiments in the past which show that the formula I suggested is better, by comparing with analytical derivatives. The reason is that the arguments you pass to the function are xp = fl(x0+eps) and xm = fl(x0-eps), which are rarely exactly representable as floating point numbers. If |x0| is not extremely small, xp-xm will be exact according to the Sterbenz lemma, but it is very unlikely to be equal to 2*eps. Such a technique is used in reputable optimization libraries.

The reason I did extensive tests is that I spent a huge amount of effort developing algorithms to accurately compute special functions (better than those in the boost library). In the absence of reference function values (some of them are not in boost), I had to rely on numerical derivatives to check how good my implementations are.

Matt said...

Davis, This is *awesome*. This feature looks amazing. I've seen a lot of research/projects on doing this type of thing with CNN, but the code is a mess. Dlib is really an amazing library -- the comments are phenomenal and the documentation makes it so easy to use. Thanks for putting this together!

Have you looked at this research: http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Li_A_Convolutional_Neural_2015_CVPR_paper.pdf

It's basically a CNN sliding-window detector, but they train different-fidelity models (low-res for the first pass, and higher-res for subsequent passes). The CPU performance is actually *better* than HOG/VJ/LBP, and the GPU performance as well as accuracy is light-years ahead.

Davis King said...

Thanks, I'm glad you like it.

Yes, I've read that paper. It's very nice. Not sure if you have seen this but there is now a CNN detector in dlib: http://blog.dlib.net/2016/10/easily-create-high-quality-object.html

winter wang said...

Davis, thanks for your work on dlib. I got a surprisingly good result after testing your face detector and landmark predictor. I want to know what dataset you used to train the face detector model. Thanks. In your blog posts I see the iBUG 300-W dataset is used to train the landmark regressor, but the dataset used for detector training is not mentioned.

Davis King said...

The HOG detector was trained with this dataset: http://dlib.net/files/data/dlib_face_detector_training_data.tar.gz

winter wang said...

Thanks a lot!

winter wang said...

Davis, the landmark points are not stable and show a jitter effect in real-time video and recorded video clips. I think there are several reasons: one is video frame noise caused by the hardware and environment; another is the lack of prediction of the current frame's shape from the previous shape. If the current frame's shape were regressed from the previous shape, the landmark points would be more stable than when every frame is regressed independently as a still image. Maybe a separate regressor is needed for shape tracking between consecutive frames, different from the one for still images. Do you agree? If so, should the training data be generated from still images, or should a real video dataset such as 300-VW be used? Or do you have any other advice for overcoming the jitter effect in real video?
Thanks a lot!

Davis King said...

There are many ways to reduce jitter, all depending on the specifics of your situation. The simplest is to buy a better camera. If you want algorithm ideas I would look at CVPR papers on this subject.

winter wang said...

Thanks for your answer!