Saturday, August 13, 2016

Dlib 19.1 Released

cuDNN 5.1 is out and it isn't completely backwards compatible with cuDNN 5.0 due to a bug in cuDNN 5.1.  For the curious, in cuDNN 5.1 cudnnGetConvolutionBackwardFilterAlgorithm() will select the winograd algorithm even when the conv descriptor has a stride not equal to 1, which is an error according to the cuDNN documentation.  If you then try to run the winograd algorithm, which is what cudnnGetConvolutionBackwardFilterAlgorithm() says to do, it leads to the wrong outputs and things don't work.  Fortunately, this was detected by dlib's unit tests :)

Therefore, dlib has been updated to work with cuDNN 5.1 and hence we have a dlib 19.1 release, which you can download from dlib's home page.

I also recently realized that the fancy std::async() in C++11, an API for launching asynchronous tasks, is not backed by any kind of load balancing at all.  For example, if you call std::async() at a faster rate than the tasks complete then your program will create an unbounded number of threads, leading to an eventual crash.  That's awful.  But std::async() is a nice API and I want to use it.  So dlib now contains dlib::async() which has the same interface, except instead of the half baked launch policy as the first argument, dlib::async() takes a dlib::thread_pool, giving dlib::async() all the bounded resource use properties of dlib::thread_pool.  Moreover, if you don't give dlib::async() a thread pool it will default to a global thread pool instance that contains std::thread::hardware_concurrency() threads.  Yay.

Sunday, June 26, 2016

A Clean C++11 Deep Learning API

Dlib 19.0 is out and it has a lot of new features, like new elastic net and quadratic program solvers. But the feature I'm most excited about is the new deep learning API. There are a lot of existing deep learning frameworks, but none of them have clean C++ APIs. You have to use them through a language like Python or Lua, which is fine in and of itself. But if you are a professional software engineer working on embedded computer vision projects you are probably working in C++, and using those tools in these kinds of applications can be frustrating.

So if you use C++ to do computer vision work then dlib's deep learning framework is for you. It makes heavy use of C++11 features, allowing it to expose a very clean and lightweight API. For example, the venerable LeNet can be defined in pure C++ with a using statement:

LeNet

    using LeNet = loss_multiclass_log<
                                fc<10,        
                                relu<fc<84,   
                                relu<fc<120,  
                                max_pool<2,2,2,2,relu<con<16,5,5,1,1,
                                max_pool<2,2,2,2,relu<con<6,5,5,1,1,
                                input<matrix<unsigned char>>>>>>>>>>>>>>;

Then, using it to train and test a neural network looks like this:

    LeNet net;
    dnn_trainer<LeNet> trainer(net);
    trainer.set_learning_rate(0.01);
    trainer.set_min_learning_rate(0.00001);
    trainer.set_mini_batch_size(128);
    trainer.train(training_images, training_labels);
    // Ask the net to predict labels for all the testing images
    auto predicted_labels = net(testing_images);

Dlib will even automatically switch to lower learning rates when the training error stops improving, so you won't have to fiddle with learning rate schedules. The API will certainly let you do so if you want that control. But I've been able to train a number of state-of-the-art ImageNet models without any manual fiddling of learning rates, which I find to be very convenient.

Depending on how you compile dlib, it will use either the CPU or cuDNN v5. It also supports using multiple GPUs during training and has a "fast mode" and a "low VRAM" mode. Compared to Caffe, dlib's fast mode is about 1.6x times faster than Caffe but uses about 1.5x as much VRAM, while the low VRAM mode is about 0.85x the speed of Caffe but uses half the VRAM as Caffe. So dlib's new deep learning API is fast but can also let you run larger models in the same amount of VRAM if you are VRAM constrained.

It's also fully documented. The basics are covered in this tutorial and then more advanced concepts are covered in a follow on tutorial. These tutorials show how to define LeNet and ResNet architectures in dlib and another tutorial shows how to define Inception networks. And even more importantly, every function and class in the API is documented in the reference material. Moreover, if you want to define your own computational layersloss layers, input layers, or solvers, you can because the interfaces you have to implement are fully documented.

I've also included a pretrained ResNet34A model and this example shows how to use it to classify images. This pretrained model has a top5 error of 7.572% on the 2012 imagenet validation dataset, which is slightly better than the results reported in the original paper Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun. Training this model took about two weeks while running on a single Titan X GPU.

To use the new deep learning tools, all you need to install is cuDNN v5.  Then you can compile the dlib example programs using the normal CMake commands.  There are no other dependencies. In fact, if you don't install cuDNN CMake will automatically configure dlib to use only the CPU and the examples will still run (but much slower).  You will however need a C++11 compiler, which precludes current versions of visual studio since they shamefully still lack full C++11 support.  But any mildly recent version of GCC will work.  Also, you can use visual studio with the non-DNN parts of dlib as they don't require C++11 support.

Finally, development of this new deep learning toolkit was sponsored by Systems & Technology Research, as part of the IARPA JANUS project. Without their support and feedback it wouldn't be nearly as polished and flexible. Jeffrey Byrne in particular was instrumental in finding bugs and usability problems in early versions of the API.





Friday, June 5, 2015

Reinforcement Learning, Control, and 3D Visualization

Over the last few months I've spent a lot of time studying optimal control and reinforcement learning. Aside from reading, one of the best ways to learn about something is to do it yourself, which in this case means a lot of playing around with the well known algorithms, and for those I really like, including them into dlib, which is the subject of this post.  So far I've added two methods, the first, added in a previous dlib release was the well known least squares policy iteration reinforcement learning algorithm.  The second, and my favorite so far due to its practicality, is a tool for solving model predictive control problems.

There is a dlib example program that explains the new model predictive control tool in detail.  But the basic idea is that it takes as input a simple linear equation defining how some process evolves in time and then tells you what control input you should apply to make the process go into some user specified state.  For example, imagine you have an air vehicle with a rocket on it and you want it to hover at some specific location in the air.  You could use a model predictive controller to find out what direction to fire the rocket at each moment to get the desired outcome.  In fact, the dlib example program is just that.  It produces the following visualization where the vehicle is the black dot and you want it to hover at the green location.  The rocket thrust is shown as the red line:


Another fun new tool in dlib is the perspective_window.  It's a super easy to use tool for visualizing 3D point cloud data.  For instance, the included example program shows how to make this:


Finally, Patrick Snape contributed Python bindings for dlib's video tracker, so now you can use it from Python.  To try out these new tools download the newest dlib release.



Tuesday, February 3, 2015

Python Stuff and Real-Time Video Object Tracking

The new version of dlib is out today. As promised, there is now a full Python API for using dlib's state-of-the-art object pose estimation and learning tools.  You can see examples of this API here and here.  Thank Patrick Snape, one of the main developers of the menpo project, for this addition.

Also, I've added an implementation of the winning algorithm from last year's Visual Object Tracking Challenge.  This was a method described in the paper:
Danelljan, Martin, et al. "Accurate scale estimation for robust visual tracking." Proceedings of the British Machine Vision Conference BMVC. 2014.
You can see some videos showing dlib's implementation of this new tracker in action on youtube:


All these videos were processed by exactly the same piece of software.  No hand tweaking or any funny business.  The only required input (other than the raw video) is a bounding box on the first frame and then the tracker automatically follows whatever is inside the box after that.  The whole thing runs at over 150fps on my desktop.  You can see an example program showing how to use it here, or just go download the new dlib instead :)

I've also finally posted the paper I've been writing on dlib's structural SVM based training algorithm, which is the algorithm behind the easy to use object detector.

Saturday, December 20, 2014

Dlib 18.12 released

I just released the next version of dlib.  This time I added tools for computing 2D FFTs, Hough transforms, image skeletonizations, and also a simple and type safe API for calling C++ code from MATLAB.  Readers familiar with writing MATLAB mex functions know how much of a pain it is, but no longer!  Here is an example of a C++ function callable from MATLAB using dlib's new MATLAB binding API.  You can also compile it with CMake so building it is super easy. There is an example CMake file in the dlib/matlab folder showing how to set it up.  I also used this tool to give the MITIE project a simple MATLAB API. So you can see another example of how easy it is to set this up in the MITIE MATLAB example.  

There are also some fun new things in the pipe for the next dlib release (v18.13).  First, Patrick Snape, one of the main developers of the menpo project, is adding a Python interface to dlib's shape prediction tools. You can follow that over on dlib's github repo.  I'm also working on a single object tracker for OpenCV's Vision Challenge which I plan to include in the next version of dlib.

Saturday, November 15, 2014

Dlib 18.11 released

The new version of dlib is out. This release contains mostly minor bug fixes and usability improvements, with the notable exception of new routines for extracting local-binary-pattern features from images and improved tools for learning distance metrics. See the release notes for further information.

I also recently found out about two particularly interesting projects that use dlib.  The first is menpo, a Python library focused on computer vision which is being developed by a team at Imperial College London.  If you are interested in a Python library that pulls together a bunch of computer vision tools then definitely check it out.  The other interesting project is Ceemple, which is basically an interactive language shell for C++.  They have integrated a bunch of libraries like dlib and OpenCV into it with the general goal of making C++ development feel more rapid and interactive.  So think of something like MATLAB or IPython, but for C++.

Tuesday, October 21, 2014

MITIE v0.3 Released: Now with Java and R APIs

We just made the next release of MITIE, a new DARPA funded information extraction tool being created by our team at MIT. This release is relatively minor and just adds APIs for Java and R.  The project page on github explains how to get started using either of these APIs.  

I want to take some time and explain how the Java API is implemented since, as I discovered while making MITIE's Java API, there aren't clear instructions for doing this anywhere on the internet. So hopefully this little tutorial will help you if you decide to make a similar Java binding to a C++ library.  So to begin, let's think about the requirements for a good Java binding:
  • You should be able to compile it from source with a simple command
  • A user of your library should not need to edit or configure anything to compile the API
  • The compilation process should work on any platform
  • Writing JNI is awful so you shouldn't have to do that
This pretty much leads you to Swig and CMake which are both great tools.  However, finding out how to get CMake to work with Swig was painful and is pretty much what this blog post is about.  Happily, it's possible to do and results in a very clean and easy to use mechanism for creating Java APIs.  In particular, you can compile MITIE's Swig/CMake based Java API using the usual CMake commands:
mkdir build
cd build
cmake ..
cmake --build . --config Release --target install
That creates a jar file and shared library file which together form the MITIE Java API.  Let's run through a little example to see how you can define new Java APIs.  Imagine you have created a simple C++ API that looks like this:
void printSomeString (const std::string& message);

class MyClass {
public:
    std::vector<std::string> getSomeStrings() const;
};
and you want to be able to use it from Java.  You just need to put this C++ API in a header file called swig_api.h and include some Swig commands that tell it what to call std::vector<std::string> in the generated Java API.  So the contents of swig_api.h would look like:
// Define some swig type maps that tell swig what to call various instantiations of
// std::vector.
#ifdef SWIG
%include "std_string.i"
%include "std_vector.i"
%template(StringVector)         std::vector<std::string>;
#endif

#include <string>
#include <vector>

void printSomeString (const std::string& message);

class MyClass {
public:
    std::vector<std::string> getSomeStrings() const;
};
The next step is to create a CMakeLists.txt file that tells CMake how to compile your API.  In our case, it would look like:

cmake_minimum_required (VERSION 2.8.4)

project(example)
set(java_package_name  edu.mit.ll.example)

# List the source files you want to compile into the Java API.  These contain 
# things like implementations of printSomeString() and whatever else you need.
set(source_files my_source.cpp another_source_file.cpp )

# List the folders that contain your header files
include_directories( . )

# List of libraries to link to.  For example, you might need to link to pthread
set(additional_link_libraries pthread)

# Tell CMake to put the compiled shared library and example.jar file into the
# same folder as this CMakeLists.txt file when the --target install option is
# executed. You can put any folder here, just give a path that is relative to
# the CMakeLists.txt file.
set(install_target_output_folder .)

include(cmake_swig_jni)
That's it.  Now you can compile your Java API using CMake and you will get an example.jar and example.dll or libexample.so file depending on your platform.  Then to use it you can write java code like this:
import edu.mit.ll.example.*;
public class Example {
    public static void main(String args[]) {
        global.printSomeString("hello world!");

        MyClass obj = new MyClass();
        StringVector temp = obj.getSomeStrings();
        for (int i = 0; i < temp.size(); ++i)
            System.out.println(temp.get(i));
    }
}
and execute it via:
javac -classpath example.jar  Example.java
java -classpath example.jar;. -Djava.library.path=. Example

assuming the examle.jar and shared library are in your current folder.  Note that Linux or OS X users will need to use a : as the classpath separator rather than ; as is required on Windows.  But that's it!  You just made a Java interface to your C++ library.  You might have noticed the include(cmake_swig_jni) statement though.  That is a bunch of CMake magic I had to write to make all this work, but work it does and on different platforms without trouble.  You can see a larger example of a Java to C++ binding in MITIE's github repo using this same setup.